# Friday, June 04, 2010

If so then my team would like to know as we are looking for another business intelligence consultant.

Full details on our careers web site: SQL / BI Consultant (722749).

You can either post your details directly on the site or send me your CV at [jsnape at microsoft.com] and I’ll enter it for you (disclosure: if you send it via me I will get a bounty when you are hired).

Good luck.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Friday, June 04, 2010 12:18:29 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Tuesday, June 30, 2009

ssisdataflowsample Whilst doing some design work today for a customer project I realised there are a set of principals I try and adhere to when creating SQL Server Integration Services packages. The list is no doubt incomplete but this is what I have so far.

Minimise IO

This is a general data processing principal. Usually disk and, to a lesser extent, network performance determine the overall processing speed. Reducing the amount of IO in a solution will therefore increase performance.

Solutions that consist of multiple read-process-write steps should be redesigned into a single read-process-process-process-write step.

Prefer Sequential IO to Random IO

Disks perform at their best when sequentially reading or writing large chunks of data. Random IO (and poor performance) manifests when procedural style programming occurs - signs to look out for are SQL statements modifying/returning only few rows but being executed repeatedly.

Watch out for hidden random IO - for example, if you are reading from one table and writing to another in a sequential manor then disk access will still be random if both tables are stored on the same spindles.

Avoid data flow components that pool data

Data flow components work on batches of data called buffers. In most instances buffers are modified in place and passed down stream. Some components, such as "Sort" cannot process data like this and effectively hang on to buffers until the entire data stream is in memory (or spooled to disk in low memory situations). This increased memory pressure will affect performance.

Sometimes SQL is the better solution

Whilst the SSIS data flow has lots of useful and flexible components, it is sometimes more efficient to perform the equivalent processing in a SQL batch. SQL Server is extremely good at sorting, grouping and data manipulation (insert, update, delete) so it is unlikely you will match it for raw performance on a single read-process-write step.

SSIS does not handle hierarchical data well

Integration Services is a tabular data processing system. Buffers are tabular and the components and associated APIs are tabular. Consequently it is difficult to process hierarchical data such as the contents of an XML document. There is an XML source component but it's output is a collection of tabular data streams that need to joined to make sense.

Execute SSIS close to where you wish to write your data

Reading data is relatively easy and possible from a wide variety of locations. Writing data, on the other hand, can involve complex locking and other issues which are difficult to optimise on a network protocol. In particular when writing data to a local SQL Server instance, SSIS automatically used the Shared Memory transport for direct inter-process transfer.

Don't mess with the data flow metadata at runtime

It's very difficult to do this anyway but worth mentioning that SSIS gets it's stellar performance from being able to setup a data flow at runtime safe in the knowledge that buffers are of a fixed format and component dependencies will not change.

The only time this is acceptable is when you need to build a custom data flow programmatically. You should use the SSIS API's and not attempt to write the package XML directly.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Tuesday, June 30, 2009 7:23:10 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Tuesday, April 07, 2009

This is the second part in a series of data warehouse presentations I’ve been giving. This one concentrates on the physical design of the underlying SQL Server database, some information on the SQL Server Fast Track Data Warehouse and finally a one slide guide to tuning the storage engine performance. The tuning slide is only really a first step and I plan a more in depth session some time in the future.

Also I apologize, this deck is a little word heavy. I prefer more graphics in a presentation but there is a lot of specific guidance to list here.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Tuesday, April 07, 2009 12:02:55 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Tuesday, February 24, 2009

One of the big problems I have with some new customers is the knowledge gap between where they are and where they need to be in order to be successful. Given that people can read much faster than they can receive presented information it makes sense to have a reading list.

I’ve blogged previously some Analysis Services resources but new and better content is appearing all the time. In addition to that article here are my current reading recommendations for anyone planning a BI or Analysis Services project. Know this lot backwards and you should have a good head start.

Everyone

SQL Performance Tuning with Waits and Queues

Analysis Services Performance Guide

Identifying and Resolving MDX Query Performance Bottlenecks in SQL Server 2005 Analysis Services

The Data Loading Performance Guide

Report Server Catalog Best Practices

Reporting Services Performance Optimizations

New Data Warehouse Scalability Features in SQL Server 2008

Scaling Up Your Data Warehouse with SQL Server 2008

Architects and Developers

OLAP Design Best Practices for Analysis Services 2005

Best Practices for Data Warehousing with SQL Server 2008

Analysis Services Processing Best Practices

Top 10 SQL Server Integration Services Best Practices

Top 10 Best Practices for Building a Large Scale Relational Data Warehouse

Operations and Support

Storage Top 10 Best Practices

SQL Server Best Practices

IIS Performance Tuning

Resolving Common Connectivity Issues in SQL Server 2005 Analysis Services Connectivity Scenarios

In fact pretty much anything on the sqlcat website is pure gold.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Tuesday, February 24, 2009 10:15:39 AM (GMT Standard Time, UTC+00:00)  #    Comments [0] Trackback
# Monday, December 22, 2008

There are no hard and fast rules but the goal is to reduce the time taken to extract data from a source system and reduce the amount of work you have to do with the extracted data. The numbers quoted here are the ones I use as a starting point but you need to measure to determine the best values.

Don't do an incremental extract if:

  • There isn't much data in the source table (less than 100k rows)
  • There is enough change in the source table to require that you read most of it each time (for example if more than half the rows change between extracts)
  • The data in the source table is used for periodic snapshots (for example a balance sheet) and you need to track how a table changes at particular points in time

Do an incremental extract if:

  • There is a lot of data in the source table
  • Rows are only ever added to the source table (i.e. rows are not updated)
  • You need to track each and every change to a source row
  • The source data is updated a number of times before being closed and once closed is never updated again (also known as an accumulating snapshot)

In general dimension tables match the first set of rules and are not extracted incrementally where as fact tables normally match the second set of rules.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Monday, December 22, 2008 6:51:04 PM (GMT Standard Time, UTC+00:00)  #    Comments [0] Trackback
# Wednesday, August 01, 2007

TechReady 5 It's been a few weeks since my last post and I've been busy with customers and Outlook add-ins. Last week, however, I was at Microsoft's internal conference. It's like a TechEd proving ground where presenters get to try out their talks on an internal audience. There is some amazing stuff coming down the line in Windows Server, Visual Studio and SQL Server. I'm really pleased about the data warehousing enhancements across the SQL Server product line.

Way back in 2004 I went to TechEd and blogged a series of articles about new features that would arrive in SQL Server 2005. I plan to do a similar series for SQL Server 2008 but since a lot of stuff we saw last week is still internal I'll only blog about bits in publicly released CTP's.

duck Finally, the Popfly mashup site is now allowing invitations so the first five people to leave a comment telling me what you plan to do with your mashup gets an invite. You need to leave me your e-mail address as well.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Wednesday, August 01, 2007 1:33:24 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Monday, February 05, 2007

OK, I need your help... I'm giving a talk at the next SQL Community meeting about SQL Server development in an ISV. In particular the challenge of creating databases that can be installed many times and have to be supported, patched and upgraded. In contrast, enterprise SQL development is relatively straightforward as there is one master schema and code - the live server. All you need to do is ensure you sync with the master before making any changes.

What I would like is any experiences, hindsight, methods and processes you have related to this. Either leave a comment or e-mail me directly using the link at the top of the site. Some of the things I want to cover are:

  • Development, source control and build
  • Visual Studio Team Edition for Database Professionals
  • Installations
  • Patching, fixes and upgrades
  • Analysis Services, Integration Services, Reporting Services
  • End user schema/cube customisation
  • Performance tuning

If I get enough input I'll post the talk online.

 

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Monday, February 05, 2007 11:32:44 AM (GMT Standard Time, UTC+00:00)  #    Comments [0] Trackback
# Friday, October 20, 2006

Time to clean up all the shortcuts to interesting web pages sitting around on my desktop... In no particular order:

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Friday, October 20, 2006 3:23:20 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Thursday, July 27, 2006
This is the third update since release and contains both new and updated material. See "New and Updated Books Online Topics" for details. Get it the update here.
This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Thursday, July 27, 2006 5:10:07 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Monday, July 17, 2006

Some links to keep:

The last one should be of particular interest to anyone doing multi-tenanted systems (Doug?).

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Monday, July 17, 2006 3:16:24 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Saturday, June 17, 2006

It seems a number of people are having problems installing this service pack. There are a couple of solutions (here, here and here) that may work. There is also anecdotal evidence that exctacting the files from the SP executable using WinRAR andf running HotFix.exe manually may work.

Alternatively, I have found that the following works for me:

  1. Download the service pack and start the installation by double-clicking the downloaded executable.
  2. Once product enumeration is complete click Next at the Welcome page.
  3. At the End User License Agreement page, check the box to accept the terms and conditions before clicking Next.
  4. Select All Instances, Windows Authentication at the Authentication Mode page and click Next.
  5. Click Install to begin the installation.
  6. During the install you will be told about pending file operations. Click Yes.
    Pendingfileops
  7. A further message appears complaining about processes locking files. At this point, stop the SQL Server Full Text Search service and click Try Again. You may also need to click Try Again additional times depending on what SQL Server processes happen to be running.
    Lockedfiles
This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Saturday, June 17, 2006 12:01:48 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Monday, January 16, 2006
This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Monday, January 16, 2006 11:03:13 AM (GMT Standard Time, UTC+00:00)  #    Comments [0] Trackback
# Sunday, September 25, 2005

If you are having problems installing the CTP with a complaint about "beta components detected", first try the Build Uninstall Wizard located on the tools CD. However if that fails to do anything then try installing the .NET framework directly (it's located in the redist\2.0 folder on the server CD) as the error message is much more explicit and should tell you what components you have forgotten to remove.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Sunday, September 25, 2005 10:39:46 AM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Tuesday, June 21, 2005
Back at the beginning of the year I talked about optimising user/group hierarchies in SQL but missed off one important part of the puzzle - how to delete the memberships. Well Vikram has put some thought towards the problem and come up with a solution.
This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Tuesday, June 21, 2005 5:18:12 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Saturday, June 04, 2005

It's been a while since I posted last due to our current work schedule. It's hard work preparing a new product version when that release is due to go to a customer before it's GA'd.

Anyway, in this post about "Uniqueidentifiers as primary key columns" I talked about some of the benefits to ditching the traditional IDENTITY columns. I did point out that their use in data warehouse applications is questionable as key sizes become important when dealing with huge amounts of data. Herein lies the problem - we have an application that uses uniqueidentifiers wherever possible but dimension tables still use integer keys. No problem so far but we have recently tried to implement parent-child dimensions within Analysis Services and our folder structure is using uniqueidentifiers as keys.

As you may know, Analysis Server uses a simple arrangement for parent-child dimensions where each member has a key column and a parent key column as per the adjacency list model. To implement this, lets assume two tables Office and Folder where Office is a dimension and Folder is a hierarchy into which we place each office. The two tables look like:

create table Folder (
    folder_id int not null primary key,
    parent_id int not null,
    name nvarchar(50) not null
)

create table Office (
    office_id int not null primary key,
    folder_id int not null,
    name nvarchar(50) not null
)

There is a foreign key between folder_id in the two tables to tie them together. Note, I've used integer primary keys here so creating a parent-child dimension is easy as follows:

create view OfficeTree
as
    select folder_id + (select max(office_id) from Office) as item_id,
           parent_id + (select max(office_id) from Office) as parent_id,
           name
    from Folder
    union all
    select office_id,
           folder_id + (select max(office_id) from Office), 
           name
    from Office

Note, that the Folder keys must be offset to ensure they don't clash with the Office keys. Now if we try and change the folder_id and parent_id fields in Folder to uniqueidentifiers we end up with a problem - the data types are not compatible when creating the view OfficeTree and you get the message "Operand type clash: uniqueidentifier is incompatible with int".

The solution to this problem is of course to either go back to using ints somehow. I was about ready to remove the uniqueidentifiers when Koan Bremner came up with the following solution. The trick is to realize that the references between the two tables are only required when processing the dimension and Analysis Server doesn't care if the IDs change between cycles just as long as the keys joined to fact tables remain constant. So we still need to add an integer key column to the Folder table as below:

create table Folder (
    folder_id uniqueidentifier not null primary key,
    parent_id uniqueidentifier not null,
    alt_id int null,
    name nvarchar(50) not null
)

We allow the column to be null because the keys are only required before we process the dimension and allowing NULLs ensures easy inserts in a replicated environment. To update the column before processing requires a little magic (again from Koan).

declare @counter int
select @counter = isnull(max(alt_id), 0)
from Folder

update Folder
set @counter = alt_id = @counter + 1 -- Ed: Typo fixed (see comments).
where alt_id is null
option (maxdop 1)

The maxdop option ensures a sequential update to the column which forces an incrementing value. Now we can rewrite the view to use the new field. Note that we have to join back to Folder in order to resolve the parent_id field.

create view OfficeTree
as
    select f1.alt_id + (select max(office_id) from Office) as item_id,
           f2.alt_id + (select max(office_id) from Office) as parent_id,
           f1.name
    from Folder f1
    left join Folder f2 on f2.folder_id = f1.parent_id
    union all
    select o1.office_id, 
           f2.alt_id + (select max(office_id) from Office), 
           o1.name
    from Office o1
    left join Folder f2 on f2.folder_id = o1.folder_id
 

 

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Saturday, June 04, 2005 1:37:31 PM (GMT Daylight Time, UTC+01:00)  #    Comments [2] Trackback
# Saturday, April 23, 2005
This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Saturday, April 23, 2005 1:12:10 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback

I removed the old Betas of Visual Studio and Yukon from my home machine this morning and to check everything was still OK I fired up Query Analyser to fire a quick query – “SELECT TOP 5 * FROM sysobjects”. This failed with a syntax error complaining about the TOP 5 part of the clause.

I was about to bite the bullet and reinstall SQL 2000 as well but noticed the database I ran the query in was an old one. A quick switch to the master database and the same query works fine. It turns out the previous database had a compatibility level of 65 so none of the new syntax will work.

One thing did get hosed on the uninstall – SQLDMO which is crucial for Enterprise Manger to run. To fix it execute the following:

regsvr32.exe “C:\Program Files\Microsoft SQL Server\80\Tools\Binn\sqldmo.dll”

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Saturday, April 23, 2005 12:58:21 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Sunday, January 16, 2005

Just a quick reminder that there is no point manually checking database constraints before modifying your data. Take this code for example:

create table Users (
    UserID uniqueidentifier
        primary key not null default newid(),

    LoginName nvarchar(50) not null unique
)

if not exists (
    select * from Users
    where LoginName = 'JohnSmith'
)
begin
    insert into Users (LoginName)
    values ('JohnSmith')
end

There is a race condition between the if not exists check and the actual insertion of data. Another connection could insert the same data before this insert gets a chance to complete. We could wrap the check and insert in a transaction to ensure consistency as follows (note the isolation level because we need to place a range lock on data that could be there):

begin tran
set transaction isolation level repeatable read
if not exists (
    select * from Users
    where LoginName = 'JohnSmith'
)
begin
    insert into Users (LoginName)
    values ('JohnSmith')
end
commit

This works but we’ve got the added expense of a transaction. There is no point in doing this though as there is already a unique constraint on the column and databases are very good at checking constraints. With this in mind, the new and improved version looks like:

insert into Users (LoginName) 
values ('JohnSmith')

Much simpler. There is only one minor problem, for which I don’t have an answer, and that is if you have multiple constraints on a table then it’s very difficult to determine which one failed. The @@error variable will be set to 547 (constraint violation) and a suitable raiserror is generated which could be parsed by the caller but not easily.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Sunday, January 16, 2005 10:20:35 PM (GMT Standard Time, UTC+00:00)  #    Comments [0] Trackback
# Saturday, January 15, 2005

This one can be expanded to selling anyone on an idea:

The usual set of MSDN articles:

Some notable downloads:

Finally, a couple of excellent articles by Michele Leroux Bustamante (dasblonde)

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Saturday, January 15, 2005 10:48:09 AM (GMT Standard Time, UTC+00:00)  #    Comments [0] Trackback

In the role I currently do for Exony, it’s a rare day when I get to do some good old fashioned problem solving in code. Fortunately, today was one of those days. The problem I’ve been working at is how to efficiently model user and groups in SQL where a user can be a member of multiple groups and groups can contain both users and other groups. I needed to get a list of all groups a user is either directly or indirectly a member of. In procedural languages this problem is easy with recursion. In SQL it’s a real pain because you need to use a set based solution for optimum performance – cursors are out.

I started with a semi-normalised schema modelling the users, groups and group memberships as show below:

UserGroupMember1

It’s de-normalised because as far as security goes I want to treat users and groups the same so I’m storing both types in the Users table, differentiated by UserType. Unfortunately, the only way to navigate this structure is with procedural code – not good, time for a different approach.

If you draw a sample set of users and groups on a piece of paper with lines representing membership it looks like a forest of trees so I started wondering if I should model the members table in a similar manor. There are a number of well known idioms for storing trees in SQL. The most widely known is Joe Celko’s nested sets model but I prefer a path model because inserts and sub-tree moves are easier (If you fancy finding out how little you know about SQL and why the adjacency list solution is completely wrong then just ask Joe the question in news://microsoft.public.sqlserver.programming). I added a new table for this because the normalised members table is great for storing the essential data and I believed the tree should be a pre-calculated optimisation.

Unfortunately, as soon as I coded this up I ran into a problem – lack of brain power. One of the goals I set was I must be able to recreate the tree from the members table. I’m sure it’s possible but I spent far too much time staring at the ceiling trying to come up with an answer. One thing did stick though – why not maintain a  pre-calculated list of all parent-child relationships in a format similar to the existing members table. As the data is the same as the existing members table, I just added a flag to indicate the row is calculated as shown below:

UserGroupMember2

So the trick now is to fill this table with every possible combination of user and group, all the way up each hierarchy. The SQL actually took a couple of goes until the light went on and I realised I needed to be selecting not parent-child relationships (I already have them in the table as non-calculated rows) but grandparent-child relationships. A single pass will not get all the rows required – you need to do it until no more rows are added. Time for some code:

declare @rowcount int
set @rowcount = 1

while @rowcount > 0
begin
    insert into dbo.Members (ParentID, ChildID, Calculated)
    select distinct p.UserID, c.UserID, 1
    from dbo.Members pm
    join dbo.Users p ON p.UserID = pm.ParentID
    join dbo.Members cm ON cm.ParentID = pm.ChildID
    join dbo.Users c ON c.UserID = cm.ChildID
    where not exists (
        select *
        from dbo.Members
        where ParentID = p.UserID
        and ChildID = c.UserID
    )

    select @rowcount = @@rowcount
end

From here, selecting the complete set of groups any user is a member of is easy:

declare @userid uniqueidentifier

select @userid = UserID
from dbo.Users
where Name = 'Administrator'

select ParentID as UserID
from dbo.Members
where ChildID = @userid
union
select @userid

Just add an index to Members.ChildID and everything is ready to go. All I need to do now is implement the rest of the security system…

[Edit: Because I wrote this late last night after a couple of glasses of wine I had a panic this morning thinking there was a flaw in my thinking so I coded up a test script. It turns out that everything works, but you have to watch for loops in the Members table.]

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Saturday, January 15, 2005 12:44:14 AM (GMT Standard Time, UTC+00:00)  #    Comments [1] Trackback
# Monday, January 10, 2005

I’ve been a bit lazy with blogging recently so I wanted to talk about a question that came up in the office today. The question was about the utility and performance of unique identifier columns as primary keys.

A unique identifier or GUID is a 16 byte data type; four times the size of an integer normally used for primary keys. This means that any index built using a GUID will be larger and contain less records per page. I managed to find this thread on p2p.wrox.com which seems to indicate there are some special index optimisations (cascading GUID blocks) in use for GUIDs. This is unconfirmed though so maybe someone from the SQL Server team can provide a definitive answer.

The size of the datatype can add up when used in data warehouse fact tables because each fact may have a large number of dimensions. For this reason I would probably stick to integer dimension keys.

GUIDs, by their very nature, are non-sequential which completely screws clustered indexes. There is a new sequential generation algorithm in Yukon which solves the issue although I can’t think of a reason why you would use a clustered index on a GUID because they are better suited for range queries. At Exony we hardly ever create a clustered index on the primary key as it’s normally reserved for datetime or date key columns.

The above seems quite negative but I think there are some good points to remember. There are some definite ease of use considerations with GUIDs when compared to IDENTITY columns:

  • Identities need to be reseeded in replicated systems to ensure inserts don’t clash
  • For GUID columns, you can either supply a value or rely on a default safe in the knowledge you won’t clash values
  • GUIDs can be generated at build time and embedded in SQL or load files simplifying installation code
  • GUIDs can be generated on remote systems or in higher system layers without issue

To end I think the general ease of GUID use outweighs any performance degradation in normal operations, although probably not in data warehouses. I see that the SQL Server team use GUIDs for primary keys (see the Reporting Services database for details) so there can’t be too much wrong with them.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Monday, January 10, 2005 9:57:21 PM (GMT Standard Time, UTC+00:00)  #    Comments [3] Trackback
# Saturday, October 30, 2004

Koan has written an excellent article about Microsoft PSS entitled "A Bug's Life (or, how to get a hotfix out of Microsoft Product Support Services)". It's long but well worth reading.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Saturday, October 30, 2004 3:02:46 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Wednesday, October 20, 2004
Microsoft has decided to delay the release of SQL Server 2005 from the first half of 2005 until later in the summer of that year. A Community Technical Preview will be released as an interim beta with possibly more to come before the final beta and the product’s eventual release.

(via The ServerSide.NET)

I guess my previous post was correct in it's assumptions. The SQL Server delay has allowed the Visual Studio team time to add C# Edit & Continue.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Wednesday, October 20, 2004 10:05:34 PM (GMT Daylight Time, UTC+01:00)  #    Comments [2] Trackback
# Friday, October 08, 2004

So I haven't had time to blog much. There are, however, a number of links on my desktop that I've collected and meant to take a look at. So I can tidy up I'll list them here instead.

Plus some useful articles on MSDN:

(*) The title is based on a quote by Jesse from the Fast Show. Spoken in a West Country accent.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Friday, October 08, 2004 5:12:28 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Saturday, October 02, 2004

Good news for those like me who lamented the demise of the database diagramming tool in Visual Studio. It's going to make a return in the near future. We just need to get it added back into the SQL Server Management/BI Studio.

Due to popular demand, starting in the October CTP you will be able to create database diagrams using Whidbey with Yukon databases. There are a few changes that have been made to the Whidbey database diagramming capability:

  • Diagrams are now assigned a database user as an owner. Individual database users can only see their own diagrams unless they’re a member of the db_owner role. Members of the db_owner role can see any diagram in the database.
  • Only SQL Server Yukon is supported for Whidbey diagramming.
  • Many minor UI enhancements, additional shortcut keys, and better IAccessibility support.

(via The Return of Database Diagramming!)

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Saturday, October 02, 2004 10:58:08 AM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Thursday, September 30, 2004

Just a quick one before I go to bed... From SQL Pass in Orlando, Microsoft have announced three report packs for SQL Server Reporting Services. I downloaded and tried them out today and to tell the truth I'm a little disappointed. Each pack does contain a database you can attach to SQL Server which makes them immediately accessible but that's all I found useful. What I really need from the various add-ons that Microsoft make available is both inspiration and examples of things I don't know how to do. I got neither from these packs.

Understandably, I'm not the target audience for the downloads. They are for people who need a pre-built solution they can tweak. I expect the packs fulfil this need well. If you're looking for novel Reporting Services solutions then you can overlook these downloads.

The thing I'm really exited about is the additional news that Active Views will be included in Yukon Beta 3. I can't wait to get my hands on it. In fact I was on the preview list but, due to work commitments, couldn't make it to Redmond for the design review earlier this month. Now I have to wait like everyone else.

Every month we have to deliver lots of reports to customers and it's a time consuming task. Reporting Services Report Builder, as it's now known, could be of use to me right now (bugs and all) because currently report writing is a skilled activity. The author must know SQL and MDX but with this new tool I can get anyone in the office to create the basic reports and have them completed by the old authors.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Thursday, September 30, 2004 10:08:07 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Sunday, September 26, 2004

For some reason I've found other things to blog about recently and have been ignoring Yukon. I was going to get around to writing about partitioned tables but there is no need now. Kimberly Tripp has released a fantastic paper on the subject:

While getting ready for the PASS Community Summit in Orlando this week (where I plan to talk about SQL Server 2005 Partitioned Tables/Indexes), I realized we were close but not quite ready to release the post-beta II whitepaper on MSDN.

So, in preparation for PASS we decided to post a pre-release of the partitioning paper here.

(via Kimberly L. Tripp: Improving your SQL skills)

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Sunday, September 26, 2004 6:05:58 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Thursday, September 23, 2004

So you spend the odd evening and weekend playing with the brand new technologies emerging from Microsoft. Whether it's Yukon, Whidbey or Longhorn, there is plenty to mess around with. The more you play with these tools, the more you can dream of new ways to deliver exciting products in next to no time. Life is rosy.

However, we all need to pay the bills and that means going to work. Work is not about dreams, it's about delivering real solutions to customers now. Here is where the misery starts. You can't actually use these tech previews and betas for production code so you have to do things the old way. The old way is no fun. It's depressing and we are still at least a year away from a more pleasant life. It's not that the current tool set is bad, in fact it's great. Compared to the new versions, though, the difference is immense.

Personally, I think it's worse as I can't see any of our customers (apart from one very progressive one) upgrading to Yukon until it's been on the shelves for at least a year. In fact Cisco ICM is a product we depend on and their recent release has only just caught up with SQL Server 2000. Understandably there has to be a business benefit for them to upgrade. Maybe there is but at the moment all I see is developer benefit.

I talk primarily about Whidbey and Yukon as I'm not even going to look at Longhorn until it's a bit more tangible. Blog land is full of posts wondering if it's worth using WinForms for development. In fact Microsoft has recently been on the offensive trying to persuade customers that WinForms is not dead.

On a final note, I do actually like what Microsoft is doing with tech previews. It helps me align my architecture and development plans with theirs. It just makes going to work a bit of a drag in the meantime.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Thursday, September 23, 2004 10:48:55 PM (GMT Daylight Time, UTC+01:00)  #    Comments [1] Trackback
# Sunday, August 29, 2004

OUTPUT Clause

This one gives you access to the inserted and deleted tables in normal SQL so you can write statements like:

DECLARE @changes TABLE (
    ProjectId int,
    Name nvarchar(max),
    Description nvarchar(max)
);

update dbo.Projects
set Billable = 1
output
    inserted.ProjectId,
    inserted.Name,
    inserted.Description
into @changes
where Billable = 0

WRITE Clause

Allows more efficient updates of varchar(max), nvarchar(max) and varbinary(max) columns. For example:

declare @update_text nvarchar(max)
set @update_text = ' ... this is added to the end'

update dbo.Projects
set    Description.write(@update_text, null, len(@update_text))
where  Name = 'Timesheet'

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Sunday, August 29, 2004 7:05:47 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Saturday, August 28, 2004
You can sign up for beta-place access if you have found any bugs in SQL Server 2005. Details are at http://msdn.microsoft.com/sql/bugs/.
This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Saturday, August 28, 2004 11:06:24 AM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback

Aaron Bertrand has started collecting content for a SQL Server 2005 FAQ. There is some useful stuff in there but needs more content so here you go Aaron.

Why do new query windows use the master database?

This behaviour is different from Query Analyzer and can often cause you to execute scripts in the wrong database. This is especially unproductive as there is no select multiple in Object Explorer to clean up the damage quickly.

In actual fact the old Query Analyzer behaviour is still there, just under a different button. You need to select New Query with Current Connection  which is just to the left of the database selection list in the SQL Editor tool bar. Alternatively, CTRL-Q (Standard Keyboard Scheme) or CTRL-N (SQL Server 2000 Scheme) will do the same.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Saturday, August 28, 2004 10:59:41 AM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Tuesday, August 24, 2004

Your natural dimension primary key maybe too large to use in a data-mart with hundreds of millions of rows so you might need to create a smaller alternate key. Sometimes it's not always best to use IDENTITY columns.

Imagine the case where you are importing data from an external system. The usual method to add the correct foreign keys is to either create a lookup (in DTS) which can be very slow, or try and join the imported data to your dimension table and extract the dimension key (which is also pretty slow). I've used a date lookup here as an example (time of day is not used):

update dbo.ImportedData
set DateID = d.DateID
from dbo.ImportedData t join dbo.Dates d
    on d.Date = cast(floor(cast(t.Date as float)) as smalldatetime)

For dimensions like this it maybe much more efficient to create keys using a calculation performed on the common fields. For example, the number of days since a point in time:

update dbo.ImportedData
set DateID = datediff(day, '2000-01-01', Date)
from dbo.ImportedData

This avoids any sort of join and, in fact, could be performed on the data stream as it is loaded into the database. Also, if you use a smallint for the key then you get a 50% saving from the original smalldatetime type and are still able to use dates past 2175.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Tuesday, August 24, 2004 5:36:44 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback

OK, it crashes a bit because it's beta software. But at least it has this:

Auto Recovery Dialog

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Tuesday, August 24, 2004 4:39:06 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback

In order to optimise queries fully in previous versions of SQL Server you might have had to create a covering index by adding all the select and where columns to the index definition. This enabled an index to completely satisfy a query and avoid a bookmark lookup on the underlying table.

In Yukon there is a new keyword on the CREATE INDEX syntax - INCLUDE. This allows you to store additional non-key columns with the index but avoid the cost of them being in the actual non-leaf index structures. So for example:

create index IX_Dates_Date on dbo.Dates(Date)
include (
    Year, Month, Day,
    MonthName, DayOfWeekName, WeekOfYear
)

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Tuesday, August 24, 2004 1:54:09 PM (GMT Daylight Time, UTC+01:00)  #    Comments [1] Trackback
# Wednesday, August 18, 2004
There is an excellent article in this months MSDN Magazine entitled "Streamline Your Database Setup Process with a Custom Installer". We do something similar in our own product, but nowhere near as complete as this. It covers version checking, upgrades, patches, extended stored procedures and bulk loading of data.
This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Wednesday, August 18, 2004 1:12:20 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Tuesday, August 17, 2004

Ralph Kimball has written an excellent series of articles about data warehouse theory and practise. They range from Data Warehouse Fundamentals through Data Warehouse Architecture and Modelling to Data Staging and Data Quality.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Tuesday, August 17, 2004 11:40:11 AM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Saturday, August 14, 2004

I discovered another nice new feature today when creating computed columns, the PERSISTED keyword. In previous versions, to avoid the overhead of the calculation you could create an index to materialise the computed column. Now you can just persist the computed column and the result is stored with the table and updated as necessary. For example:

create table dbo.Tasks
(
    StartTime datetime NOT NULL,
    EndTime datetime NOT NULL,
    Duration  AS EndTime - StartTime PERSISTED
)

One other thing to note is that the PERSISTED keyword also helps you to create indexes on the column too. Remember that a computed column must be deterministic and precise in order to create an index on it. Well there are several ways in which that might not be verifiable in SQL Server 2005; for example with CLR functions. Persisting the column ensures its deterministic and hence allows index creation.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Saturday, August 14, 2004 11:31:46 AM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Thursday, August 12, 2004

If you are currently using the nested sets model for storing trees then it may be about time for a change. This model is great for queries but inserts and updates can touch nearly every row in the table. The more natural way of storing trees is via a ParentID known as the adjacency list model as shown below:

create table dbo.Projects

(

ProjectID int not null,

ParentID int null,

Name nvarchar(max) not null

)

The big drawback, however, is that you need to write recursive queries to get anything useful. Help is at hand though as SQL Server 2005 adds some features to automatically unwind the tree: Common Table Expressions or CTEs.

The basic structure of a CTE working with hierarchies is:

with <theCTE>

(

<anchor expression>

union all

<recursive expression>

)

select * from <theCTE>

Where the <anchor expression> determines where to start the recursion and the <recursive expression> does the work. Note that the <recursive expression> should join to the CTE to work properly, for example the following lists all descendants of the project defined by <rootID>:

with Tree (ParentID, ProjectID, Name, Level, Path)

as

(

select ParentID, ProjectID, Name,

0 as [Level], '/' + Name as [Path]

from dbo.Projects

where ParentID = <rootID>

union all

select p.ParentID, p.ProjectID, p.Name,

t.Level + 1, t.Path + '/' + p.Name

from dbo.Projects p

join Tree t on t.ProjectID = p.ParentID

)

select ParentID, ProjectID, Name, Level, Path

from Tree t

order by ParentID, Level, ProjectID

Note how the Level and Path are calculated by adding to the result from the previous iteration.

You can also recurse up the tree:

with Tree (ParentID, ProjectID, Name)

as

(

select ParentID, ProjectID, Name

from dbo.Projects

where ProjectID = <rootID>

union all

select p.ParentID, p.ProjectID, p.Name

from dbo.Projects p

join Tree t on t.ParentID = p.ProjectID

)

select distinct

ProjectID, ParentID, Name

from Tree t

order by ParentID, ProjectID

When you consider you can wrap these queries into table valued functions, there is a lot of scope for writing cohesive reusable SQL.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Thursday, August 12, 2004 2:37:48 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Tuesday, August 10, 2004

It seems that you can't use columns of type uniqueidentifier as parent-child keys because the dialogs (with and without intellicube) only list integer candidates. I hope this gets fixed as its much easier to use uniqueidentifiers as primary keys (no need to mess with scope_identity).

Of course, the reason might be that no one would use a key so large in a fact table and that's why they are not listed...

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Tuesday, August 10, 2004 4:31:05 PM (GMT Daylight Time, UTC+01:00)  #    Comments [2] Trackback

I've expanded the timesheet model a little from my prototype yesterday. Most of the changes are because I want to try specific SQL 2005 features:

  • The Comments field in TimesheetEntries is to see how the UDM handles non-numeric measures
  • The ReviewedByID field in Users is for trying Common Table Expressions (hierarchies)
  • I've added a Clients table to move away from a star schema
  • Teams and TeamMembers have been added to test many-to-many relationships in the UDM
  • Teams work on Projects to see how the UDM handles loops (this isn't great design BTW)

Also there a a couple of other new things because I want them for specific features in the solution:

  • Activities allow slicing by what you where doing (coding, testing etc)
  • FirstName and LastName make reports look a little better
  • WeekOfYear has been added because it's how timesheets are collected

The complete model so far is shown below. Note, I don't believe in adding everything you can think of into the schema. Invariably it never gets used and you only have to maintain it. This is one of the main XP tenets. It's a little more difficult with databases but still possible. One for a future post...

 Initial Timesheet Schema

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Tuesday, August 10, 2004 12:02:53 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Monday, August 09, 2004

I've just spent the day tinkering with SQL 2005 to build a very quick prototype of my Timesheet Project. It only has four tables but was enough to enable me to create a SQL database, load some data with DTS and build an analysis cube. Here are some initial thoughts...

The lack of a database diagram tool in all three studios (Management, BI and Visual) is a pain in the neck as I usually start a database by creating a few tables on a diagram, adding some relationships and changing them around until I'm happy. It's quite an iterative process and very difficult to achieve when you can't see the whole database at once.

The table editing features in the Management Studio are quite easy to use. It's functionality is almost the same as the old Enterprise Manager in this respect but it feels more natural. On the other hand, I don't like the database project feature because it's a poor effort when compared to the facilities provided in other projects (DTS, C#, etc). It was hard work as there is no "create folder" and more importantly the scripts created as the result of "script to new query window" can't be added to the project. You have to create a query file in the project and paste into it. Visual Studio has had "add to project" on the right click menu since before VC6.

The Universal Data Model concept is going to take a little getting used to. It was extremely easy to create the cube using the intellicube wizard but they look strange when compared to SQL 2000 cubes. The dimensions have multiple hierarchies - one for each field in the dimension table. There is no neat star/snowflake schema, just a bunch of tables that look more like an OLTP database.

Finally, DTS development is a dream. The environment is great and there are loads of useful components to use. I did notice, however, that CSV files written by the File Destination component can't be read by the File Source component due to the final CR/LF. I have to assume this is a bug - one for Beta Place...

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Monday, August 09, 2004 9:33:55 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback

Three new blogs about SQL Server 2005:

  • Bob Beauchemin's Blog
  • Hitachi Consulting Yukon BI
  • Scalability Experts Blog

Well worth reading...

[Edit: well they were worth reading but all have been discontinued now.]
This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Monday, August 09, 2004 11:11:18 AM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Saturday, August 07, 2004

I was just tinkering with our schema and thinking about time - specifically UTC offsets and the time dimension. Our current scheme is for two dimensions; one to hold time of day and the other to hold dates. It works pretty well as we are normally querying either inter-day or intra-day. Not both. It also keeps the dimensioning code and tables under control.

As we are currently using UK time, the only issue to be aware of is the hour when the clocks go back in October. It doesn't affect queries though - we just get some data put in the wrong time-slot.

As the next version is going to be culture aware and support multiple time zones then this model fails because any query needs to take into account the possibility that the time zone offset may cause a difference in date members.

So for example, assuming the data is stored in the database in UTC, to get the facts for a particular day is:

select * 
from Facts f 
join Date d on d.DateID = f.DateID 
where d.Date = '2004-07-17' 

But with TZ offset (e.g. PST = -8)

select * 
from Facts f
join Date d on d.DateID = f.DateID
join Time d on d.TimeID = f.TimeID
where d.Year = 2004
and   d.Month = 7
and   (   
          (d.Day = 17 and t.Hour < 24 - 8)
       or (d.Day = 16 and t.Hour > 24 - 8)        
      )

Note that this can spill into months and years as well (imagine a time zone offset query around New Year's Eve). I don't think it gets any simpler with MDX.

If, on the other hand, we were to combine the date and time dimensions then the query becomes a lot simpler (here we can offset globally):

select * 
from Facts f
join DateTime d on d.DateTimeID = f.DateTimeID
where cast(floor((cast(d.Date as float) - 
    (8.0 / 24.0))) as smalldatetime) = '2004-07-17'

I think it's even easier in MDX with the Lag() function.

One problem to consider though is the explosion of size in the time dimension. The time dimension now has 525600 members per year as opposed to the old scheme with 365 (Date) + 1440 (Time) members.

Also, the SQL Server 2005 Intellicube wizard only has room for a single time dimension.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Saturday, August 07, 2004 9:33:23 AM (GMT Daylight Time, UTC+01:00)  #    Comments [1] Trackback
# Sunday, August 01, 2004

I noticed this whilst browsing the SQL Server 2005 online help. It creates a permanent (until you DROP SYNONYM) alias to an object in SQL Server. This could be very useful in shortening the amount SQL you need to type. Imagine you have a remote database containing a schema and table. You can either:

SELECT * FROM REMOTE_SERVER.SCHMEA_NAME.THE_TABLE_IN_QUESTION

Or:

CREATE SYNONYM THE_TABLE
    FOR REMOTE_SERVER.SCHMEA_NAME.THE_TABLE_IN_QUESTION
GO
SELECT * FROM THE_TABLE

Useful? Yes, but even more so when you need to use a database that has daft object naming conventions like Remedy managed systems.

CREATE SYNONYM Customers
    FOR dbo.C0183
GO
SELECT * FROM Customers
This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Sunday, August 01, 2004 10:51:09 AM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Saturday, July 31, 2004
If you are a little confused about the new Data Transformation Services package that comes with SQL Server 2005, there is a list of frequently asked questions on the MSDN web site which covering a range of topics. Included are a number of answers on why the upgrade tool doesn't do a terribly good job.
This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Saturday, July 31, 2004 10:39:58 AM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Tuesday, July 27, 2004
This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Tuesday, July 27, 2004 4:17:04 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Monday, July 26, 2004
I've just downloaded it from the MSDN subscriber downloads site... If you have the Visual Studio Beta you should un-install because there is a later .NET framework version in the SQL Server bits and it complains. Thats all for now - I'm off to play with Yukon. Check back later for more as I discover it.
This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Monday, July 26, 2004 8:21:18 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Sunday, July 25, 2004

I'm in the middle of writing a custom DTS transform to perform fast key lookups for dimensioning facts as they are streamed into the database. I'll post more when I have completed it but for now I'm disappointed to find out that there is no way to provide any sort of user interface beyond the standard property dialog.

This strikes me as a major oversight on Microsoft's part considering they managed to do it for custom tasks. I have to question why they didn't add the capability. A number of the built in transforms have custom dialogs that appear when you click the properties button so it must be possible.

There are two reasons that I can think of. The first is that there is a way to provide custom transform dialogs and this is how they implement their own transform UIs but the interface is undocumented. The second is that the main transform task also implements all the UIs for the transforms.

Subscribers to the "evil empire" theory will probably favour the first reason. I can't agree because keeping the interface private would only benefit Microsoft if they were selling their own custom transforms. If the second is true then they are responsible for some pretty poor design. This is especially true because they managed to get the design right for custom tasks. The teams can't be that big that communication breaks down so it must be those little private offices they all have.

Maybe someone on the DTS team can enlighten me?

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Sunday, July 25, 2004 8:47:29 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Sunday, July 11, 2004
I have a number of days holiday I must take before the end of September or I'll lose them, so when the Yukon Beta arrives I plan to use some of them investigating the various parts of the new version along with its companion - Visual Studio 2005`. I had planned to try porting our reporting product but there would be issues with blogging and the company IPR, so I'm looking to find a smallish project I can complete in a month and write about. Ideas on a postcard please - just use the comments at the end of this article to add your suggestions.
This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Sunday, July 11, 2004 10:48:05 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Friday, July 09, 2004

Remember to backup your Analysis Server repository. Especially if you are on site and the customer is looking over your shoulder...

Fortunately, we run our systems in parallel so I was able to to resurrect it from the other side but not without forgetting to change the data source otherwise I would have processed the cubes against the wrong database.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Friday, July 09, 2004 9:33:23 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Friday, July 02, 2004
What is the Service Broker? It's a technology for developing asynchronous queued database applications. The queues are first class database objects with T-SQL support. The broker priovides full SQL Server transaction support enabling reliable distributed computing. Part of this transactional support means that queues and messages are fully supported in the underlying subsystems - transaction log, backups, mirroring and failover. Other features include: message routing between intermediate nodes, multi-reader queues ith support for dynamic reader activation based on queue length.
This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Friday, July 02, 2004 4:48:10 PM (GMT Daylight Time, UTC+01:00)  #    Comments [4] Trackback
It no longer has a workload govenor.... Nuff said!
This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Friday, July 02, 2004 4:47:32 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Thursday, July 01, 2004

Two words - "Database Mirroring"... Almost instant stand-by with very fast (~3 seconds) automatic or manual fail over. The client libraries also detect and fail over automatically too. To ensure this happens correctly a third server called the "Witness" provides majority logic to ensure that only a single side is up at any one time. I also get the feeling that Microsoft would prefer you to not attempt to use replication as a fail over solution as there are a number of issues to overcome - data loss, client fail over, recovery and the amount of custom code required to make it work properly. One thing you can't do on a mirror is use it. Kind of obvious really, but you can't even connect in a read-only mode.

Help is at and though - another interesting feature that will be available is a "Database Snapshot" which uses copy on write semantics to provide efficient views of a database at a point in time. A snapshot does not need to copy the database as it will reference the original database pages until they are modified. These snapshots can be created on mirrors which provide ideal access for reporting. Those that have all their data stored in a single database may want to think about splitting it into reporting databases and online databases. Basically, stick the information into a reporting database but keep the OLTP and application runtime data in another.

One last honourable mention is for SNAPHOT ISOLATION LEVEL - similar to Oracle's row versioning. But remember that tempdb is used to store the version information whilst a transaction is in progress.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Thursday, July 01, 2004 4:49:20 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Wednesday, June 30, 2004

Data Transformation Services been completely re-written for SQL Server 2005. The main difference is the separation of workflow and dataflow. But additional niceties include for/foreach looping, debugging, and OR constraints which mean daft hacks are not required. In fact modifying the package at runtime is impossible.

The new dataflow pipeline deserves a special mention as its no longer one row in one row out. The transform can consume data from multiple sources and send to multiple destinations in a single task - this allows for some interesting merging and splitting of data. Set operations are also supported - sort, aggregate,  split, multicast amongst others.

DTS has grown up and can play with the big boys. If DTS 2000 is not enough for your needs right now then consider waiting for SQL Server 2005 and not trying to implement yourself.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Wednesday, June 30, 2004 7:23:22 AM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback