# Tuesday, August 24, 2004

Your natural dimension primary key maybe too large to use in a data-mart with hundreds of millions of rows so you might need to create a smaller alternate key. Sometimes it's not always best to use IDENTITY columns.

Imagine the case where you are importing data from an external system. The usual method to add the correct foreign keys is to either create a lookup (in DTS) which can be very slow, or try and join the imported data to your dimension table and extract the dimension key (which is also pretty slow). I've used a date lookup here as an example (time of day is not used):

update dbo.ImportedData
set DateID = d.DateID
from dbo.ImportedData t join dbo.Dates d
    on d.Date = cast(floor(cast(t.Date as float)) as smalldatetime)

For dimensions like this it maybe much more efficient to create keys using a calculation performed on the common fields. For example, the number of days since a point in time:

update dbo.ImportedData
set DateID = datediff(day, '2000-01-01', Date)
from dbo.ImportedData

This avoids any sort of join and, in fact, could be performed on the data stream as it is loaded into the database. Also, if you use a smallint for the key then you get a 50% saving from the original smalldatetime type and are still able to use dates past 2175.

by This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Tuesday, August 24, 2004 5:36:44 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback

OK, it crashes a bit because it's beta software. But at least it has this:

Auto Recovery Dialog

by This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Tuesday, August 24, 2004 4:39:06 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback

In order to optimise queries fully in previous versions of SQL Server you might have had to create a covering index by adding all the select and where columns to the index definition. This enabled an index to completely satisfy a query and avoid a bookmark lookup on the underlying table.

In Yukon there is a new keyword on the CREATE INDEX syntax - INCLUDE. This allows you to store additional non-key columns with the index but avoid the cost of them being in the actual non-leaf index structures. So for example:

create index IX_Dates_Date on dbo.Dates(Date)
include (
    Year, Month, Day,
    MonthName, DayOfWeekName, WeekOfYear
)

by This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Tuesday, August 24, 2004 1:54:09 PM (GMT Daylight Time, UTC+01:00)  #    Comments [1] Trackback
# Monday, August 23, 2004

They only seem to care because we, as software developers, make them play the requirements game. In reality, the only thing that that matters is how the final product matches up to their expectations. The big problem is that you can't capture them in a document to use as the blueprint for development. There are some things you can to ensure your customer is happy with the end result:

  1. Write a short "Vision" document describing the key features to get everyone on the same page early on
  2. Program by feature to produce functionality early, don't spend too much time on infrastructure
  3. When you've completed a feature, show it to the customer and get feedback
  4. Allow the customer to change their minds

These points will only work when both you and the customer share the project risk; either with fixed rates/variable time or fixed time/variable functionality.

If it's a fixed price/fixed functionality contract then you'd better start writing those requirements...

by This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Monday, August 23, 2004 7:06:50 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Wednesday, August 18, 2004
There is an excellent article in this months MSDN Magazine entitled "Streamline Your Database Setup Process with a Custom Installer". We do something similar in our own product, but nowhere near as complete as this. It covers version checking, upgrades, patches, extended stored procedures and bulk loading of data.
by This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Wednesday, August 18, 2004 1:12:20 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Tuesday, August 17, 2004

Ralph Kimball has written an excellent series of articles about data warehouse theory and practise. They range from Data Warehouse Fundamentals through Data Warehouse Architecture and Modelling to Data Staging and Data Quality.

by This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Tuesday, August 17, 2004 11:40:11 AM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Sunday, August 15, 2004

I've just noticed a very nice side affect from having the refactoring functionality in VS2005. When you rename something using the property window, it automatically updates all the references. So you know what has just happened this appears in the refactor window:

Rename
d:\work\dataentry\form1.designer.cs(3,16): updated definition
d:\work\dataentry\form1.cs(14,16): updated definition
d:\work\dataentry\form1.cs(16,10): updated reference
D:\Work\DataEntry\Program.cs(21,24): updated reference

by This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Sunday, August 15, 2004 12:23:16 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Saturday, August 14, 2004

When I'm in front of my computer all day I usually listen to music. Since getting broadband I've discovered the Radio 1 On Demand Player. It's a great way to catch all the specialist shows they do in the evenings plus my favourite, The Essential Mix - recorded live in a club in the wee small hours of Sunday morning.

Radio 1 have recently changed their line up and I'm really enjoying one of the new shows: Annie Mac who plays a pretty cool mix of house, hip hop, bass and beats.

by This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Saturday, August 14, 2004 2:13:05 PM (GMT Daylight Time, UTC+01:00)  #    Comments [1] Trackback

I discovered another nice new feature today when creating computed columns, the PERSISTED keyword. In previous versions, to avoid the overhead of the calculation you could create an index to materialise the computed column. Now you can just persist the computed column and the result is stored with the table and updated as necessary. For example:

create table dbo.Tasks
(
    StartTime datetime NOT NULL,
    EndTime datetime NOT NULL,
    Duration  AS EndTime - StartTime PERSISTED
)

One other thing to note is that the PERSISTED keyword also helps you to create indexes on the column too. Remember that a computed column must be deterministic and precise in order to create an index on it. Well there are several ways in which that might not be verifiable in SQL Server 2005; for example with CLR functions. Persisting the column ensures its deterministic and hence allows index creation.

by This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Saturday, August 14, 2004 11:31:46 AM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Friday, August 13, 2004
Apologies to those using RSS readers to view this blog. The RSS content for yesterday's post was completely messed up by the RSS generator. It seems it doesn't get along too well with HTML coloured text pasted from Visual Studio. Trust me, it looks a lot better online...
by This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Friday, August 13, 2004 9:40:11 AM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Thursday, August 12, 2004

If you are currently using the nested sets model for storing trees then it may be about time for a change. This model is great for queries but inserts and updates can touch nearly every row in the table. The more natural way of storing trees is via a ParentID known as the adjacency list model as shown below:

create table dbo.Projects

(

ProjectID int not null,

ParentID int null,

Name nvarchar(max) not null

)

The big drawback, however, is that you need to write recursive queries to get anything useful. Help is at hand though as SQL Server 2005 adds some features to automatically unwind the tree: Common Table Expressions or CTEs.

The basic structure of a CTE working with hierarchies is:

with <theCTE>

(

<anchor expression>

union all

<recursive expression>

)

select * from <theCTE>

Where the <anchor expression> determines where to start the recursion and the <recursive expression> does the work. Note that the <recursive expression> should join to the CTE to work properly, for example the following lists all descendants of the project defined by <rootID>:

with Tree (ParentID, ProjectID, Name, Level, Path)

as

(

select ParentID, ProjectID, Name,

0 as [Level], '/' + Name as [Path]

from dbo.Projects

where ParentID = <rootID>

union all

select p.ParentID, p.ProjectID, p.Name,

t.Level + 1, t.Path + '/' + p.Name

from dbo.Projects p

join Tree t on t.ProjectID = p.ParentID

)

select ParentID, ProjectID, Name, Level, Path

from Tree t

order by ParentID, Level, ProjectID

Note how the Level and Path are calculated by adding to the result from the previous iteration.

You can also recurse up the tree:

with Tree (ParentID, ProjectID, Name)

as

(

select ParentID, ProjectID, Name

from dbo.Projects

where ProjectID = <rootID>

union all

select p.ParentID, p.ProjectID, p.Name

from dbo.Projects p

join Tree t on t.ParentID = p.ProjectID

)

select distinct

ProjectID, ParentID, Name

from Tree t

order by ParentID, ProjectID

When you consider you can wrap these queries into table valued functions, there is a lot of scope for writing cohesive reusable SQL.

by This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Thursday, August 12, 2004 2:37:48 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Tuesday, August 10, 2004

It seems that you can't use columns of type uniqueidentifier as parent-child keys because the dialogs (with and without intellicube) only list integer candidates. I hope this gets fixed as its much easier to use uniqueidentifiers as primary keys (no need to mess with scope_identity).

Of course, the reason might be that no one would use a key so large in a fact table and that's why they are not listed...

by This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Tuesday, August 10, 2004 4:31:05 PM (GMT Daylight Time, UTC+01:00)  #    Comments [2] Trackback