# Saturday, June 04, 2005
« Test Developer Job at Exony | Main | What Video Game Character Are You? »

It's been a while since I posted last due to our current work schedule. It's hard work preparing a new product version when that release is due to go to a customer before it's GA'd.

Anyway, in this post about "Uniqueidentifiers as primary key columns" I talked about some of the benefits to ditching the traditional IDENTITY columns. I did point out that their use in data warehouse applications is questionable as key sizes become important when dealing with huge amounts of data. Herein lies the problem - we have an application that uses uniqueidentifiers wherever possible but dimension tables still use integer keys. No problem so far but we have recently tried to implement parent-child dimensions within Analysis Services and our folder structure is using uniqueidentifiers as keys.

As you may know, Analysis Server uses a simple arrangement for parent-child dimensions where each member has a key column and a parent key column as per the adjacency list model. To implement this, lets assume two tables Office and Folder where Office is a dimension and Folder is a hierarchy into which we place each office. The two tables look like:

create table Folder (
    folder_id int not null primary key,
    parent_id int not null,
    name nvarchar(50) not null
)

create table Office (
    office_id int not null primary key,
    folder_id int not null,
    name nvarchar(50) not null
)

There is a foreign key between folder_id in the two tables to tie them together. Note, I've used integer primary keys here so creating a parent-child dimension is easy as follows:

create view OfficeTree
as
    select folder_id + (select max(office_id) from Office) as item_id,
           parent_id + (select max(office_id) from Office) as parent_id,
           name
    from Folder
    union all
    select office_id,
           folder_id + (select max(office_id) from Office), 
           name
    from Office

Note, that the Folder keys must be offset to ensure they don't clash with the Office keys. Now if we try and change the folder_id and parent_id fields in Folder to uniqueidentifiers we end up with a problem - the data types are not compatible when creating the view OfficeTree and you get the message "Operand type clash: uniqueidentifier is incompatible with int".

The solution to this problem is of course to either go back to using ints somehow. I was about ready to remove the uniqueidentifiers when Koan Bremner came up with the following solution. The trick is to realize that the references between the two tables are only required when processing the dimension and Analysis Server doesn't care if the IDs change between cycles just as long as the keys joined to fact tables remain constant. So we still need to add an integer key column to the Folder table as below:

create table Folder (
    folder_id uniqueidentifier not null primary key,
    parent_id uniqueidentifier not null,
    alt_id int null,
    name nvarchar(50) not null
)

We allow the column to be null because the keys are only required before we process the dimension and allowing NULLs ensures easy inserts in a replicated environment. To update the column before processing requires a little magic (again from Koan).

declare @counter int
select @counter = isnull(max(alt_id), 0)
from Folder

update Folder
set @counter = alt_id = @counter + 1 -- Ed: Typo fixed (see comments).
where alt_id is null
option (maxdop 1)

The maxdop option ensures a sequential update to the column which forces an incrementing value. Now we can rewrite the view to use the new field. Note that we have to join back to Folder in order to resolve the parent_id field.

create view OfficeTree
as
    select f1.alt_id + (select max(office_id) from Office) as item_id,
           f2.alt_id + (select max(office_id) from Office) as parent_id,
           f1.name
    from Folder f1
    left join Folder f2 on f2.folder_id = f1.parent_id
    union all
    select o1.office_id, 
           f2.alt_id + (select max(office_id) from Office), 
           o1.name
    from Office o1
    left join Folder f2 on f2.folder_id = o1.folder_id
 

 

by This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Saturday, June 04, 2005 1:37:31 PM (GMT Daylight Time, UTC+01:00)  #    Comments [2] Trackback
Related posts:
Do you have great business intelligence skills?
Integration Services Design Principals
Physical Data Warehouse Design
Analysis Services Essential Reading
When should you do an incremental extract?
Post TechReady and Popfly Invites
Sunday, June 05, 2005 10:24:36 AM (GMT Daylight Time, UTC+01:00)
Sorry, Jim, spotted a little typo in your UPDATE statement; it should be:

update Folder
set @counter = alt_id = @counter + 1
where alt_id is null
option (maxdop 1)

If you leave it as it was, then no rows get updated (because NULL + 1 is NULL).
Sunday, June 05, 2005 10:32:58 AM (GMT Daylight Time, UTC+01:00)
Koan,

Thanks for the spot. Have fixed the typo.

Jim.
Comments are closed.