# Saturday, August 14, 2004

When I'm in front of my computer all day I usually listen to music. Since getting broadband I've discovered the Radio 1 On Demand Player. It's a great way to catch all the specialist shows they do in the evenings plus my favourite, The Essential Mix - recorded live in a club in the wee small hours of Sunday morning.

Radio 1 have recently changed their line up and I'm really enjoying one of the new shows: Annie Mac who plays a pretty cool mix of house, hip hop, bass and beats.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Saturday, August 14, 2004 2:13:05 PM (GMT Daylight Time, UTC+01:00)  #    Comments [1] Trackback

I discovered another nice new feature today when creating computed columns, the PERSISTED keyword. In previous versions, to avoid the overhead of the calculation you could create an index to materialise the computed column. Now you can just persist the computed column and the result is stored with the table and updated as necessary. For example:

create table dbo.Tasks
(
    StartTime datetime NOT NULL,
    EndTime datetime NOT NULL,
    Duration  AS EndTime - StartTime PERSISTED
)

One other thing to note is that the PERSISTED keyword also helps you to create indexes on the column too. Remember that a computed column must be deterministic and precise in order to create an index on it. Well there are several ways in which that might not be verifiable in SQL Server 2005; for example with CLR functions. Persisting the column ensures its deterministic and hence allows index creation.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Saturday, August 14, 2004 11:31:46 AM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Friday, August 13, 2004
Apologies to those using RSS readers to view this blog. The RSS content for yesterday's post was completely messed up by the RSS generator. It seems it doesn't get along too well with HTML coloured text pasted from Visual Studio. Trust me, it looks a lot better online...
This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Friday, August 13, 2004 9:40:11 AM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Thursday, August 12, 2004

If you are currently using the nested sets model for storing trees then it may be about time for a change. This model is great for queries but inserts and updates can touch nearly every row in the table. The more natural way of storing trees is via a ParentID known as the adjacency list model as shown below:

create table dbo.Projects

(

ProjectID int not null,

ParentID int null,

Name nvarchar(max) not null

)

The big drawback, however, is that you need to write recursive queries to get anything useful. Help is at hand though as SQL Server 2005 adds some features to automatically unwind the tree: Common Table Expressions or CTEs.

The basic structure of a CTE working with hierarchies is:

with <theCTE>

(

<anchor expression>

union all

<recursive expression>

)

select * from <theCTE>

Where the <anchor expression> determines where to start the recursion and the <recursive expression> does the work. Note that the <recursive expression> should join to the CTE to work properly, for example the following lists all descendants of the project defined by <rootID>:

with Tree (ParentID, ProjectID, Name, Level, Path)

as

(

select ParentID, ProjectID, Name,

0 as [Level], '/' + Name as [Path]

from dbo.Projects

where ParentID = <rootID>

union all

select p.ParentID, p.ProjectID, p.Name,

t.Level + 1, t.Path + '/' + p.Name

from dbo.Projects p

join Tree t on t.ProjectID = p.ParentID

)

select ParentID, ProjectID, Name, Level, Path

from Tree t

order by ParentID, Level, ProjectID

Note how the Level and Path are calculated by adding to the result from the previous iteration.

You can also recurse up the tree:

with Tree (ParentID, ProjectID, Name)

as

(

select ParentID, ProjectID, Name

from dbo.Projects

where ProjectID = <rootID>

union all

select p.ParentID, p.ProjectID, p.Name

from dbo.Projects p

join Tree t on t.ParentID = p.ProjectID

)

select distinct

ProjectID, ParentID, Name

from Tree t

order by ParentID, ProjectID

When you consider you can wrap these queries into table valued functions, there is a lot of scope for writing cohesive reusable SQL.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Thursday, August 12, 2004 2:37:48 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Tuesday, August 10, 2004

It seems that you can't use columns of type uniqueidentifier as parent-child keys because the dialogs (with and without intellicube) only list integer candidates. I hope this gets fixed as its much easier to use uniqueidentifiers as primary keys (no need to mess with scope_identity).

Of course, the reason might be that no one would use a key so large in a fact table and that's why they are not listed...

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Tuesday, August 10, 2004 4:31:05 PM (GMT Daylight Time, UTC+01:00)  #    Comments [2] Trackback

I don't normally link other stuff, but seeing as I'm at home on a day off sitting in front of the computer playing with a Microsoft product I thought it was quite relevant (although I'm not married)...

A good reminder to keep your work life balance in check... Bill Gates Took My Baby Away. [from Lenn]

(via SimpleGeek)

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Tuesday, August 10, 2004 2:55:11 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback

I've expanded the timesheet model a little from my prototype yesterday. Most of the changes are because I want to try specific SQL 2005 features:

  • The Comments field in TimesheetEntries is to see how the UDM handles non-numeric measures
  • The ReviewedByID field in Users is for trying Common Table Expressions (hierarchies)
  • I've added a Clients table to move away from a star schema
  • Teams and TeamMembers have been added to test many-to-many relationships in the UDM
  • Teams work on Projects to see how the UDM handles loops (this isn't great design BTW)

Also there a a couple of other new things because I want them for specific features in the solution:

  • Activities allow slicing by what you where doing (coding, testing etc)
  • FirstName and LastName make reports look a little better
  • WeekOfYear has been added because it's how timesheets are collected

The complete model so far is shown below. Note, I don't believe in adding everything you can think of into the schema. Invariably it never gets used and you only have to maintain it. This is one of the main XP tenets. It's a little more difficult with databases but still possible. One for a future post...

 Initial Timesheet Schema

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Tuesday, August 10, 2004 12:02:53 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Monday, August 09, 2004

I've just spent the day tinkering with SQL 2005 to build a very quick prototype of my Timesheet Project. It only has four tables but was enough to enable me to create a SQL database, load some data with DTS and build an analysis cube. Here are some initial thoughts...

The lack of a database diagram tool in all three studios (Management, BI and Visual) is a pain in the neck as I usually start a database by creating a few tables on a diagram, adding some relationships and changing them around until I'm happy. It's quite an iterative process and very difficult to achieve when you can't see the whole database at once.

The table editing features in the Management Studio are quite easy to use. It's functionality is almost the same as the old Enterprise Manager in this respect but it feels more natural. On the other hand, I don't like the database project feature because it's a poor effort when compared to the facilities provided in other projects (DTS, C#, etc). It was hard work as there is no "create folder" and more importantly the scripts created as the result of "script to new query window" can't be added to the project. You have to create a query file in the project and paste into it. Visual Studio has had "add to project" on the right click menu since before VC6.

The Universal Data Model concept is going to take a little getting used to. It was extremely easy to create the cube using the intellicube wizard but they look strange when compared to SQL 2000 cubes. The dimensions have multiple hierarchies - one for each field in the dimension table. There is no neat star/snowflake schema, just a bunch of tables that look more like an OLTP database.

Finally, DTS development is a dream. The environment is great and there are loads of useful components to use. I did notice, however, that CSV files written by the File Destination component can't be read by the File Source component due to the final CR/LF. I have to assume this is a bug - one for Beta Place...

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Monday, August 09, 2004 9:33:55 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback

Three new blogs about SQL Server 2005:

  • Bob Beauchemin's Blog
  • Hitachi Consulting Yukon BI
  • Scalability Experts Blog

Well worth reading...

[Edit: well they were worth reading but all have been discontinued now.]
This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Monday, August 09, 2004 11:11:18 AM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Saturday, August 07, 2004

I was just tinkering with our schema and thinking about time - specifically UTC offsets and the time dimension. Our current scheme is for two dimensions; one to hold time of day and the other to hold dates. It works pretty well as we are normally querying either inter-day or intra-day. Not both. It also keeps the dimensioning code and tables under control.

As we are currently using UK time, the only issue to be aware of is the hour when the clocks go back in October. It doesn't affect queries though - we just get some data put in the wrong time-slot.

As the next version is going to be culture aware and support multiple time zones then this model fails because any query needs to take into account the possibility that the time zone offset may cause a difference in date members.

So for example, assuming the data is stored in the database in UTC, to get the facts for a particular day is:

select * 
from Facts f 
join Date d on d.DateID = f.DateID 
where d.Date = '2004-07-17' 

But with TZ offset (e.g. PST = -8)

select * 
from Facts f
join Date d on d.DateID = f.DateID
join Time d on d.TimeID = f.TimeID
where d.Year = 2004
and   d.Month = 7
and   (   
          (d.Day = 17 and t.Hour < 24 - 8)
       or (d.Day = 16 and t.Hour > 24 - 8)        
      )

Note that this can spill into months and years as well (imagine a time zone offset query around New Year's Eve). I don't think it gets any simpler with MDX.

If, on the other hand, we were to combine the date and time dimensions then the query becomes a lot simpler (here we can offset globally):

select * 
from Facts f
join DateTime d on d.DateTimeID = f.DateTimeID
where cast(floor((cast(d.Date as float) - 
    (8.0 / 24.0))) as smalldatetime) = '2004-07-17'

I think it's even easier in MDX with the Lag() function.

One problem to consider though is the explosion of size in the time dimension. The time dimension now has 525600 members per year as opposed to the old scheme with 365 (Date) + 1440 (Time) members.

Also, the SQL Server 2005 Intellicube wizard only has room for a single time dimension.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Saturday, August 07, 2004 9:33:23 AM (GMT Daylight Time, UTC+01:00)  #    Comments [1] Trackback
# Monday, August 02, 2004

I have a meeting tomorrow to discuss a number of deployments we have in the next couple of weeks. One thing to note is that all installations come with their own set of problems and I don't always have the answers. However, preparation is key; as they say - "pathetic preparation makes for a piss poor presentation".

One thing I'm going to try is a bit of brainstorming. I'm going to draw up each installation on a white-board and we'll list up anything that could possibly go wrong. From that we should be able to plan for nearly every eventuality. Hopefully it won't be a too depressing experience .

The main stay of my deployment plans is practise. If you have done it in the office without a customer breathing down your neck then the real thing is usually much easier. We try and get hold of a backup from the customer as they usually have datasets containing unforeseen combinations of data (especially as the source database doesn't have any constraints - shame on you Cisco!).

The last thing that commonly trips us up is firewalls - most of our customers are in retail or banking so security is sometimes taken to the extreme. It would be costly for us to reproduce their firewall configuration (one customer has DMZ's around each server), so a quick tip is to use personal firewalls such as ZoneAlarm to lock down your test servers similar to a DMZ deployment.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Monday, August 02, 2004 10:18:09 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Sunday, August 01, 2004

It's August and it's hot so, by tradition, the British public head to the beach. As a slight alternative I went to Hurst Castle which stands on a spit of land between the south cost and the Isle of Wight. It was originally built by Henry VIII as part of a coastal defence chain. The castle was used right though to the Second World War with various upgrades and additions along the way.

As the castle on a shingle spit there is no road and you have to walk 1½ miles down loose rocks. Very tiring but worth it. I took the camera and have loads of pictures so see what its all about.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Sunday, August 01, 2004 8:46:15 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback