# Sunday, January 16, 2005

Just a quick reminder that there is no point manually checking database constraints before modifying your data. Take this code for example:

create table Users (
    UserID uniqueidentifier
        primary key not null default newid(),

    LoginName nvarchar(50) not null unique
)

if not exists (
    select * from Users
    where LoginName = 'JohnSmith'
)
begin
    insert into Users (LoginName)
    values ('JohnSmith')
end

There is a race condition between the if not exists check and the actual insertion of data. Another connection could insert the same data before this insert gets a chance to complete. We could wrap the check and insert in a transaction to ensure consistency as follows (note the isolation level because we need to place a range lock on data that could be there):

begin tran
set transaction isolation level repeatable read
if not exists (
    select * from Users
    where LoginName = 'JohnSmith'
)
begin
    insert into Users (LoginName)
    values ('JohnSmith')
end
commit

This works but we’ve got the added expense of a transaction. There is no point in doing this though as there is already a unique constraint on the column and databases are very good at checking constraints. With this in mind, the new and improved version looks like:

insert into Users (LoginName) 
values ('JohnSmith')

Much simpler. There is only one minor problem, for which I don’t have an answer, and that is if you have multiple constraints on a table then it’s very difficult to determine which one failed. The @@error variable will be set to 547 (constraint violation) and a suitable raiserror is generated which could be parsed by the caller but not easily.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Sunday, January 16, 2005 10:20:35 PM (GMT Standard Time, UTC+00:00)  #    Comments [0] Trackback
# Saturday, January 15, 2005

This one can be expanded to selling anyone on an idea:

The usual set of MSDN articles:

Some notable downloads:

Finally, a couple of excellent articles by Michele Leroux Bustamante (dasblonde)

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Saturday, January 15, 2005 10:48:09 AM (GMT Standard Time, UTC+00:00)  #    Comments [0] Trackback

OK, everyone else in blog-land has done this so here’s mine:

I am nerdier than 79% of all people. Are you nerdier? Click here to find out!

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Saturday, January 15, 2005 10:12:05 AM (GMT Standard Time, UTC+00:00)  #    Comments [2] Trackback

In the role I currently do for Exony, it’s a rare day when I get to do some good old fashioned problem solving in code. Fortunately, today was one of those days. The problem I’ve been working at is how to efficiently model user and groups in SQL where a user can be a member of multiple groups and groups can contain both users and other groups. I needed to get a list of all groups a user is either directly or indirectly a member of. In procedural languages this problem is easy with recursion. In SQL it’s a real pain because you need to use a set based solution for optimum performance – cursors are out.

I started with a semi-normalised schema modelling the users, groups and group memberships as show below:

UserGroupMember1

It’s de-normalised because as far as security goes I want to treat users and groups the same so I’m storing both types in the Users table, differentiated by UserType. Unfortunately, the only way to navigate this structure is with procedural code – not good, time for a different approach.

If you draw a sample set of users and groups on a piece of paper with lines representing membership it looks like a forest of trees so I started wondering if I should model the members table in a similar manor. There are a number of well known idioms for storing trees in SQL. The most widely known is Joe Celko’s nested sets model but I prefer a path model because inserts and sub-tree moves are easier (If you fancy finding out how little you know about SQL and why the adjacency list solution is completely wrong then just ask Joe the question in news://microsoft.public.sqlserver.programming). I added a new table for this because the normalised members table is great for storing the essential data and I believed the tree should be a pre-calculated optimisation.

Unfortunately, as soon as I coded this up I ran into a problem – lack of brain power. One of the goals I set was I must be able to recreate the tree from the members table. I’m sure it’s possible but I spent far too much time staring at the ceiling trying to come up with an answer. One thing did stick though – why not maintain a  pre-calculated list of all parent-child relationships in a format similar to the existing members table. As the data is the same as the existing members table, I just added a flag to indicate the row is calculated as shown below:

UserGroupMember2

So the trick now is to fill this table with every possible combination of user and group, all the way up each hierarchy. The SQL actually took a couple of goes until the light went on and I realised I needed to be selecting not parent-child relationships (I already have them in the table as non-calculated rows) but grandparent-child relationships. A single pass will not get all the rows required – you need to do it until no more rows are added. Time for some code:

declare @rowcount int
set @rowcount = 1

while @rowcount > 0
begin
    insert into dbo.Members (ParentID, ChildID, Calculated)
    select distinct p.UserID, c.UserID, 1
    from dbo.Members pm
    join dbo.Users p ON p.UserID = pm.ParentID
    join dbo.Members cm ON cm.ParentID = pm.ChildID
    join dbo.Users c ON c.UserID = cm.ChildID
    where not exists (
        select *
        from dbo.Members
        where ParentID = p.UserID
        and ChildID = c.UserID
    )

    select @rowcount = @@rowcount
end

From here, selecting the complete set of groups any user is a member of is easy:

declare @userid uniqueidentifier

select @userid = UserID
from dbo.Users
where Name = 'Administrator'

select ParentID as UserID
from dbo.Members
where ChildID = @userid
union
select @userid

Just add an index to Members.ChildID and everything is ready to go. All I need to do now is implement the rest of the security system…

[Edit: Because I wrote this late last night after a couple of glasses of wine I had a panic this morning thinking there was a flaw in my thinking so I coded up a test script. It turns out that everything works, but you have to watch for loops in the Members table.]

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Saturday, January 15, 2005 12:44:14 AM (GMT Standard Time, UTC+00:00)  #    Comments [1] Trackback
# Tuesday, January 11, 2005
I see my Google Page Rank has moved up to 5. I guess that’s down to links from Robert Scoble and Scott Hanselman. Thanks guys.
This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Tuesday, January 11, 2005 10:03:51 PM (GMT Standard Time, UTC+00:00)  #    Comments [0] Trackback
# Monday, January 10, 2005

I’ve been a bit lazy with blogging recently so I wanted to talk about a question that came up in the office today. The question was about the utility and performance of unique identifier columns as primary keys.

A unique identifier or GUID is a 16 byte data type; four times the size of an integer normally used for primary keys. This means that any index built using a GUID will be larger and contain less records per page. I managed to find this thread on p2p.wrox.com which seems to indicate there are some special index optimisations (cascading GUID blocks) in use for GUIDs. This is unconfirmed though so maybe someone from the SQL Server team can provide a definitive answer.

The size of the datatype can add up when used in data warehouse fact tables because each fact may have a large number of dimensions. For this reason I would probably stick to integer dimension keys.

GUIDs, by their very nature, are non-sequential which completely screws clustered indexes. There is a new sequential generation algorithm in Yukon which solves the issue although I can’t think of a reason why you would use a clustered index on a GUID because they are better suited for range queries. At Exony we hardly ever create a clustered index on the primary key as it’s normally reserved for datetime or date key columns.

The above seems quite negative but I think there are some good points to remember. There are some definite ease of use considerations with GUIDs when compared to IDENTITY columns:

  • Identities need to be reseeded in replicated systems to ensure inserts don’t clash
  • For GUID columns, you can either supply a value or rely on a default safe in the knowledge you won’t clash values
  • GUIDs can be generated at build time and embedded in SQL or load files simplifying installation code
  • GUIDs can be generated on remote systems or in higher system layers without issue

To end I think the general ease of GUID use outweighs any performance degradation in normal operations, although probably not in data warehouses. I see that the SQL Server team use GUIDs for primary keys (see the Reporting Services database for details) so there can’t be too much wrong with them.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Monday, January 10, 2005 9:57:21 PM (GMT Standard Time, UTC+00:00)  #    Comments [3] Trackback
# Friday, December 24, 2004

That’s all for this year. I’m off to London to spend Christmas with my folks. Merry Christmas (or whatever occasion you are celebrating) and a happy New Year. See you then.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Friday, December 24, 2004 12:13:08 PM (GMT Standard Time, UTC+00:00)  #    Comments [0] Trackback
# Monday, December 20, 2004

If you always wondered why playing Halo 2 over satellite broadband is a pain in the neck then check out http://stuffo.howstuffworks.com/halo-network.htm for the implementation details with Chris Butcher.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Monday, December 20, 2004 10:58:29 PM (GMT Standard Time, UTC+00:00)  #    Comments [0] Trackback
# Saturday, December 18, 2004

By way of Euan Garden:

Hey guys, how about a nice article on automated SSIS deployment for us ISVs that need to write installers for our products?

Also, don't forget to check out Darren and Allan's SQLDTS.com site reincarnated as SQLIS.com - SQL Server Integration Services on the Web.

Now playing: - Ryan's Radio

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Saturday, December 18, 2004 2:35:34 PM (GMT Standard Time, UTC+00:00)  #    Comments [1] Trackback

My feed list is starting to get out of hand. It's so large that I've started adding feeds already subscribed to because I can't remember if they are new or not. Robert, I don't know how you keep up with all your feeds.

To help, I've knocked together a little tool to clean duplicates out of my OPML file. It's a .NET console application so just supply the relevant filename on the command line. It will backup the file before processing. Source is included but it's nothing special. Enjoy.

Now playing: - Ryan's Radio

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Saturday, December 18, 2004 2:15:46 PM (GMT Standard Time, UTC+00:00)  #    Comments [0] Trackback
# Monday, December 13, 2004

In a previous post I talked about how difficult it is to blog in a small company and promised a follow up on the kind of things you could blog about. Well I think corporate blogging has to be about customers. If a blog doesn't have any customer focused content then it's a personal site; possibly very interesting, but not corporate. Microsoft bloggers are fortunate because, for most of them, they are doing the same job as their customers i.e. software development. This means they get to talk about their daily activities. For those of us whose customers are in a different field, we have to look for content that may not be immediately available.

It's time for some examples. I work for a company whose main business is selling software to help customers optimise their contact centres. What do our customers want to hear about? They probably want to know about how they can save money, keep their customers happy, integrate new systems and new developments in the field. To get specific, most of Exony's customers use Cisco ICM so they are probably interested in how to do great things with it. As I haven't found any Cisco bloggers (let me know if there are any) then we could fill that gap along with how to get the best out of the marriage between ICM and Exony.

To end, I tend to get more motivated by promising a post on a subject so look out for more on Exony Reporting and Cisco ICM here in the near future.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Monday, December 13, 2004 11:06:03 PM (GMT Standard Time, UTC+00:00)  #    Comments [0] Trackback
# Sunday, December 12, 2004
I see Clemens is having issues with referral spam too. So until Omar checks in the 1.7 stuff I’ll make my white-list changes available. Just unzip onto the latest source from the Workspace and rebuild.
This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Sunday, December 12, 2004 2:18:44 PM (GMT Standard Time, UTC+00:00)  #    Comments [1] Trackback