I’ve been a bit lazy with blogging recently so I wanted to talk about a question that came up in the office today. The question was about the utility and performance of unique identifier columns as primary keys.
A unique identifier or GUID is a 16 byte data type; four times the size of an integer normally used for primary keys. This means that any index built using a GUID will be larger and contain less records per page. I managed to find this thread on p2p.wrox.com which seems to indicate there are some special index optimisations (cascading GUID blocks) in use for GUIDs. This is unconfirmed though so maybe someone from the SQL Server team can provide a definitive answer.
The size of the datatype can add up when used in data warehouse fact tables because each fact may have a large number of dimensions. For this reason I would probably stick to integer dimension keys.
GUIDs, by their very nature, are non-sequential which completely screws clustered indexes. There is a new sequential generation algorithm in Yukon which solves the issue although I can’t think of a reason why you would use a clustered index on a GUID because they are better suited for range queries. At Exony we hardly ever create a clustered index on the primary key as it’s normally reserved for datetime or date key columns.
The above seems quite negative but I think there are some good points to remember. There are some definite ease of use considerations with GUIDs when compared to IDENTITY columns:
To end I think the general ease of GUID use outweighs any performance degradation in normal operations, although probably not in data warehouses. I see that the SQL Server team use GUIDs for primary keys (see the Reporting Services database for details) so there can’t be too much wrong with them.
Page rendered at Saturday, February 04, 2012 11:39:37 AM (GMT Standard Time, UTC+00:00)
Disclaimer The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.