# Friday, June 04, 2010

If so then my team would like to know as we are looking for another business intelligence consultant.

Full details on our careers web site: SQL / BI Consultant (722749).

You can either post your details directly on the site or send me your CV at [jsnape at microsoft.com] and I’ll enter it for you (disclosure: if you send it via me I will get a bounty when you are hired).

Good luck.

by This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Friday, June 04, 2010 12:18:29 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Saturday, February 06, 2010

image

Asking for a running sum in a report is quite a common thing but this week I was asked to create a running sum for a particular customer against number of facts. What I mean here is to create a graph of count vs amount (sort of like a Pareto except in transaction order). So something that looks like graph above.

This is a well rehearsed subject in MDX. You can either use recursion:

With Member [Measures].[Running Sum]
As
    [Internet Sales Order Details].[Sales Order Number].PrevMember
    +
    [Measures].[Internet Gross Profit]

Member [Measures].[Running Count]
As
    [Internet Sales Order Details].[Sales Order Number].PrevMember
    +
    1

Select {
    [Measures].[Running Count],
    [Measures].[Internet Gross Profit],
    [Measures].[Running Sum]
} On 0, Non Empty {
    [Internet Sales Order Details].[Sales Order Number].Children
} On 1
From [Adventure Works]

Where (
    [Customer].[Customer].[Brian Watson]
)

Or, iteration (thanks to Chris Webb for some help on relative performance) which should perform better, especially on AS2008.

With Member [Measures].[Running Sum]
As
    Sum(
        {Null : [Internet Sales Order Details].[Sales Order Number].CurrentMember},
        [Measures].[Internet Gross Profit]
    )
Member [Measures].[Running Count]
As
    Count(
        {Null : [Internet Sales Order Details].[Sales Order Number].CurrentMember}
    )

Select {
    [Measures].[Running Count],
    [Measures].[Internet Gross Profit],
    [Measures].[Running Sum]
} On 0, Non Empty {
    [Internet Sales Order Details].[Sales Order Number].Children
} On 1
From [Adventure Works]

Where (
    [Customer].[Customer].[Brian Watson]
)

[However, on my x64 laptop the second version takes much longer to execute YMMV.]

This is OK for AdventureWorks but my real degenerate dimension has many millions of members and this just doesn’t scale. I contemplated using Reporting Services RunningValue() function but as far as I can tell you can’t use it to generate a category axis.

I needed a way of generating the running count for the x-axis in a way that uses Analysis Services’ excellent aggregation ability.

imageThe solution I ended up with is to create an artificial hierarchy and bucket transactions. That way I can create an attribute relation for aggregation and, importantly, control the number of cells in the iteration.

The next problem was how to assign values to this bucket – some customers had only a few transactions yet others had millions. They all needed to be spread over a fixed set of buckets.

The answer lies in a SQL Server RANK() function:

update dw.Sales
set TradeBucket = x.TradeBucket
from (
    select TradeKey,
    rank() over(partition by CustomerKey order by t.TradeKey asc) /
    case
        when (select COUNT(*) from dw.Sales where CustomerKey = t.CustomerKey) < 1000 then 1
        when (select COUNT(*) from dw.Sales where CustomerKey = t.CustomerKey) < 10000 then 10
        when (select COUNT(*) from dw.Sales where CustomerKey = t.CustomerKey) < 100000 then 100
        when (select COUNT(*) from dw.Sales where CustomerKey = t.CustomerKey) < 1000000 then 1000
        else 10000
    end as TradeBucket
    from dw.Sales
) x
where dw.Sales.TradeKey = x.TradeKey

Effectively, were are generating an incrementing number on a per customer basis and then dividing that number to compress the range. This is surprisingly fast to execute.

Once everything is processed, my new MDX looks like:

With Member [Measures].[Running Sum]
As    
    Sum(
        {Null : [Internet Sales Order Details].[Trade Bucket].CurrentMember},
        [Measures].[Internet Gross Profit]   
    )

Member [Measures].[Running Count]
As
    Sum(
        {Null : [Internet Sales Order Details].[Trade Bucket].CurrentMember},
        [Measures].[Sales Count]   
    )

Select {
    [Measures].[Running Count],
    [Measures].[Internet Gross Profit],
    [Measures].[Running Sum]
} On 0, Non Empty {
    [Internet Sales Order Details].[Trade Bucket].Children
} On 1
From [Adventure Works]

Where (
    [Customer].[Customer].[Brian Watson]
)

It works on aggregated data; there are still around 1000 points which is just fine on the graph and it executes in around 3 seconds. So all good?

Well, for now yes but I can see a problem looming – every time I do an import I update every fact row and fully reprocess the cube. That isn’t going to scale long term. I will probably have to implement some sort of bucket partition strategy.

by This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Saturday, February 06, 2010 1:47:11 PM (GMT Standard Time, UTC+00:00)  #    Comments [0] Trackback
# Thursday, April 16, 2009

Every data warehouse needs a date dimension and at some point it needs to be populated. Most use some sort of a SQL script that loops though the dates and add rows to the destination table but this is pretty slow to execute. You might even try cross joining a year, month and day temporary tables to produce a set based solution but don’t forget to filter out the illegal days.

I prefer to fill my date tables by generating the correct stream of values from a SQL Server Integration Services script source component. This has a number of benefits:

  • It executes very quickly
  • The data can be bulk loaded
  • CultureInfo supplies the correct translations of day and month names
  • It is easy to add custom columns such as fiscal years and quarters

I haven’t wrapped this in a pre-compiled component as it is so easy to do in script from. Also, I haven’t got around to generalizing the fiscal date offsets for different companies so they usually have to be custom coded.

Script Component Type Dialog

First drop a “Script Component” onto your Data Flow.

Select “Source” as the Script Component Type and click OK.

Then double-click the newly added component to edit the properties.

Note that you need to add the correct output columns before adding the script or else it won’t compile.


Script Source Outputs

I’ve renamed the output here to “Dates” to help further down the Data Flow.

Click the “Add Column” button to add new columns as show here. Note that I’ve also changed the data type of each column to match my source table. It required casts in script but it’s easier than conversions in the data pipeline.

Finally go back to the script part of the dialog and click the “Edit Script” button to launch Visual Studio for Applications.

In the resulting window, add your code to generate the date stream to the CreateNewOutputRows() function.
The general form is of:

var output = this.DatesBuffer;  // Get the output buffer

while (/*loop though your dates*?)
{

output.AddRow();

// Set the various column values e.g.
output.CalendarYear = date.Year

// Increment the date
date = date.AddDays(1);
}

The full script is in the attached sample package where I’ve also added a script destination that does nothing with the data. Attach a data viewer to see what output is generated.

Date Data

From here you can manipulate the data, and pipe it to your dimension table from within the pipeline.

DateSourceSample.zip (27.08 KB)

by This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Thursday, April 16, 2009 7:52:34 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
DateSourceSample.zip (27.08 KB)
# Tuesday, April 07, 2009

This is the second part in a series of data warehouse presentations I’ve been giving. This one concentrates on the physical design of the underlying SQL Server database, some information on the SQL Server Fast Track Data Warehouse and finally a one slide guide to tuning the storage engine performance. The tuning slide is only really a first step and I plan a more in depth session some time in the future.

Also I apologize, this deck is a little word heavy. I prefer more graphics in a presentation but there is a lot of specific guidance to list here.

by This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Tuesday, April 07, 2009 12:02:55 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Tuesday, March 17, 2009

I recently gave a presentation at our community event on Dimensional Modeling. It covers the logical design and user model phase of a data warehouse project.

by This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Tuesday, March 17, 2009 6:17:24 PM (GMT Standard Time, UTC+00:00)  #    Comments [0] Trackback
# Tuesday, February 24, 2009

One of the big problems I have with some new customers is the knowledge gap between where they are and where they need to be in order to be successful. Given that people can read much faster than they can receive presented information it makes sense to have a reading list.

I’ve blogged previously some Analysis Services resources but new and better content is appearing all the time. In addition to that article here are my current reading recommendations for anyone planning a BI or Analysis Services project. Know this lot backwards and you should have a good head start.

Everyone

SQL Performance Tuning with Waits and Queues

Analysis Services Performance Guide

Identifying and Resolving MDX Query Performance Bottlenecks in SQL Server 2005 Analysis Services

The Data Loading Performance Guide

Report Server Catalog Best Practices

Reporting Services Performance Optimizations

New Data Warehouse Scalability Features in SQL Server 2008

Scaling Up Your Data Warehouse with SQL Server 2008

Architects and Developers

OLAP Design Best Practices for Analysis Services 2005

Best Practices for Data Warehousing with SQL Server 2008

Analysis Services Processing Best Practices

Top 10 SQL Server Integration Services Best Practices

Top 10 Best Practices for Building a Large Scale Relational Data Warehouse

Operations and Support

Storage Top 10 Best Practices

SQL Server Best Practices

IIS Performance Tuning

Resolving Common Connectivity Issues in SQL Server 2005 Analysis Services Connectivity Scenarios

In fact pretty much anything on the sqlcat website is pure gold.

by This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Tuesday, February 24, 2009 10:15:39 AM (GMT Standard Time, UTC+00:00)  #    Comments [0] Trackback
# Monday, February 05, 2007

OK, I need your help... I'm giving a talk at the next SQL Community meeting about SQL Server development in an ISV. In particular the challenge of creating databases that can be installed many times and have to be supported, patched and upgraded. In contrast, enterprise SQL development is relatively straightforward as there is one master schema and code - the live server. All you need to do is ensure you sync with the master before making any changes.

What I would like is any experiences, hindsight, methods and processes you have related to this. Either leave a comment or e-mail me directly using the link at the top of the site. Some of the things I want to cover are:

  • Development, source control and build
  • Visual Studio Team Edition for Database Professionals
  • Installations
  • Patching, fixes and upgrades
  • Analysis Services, Integration Services, Reporting Services
  • End user schema/cube customisation
  • Performance tuning

If I get enough input I'll post the talk online.

 

by This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Monday, February 05, 2007 11:32:44 AM (GMT Standard Time, UTC+00:00)  #    Comments [0] Trackback
# Thursday, November 09, 2006

It's been a while since I last blogged for real and you may have noticed the recent entries looking more like a link blog if you subscribe to the feed. This is down to a feature of FeedBurner that allows me to splice my del.icio.us links with these blog entries. The occasional Flickr image will get spliced in too.

So, back to subcube queries... I've been doing some Analysis Services 2005 performance work for a customer. Eventually I want to talk about some sort of a performance process for AS2005 but right now Chris Webb has a pretty good reference on designing effective aggregations. This is just a quick description of what the 11100101010,1010001000000 means in SQL Server Profiler.

Since SQL Profiler now supports Analysis Services you can monitor queries as they execute to find the poorly performing ones. Chances are you will see a Progress Report event similar to Started reading data from the 'XXX' partition closely followed by a Query Subcube event with TextData looking like a string of ones and zeros for example:
'00001000000000000,10111100000011111100,10'
The progress report tells you that data was read directly from the partition and no aggregation was used.

If you turn on the Query Subcube Verbose event then a more complete description is printed but quite often you will receive traces with just the subcube strings as the verbose event is hidden from the default set of options.

The subcube string tells you which dimensions and attributes are involved. Each group of digits, separated by a comma, denotes a dimension and each digit denotes an attribute within that dimension. The digit is '1' if that attribute is part of the subcube and '0' otherwise. Some things to note:

  • These are cube dimensions so the same dimension, e.g. Date, can be represented many times as a role playing dimension, e.g. Ship Date
  • The order is defined by the actual order of dimensions and attributes in the measure group, not alphabetical or any other sort order.

Since the order of dimensions and attributes is not immediately obvious, it's better to write some code to print them out in the correct order. The following prints all the dimensions and attributes of the supplied measure group object in order:

private static void PrintMeasureGroupDimensions(MeasureGroup mg) {
  for (int j = 0; j < mg.Dimensions.Count; ++j) {
    CubeDimension dim = mg.Dimensions[j].CubeDimension;
    Console.WriteLine("DIM:\t{0} ({1})", dim.Name, dim.Attributes.Count);

    for (int k = 0; k < dim.Attributes.Count; ++k) {
      CubeAttribute attr = dim.Attributes[k];
      Console.WriteLine("ATT:\t\t{0}", attr.Attribute.Name);
    }
  }
}

I've attached a piece of sample code that compiles into a console application to either print the entire set of dimensions for a measure group or, if you pass a subcube string, will just print those involved in the query. Execute with no args or '/h' to get some help.

Program.cs.txt (9.39 KB)

Edit: corrected some grammar and reformatted the code.

by This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Thursday, November 09, 2006 5:18:23 PM (GMT Standard Time, UTC+00:00)  #    Comments [0] Trackback
# Friday, July 14, 2006

If you want to get up to speed with Analysis Services 2005 then the following resources may be of use:

Learning MDX Books

The first book, Fast Track to MDX, is an excellent starter and learning guide. The second, MDX Solutions, contains much more advanced topics.

Analysis Server 2005 Books

The first book is a beginner reference and guide to Analysis Services. The second, Applied Microsoft Analysis Services 2005 provides a more BI focused approach that integrates reporting and analytics. The final book is probably the best as it is very detailed and provides guidance in relation to the Kimball method to creating data warehouses.

Project REAL

Project REAL is a best practices implementation for creating BI applications based on SQL Server 2005.

Webcasts and Online Learning

Kimball Webcasts on Data Warehousing and Business Intelligence
Course 2942: New Features of Microsoft® SQL Server™ 2005 Analysis Services
Course 2943: Updating Your Data ETL Skills to Microsoft® SQL Server™ 2005 Integration Services
Course 2944: Updating Your Reporting Skills to Microsoft® SQL Server™ 2005 Reporting Services

Web Sites and Forums

Analysis Services Homepage
Microsoft Business Intelligence Homepage
Business Intelligence Developer Center at MSDN
Analysis Services MSDN Forum
Reporting Services MSDN Forum

by This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Friday, July 14, 2006 1:29:46 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback