# Monday, December 22, 2008
« Downtown Citi | Main | Nautical Crossroad »

There are no hard and fast rules but the goal is to reduce the time taken to extract data from a source system and reduce the amount of work you have to do with the extracted data. The numbers quoted here are the ones I use as a starting point but you need to measure to determine the best values.

Don't do an incremental extract if:

  • There isn't much data in the source table (less than 100k rows)
  • There is enough change in the source table to require that you read most of it each time (for example if more than half the rows change between extracts)
  • The data in the source table is used for periodic snapshots (for example a balance sheet) and you need to track how a table changes at particular points in time

Do an incremental extract if:

  • There is a lot of data in the source table
  • Rows are only ever added to the source table (i.e. rows are not updated)
  • You need to track each and every change to a source row
  • The source data is updated a number of times before being closed and once closed is never updated again (also known as an accumulating snapshot)

In general dimension tables match the first set of rules and are not extracted incrementally where as fact tables normally match the second set of rules.

by This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Monday, December 22, 2008 6:51:04 PM (GMT Standard Time, UTC+00:00)  #    Comments [0] Trackback
Related posts:
Do you have great business intelligence skills?
Time to Change the Business Cards
Integration Services Design Principals
Using Integration Services to populate a Date Dimension
Physical Data Warehouse Design
Dimensional Modeling
Comments are closed.