Handling SCD2 Dimensions and Facts with PowerPivot

Having worked a lot with Analysis Services Multidimensional Model in the past it has always been a pain when building models on facts and dimensions that are only valid for a given time-range e.g. following a Slowly Changing Dimension Type 2 historization. Opposed to relational engines, Analysis Services Multidimensional Model does not support complex relationships that would create more than one matching row like in terms of SCD2 a join using T-SQL’s BETWEEN. In Multidimensional Models each fact-row has to be linked to exactly one member in a dimension. There are some techniques to mimic BETWEEN-Joins like abusing ManyToMany-Relationships but all of them have several drawbacks and are just workarounds that do not solve the actual problem.

Fortunately PowerPivot (and in the future Analysis Services Tabular Models that are part of SQL Server 2012) offer more flexibility on modeling data and relationships that we can leverage to solve such scenarios.

In this post I will cover a warehouse-scenario where articles are stored in warehouses for a given period of time. The model should allow analysis of the current stock on a daily base. All tables follow a SCD2 historization and shall always display the currently valid records for the selected time.

Our model consists of 4 tables:

Article:

Article_SID Article_BK ArticleGroup VALID_FROM VALID_TO
1 A1 AG1 2012-01-01 2012-01-03
2 A2 AG2 2012-01-01  
3 A1 AG2 2012-01-03  

Warehouse:

Warehouse_SID Warehouse_BK WarehouseOwner_BK VALID_FROM VALID_TO
1 WH1 John 2012-01-01  
2 WH2 Dave 2012-01-01 2012-01-04
3 WH2 John 2012-01-04  

Date:

Contains Date-Values from 2012-01-01 till 2012-01-15

Inventory (Facts):

Article_BK Warehouse_BK Amount_BASE VALID_FROM VALID_TO
A1 WH1 100 2012-01-01 2012-01-03
A1 WH2 50 2012-01-02 2012-01-04
A1 WH1 20 2012-01-01  
A2 WH1 10 2012-01-01 2012-01-04
A2 WH2 70 2012-01-03 2012-01-05
A2 WH2 30   2012-01-10

As you can see our facts are related using the BusinessKey (BK) and are thereby independent of any historical changes within any dimension.

This means that SCD2-changes in dimension have no impact on our fact-table reducing most of the complex ETL that is usually required to propagate those changes to the fact-table to allow in-time analysis

Once you have loaded these tables into PowerPivot and try to create relationships you will get errors as none of the tables can be related as the BKs are not unique in any table. But that’s OK as all tables have to be filtered on the ReferenceDate first to show the structure at selected date.

So, how can we find out which rows are currently active?

First of all we have to create a measure that returns the LastDate in the current time-slice (assuming that we always want to get the most current structures, e.g. 31st of December when doing analysis over a whole year etc.)

Using LASTDATE()-function this can be accomplished very easily:

All other tables have to be filtered on this ReferenceDate to get only the valid rows at this point in time.

I used a combination of COUNTROWS() and FILTER() to create a measure that identifies a value as valid or not:

For our Article-Table:

For our Warehouse-Table:

And also for our Fact-Table:

Using a pivot-table we can see the WarehouseOwner changing over time. The owner of WH2 changes to John with 2012-01-04, as Dave does not own any warehouse by that time anymore, his entry also gets invalidated:

PT_Warehouse_IsValid_thumb[2]

To get the valid Amount at a given ReferenceDate we have to sum only the valid rows for that date:

This calculation is not linked to any other table at the moment but this can be achieved by adding further criterias to the FILTER-function.

Using CONTAINS()-function we can simulate relationships based on the BusinessKeys:

For each table that we want to relate we have to add another CONTAINS()-function to our calculation to mimic a relationship.

Taking the relationship to table ‘Warehouse’ as an example:

VALUES(Warehouse[Warehouse_BK]) returns only rows/BKs within the current context of table Warehouse.

Those rows are further filtered by the ReferenceDate using our [Warehouse_IsValid]-Measure returning only rows that are valid for the selected ReferenceDate.

The two other parameters of the CONTAINS()-function define the columns that are matched.

Doing this for all tables allows us to analyze our current stock-level with the current structures at any given point in time:

PT_Amount_by_Warehouse_thumb[1]

As empty rows are suppressed from pivot-tables by default, we get the desired result when filtered on a specific date:

PT_Amount_by_Warehouse_SingleDate_thumb[2]

I have not tested his approach yet in terms of performance when it comes to bigger volumes of data, this will be part of an upcoming post.

Further I will show how this approach can be extended to support snowflaked schemas as they are very often found in data warehouses.

 

Part2: Handling SCD2 snowflake-schemas with PowerPivot

 

Download Final Model

8 Replies to “Handling SCD2 Dimensions and Facts with PowerPivot”

  1. Instead of making ReferenceDate:=LASTDATE(‘Date'[Date]), is there a way to pick a date that is selected on the UI. ReferenceDate:= (“Date selected by the user”).

  2. Got it. Now I know how to achieve this. Thanks. This really works. Did you do any performance analysis on this for a bigger dataset

  3. Hello Gerhard, I have to say you consistently amaze me with the work that you do. I have a dataset where I am trying to figure out what our payout percentage would be for our partners. I have what I think based on the posts that I have read a slowly changing dimensions table for the payout percentage for any given time period. I have been able to recreate the above in my data set. However, when I try to apply the percentage for a particular week, for a particular product, with a few other variables I cannot seem to calculate the correct percentage. It is adding up the percentages for most of the data set. I want to be able to calculate my past payout at any point for any product and also roll the data up to year etc. I think this is a very interesting problem but it is difficult to describe without the data. For one order there could be three different payouts and this is not directly related to any where else in the data set which may be giving me my problem. I really hope that you would have some time to take a look. I would be happy to send you some sample data if you are willing to provide some much appreciated help. I will send the data when I have cleaned up a bit.

    Kind Regards,

    John Bradley

Leave a Reply

Your email address will not be published. Required fields are marked *

*