Consolidation and Intercompany Elimination made easy using PowerPivot and DAX

In my past years as a consultant for Microsoft BI I had to deal with financial models quite a lot. They are usually small in size but very complex in terms of calculations like currency conversion, margin-calculations, benchmarks and so on. Another topic I came across very frequently was consolidation and intercompany eliminations. Recently I had to deal with this topic again and thought of how this could be solved in PowerPivot/DAX.

First of all a little background information on what consolidation and intercompany eliminations are. In bigger companies with several subsidiaries those subsidiaries are usually structured in some way – e.g. by legal entity, by country etc..

Here is a little example of a corporate group called “My Whole Company” with subsidiaries in Germany, France, USA and Canada, structured by geography:

These companies also make business with each other and also with external customers. The French Company and the Canadian Company sell something to the Germany Company, US Company sells goods to the Canadian Company and French Company and so on.

To get a consolidated value for higher nodes (e.g. Europe), transactions between companies within Europe have to be eliminated. Otherwise sales within Europe would be aggregated up even though the goods never left Europe and therefor cannot be added to Europe’s sales.

(I am not a financial guy so I focused on the technical stuff rather then on the business backgrounds. This whole example just demonstrates a general approach to solve this issue technically.)

Our fact-table contains columns for the SenderCompany, ReceiverCompany and Value:

SenderIDReceiverIDValue
FRGER100
CANGER50
GEREXTERNAL70
USACAN30
USAFR10
USAEXTERNAL90
CANUSA50
CANEXTERNAL10

For our Sender and Receiver-Table we can use the same source-table (details on this follow):

SenderIDParentSenderIDName
TOT My Whole Company
EUROPETOTEurope
GEREUROPEGermany Company
FREUROPEFrench Company
NATOTNorth America
USANAUS Company
CANNACanadian Company
EXTERNAL EXTERNAL

Note that we use a parent-child hierarchy here as most financial applications use them to represent hierarchical structures. The parent-child hierarchy is resolved using PATH(), PATHITEM() and LOOKUPVALUE() functions as described here by Alberto Ferrari. The final table is added twice to our model, for Sender and Receiver:

(for simplicity I removed all other columns of the fact-table and their corresponding dimension-tables).

The next step is to implement the business logic for consolidation and intercompany elimination. From the description above we can break this logic down into smaller pieces. To get a consolidated value, we have to:

  1. Select only Senders below the currently selected SenderHierarchy-Node
  2. Select only Receivers, that are NOT below the ReceiverHierarchy-Node that is equal to the currently selected SenderHierarchy-Node
  3. Sum up the remaining fact-rows
Value_SUM:=SUM(Facts[Value])
Value_Internal:=
 CALCULATE(
      [Value_SUM],
      FILTER(
           'Receiver',
           CONTAINS(
                'Sender', 
                'Sender'[SenderID], 
                'Receiver'[ReceiverID])))

The Receiver-table is filtered using the CONTAINS-function. As FILTER operates row-by-row, each ‘Receiver’[ReceiverID] is checked whether its value can also be found in the currently selected SenderIDs. In T-SQL this would be similar to:

SELECT SUM([Value]
 FROM Facts
 INNER JOIN Sender
     ON Facts.SenderID = Sender.SenderID
 INNER JOIN Receiver
     ON Facts.ReceiverID = Receiver.ReceiverID
 WHERE Receiver.ReceiverID IN (SELECT SenderID 
                               FROM Sender 
                               WHERE Level2='Europe')
 AND Sender.Level2 = 'Europe'

Once you understand how the T-SQL, the DAX-calculation should also be clear.

Value_External:=
 CALCULATE(
      [Value_SUM],
      FILTER(
           'Receiver',
           NOT(
                CONTAINS(
                     'Sender', 
                     'Sender'[SenderID], 
                     'Receiver'[ReceiverID]))))

And this is already our final calculation to eliminate intercompany sales and get a consolidated value for any Sender-node we select!

Europe has total sales of 170 of which 100 have been internal (FR to GER) and 70 have been external (GER to EXTERNAL). North America has a total value of 240 of which 80 have been internal (USA to CAN 30 and CAN to USA 50) and 160 have been sold external to either Europe or external customers.

The calculation can be further simplified in terms of usability. You may already realized that we did not refer to our hierarchy within any of our calculation. We always used the leaf-level (SenderID/ReceiverID) to calculated our values. To solve the original problem in a multidimensional model we had to create a Receiver-dimension and select/deselect members based on the current selection in the Sender-hierarchy. This is not necessary anymore as we only use leaf-elements in our calculation. So we do not need a Receiver-hierarchy and further also no Receiver-table in our PowerPivot-model as we already have the ReceiverID in our Facts-table:

Value_External_Facts:=
 CALCULATE(
      [Value_SUM]; 
      FILTER(
           'Facts'; 
           NOT(
                CONTAINS(
                     'Sender'; 
                     'Sender'[SenderID];
                     'Facts'[ReceiverID]))))

There may also be reasons that the user wants to select the Receiver-Node on its own. For example to see sales from European to external customers (EXTERNAL).  To do this you have to adopt the calculation as follows:

Value_Selected:=CALCULATE(
      [Value_SUM];
       FILTER(
           ALL(Receiver); 
           NOT(
                CONTAINS(
                     'Receiver'; 
                     'Receiver'[ReceiverID];
                     'Receiver'[ReceiverID]))))

This calculation may look strange on first sight but does exactly what we need. It removes all filters from the Receiver-table and applies a new filter that is the opposite of the currently existing filter using a combination of NOT() and CONTAINS(). The 3rd parameter of the CONTAINS-function always refers to the row-context of the FILTER-function whereas the 1st and the 2nd parameter are always based on the current query-context.

These are the results:

You could also use the Value_SUM-calculation to get the opposite value – what has been sold from the current Sender-node to any Receiver located in Europe. Our Canadian Company for example has sold 50 to GER which is below Europe and 60 to companies outside Europe – in this case 50 to USA and 10 to EXTERNAL.

As you can see using DAX makes these kind of calculations very easy compared to multidimensional models. An other important aspect is that financial models are usually very small and therefor should fit into memory very easily. Also the independence of hierarchies is an important aspect here.

You can also download the Excel-Workbook with the sample-calculations here:

Consolidation.xlsx

UPDATE:

In Part2 I will show how to handle multiple parallel hierarchies.

Resolving Many to Many relationships leveraging DAX Cross Table Filtering

If you ever had to deal with many-to-many relationships in PowerPivot then I am quite sure that you came across the blog-post Many-to-Many relationships in PowerPivot by Marco Russo and PowerPivot and Many to Many Relationships by Alberto Ferrari. Both posts describe how this issue can be solved using DAX and provide examples and also very good background information on that topic.

I recently had to struggle with many-to-many relationships again at one of my customers who complained that many-to-many relationships are just too complex. So I rethought of the problem and searched for alternatives. During my investigations I  also found Jeffrey Wang’s blog-post The Logic behind the Magic of DAX Cross Table Filtering again – a must read blog for all people interested in BISM (tabular and multidimensional), DAX or MDX. In the middle of the post he describes the single operations the CALCULATE-function performs:

Calculate function performs the following operations:
1. Create a new filter context by cloning the existing one.
2. Move current rows in the row context to the new filter context one by one and apply blocking semantics against all previous tables.
3. Evaluate each setfilter argument in the old filter context and then add setfilter tables to the new filter context one by one and apply blocking semantics against all tables that exist in the new filter context before the first setfilter table is added.
4. Evaluate the first argument in the newly constructed filter context.

(the single steps are described in more details in his post)

Important for us is the fact, that you can pass tables as arguments to the CALCULATE-function and those tables are automatically filtered by the current context. Even more important is that this filtering works in both directions of a relationships. So adding a table that has an exiting relationship with any table in the current context is similar to a JOIN in terms of SQL. Filters applied to the newly joined tables are also propagated through all other tables, regardless of the direction of the relationship.

In his Question #1 Jeffrey counts the number of Subcategories for a given Product as an example (which is always 1 as there is a 1:n relationship between DimSubCategory and DimProduct). To get the correct value he uses the following calculation to extend the filter context by DimProduct and thereby also filtering DimProductSubcategory indirectly:

CALCULATE(COUNTROWS(DimProductSubcategory), DimProduct)

 

Knowing that we can use CALCULATE to resolve 1:n relationships in both directions we can also use this approach to solve m:n relationships pretty easy!

Alberto Ferrari uses an example where the facts are related to Individuals. Those Individuals can be assigned to 1 or more Targets. This mapping is done using a bridge table to model the many-to-many relationship:

M2M_AlbertoFerrari

As you can see there is no "real" many-to-many relationship in the model as it has already been split up into a bridge-table using 1:n and n:1 relationships. Adding the information from above to this model we come up with a pretty easy DAX calculations which resolves the many-to-many relationship. Here is a little example where we simply count the rows in our Audience table:

RowCount:=COUNTROWS('Audience')

This RowCount is currently not filtered by table Targets as there is no chain of 1:n relationships between Targets and Audience. Only filters applied to directly related tables (Individuals, Time, Calendar and Networks)  are considered when the calculation is evaluated.

By wrapping a CALCULATE-function around our calculation and adding the tables that participate in the many-to-many relationship as parameters we explicitly extend the filter context for our calculation. As filters on those "extended tables" also impact the current filter-context, the value for our Targets also changes according to the Individuals belonging to the current Target:

RowCount_M2M:=CALCULATE(
     [RowCount], 
     'Individuals', 
     'TargetsForIndividuals', 
     'Targets')

Finally, to resolve the many-to-many relationship for our calculation all we have to do is to explicitly add all tables of the many-to-many relationship to the filter-context of our calculation by using the CALCULATE-function. The rest is done automatically by DAX’s Cross Table Filtering Logic!

The calculation can be further extended to only apply this logic when there is a filter on table Targets, otherwise we do not have to resolve the many-to-many relationship:

RowCount_M2M:=IF(ISCROSSFILTERED('Targets'[Target]),
CALCULATE(
     [RowCount], 
     'Individuals', 
     'TargetsForIndividuals'),
[RowCount])

Doing this ensures that the more complex calculation is only executed when there is a filter on table Targets. Further we already know that Targets is already part of the current context and therefore does not have to be added again for our calculation.

In the end we come up with a solution where we only have to add the intermediate table (Individuals) and the bridge table (TargetsForIndividuals) to our CALCULATE-functions to resolve the many-to-many relationship – pretty nice, isn’t it?

I think this approach should also be very easy to understand for people that are familiar with SQL and relational databases and just switched to tabular modeling.

The PowerPivot workbook with samples for all approaches can be downloaded here:

ISO 8601 Week in DAX

I recently built a PowerPiovt model where I had to display weeks according to ISO 8601. As I came across this frequently in the past when I developed SQL Server databases (prior to 2008) I was aware that ISO weeks can also be calculated using some complex logics. When I discussed this with some of my colleagues during a training, one of the attendees told me, that this can be solved in Excel very easily using Excels WEEKNUM()-function. This function takes to arguments:

Serial_num is a date within the week. Dates should be entered by using the DATE function, or as results of other formulas or functions. For example, use DATE(2008,5,23) for the 23rd day of May, 2008. Problems can occur if dates are entered as text.

Return_type is a number that determines on which day the week begins. The default is 1.

Return_type Week Begins
1 Week begins on Sunday. Weekdays are numbered 1 through 7.
2 Week begins on Monday. Weekdays are numbered 1 through 7.

According to Excels Online-Help the second parameter only supports values 1 and 2. But this is not 100% true. You can also use the value 21 as second parameter, and guess what – it now returns weeknumbers according to ISO 8601. As many DAX-functions are derived from Excel (including WEEKNUM()-function), this also works with DAX’s WEEKNUM()-function!

So by creating a calculated column as

=WEEKNUM([Date], 21)

you get the ISO week number for the current [Date].

If you also want to calculate “ISO years” to build clean hierarchies you may want to use this formula in an other calculated column:

=IF([ISOWeek]<5 && [CalendarWeek] > 50;
     [Year]+1;
     IF([ISOWeek]>50 && [CalendarWeek]<5; 
          [Year]-1; 
          [Year]))

 

I think we can learn a lot from experienced Excel-users here as most tricks also work in PowerPivot/DAX!

If you want to do the same in Power Query already, here is some sample code (original source)

let
    StartDate        = #date (2009,1,1),
    EndDate          = #date (2024,12,31),
    ListOfDates      = List.Dates(StartDate, DurationDays, #duration(1, 0, 0, 0)),
    DurationDays     = Duration.Days (EndDate - StartDate) + 1,
    TableOfDates     = Table.FromList(ListOfDates, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    DateColText      = Table.RenameColumns(TableOfDates,{{"Column1", "Date"}}),
    DateCol          = Table.TransformColumnTypes(DateColText,{{"Date", type date}}),
    WeekdayCol       = Table.AddColumn(DateCol, "Weekday Number", each Date.DayOfWeek([Date], Day.Monday) + 1,Int64.Type),
    IsoWeekYearCol   = Table.AddColumn(WeekdayCol, "Iso Week Year", each Date.Year(Date.AddDays([Date], 4-[Weekday Number])), Int64.Type),
    IsoWeekNrCol     = Table.AddColumn(IsoWeekYearCol, "Iso Week Nummer", each (Duration.Days(Date.AddDays([Date], 4-[Weekday Number]) - #date([Iso Week Year], 1 , 7 - Date.DayOfWeek(#date([Iso Week Year],1,4), Day.Monday)))/7)+1, Int64.Type),
    IsoWeekdayExtCol = Table.AddColumn(IsoWeekNrCol, "Iso Weekday Extended", each Text.From([Iso Week Year]) & "-W" & Text.End("0" &     Text.From([Iso Week Nummer]), 2) & "-"&  Text.From([Weekday Number]))
in
    IsoWeekdayExtCol

Handling SCD2 snowflake-schemas with PowerPivot

In my previous post I wrote about how to use PowerPivot on top of a relational database that is modeled as a star-schema with slowly changing dimension type 2 (SCD2) historization. In this post I will advance this approach by adding other tables tables that are not related to our facts but have a relationship to our current dimension tables based on an other attribute.

I will take the example from my previous post and extend it by a new table called "WarehouseOwner". This table looks as follows:

WarehouseOwner

Warehouse
Owner_SID
Warehouse
Owner_BK
Manager VALID_FROM VALID_TO
1 John Peter 2012-01-01 2012-01-03
2 Dave Peter 2012-01-01  
3 John Dan 2012-01-03  

This table holds information about the warehouse owner and its manager. In this example the manager of John changed from Peter to Dan with the 3rd of January. Remember that we have a WarehouseOwner-Column also in our Warehouse-table.

Logically this new table can be related to our Warehouse-table but as PowerPivot only supports 1:n relationships this is natively not supported and we have to use DAX to simulate this relationship. Most of the steps are similar to the steps that we have already done for our other SCD2-tables. First of all we create a ‘IsValid’-measure to identify rows that are valid for the selected referencedate:

WarehouseOwner_IsValid:=COUNTROWS(
FILTER(
     WarehouseOwner; 
     (WarehouseOwner[VALID_FROM] <= 'Date'[ReferenceDate] 
        || ISBLANK(WarehouseOwner[VALID_FROM]))
     && (WarehouseOwner[VALID_TO] > 'Date'[ReferenceDate] 
        || ISBLANK(WarehouseOwner[VALID_TO]))))>0

The next step would be to mimic a relationship between our Inventory-table and our WarehouseOwner-table. But opposed to our old star-schema where each table was directly related to the Inventory-table, this is not possible with our new snowflake-schema anymore. So we have to relate our WarehouseOwner-table with our facts using the Warehouse-table. Without any relationship the result would look as follows:

WarehouseOwner_unrelated

This result is similar to a crossjoin of our Managers and our Warehouse-Owners regardless of time and validity. But we need to analyze which Manager was responsible for which warehouse owners and respectively their warehouse at any given point in time. To relate our tables we have to identify the combinations that are valid for the selected referencedate. If you reconsider these requirements, you will realize that this is similar to the relationships that we had to define to link our other tables to the inventory-table.  Again we can use CONTAINTS()-function in combination with FILTER() and a IsValid-Measure but in this case we use them in our Warehouse-table to extend our Warehouse_IsValid-Measure:

Warehouse_IsValid_New:=COUNTROWS(
FILTER(
     Warehouse; 
     (Warehouse[VALID_FROM] <= 'Date'[ReferenceDate] 
		|| ISBLANK(Warehouse[VALID_FROM]))
     && (Warehouse[VALID_TO] > 'Date'[ReferenceDate] 
		|| ISBLANK(Warehouse[VALID_TO]))
     && CONTAINS(
		FILTER(
			VALUES(WarehouseOwner[WarehouseOwner_BK]);
			WarehouseOwner[WarehouseOwner_IsValid]); 
		WarehouseOwner[WarehouseOwner_BK]; 
		Warehouse[WarehouseOwner_BK])))>0

Using our new Warehouse_IsValid-measure in a PivotTable reveals which combinations are valid for a given time:

WarehouseOwner_IsValid

Actually this is already all we have to do here to get the desired result, we do not have to touch any other formula in our model! By replacing our Warehouse_IsValid-Measure with our previous Amount-Measure we can see the final result:

WarehouseOwner_related

When Dan becomes the Manager of John with the 3rd of January also the facts that are related to warehouses belonging to John are associated with Dan. As with the 4th of January John also becomes the owner of WH2 (now owning both/all warehouses) Dave and also its manager Peter do not have any facts related to them anymore.

As you can see this approach can be very easily extended by new tables even if the they are modeled in a snowflake-schema. All you have to do is to modify the IsValid-Measure of the main-table that relates to the new table.

 

Stay tuned for the next post where I will analyze the performance of this solution with large amounts of data!

Download Final Model

Handling SCD2 Dimensions and Facts with PowerPivot

Having worked a lot with Analysis Services Multidimensional Model in the past it has always been a pain when building models on facts and dimensions that are only valid for a given time-range e.g. following a Slowly Changing Dimension Type 2 historization. Opposed to relational engines, Analysis Services Multidimensional Model does not support complex relationships that would create more than one matching row like in terms of SCD2 a join using T-SQL’s BETWEEN. In Multidimensional Models each fact-row has to be linked to exactly one member in a dimension. There are some techniques to mimic BETWEEN-Joins like abusing ManyToMany-Relationships but all of them have several drawbacks and are just workarounds that do not solve the actual problem.

Fortunately PowerPivot (and in the future Analysis Services Tabular Models that are part of SQL Server 2012) offer more flexibility on modeling data and relationships that we can leverage to solve such scenarios.

In this post I will cover a warehouse-scenario where articles are stored in warehouses for a given period of time. The model should allow analysis of the current stock on a daily base. All tables follow a SCD2 historization and shall always display the currently valid records for the selected time.

Our model consists of 4 tables:

Article:

Article_SID Article_BK ArticleGroup VALID_FROM VALID_TO
1 A1 AG1 2012-01-01 2012-01-03
2 A2 AG2 2012-01-01  
3 A1 AG2 2012-01-03  

Warehouse:

Warehouse_SID Warehouse_BK WarehouseOwner_BK VALID_FROM VALID_TO
1 WH1 John 2012-01-01  
2 WH2 Dave 2012-01-01 2012-01-04
3 WH2 John 2012-01-04  

Date:

Contains Date-Values from 2012-01-01 till 2012-01-15

Inventory (Facts):

Article_BK Warehouse_BK Amount_BASE VALID_FROM VALID_TO
A1 WH1 100 2012-01-01 2012-01-03
A1 WH2 50 2012-01-02 2012-01-04
A1 WH1 20 2012-01-01  
A2 WH1 10 2012-01-01 2012-01-04
A2 WH2 70 2012-01-03 2012-01-05
A2 WH2 30   2012-01-10

As you can see our facts are related using the BusinessKey (BK) and are thereby independent of any historical changes within any dimension.

This means that SCD2-changes in dimension have no impact on our fact-table reducing most of the complex ETL that is usually required to propagate those changes to the fact-table to allow in-time analysis

Once you have loaded these tables into PowerPivot and try to create relationships you will get errors as none of the tables can be related as the BKs are not unique in any table. But that’s OK as all tables have to be filtered on the ReferenceDate first to show the structure at selected date.

So, how can we find out which rows are currently active?

First of all we have to create a measure that returns the LastDate in the current time-slice (assuming that we always want to get the most current structures, e.g. 31st of December when doing analysis over a whole year etc.)

Using LASTDATE()-function this can be accomplished very easily:

ReferenceDate:=LASTDATE('Date'[Date]) 

All other tables have to be filtered on this ReferenceDate to get only the valid rows at this point in time.

I used a combination of COUNTROWS() and FILTER() to create a measure that identifies a value as valid or not:

For our Article-Table:

Article_IsValid:=COUNTROWS(
FILTER(
    Article;
    (Article[VALID_FROM] <= 'Date'[ReferenceDate]
        || ISBLANK(Article[VALID_FROM]))
    && (Article[VALID_TO] > 'Date'[ReferenceDate]
        || ISBLANK(Article[VALID_TO]))))>0

For our Warehouse-Table:

Warehouse_IsValid:=COUNTROWS( 
FILTER(
    Warehouse;
    (Warehouse[VALID_FROM] <= 'Date'[ReferenceDate]
        || ISBLANK(Warehouse[VALID_FROM]))
    && (Warehouse[VALID_TO] > 'Date'[ReferenceDate]
        || ISBLANK(Warehouse[VALID_TO]))))>0

And also for our Fact-Table:

Inventory_IsValid:=COUNTROWS( 
FILTER(
    Inventory;
    (Inventory[VALID_FROM] <= 'Date'[ReferenceDate]
        || ISBLANK(Inventory[VALID_FROM]))
    && (Inventory[VALID_TO] > 'Date'[ReferenceDate]
        || ISBLANK(Inventory[VALID_TO]))))>0

Using a pivot-table we can see the WarehouseOwner changing over time. The owner of WH2 changes to John with 2012-01-04, as Dave does not own any warehouse by that time anymore, his entry also gets invalidated:

PT_Warehouse_IsValid_thumb[2]

To get the valid Amount at a given ReferenceDate we have to sum only the valid rows for that date:

Amount:=CALCULATE( 
SUM( 
      [Amount_BASE]); 
      FILTER(Inventory; Inventory[Inventory_IsValid]))

This calculation is not linked to any other table at the moment but this can be achieved by adding further criterias to the FILTER-function.

Using CONTAINS()-function we can simulate relationships based on the BusinessKeys:

Amount:=CALCULATE( 
SUM(
    [Amount_BASE]); 
    FILTER(
        Inventory;
        Inventory[Inventory_IsValid] 
        && CONTAINS(
            FILTER(VALUES(Article[Article_BK]); Article[Article_IsValid]);
            Article[Article_BK];
            Inventory[Article_BK]) 
        && CONTAINS(
            FILTER(VALUES(Warehouse[Warehouse_BK]);Warehouse[Warehouse_IsValid]);
            Warehouse[Warehouse_BK];
            Inventory[Warehouse_BK])))

For each table that we want to relate we have to add another CONTAINS()-function to our calculation to mimic a relationship.

Taking the relationship to table ‘Warehouse’ as an example:

VALUES(Warehouse[Warehouse_BK]) returns only rows/BKs within the current context of table Warehouse.

Those rows are further filtered by the ReferenceDate using our [Warehouse_IsValid]-Measure returning only rows that are valid for the selected ReferenceDate.

The two other parameters of the CONTAINS()-function define the columns that are matched.

Doing this for all tables allows us to analyze our current stock-level with the current structures at any given point in time:

PT_Amount_by_Warehouse_thumb[1]

As empty rows are suppressed from pivot-tables by default, we get the desired result when filtered on a specific date:

PT_Amount_by_Warehouse_SingleDate_thumb[2]

I have not tested his approach yet in terms of performance when it comes to bigger volumes of data, this will be part of an upcoming post.

Further I will show how this approach can be extended to support snowflaked schemas as they are very often found in data warehouses.

 

Part2: Handling SCD2 snowflake-schemas with PowerPivot

 

Download Final Model