Relationships on columns with non-unique values and how to tune them

If you follow my blog frequently, you may have realized that in many of them I cover scenarios where I need to create relationships on columns with non-unique values. For example when handling SCD2 facts and dimensions or dealing with parallel hierarchies. In all the examples I use a combination of FILTER– and CONTAINS-function to create those non-unique relationships between tables. Recently I ran into performance issues using this approach and was thinking about on how to tune those kind of "relationships".

First of all I was taking a look at the DAX Query Plan using SQL Server Profiler to get some more information on what’s going on inside xVelocity and DAX. I used an calculation from my previous post as an example:

Value_Sender:=
CALCULATE([Value_SUM],
FILTER(
     'Facts',
     CONTAINS(
          'Sender',
          'Sender'[SenderID],
          'Facts'[SenderID]))
)

We had to use this calculation as SenderID was not unique in our Sender-table (and of course also not in our Facts-table). When we tried to create a relationship we got the error

Error

"The relationship cannot be created because each column contains duplicate values. Select at least one column that contains only unique values."

This makes absolutely sense as tabular models natively only support 1:n / n:1 relationships and therefore at least one column has to contain only unique vales.

The calculation from above using FILTER and CONTAINS still allowed us to link our tables on those non-unique column values. The drawback of the calculation is that it may not perform very well as it cannot use the full power of xVelocity that heavily relies on predefined relationships between columns and tables. But in our case we create the relationship on-the-fly not using xVelocity’s full potential. Further, by using CONTAINS, which is not a native xVelocity-function, the engine has to callback to DAX formula engine as we can see in the profiler:

VertiPaq SE Query / 0 – VertiPaq Scan (8 times of which 4 are of subclass Internal)

VertiPaq_SE_Query_CONTAINS

[CallbackDataID(….)] is always an indicator that the xVelocity engine could not resolve the query on its own but had to call functions from the DAX formula engine as Jeffrey Wang described here. This in the end results in poor query performance (at least compared to a pure xVelocity query).

DAX Query Plan / 1 – DAX VertiPaq Logical Plan (just for sake of completeness)

QueryPlan_CONTAINS

 

So the key to tune those kind of relationships is to relate the tables at design time to get full xVelocity performance at query time. As the tables cannot be related directly, we have to add an intermediate table containing only unique values – in our case unique SenderIDs. This table can then be used to create 1:n relationships with our Sender and our Facts table

SenderID
TOT
EUROPE
GER
FR
NA
USA
CAN
EXTERNAL
TOT_2
JV
HOLD

If you are using a SQL database as datasource you can simply create a view using SELECT DISTINCT or write the select statement directly in PowerPivot.

Once the table has been added we can create relationships as shown below:

Relationships

This design is similar to a many-to-many design with the only difference that we combined our intermediate table and our bridge table into one single table. The rest is similar to resolving many-to-many relationship as I described here. So once we have created these relationships we can change our calculations as follows:

Value_Sender_Bridge:=
CALCULATE(
     [Value_SUM],
     'SenderBridge',
     'Sender'
)

This calculation can be resolved by the xVelocity engine without any callbacks to DAX formula engine. Taking a look at the profiler trace proves this:

VertiPaq SE Query / 0 – VertiPaq Scan (4 times of which 2 are of subclass Internal)

VertiPaq_SE_Query_CROSSTABLE

None of the Veripaq SE queries used [CallbackDataID(….)] or any other complex function. Only joins have been used which can be handled by the xVelocity engine very easily:

DAX Query Plan / 1 – DAX VertiPaq Logical Plan (just for sake of completeness)

QueryPlan_CROSSTABLE

Also the logical plan could be drastically simplified containing only Sum_Vertipaq and Scan_Vertipaq operators telling us that the query can be executed in pure Vertiscan mode.

Details on Vertipaq operators and Vertiscan mode can be found here.

 

 

I have not run a lot of tests on bigger tables yet where I could have compared the overall performance but this may follow in a later post. With this post I just wanted to show how to link tables on columns containing non-unique values.

It would be great if tabular models would support those kind of relationships out of the box and that you can create them like any other relationships using drag&drop in the diagram view. This would have made many of my previous workshops at customers much easier Smile.

Resolving Many to Many relationships leveraging DAX Cross Table Filtering

If you ever had to deal with many-to-many relationships in PowerPivot then I am quite sure that you came across the blog-post Many-to-Many relationships in PowerPivot by Marco Russo and PowerPivot and Many to Many Relationships by Alberto Ferrari. Both posts describe how this issue can be solved using DAX and provide examples and also very good background information on that topic.

I recently had to struggle with many-to-many relationships again at one of my customers who complained that many-to-many relationships are just too complex. So I rethought of the problem and searched for alternatives. During my investigations I  also found Jeffrey Wang’s blog-post The Logic behind the Magic of DAX Cross Table Filtering again – a must read blog for all people interested in BISM (tabular and multidimensional), DAX or MDX. In the middle of the post he describes the single operations the CALCULATE-function performs:

Calculate function performs the following operations:
1. Create a new filter context by cloning the existing one.
2. Move current rows in the row context to the new filter context one by one and apply blocking semantics against all previous tables.
3. Evaluate each setfilter argument in the old filter context and then add setfilter tables to the new filter context one by one and apply blocking semantics against all tables that exist in the new filter context before the first setfilter table is added.
4. Evaluate the first argument in the newly constructed filter context.

(the single steps are described in more details in his post)

Important for us is the fact, that you can pass tables as arguments to the CALCULATE-function and those tables are automatically filtered by the current context. Even more important is that this filtering works in both directions of a relationships. So adding a table that has an exiting relationship with any table in the current context is similar to a JOIN in terms of SQL. Filters applied to the newly joined tables are also propagated through all other tables, regardless of the direction of the relationship.

In his Question #1 Jeffrey counts the number of Subcategories for a given Product as an example (which is always 1 as there is a 1:n relationship between DimSubCategory and DimProduct). To get the correct value he uses the following calculation to extend the filter context by DimProduct and thereby also filtering DimProductSubcategory indirectly:

CALCULATE(COUNTROWS(DimProductSubcategory), DimProduct)

 

Knowing that we can use CALCULATE to resolve 1:n relationships in both directions we can also use this approach to solve m:n relationships pretty easy!

Alberto Ferrari uses an example where the facts are related to Individuals. Those Individuals can be assigned to 1 or more Targets. This mapping is done using a bridge table to model the many-to-many relationship:

M2M_AlbertoFerrari

As you can see there is no "real" many-to-many relationship in the model as it has already been split up into a bridge-table using 1:n and n:1 relationships. Adding the information from above to this model we come up with a pretty easy DAX calculations which resolves the many-to-many relationship. Here is a little example where we simply count the rows in our Audience table:

RowCount:=COUNTROWS('Audience')

This RowCount is currently not filtered by table Targets as there is no chain of 1:n relationships between Targets and Audience. Only filters applied to directly related tables (Individuals, Time, Calendar and Networks)  are considered when the calculation is evaluated.

By wrapping a CALCULATE-function around our calculation and adding the tables that participate in the many-to-many relationship as parameters we explicitly extend the filter context for our calculation. As filters on those "extended tables" also impact the current filter-context, the value for our Targets also changes according to the Individuals belonging to the current Target:

RowCount_M2M:=CALCULATE(
     [RowCount], 
     'Individuals', 
     'TargetsForIndividuals', 
     'Targets')

Finally, to resolve the many-to-many relationship for our calculation all we have to do is to explicitly add all tables of the many-to-many relationship to the filter-context of our calculation by using the CALCULATE-function. The rest is done automatically by DAX’s Cross Table Filtering Logic!

The calculation can be further extended to only apply this logic when there is a filter on table Targets, otherwise we do not have to resolve the many-to-many relationship:

RowCount_M2M:=IF(ISCROSSFILTERED('Targets'[Target]),
CALCULATE(
     [RowCount], 
     'Individuals', 
     'TargetsForIndividuals'),
[RowCount])

Doing this ensures that the more complex calculation is only executed when there is a filter on table Targets. Further we already know that Targets is already part of the current context and therefore does not have to be added again for our calculation.

In the end we come up with a solution where we only have to add the intermediate table (Individuals) and the bridge table (TargetsForIndividuals) to our CALCULATE-functions to resolve the many-to-many relationship – pretty nice, isn’t it?

I think this approach should also be very easy to understand for people that are familiar with SQL and relational databases and just switched to tabular modeling.

The PowerPivot workbook with samples for all approaches can be downloaded here: