Applied Basket Analysis in Power Pivot using DAX

Basket Analysis is a very common analysis especially for online shops. Most online shops make use of it to make you buy products that “Others also bought …”. Whenever you view a specific product “Product A” from the online shop, basket analysis allows the shop to show you further products that other customers bought together with “Product A”. Its basically like taking a look into other customers shopping baskets. In this blog post I will show how this can be done using Power Pivot. Alberto Ferrari already blogged about it here some time ago and showed a solution for Power Pivot v1. There is also dedicated chapter in the whitepaper The Many-to-Many Revolution 2.0 which deals with Basket Analysis, already in Power Pivot v2. Power Pivot v2 already made the formula much more readable and also much faster in terms of performance. Though, there are still some things that I would like to add.

Lets take a look at the initial data model first:Model_Old

First of all we do not want to modify this model but just extend it so that all previously created measures, calculations and, most important, the reports still work. So the only thing we do is to add our Product-tables again but with a different name. Note that I also added the Subcategory and Category tables in order to allow Basket Analysis also by the Product-Category hierarchy. As we further do not want to break anything we only use an inactive relationship to our ‘Internet Sales’ fact table.

After adding the tables the model looks as follows:Model_New

The next and actually last thing to do is to add the following calculated measure:

Sold in same Order :=
CALCULATE (
    COUNTROWS ( 'Internet Sales' ),
    CALCULATETABLE (
        SUMMARIZE (
            'Internet Sales',
            'Internet Sales'[Sales Order Number]
        ),
        ALL ( 'Product' ),
        USERELATIONSHIP ( 'Internet Sales'[ProductKey], 'Filtered Product'[ProductKey] )
    )
)

(formatted using DAX Formatter)

The inner CALCULATETABLE returns a list/table of all [Sales Order Numbers] where a ‘Filtered Product’ was sold and uses this table to extend the filter on the ‘Internet Sales’ table. It is also important to use ALL(‘Product’) here otherwise we would have two filters on the same column ([ProductKey]) which would always result in an empty table. Doing a COUNTROWS finally returns all for all baskets where the filtered product was sold.
We could also change ‘Internet Sales’[Sales Order Number] to ‘Internet Sales’[CustomerKey] in order to analyze what other customers bought also in different baskets (This was done for Example 3). The whole SUMMARIZE-function could also be replaced by VALUES(‘Internet Sales’[Sales Order Number]). I used SUMMARIZE here as I had better experiences with it in terms of performance in the past, though, this may depend on your data. The calculation itself also works with all kind of columns and hierarchies, regardless whether its from table ‘Product’, ‘Filtered Product’, or any other table!

So what can we finally do with this neat formula?

1) Classic Basket Analysis – “Others also bought …”:
Result_ClassicBasketAnalysis

As we can see Hydration Packs are more often sold together with Mountain Bikes opposed to Road Bikes and Touring Bikes. We could also use a slicer on ‘Filtered Product Subcategory’=”Accessories” in order to see how often Accessories are sold together with other products. You may analyze by Color and Product Category:
Result_ByColor
As we can see people that buy black bikes are more likely to buy red helmets than blue helmets.

2) Basket Analysis Matrix:
What may also be important for us is which products are sold together the most often? This can be achieved by pulling ‘Product’ on rows and ‘Filtered Product’ on columns. By further applying conditional formatting we can identify correlations pretty easy:
Result_BasketMatrix
Water Bottles are very often sold together with Bottle Cages – well, not really a surprise. Again, you can also use all kind of hierarchies here for your analysis.
This is what the whole matrix looks like:
Result_BasketMatrixFull
The big blank section in the middle are our Bikes. This tells us that there is no customer that bought two bikes in the same order/basket.

For this analysis I used an extended version of the calculation above to filter out values where ‘Product’ = ‘Filtered Product’ as of course every product is sold within its own basket:

BasketMatrix :=
IF (
    MIN ( 'Product'[ProductKey] )
        <> MIN ( 'Filtered Product'[ProductKey] ),
    [Sold in same Order]
)

3) Find Customers that have not bough a common product yet
As we now know from the above analysis which products are very often bought together we can also analyze which customers do not fall in this pattern – e.g. customers who have bough a Water Bottle but have not bought a Bottle Cage yet. Again we can extend our base-calculation to achieve this:

Not Sold to same Customer :=
IF (
    NOT ( ISBLANK ( [Sum SA] ) ) && NOT ( [Sold to same Customer] ),
    "Not Sold Yet"
)

The first part checks if the selected ‘Product’ was sold to the customer at all and the second part checks if the ‘Filtered Product’ was not sold to the customer yet. In that case we return “Not Sold Yet”, and otherwise  BLANK() which is the default if the third parameter is omitted. That’s the result:
Result_NotSoldToCustomer
Aaron Phillips has bought a Water Bottle but no Mountain Bottle Cage nor a Road Bottle Cage – maybe we should send him some advertisement material on Bottle Cages? Smile

 

As you can see there are a lot of analyses possible on top of that little measure that we created originally. All work with any kind of grouping or hierarchy that you may have and no change to your data model is necessary, just a little extension.

And that’s it – Basket Analysis made easy using Power Pivot and DAX!

Downloads:

Sample Workbook with all Examples: BasketAnalysis.xlsx

11 thoughts on “Applied Basket Analysis in Power Pivot using DAX

  1. I am buyer for a home improvement retailer in Canada. This technique has given me great insights and a wonderful opportunity to design a cross merchansiding plan for my products throughout the store. I can’t thank you enough for sharing your wisdom.

  2. Hi Gerhard,

    Thank you for this excellent breakdown of DAX and many-to-many relationships!

    I was able to get this example working nicely on test data, however when I attempt to use this technique with 7 million+ rows of sales data I run into performance issues…I am not that familiar with the “Userelationship” and many-to-many techniques and was curious if this hindered performance…

    Thanks,
    Josh

    PS–I am using 64 bit…

    • Hi Josh,
      performance usually depends on the number of e.g. products that you have – how many products do you have?
      7 million+ rows of sales data should be quite ok
      it also depends on the kind of report that you do – the matrix report that i showed above is quite expensive to calculate

      -gerhard

      • Hi Gerhard,

        My products table has about 1,900 rows…

        It is running very slowly in the classic analysis version.

        Thanks,
        Josh

        • I just blew up my fact-table to 4M rows and use customers (19k) instead of products for the calculation and the performance is still OK (<2 seconds)
          an other important aspect is the column that we use within our SUMMARIZE function – in my scenario this is about 2M distinct values but as I said, it still performs OK

          Do you use a COUNTROWS() as the final output or anything else? e.g. DistinctCount or something

          • Hi Gerhard,

            Formula is exactly the same.

            After watching one of Rob’s videos on performance, I think that the issues may be related to slicers. If I utilize a ‘Product Category’ and ‘Model’ slicer for my ‘Filtered_Products’ table to manipulate the data, that is when I run into the worst performance issues.

            Thanks,
            Josh

          • You are right, slicers and filters, especially with multi-select, can cause a huge performance issue
            however, these are not directly related to the formula itself but more the the MDX that the Pivot Table/Excel creates and how this is processed in Power Pivot.
            Would be interesting if you have similar issues if you use Power View and use native DAX to query the model
            apart from that I do not think that there is much we can do here with regards to the formula itself

            -gerhard

    • Hi Wilhelm,
      yes, a workbook with sample data would really help
      it would be even better if you could also provide the expected results for given selections
      just send me the workbook via mail – you can find my email address in the About-page on the top

      -gerhard

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>