Applied Basket Analysis in Power Pivot using DAX

Basket Analysis is a very common analysis especially for online shops. Most online shops make use of it to make you buy products that “Others also bought …”. Whenever you view a specific product “Product A” from the online shop, basket analysis allows the shop to show you further products that other customers bought together with “Product A”. Its basically like taking a look into other customers shopping baskets. In this blog post I will show how this can be done using Power Pivot. Alberto Ferrari already blogged about it here some time ago and showed a solution for Power Pivot v1. There is also dedicated chapter in the whitepaper The Many-to-Many Revolution 2.0 which deals with Basket Analysis, already in Power Pivot v2. Power Pivot v2 already made the formula much more readable and also much faster in terms of performance. Though, there are still some things that I would like to add.

Lets take a look at the initial data model first:Model_Old

First of all we do not want to modify this model but just extend it so that all previously created measures, calculations and, most important, the reports still work. So the only thing we do is to add our Product-tables again but with a different name. Note that I also added the Subcategory and Category tables in order to allow Basket Analysis also by the Product-Category hierarchy. As we further do not want to break anything we only use an inactive relationship to our ‘Internet Sales’ fact table.

After adding the tables the model looks as follows:Model_New

The next and actually last thing to do is to add the following calculated measure:

Sold in same Order :=
CALCULATE (
COUNTROWS ( ‘Internet Sales’ ),
CALCULATETABLE (
SUMMARIZE (
‘Internet Sales’,
‘Internet Sales’[Sales Order Number]
),
ALL ( ‘Product’ ),
USERELATIONSHIP ( ‘Internet Sales’[ProductKey], ‘Filtered Product’[ProductKey] )
)
)

(formatted using DAX Formatter)

The inner CALCULATETABLE returns a list/table of all [Sales Order Numbers] where a ‘Filtered Product’ was sold and uses this table to extend the filter on the ‘Internet Sales’ table. It is also important to use ALL(‘Product’) here otherwise we would have two filters on the same column ([ProductKey]) which would always result in an empty table. Doing a COUNTROWS finally returns all for all baskets where the filtered product was sold.
We could also change ‘Internet Sales'[Sales Order Number] to ‘Internet Sales'[CustomerKey] in order to analyze what other customers bought also in different baskets (This was done for Example 3). The whole SUMMARIZE-function could also be replaced by VALUES(‘Internet Sales'[Sales Order Number]). I used SUMMARIZE here as I had better experiences with it in terms of performance in the past, though, this may depend on your data. The calculation itself also works with all kind of columns and hierarchies, regardless whether its from table ‘Product’, ‘Filtered Product’, or any other table!

So what can we finally do with this neat formula?

1) Classic Basket Analysis – “Others also bought …”:
Result_ClassicBasketAnalysis

As we can see Hydration Packs are more often sold together with Mountain Bikes opposed to Road Bikes and Touring Bikes. We could also use a slicer on ‘Filtered Product Subcategory’=”Accessories” in order to see how often Accessories are sold together with other products. You may analyze by Color and Product Category:
Result_ByColor
As we can see people that buy black bikes are more likely to buy red helmets than blue helmets.

2) Basket Analysis Matrix:
What may also be important for us is which products are sold together the most often? This can be achieved by pulling ‘Product’ on rows and ‘Filtered Product’ on columns. By further applying conditional formatting we can identify correlations pretty easy:
Result_BasketMatrix
Water Bottles are very often sold together with Bottle Cages – well, not really a surprise. Again, you can also use all kind of hierarchies here for your analysis.
This is what the whole matrix looks like:
Result_BasketMatrixFull
The big blank section in the middle are our Bikes. This tells us that there is no customer that bought two bikes in the same order/basket.

For this analysis I used an extended version of the calculation above to filter out values where ‘Product’ = ‘Filtered Product’ as of course every product is sold within its own basket:

BasketMatrix :=
IF (
MIN ( ‘Product’[ProductKey] )
<> MIN ( ‘Filtered Product’[ProductKey] ),
[Sold in same Order]
)

3) Find Customers that have not bough a common product yet
As we now know from the above analysis which products are very often bought together we can also analyze which customers do not fall in this pattern – e.g. customers who have bough a Water Bottle but have not bought a Bottle Cage yet. Again we can extend our base-calculation to achieve this:

Not Sold to same Customer :=
IF (
NOT ( ISBLANK ( [Sum SA] ) ) && NOT ( [Sold to same Customer] ),
“Not Sold Yet”
)

The first part checks if the selected ‘Product’ was sold to the customer at all and the second part checks if the ‘Filtered Product’ was not sold to the customer yet. In that case we return “Not Sold Yet”, and otherwise  BLANK() which is the default if the third parameter is omitted. That’s the result:
Result_NotSoldToCustomer
Aaron Phillips has bought a Water Bottle but no Mountain Bottle Cage nor a Road Bottle Cage – maybe we should send him some advertisement material on Bottle Cages? Smile

 

As you can see there are a lot of analyses possible on top of that little measure that we created originally. All work with any kind of grouping or hierarchy that you may have and no change to your data model is necessary, just a little extension.

And that’s it – Basket Analysis made easy using Power Pivot and DAX!

Downloads:

Sample Excel Workbook with all Examples: BasketAnalysis.xlsx
Sample PowerBI Desktop file with all Examples (DAX only, no Visuals): BasketAnalysis.pbix

25 Replies to “Applied Basket Analysis in Power Pivot using DAX”

  1. I am buyer for a home improvement retailer in Canada. This technique has given me great insights and a wonderful opportunity to design a cross merchansiding plan for my products throughout the store. I can’t thank you enough for sharing your wisdom.

  2. Hi Gerhard,

    Thank you for this excellent breakdown of DAX and many-to-many relationships!

    I was able to get this example working nicely on test data, however when I attempt to use this technique with 7 million+ rows of sales data I run into performance issues…I am not that familiar with the “Userelationship” and many-to-many techniques and was curious if this hindered performance…

    Thanks,
    Josh

    PS–I am using 64 bit…

    • Hi Josh,
      performance usually depends on the number of e.g. products that you have – how many products do you have?
      7 million+ rows of sales data should be quite ok
      it also depends on the kind of report that you do – the matrix report that i showed above is quite expensive to calculate

      -gerhard

      • Hi Gerhard,

        My products table has about 1,900 rows…

        It is running very slowly in the classic analysis version.

        Thanks,
        Josh

        • I just blew up my fact-table to 4M rows and use customers (19k) instead of products for the calculation and the performance is still OK (<2 seconds)
          an other important aspect is the column that we use within our SUMMARIZE function - in my scenario this is about 2M distinct values but as I said, it still performs OK

          Do you use a COUNTROWS() as the final output or anything else? e.g. DistinctCount or something

          • Hi Gerhard,

            Formula is exactly the same.

            After watching one of Rob’s videos on performance, I think that the issues may be related to slicers. If I utilize a ‘Product Category’ and ‘Model’ slicer for my ‘Filtered_Products’ table to manipulate the data, that is when I run into the worst performance issues.

            Thanks,
            Josh

            • You are right, slicers and filters, especially with multi-select, can cause a huge performance issue
              however, these are not directly related to the formula itself but more the the MDX that the Pivot Table/Excel creates and how this is processed in Power Pivot.
              Would be interesting if you have similar issues if you use Power View and use native DAX to query the model
              apart from that I do not think that there is much we can do here with regards to the formula itself

              -gerhard

    • Hi Wilhelm,
      yes, a workbook with sample data would really help
      it would be even better if you could also provide the expected results for given selections
      just send me the workbook via mail – you can find my email address in the About-page on the top

      -gerhard

  3. Thank you for the post; it is very helpful. Other programs like R are able to conduct MBA’s while analyzing predicted sales based on combinations of n products (i.e. product d is most often purchased with products a, c, and f). R is great, but not very user-friendly in end-user application. Can DAX accomplish similar iterations for MBA? Thanks!

    • Hi Justin,

      so you want to select 1-n products and see which other products are most oft purchased together with them?
      the formula actually also works for multi-select on the Product dimension.
      The only thing you would need to take care of is to exclude all selected Products from the resulting Products similar to what was done in the sample file using [CustomerMatrix] and [BasketMatrix] measures (where it was only done for one Product)
      you could probably also convert the names of the Top 3 resulting products into a separate measure if thats what you want?

  4. Hi Gerhard,

    How can I find
    Total amount of bundled products.
    Sold to same Sales Amount ?

    Thaks.

    • Hi Emin

      what exactly do you mean by “Total amount of bundled products?”
      the selected product very likely has been placed in multiple baskets/bought by multiple customers – which total amount would you like to calculate and for which customers/baskets?
      can you also elaborate on “Sold to same “Sales Amount””?

      thanks,
      -gerhard

  5. Hello again

    for example

    Hydration Pack – 70 oz> DISTINCTCOUNT (Customer ID) = 1.000
    Mountain Bikes> DISTINCTCOUNT (Customer ID) = 500
    Customers with Both Products> 350

    Hydration Pack – 70 oz> Sum (Sales Amount) = $ 2,500
    Mountain Bikes> Sum (Sales Amount) = $ 1,750
    Customers with Both Products> 900 $ ?

    • I dont actually know how this relates bot Basket Analysis but this DAX Code should solve your issue – though, it is limited to the selection of two products/categories – one by the original Product table and the other one through the Filtered Product table:
      Bought Both:=CALCULATE(
      DISTINCTCOUNT(‘Internet Sales'[CustomerKey]),
      CALCULATETABLE(
      SUMMARIZE(‘Internet Sales’, ‘Internet Sales'[CustomerKey]),
      ALL(‘Product’),
      USERELATIONSHIP(‘Internet Sales'[ProductKey], ‘Filtered Product'[ProductKey])))

      the concept is actually quite similar and should work with any kind of measure

      -gerhard

  6. Hi Gerhard,

    your post is really very cool and well explained. I just have a question regarding the 3rd question (customer has not bought…). What is meant by the expression [Sum SA]? Is it a measure?

    Thank you for your great post!

  7. Good Day!

    If I’d select 1 product from ‘Product’ and 2 product from ‘FilterProduct’ is it possible to see value where all 3 products are in the same order?

    thank you

  8. Thanks for the post. Was extremely easy to follow. I was interested in showing summary results.

    I created a post on Stack Overflow and thought I’d reach out see if you had any thoughts on how to expand on your work.

    https://stackoverflow.com/questions/53546290/dax-advanced-product-grouping-segmentation-question
    (gbrueckl: changed link as it was probably wrong – original link: https://stackoverflow.com/users/2607686/user2607686?tab=questions)

    Great job!!!

Leave a Reply

Your email address will not be published. Required fields are marked *

*