Basket Analysis is a very common analysis especially for online shops. Most online shops make use of it to make you buy products that “Others also bought …”. Whenever you view a specific product “Product A” from the online shop, basket analysis allows the shop to show you further products that other customers bought together with “Product A”. Its basically like taking a look into other customers shopping baskets. In this blog post I will show how this can be done using Power Pivot. Alberto Ferrari already blogged about it here some time ago and showed a solution for Power Pivot v1. There is also dedicated chapter in the whitepaper The Many-to-Many Revolution 2.0 which deals with Basket Analysis, already in Power Pivot v2. Power Pivot v2 already made the formula much more readable and also much faster in terms of performance. Though, there are still some things that I would like to add.
Lets take a look at the initial data model first:
First of all we do not want to modify this model but just extend it so that all previously created measures, calculations and, most important, the reports still work. So the only thing we do is to add our Product-tables again but with a different name. Note that I also added the Subcategory and Category tables in order to allow Basket Analysis also by the Product-Category hierarchy. As we further do not want to break anything we only use an inactive relationship to our ‘Internet Sales’ fact table.
After adding the tables the model looks as follows:
The next and actually last thing to do is to add the following calculated measure:
CALCULATE (
COUNTROWS ( ‘Internet Sales’ ),
CALCULATETABLE (
SUMMARIZE (
‘Internet Sales’,
‘Internet Sales’[Sales Order Number]
),
ALL ( ‘Product’ ),
USERELATIONSHIP ( ‘Internet Sales’[ProductKey], ‘Filtered Product’[ProductKey] )
)
)
(formatted using DAX Formatter)
The inner CALCULATETABLE returns a list/table of all [Sales Order Numbers] where a ‘Filtered Product’ was sold and uses this table to extend the filter on the ‘Internet Sales’ table. It is also important to use ALL(‘Product’) here otherwise we would have two filters on the same column ([ProductKey]) which would always result in an empty table. Doing a COUNTROWS finally returns all for all baskets where the filtered product was sold.
We could also change ‘Internet Sales'[Sales Order Number] to ‘Internet Sales'[CustomerKey] in order to analyze what other customers bought also in different baskets (This was done for Example 3). The whole SUMMARIZE-function could also be replaced by VALUES(‘Internet Sales'[Sales Order Number]). I used SUMMARIZE here as I had better experiences with it in terms of performance in the past, though, this may depend on your data. The calculation itself also works with all kind of columns and hierarchies, regardless whether its from table ‘Product’, ‘Filtered Product’, or any other table!
So what can we finally do with this neat formula?
1) Classic Basket Analysis – “Others also bought …”:
As we can see Hydration Packs are more often sold together with Mountain Bikes opposed to Road Bikes and Touring Bikes. We could also use a slicer on ‘Filtered Product Subcategory’=”Accessories” in order to see how often Accessories are sold together with other products. You may analyze by Color and Product Category:
As we can see people that buy black bikes are more likely to buy red helmets than blue helmets.
2) Basket Analysis Matrix:
What may also be important for us is which products are sold together the most often? This can be achieved by pulling ‘Product’ on rows and ‘Filtered Product’ on columns. By further applying conditional formatting we can identify correlations pretty easy:
Water Bottles are very often sold together with Bottle Cages – well, not really a surprise. Again, you can also use all kind of hierarchies here for your analysis.
This is what the whole matrix looks like:
The big blank section in the middle are our Bikes. This tells us that there is no customer that bought two bikes in the same order/basket.
For this analysis I used an extended version of the calculation above to filter out values where ‘Product’ = ‘Filtered Product’ as of course every product is sold within its own basket:
IF (
MIN ( ‘Product’[ProductKey] )
<> MIN ( ‘Filtered Product’[ProductKey] ),
[Sold in same Order]
)
3) Find Customers that have not bough a common product yet
As we now know from the above analysis which products are very often bought together we can also analyze which customers do not fall in this pattern – e.g. customers who have bough a Water Bottle but have not bought a Bottle Cage yet. Again we can extend our base-calculation to achieve this:
IF (
NOT ( ISBLANK ( [Sum SA] ) ) && NOT ( [Sold to same Customer] ),
“Not Sold Yet”
)
The first part checks if the selected ‘Product’ was sold to the customer at all and the second part checks if the ‘Filtered Product’ was not sold to the customer yet. In that case we return “Not Sold Yet”, and otherwise BLANK() which is the default if the third parameter is omitted. That’s the result:
Aaron Phillips has bought a Water Bottle but no Mountain Bottle Cage nor a Road Bottle Cage – maybe we should send him some advertisement material on Bottle Cages?
As you can see there are a lot of analyses possible on top of that little measure that we created originally. All work with any kind of grouping or hierarchy that you may have and no change to your data model is necessary, just a little extension.
And that’s it – Basket Analysis made easy using Power Pivot and DAX!
Downloads:
Sample Excel Workbook with all Examples: BasketAnalysis.xlsx
Sample PowerBI Desktop file with all Examples (DAX only, no Visuals): BasketAnalysis.pbix
This is very cool and pretty simple to implement. Thanks so much for posting it!
I am buyer for a home improvement retailer in Canada. This technique has given me great insights and a wonderful opportunity to design a cross merchansiding plan for my products throughout the store. I can’t thank you enough for sharing your wisdom.
Glad to hear it helped you guys!
Hi Gerhard,
Thank you for this excellent breakdown of DAX and many-to-many relationships!
I was able to get this example working nicely on test data, however when I attempt to use this technique with 7 million+ rows of sales data I run into performance issues…I am not that familiar with the “Userelationship” and many-to-many techniques and was curious if this hindered performance…
Thanks,
Josh
PS–I am using 64 bit…
Hi Josh,
performance usually depends on the number of e.g. products that you have – how many products do you have?
7 million+ rows of sales data should be quite ok
it also depends on the kind of report that you do – the matrix report that i showed above is quite expensive to calculate
-gerhard
Hi Gerhard,
My products table has about 1,900 rows…
It is running very slowly in the classic analysis version.
Thanks,
Josh
I just blew up my fact-table to 4M rows and use customers (19k) instead of products for the calculation and the performance is still OK (<2 seconds)
an other important aspect is the column that we use within our SUMMARIZE function - in my scenario this is about 2M distinct values but as I said, it still performs OK
Do you use a COUNTROWS() as the final output or anything else? e.g. DistinctCount or something
Hi Gerhard,
Formula is exactly the same.
After watching one of Rob’s videos on performance, I think that the issues may be related to slicers. If I utilize a ‘Product Category’ and ‘Model’ slicer for my ‘Filtered_Products’ table to manipulate the data, that is when I run into the worst performance issues.
Thanks,
Josh
You are right, slicers and filters, especially with multi-select, can cause a huge performance issue
however, these are not directly related to the formula itself but more the the MDX that the Pivot Table/Excel creates and how this is processed in Power Pivot.
Would be interesting if you have similar issues if you use Power View and use native DAX to query the model
apart from that I do not think that there is much we can do here with regards to the formula itself
-gerhard
Hi I am trying to solve a similar problem, but with no success so far. I have a table with 3 columns([date],[market],[return]) now I want to calculate the correlation coefficient like they do here http://bitwizards.com/blog/november-2012/correlation-coefficient-dax-formula .
Could you help me? If you want I can provide you a sheet with sample data.
Thank you
Hi Wilhelm,
yes, a workbook with sample data would really help
it would be even better if you could also provide the expected results for given selections
just send me the workbook via mail – you can find my email address in the About-page on the top
-gerhard
Thank you for the post; it is very helpful. Other programs like R are able to conduct MBA’s while analyzing predicted sales based on combinations of n products (i.e. product d is most often purchased with products a, c, and f). R is great, but not very user-friendly in end-user application. Can DAX accomplish similar iterations for MBA? Thanks!
Hi Justin,
so you want to select 1-n products and see which other products are most oft purchased together with them?
the formula actually also works for multi-select on the Product dimension.
The only thing you would need to take care of is to exclude all selected Products from the resulting Products similar to what was done in the sample file using [CustomerMatrix] and [BasketMatrix] measures (where it was only done for one Product)
you could probably also convert the names of the Top 3 resulting products into a separate measure if thats what you want?
Hi Gerhard,
How can I find
Total amount of bundled products.
Sold to same Sales Amount ?
Thaks.
Hi Emin
what exactly do you mean by “Total amount of bundled products?”
the selected product very likely has been placed in multiple baskets/bought by multiple customers – which total amount would you like to calculate and for which customers/baskets?
can you also elaborate on “Sold to same “Sales Amount””?
thanks,
-gerhard
Hello again
for example
Hydration Pack – 70 oz> DISTINCTCOUNT (Customer ID) = 1.000
Mountain Bikes> DISTINCTCOUNT (Customer ID) = 500
Customers with Both Products> 350
Hydration Pack – 70 oz> Sum (Sales Amount) = $ 2,500
Mountain Bikes> Sum (Sales Amount) = $ 1,750
Customers with Both Products> 900 $ ?
I dont actually know how this relates bot Basket Analysis but this DAX Code should solve your issue – though, it is limited to the selection of two products/categories – one by the original Product table and the other one through the Filtered Product table:
Bought Both:=CALCULATE(
DISTINCTCOUNT(‘Internet Sales'[CustomerKey]),
CALCULATETABLE(
SUMMARIZE(‘Internet Sales’, ‘Internet Sales'[CustomerKey]),
ALL(‘Product’),
USERELATIONSHIP(‘Internet Sales'[ProductKey], ‘Filtered Product'[ProductKey])))
the concept is actually quite similar and should work with any kind of measure
-gerhard
Hi Gerhard,
your post is really very cool and well explained. I just have a question regarding the 3rd question (customer has not bought…). What is meant by the expression [Sum SA]? Is it a measure?
Thank you for your great post!
yes, [Sum SA] is a measure and is basically defined as =SUM(‘Internet Sales'[SalesAmount])
regards,
-gerhard
Good Day!
If I’d select 1 product from ‘Product’ and 2 product from ‘FilterProduct’ is it possible to see value where all 3 products are in the same order?
thank you
Hi mikhail,
this goes very much in the direction of replacing the standard logic of a slicer combining the values using OR (bought product A or product B) to an AND (bought product A and product B). This scenario is for example described here: https://www.sqlbi.com/articles/apply-and-logic-to-multiple-selection-in-dax-slicer/
I am sure you can use a very similar logic. For your specific case I think you do not even need to use two product-tables – one would be sufficient
-gerhard
Thank you!
and for your posts too ?
Thanks for the post. Was extremely easy to follow. I was interested in showing summary results.
I created a post on Stack Overflow and thought I’d reach out see if you had any thoughts on how to expand on your work.
https://stackoverflow.com/questions/53546290/dax-advanced-product-grouping-segmentation-question
(gbrueckl: changed link as it was probably wrong – original link: https://stackoverflow.com/users/2607686/user2607686?tab=questions)
Great job!!!
Thanks for your post. It was super easy to follow and very helpful!
I posted a question expanding on your work and thought I’d reach out to get your thoughts.
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/6ac8ce0e-b017-45d8-82f3-87d43bea1274/dax-advanced-product-groupingsegmentation-question?forum=sqlanalysisservices
Thanks and good work!
I will reply directly on StackOverflow/MSDN once I have a solution. Should definitely be doable!