Basket Analysis is a very common analysis especially for online shops. Most online shops make use of it to make you buy products that “Others also bought …”. Whenever you view a specific product “Product A” from the online shop, basket analysis allows the shop to show you further products that other customers bought together with “Product A”. Its basically like taking a look into other customers shopping baskets. In this blog post I will show how this can be done using Power Pivot. Alberto Ferrari already blogged about it here some time ago and showed a solution for Power Pivot v1. There is also dedicated chapter in the whitepaper The Many-to-Many Revolution 2.0 which deals with Basket Analysis, already in Power Pivot v2. Power Pivot v2 already made the formula much more readable and also much faster in terms of performance. Though, there are still some things that I would like to add.

Lets take a look at the initial data model first:

First of all we do not want to modify this model but just extend it so that all previously created measures, calculations and, most important, the reports still work. So the only thing we do is to add our Product-tables again but with a different name. Note that I also added the Subcategory and Category tables in order to allow Basket Analysis also by the Product-Category hierarchy. As we further do not want to break anything we only use an **inactive** relationship to our ‘Internet Sales’ fact table.

After adding the tables the model looks as follows:

The next and actually last thing to do is to add the following calculated measure:

CALCULATE (

COUNTROWS ( ‘Internet Sales’ ),

CALCULATETABLE (

SUMMARIZE (

‘Internet Sales’,

‘Internet Sales’[Sales Order Number]

),

ALL ( ‘Product’ ),

USERELATIONSHIP ( ‘Internet Sales’[ProductKey], ‘Filtered Product’[ProductKey] )

)

)

(formatted using DAX Formatter)

The inner CALCULATETABLE returns a list/table of all [Sales Order Numbers] where a ‘Filtered Product’ was sold and uses this table to extend the filter on the ‘Internet Sales’ table. It is also important to use ALL(‘Product’) here otherwise we would have two filters on the same column ([ProductKey]) which would always result in an empty table. Doing a COUNTROWS finally returns all for all baskets where the filtered product was sold.

We could also change ‘Internet Sales'[Sales Order Number] to ‘Internet Sales'[CustomerKey] in order to analyze what other customers bought also in different baskets (This was done for Example 3). The whole SUMMARIZE-function could also be replaced by VALUES(‘Internet Sales'[Sales Order Number]). I used SUMMARIZE here as I had better experiences with it in terms of performance in the past, though, this may depend on your data. The calculation itself also works with all kind of columns and hierarchies, regardless whether its from table ‘Product’, ‘Filtered Product’, or any other table!

So what can we finally do with this neat formula?

**1) Classic Basket Analysis – “Others also bought …”:**

As we can see Hydration Packs are more often sold together with Mountain Bikes opposed to Road Bikes and Touring Bikes. We could also use a slicer on ‘Filtered Product Subcategory’=”Accessories” in order to see how often Accessories are sold together with other products. You may analyze by Color and Product Category:

As we can see people that buy black bikes are more likely to buy red helmets than blue helmets.

**2) Basket Analysis Matrix:**

What may also be important for us is which products are sold together the most often? This can be achieved by pulling ‘Product’ on rows and ‘Filtered Product’ on columns. By further applying conditional formatting we can identify correlations pretty easy:

Water Bottles are very often sold together with Bottle Cages – well, not really a surprise. Again, you can also use all kind of hierarchies here for your analysis.

This is what the whole matrix looks like:

The big blank section in the middle are our Bikes. This tells us that there is no customer that bought two bikes in the same order/basket.

For this analysis I used an extended version of the calculation above to filter out values where ‘Product’ = ‘Filtered Product’ as of course every product is sold within its own basket:

IF (

MIN ( ‘Product’[ProductKey] )

<> MIN ( ‘Filtered Product’[ProductKey] ),

[Sold in same Order]

)

**3) Find Customers that have not bough a common product yet**

As we now know from the above analysis which products are very often bought together we can also analyze which customers do not fall in this pattern – e.g. customers who have bough a Water Bottle but have not bought a Bottle Cage yet. Again we can extend our base-calculation to achieve this:

IF (

NOT ( ISBLANK ( [Sum SA] ) ) && NOT ( [Sold to same Customer] ),

“Not Sold Yet”

)

The first part checks if the selected ‘Product’ was sold to the customer at all and the second part checks if the ‘Filtered Product’ was not sold to the customer yet. In that case we return “Not Sold Yet”, and otherwise BLANK() which is the default if the third parameter is omitted. That’s the result:

Aaron Phillips has bought a Water Bottle but no Mountain Bottle Cage nor a Road Bottle Cage – maybe we should send him some advertisement material on Bottle Cages?

As you can see there are a lot of analyses possible on top of that little measure that we created originally. All work with any kind of grouping or hierarchy that you may have and no change to your data model is necessary, just a little extension.

And that’s it – Basket Analysis made easy using Power Pivot and DAX!

**Downloads:**

Sample Workbook with all Examples: BasketAnalysis.xlsx

This is very cool and pretty simple to implement. Thanks so much for posting it!

I am buyer for a home improvement retailer in Canada. This technique has given me great insights and a wonderful opportunity to design a cross merchansiding plan for my products throughout the store. I can’t thank you enough for sharing your wisdom.

Glad to hear it helped you guys!

Hi Gerhard,

Thank you for this excellent breakdown of DAX and many-to-many relationships!

I was able to get this example working nicely on test data, however when I attempt to use this technique with 7 million+ rows of sales data I run into performance issues…I am not that familiar with the “Userelationship” and many-to-many techniques and was curious if this hindered performance…

Thanks,

Josh

PS–I am using 64 bit…

Hi Josh,

performance usually depends on the number of e.g. products that you have – how many products do you have?

7 million+ rows of sales data should be quite ok

it also depends on the kind of report that you do – the matrix report that i showed above is quite expensive to calculate

-gerhard

Hi Gerhard,

My products table has about 1,900 rows…

It is running very slowly in the classic analysis version.

Thanks,

Josh

I just blew up my fact-table to 4M rows and use customers (19k) instead of products for the calculation and the performance is still OK (<2 seconds)

an other important aspect is the column that we use within our SUMMARIZE function - in my scenario this is about 2M distinct values but as I said, it still performs OK

Do you use a COUNTROWS() as the final output or anything else? e.g. DistinctCount or something

Hi Gerhard,

Formula is exactly the same.

After watching one of Rob’s videos on performance, I think that the issues may be related to slicers. If I utilize a ‘Product Category’ and ‘Model’ slicer for my ‘Filtered_Products’ table to manipulate the data, that is when I run into the worst performance issues.

Thanks,

Josh

You are right, slicers and filters, especially with multi-select, can cause a huge performance issue

however, these are not directly related to the formula itself but more the the MDX that the Pivot Table/Excel creates and how this is processed in Power Pivot.

Would be interesting if you have similar issues if you use Power View and use native DAX to query the model

apart from that I do not think that there is much we can do here with regards to the formula itself

-gerhard

Hi I am trying to solve a similar problem, but with no success so far. I have a table with 3 columns([date],[market],[return]) now I want to calculate the correlation coefficient like they do here http://bitwizards.com/blog/november-2012/correlation-coefficient-dax-formula .

Could you help me? If you want I can provide you a sheet with sample data.

Thank you

Hi Wilhelm,

yes, a workbook with sample data would really help

it would be even better if you could also provide the expected results for given selections

just send me the workbook via mail – you can find my email address in the About-page on the top

-gerhard

Thank you for the post; it is very helpful. Other programs like R are able to conduct MBA’s while analyzing predicted sales based on combinations of n products (i.e. product d is most often purchased with products a, c, and f). R is great, but not very user-friendly in end-user application. Can DAX accomplish similar iterations for MBA? Thanks!

Hi Justin,

so you want to select 1-n products and see which other products are most oft purchased together with them?

the formula actually also works for multi-select on the Product dimension.

The only thing you would need to take care of is to exclude all selected Products from the resulting Products similar to what was done in the sample file using [CustomerMatrix] and [BasketMatrix] measures (where it was only done for one Product)

you could probably also convert the names of the Top 3 resulting products into a separate measure if thats what you want?