In my last post I showed a new approach on how to calculate the median in PowerPivot using DAX. In the comments the question was raised whether it is possible to make that calculation universal for all kind of Quantiles like Median, Quartiles, Percentiles, etc. and that’s what this post is about.
Lets analyze the final Median calculation from the last post:
TOPN(
2-MOD([Cnt_Months], 2),
TOPN(
([Cnt_Months] + 1) / 2,
VALUES(‘Date’[Month]),
[SumSA]),
[SumSA],
1))
/
(2-MOD([Cnt_Months], 2))
The Median defines the value in the middle of an ordered set. To get the first half (+1 to handle even and odd sets) of the whole we are using TOPN function:
([Cnt_Months] + 1) / 2,
VALUES(‘Date’[Month]),
[SumSA]),
The important part here is the “divide by 2” to split the set in the middle to start our Median calculation. Instead of dividing by 2 we could also multiply by 0.5 where 0.5 would be the separator for our quantile (in this case the Median). This expression can be made dynamic. For the first Quartile we would use 0.25 for the second Quartile (=Median) we would use 0.5 and for the third Quartile we would use 0.75.
I created a little linked table to demonstrate the dynamic approach:
Quantile | SortOrder | Percentage |
Median | 1 | 0.5000 |
Quartile 1 | 401 | 0.2500 |
Quartile 2 | 402 | 0.5000 |
Quartile 3 | 403 | 0.7500 |
Sextile 1 | 601 | 0.1667 |
Sextile 2 | 602 | 0.3333 |
Sextile 3 | 603 | 0.5000 |
Sextile 4 | 604 | 0.6667 |
Sextile 5 | 605 | 0.8333 |
We also need to add a calculated measure that returns the percentage-value of the currently selected Quantile:
HASONEVALUE(Quantiles[Percentage]),
VALUES(Quantiles[Percentage]))
Now we can change our old Median-calculation to be dynamic by using the measure defined above:
TOPN(
2 – MOD([Cnt_Months], 2),
TOPN(
ROUNDUP(([Cnt_Months] + 1) * [SelectedQuantile], 0),
VALUES(‘Date’[Month]),
[SumSA]),
[SumSA],
1))
/
(2 – MOD([Cnt_Months], 2))
We also need to explicitly add ROUNDUP() as “([Cnt_Months] + 1) * [SelectedQuantile]” may return any decimal places whereas the previous divide by 2 could only result in a x.5 which was rounded up automatically. And well, that’s all we need to change in order to make the calculation universal for all different Quantiles! The rest is the same logic that I already described for Median calculation.
Download Final Model (Office 2013!)
I have a view that has
Create view view_name as
(
Select * from fact_ table join dim1 on fact_ table.id = dim1.id inner join dim2 on dim2.2_id = fact_table.2_id
)
And in tabular model I select these tables and join them in model instead of selecting the whole view. This approach we had to take because all the individual tables process perfectly with partitions. This was not possible with selecting the single view in the model. This method had processing issues even with partitions.
Now for other requirements we got even more complex issue.
Create view view_name1 as
(
Select * from fact_ table join dim1 on fact_ table.id = dim1.id inner join dim2 on dim2.2_id = fact_table.2_id
Union all
Select * from fact_ table join dim1 on fact_ table.id = dim1.id inner join dim2 on dim2.2_id = fact_table.2_id
Union all
…………………………………
)
How can this be achieved by individually selecting tables in the tabular model. I cannot try the method in this link http://www.sqljason.com/2013/02/union-operation-in-dax-queries.html . See how complex the DAX function gets and with multiple joins. If I decide to make two view of the above union query and select them in the model individually. Still it is not acceptable as .As both views take lot of time to process by any means. In real life I have 5 unions queries. There is no partition on date at all.
so I guess all your tables/views are quite big right and you have Trouble processing them because they do not fit in Memory or because it just takes too Long to load the tables?
an Option may be to add all the base-tables individually and link them in the tabular model.
to work around the issue with the UNION ALL you may try this:
base-calculations are usually based on simple aggregations (SUM, COUNT, …)
to get a SUM over all tables that you want to Union you could simply create a calculated measure as
SumUnion:=SUM(fact_table1[value]) + SUM(fact_table2[value]) + SUM(fact_table3[value])
of course you Need to make sure that all your fact-tables are linked to all the lookup/Dimension tables in the same manner
Hi Gerhard
I am new to DAX and used your quartile formula above with my data, however, I am getting an error when it processes the first ‘2-MOD’ expression in the first topn statement. You do not get this?
Paul
=CALCULATE([dispoMonths],TOPN(2 – MOD([Cnt_Months], 2),TOPN(ROUNDUP(([Cnt_Months] + 1) * [SelectedQuantile], 0),VALUES(‘s4′[iy]),[dispoMonths]),[dispoMonths],1))/(2 – MOD([Cnt_Months], 2))
The expression is not valid or appears to be incomplete. Please review and correct the expression.
The following syntax error occurred during parsing: Invalid token, Line 1, Offset 32, –.
Hi Paul,
the calculation itself works fine for me
I guess you have some type in your calculation – maybe you can completely rewrite it using intellisense?
what version of Power Pivot are you using? maybe it is an older one that does not contain the MOD()-function?
-gerhard
Hi Gerhard
I used intelltype and it resolved that issue, however, it is now barfing on the [dispoMonths] column in the VALUES statement. The ‘dispomonths’ column is a calculated column in the Powerpivot window that just takes a number of days from another column and divides it by 30.5 to get the number of months. This then is the column that I want to perform the quartile calculation on. This is equivalent to your SumSA column?
Paul
=CALCULATE([dispoMonths],TOPN(2-MOD([Cnt_Months],2),TOPN(ROUNDUP(([Cnt_Months]+1)*[SelectedQuantile],0),VALUES(‘s4′[iy]),[dispoMonths]),[dispoMonths],1))/(2-MOD([Cnt_Months],2))
Calculation error in measure ‘s4′[quartile]: Column ‘dispoMonths’ cannot be found or may not be used in this expression.
[SumSA] is actually a calculated measure defined as SumSA:=SUM(‘Internet Sales'[SalesAmount]) and not a reference to a column
it represents the value by which the items are sorted by
While for the median your calculation is fine, I can’t verify your quantile measure for quartile 1 and 3. For the year 2006 Excel returns different results:
Excel Quantile SA Months
649.024,47 € =QUANTIL.INKL(H7:H18;0,75) Quartile 3 457.877,69 €
564.065,35 € =QUANTIL.INKL(H7:H18;0,50) Quartile 2 564.065,35 €
479.121,42 € =QUANTIL.INKL(H7:H18;0,25) Quartile 1 653.913,74 €
Even if I switch Quartile 3 and Quartile 1 there remains a significant difference. Interpolation might be different !?
Hi Frank,
I just did some research on this and there are several methods how to calculate quartiles according to Wikipedia:
http://en.wikipedia.org/wiki/Quartile
Method 1 is similar to QUANTIL.INKL whereas Method 2 is similar to QUANTIL.EXKL
taking Example 1 and Example 2 from Wikipedia you will also realize that Excel gives you “wrong” values from Example 2 and Method 2
Actually Excel seems to use Method 3 here
so in the end its up to your definition and the calculation may needs be adopted accordingly
Hi Gerhard
I am very new to DAX and I tried to use your formula, however your example incorporates dimensions (Cnt_months) that I don’t have in my data set. I am also assuming that SumSA is the sum of the values you are attempting to perform the quartile calculation on?
Would it be possible to post an example of your formula using generic references to columns/data that could be applied to anyone’s data?
Paul
Hi Paul,
you will find some more information in my first post about median calculation in DAX: http://wordpress.gbrueckl.at/2013/04/calculating-median-in-powerpivot-using-dax/
here is a more generic version which calculates the median over the ‘Date'[Month] column:
Median SA Months:=CALCULATE([SumSA],
TOPN(
2-MOD(DISTINCTCOUNT(‘Date'[Month]), 2),
TOPN(
(DISTINCTCOUNT(‘Date'[Month]) + 1) / 2,
VALUES(‘Date'[Month]),
[SumSA]),
[SumSA],
1))
/
(2-MOD(DISTINCTCOUNT(‘Date'[Month]), 2))
[SumSa] is simply defined as SUM(‘Facts'[Value])
-gerhard