Restoring a SSAS Tabular Model to Power Pivot

It is a very common scenario to create a SSAS Tabular Model out of an Power Pivot Model contained in an Excel workbook. Microsoft even created an wizard (or actually a separate Visual Studio project) that supports you doing this. Even further, this process is also a major part of Microsoft’s strategy to cover Personal BI, Team BI and Corporate BI within one technology being xVelocity. This all works just fine but there may also be scenarios where you need to do it the other way round – converting a Tabular model to Power Pivot. Several use-cases come into my mind but I am sure that the most important one is to making data available offline for e.g. sales people to take it with them on their every day work. And in this blog post I will show how this can be done!

But before taking a closer look into how this can be accomplished, lets first see how the import from Power Pivot to SSAS Tabular works. To do this start SQL Server Profiler and connect to your tabular instance. Then create a new Tabular project in Visual Studio based on an existing Power Pivot workbook. At this point you will notice a lot of events happening on our SSAS Tabular server. The most important event for us is “Command End” with the EventSubclass “9 – restore”:
RestoreFromPowerPivot

SSAS actually restores a backup from a “Model.abf” backup file which is located in our project directory that we just created:
BackupExtractedFromPowerPivot

So far so good – but where does this file come from?

Well, the origin of the file has to be our Excel workbook that we imported. Knowing that all new office formats ending with “x” (.xlsx, .docx, …) are basically ZIP files, we can inspect our original Excel workbook by simply rename it to “.zip”. This allows us to browse the Excel file structure:
ExcelUnzipped

We will find a folder called “xl” which contains a sub-folder called “model”. This folder contains one item called “item.data”. If you take a closer look at the file size you may realize that both, the “Model.abf” file that we restored and the “item.data” file from our Excel workbook have the exact same size:
FileProperties

A Coincidence? Not really!

What happens behind the scenes when you import a Power Pivot model into SSAS Tabular is that this “item.data” file gets copied into your project directory and is renamed to “Model.abf” and then restored to the SSAS Tabular workspace instance by using an standard database restore.

Having this information probably makes you think: If it works in one direction, why wouldn’t it also work the other way round? And it does!

So here are the steps that you need to do in order to restore your SSAS Tabular backup into an Excel Power Pivot workbook:

  1. Create a backup of your SSAS Tabular database and rename it to “item.data”
  2. Create an empty Excel workbook and add a simple linked table to the Excel data model (which is actually Power Pivot).
    This is necessary to tell Excel that the workbook contains a Power Pivot model which has to be loaded once the file is opened.
  3. Close the Excel workbook and rename it from “MyFile.xlsx” to “MyFile.xlsx.zip”
  4. Open the .zip-file in Windows Explorer and locate the “\xl\model\”-folder
  5. Replace the “item.data” file with the file that you created in step 1.
  6. Rename the .zip-file back to “MyFile.xlsx”
  7. Open the Excel Workbook
  8. Voilá! You can now work with the data model as with any other Power Pivot model!

I tested this with a SSAS Tabular backup from SQL Server 2012 SP1 being restored to the streamed version of Excel from Office 365 with the latest version of Power Pivot. I assume that it also works with older versions but have not tested all combinations yet.

There are also some features that will not work, for example roles. If your Tabular database contains roles you will not be able to use this approach. Excel will complain that the Power Pivot model is broken. However, other Tabular features like partitions actually work with the little limitation that you cannot change them later on in the Power Pivot model or process them separately:
PowerPivotPartitions
Another thing to note here is that only up to 3 partitions are allowed, otherwise you will get the same error as for roles. I think this is related to the limitation of 3 partitions for SQL Server Analysis Services Standard Edition as Chris Webb described here.

Besides these obvious features there are also some other cool things that you can do in Tabular which are not possible in Power Pivot. Most (or actually all) of them are accessible only by using BIDS Helper – a great THANK YOU to the developers of BIDS Helper at this point!
BIDS Helper enables you to add classical multidimensional features also to Tabular models which is not possible using standard Visual Studio only. Those include:

  • DisplayFolders
  • Translations (metadata only)
  • Actions

I tested it for DisplayFolders and Actions and both are working also in Power Pivot after the backup was restored and I further assume that all the other things will also work just fine.
Simply keep in mind that Power Pivot is basically a fully featured Analysis Services instance running within Excel!

For my (and your) convenience I also created a little PowerShell script that does all the work:

  1. # Load the assembly with the ZipFile class
  2. [System.Reflection.Assembly]::LoadWithPartialName("System.IO.Compression.FileSystem") | Out-Null
  3. # Load the assembly to access Analysis Services
  4. [System.Reflection.Assembly]::LoadWithPartialName("Microsoft.AnalysisServices") | Out-Null
  5. # Also install "Analysis Services PowerShell" according to http://technet.microsoft.com/en-us/library/hh213141.aspx
  6.  
  7. # INPUT-Variables, change these to match your environment
  8. $rootFolder = "D:\Test_PowerShell\"
  9. $emptyExcelFile = $rootFolder + "EmptyExcel.xlsx"
  10. $ssasServerName = "localhost\TAB2012"
  11. $ssasDatabaseName = "AdventureWorks"
  12.  
  13. # internal variables
  14. $newExcelFile = $rootFolder + $ssasDatabaseName + ".xlsx"
  15. $newExcelFileZip = $newExcelFile + ".zip"
  16. $unzipFolder = $rootFolder + "TEMP_ExcelUnzipped"
  17. $backupFile = $rootFolder + $ssasDatabaseName + ".abf"
  18. $itemDestination = $unzipFolder + "\xl\model\item.data"
  19.  
  20. # Copy the empty Excel file and rename it to ".zip"
  21. Copy-Item -Path $emptyExcelFile -Destination $newExcelFileZip
  22.  
  23. # Unzip the file using the ZipFile class
  24. [System.IO.Compression.ZipFile]::ExtractToDirectory($newExcelFileZip, $unzipFolder)
  25.  
  26. # Create a backup of the SSAS Tabular database
  27. Backup-ASDatabase -Server $ssasServerName -Name $ssasDatabaseName -BackupFile $backupFile -AllowOverwrite -ApplyCompression
  28.  
  29. # Copy the backup-file to our extracted Excel folder structure
  30. Copy-Item -Path $backupFile -Destination $itemDestination -Force
  31.  
  32. # Check if the target file exists and delete it
  33. if (Test-Path -Path $newExcelFile) { Remove-Item -Path $newExcelFile }
  34.  
  35. # Zip the folder-structure again using the ZipFile class and rename it to ".xlsx"
  36. [System.IO.Compression.ZipFile]::CreateFromDirectory($unzipFolder, $newExcelFile)
  37.  
  38. # Cleanup the unecessary files
  39. Remove-Item -Path $unzipFolder -Recurse
  40. Remove-Item -Path $backupFile
  41. Remove-Item -Path $newExcelFileZip

The last thing to mention here is that I don’t know if this is officially supported in any way by Microsoft – actually I am pretty sure it is not – so watch out what you are doing and don’t complain if something is not working as expected.

Applied Basket Analysis in Power Pivot using DAX

Basket Analysis is a very common analysis especially for online shops. Most online shops make use of it to make you buy products that “Others also bought …”. Whenever you view a specific product “Product A” from the online shop, basket analysis allows the shop to show you further products that other customers bought together with “Product A”. Its basically like taking a look into other customers shopping baskets. In this blog post I will show how this can be done using Power Pivot. Alberto Ferrari already blogged about it here some time ago and showed a solution for Power Pivot v1. There is also dedicated chapter in the whitepaper The Many-to-Many Revolution 2.0 which deals with Basket Analysis, already in Power Pivot v2. Power Pivot v2 already made the formula much more readable and also much faster in terms of performance. Though, there are still some things that I would like to add.

Lets take a look at the initial data model first:Model_Old

First of all we do not want to modify this model but just extend it so that all previously created measures, calculations and, most important, the reports still work. So the only thing we do is to add our Product-tables again but with a different name. Note that I also added the Subcategory and Category tables in order to allow Basket Analysis also by the Product-Category hierarchy. As we further do not want to break anything we only use an inactive relationship to our ‘Internet Sales’ fact table.

After adding the tables the model looks as follows:Model_New

The next and actually last thing to do is to add the following calculated measure:

Sold in same Order :=
CALCULATE (
COUNTROWS ( ‘Internet Sales’ ),
CALCULATETABLE (
SUMMARIZE (
‘Internet Sales’,
‘Internet Sales’[Sales Order Number]
),
ALL ( ‘Product’ ),
USERELATIONSHIP ( ‘Internet Sales’[ProductKey], ‘Filtered Product’[ProductKey] )
)
)

(formatted using DAX Formatter)

The inner CALCULATETABLE returns a list/table of all [Sales Order Numbers] where a ‘Filtered Product’ was sold and uses this table to extend the filter on the ‘Internet Sales’ table. It is also important to use ALL(‘Product’) here otherwise we would have two filters on the same column ([ProductKey]) which would always result in an empty table. Doing a COUNTROWS finally returns all for all baskets where the filtered product was sold.
We could also change ‘Internet Sales'[Sales Order Number] to ‘Internet Sales'[CustomerKey] in order to analyze what other customers bought also in different baskets (This was done for Example 3). The whole SUMMARIZE-function could also be replaced by VALUES(‘Internet Sales'[Sales Order Number]). I used SUMMARIZE here as I had better experiences with it in terms of performance in the past, though, this may depend on your data. The calculation itself also works with all kind of columns and hierarchies, regardless whether its from table ‘Product’, ‘Filtered Product’, or any other table!

So what can we finally do with this neat formula?

1) Classic Basket Analysis – “Others also bought …”:
Result_ClassicBasketAnalysis

As we can see Hydration Packs are more often sold together with Mountain Bikes opposed to Road Bikes and Touring Bikes. We could also use a slicer on ‘Filtered Product Subcategory’=”Accessories” in order to see how often Accessories are sold together with other products. You may analyze by Color and Product Category:
Result_ByColor
As we can see people that buy black bikes are more likely to buy red helmets than blue helmets.

2) Basket Analysis Matrix:
What may also be important for us is which products are sold together the most often? This can be achieved by pulling ‘Product’ on rows and ‘Filtered Product’ on columns. By further applying conditional formatting we can identify correlations pretty easy:
Result_BasketMatrix
Water Bottles are very often sold together with Bottle Cages – well, not really a surprise. Again, you can also use all kind of hierarchies here for your analysis.
This is what the whole matrix looks like:
Result_BasketMatrixFull
The big blank section in the middle are our Bikes. This tells us that there is no customer that bought two bikes in the same order/basket.

For this analysis I used an extended version of the calculation above to filter out values where ‘Product’ = ‘Filtered Product’ as of course every product is sold within its own basket:

BasketMatrix :=
IF (
MIN ( ‘Product’[ProductKey] )
<> MIN ( ‘Filtered Product’[ProductKey] ),
[Sold in same Order]
)

3) Find Customers that have not bough a common product yet
As we now know from the above analysis which products are very often bought together we can also analyze which customers do not fall in this pattern – e.g. customers who have bough a Water Bottle but have not bought a Bottle Cage yet. Again we can extend our base-calculation to achieve this:

Not Sold to same Customer :=
IF (
NOT ( ISBLANK ( [Sum SA] ) ) && NOT ( [Sold to same Customer] ),
“Not Sold Yet”
)

The first part checks if the selected ‘Product’ was sold to the customer at all and the second part checks if the ‘Filtered Product’ was not sold to the customer yet. In that case we return “Not Sold Yet”, and otherwise  BLANK() which is the default if the third parameter is omitted. That’s the result:
Result_NotSoldToCustomer
Aaron Phillips has bought a Water Bottle but no Mountain Bottle Cage nor a Road Bottle Cage – maybe we should send him some advertisement material on Bottle Cages? Smile

 

As you can see there are a lot of analyses possible on top of that little measure that we created originally. All work with any kind of grouping or hierarchy that you may have and no change to your data model is necessary, just a little extension.

And that’s it – Basket Analysis made easy using Power Pivot and DAX!

Downloads:

Sample Excel Workbook with all Examples: BasketAnalysis.xlsx
Sample PowerBI Desktop file with all Examples (DAX only, no Visuals): BasketAnalysis.pbix

Optimizing Columnar Storage for Measures

First of all I have to thank Marco Russo for his blog post on Optimizing High Cardinality Columns in Vertipaq and also his great session at SQL PASS Rally Nordic in Stockholm last year which taught me a lot about columnar storage in general. I highly recommend everyone to read the two mentioned resources before continuing here. Most of the ideas presented in this post are based on these concepts and require at least basic knowledge in columnar storage.

When writing one of my recent posts I ended up with a Power Pivot model with roughly 40M rows. It contained internet logs from Wikipedia, how often someone clicked on a given page per month and how many bytes got downloaded. As you can imagine those values can vary very heavily, especially the amount of bytes downloaded. So in the Power Pivot model we end up having a lot of distinct values in our column that we use for our measure. As you know from Marcos posts, the allocated memory and therefore also the  performance of columnar storage systems is directly related to the number of distinct values in a column – the more the worse. Marco already described an approach to split up a single column with a lot of distinct values into several columns with less distinct values to optimize storage. These concepts can also be used on columns that contain measures or numeric values in general. Splitting numeric values is quite easy, assuming your values range from 1 to 1,000,000 you can split this column into two by dividing the value by 1000 and using MOD 1000 for the second column. Instead of one column with the value 123,456 you end up with two columns with the values 123 and 456. In terms of storage this means that instead of 1,000,000 distinct values we only need to store 2 x 1,000 distinct values. Nothing new so far.

The trick is to combine those columns again at query time to get the original results as before the split. For some aggregations like SUM this is pretty straight forward, others are a bit more tricky. Though, in general the formulas are not really very complex and can be adopted very easily to handle any number of columns:

Aggregation DAX Formula
Value_SUM1 =SUMX(‘1M_Rows_splitted’, [Value_1000] * 1000 + [Value_1])
Value_SUM2 =SUM ( ‘1M_Rows_splitted'[Value_1000] ) * 1000
    + SUM ( ‘1M_Rows_splitted'[Value_1] )
Value_MAX =MAXX(‘1M_Rows_splitted’, [Value_1000] * 1000 + [Value_1])
Value_MIN =MINX(‘1M_Rows_splitted’, [Value_1000] * 1000 + [Value_1])
Value_COUNT =COUNTX(‘1M_Rows_splitted’, [Value_1000] * 1000 + [Value_1])
Value_DISTINCTCOUNT =COUNTROWS (
    SUMMARIZE (
        ‘1M_Rows_splitted’,
        ‘1M_Rows_splitted'[Value_1000],
        ‘1M_Rows_splitted'[Value_1]))

As you can see you can still mimic most kind of aggregation even if the [Value]-column is split up.

Though, don’t exaggerate splitting your columns – too many may be a bit inconvenient to handle and may neglect the effect resulting in worse performance. Marco already showed that you can get a reduction of up to 90% in size, during my simple tests I came up with about the same numbers. Though, it very much depends on the number of distinct values that you actually have in your column!

I would not recommend to always use this approach for all your measure column – no, definitely not! First check how many distinct values your data/measures contain and decide afterwards. For 1 million distinct values it is probably worth it, for 10,000 you may reconsider using this approach. Most important here is to test this pattern with your own data, data model and queries! Test it in terms of size and of course also in terms of performance. It may be faster to split up columns but it may also be slower and it may be also different for each query that you execute against the tabular model / Power Pivot. Again, test with your own data, data model and queries to get representative results! 

Here is a little test that you may run on your own to test this behavior. Simple create the following Power Query using M, load the result into Power Pivot and save the workbook. It basically creates a table with 1 million distinct values (0 to 999,999) and splits this column up into two. You can just copy the workbook, remove the last step “Remove Columns” and save it again to get the “original” workbook and Power Pivot model.

let
    List1 = List.Numbers(0, 1000000),

    TableFromList = Table.FromList(List1, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    RenamedColumns = Table.RenameColumns(TableFromList,{{"Column1", "Value"}}),
    ChangedType1 = Table.TransformColumnTypes(RenamedColumns,{{"Value", type number}}),
    InsertedCustom = Table.AddColumn(ChangedType1, "Value_1000", each Number.RoundDown([Value] / 1000)),
    InsertedCustom1 = Table.AddColumn(InsertedCustom, "Value_1", each Number.Mod([Value], 1000)),
    ChangedType = Table.TransformColumnTypes(InsertedCustom1,{{"Value_1000", type number}, {"Value_1", type number}}),
    RemovedColumns = Table.RemoveColumns(ChangedType,{"Value"})
in
    RemovedColumns

This is what I ended up with when you compare those two workbooks:

  # distinct values Final Size
single column 1,000,000 14.4 MB
two columns 2 x 1,000 1.4 MB

We also get a reduction in size of 90%! Though, this is a special scenario …
In the real world, taking my previous scenario with the Wikipedia data as an example, I ended up with a reduction to 50%, still very good though. But as you can see the reduction factor varies very much.

Downloads:

Sample workbook with Power Query: 1M_Rows_splitted.xlsx

Reporting Services MDX Field List and Using Measures on rows

When creating a Reporting Services report on top of an Analysis Services cube using the wizard it automatically creates a Field for each column in your MDX query. Those fields can then be used in your report. For reports based on a relational source the definition of these fields is quite simple, it is the same as the the column name of the originating query. For MDX queries this is very different. If you ever checked the definition of an automatically generated MDX field you will see a lengthy XML snippet instead:
FieldsList

The XMLs may look like these:

<?xml version="1.0" encoding="utf-8"?>
<Field xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:xsd="http://www.w3.org/2001/XMLSchema"
       xsi:type="Level"
       UniqueName="[Product].[Subcategory].[Subcategory]" />

<?xml version="1.0" encoding="utf-8"?>
<Field xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:xsd="http://www.w3.org/2001/XMLSchema"
       xsi:type="Measure"
       UniqueName="[Measures].[Internet Sales Amount]" />

As you can see those two are quite different in terms of xsi:type and UniqueName. The xsi:type “Level” refers to a dimension level whereas “Measure” refers to a measure. Depending on the type of field, different properties are available within the report:
FieldProperties

For example the property BackgroundColor is only populated for fields of type “Measure” whereas the property UniqueName is only populated for fields of type “Level”. Measure properties are tied to the CELL PROPERTIES in your MDX query and Level properties are tied to DIMENSION PROPERTIES:

SELECT
NON EMPTY { [Measures].[Internet Sales Amount] } ON COLUMNS,
NON EMPTY { ([Product].[Subcategory].[Subcategory].ALLMEMBERS ) }
DIMENSION PROPERTIES MEMBER_CAPTION, MEMBER_UNIQUE_NAME ON ROWS
FROM [Adventure Works]
CELL PROPERTIES VALUE, BACK_COLOR, FORE_COLOR, FORMATTED_VALUE, FORMAT_STRING, FONT_NAME, FONT_SIZE, FONT_FLAGS

If we remove MEMBER_UNIQUE_NAME from the DIMENSION PROPERTIES we would no longer be able to use Fields!Subcategory.UniqueName in our SSRS expressions, or to be more precise it would simply always return NULL (or NOTHING in terms of Visual Basic). The same of course is also true for the CELL PROPERTIES.

So far this is nothing really new but there are some more things about the fields of MDX queries. There is a third xsi:type called “MemberProperty” which allows you to query member properties without having to define separate measures within your query:

<?xml version="1.0" encoding="utf-8"?>
<Field xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:xsd="http://www.w3.org/2001/XMLSchema"
       xsi:type="MemberProperty"
       LevelUniqueName="[Product].[Subcategory].[Subcategory]"
       PropertyName="Category" />

Once we add a member property to our MDX query SSRS also automatically creates this field for us. NOTE, this is only possible by manually modifying the MDX!

SELECT
NON EMPTY { [Measures].[Internet Sales Amount] } ON COLUMNS,
NON EMPTY { ([Product].[Subcategory].[Subcategory].ALLMEMBERS ) }
DIMENSION PROPERTIES MEMBER_CAPTION, MEMBER_UNIQUE_NAME,
    [Product].[Subcategory].[Subcategory].[Category] ON ROWS
FROM [Adventure Works]
CELL PROPERTIES VALUE, BACK_COLOR, FORE_COLOR, FORMATTED_VALUE, FORMAT_STRING, FONT_NAME, FONT_SIZE, FONT_FLAGS

To get the [Category] which is associated to a given [Subcategory] you would usually need to create a separate measure like this:

WITH
MEMBER [Measures].[Category] AS (
[Product].[Subcategory].Properties( "Category" )
)
SELECT

This has the very bad drawback that using the WITH MEMBER clause disables the formula engine cache for the whole query what may result in worse query performance. So you may consider using DIMENSION PROPERTIES instead of a custom Measure next time.

 

There is another very nice “feature” that is also related to the field list. If you ever had the requirement to create a parameter to allow the user to select which measure he wants to see in the report you probably came across this blog post by Chris Webb or this blog post by Rob Kerr. As you know by then, SSRS requires you to put the Measures-dimension on columns, otherwise the query is not valid. This is because the number of Measures is not considered to be dynamic (opposed to e.g. Customers) which allows SSRS to create a static field list. This makes sense as SSRS was originally designed for relational reporting and a table always has a fixed number of columns which are similar to fields in the final SSRS dataset. Using Measures on columns is the way how SSRS enforces this.

As we are all smart SSRS and MDX developers and we know what we are doing we can trick SSRS here. All we need to do is to write a custom MDX query using the expression builder – do not use or even open the Query Designer at this point otherwise your query may get overwritten!
Query_Measures

SSRS also automatically creates the fields for us, but this time the fields are not defined correctly. It creates one field with a very cryptic name and the following XML definition:

<?xml version="1.0" encoding="utf-8"?>
<Field xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:xsd="http://www.w3.org/2001/XMLSchema"
       xsi:type="Measure"
       UniqueName="[Measures].[MeasuresLevel]" />

As you can see SSRS thinks that this field is of type “Measure” but it actually is a “Level”. After changing this little thing we can access all field properties that are unique to Level-fields like <Field>.UniqueName

So this is the final MDX query and the associated XML field definition:

SELECT
{} ON 0,
[Measures].members
DIMENSION PROPERTIES MEMBER_CAPTION, MEMBER_UNIQUE_NAME ON 1
FROM [Adventure Works]

Ensure that you have defined the necessary DIMENSION PROPERTIES here otherwise they will not be available/populated within the report!

<?xml version="1.0" encoding="utf-8"?>
<Field xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:xsd="http://www.w3.org/2001/XMLSchema"
       xsi:type="Level"
       UniqueName="[Measures].[MeasuresLevel]" />

In order to make use of this approach in a report parameter we further need to create calculated fields for our parameter label and parameter value:
Fields_Measures
The definition of the parameter is straight forward then:
ParameterDefinition

You can not only use this approach to populate a parameter but you can also use it to crossjoin Measures on rows with any other hierarchy. This way you can avoid writing complex MDX just to work around this nasty SSRS limitation.

Downloads:

Sample SSRS report: CustomFieldList.rdl

Error-handling in Power Query

Data is the daily bread-and-butter for any analyst. In order to provide good results you also need good data. Sometimes this data is very well prepared beforehand and you can use it as it is but it is also very common that you need to prepare and transform the data on your own. To do this Microsoft has introduced Power Query (on tool of the Power BI suite). Power Query can be used to extract, transform and load data into Excel and/or Power Pivot directly.

When using any data you usually know what the data looks like and what to expect from certain columns – e.g. a value is delivered as a number, a text contains exactly 4 characters, etc.
Though, sometimes this does not apply for all rows of that dataset and your transformation logics may cause errors because of that. In order to avoid this and still have a clean data load you need to handle those errors. Power Query offers several options to this which I will elaborate in this post.

This is the sample data I will use for the following samples:

A B C
1 4 AXI23
2 5 TZ560
NA 6 UP945

we will perform simple transformations and type casts on this table to generate some errors:

Prepare_ChangeType

Error-handling on row-level

This is the easiest way of handling errors. Whenever a transformation causes an error, we can simply remove the whole row from the result set:

RowLevel_RemoveErrors

This will generate the following result table:

A B C
1 4 AX312
2 5 TZ560

As you can see, the third row was simply removed. This is the easiest way on how to remove errors from your result set but of course this may not be what you want as those removed rows may contain other important information in other columns! Assume you want to calculate the SUM over all values in column B. As we removed the third row we also removed a value from column B and the SUM is not the same as without the error-handling (9 vs. 15)!

 

Error-handling on cell-level

As we now know that column A may result in an error, we can handle this error during our transformation steps. As “NA” could not be converted to a number we see Error as result for that column. Clicking on it gives use some more details on the error itself. To handle this error we need to create a new calculated column where we first check if the value can be converted to a number and otherwise return a default numeric value:

RowLevel_RemoveErrors_CalcColumn

The M-function that we use is “try <expressions to try> otherwise <default if error>” which is very similar to a try-catch block in C#. If the expression causes an error, the default will be used instead. Details on the try-expression can be found in the Microsoft Power Query for Excel Formula Language Specification (PDF) which can be found here and is a must read for everyone interested in the M language.

CellLevel_HandleError

We could further replace our column A by the new column A_cleaned to hide this error handling transformation.

A B C A_cleaned
1 4 AXI23 1
2 5 TZ560 2
NA 6 UP945 0

 

Error-handling on cell-level with error details

There may also be cases where it is OK to have one of this errors but you need/want to display the cause of the error so that a power user may correct the source data beforehand. Again we can use the try-function, but this time without the otherwise-clause. This will return a record-object for each row:

RowLevel_ShowErrors_CalcColumn

RowLevel_ShowErrors_Result1

After expanding the A_Try column and also the A_Try.Error column we will get all available information on the error:

RowLevel_ShowErrors_Result2

A B C A_Try.HasError A_Try.Value A_Try.Error.Reason A_Try.Error.Message A_Try.Error.Detail
1 4 AXI23 FALSE 1
2 5 TZ560 FALSE 2
6 UP945 TRUE DataFormat.Error Could not convert to Number. NA

As you can see we get quite a lot of columns here. We could e.g. use A_Try.HasError to filter out error rows (similar to error-handling on row-level) or we could use it in a calculated column to mimic error-handling on cell-level. What you want to do with all the information is up to you, but in case you don’t need it you should remove all unnecessary columns.

 

Downloads:

Power Query Error Handling Workbook: Power Query Error Handling.xlsx

Trigger Cube-Processing from Excel using Cube Actions

Recently I faced the requirement to give specific users of my Analysis Services Multidimensional cube the opportunity to process the cube whenever they want. There are several ways to achieve this:
– start a SQL Agent that processes the cube
– create some kind of WebService that does the processing
– create some custom .NET application that does the processing
– run some script (PowerShell, Command Line, ascmd.exe, …)

 


NOTE:
The post describes a solution which requires SSAS Server Admin rights. If you are interested in a solution which triggers an asynchronous processing without requiring SSAS Server Admin Rights please scroll down to the very bottom and download VS_Solution.zip
It basically runs the same script but in a separate thread in the context of the service account so the calling user does not need to have any specific rights on the server.


From a usability point-of-view none of the above is really convenient as all options involve some kind of external process or application. In this post I will show how to process a cube using cube actions which can be triggered from Excel Pivot Tables natively. So this is what we want to achieve:

Excel_Action

 

This requires several steps:
1) create custom .Net assembly
2) add that custom .Net assembly to your Analysis Services database
3) add an action to your cube that calls the assembly

 

Creating a Custom .Net assembly:

First of all we need to create a new “Visual C#” project of type “Class Library”. To work with Analysis Services we need to add two references:
VS_References

“Microsoft.AnalysisServices” refers to Microsoft.AnalysisServices.dll which  can be found in the shared features folder of your installation (default is c:Program FilesMicrosoft SQL Server110SDKAssemblies)

“msmgdsrv” refers to msmgdsrv.dll which can found be in the OLAPbin-folder of your SSAS instance (default is c:Program FilesMicrosoft SQL ServerMSAS11.MSSQLSERVEROLAPbin)

Once these references are added, we can start to create our processing code:

using System;
using AdomdServer = Microsoft.AnalysisServices.AdomdServer;
using Microsoft.AnalysisServices;
using System.Data;

namespace ASSP_Processing
{
    public class Processing
    {
        public const string LoggingPrefix = “ASSP.ProcessObject: “;

        public enum ResultHandling
        {
            Datatable = 0,
            Exception = 1,
            NULL = 2
        }

        public static DataTable ProcessObject(string cubeName, ResultHandling resultHandling)
        {
            DataTable ret = new DataTable();
            ret.Columns.Add(“Result”);

            Server server = null;

            try
            {
                server = new Server();

                //connect to the current session…
                server.Connect(“*”);

                AdomdServer.Context.CheckCancelled(); //could be a bit long running, so allow user to cancel

                Database db = server.Databases.GetByName(AdomdServer.Context.CurrentDatabaseName);

                string objectToProcessName = “#N/A”;
                ProcessableMajorObject objectToProcess = db;

                AdomdServer.Context.TraceEvent(0, 0, LoggingPrefix + “Database <“ + db.Name + “> found!”);
                objectToProcessName = “DB[“ + db.Name + “]”;

                if (!string.IsNullOrEmpty(cubeName))
                {
                    Cube cube = db.Cubes.GetByName(cubeName);

                    if (cube != null)
                    {
                        objectToProcess = cube;
                        AdomdServer.Context.TraceEvent(0, 0, LoggingPrefix + “Cube <“ + cubeName + “> found!”);
                        objectToProcessName = objectToProcessName + ” > Cube[“ + cubeName + “]”;
                    }
                }

                if (objectToProcess != null)
                {
                    AdomdServer.Context.TraceEvent(0, 0, LoggingPrefix + “Processing Object “ + objectToProcessName + ” …”);

                    objectToProcess.Process(ProcessType.ProcessFull);

                    ret.Rows.Add(new object[] { “S U C C E S S:    Object “ + objectToProcessName + ” successfully processed!” });
                    AdomdServer.Context.TraceEvent(0, 0, LoggingPrefix + “Finished Processing Object “ + objectToProcessName + “!”);
                }
            }
            finally
            {
                try
                {
                    if (server != null)
                        server.Disconnect();
                }
                catch { }
            }

            // if processing was successful a row has been added beforehand
            if (ret.Rows.Count == 0)
            {
                ret.Rows.Add(new object[] { “F A I L U R E:    Error while processing an object!” });
            }

            switch (resultHandling)
            {
                case ResultHandling.Datatable:
                    return ret;

                case ResultHandling.Exception:
                    throw new Exception(Environment.NewLine + Environment.NewLine + Environment.NewLine + ret.Rows[0][0].ToString());

                case ResultHandling.NULL:
                    return null;
            }

            return null;
        }

        public static DataTable ProcessObject(string cubeName, int resultHandling)
        {
            return ProcessObject(cubeName, (ResultHandling)resultHandling);
        }

        public static DataTable ProcessObject(string cubeName)
        {
            return ProcessObject(cubeName, 1);
        }
    }
}

There are two things here that are worth to point out. First of all we need to somehow establish a connection to the server that hosts the SSAS database in order to process an object. The first thing that comes into mind would be to create a new connection to the server and run some processing XMLA. Unfortunately this does not work here because as this would result in a deadlock. The assembly will be called as a Rowset-Action which in terms of locking is similar to a query. So if we run separate processing command within our Rowset-Action this processing command can never be committed as there is always a query running on the database. The query (=Rowset-Action) will wait until the function is finished and the function waits until the processing is committed resulting in a classical deadlock!

To avoid this we need to connect to the current users sessions using “*” as our connection string:

//connect to the current session…
server.Connect(“*”);

 

The second thing to point out is the return value of our function. In order to be used in a Rowset-action, the function has to return a DataTable object. The results would then be displayed similar to a drill through and a separate worksheet would pop up in Excel showing the result:

Result_DataTable

From my point-of-view this can be a bit annoying as after each processing this new worksheet pops up and you loose the focus of the pivot table. So I investigated a bit and found another way to display the output of the processing.

When a drill through / action throws an exception, this will also be displayed in Excel without actually displaying the result of the drill through / action in a separate worksheet. By default it may look like this message:

Result_NULL

Well, not really much information here right?

To display more information, e.g. that the processing was successful (or not) we can throw our own exception in the code after processing is done:

Result_Exception

I added a separate parameter to control this behavior, in the code above the default would be option 3 – custom Exception.

 

Once this project is built the DLL can be added to the SSAS database. It is necessary to grant the assembly “Unrestricted” permission set:

AssemblySettings

 

The last step would be to create the appropriate cube actions that call the assembly:

CubeActionConfiguration

It is necessary that the action is of Type=Rowset, otherwise it would not be displayed in Excel! The Action Target is up the you, in this example it will be displayed on every cell. The Action expression is the actual call to our assembly passing the cube name as a parameter. The second parameter controls how the result is displayed and could also be omitted.  In this example I added 3 separate actions, one for each result-type (as described above).

And that’s it! This simple steps allow you and your users to process any cube (or any other database object) from within your favorite client application, assuming it supports cube actions.

This is the final result where the three cube actions can be triggered from an Excel pivot table:

Excel_Action_Final

 

The attached zip-file includes everything that is necessary to recreate this example:
– SSAS project
– ASSP_Processing project

CubeProcessingAction.zip

Solution for Asynchronous Processing without Admin-Rights:
VS_Solution.zip

SSAS Dynamic Security and Visual Totals

Security is always an important aspect of any BI solution. Especially for big enterprise solutions the security-concept can become very complex. Analysis Services Multidimensional  in general offers two option on how to define security: Cell Data Permissions and Dimension Data Permissions. Because of the huge impact on performance Cell Data Permissions are barely used. Dimension Data Permissions are usually the way to go and can cover probably 98% of the security requirements. Though, also Dimension Data Permissions can get quite complex, especially for large-scale cubes with 100+ or even 1000+ users. Just think of an Analysis Services that holds sales data associated to KeyAccounts – further assume that we have 100 KeyAccounts. Each of this KeyAccounts is managed by one user, so we would end up creating 100 roles – one for each KeyAccount and its manager. In terms of maintenance this can get quite complex and in the long run unmanageable.

To address this issue Analysis Services offers the concept of Dynamic Security. Dynamic Security is evaluated at runtime for each user and allows you to consolidate your roles. So for our example from above instead of having 100 different roles we would end up having one dynamic role serving all 100 users. Basically, when a user connects it queries internal data (within the cube using MDX) or external data (using Analysis Services Stored Procedure (ASSP)/Assemblies) to get the security settings for the current user. Especially in combination with ASSP this can be quite powerful as you can use any datasource that you can query using .Net-code to derive your security from.

To make a role "dynamic" you basically have two options:
1) Username() – returns the current users in format "MyDomainMyUser"
2) CustomData() – returns whatever was specified in the connectionstring’s CustomData-property

Both are native MDX-functions and can be used in an expression to build your Allowed Set, Denied Set or DefaultMember.

Role_UI

This works just fine, as long as the expressions returns a valid MDX set/member of course.
But what about the Visual Totals – setting? You can only check or uncheck the checkbox and you are not able to use any dynamic expressions. That’s what the UI offers you – no more no less. But there might be some rare scenarios where you also need to make the Visual Totals dynamically based on Username() or CustomData() – and this is what this post is about.

 

So, as we have already seen above, it is not possible to make the Visual Totals dynamically using the UI. Fortunately there are also other ways besides the UI to work with the cube and modify its structure. I am talking about XMLA (XML for Analysis Services) here, which is the protocol that is used for all communications between a client and Analysis Services. Instead of writing XMLA on your own there is also a programmatically way to do this: AMO (Analysis Management Object). AMO is a .Net library that allows you to integrate calls to Analysis Services within any .Net code. After modifying any object in the code AMO generates the according XMLA for you and sends it to the server which is a much more convenient way than writing plane XMLA on your own.

This means that you can also create/modify your security roles using AMO – this approach is already very well described by Benny Austin in his blog about SSAS: Using AMO to Secure Analysis Service Cube. I recommend reading this article before you continue here.

For this little example I create a slimmed down version of Adventure works. The model contains only the Date and the Product dimension. We will put our dynamic security on the Category attribute of our Product dimension. The model also already contains a predefined empty role called "DynamicVisualTotals" that we will modify later (after the model has been deployed) using AMO. Please note here that AMO only work with the Analysis Service directly (online) but not at design time in Visual Studio. Once everything is deployed this is the AMO code to use to modify the predefined role:

using Microsoft.AnalysisServices;

namespace ModifyRole_AMO
{
    class Program
    {
        static void Main(string[] args)
        {
            using (Server oServer = new Server())
            {
                oServer.Connect("localhost");

                using (Database oDB = oServer.Databases.GetByName("DynamicVisualTotals"))
                {
                    Role oRole = oDB.Roles.GetByName("DynamicVisualTotals");

                    Dimension oDim = oDB.Dimensions.GetByName("Product");
                    DimensionAttribute oAttr = oDim.Attributes.FindByName("Category");
                    DimensionPermission oDimPermission = oDim.DimensionPermissions.FindByRole(oRole.ID);

                    AttributePermission oAttrPermission = oDimPermission.AttributePermissions.Find(oAttr.ID);

                    // can be done in the UI
                    oAttrPermission.AllowedSet = "DynamicSecurity.DynamicAllowedSet(CustomData())";
                    // can not be done in the UI
                    oAttrPermission.VisualTotals = "DynamicSecurity.DynamicVisualTotals(CustomData())";

                    oDimPermission.Update();
                }
            }
        }
    }
}

1) connect to server "localhost"
2) connect to database "DynamicVisualTotals"
3) get the Role-object for role "DynamicVisualTotals" from the database
4) get the Dimension-object for dimension "Product"
5) get the DimensionAttribute-object for attribute "Category"
6) get the DimensionPermission that is associated to our Role for dimension "Product"
7) get the AttributePermission that is associated to our attribute "Category"

So far this is straight forward. The AttributePermission object actually represents the UI that we have seen above. It has properties for all the fields and textboxes on the UI. The most important of course are:
– AllowedSet
– DeniedSet
– Defaultmember
– and VisualTotals

For the first three I was already aware that they are of type String but the VisualTotals I expected to be Boolean which turned out to be wrong – it is also a String!

VisualTotalsIsString 

Naïve as I am I simply put a MDX expression in there that resolves to "1" or "0" (Boolean, similar to the Checkbox in the UI, it could probably also resolve to "true" or "false" or a Boolean type directly but I did not further test this) – and, it worked like a charm! The MDX expression I used refers to an ASSP called DynamicSecurity which contains a function called "DynamicVisualTotals" that returns "1" or "0" based on what was passed in the connectionstring using CustomData():

using Microsoft.AnalysisServices.AdomdServer;

namespace DynamicSecurity
{
    public static class DynamicSecurity
    {
        public static string DynamicVisualTotals(string customData)
        {
            if (customData == "1")
                return "1";

            return "0";
        }

        public static Set DynamicAllowedSet(string customData)
        {
            string mdxEpression;
            Expression exp;
            Set s;

            if (customData == "1")
                mdxEpression = "{[Product].[Category].&[1]}";

            else
                mdxEpression = "{[Product].[Category].&[3], [Product].[Category].&[4]}";

            exp = new Expression(mdxEpression);
            s = exp.CalculateMdxObject(null).ToSet();

            return s;
        }
    }
}

(This ASSP has to be compiled and added to the database first to make the Dynamic Security work. This is describe here in more detail.)

And that’s it!

Once everything is done
1) Deploy the solution to the sever
2) (add ASSP if not already included in the solution)
3) Execute the AMO code (first listing)
we can use Excel to check the results:

FinalResults

 

The used connection strings are as follows:

Provider=MSOLAP.5;Integrated Security=SSPI;Persist Security Info=True;Data Source=localhost;Initial Catalog=DynamicVisualTotals;Roles=DynamicVisualTotals;CustomData=0

 

Provider=MSOLAP.5;Integrated Security=SSPI;Persist Security Info=True;Data Source=localhost;Initial Catalog=DynamicVisualTotals;Roles=DynamicVisualTotals;CustomData=1

Make sure to also use the Roles-property for testing as if you are administrator these privileges would overwrite all other security settings!

 

Additional Notes:

As AMO is just a wrapper for XMLA we could also use XMLA directly to modify our role:

<Alter AllowCreate="true" ObjectExpansion="ObjectProperties" xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
    <Object>
        <DatabaseID>DynamicVisualTotals</DatabaseID>
        <DimensionID>Dim Product</DimensionID>
        <DimensionPermissionID>DimensionPermission</DimensionPermissionID>
    </Object>
    <ObjectDefinition>
        <DimensionPermission xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2" xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2" xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100" xmlns:ddl200="http://schemas.microsoft.com/analysisservices/2010/engine/200" xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200" xmlns:ddl300="http://schemas.microsoft.com/analysisservices/2011/engine/300" xmlns:ddl300_300="http://schemas.microsoft.com/analysisservices/2011/engine/300/300" xmlns:ddl400="http://schemas.microsoft.com/analysisservices/2012/engine/400" xmlns:ddl400_400="http://schemas.microsoft.com/analysisservices/2012/engine/400/400">
            <ID>DimensionPermission</ID>
            <Name>DimensionPermission</Name>
            <RoleID>Role</RoleID>
            <Read>Allowed</Read>
            <AttributePermissions>
                <AttributePermission>
                    <AttributeID>Product Category Name</AttributeID>
                    <AllowedSet>DynamicSecurity.DynamicAllowedSet(CustomData())</AllowedSet>
                    <VisualTotals>DynamicSecurity.DynamicVisualTotals(CustomData())</VisualTotals>
                </AttributePermission>
            </AttributePermissions>
        </DimensionPermission>
    </ObjectDefinition>
</Alter>

 

Please note that XMLA always uses the internal IDs which may not be the same as the name of the objects!

 

In this post I showed how to use AMO and XMLA to modify a security role and use an expression to make the VisualTotals setting dynamic which is not possible using the UI only. This is just a very simplified example but I used this approach on some of my enterprise deployments and it works just fine also for very complex scenarios.

 

The attached zip-file includes everything that is necessary to recreate this example:
– SSAS project
– ASSP project
– AMO project (and XMLA script)
– final Excel with results

DynamicVisualTotals.zip

DAX vs. MDX: DataMembers in Parent-Child Hierarchies

Recently when browsing the MSDN PowerPivot Forums I came across this thread where the question was ask on how to show the value which is directly linked to an element in a parent-child hierarchy instead of the aggregation of all "children". In this post I am going to address this problem finally showing a proper solution.

 

First of all I would like to start with some background to so called "datamembers". The term "datamember" originates from parent-child hierarchies in multidimensional models. It is a dummy-member that is created automatically below each hierarchy-node to hold values that are linked to a node-member directly. This is of course only possible for parent-child hierarchies.

Take the following MDX-Query for example:

SELECT
[Measures].[Sales Amount Quota] ON 0,
Descendants(
    [Employee].[Employees].&[290],
    1,
    SELF_AND_BEFORE) ON 1
FROM [Adventure Works]
WHERE [Date].[Calendar].[Calendar Year].&[2006]

HiddenDatamember

As we can see, the value of "Amy E. Alberts" is not the Sum of its children but has a difference of $108,000.00 This difference is because of the "datamember" that is also a child of every parent-child node which may be hidden. For multidimensional models his behavior can be changed by setting the MembersWithData-Property to "NonLeafDataVisible":

VSProperties

 

Executing the same query again we now get this results:

VisibleDatamember

Now we see "Amy E. Alberts" twice – the node and the datamember.

 

Doing a similar query on a tabular model we will get this result by default:

q4evw0ne

As you can see, we still have the datamember but without any caption. Though, this can of course be changed when you flatten out the hierarchy using DAX by simply replacing BLANK() by "<my datamember caption>" in your calculated columns.

 

Anyway, this is not the topic of this post but just some background information. Now we will focus on how to query data that is directly associated to a node or its corresponding datamember in MDX and DAX.

 

MDX:

For MDX this is quite easy as the datamember can be referenced by simply using ".datamember" of a given hierarchy node:

WITH

MEMBER [Measures].[DataMember] AS (
[Measures].[Sales Amount Quota],
[Employee].[Employees].currentmember.datamember
), FORMAT_STRING = 'Currency'

SELECT
{
[Measures].[Sales Amount Quota],
[Measures].[DataMember]
} ON 0,
Descendants(
    [Employee].[Employees].&[290],
    1,
    SELF_AND_BEFORE) ON 1
FROM [Adventure Works]
WHERE [Date].[Calendar].[Calendar Year].&[2006]

DatamemberValue_MDX

The DataMember-Measure only shows values that are directly linked to the current node of the parent-child hierarchy. This is pretty straight forward and very easy to accomplish.

 

DAX:

In this DAX this problem is a bit more tricky as we do not have that built-in ".datamember" function. Also navigation within hierarchies and parent-child hierarchies itself are not really supported in DAX. Though, there are several blogs out there that describe how to handle parent-child hierarchies by Alberto Ferrari (here) and Kasper de Jonge (here). Based on these solutions we can create our datamember-calculation.

First (and only) thing we need is to add a calculated column that stores the path depth of the current row. This can be achieved by using this formula:

=PATHLENGTH(
    PATH(
        Employee[EmployeeKey],
        Employee[ParentEmployeeKey]
    )
)

 

Using our previous example and selecting "Amy E. Alberts" together with our new column we get this:

CalculatedColumn_Level

As we can see there are two different Levels below "Amy E. Alberts" where Level=3 holds our datamember-value and Level=4 holds all the other values (=real child members). Lets add a calculated measure to help you make the final calculation more understandable:

MinLevel:=MIN(Employee[Level])

Calculation_MinLevel

So, for our [DataMember]-calculation we simply have to extend the context to only include rows where [Level] = [MinLevel]

DataMember:=CALCULATE(
SUM('Sales Quota'[Sales Amount Quota]),
FILTER('Employee',
    'Employee'[Level] = [MinLevel]))

CalculationNotWorking_DAX

Well, obviously our calculation is not returning the expected results. This is because of a very common pitfall which I am also very prone to. Whenever a calculated measure is used within an other calculated measure, a CALCULATE() gets wrapped around it thereby taking over the current context. In our case this means that ‘Employee'[Level] will always be the same as [MinLevel] as [MinLevel] gets evaluated for each row and doing MIN() on only one row of course always returns the value of the current row which we compared it to!

To solve this issue we have to place the original [MinLevel]-calculation directly into our [DataMember]-calculation:

DataMember:=CALCULATE(
SUM('Sales Quota'[Sales Amount Quota]),
FILTER('Employee',
    'Employee'[Level] = MIN(Employee[Level])))

CalculationWorking_DAX

This change finally makes the calculation return the correct results also when browsing using the hierarchy:

DatamemberValue_DAX

 

Even though DAX is not very good in handling hierarchies, the [DataMember]-calculation can be accomplished quite easily. This calculation also works for hierarchies of any depth without needing to adopt it. In terms of performance it may be further improved to the following formula – though the impact will be minor:

DataMember_v2:=CALCULATE(
SUM('Sales Quota'[Sales Amount Quota]),
FILTER(VALUES('Employee'[Level]),
    'Employee'[Level] = MIN(Employee[Level])))

 

Download Final Model (Office 2013!)

Universal Quantiles Calculation for PowerPivot using DAX

In my last post I showed a new approach on how to calculate the median in PowerPivot using DAX. In the comments the question was raised whether it is possible to make that calculation universal for all kind of Quantiles like Median, Quartiles, Percentiles, etc. and that’s what this post is about.

Lets analyze the final Median calculation from the last post:

Median SA Months:=CALCULATE([SumSA],
TOPN(
2-MOD([Cnt_Months], 2),
TOPN(
([Cnt_Months] + 1) / 2,
VALUES(‘Date’[Month]),
[SumSA]),
[SumSA],
1))
/
(2-MOD([Cnt_Months], 2))

 

The Median defines the value in the middle of an ordered set. To get the first half (+1 to handle even and odd sets) of the whole we are using TOPN function:

TOPN(
([Cnt_Months] + 1) / 2,
VALUES(‘Date’[Month]),
[SumSA]),

The important part here is the “divide by 2” to split the set in the middle to start our Median calculation. Instead of dividing by 2 we could also multiply by 0.5 where 0.5 would be the separator for our quantile (in this case the Median). This expression can be made dynamic. For the first Quartile we would use 0.25 for the second Quartile (=Median) we would use 0.5 and for the third Quartile we would use 0.75.

I created a little linked table to demonstrate the dynamic approach:

Quantile SortOrder Percentage
Median 1 0.5000
Quartile 1 401 0.2500
Quartile 2 402 0.5000
Quartile 3 403 0.7500
Sextile 1 601 0.1667
Sextile 2 602 0.3333
Sextile 3 603 0.5000
Sextile 4 604 0.6667
Sextile 5 605 0.8333

We also need to add a calculated measure that returns the percentage-value of the currently selected Quantile:

SelectedQuantile:=IF(
HASONEVALUE(Quantiles[Percentage]),
VALUES(Quantiles[Percentage]))

 

Now we can change our old Median-calculation to be dynamic by using the measure defined above:

Quantile SA Months:=CALCULATE([SumSA],
TOPN(
2 – MOD([Cnt_Months], 2),
TOPN(
ROUNDUP(([Cnt_Months] + 1) * [SelectedQuantile], 0),
VALUES(‘Date’[Month]),
[SumSA]),
[SumSA],
1))
/
(2 – MOD([Cnt_Months], 2))

We also need to explicitly add ROUNDUP() as “([Cnt_Months] + 1) * [SelectedQuantile]” may return any decimal places whereas the previous divide by 2 could only result in a x.5 which was rounded up automatically. And well, that’s all we need to change in order to make the calculation universal for all different Quantiles! The rest is the same logic that I already described for Median calculation.

 

Download Final Model (Office 2013!)

Calculating Median in PowerPivot using DAX

I just came across this blog post by Bill Anton where he discusses several approaches to calculated the Median of a given set in T-SQL, MDX and DAX. In the end of his post when it comes to the DAX calculation, he references several post by Marco, Alberto and Javier (post1, post2) that already address that kind of calculation in DAX. But he also claims that non of the solutions is “elegant”. Well, reason enough for me to try it on my own and here is what I came up with. Its up to you to decide whether this solution is more elegant than the others or not 🙂

In general, the median calculation always varies depending on the number of items and whether this number is even or odd.
For an even population the median is the mean of the values in the middle:
the median of {3, 5, 7, 9} is is (5 + 7)/2 = 6
For an odd population, the median is the value in the middle:
the median of {3, 5, 9} is 5

In both cases the values have to be ordered before the calculation. Note that it does not make a difference whether the values are sorted in ascending or descending order.

OrderedValues

In this example, our set contains 12 items (=months) so we have to find the 2 items in the middle of the ordered set – December and February – and calculate the mean.

So, how can we address this problem using DAX? Most of the posts I mentioned above use some kind of combination of ranking – RANKX() – and filtering – FILTER(). For my approach I will use none of these but use TOPN instead (yes, I really like that function as you probably know if you followed my blog for some time).

In this special case, TOPN() can do both, ranking and filtering for us. But first of all we need to know how many items exist in our set:

Cnt_Months:=DISTINCTCOUNT(‘Date’[Month])

 

This value will be subsequently used in our next calculations.

To find the value(s) in the middle I use TOPN() twice, first to get the first half of the items (similar to TopCount) and then a second time to get the last values that we need for our median calculation (similar to BottomCount):

TopBottom

As the median calculation is different for even and odd sets, this also has to be considered in our calculation. For both calculations MOD()-function is used to distinguish both cases:

Items_TopCount:=IF(MOD([Cnt_Months],2) = 0,
([Cnt_Months] / 2) + 1,
([Cnt_Months] + 1) / 2)

For an even number of items (e.g. 12) we simply divide the count of items by 2 and add 1 which gives us a (12 / 2) + 1 = 7 for our sample.
For an odd number of items (e.g. 5) we first add 1 to our count of items and then divide by 2 which gives us (5 + 1) / 2 = 3

Items_BottomCount:=IF(MOD([Cnt_Months],2) = 0, 2, 1)

For an even number of items we have to consider the last 2 values whereas for an odd number of items we only have to consider the last value.

 

These calculations are then used in our median calculation:

Median SA Months:=CALCULATE([SumSA],
TOPN(
[Items_BottomCount],
TOPN(
[Items_TopCount],
VALUES(‘Date’[Month]),
[SumSA]),
[SumSA] * -1))
/
[Items_BottomCount]

As DAX has no built-in BOTTOMN()-function, we need to “abuse” the TOPN() function and multiply the OrderBy-value by “–1” to get the BOTTOMN() functionality. As you can see most of the logic is already handled by our [Items_TopCount] and [Items_BottomCount] measures and this pattern can be reused very easily.

 

Of course all these calculations can also be combined and the use of IF() can be avoided:

Median SA Months v2:=CALCULATE([SumSA],
TOPN(
2 – MOD([Cnt_Months], 2),
TOPN(
([Cnt_Months] + 1) / 2,
VALUES(‘Date’[Month]),
[SumSA]),
[SumSA] * -1))
/
(2 – MOD([Cnt_Months], 2))

Note: for an even population ([Cnt_Months] + 1) / 2 returns X.5 which is automatically rounded up when it is used in a function that expects a whole number. In our example this is what happens: (12 + 1) / 2 = 6.5 –> 7

These are the final results:

FinalResults

 

Additional content:

We could also use AVERAGEX() to calculate our median but I think that it is some kind of overhead to use AVERAGEX() just to divide by “1” or “2” depending on the number of items that our TOPN-functions return:

Median SA Months AvgX:=AVERAGEX(
TOPN(
2-MOD([Cnt_Months], 2),
TOPN(
([Cnt_Months] +1) / 2,
VALUES(‘Date’[Month]),
[SumSA]),
[SumSA] * -1),
[SumSA])

 

As you can see there are various approaches to calculate the median, its up to you which on you like most. I have not tested any of them in terms of performance over bigger sets – this may be topic for an upcoming post.

 

Download Final Model (Office 2013!)