Applied Basket Analysis in Power Pivot using DAX

Basket Analysis is a very common analysis especially for online shops. Most online shops make use of it to make you buy products that “Others also bought …”. Whenever you view a specific product “Product A” from the online shop, basket analysis allows the shop to show you further products that other customers bought together with “Product A”. Its basically like taking a look into other customers shopping baskets. In this blog post I will show how this can be done using Power Pivot. Alberto Ferrari already blogged about it here some time ago and showed a solution for Power Pivot v1. There is also dedicated chapter in the whitepaper The Many-to-Many Revolution 2.0 which deals with Basket Analysis, already in Power Pivot v2. Power Pivot v2 already made the formula much more readable and also much faster in terms of performance. Though, there are still some things that I would like to add.

Lets take a look at the initial data model first:Model_Old

First of all we do not want to modify this model but just extend it so that all previously created measures, calculations and, most important, the reports still work. So the only thing we do is to add our Product-tables again but with a different name. Note that I also added the Subcategory and Category tables in order to allow Basket Analysis also by the Product-Category hierarchy. As we further do not want to break anything we only use an inactive relationship to our ‘Internet Sales’ fact table.

After adding the tables the model looks as follows:Model_New

The next and actually last thing to do is to add the following calculated measure:

Sold in same Order :=
CALCULATE (
COUNTROWS ( ‘Internet Sales’ ),
CALCULATETABLE (
SUMMARIZE (
‘Internet Sales’,
‘Internet Sales’[Sales Order Number]
),
ALL ( ‘Product’ ),
USERELATIONSHIP ( ‘Internet Sales’[ProductKey], ‘Filtered Product’[ProductKey] )
)
)

(formatted using DAX Formatter)

The inner CALCULATETABLE returns a list/table of all [Sales Order Numbers] where a ‘Filtered Product’ was sold and uses this table to extend the filter on the ‘Internet Sales’ table. It is also important to use ALL(‘Product’) here otherwise we would have two filters on the same column ([ProductKey]) which would always result in an empty table. Doing a COUNTROWS finally returns all for all baskets where the filtered product was sold.
We could also change ‘Internet Sales'[Sales Order Number] to ‘Internet Sales'[CustomerKey] in order to analyze what other customers bought also in different baskets (This was done for Example 3). The whole SUMMARIZE-function could also be replaced by VALUES(‘Internet Sales'[Sales Order Number]). I used SUMMARIZE here as I had better experiences with it in terms of performance in the past, though, this may depend on your data. The calculation itself also works with all kind of columns and hierarchies, regardless whether its from table ‘Product’, ‘Filtered Product’, or any other table!

So what can we finally do with this neat formula?

1) Classic Basket Analysis – “Others also bought …”:
Result_ClassicBasketAnalysis

As we can see Hydration Packs are more often sold together with Mountain Bikes opposed to Road Bikes and Touring Bikes. We could also use a slicer on ‘Filtered Product Subcategory’=”Accessories” in order to see how often Accessories are sold together with other products. You may analyze by Color and Product Category:
Result_ByColor
As we can see people that buy black bikes are more likely to buy red helmets than blue helmets.

2) Basket Analysis Matrix:
What may also be important for us is which products are sold together the most often? This can be achieved by pulling ‘Product’ on rows and ‘Filtered Product’ on columns. By further applying conditional formatting we can identify correlations pretty easy:
Result_BasketMatrix
Water Bottles are very often sold together with Bottle Cages – well, not really a surprise. Again, you can also use all kind of hierarchies here for your analysis.
This is what the whole matrix looks like:
Result_BasketMatrixFull
The big blank section in the middle are our Bikes. This tells us that there is no customer that bought two bikes in the same order/basket.

For this analysis I used an extended version of the calculation above to filter out values where ‘Product’ = ‘Filtered Product’ as of course every product is sold within its own basket:

BasketMatrix :=
IF (
MIN ( ‘Product’[ProductKey] )
<> MIN ( ‘Filtered Product’[ProductKey] ),
[Sold in same Order]
)

3) Find Customers that have not bough a common product yet
As we now know from the above analysis which products are very often bought together we can also analyze which customers do not fall in this pattern – e.g. customers who have bough a Water Bottle but have not bought a Bottle Cage yet. Again we can extend our base-calculation to achieve this:

Not Sold to same Customer :=
IF (
NOT ( ISBLANK ( [Sum SA] ) ) && NOT ( [Sold to same Customer] ),
“Not Sold Yet”
)

The first part checks if the selected ‘Product’ was sold to the customer at all and the second part checks if the ‘Filtered Product’ was not sold to the customer yet. In that case we return “Not Sold Yet”, and otherwise  BLANK() which is the default if the third parameter is omitted. That’s the result:
Result_NotSoldToCustomer
Aaron Phillips has bought a Water Bottle but no Mountain Bottle Cage nor a Road Bottle Cage – maybe we should send him some advertisement material on Bottle Cages? Smile

 

As you can see there are a lot of analyses possible on top of that little measure that we created originally. All work with any kind of grouping or hierarchy that you may have and no change to your data model is necessary, just a little extension.

And that’s it – Basket Analysis made easy using Power Pivot and DAX!

Downloads:

Sample Excel Workbook with all Examples: BasketAnalysis.xlsx
Sample PowerBI Desktop file with all Examples (DAX only, no Visuals): BasketAnalysis.pbix

Optimizing Columnar Storage for Measures

First of all I have to thank Marco Russo for his blog post on Optimizing High Cardinality Columns in Vertipaq and also his great session at SQL PASS Rally Nordic in Stockholm last year which taught me a lot about columnar storage in general. I highly recommend everyone to read the two mentioned resources before continuing here. Most of the ideas presented in this post are based on these concepts and require at least basic knowledge in columnar storage.

When writing one of my recent posts I ended up with a Power Pivot model with roughly 40M rows. It contained internet logs from Wikipedia, how often someone clicked on a given page per month and how many bytes got downloaded. As you can imagine those values can vary very heavily, especially the amount of bytes downloaded. So in the Power Pivot model we end up having a lot of distinct values in our column that we use for our measure. As you know from Marcos posts, the allocated memory and therefore also the  performance of columnar storage systems is directly related to the number of distinct values in a column – the more the worse. Marco already described an approach to split up a single column with a lot of distinct values into several columns with less distinct values to optimize storage. These concepts can also be used on columns that contain measures or numeric values in general. Splitting numeric values is quite easy, assuming your values range from 1 to 1,000,000 you can split this column into two by dividing the value by 1000 and using MOD 1000 for the second column. Instead of one column with the value 123,456 you end up with two columns with the values 123 and 456. In terms of storage this means that instead of 1,000,000 distinct values we only need to store 2 x 1,000 distinct values. Nothing new so far.

The trick is to combine those columns again at query time to get the original results as before the split. For some aggregations like SUM this is pretty straight forward, others are a bit more tricky. Though, in general the formulas are not really very complex and can be adopted very easily to handle any number of columns:

Aggregation DAX Formula
Value_SUM1 =SUMX(‘1M_Rows_splitted’, [Value_1000] * 1000 + [Value_1])
Value_SUM2 =SUM ( ‘1M_Rows_splitted'[Value_1000] ) * 1000
    + SUM ( ‘1M_Rows_splitted'[Value_1] )
Value_MAX =MAXX(‘1M_Rows_splitted’, [Value_1000] * 1000 + [Value_1])
Value_MIN =MINX(‘1M_Rows_splitted’, [Value_1000] * 1000 + [Value_1])
Value_COUNT =COUNTX(‘1M_Rows_splitted’, [Value_1000] * 1000 + [Value_1])
Value_DISTINCTCOUNT =COUNTROWS (
    SUMMARIZE (
        ‘1M_Rows_splitted’,
        ‘1M_Rows_splitted'[Value_1000],
        ‘1M_Rows_splitted'[Value_1]))

As you can see you can still mimic most kind of aggregation even if the [Value]-column is split up.

Though, don’t exaggerate splitting your columns – too many may be a bit inconvenient to handle and may neglect the effect resulting in worse performance. Marco already showed that you can get a reduction of up to 90% in size, during my simple tests I came up with about the same numbers. Though, it very much depends on the number of distinct values that you actually have in your column!

I would not recommend to always use this approach for all your measure column – no, definitely not! First check how many distinct values your data/measures contain and decide afterwards. For 1 million distinct values it is probably worth it, for 10,000 you may reconsider using this approach. Most important here is to test this pattern with your own data, data model and queries! Test it in terms of size and of course also in terms of performance. It may be faster to split up columns but it may also be slower and it may be also different for each query that you execute against the tabular model / Power Pivot. Again, test with your own data, data model and queries to get representative results! 

Here is a little test that you may run on your own to test this behavior. Simple create the following Power Query using M, load the result into Power Pivot and save the workbook. It basically creates a table with 1 million distinct values (0 to 999,999) and splits this column up into two. You can just copy the workbook, remove the last step “Remove Columns” and save it again to get the “original” workbook and Power Pivot model.

let
    List1 = List.Numbers(0, 1000000),

    TableFromList = Table.FromList(List1, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    RenamedColumns = Table.RenameColumns(TableFromList,{{"Column1", "Value"}}),
    ChangedType1 = Table.TransformColumnTypes(RenamedColumns,{{"Value", type number}}),
    InsertedCustom = Table.AddColumn(ChangedType1, "Value_1000", each Number.RoundDown([Value] / 1000)),
    InsertedCustom1 = Table.AddColumn(InsertedCustom, "Value_1", each Number.Mod([Value], 1000)),
    ChangedType = Table.TransformColumnTypes(InsertedCustom1,{{"Value_1000", type number}, {"Value_1", type number}}),
    RemovedColumns = Table.RemoveColumns(ChangedType,{"Value"})
in
    RemovedColumns

This is what I ended up with when you compare those two workbooks:

  # distinct values Final Size
single column 1,000,000 14.4 MB
two columns 2 x 1,000 1.4 MB

We also get a reduction in size of 90%! Though, this is a special scenario …
In the real world, taking my previous scenario with the Wikipedia data as an example, I ended up with a reduction to 50%, still very good though. But as you can see the reduction factor varies very much.

Downloads:

Sample workbook with Power Query: 1M_Rows_splitted.xlsx

Upcoming Conferences and Events in Q1 2013

After PASS SQL Rally Nordic in Stockholm last year I am very happy to announce that I am going to speak at two more events in the first quarter of 2013.

SQLKonferenz2014
I will do a session at the “Deutsche SQL Server Konferenz 2014” (=”German SQL Server Conference 2014″) on “Big Data Scenario mit Power BI vs. SAP HANA“. It is basically an advanced version of my blog post on SAP HANAs Big Data Scenario with Power BI with much more insights details on both technologies. The conference itself is a 3 day conference from 10th to 12th of February where day 1 is reserved for pre-conference sessions. It also features a lot of international speakers and of course also a good amount of English sessions (mine will be in German though). Also my colleague Marcel Franke will do a session about PDW and R which you also do not want to miss if you are into Big Data and predictive analytics!
Check out the agenda and make sure you register in time!

SQLSaturdayBanner
Later on the 6th of March I will speak at the SQLSaturday #280 in Vienna on “Scaling Analysis Services in the Cloud“. (This is not a typo, its really on 6th of March which is actually a Thursday!) The session focuses on how to get the best performance out of multidimensional Analysis Services solutions when they are moved to the Windows Azure cloud. In the end I will come up with some best practices and guide lines for this and similar scenarios.
Just make sure that you register beforehand to enjoy this full day of free sessions and trainings!

Hope to see you there!

Reporting Services MDX Field List and Using Measures on rows

When creating a Reporting Services report on top of an Analysis Services cube using the wizard it automatically creates a Field for each column in your MDX query. Those fields can then be used in your report. For reports based on a relational source the definition of these fields is quite simple, it is the same as the the column name of the originating query. For MDX queries this is very different. If you ever checked the definition of an automatically generated MDX field you will see a lengthy XML snippet instead:
FieldsList

The XMLs may look like these:

<?xml version="1.0" encoding="utf-8"?>
<Field xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:xsd="http://www.w3.org/2001/XMLSchema"
       xsi:type="Level"
       UniqueName="[Product].[Subcategory].[Subcategory]" />

<?xml version="1.0" encoding="utf-8"?>
<Field xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:xsd="http://www.w3.org/2001/XMLSchema"
       xsi:type="Measure"
       UniqueName="[Measures].[Internet Sales Amount]" />

As you can see those two are quite different in terms of xsi:type and UniqueName. The xsi:type “Level” refers to a dimension level whereas “Measure” refers to a measure. Depending on the type of field, different properties are available within the report:
FieldProperties

For example the property BackgroundColor is only populated for fields of type “Measure” whereas the property UniqueName is only populated for fields of type “Level”. Measure properties are tied to the CELL PROPERTIES in your MDX query and Level properties are tied to DIMENSION PROPERTIES:

SELECT
NON EMPTY { [Measures].[Internet Sales Amount] } ON COLUMNS,
NON EMPTY { ([Product].[Subcategory].[Subcategory].ALLMEMBERS ) }
DIMENSION PROPERTIES MEMBER_CAPTION, MEMBER_UNIQUE_NAME ON ROWS
FROM [Adventure Works]
CELL PROPERTIES VALUE, BACK_COLOR, FORE_COLOR, FORMATTED_VALUE, FORMAT_STRING, FONT_NAME, FONT_SIZE, FONT_FLAGS

If we remove MEMBER_UNIQUE_NAME from the DIMENSION PROPERTIES we would no longer be able to use Fields!Subcategory.UniqueName in our SSRS expressions, or to be more precise it would simply always return NULL (or NOTHING in terms of Visual Basic). The same of course is also true for the CELL PROPERTIES.

So far this is nothing really new but there are some more things about the fields of MDX queries. There is a third xsi:type called “MemberProperty” which allows you to query member properties without having to define separate measures within your query:

<?xml version="1.0" encoding="utf-8"?>
<Field xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:xsd="http://www.w3.org/2001/XMLSchema"
       xsi:type="MemberProperty"
       LevelUniqueName="[Product].[Subcategory].[Subcategory]"
       PropertyName="Category" />

Once we add a member property to our MDX query SSRS also automatically creates this field for us. NOTE, this is only possible by manually modifying the MDX!

SELECT
NON EMPTY { [Measures].[Internet Sales Amount] } ON COLUMNS,
NON EMPTY { ([Product].[Subcategory].[Subcategory].ALLMEMBERS ) }
DIMENSION PROPERTIES MEMBER_CAPTION, MEMBER_UNIQUE_NAME,
    [Product].[Subcategory].[Subcategory].[Category] ON ROWS
FROM [Adventure Works]
CELL PROPERTIES VALUE, BACK_COLOR, FORE_COLOR, FORMATTED_VALUE, FORMAT_STRING, FONT_NAME, FONT_SIZE, FONT_FLAGS

To get the [Category] which is associated to a given [Subcategory] you would usually need to create a separate measure like this:

WITH
MEMBER [Measures].[Category] AS (
[Product].[Subcategory].Properties( "Category" )
)
SELECT

This has the very bad drawback that using the WITH MEMBER clause disables the formula engine cache for the whole query what may result in worse query performance. So you may consider using DIMENSION PROPERTIES instead of a custom Measure next time.

 

There is another very nice “feature” that is also related to the field list. If you ever had the requirement to create a parameter to allow the user to select which measure he wants to see in the report you probably came across this blog post by Chris Webb or this blog post by Rob Kerr. As you know by then, SSRS requires you to put the Measures-dimension on columns, otherwise the query is not valid. This is because the number of Measures is not considered to be dynamic (opposed to e.g. Customers) which allows SSRS to create a static field list. This makes sense as SSRS was originally designed for relational reporting and a table always has a fixed number of columns which are similar to fields in the final SSRS dataset. Using Measures on columns is the way how SSRS enforces this.

As we are all smart SSRS and MDX developers and we know what we are doing we can trick SSRS here. All we need to do is to write a custom MDX query using the expression builder – do not use or even open the Query Designer at this point otherwise your query may get overwritten!
Query_Measures

SSRS also automatically creates the fields for us, but this time the fields are not defined correctly. It creates one field with a very cryptic name and the following XML definition:

<?xml version="1.0" encoding="utf-8"?>
<Field xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:xsd="http://www.w3.org/2001/XMLSchema"
       xsi:type="Measure"
       UniqueName="[Measures].[MeasuresLevel]" />

As you can see SSRS thinks that this field is of type “Measure” but it actually is a “Level”. After changing this little thing we can access all field properties that are unique to Level-fields like <Field>.UniqueName

So this is the final MDX query and the associated XML field definition:

SELECT
{} ON 0,
[Measures].members
DIMENSION PROPERTIES MEMBER_CAPTION, MEMBER_UNIQUE_NAME ON 1
FROM [Adventure Works]

Ensure that you have defined the necessary DIMENSION PROPERTIES here otherwise they will not be available/populated within the report!

<?xml version="1.0" encoding="utf-8"?>
<Field xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:xsd="http://www.w3.org/2001/XMLSchema"
       xsi:type="Level"
       UniqueName="[Measures].[MeasuresLevel]" />

In order to make use of this approach in a report parameter we further need to create calculated fields for our parameter label and parameter value:
Fields_Measures
The definition of the parameter is straight forward then:
ParameterDefinition

You can not only use this approach to populate a parameter but you can also use it to crossjoin Measures on rows with any other hierarchy. This way you can avoid writing complex MDX just to work around this nasty SSRS limitation.

Downloads:

Sample SSRS report: CustomFieldList.rdl

Error-handling in Power Query

Data is the daily bread-and-butter for any analyst. In order to provide good results you also need good data. Sometimes this data is very well prepared beforehand and you can use it as it is but it is also very common that you need to prepare and transform the data on your own. To do this Microsoft has introduced Power Query (on tool of the Power BI suite). Power Query can be used to extract, transform and load data into Excel and/or Power Pivot directly.

When using any data you usually know what the data looks like and what to expect from certain columns – e.g. a value is delivered as a number, a text contains exactly 4 characters, etc.
Though, sometimes this does not apply for all rows of that dataset and your transformation logics may cause errors because of that. In order to avoid this and still have a clean data load you need to handle those errors. Power Query offers several options to this which I will elaborate in this post.

This is the sample data I will use for the following samples:

A B C
1 4 AXI23
2 5 TZ560
NA 6 UP945

we will perform simple transformations and type casts on this table to generate some errors:

Prepare_ChangeType

Error-handling on row-level

This is the easiest way of handling errors. Whenever a transformation causes an error, we can simply remove the whole row from the result set:

RowLevel_RemoveErrors

This will generate the following result table:

A B C
1 4 AX312
2 5 TZ560

As you can see, the third row was simply removed. This is the easiest way on how to remove errors from your result set but of course this may not be what you want as those removed rows may contain other important information in other columns! Assume you want to calculate the SUM over all values in column B. As we removed the third row we also removed a value from column B and the SUM is not the same as without the error-handling (9 vs. 15)!

Error-handling on cell-level

As we now know that column A may result in an error, we can handle this error during our transformation steps. As “NA” could not be converted to a number we see Error as result for that column. Clicking on it gives use some more details on the error itself. To handle this error we need to create a new calculated column where we first check if the value can be converted to a number and otherwise return a default numeric value:

RowLevel_RemoveErrors_CalcColumn

The M-function that we use is “try <expressions to try> otherwise <default if error>” which is very similar to a try-catch block in C#. If the expression causes an error, the default will be used instead. Details on the try-expression can be found in the Microsoft Power Query for Excel Formula Language Specification (PDF) which can be found here and is a must read for everyone interested in the M language.

CellLevel_HandleError

We could further replace our column A by the new column A_cleaned to hide this error handling transformation.

A B C A_cleaned
1 4 AXI23 1
2 5 TZ560 2
NA 6 UP945 0

Error-handling on cell-level with error details

There may also be cases where it is OK to have one of this errors but you need/want to display the cause of the error so that a power user may correct the source data beforehand. Again we can use the try-function, but this time without the otherwise-clause. This will return a record-object for each row:

RowLevel_ShowErrors_CalcColumn

RowLevel_ShowErrors_Result1

After expanding the A_Try column and also the A_Try.Error column we will get all available information on the error:

RowLevel_ShowErrors_Result2

A B C A_Try.HasError A_Try.Value A_Try.Error.Reason A_Try.Error.Message A_Try.Error.Detail
1 4 AXI23 FALSE 1
2 5 TZ560 FALSE 2
6 UP945 TRUE DataFormat.Error Could not convert to Number. NA

As you can see we get quite a lot of columns here. We could e.g. use A_Try.HasError to filter out error rows (similar to error-handling on row-level) or we could use it in a calculated column to mimic error-handling on cell-level. What you want to do with all the information is up to you, but in case you don’t need it you should remove all unnecessary columns.

Downloads:

Power Query Error Handling Workbook: Power Query Error Handling.xlsx

Call for Speakers: SQLSaturday#280 Vienna

The SQL PASS Austria chapter is organizing a SQL Saturday in Vienna. It will be held on the March 6th 2014. (For those of you how are checking their calendars right now – yes it is actually a Thursday!) So register and be part of the first SQLSaturday that is held on a Thursday Smile!!!

SQLSaturdayBanner

Also the Call for Speakers is opened until January 5th, so make sure to submit your sessions by then.

Hope to see you there!

Using Power Query to analyze SSAS Disk Usage

Some time ago Bob Duffy blogged about on how to use Power Pivot to analyze the disk usage of multidimensional Analysis Services models (here). He uses a an VBA macro to pull meta data like filename, path, extension, etc. from the file system or to be more specific from the data directory of Analysis Services. Analysis Services stores all its data in different files with specific extensions so it is possible to link those files to multidimensional objects in terms of attributes, facts, aggregations, etc. Based on this data we can analyze  how our data is distributed. Do we have too big dimensions? Which attribute uses the most space? Do our facts consume most of the space (very likely)? If yes, how much of it is real data and how big are my aggregations – if they are processed at all?!? – These are very common and also important things to know for an Analysis Services developer.

So Bob Duffy’s solution can be really useful. The only thing I did not like about it was the fact that it uses a VBA macro to get the data. This made me think and I came up with the idea of using Power Query to get this data. Btw, make sure to check out the latest release, there have been a lot of improvements recently!

With Power Query you have to option to load multiple files from a folder and also from its sub folders. When we do this on our Analysis Services data directory, we get a list of ALL files together with their full path, filename, extension and most important in this case their size which can be found by expanding the Attributes-record:
PQ_Source_FileList

The final Power Query does also a lot of other things to prepare the data so it can be later joined to our FileExtensions-table that holds detailed information for each file extension. This table currently looks like below but can be extended by any other columns that may be necessary and/or useful for you:

FileType FileType_Description ObjectType ObjectTypeSort ObjectTypeDetails
ahstore Attribute Hash Store Dimensions 20 Attribute
asstore Attribute String Store Dimensions 20 Attribute
astore Attribute Store Dimensions 20 Attribute
bsstore BLOB String Store Dimensions 20 BLOB
bstore BLOB Store Dimensions 20 BLOB
dstore Hierarchy Decoding Store Dimensions 20 Hierarchy
khstore Key Hash Store Dimensions 20 Key
ksstore Key String Store Dimensions 20 Key
kstore Key Store Dimensions 20 Key
lstore Structure Store Dimensions 20 Others
ostore Order Store Dimensions 20 Others
sstore Set Store Dimensions 20 Others
ustore ustore Dimensions 20 Others
xml XML Configuration 999 Configuration
fact.data Basedata Facts 10 Basedata
fact.data.hdr Basedata Header Facts 10 Basedata
fact.map Basedata Index Facts 10 Basedata
fact.map.hdr Basedata Index Header Facts 10 Basedata
rigid.data Rigid Aggregation Data Facts 10 Aggregations
rigid.data.hdr Rigid Aggregation Data Header Facts 10 Aggregations
rigid.map Rigid Aggregation Index Facts 10 Aggregations
rigid.map.hdr Rigid Aggregation Index Header Facts 10 Aggregations
flex.data Flexible Aggregation Data Facts 10 Aggregations
flex.data.hdr Flexible Aggregation Data Header Facts 10 Aggregations
flex.map Flexible Aggregation Index Facts 10 Aggregations
flex.map.hdr Flexible Aggregation Index Header Facts 10 Aggregations
string.data String Data (Distinct Count?) Facts 10 Basedata
cnt.bin Binary Configuration 999 Binaries
mrg.ccmap mrg.ccmap DataMining 999 DataMining
mrg.ccstat mrg.ccstat DataMining 999 DataMining
nb.ccmap nb.ccmap DataMining 999 DataMining
nb.ccstat nb.ccstat DataMining 999 DataMining
dt dt DataMining 999 DataMining
dtavl dtavl DataMining 999 DataMining
dtstr dtstr DataMining 999 DataMining
dmdimhstore dmdimhstore DataMining 999 DataMining
dmdimstore dmdimstore DataMining 999 DataMining
bin Binary Configuration 999 Binaries
OTHERS Others Others 99999 Others

As you can see the extension may contain 1, 2 or 3 parts. The more parts the more specific this file extension is. If you checked the result of the Power Query it also contains 3 columns, FileExtension1, FileExtension2 and FileExtension3. To join the two tables we first need to load both tables into Power Pivot. The next step is to create a proper column on which we can base our relationship. If the 3-part extension is found in the file extensions table, we use it, otherwise we check the 2-part extension and afterwards the 1-part extension and in case nothing matches we use “OTHERS”:

=SWITCH(TRUE(),
CONTAINS(FileTypes, FileTypes[FileType], [FileExtension3]), [FileExtension3],
CONTAINS(FileTypes, FileTypes[FileType], [FileExtension2]), [FileExtension2],
CONTAINS(FileTypes, FileTypes[FileType], [FileExtension1]), [FileExtension1],
"OTHERS")

Then we can create a relationship between or PQ table and our file extension table. I also created some other calculated columns, hierarchies and measures for usability. And this is the final outcome:
PivotTable_Report

You can very easily see, how big your facts are, the distribution between base-data and Aggregations, the Dimensions sizes and you can drill down to each individual file! You can of course also create a Power View report if you want to. All visualizations are up to you, this is just a very simple example of a report.

Enjoy playing around with it!

Downloads:
(please note that I added a filter on the Database name as a last step of the Power Query to only show Adventure Works databases! In order to get all databases you need to remove this filter!)

SSAS Disk Analysis Workbook: SSAS_DiskAnalysis.xlsx

SAP HANA’s Big Data Scenario with Power BI

While browsing the web for any BI related topics I recently came across this blog post about SAP HANA and how it can be used to analyze Big Data. Its actually pretty cool, SAP together with Amazon Web Services (AWS) offer a free try out of their tools for 4 hours which you can use to rebuild a predefined demo. The demo itself is very well explained and document with videos and scripts and gives some good insights on how to deal with Big Data in SAP HANA. Basically it is divided into 3 steps:
1) Load Wikipedia data (Pagehits, etc.) from Hadoop/Hive (~2GB of flatfiles)
2) Create a data mart with SAP HANA
3) Analyze results with SAP Lumira

Having done a lot recently with Power BI and its tools I asked myself if this would also be possible with Power BI? So I first did the demo on SAP HANA and afterwards I was quite sure that I could do the same also with Power BI.

And this was the initiation of this blog post where we will do the same demo but instead of SAP HANA we will use only tools of the Power BI suite. Basically we only use 3 of our Power tools:
1) Power Query to load the data
2) Power Pivot to build the “data mart”
3) Power View to analyze the data

First of all we need to load the data. The demo uses data from Wikipedia where for each year, month, day and hour the number of pagehits and bytesdownloaded are monitored per page: http://dumps.wikimedia.org/other/pagecounts-raw/

This data was then moved to a Hadoop/Hive cluster on AWS S3 store which was made public available. The transformed files can be downloaded here (~250MB per file):
http://wikipedia-pagecounts-hive-results.s3.amazonaws.com/year=2013/month=03/000000
http://wikipedia-pagecounts-hive-results.s3.amazonaws.com/year=2013/month=03/000001
http://wikipedia-pagecounts-hive-results.s3.amazonaws.com/year=2013/month=03/000002
http://wikipedia-pagecounts-hive-results.s3.amazonaws.com/year=2013/month=04/000000
http://wikipedia-pagecounts-hive-results.s3.amazonaws.com/year=2013/month=04/000001
http://wikipedia-pagecounts-hive-results.s3.amazonaws.com/year=2013/month=04/000002
http://wikipedia-pagecounts-hive-results.s3.amazonaws.com/year=2013/month=05/000000
http://wikipedia-pagecounts-hive-results.s3.amazonaws.com/year=2013/month=05/000001
http://wikipedia-pagecounts-hive-results.s3.amazonaws.com/year=2013/month=05/000002

FileList

Once we have downloaded the files we can start loading them using Power Query’s “Load from Folder” source. It is basically textdata which needs to be split into several columns first. The delimiter used is 0x0001 which in Power Query’s M-language needs to be resolved to “#(0001)”:

= Table.SplitColumn(ImportedText ,"Column1",
    Splitter.SplitTextByDelimiter("#(0001)"),
    {"Column1.1", "Column1.2", "Column1.3", "Column1.4",
     "Column1.5", "Column1.6", "Column1.7", "Column1.8"})

The columns are defined as PROJECTCODE, PAGENAME, YEAR, MONTH, DAY, HOUR, PAGEHITCOUNTFORHOUR and BYTESDOWNLOADEDFORHOUR where PROJECTCODE can be further split into language code and the real project code. There is some more logic which I will not explain in detail like error handling, data conversions, etc., which are similar to what is done in the SAP HANA demo.

This is the final M-script I came up with:

let
    Source = Folder.Files("E:\SAP\Big Data"),
    FilteredRows = Table.SelectRows(Source, each ([Extension] = "")),
    CombinedBinaries = Binary.Combine(FilteredRows[Content]),
    ImportedText = Table.FromColumns({Lines.FromBinary(CombinedBinaries)}),
    SplitColumnDelimiter = Table.SplitColumn(ImportedText ,"Column1",Splitter.SplitTextByDelimiter("#(0001)"),{"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6", "Column1.7", "Column1.8"}),
    ChangedType = Table.TransformColumnTypes(SplitColumnDelimiter,{{"Column1.1", type text}, {"Column1.2", type text}, {"Column1.3", type number}, {"Column1.4", type number}, {"Column1.5", type number}, {"Column1.6", type number}, {"Column1.7", type number}, {"Column1.8", type number}}),
    SplitColumnDelimiter1 = Table.SplitColumn(ChangedType,"Column1.1",Splitter.SplitTextByDelimiter("."),{"Column1.1.1", "Column1.1.2"}),
    ChangedType1 = Table.TransformColumnTypes(SplitColumnDelimiter1,{{"Column1.1.1", type text}, {"Column1.1.2", type text}}),
    ReplacedValue = Table.ReplaceValue(ChangedType1,null,"wp",Replacer.ReplaceValue,{"Column1.1.2"}),
    InsertedCustom = Table.AddColumn(ReplacedValue, "PageHitsPerHour", each try Number.From([Column1.7]) otherwise 0),
    InsertedCustom1 = Table.AddColumn(InsertedCustom, "BytesDownloadedPerHour", each try Number.From([Column1.8]) otherwise 0),
    RemovedColumns = Table.RemoveColumns(InsertedCustom1,{"Column1.7", "Column1.8"}),
    RenamedColumns = Table.RenameColumns(RemovedColumns,{{"Column1.3", "Year"}, {"Column1.4", "Month"}, {"Column1.5", "Day"}, {"Column1.6", "Hour"}, {"Column1.2", "PageName"}, {"Column1.1.1", "LanguageCode"}, {"Column1.1.2", "ProjectCode"}}),
    InsertedCustom2 = Table.AddColumn(RenamedColumns, "Date", each #date([Year],[Month],[Day])),
    ChangedType2 = Table.TransformColumnTypes(InsertedCustom2,{{"PageHitsPerHour", type number}, {"BytesDownloadedPerHour", type number}, {"Date", type date}}),
    FilteredRows1 = Table.SelectRows(ChangedType2, each [PageHitsPerHour] < 53000000)
in
    FilteredRows1

If you also did the demo on SAP HANA you found some quality issues in the data why you needed to remove rows with more than 53,000,000 PageHitsPerHour as this 5 rows mess up the whole analysis. The above M-script already handles this (last statement).

There are some more important things to mention here. First of all we are dealing with about 37M rows, so loading to datasheet will not work. Instead we need to load the data directly into Power Pivot. I had several memory issues when using 32bit Excel but after switching to a 64bit Excel everything went just fine. The import itself takes about 15 minutes which I think is OK for roughly 2GB of data and 37M rows. 

Once the data is loaded into Power Pivot we need to load some additional data which are basically our lookup/dimension tables. They can also be imported from Wikipedia, in this case directly from the web:
http://wikipedia-proj-lang-codes.s3.amazonaws.com/UniqueProjectCodes.csv
http://wikipedia-proj-lang-codes.s3.amazonaws.com/UniqueLanguageCodes.csv

Both are very simple tables with only two columns. Again we can use Power Query to import them and add them to our Power Pivot data model.

The last thing to add is a time-dimension.SAP HANA has its predefined time-dimension. In the case of Power BI I simply used a linked table in Excel that holds all necessary days or actually all days of 2013. Adding all those tables and linking them we end up with this pretty simple Power Pivot model:

PowerPivot_Diagram

This is very similar to what you create during the demo as an analytical view in SAP HANA so both approaches are very similar here.

Now that the model is set up we can do our analysis using Power View and classic Excel Pivot Tables:

PowerView_Analysis

PivotTable_Report

All together it took me about 2 hours to build the whole solution of which it took about 1h to download and process the data. The final workbook containing all the data has ~500MB which is mainly because of the PageName column which contains a high number of unique values which cannot be compressed very well. After removing the PageName column and some further tuning of the datamodel for compression I could bring the size down to 148 MB. This is quite OK – in numbers this is a compression of 12 from originally 1.74 GB

I hope you understand that I could not attach the whole workbook, instead I created a smaller workbook with only 50k rows by adding a TOP filter in the Power Query. This workbook can be downloaded below.

As I showed in this post, Power BI is capable of handling this kind of data very well. Both, in terms of data volume and also in terms of the type of data (unstructured/semi-structured data). Once the data is loaded into Power Pivot it can be analyzed just like any other data source using all Excel reporting capabilities.

Downloads:
Sample Excel Sheet: Wikidata_small.xlsx
Demo Script SAP HANA: Demo Script SAP HANA.docx
Power Query M-Script: Power Query M-Script.docx

Trigger Cube-Processing from Excel using Cube Actions

Recently I faced the requirement to give specific users of my Analysis Services Multidimensional cube the opportunity to process the cube whenever they want. There are several ways to achieve this:
– start a SQL Agent that processes the cube
– create some kind of WebService that does the processing
– create some custom .NET application that does the processing
– run some script (PowerShell, Command Line, ascmd.exe, …)

 


NOTE:
The post describes a solution which requires SSAS Server Admin rights. If you are interested in a solution which triggers an asynchronous processing without requiring SSAS Server Admin Rights please scroll down to the very bottom and download VS_Solution.zip
It basically runs the same script but in a separate thread in the context of the service account so the calling user does not need to have any specific rights on the server.


From a usability point-of-view none of the above is really convenient as all options involve some kind of external process or application. In this post I will show how to process a cube using cube actions which can be triggered from Excel Pivot Tables natively. So this is what we want to achieve:

Excel_Action

 

This requires several steps:
1) create custom .Net assembly
2) add that custom .Net assembly to your Analysis Services database
3) add an action to your cube that calls the assembly

 

Creating a Custom .Net assembly:

First of all we need to create a new “Visual C#” project of type “Class Library”. To work with Analysis Services we need to add two references:
VS_References

“Microsoft.AnalysisServices” refers to Microsoft.AnalysisServices.dll which  can be found in the shared features folder of your installation (default is c:Program FilesMicrosoft SQL Server110SDKAssemblies)

“msmgdsrv” refers to msmgdsrv.dll which can found be in the OLAPbin-folder of your SSAS instance (default is c:Program FilesMicrosoft SQL ServerMSAS11.MSSQLSERVEROLAPbin)

Once these references are added, we can start to create our processing code:

using System;
using AdomdServer = Microsoft.AnalysisServices.AdomdServer;
using Microsoft.AnalysisServices;
using System.Data;

namespace ASSP_Processing
{
    public class Processing
    {
        public const string LoggingPrefix = “ASSP.ProcessObject: “;

        public enum ResultHandling
        {
            Datatable = 0,
            Exception = 1,
            NULL = 2
        }

        public static DataTable ProcessObject(string cubeName, ResultHandling resultHandling)
        {
            DataTable ret = new DataTable();
            ret.Columns.Add(“Result”);

            Server server = null;

            try
            {
                server = new Server();

                //connect to the current session…
                server.Connect(“*”);

                AdomdServer.Context.CheckCancelled(); //could be a bit long running, so allow user to cancel

                Database db = server.Databases.GetByName(AdomdServer.Context.CurrentDatabaseName);

                string objectToProcessName = “#N/A”;
                ProcessableMajorObject objectToProcess = db;

                AdomdServer.Context.TraceEvent(0, 0, LoggingPrefix + “Database <“ + db.Name + “> found!”);
                objectToProcessName = “DB[“ + db.Name + “]”;

                if (!string.IsNullOrEmpty(cubeName))
                {
                    Cube cube = db.Cubes.GetByName(cubeName);

                    if (cube != null)
                    {
                        objectToProcess = cube;
                        AdomdServer.Context.TraceEvent(0, 0, LoggingPrefix + “Cube <“ + cubeName + “> found!”);
                        objectToProcessName = objectToProcessName + ” > Cube[“ + cubeName + “]”;
                    }
                }

                if (objectToProcess != null)
                {
                    AdomdServer.Context.TraceEvent(0, 0, LoggingPrefix + “Processing Object “ + objectToProcessName + ” …”);

                    objectToProcess.Process(ProcessType.ProcessFull);

                    ret.Rows.Add(new object[] { “S U C C E S S:    Object “ + objectToProcessName + ” successfully processed!” });
                    AdomdServer.Context.TraceEvent(0, 0, LoggingPrefix + “Finished Processing Object “ + objectToProcessName + “!”);
                }
            }
            finally
            {
                try
                {
                    if (server != null)
                        server.Disconnect();
                }
                catch { }
            }

            // if processing was successful a row has been added beforehand
            if (ret.Rows.Count == 0)
            {
                ret.Rows.Add(new object[] { “F A I L U R E:    Error while processing an object!” });
            }

            switch (resultHandling)
            {
                case ResultHandling.Datatable:
                    return ret;

                case ResultHandling.Exception:
                    throw new Exception(Environment.NewLine + Environment.NewLine + Environment.NewLine + ret.Rows[0][0].ToString());

                case ResultHandling.NULL:
                    return null;
            }

            return null;
        }

        public static DataTable ProcessObject(string cubeName, int resultHandling)
        {
            return ProcessObject(cubeName, (ResultHandling)resultHandling);
        }

        public static DataTable ProcessObject(string cubeName)
        {
            return ProcessObject(cubeName, 1);
        }
    }
}

There are two things here that are worth to point out. First of all we need to somehow establish a connection to the server that hosts the SSAS database in order to process an object. The first thing that comes into mind would be to create a new connection to the server and run some processing XMLA. Unfortunately this does not work here because as this would result in a deadlock. The assembly will be called as a Rowset-Action which in terms of locking is similar to a query. So if we run separate processing command within our Rowset-Action this processing command can never be committed as there is always a query running on the database. The query (=Rowset-Action) will wait until the function is finished and the function waits until the processing is committed resulting in a classical deadlock!

To avoid this we need to connect to the current users sessions using “*” as our connection string:

//connect to the current session…
server.Connect(“*”);

 

The second thing to point out is the return value of our function. In order to be used in a Rowset-action, the function has to return a DataTable object. The results would then be displayed similar to a drill through and a separate worksheet would pop up in Excel showing the result:

Result_DataTable

From my point-of-view this can be a bit annoying as after each processing this new worksheet pops up and you loose the focus of the pivot table. So I investigated a bit and found another way to display the output of the processing.

When a drill through / action throws an exception, this will also be displayed in Excel without actually displaying the result of the drill through / action in a separate worksheet. By default it may look like this message:

Result_NULL

Well, not really much information here right?

To display more information, e.g. that the processing was successful (or not) we can throw our own exception in the code after processing is done:

Result_Exception

I added a separate parameter to control this behavior, in the code above the default would be option 3 – custom Exception.

 

Once this project is built the DLL can be added to the SSAS database. It is necessary to grant the assembly “Unrestricted” permission set:

AssemblySettings

 

The last step would be to create the appropriate cube actions that call the assembly:

CubeActionConfiguration

It is necessary that the action is of Type=Rowset, otherwise it would not be displayed in Excel! The Action Target is up the you, in this example it will be displayed on every cell. The Action expression is the actual call to our assembly passing the cube name as a parameter. The second parameter controls how the result is displayed and could also be omitted.  In this example I added 3 separate actions, one for each result-type (as described above).

And that’s it! This simple steps allow you and your users to process any cube (or any other database object) from within your favorite client application, assuming it supports cube actions.

This is the final result where the three cube actions can be triggered from an Excel pivot table:

Excel_Action_Final

 

The attached zip-file includes everything that is necessary to recreate this example:
– SSAS project
– ASSP_Processing project

CubeProcessingAction.zip

Solution for Asynchronous Processing without Admin-Rights:
VS_Solution.zip

SSAS Dynamic Security and Visual Totals

Security is always an important aspect of any BI solution. Especially for big enterprise solutions the security-concept can become very complex. Analysis Services Multidimensional  in general offers two option on how to define security: Cell Data Permissions and Dimension Data Permissions. Because of the huge impact on performance Cell Data Permissions are barely used. Dimension Data Permissions are usually the way to go and can cover probably 98% of the security requirements. Though, also Dimension Data Permissions can get quite complex, especially for large-scale cubes with 100+ or even 1000+ users. Just think of an Analysis Services that holds sales data associated to KeyAccounts – further assume that we have 100 KeyAccounts. Each of this KeyAccounts is managed by one user, so we would end up creating 100 roles – one for each KeyAccount and its manager. In terms of maintenance this can get quite complex and in the long run unmanageable.

To address this issue Analysis Services offers the concept of Dynamic Security. Dynamic Security is evaluated at runtime for each user and allows you to consolidate your roles. So for our example from above instead of having 100 different roles we would end up having one dynamic role serving all 100 users. Basically, when a user connects it queries internal data (within the cube using MDX) or external data (using Analysis Services Stored Procedure (ASSP)/Assemblies) to get the security settings for the current user. Especially in combination with ASSP this can be quite powerful as you can use any datasource that you can query using .Net-code to derive your security from.

To make a role "dynamic" you basically have two options:
1) Username() – returns the current users in format "MyDomainMyUser"
2) CustomData() – returns whatever was specified in the connectionstring’s CustomData-property

Both are native MDX-functions and can be used in an expression to build your Allowed Set, Denied Set or DefaultMember.

Role_UI

This works just fine, as long as the expressions returns a valid MDX set/member of course.
But what about the Visual Totals – setting? You can only check or uncheck the checkbox and you are not able to use any dynamic expressions. That’s what the UI offers you – no more no less. But there might be some rare scenarios where you also need to make the Visual Totals dynamically based on Username() or CustomData() – and this is what this post is about.

 

So, as we have already seen above, it is not possible to make the Visual Totals dynamically using the UI. Fortunately there are also other ways besides the UI to work with the cube and modify its structure. I am talking about XMLA (XML for Analysis Services) here, which is the protocol that is used for all communications between a client and Analysis Services. Instead of writing XMLA on your own there is also a programmatically way to do this: AMO (Analysis Management Object). AMO is a .Net library that allows you to integrate calls to Analysis Services within any .Net code. After modifying any object in the code AMO generates the according XMLA for you and sends it to the server which is a much more convenient way than writing plane XMLA on your own.

This means that you can also create/modify your security roles using AMO – this approach is already very well described by Benny Austin in his blog about SSAS: Using AMO to Secure Analysis Service Cube. I recommend reading this article before you continue here.

For this little example I create a slimmed down version of Adventure works. The model contains only the Date and the Product dimension. We will put our dynamic security on the Category attribute of our Product dimension. The model also already contains a predefined empty role called "DynamicVisualTotals" that we will modify later (after the model has been deployed) using AMO. Please note here that AMO only work with the Analysis Service directly (online) but not at design time in Visual Studio. Once everything is deployed this is the AMO code to use to modify the predefined role:

using Microsoft.AnalysisServices;

namespace ModifyRole_AMO
{
    class Program
    {
        static void Main(string[] args)
        {
            using (Server oServer = new Server())
            {
                oServer.Connect("localhost");

                using (Database oDB = oServer.Databases.GetByName("DynamicVisualTotals"))
                {
                    Role oRole = oDB.Roles.GetByName("DynamicVisualTotals");

                    Dimension oDim = oDB.Dimensions.GetByName("Product");
                    DimensionAttribute oAttr = oDim.Attributes.FindByName("Category");
                    DimensionPermission oDimPermission = oDim.DimensionPermissions.FindByRole(oRole.ID);

                    AttributePermission oAttrPermission = oDimPermission.AttributePermissions.Find(oAttr.ID);

                    // can be done in the UI
                    oAttrPermission.AllowedSet = "DynamicSecurity.DynamicAllowedSet(CustomData())";
                    // can not be done in the UI
                    oAttrPermission.VisualTotals = "DynamicSecurity.DynamicVisualTotals(CustomData())";

                    oDimPermission.Update();
                }
            }
        }
    }
}

1) connect to server "localhost"
2) connect to database "DynamicVisualTotals"
3) get the Role-object for role "DynamicVisualTotals" from the database
4) get the Dimension-object for dimension "Product"
5) get the DimensionAttribute-object for attribute "Category"
6) get the DimensionPermission that is associated to our Role for dimension "Product"
7) get the AttributePermission that is associated to our attribute "Category"

So far this is straight forward. The AttributePermission object actually represents the UI that we have seen above. It has properties for all the fields and textboxes on the UI. The most important of course are:
– AllowedSet
– DeniedSet
– Defaultmember
– and VisualTotals

For the first three I was already aware that they are of type String but the VisualTotals I expected to be Boolean which turned out to be wrong – it is also a String!

VisualTotalsIsString 

Naïve as I am I simply put a MDX expression in there that resolves to "1" or "0" (Boolean, similar to the Checkbox in the UI, it could probably also resolve to "true" or "false" or a Boolean type directly but I did not further test this) – and, it worked like a charm! The MDX expression I used refers to an ASSP called DynamicSecurity which contains a function called "DynamicVisualTotals" that returns "1" or "0" based on what was passed in the connectionstring using CustomData():

using Microsoft.AnalysisServices.AdomdServer;

namespace DynamicSecurity
{
    public static class DynamicSecurity
    {
        public static string DynamicVisualTotals(string customData)
        {
            if (customData == "1")
                return "1";

            return "0";
        }

        public static Set DynamicAllowedSet(string customData)
        {
            string mdxEpression;
            Expression exp;
            Set s;

            if (customData == "1")
                mdxEpression = "{[Product].[Category].&[1]}";

            else
                mdxEpression = "{[Product].[Category].&[3], [Product].[Category].&[4]}";

            exp = new Expression(mdxEpression);
            s = exp.CalculateMdxObject(null).ToSet();

            return s;
        }
    }
}

(This ASSP has to be compiled and added to the database first to make the Dynamic Security work. This is describe here in more detail.)

And that’s it!

Once everything is done
1) Deploy the solution to the sever
2) (add ASSP if not already included in the solution)
3) Execute the AMO code (first listing)
we can use Excel to check the results:

FinalResults

 

The used connection strings are as follows:

Provider=MSOLAP.5;Integrated Security=SSPI;Persist Security Info=True;Data Source=localhost;Initial Catalog=DynamicVisualTotals;Roles=DynamicVisualTotals;CustomData=0

 

Provider=MSOLAP.5;Integrated Security=SSPI;Persist Security Info=True;Data Source=localhost;Initial Catalog=DynamicVisualTotals;Roles=DynamicVisualTotals;CustomData=1

Make sure to also use the Roles-property for testing as if you are administrator these privileges would overwrite all other security settings!

 

Additional Notes:

As AMO is just a wrapper for XMLA we could also use XMLA directly to modify our role:

<Alter AllowCreate="true" ObjectExpansion="ObjectProperties" xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
    <Object>
        <DatabaseID>DynamicVisualTotals</DatabaseID>
        <DimensionID>Dim Product</DimensionID>
        <DimensionPermissionID>DimensionPermission</DimensionPermissionID>
    </Object>
    <ObjectDefinition>
        <DimensionPermission xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2" xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2" xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100" xmlns:ddl200="http://schemas.microsoft.com/analysisservices/2010/engine/200" xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200" xmlns:ddl300="http://schemas.microsoft.com/analysisservices/2011/engine/300" xmlns:ddl300_300="http://schemas.microsoft.com/analysisservices/2011/engine/300/300" xmlns:ddl400="http://schemas.microsoft.com/analysisservices/2012/engine/400" xmlns:ddl400_400="http://schemas.microsoft.com/analysisservices/2012/engine/400/400">
            <ID>DimensionPermission</ID>
            <Name>DimensionPermission</Name>
            <RoleID>Role</RoleID>
            <Read>Allowed</Read>
            <AttributePermissions>
                <AttributePermission>
                    <AttributeID>Product Category Name</AttributeID>
                    <AllowedSet>DynamicSecurity.DynamicAllowedSet(CustomData())</AllowedSet>
                    <VisualTotals>DynamicSecurity.DynamicVisualTotals(CustomData())</VisualTotals>
                </AttributePermission>
            </AttributePermissions>
        </DimensionPermission>
    </ObjectDefinition>
</Alter>

 

Please note that XMLA always uses the internal IDs which may not be the same as the name of the objects!

 

In this post I showed how to use AMO and XMLA to modify a security role and use an expression to make the VisualTotals setting dynamic which is not possible using the UI only. This is just a very simplified example but I used this approach on some of my enterprise deployments and it works just fine also for very complex scenarios.

 

The attached zip-file includes everything that is necessary to recreate this example:
– SSAS project
– ASSP project
– AMO project (and XMLA script)
– final Excel with results

DynamicVisualTotals.zip