April 2014 – Gerhard Brueckl on BI & Data

In one of my recent posts I highlighted how multiple security roles are handled in tabular opposed to multidimensional Analysis Services models. In this post I will go into more detail and focus on how the security is evaluated for both models.

I would like to continue using the same example as in the last post with the following roles:
“Bikes” – is restricted to [ProductCategory] = “Bikes”
“Brakes” – is restricted to [ProductSubCategory] = “Brakes”
“DE” – is restricted to [Country] = “DE”

I used the following MDX query to test both, the tabular and the multidimensional model:

SELECT 
NON EMPTY {[Geography].[Country Region Name].[Country Region Name].MEMBERS} ON 0,
NON EMPTY {[Product].[Product Category Name].[Product Category Name].MEMBERS} ON 1
FROM [Model]
WHERE (
[Measures].[Reseller Total Sales]
)

The last thing I mentioned was how the combination of roles “Bikes” and “DE” could be expressed in SQL:

SELECT
    [ProductCategory],
    [Country],
    SUM([Reseller Sales])
FROM <table>
WHERE [ProductCategory] = ‘Bikes’
    OR [Country] = ‘Germany’
GROUP BY
    [Product Category],
    [Country] 

When running the previous MDX query on our tabular model using roles “Bikes” and “DE” we will see a very similar query being executed against our xVelocity Storage Engine:
(Note: There are also some SE engine queries executed to get the elements for rows and columns but the query below is the main query)

SELECT
[Geography_1fa13899-0770-4069-b7cb-5ddf22473324].[EnglishCountryRegionName], 
[Product_11920c93-05ae-4f1c-980e-466dfbcfca2a].[CalculatedColumn1 1],
SUM([Reseller Sales_fc635e72-28dc-4156-80d5-43b805f8df1c].[SalesAmount])
FROM [Reseller Sales_fc635e72-28dc-4156-80d5-43b805f8df1c]
LEFT OUTER JOIN [Reseller_d52b9c6f-8d2d-4e23-ae4c-2fc57c1d968a] 
        ON [Reseller Sales_fc635e72-28dc-4156-80d5-43b805f8df1c].[ResellerKey]
            =[Reseller_d52b9c6f-8d2d-4e23-ae4c-2fc57c1d968a].[ResellerKey]
LEFT OUTER JOIN [Geography_1fa13899-0770-4069-b7cb-5ddf22473324] 
    ON [Reseller_d52b9c6f-8d2d-4e23-ae4c-2fc57c1d968a].[GeographyKey]
        =[Geography_1fa13899-0770-4069-b7cb-5ddf22473324].[GeographyKey]
LEFT OUTER JOIN [Product_11920c93-05ae-4f1c-980e-466dfbcfca2a] 
    ON [Reseller Sales_fc635e72-28dc-4156-80d5-43b805f8df1c].[ProductKey]
        =[Product_11920c93-05ae-4f1c-980e-466dfbcfca2a].[ProductKey]
WHERE
(COALESCE((PFDATAID( [Product_11920c93-05ae-4f1c-980e-466dfbcfca2a].[CalculatedColumn1 1] ) = 6)) 
    OR 
COALESCE((PFDATAID( [Geography_1fa13899-0770-4069-b7cb-5ddf22473324].[CountryRegionCode] ) = 5)));

Well, not really readable so lets make it a bit nicer by removing those ugly GUIDs, etc:

SELECT
    [Geography].[EnglishCountryRegionName], 
    [Product].[ProductCategory],
    SUM([Reseller Sales].[SalesAmount])
FROM [Reseller Sales]
LEFT OUTER JOIN [Reseller] 
    ON [Reseller Sales].[ResellerKey] = [Reseller].[ResellerKey]
LEFT OUTER JOIN [Geography] 
    ON [Reseller].[GeographyKey] = [Geography].[GeographyKey]
LEFT OUTER JOIN [Product] 
    ON [Reseller Sales].[ProductKey] = [Product].[ProductKey]
WHERE [Product].[ProductCategory] = "Bikes"
OR [Geography].[CountryRegionCode] = "Germany";

This looks very similar to our SQL query. The special thing about it is the WHERE clause which combines the restrictions of both roles using OR which is then propagated also to our [Reseller Sales] fact table and that’s the reason why we see what we want and expect to see – all sales that were either made with “Bikes” OR made in “Germany”:

Another very important thing to note and remember here is that the security restrictions get propagated into and are evaluated within the query. This is done for each and every query (!) which is usually not a problem but may become crucial if you use dynamic security.
To test this with dynamic security I introduced a new role called “CustData” which is going to replace our “Bikes” for this test. It is restricted on table ‘Product’ as:

=([Product Category Name] = IF( CUSTOMDATA() = "1", "Bikes", "Clothing"))

So instead of using the connectionstring “…;Roles=Bikes,DE; …” I now used “…;Roles=CustData,DE;CustomData=1; …” which results in the exact same results of course. However, the query now changed to the following (already beautified) xVelocity query:

SELECT
    [Geography].[EnglishCountryRegionName], 
    [Product].[ProductCategory],
    SUM([Reseller Sales].[SalesAmount])
FROM [Reseller Sales]
LEFT OUTER JOIN [Reseller] 
    ON [Reseller Sales].[ResellerKey] = [Reseller].[ResellerKey]
LEFT OUTER JOIN [Geography] 
    ON [Reseller].[GeographyKey] = [Geography].[GeographyKey]
LEFT OUTER JOIN [Product] 
    ON [Reseller Sales].[ProductKey] = [Product].[ProductKey]
WHERE [Product].$ROWFILTER IN '0x000000…000000000000000000000fffff00000000000ffffffff'));
OR [Geography].[CountryRegionCode] = "Germany";

Instead of using a direct filter on [ProductCategory] we now see a filter on $ROWFILTER ?!?

### ASSUMPTIONS START ###
I have to admit that I am currently not entirely sure what this means but I assume the following:
Preceding the main query another xVelocity query is executed which is important for us:

SELECT
[Product].[RowNumber], 
[Product].[ProductCategory],
COUNT()
FROM [Product];

This query fetches each [RowNumber] and its associated [ProductCategory]. Internally the [RowNumber] column is created for every table. This is related to the columnar storage that xVelocity uses. Elaborating this in detail would go far beyond the context of this blog post. For more details on the RowNumber-column please refer too http://msdn.microsoft.com/en-us/library/hh622558(v=office.15).aspx which describes the Excel data model which is actually Power Pivot and therefore also applies to Tabular. (In general this link contains a lot of in-depth information on the tabular data model and the columnar storage concepts!)

I further assume that our security-expression is then evaluated against this temporary table to create an bitmap index of which rows match the security-expression and which don’t. This result is then applied to our main query which using the WHERE clause [Product].$ROWFILTER IN ‘0x0000….’
For all subsequent queries the above query on [RowNumber] and [ProductCategory] is not executed again so I assume that the bitmap index gets cached internally by Analysis Services. I further assume that if the bitmap index gets cached it is also shared between users belonging to the same role which would be similar to multidimensional models.
### ASSUMPTIONS END ###

So the important take-away for tabular is that the security gets propagated into the query and down to the fact table. Combining multiple roles on this level using OR delivers the expected results.

For multidimensional models this is slightly different. You can define security on either the Database Dimension (which gets inherited down to all Cube Dimension) or you can define security on the Cube Dimension directly. Defining security on the Database Dimension already makes it very clear that the security restrictions are independent of any fact table/measure group. A Database Dimension may be used in several cubes so the engine cannot know in advance which measure group to use. Security for multidimensional models is defined on the multidimensional space defined by that dimension. If one role is not restricted on a dimension at all, the user will always see the whole dimension and its associated values, even if a second role would restrict that dimension. And this causes unexpected results as the user may see the whole cube.
In terms of SQL the query could be expressed as:

SELECT 
    [ProductCategory], 
    [Country], 
    SUM([Reseller Sales]) 
FROM <table> 
WHERE ( [ProductCategory] = 'Bikes' OR 1 = 1)
    OR ( 1 = 1  OR [Country] = 'Germany')
GROUP BY 
    [Product Category], 
    [Country]

The left side of the inner OR statements represents the role “Bikes” whereas the right part represents the “DE” role. It should be very obvious that due to the combination of both roles you finally see everything without any restriction!

Another important thing to point out is that security for multidimensional models is only evaluated once and not for every query. So even if you have complex dynamic security its evaluation only hits you once for the first user that connects to a role and is cached for all queries of that user and also shared with other users belonging to that role.

I hope this gives some more insights on how tabular and multidimensional handle multiple roles and their fundamental differences!

UPDATE AUGUST 2014:
There were some changes to the DMG in August 2014. Please refer to my new blog post which addresses the issues with the new version! However, I still recommend you to read this post first in order to fully understand the original issue!
The new post can be found here.

I am currently preparing some demos and examples for Power BI. As you can expect for demos you do not want to put too much effort in building up any infrastructure so I decided to use an Azure VM to host my SQL databases and SSAS cubes. Keeping things simple the Azure VM is not joined to a domain which is fine for SQL where I can use SQL authentication, for SSAS I use msmdpump.dll. After everything was set up I wanted to install the Data Management Gateway to expose my SQL tables via OData to Power Query and Online Search.
Bryan C. Smith recently published an article on that very same topic Creating a Demo Power BI Data Gateway using an Azure Virtual Machine but for some reasons it did not work for me. Further, as Bryan already mentions in the first paragraph, his setup is not supported and its also a bit of a hack (modifying hosts-file, and so on).
So I started my own investigations and came up with another solution, which only uses out-of-the-box features and tools and is actually quite simple. Another thing to mention here is that it will (probably) not work for scheduled data refreshes but only for exposing the SQL database via OData and make it searchable in Power Query.
Having that said, here are the steps to follow:

1) Setup the Data Management Gateway itself on the Azure VM as described here: Create a Data Management Gateway. This should work just fine and the Gateway should be in the “Registered”-state on the Azure VM and in “Ready”-state in the Power BI Admin Center:

2) Create a new Data Source on top of the previously created Gateway as described here: Create a Data Source and Enable OData Feed in Power BI Admin Center

Here you will usually receive an error when you want to enter credentials for the SQL Database:

By Clicking on the [credentials]-button a new window pops up. Please note that this is a click-once application that actually runs on your client and is independent of your actual browser!

If the Gateway is running on an Azure VM, or basically any machine which cannot be reached from your current client you will receive an error that a connection could not be established or something similar.
Assuming you called your Azure VM “MyCloudServer” and is perfectly reachable via “MyCloudServer.cloudapp.net” you will receive an error saying that “MyCloudServer” (without “.cloudapp.net”) could not be resolved. Which is actually true as the correct server would be “MyCloudServer.cloudapp.net”. Unfortunatelly, this server name cannot be changed anywhere as far as I know. As the name cannot be changed we need to make the name somehow “resolveable”. Bryan manually modifies the hosts file and makes “MyCloudServer” point to the public IP address of “MyCloudServer.cloudapp.net”. This should usually work just fine, but somehow did not work for me. Also the public IP address may change if you reboot your Azure VM and so you would need to modify the hosts-file again.

So these are the findings we mad so far:
– the Data Source Manager is a click-once application which runs on the client
– the client must be able to resolve “MyCloudServer”

After some thinking I ended up with the following:
The only machine in my scenario that can correctly resolve “MyCloudServer” is the Azure VM itself! So instead of running the Data Source Manager on my client I simply connected to the Power BI Admin Center from my server and repeated the steps from above there.
Now everything works fine and we can proceed:

This connectivity check is only done once and has no further impact (I am not 100% sure on this Smile ). Though, the Username and Password are stored and used for all subsequent connection through the gateway, e.g. for OData access so make sure the user has the necessary access rights.