MDX Script Debugger – Beta-Release

I recently built a tool which should help to debug the MDX scripts of an Analysis Services cube in order to track down formula engine related performance issues of a cube. As you probably know most of the performance issues out there are usually caused by poorly or wrong written MDX scripts. This tool allows you to execute a reference query and highlights the MDX script commands that are effectively used when the query is run. It provides the overall timings and how long each additional MDX script command extended the execution time of the reference query. The results can then either be exported to XML for further analysis in e.g. Power BI or a customized version of the MDX script can be created and used to run another set of tests.

The tool is currently in a beta state and this is the first official release – and also my first written tool that I share publicly so please don’t be too severe with your feedback Open-mouthed smile  – just joking every feedback is good feedback!

Below is a little screenshot which shows the results after  the reference query is executed. The green lines are effectively used by the query whereas the others do not have any impact on the values returned by the query.

ResultsView

A list of all features, further information and also the source code can be found at the project page on codeplex:
https://mdxscriptdebugger.codeplex.com/
Also the download is available from there:
https://mdxscriptdebugger.codeplex.com/releases

Looking forward to your feedback and hope it helps you to track down performance issues of your MDX Scripts!

Configure HTTP access to Analysis Services using PowerShell

Recently I had to setup an Analysis Services cube and expose it to external users. This is usually done by using Internet Information Server (IIS) and creating a new WebSite which hosts msmdpump.dll. This DLL more or less wraps XMLA commands inside HTTP thus allowing external users to access the cube via HTTP. Besides Windows Authentication this setup also allows Basic Authentication and so external users can simply connect by specifying Username and Password in e.g. Excel when connecting to the cube:
ExcelConnectionDialog

There are already a lot of whitepapers out there which describe how to set things up correctly. Here are just some examples:
– MSDN: http://msdn.microsoft.com/en-us/library/gg492140.aspx
– MSBI Academy (great video!): http://msbiacademy.com/?p=5711 by Rob Kerr

They provide very useful information and you should be familiar with the general setup before proceeding here or using the final PowerShell script.

The PowerShell script basically performs the following steps:

  1. Create a local folder as base for your WebSite in IIS
  2. Copy SSAS ISAPI files (incl. msmdpump.dll) to the folder
  3. Create and Configure an IIS AppPool
  4. Create and Configure a IIS WebSite
  5. Add and enable an ISAPI entry for msmdpump.dll
  6. Configure Authentication
  7. Configure Default Document
  8. Update connection information to SSAS server

I tested it successfully with a clean installation of IIS 8.0 (using applicationhost.config.clean.install). In case you already have other WebSites running you may still consider doing the steps manually or adopting the script if necessary. The script is written not to overwrite any existing Folders, WebSites, etc. but you never know.

So here is my final script:

  1. #Import Modules
  2. Import-Module WebAdministration
  3.  
  4. # change these settings
  5. $iisSiteName = "OLAP"
  6. $iisPort = "8000"
  7. $olapServerName = "server\instance"
  8.  
  9. # optionally also change these settings
  10. $isapiFiles = "c:\Program Files\Microsoft SQL Server\MSAS11.MSSQLSERVER\OLAP\bin\isapi\*"
  11. $iisAbsolutePath = "C:\inetpub\wwwroot\" + $iisSiteName
  12. $iisAppPoolName = $iisSiteName + "_AppPool"
  13. $iisAppPoolUser = "" #default is ApplicationPoolIdentity
  14. $iisAppPoolPassword = ""
  15. $iisAuthAnonymousEnabled = $false
  16. $iisAuthWindowsEnabled = $true
  17. $iisAuthBasicEnabled = $true
  18. $olapSessionTimeout = "3600" #default
  19. $olapConnectionPoolSize = "100" #default
  20.  
  21. if(!(Test-Path $iisAbsolutePath -pathType container))
  22. {
  23.     #Creating Directory
  24.     mkdir $iisAbsolutePath  | Out-Null
  25.  
  26.     #Copying Files
  27.     Write-Host -NoNewline "Copying ISAPI files to IIS Folder … "
  28.     Copy -Path $isapiFiles -Destination $iisAbsolutePath -Recurse
  29.     Write-Host " Done!" -ForegroundColor Green
  30. }
  31. else
  32. {
  33.     Write-Host "Path $iisAbsolutePath already exists! Please delete manually if you want to proceed!" -ForegroundColor Red
  34.     Exit
  35. }
  36.  
  37. #Check if AppPool already exists
  38. if(!(Test-Path $("IIS:\\AppPools\" + $iisAppPoolName) -pathType container))
  39. {
  40.     #Creating AppPool
  41.     Write-Host -NoNewline "Creating ApplicationPool $iisAppPoolName if it does not exist yet … "
  42.     $appPool = New-WebAppPool -Name $iisAppPoolName
  43.     $appPool.managedRuntimeVersion = "v2.0"
  44.     $appPool.managedPipelineMode = "Classic"
  45.  
  46.     $appPool.processModel.identityType = 4 #0=LocalSystem, 1=LocalService, 2=NetworkService, 3=SpecificUser, 4=ApplicationPoolIdentity
  47.     #For details see http://www.iis.net/configreference/system.applicationhost/applicationpools/add/processmodel
  48.  
  49.     if ($iisAppPoolUser -ne "" -AND $iisAppPoolPassword -ne "") {
  50.         Write-Host
  51.         Write-Host "Setting AppPool Identity to $iisAppPoolUser"
  52.         $appPool.processmodel.identityType = 3
  53.         $appPool.processmodel.username = $iisAppPoolUser
  54.         $appPool.processmodel.password = $iisAppPoolPassword
  55.     }
  56.     $appPool | Set-Item
  57.     Write-Host " Done!" -ForegroundColor Green
  58. }
  59. else
  60. {
  61.     Write-Host "AppPool $iisAppPoolName already exists! Please delete manually if you want to proceed!" -ForegroundColor Red
  62.     Exit
  63. }
  64.  
  65. #Check if WebSite already exists
  66. $iisSite = Get-Website $iisSiteName
  67. if ($iisSite -eq $null)
  68. {
  69.     #Creating WebSite
  70.     Write-Host -NoNewline "Creating WebSite $iisSiteName if it does not exist yet … "
  71.     $iisSite = New-WebSite -Name $iisSiteName -PhysicalPath $iisAbsolutePath -ApplicationPool $iisAppPoolName -Port $iisPort
  72.     Write-Host " Done!" -ForegroundColor Green
  73. }
  74. else
  75. {
  76.     Write-Host "WebSite $iisSiteName already exists! Please delete manually if you want to proceed!" -ForegroundColor Red
  77.     Exit
  78. }
  79.  
  80. #Ensuring ISAPI CGI Restriction entry exists for msmdpump.dll
  81. if ((Get-WebConfiguration "/system.webServer/security/isapiCgiRestriction/add[@path='$iisAbsolutePath\msmdpump.dll']") -eq $null)
  82. {
  83.     Write-Host -NoNewline "Adding ISAPI CGI Restriction for $iisAbsolutePath\msmdpump.dll … "
  84.     Add-WebConfiguration "/system.webServer/security/isapiCgiRestriction" -PSPath:IIS:\\  -Value @{path="$iisAbsolutePath\msmdpump.dll"}
  85.     Write-Host " Done!" -ForegroundColor Green
  86. }
  87. #Enabling ISAPI CGI Restriction for msmdpump.dll
  88. Write-Host -NoNewline "Updating existing ISAPI CGI Restriction … "
  89. Set-WebConfiguration "/system.webServer/security/isapiCgiRestriction/add[@path='$iisAbsolutePath\msmdpump.dll']/@allowed" -PSPath:IIS:\\ -Value "True"
  90. Set-WebConfiguration "/system.webServer/security/isapiCgiRestriction/add[@path='$iisAbsolutePath\msmdpump.dll']/@description" -PSPath:IIS:\\ -Value "msmdpump.dll for SSAS"
  91. Write-Host " Done!" -ForegroundColor Green
  92.  
  93.  
  94. #Adding ISAPI Handler to WebSite
  95. Write-Host -NoNewline "Adding ISAPI Handler … "
  96. Add-WebConfiguration /system.webServer/handlers -PSPath $iisSite.PSPath -Value @{name="msmdpump"; path="*.dll"; verb="*"; modules="IsapiModule"; scriptProcessor="$iisAbsolutePath\msmdpump.dll"; resourceType="File"; preCondition="bitness64"}
  97. Write-Host " Done!" -ForegroundColor Green
  98.  
  99. #enable Windows and Basic Authentication
  100. Write-Host -NoNewline "Setting Authentication Providers … "
  101. #need to Unlock sections first
  102. Set-WebConfiguration /system.webServer/security/authentication/anonymousAuthentication  MACHINE/WEBROOT/APPHOST -Metadata overrideMode -Value Allow
  103. Set-WebConfiguration /system.webServer/security/authentication/windowsAuthentication  MACHINE/WEBROOT/APPHOST -Metadata overrideMode -Value Allow
  104. Set-WebConfiguration /system.webServer/security/authentication/basicAuthentication  MACHINE/WEBROOT/APPHOST -Metadata overrideMode -Value Allow
  105.  
  106. Set-WebConfiguration /system.webServer/security/authentication/anonymousAuthentication -PSPath $iisSite.PSPath -Value @{enabled=$iisAuthAnonymousEnabled}
  107. Set-WebConfiguration /system.webServer/security/authentication/windowsAuthentication -PSPath $iisSite.PSPath -Value @{enabled=$iisAuthWindowsEnabled}
  108. Set-WebConfiguration /system.webServer/security/authentication/basicAuthentication -PSPath $iisSite.PSPath -Value @{enabled=$iisAuthBasicEnabled}
  109. Write-Host " Done!" -ForegroundColor Green
  110.  
  111. #Adding Default Document
  112. Write-Host -NoNewline "Adding Default Document msmdpump.dll … "
  113. Add-WebConfiguration /system.webServer/defaultDocument/files -PSPath $iisSite.PSPath -atIndex 0 -Value @{value="msmdpump.dll"}
  114. Write-Host " Done!" -ForegroundColor Green
  115.  
  116. #Updating OLAP Server Settings
  117. Write-Host -NoNewline "Updating OLAP Server Settings … "
  118. [xml]$msmdpump = Get-Content "$iisAbsolutePath\msmdpump.ini"
  119. $msmdpump.ConfigurationSettings.ServerName = $olapServerName
  120. $msmdpump.ConfigurationSettings.SessionTimeout = $olapSessionTimeout
  121. $msmdpump.ConfigurationSettings.ConnectionPoolSize = $olapConnectionPoolSize
  122. $msmdpump.Save("$iisAbsolutePath\msmdpump.ini")
  123. Write-Host " Done!" -ForegroundColor Green
  124.  
  125. Write-Host "Everything done! "
  126. Write-Host "The SSAS server can now be accessed via http://$env:computername`:$iisPort"

 

The script can also be downloaded here.

The process of setting up HTTP connectivity is the same for Analysis Services Multidimensional and Tabular so the script works for both scenarios, just change the server name accordingly.

Hiding Dimension Details in Analysis Services

An Analysis Services cube is usually accessed by a wide variety of people all of them having different roles in the business. For example Product Managers, Sales Representatives, Key Account Managers and so on. Sometimes it is necessary to hide detailed information of a given dimension or attribute from a certain user or role. In this post I will show you how this can be accomplished.

Lets take a simple example and assume we have the role “ProductManager” which must not be allowed to see any specific details of a single customer but must be able to see aggregation levels above the customer like Gender, Education, etc.

This can be accomplished quite easily by changing the security settings of our Product Manager role. We navigate to “Dimension Data” > Dimension “Customer” > Attribute “Customer”.
Next go to the “Advanced” tab and manually specify your Allowed Set as:
{[Customer].[Customer].[All Customers]}
Role_ProductManager

One important thing here is to uncheck “Enable Visual Totals” as shown above, otherwise you will not see any data at all! An end-user which owns the role ProductManager would see the following result in Excel:
PivotTable_ProductManager
He sees aggregated values on Education and Gender Level but can not see details of any single customer.

A scenario that’s a bit more complex to achieve is to hide not only one attribute but all attributes of a given dimension. Of course, we could use the approach as described above and do it for each and every attribute in the dimension. This would technically work, but its timely to implement and also hard to maintain (e.g. if new attributes are added). Ideally we only want to define it once in a single place and it should work for all attributes, existing and also new ones. We can make use auf Auto-Exists within our security role here. Auto-Exists basically reflect the security defined on one attribute to all other attributes in the same dimension. So if you are restricted to [Country]=”Austria” you can only see Cities in Austria and for Continents you would only see Europe but not America.

Having this in mind we could create a dummy-customer (which has no associated facts) on which we put our security. Instead of creating a dummy-customer manually in our relational DB we can also make use of the special UnknownMember which can be activated for any dimension. Doing this SSAS automatically creates a dummy-member for us which can be used for our security purposes:
Dimension_UnknownMember
Note that we have set the UnkownMemberName to “No Details Available” to inform the user that there are no details available for him/her.
The role itself would then specify the UnknownMember in the Allowed Set as:
{[Customer].[Customer].UNKNOWNMEMBER}
Role_UnknownMember
Again, make sure that “Enable Visual Totals” is unchecked!

Now our Excel looks as follows:
PivotTable_UnknownMember
(Note: “Show items with no data on rows” was enabled in PivotTable options here)

Using any attribute or hierarchy of our secured dimension in a filter would show this result:
Filter_UnkownMember

One last option you may consider is to set the Visibility of the UnknownMember to “Hidden” thus also hiding the UnknownMember and only revealing the All-members which was our initial requirement.  However, I think using the UnknownMember’s name to provide some further information for the end-user makes quite sense e.g. naming it “No Details Available due to Security Restrictions” or “Restricted by Security”.
This is of course not possible if you create your dummy-customer manually that’s why I prefer using the UnkownMember.

Note1:
During my tests I came across a bug which when you change the Visibility of the UnkownMember after you have already defined security makes you end up seeing all details. Even a redeploy or ProcessFull do not solve the problem. What you need to do is to explicitly change the definition of the role (e.g. by adding a blank to the Allowed Set) and redeploy the role afterwards.

Note2:
None of the above works for SSAS tabular as it does not have the concept of (disabled) VisualTotals which is essential for this solution to work!

Restoring a SSAS Tabular Model to Power Pivot

It is a very common scenario to create a SSAS Tabular Model out of an Power Pivot Model contained in an Excel workbook. Microsoft even created an wizard (or actually a separate Visual Studio project) that supports you doing this. Even further, this process is also a major part of Microsoft’s strategy to cover Personal BI, Team BI and Corporate BI within one technology being xVelocity. This all works just fine but there may also be scenarios where you need to do it the other way round – converting a Tabular model to Power Pivot. Several use-cases come into my mind but I am sure that the most important one is to making data available offline for e.g. sales people to take it with them on their every day work. And in this blog post I will show how this can be done!

But before taking a closer look into how this can be accomplished, lets first see how the import from Power Pivot to SSAS Tabular works. To do this start SQL Server Profiler and connect to your tabular instance. Then create a new Tabular project in Visual Studio based on an existing Power Pivot workbook. At this point you will notice a lot of events happening on our SSAS Tabular server. The most important event for us is “Command End” with the EventSubclass “9 – restore”:
RestoreFromPowerPivot

SSAS actually restores a backup from a “Model.abf” backup file which is located in our project directory that we just created:
BackupExtractedFromPowerPivot

So far so good – but where does this file come from?

Well, the origin of the file has to be our Excel workbook that we imported. Knowing that all new office formats ending with “x” (.xlsx, .docx, …) are basically ZIP files, we can inspect our original Excel workbook by simply rename it to “.zip”. This allows us to browse the Excel file structure:
ExcelUnzipped

We will find a folder called “xl” which contains a sub-folder called “model”. This folder contains one item called “item.data”. If you take a closer look at the file size you may realize that both, the “Model.abf” file that we restored and the “item.data” file from our Excel workbook have the exact same size:
FileProperties

A Coincidence? Not really!

What happens behind the scenes when you import a Power Pivot model into SSAS Tabular is that this “item.data” file gets copied into your project directory and is renamed to “Model.abf” and then restored to the SSAS Tabular workspace instance by using an standard database restore.

Having this information probably makes you think: If it works in one direction, why wouldn’t it also work the other way round? And it does!

So here are the steps that you need to do in order to restore your SSAS Tabular backup into an Excel Power Pivot workbook:

  1. Create a backup of your SSAS Tabular database and rename it to “item.data”
  2. Create an empty Excel workbook and add a simple linked table to the Excel data model (which is actually Power Pivot).
    This is necessary to tell Excel that the workbook contains a Power Pivot model which has to be loaded once the file is opened.
  3. Close the Excel workbook and rename it from “MyFile.xlsx” to “MyFile.xlsx.zip”
  4. Open the .zip-file in Windows Explorer and locate the “\xl\model\”-folder
  5. Replace the “item.data” file with the file that you created in step 1.
  6. Rename the .zip-file back to “MyFile.xlsx”
  7. Open the Excel Workbook
  8. Voilá! You can now work with the data model as with any other Power Pivot model!

I tested this with a SSAS Tabular backup from SQL Server 2012 SP1 being restored to the streamed version of Excel from Office 365 with the latest version of Power Pivot. I assume that it also works with older versions but have not tested all combinations yet.

There are also some features that will not work, for example roles. If your Tabular database contains roles you will not be able to use this approach. Excel will complain that the Power Pivot model is broken. However, other Tabular features like partitions actually work with the little limitation that you cannot change them later on in the Power Pivot model or process them separately:
PowerPivotPartitions
Another thing to note here is that only up to 3 partitions are allowed, otherwise you will get the same error as for roles. I think this is related to the limitation of 3 partitions for SQL Server Analysis Services Standard Edition as Chris Webb described here.

Besides these obvious features there are also some other cool things that you can do in Tabular which are not possible in Power Pivot. Most (or actually all) of them are accessible only by using BIDS Helper – a great THANK YOU to the developers of BIDS Helper at this point!
BIDS Helper enables you to add classical multidimensional features also to Tabular models which is not possible using standard Visual Studio only. Those include:

  • DisplayFolders
  • Translations (metadata only)
  • Actions

I tested it for DisplayFolders and Actions and both are working also in Power Pivot after the backup was restored and I further assume that all the other things will also work just fine.
Simply keep in mind that Power Pivot is basically a fully featured Analysis Services instance running within Excel!

For my (and your) convenience I also created a little PowerShell script that does all the work:

  1. # Load the assembly with the ZipFile class
  2. [System.Reflection.Assembly]::LoadWithPartialName("System.IO.Compression.FileSystem") | Out-Null
  3. # Load the assembly to access Analysis Services
  4. [System.Reflection.Assembly]::LoadWithPartialName("Microsoft.AnalysisServices") | Out-Null
  5. # Also install "Analysis Services PowerShell" according to http://technet.microsoft.com/en-us/library/hh213141.aspx
  6.  
  7. # INPUT-Variables, change these to match your environment
  8. $rootFolder = "D:\Test_PowerShell\"
  9. $emptyExcelFile = $rootFolder + "EmptyExcel.xlsx"
  10. $ssasServerName = "localhost\TAB2012"
  11. $ssasDatabaseName = "AdventureWorks"
  12.  
  13. # internal variables
  14. $newExcelFile = $rootFolder + $ssasDatabaseName + ".xlsx"
  15. $newExcelFileZip = $newExcelFile + ".zip"
  16. $unzipFolder = $rootFolder + "TEMP_ExcelUnzipped"
  17. $backupFile = $rootFolder + $ssasDatabaseName + ".abf"
  18. $itemDestination = $unzipFolder + "\xl\model\item.data"
  19.  
  20. # Copy the empty Excel file and rename it to ".zip"
  21. Copy-Item -Path $emptyExcelFile -Destination $newExcelFileZip
  22.  
  23. # Unzip the file using the ZipFile class
  24. [System.IO.Compression.ZipFile]::ExtractToDirectory($newExcelFileZip, $unzipFolder)
  25.  
  26. # Create a backup of the SSAS Tabular database
  27. Backup-ASDatabase -Server $ssasServerName -Name $ssasDatabaseName -BackupFile $backupFile -AllowOverwrite -ApplyCompression
  28.  
  29. # Copy the backup-file to our extracted Excel folder structure
  30. Copy-Item -Path $backupFile -Destination $itemDestination -Force
  31.  
  32. # Check if the target file exists and delete it
  33. if (Test-Path -Path $newExcelFile) { Remove-Item -Path $newExcelFile }
  34.  
  35. # Zip the folder-structure again using the ZipFile class and rename it to ".xlsx"
  36. [System.IO.Compression.ZipFile]::CreateFromDirectory($unzipFolder, $newExcelFile)
  37.  
  38. # Cleanup the unecessary files
  39. Remove-Item -Path $unzipFolder -Recurse
  40. Remove-Item -Path $backupFile
  41. Remove-Item -Path $newExcelFileZip

The last thing to mention here is that I don’t know if this is officially supported in any way by Microsoft – actually I am pretty sure it is not – so watch out what you are doing and don’t complain if something is not working as expected.

Analysis Services Security: Multiple Roles in Tabular vs. Multidimensional – Part 2

In one of my recent posts I highlighted how multiple security roles are handled in tabular opposed to multidimensional Analysis Services models. In this post I will go into more detail and focus on how the security is evaluated for both models.

I would like to continue using the same example as in the last post with the following roles:
“Bikes” – is restricted to [ProductCategory] = “Bikes”
“Brakes” – is restricted to [ProductSubCategory] = “Brakes”
“DE” – is restricted to [Country] = “DE”

I used the following MDX query to test both, the tabular and the multidimensional model:

  1. SELECT
  2. NON EMPTY {[Geography].[Country Region Name].[Country Region Name].MEMBERS} ON 0,
  3. NON EMPTY {[Product].[Product Category Name].[Product Category Name].MEMBERS} ON 1
  4. FROM [Model]
  5. WHERE (
  6. [Measures].[Reseller Total Sales]
  7. )

The last thing I mentioned was how the combination of roles “Bikes” and “DE” could be expressed in SQL:

  1. SELECT
  2.     [ProductCategory],
  3.     [Country],
  4.     SUM([Reseller Sales])
  5. FROM <table>
  6. WHERE [ProductCategory] = ‘Bikes’
  7.     OR [Country] = ‘Germany’
  8. GROUP BY
  9.     [Product Category],
  10.     [Country]

 

When running the previous MDX query on our tabular model using roles “Bikes” and “DE” we will see a very similar query being executed against our xVelocity Storage Engine:
(Note: There are also some SE engine queries executed to get the elements for rows and columns but the query below is the main query)

  1. SELECT
  2. [Geography_1fa13899-0770-4069-b7cb-5ddf22473324].[EnglishCountryRegionName],
  3. [Product_11920c93-05ae-4f1c-980e-466dfbcfca2a].[CalculatedColumn1 1],
  4. SUM([Reseller Sales_fc635e72-28dc-4156-80d5-43b805f8df1c].[SalesAmount])
  5. FROM [Reseller Sales_fc635e72-28dc-4156-80d5-43b805f8df1c]
  6. LEFT OUTER JOIN [Reseller_d52b9c6f-8d2d-4e23-ae4c-2fc57c1d968a]
  7.         ON [Reseller Sales_fc635e72-28dc-4156-80d5-43b805f8df1c].[ResellerKey]
  8.             =[Reseller_d52b9c6f-8d2d-4e23-ae4c-2fc57c1d968a].[ResellerKey]
  9. LEFT OUTER JOIN [Geography_1fa13899-0770-4069-b7cb-5ddf22473324]
  10.     ON [Reseller_d52b9c6f-8d2d-4e23-ae4c-2fc57c1d968a].[GeographyKey]
  11.         =[Geography_1fa13899-0770-4069-b7cb-5ddf22473324].[GeographyKey]
  12. LEFT OUTER JOIN [Product_11920c93-05ae-4f1c-980e-466dfbcfca2a]
  13.     ON [Reseller Sales_fc635e72-28dc-4156-80d5-43b805f8df1c].[ProductKey]
  14.         =[Product_11920c93-05ae-4f1c-980e-466dfbcfca2a].[ProductKey]
  15. WHERE
  16. (COALESCE((PFDATAID( [Product_11920c93-05ae-4f1c-980e-466dfbcfca2a].[CalculatedColumn1 1] ) = 6))
  17.     OR
  18. COALESCE((PFDATAID( [Geography_1fa13899-0770-4069-b7cb-5ddf22473324].[CountryRegionCode] ) = 5)));

Well, not really readable so lets make it a bit nicer by removing those ugly GUIDs, etc:

  1. SELECT
  2.     [Geography].[EnglishCountryRegionName],
  3.     [Product].[ProductCategory],
  4.     SUM([Reseller Sales].[SalesAmount])
  5. FROM [Reseller Sales]
  6. LEFT OUTER JOIN [Reseller]
  7.     ON [Reseller Sales].[ResellerKey] = [Reseller].[ResellerKey]
  8. LEFT OUTER JOIN [Geography]
  9.     ON [Reseller].[GeographyKey] = [Geography].[GeographyKey]
  10. LEFT OUTER JOIN [Product]
  11.     ON [Reseller Sales].[ProductKey] = [Product].[ProductKey]
  12. WHERE [Product].[ProductCategory] = "Bikes"
  13. OR [Geography].[CountryRegionCode] = "Germany";

This looks very similar to our SQL query. The special thing about it is the WHERE clause which combines the restrictions of both roles using OR which is then propagated also to our [Reseller Sales] fact table and that’s the reason why we see what we want and expect to see – all sales that were either made with “Bikes” OR made in “Germany”:
MultipleRoles_tabular

Another very important thing to note and remember here is that the security restrictions get propagated into and are evaluated within the query. This is done for each and every query (!) which is usually not a problem but may become crucial if you use dynamic security.
To test this with dynamic security I introduced a new role called “CustData” which is going to replace our “Bikes” for this test. It is restricted on table ‘Product’ as:

  1. =([Product Category Name] = IF( CUSTOMDATA() = "1", "Bikes", "Clothing"))

So instead of using the connectionstring “…;Roles=Bikes,DE; …” I now used “…;Roles=CustData,DE;CustomData=1; …” which results in the exact same results of course. However, the query now changed to the following (already beautified) xVelocity query:

  1. SELECT
  2.     [Geography].[EnglishCountryRegionName],
  3.     [Product].[ProductCategory],
  4.     SUM([Reseller Sales].[SalesAmount])
  5. FROM [Reseller Sales]
  6. LEFT OUTER JOIN [Reseller]
  7.     ON [Reseller Sales].[ResellerKey] = [Reseller].[ResellerKey]
  8. LEFT OUTER JOIN [Geography]
  9.     ON [Reseller].[GeographyKey] = [Geography].[GeographyKey]
  10. LEFT OUTER JOIN [Product]
  11.     ON [Reseller Sales].[ProductKey] = [Product].[ProductKey]
  12. WHERE [Product].$ROWFILTER IN '0x000000…000000000000000000000fffff00000000000ffffffff'));
  13. OR [Geography].[CountryRegionCode] = "Germany";

Instead of using a direct filter on [ProductCategory] we now see a filter on $ROWFILTER ?!?

### ASSUMPTIONS START ###
I have to admit that I am currently not entirely sure what this means but I assume the following:
Preceding the main query another xVelocity query is executed which is important for us:

  1. SELECT
  2. [Product].[RowNumber],
  3. [Product].[ProductCategory],
  4. COUNT()
  5. FROM [Product];

This query fetches each [RowNumber] and its associated [ProductCategory]. Internally the [RowNumber] column is created for every table. This is related to the columnar storage that xVelocity uses. Elaborating this in detail would go far beyond the context of this blog post. For more details on the RowNumber-column please refer too http://msdn.microsoft.com/en-us/library/hh622558(v=office.15).aspx which describes the Excel data model which is actually Power Pivot and therefore also applies to Tabular. (In general this link contains a lot of in-depth information on the tabular data model and the columnar storage concepts!)

I further assume that our security-expression is then evaluated against this temporary table to create an bitmap index of which rows match the security-expression and which don’t. This result is then applied to our main query which using the WHERE clause  [Product].$ROWFILTER IN ‘0x0000….’
For all subsequent queries the above query on [RowNumber] and [ProductCategory] is not executed again so I assume that the bitmap index gets cached internally by Analysis Services. I further assume that if the bitmap index gets cached it is also shared between users belonging to the same role which would be similar to multidimensional models.
### ASSUMPTIONS END ###

So the important take-away for tabular is that the security gets propagated into the query and down to the fact table. Combining multiple roles on this level using OR delivers the expected results.

 

For multidimensional models this is slightly different. You can define security on either the Database Dimension (which gets inherited down to all Cube Dimension) or you can define security on the Cube Dimension directly. Defining security on the Database Dimension already makes it very clear that the security restrictions are independent of any fact table/measure group. A Database Dimension may be used in several cubes so the engine cannot know in advance which measure group to use. Security for multidimensional models is defined on the multidimensional space defined by that dimension. If one role is not restricted on a dimension at all, the user will always see the whole dimension and its associated values, even if a second role would restrict that dimension. And this causes unexpected results as the user may see the whole cube.
In terms of SQL the query could be expressed as:

  1. SELECT
  2.     [ProductCategory],
  3.     [Country],
  4.     SUM([Reseller Sales])
  5. FROM <table>
  6. WHERE ( [ProductCategory] = 'Bikes' OR 1 = 1)
  7.     OR ( 1 = 1  OR [Country] = 'Germany')
  8. GROUP BY
  9.     [Product Category],
  10.     [Country]

The left side of the inner OR statements represents the role “Bikes” whereas the right part represents the “DE” role. It should be very obvious that due to the combination of both roles you finally see everything without any restriction!

Another important thing to point out is that security for multidimensional models is only evaluated once and not for every query. So even if you have complex dynamic security its evaluation only hits you once for the first user that connects to a role and is cached for all queries of that user and also shared with other users belonging to that role.

 

I hope this gives some more insights on how tabular and multidimensional handle multiple roles and their fundamental differences!

Analysis Services Security: Multiple Roles in Tabular vs. Multidimensional

In terms of security tabular and multidimensional models of SQL Server Analysis Services are very similar. Both use Roles to handle specific security settings. Those Roles are then assigned to users and groups.  This is very trivial if a user only belongs to one Role – the user is allowed to see what is specified in the Role. But it can become very tricky if a user belongs to more than one role.

For both, tabular and multidimensional security settings of multiple roles are additive. This means that you will not see less if you are assigned multiple roles but only more. Lets assume we have the following Roles:
“Bikes” – is restricted to [ProductCategory] = “Bikes”
“Brakes” – is restricted to [ProductSubCategory] = “Brakes”
“DE” – is restricted to [Country] = “DE”

A user belonging to Roles “Bikes” and “Brakes” will see all products that belong to “Bikes” and all products that belong to “Brakes”. This is OK and returns the expected results as  both roles secure the same dimension/table. This applies to tabular and also multidimensional.

However, if roles secure different dimensions/tables it gets a bit more tricky. A user may belong to Roles “Bikes” and “DE”. For multidimensional this is a real problem as it finally result in the user seeing the whole cube! This is “by design” and can be explained as follows:
Role “Bikes” does not have any restriction on [Country] so all countries are visible, Role “DE” has no restriction on [ProductCategory] so all product categories are visible. As Roles are additive the user is allowed to see all countries and also all product categories, hence the whole cube:
MultipleRoles_multidimensional

Basically you would expect to see “Bikes”-sales for all countries and “Germany”-sales for all product categories but you end up seeing much more than this. If you have every faced this problem in real life you know that this is probably not the intended behavior your customers want to see. Usually Active Directory Groups are used and assigned to SSAS roles, so this can happen quite easily without anyone even noticing (except the user who is happy to see more Smile )!
Chris Webb wrote an excellent blog post on how to deal with those kinds of security requirements in multidimensional models here.

For tabular this behavior is somehow similar. A user belonging to both roles is also allowed to see all countries and all product categories – this is because security settings are additive, same as for multidimensional. Even though this is true in terms of the structure (rows, columns) of the query we still get a different result in terms of values!
Here is the same query on a tabular model with the same security settings:
MultipleRoles_tabular

This is exactly what we and our customers would expect to see – all sales for “Germany” and also every sale related to “Bikes”! In tabular models security applied to a given table cascades down to all related tables – in this case to our fact table. If a table is indirectly secured by different Roles which put security on different tables those restrictions are combined using OR. In terms of SQL this could be expressed as:

  1. SELECT
  2.     [ProductCategory],
  3.     [Country],
  4.     SUM([Reseller Sales])
  5. FROM <table>
  6. WHERE [ProductCategory] = 'Bikes'
  7.     OR [Country] = 'Germany'
  8. GROUP BY
  9.     [Product Category],
  10.     [Country]

Further pivoting the result would show the same as the MDX query.

Thinking back to some of my multidimensional cubes where I had to deal with multiple Roles this “slight difference” would have been really handy and would have saved me a lot of time that I put into custom security solutions using bridge tables, assemblies, etc.

In my next post I will go into much more detail on how the tabular security model works so stay tuned!

UPDATE: Part 2 can be found here

Applied Basket Analysis in Power Pivot using DAX

Basket Analysis is a very common analysis especially for online shops. Most online shops make use of it to make you buy products that “Others also bought …”. Whenever you view a specific product “Product A” from the online shop, basket analysis allows the shop to show you further products that other customers bought together with “Product A”. Its basically like taking a look into other customers shopping baskets. In this blog post I will show how this can be done using Power Pivot. Alberto Ferrari already blogged about it here some time ago and showed a solution for Power Pivot v1. There is also dedicated chapter in the whitepaper The Many-to-Many Revolution 2.0 which deals with Basket Analysis, already in Power Pivot v2. Power Pivot v2 already made the formula much more readable and also much faster in terms of performance. Though, there are still some things that I would like to add.

Lets take a look at the initial data model first:Model_Old

First of all we do not want to modify this model but just extend it so that all previously created measures, calculations and, most important, the reports still work. So the only thing we do is to add our Product-tables again but with a different name. Note that I also added the Subcategory and Category tables in order to allow Basket Analysis also by the Product-Category hierarchy. As we further do not want to break anything we only use an inactive relationship to our ‘Internet Sales’ fact table.

After adding the tables the model looks as follows:Model_New

The next and actually last thing to do is to add the following calculated measure:

Sold in same Order :=
CALCULATE (
COUNTROWS ( ‘Internet Sales’ ),
CALCULATETABLE (
SUMMARIZE (
‘Internet Sales’,
‘Internet Sales’[Sales Order Number]
),
ALL ( ‘Product’ ),
USERELATIONSHIP ( ‘Internet Sales’[ProductKey], ‘Filtered Product’[ProductKey] )
)
)

(formatted using DAX Formatter)

The inner CALCULATETABLE returns a list/table of all [Sales Order Numbers] where a ‘Filtered Product’ was sold and uses this table to extend the filter on the ‘Internet Sales’ table. It is also important to use ALL(‘Product’) here otherwise we would have two filters on the same column ([ProductKey]) which would always result in an empty table. Doing a COUNTROWS finally returns all for all baskets where the filtered product was sold.
We could also change ‘Internet Sales'[Sales Order Number] to ‘Internet Sales'[CustomerKey] in order to analyze what other customers bought also in different baskets (This was done for Example 3). The whole SUMMARIZE-function could also be replaced by VALUES(‘Internet Sales'[Sales Order Number]). I used SUMMARIZE here as I had better experiences with it in terms of performance in the past, though, this may depend on your data. The calculation itself also works with all kind of columns and hierarchies, regardless whether its from table ‘Product’, ‘Filtered Product’, or any other table!

So what can we finally do with this neat formula?

1) Classic Basket Analysis – “Others also bought …”:
Result_ClassicBasketAnalysis

As we can see Hydration Packs are more often sold together with Mountain Bikes opposed to Road Bikes and Touring Bikes. We could also use a slicer on ‘Filtered Product Subcategory’=”Accessories” in order to see how often Accessories are sold together with other products. You may analyze by Color and Product Category:
Result_ByColor
As we can see people that buy black bikes are more likely to buy red helmets than blue helmets.

2) Basket Analysis Matrix:
What may also be important for us is which products are sold together the most often? This can be achieved by pulling ‘Product’ on rows and ‘Filtered Product’ on columns. By further applying conditional formatting we can identify correlations pretty easy:
Result_BasketMatrix
Water Bottles are very often sold together with Bottle Cages – well, not really a surprise. Again, you can also use all kind of hierarchies here for your analysis.
This is what the whole matrix looks like:
Result_BasketMatrixFull
The big blank section in the middle are our Bikes. This tells us that there is no customer that bought two bikes in the same order/basket.

For this analysis I used an extended version of the calculation above to filter out values where ‘Product’ = ‘Filtered Product’ as of course every product is sold within its own basket:

BasketMatrix :=
IF (
MIN ( ‘Product’[ProductKey] )
<> MIN ( ‘Filtered Product’[ProductKey] ),
[Sold in same Order]
)

3) Find Customers that have not bough a common product yet
As we now know from the above analysis which products are very often bought together we can also analyze which customers do not fall in this pattern – e.g. customers who have bough a Water Bottle but have not bought a Bottle Cage yet. Again we can extend our base-calculation to achieve this:

Not Sold to same Customer :=
IF (
NOT ( ISBLANK ( [Sum SA] ) ) && NOT ( [Sold to same Customer] ),
“Not Sold Yet”
)

The first part checks if the selected ‘Product’ was sold to the customer at all and the second part checks if the ‘Filtered Product’ was not sold to the customer yet. In that case we return “Not Sold Yet”, and otherwise  BLANK() which is the default if the third parameter is omitted. That’s the result:
Result_NotSoldToCustomer
Aaron Phillips has bought a Water Bottle but no Mountain Bottle Cage nor a Road Bottle Cage – maybe we should send him some advertisement material on Bottle Cages? Smile

 

As you can see there are a lot of analyses possible on top of that little measure that we created originally. All work with any kind of grouping or hierarchy that you may have and no change to your data model is necessary, just a little extension.

And that’s it – Basket Analysis made easy using Power Pivot and DAX!

Downloads:

Sample Workbook with all Examples: BasketAnalysis.xlsx

Optimizing Columnar Storage for Measures

First of all I have to thank Marco Russo for his blog post on Optimizing High Cardinality Columns in Vertipaq and also his great session at SQL PASS Rally Nordic in Stockholm last year which taught me a lot about columnar storage in general. I highly recommend everyone to read the two mentioned resources before continuing here. Most of the ideas presented in this post are based on these concepts and require at least basic knowledge in columnar storage.

When writing one of my recent posts I ended up with a Power Pivot model with roughly 40M rows. It contained internet logs from Wikipedia, how often someone clicked on a given page per month and how many bytes got downloaded. As you can imagine those values can vary very heavily, especially the amount of bytes downloaded. So in the Power Pivot model we end up having a lot of distinct values in our column that we use for our measure. As you know from Marcos posts, the allocated memory and therefore also the  performance of columnar storage systems is directly related to the number of distinct values in a column – the more the worse. Marco already described an approach to split up a single column with a lot of distinct values into several columns with less distinct values to optimize storage. These concepts can also be used on columns that contain measures or numeric values in general. Splitting numeric values is quite easy, assuming your values range from 1 to 1,000,000 you can split this column into two by dividing the value by 1000 and using MOD 1000 for the second column. Instead of one column with the value 123,456 you end up with two columns with the values 123 and 456. In terms of storage this means that instead of 1,000,000 distinct values we only need to store 2 x 1,000 distinct values. Nothing new so far.

The trick is to combine those columns again at query time to get the original results as before the split. For some aggregations like SUM this is pretty straight forward, others are a bit more tricky. Though, in general the formulas are not really very complex and can be adopted very easily to handle any number of columns:

Aggregation DAX Formula
Value_SUM1 =SUMX(‘1M_Rows_splitted’, [Value_1000] * 1000 + [Value_1])
Value_SUM2 =SUM ( ‘1M_Rows_splitted'[Value_1000] ) * 1000
    + SUM ( ‘1M_Rows_splitted'[Value_1] )
Value_MAX =MAXX(‘1M_Rows_splitted’, [Value_1000] * 1000 + [Value_1])
Value_MIN =MINX(‘1M_Rows_splitted’, [Value_1000] * 1000 + [Value_1])
Value_COUNT =COUNTX(‘1M_Rows_splitted’, [Value_1000] * 1000 + [Value_1])
Value_DISTINCTCOUNT =COUNTROWS (
    SUMMARIZE (
        ‘1M_Rows_splitted’,
        ‘1M_Rows_splitted'[Value_1000],
        ‘1M_Rows_splitted'[Value_1]))

As you can see you can still mimic most kind of aggregation even if the [Value]-column is split up.

Though, don’t exaggerate splitting your columns – too many may be a bit inconvenient to handle and may neglect the effect resulting in worse performance. Marco already showed that you can get a reduction of up to 90% in size, during my simple tests I came up with about the same numbers. Though, it very much depends on the number of distinct values that you actually have in your column!

I would not recommend to always use this approach for all your measure column – no, definitely not! First check how many distinct values your data/measures contain and decide afterwards. For 1 million distinct values it is probably worth it, for 10,000 you may reconsider using this approach. Most important here is to test this pattern with your own data, data model and queries! Test it in terms of size and of course also in terms of performance. It may be faster to split up columns but it may also be slower and it may be also different for each query that you execute against the tabular model / Power Pivot. Again, test with your own data, data model and queries to get representative results! 

Here is a little test that you may run on your own to test this behavior. Simple create the following Power Query using M, load the result into Power Pivot and save the workbook. It basically creates a table with 1 million distinct values (0 to 999,999) and splits this column up into two. You can just copy the workbook, remove the last step “Remove Columns” and save it again to get the “original” workbook and Power Pivot model.

let
    List1 = List.Numbers(0, 1000000),

    TableFromList = Table.FromList(List1, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    RenamedColumns = Table.RenameColumns(TableFromList,{{"Column1", "Value"}}),
    ChangedType1 = Table.TransformColumnTypes(RenamedColumns,{{"Value", type number}}),
    InsertedCustom = Table.AddColumn(ChangedType1, "Value_1000", each Number.RoundDown([Value] / 1000)),
    InsertedCustom1 = Table.AddColumn(InsertedCustom, "Value_1", each Number.Mod([Value], 1000)),
    ChangedType = Table.TransformColumnTypes(InsertedCustom1,{{"Value_1000", type number}, {"Value_1", type number}}),
    RemovedColumns = Table.RemoveColumns(ChangedType,{"Value"})
in
    RemovedColumns

This is what I ended up with when you compare those two workbooks:

  # distinct values Final Size
single column 1,000,000 14.4 MB
two columns 2 x 1,000 1.4 MB

We also get a reduction in size of 90%! Though, this is a special scenario …
In the real world, taking my previous scenario with the Wikipedia data as an example, I ended up with a reduction to 50%, still very good though. But as you can see the reduction factor varies very much.

Downloads:

Sample workbook with Power Query: 1M_Rows_splitted.xlsx

Upcoming Conferences and Events in Q1 2013

After PASS SQL Rally Nordic in Stockholm last year I am very happy to announce that I am going to speak at two more events in the first quarter of 2013.

SQLKonferenz2014
I will do a session at the “Deutsche SQL Server Konferenz 2014” (=”German SQL Server Conference 2014″) on “Big Data Scenario mit Power BI vs. SAP HANA“. It is basically an advanced version of my blog post on SAP HANAs Big Data Scenario with Power BI with much more insights details on both technologies. The conference itself is a 3 day conference from 10th to 12th of February where day 1 is reserved for pre-conference sessions. It also features a lot of international speakers and of course also a good amount of English sessions (mine will be in German though). Also my colleague Marcel Franke will do a session about PDW and R which you also do not want to miss if you are into Big Data and predictive analytics!
Check out the agenda and make sure you register in time!

SQLSaturdayBanner
Later on the 6th of March I will speak at the SQLSaturday #280 in Vienna on “Scaling Analysis Services in the Cloud“. (This is not a typo, its really on 6th of March which is actually a Thursday!) The session focuses on how to get the best performance out of multidimensional Analysis Services solutions when they are moved to the Windows Azure cloud. In the end I will come up with some best practices and guide lines for this and similar scenarios.
Just make sure that you register beforehand to enjoy this full day of free sessions and trainings!

Hope to see you there!

Using Power Query to analyze SSAS Disk Usage

Some time ago Bob Duffy blogged about on how to use Power Pivot to analyze the disk usage of multidimensional Analysis Services models (here). He uses a an VBA macro to pull meta data like filename, path, extension, etc. from the file system or to be more specific from the data directory of Analysis Services. Analysis Services stores all its data in different files with specific extensions so it is possible to link those files to multidimensional objects in terms of attributes, facts, aggregations, etc. Based on this data we can analyze  how our data is distributed. Do we have too big dimensions? Which attribute uses the most space? Do our facts consume most of the space (very likely)? If yes, how much of it is real data and how big are my aggregations – if they are processed at all?!? – These are very common and also important things to know for an Analysis Services developer.

So Bob Duffy’s solution can be really useful. The only thing I did not like about it was the fact that it uses a VBA macro to get the data. This made me think and I came up with the idea of using Power Query to get this data. Btw, make sure to check out the latest release, there have been a lot of improvements recently!

With Power Query you have to option to load multiple files from a folder and also from its sub folders. When we do this on our Analysis Services data directory, we get a list of ALL files together with their full path, filename, extension and most important in this case their size which can be found by expanding the Attributes-record:
PQ_Source_FileList

The final Power Query does also a lot of other things to prepare the data so it can be later joined to our FileExtensions-table that holds detailed information for each file extension. This table currently looks like below but can be extended by any other columns that may be necessary and/or useful for you:

FileType FileType_Description ObjectType ObjectTypeSort ObjectTypeDetails
ahstore Attribute Hash Store Dimensions 20 Attribute
asstore Attribute String Store Dimensions 20 Attribute
astore Attribute Store Dimensions 20 Attribute
bsstore BLOB String Store Dimensions 20 BLOB
bstore BLOB Store Dimensions 20 BLOB
dstore Hierarchy Decoding Store Dimensions 20 Hierarchy
khstore Key Hash Store Dimensions 20 Key
ksstore Key String Store Dimensions 20 Key
kstore Key Store Dimensions 20 Key
lstore Structure Store Dimensions 20 Others
ostore Order Store Dimensions 20 Others
sstore Set Store Dimensions 20 Others
ustore ustore Dimensions 20 Others
xml XML Configuration 999 Configuration
fact.data Basedata Facts 10 Basedata
fact.data.hdr Basedata Header Facts 10 Basedata
fact.map Basedata Index Facts 10 Basedata
fact.map.hdr Basedata Index Header Facts 10 Basedata
rigid.data Rigid Aggregation Data Facts 10 Aggregations
rigid.data.hdr Rigid Aggregation Data Header Facts 10 Aggregations
rigid.map Rigid Aggregation Index Facts 10 Aggregations
rigid.map.hdr Rigid Aggregation Index Header Facts 10 Aggregations
flex.data Flexible Aggregation Data Facts 10 Aggregations
flex.data.hdr Flexible Aggregation Data Header Facts 10 Aggregations
flex.map Flexible Aggregation Index Facts 10 Aggregations
flex.map.hdr Flexible Aggregation Index Header Facts 10 Aggregations
string.data String Data (Distinct Count?) Facts 10 Basedata
cnt.bin Binary Configuration 999 Binaries
mrg.ccmap mrg.ccmap DataMining 999 DataMining
mrg.ccstat mrg.ccstat DataMining 999 DataMining
nb.ccmap nb.ccmap DataMining 999 DataMining
nb.ccstat nb.ccstat DataMining 999 DataMining
dt dt DataMining 999 DataMining
dtavl dtavl DataMining 999 DataMining
dtstr dtstr DataMining 999 DataMining
dmdimhstore dmdimhstore DataMining 999 DataMining
dmdimstore dmdimstore DataMining 999 DataMining
bin Binary Configuration 999 Binaries
OTHERS Others Others 99999 Others

As you can see the extension may contain 1, 2 or 3 parts. The more parts the more specific this file extension is. If you checked the result of the Power Query it also contains 3 columns, FileExtension1, FileExtension2 and FileExtension3. To join the two tables we first need to load both tables into Power Pivot. The next step is to create a proper column on which we can base our relationship. If the 3-part extension is found in the file extensions table, we use it, otherwise we check the 2-part extension and afterwards the 1-part extension and in case nothing matches we use “OTHERS”:

=SWITCH(TRUE(),
CONTAINS(FileTypes, FileTypes[FileType], [FileExtension3]), [FileExtension3],
CONTAINS(FileTypes, FileTypes[FileType], [FileExtension2]), [FileExtension2],
CONTAINS(FileTypes, FileTypes[FileType], [FileExtension1]), [FileExtension1],
"OTHERS")

Then we can create a relationship between or PQ table and our file extension table. I also created some other calculated columns, hierarchies and measures for usability. And this is the final outcome:
PivotTable_Report

You can very easily see, how big your facts are, the distribution between base-data and Aggregations, the Dimensions sizes and you can drill down to each individual file! You can of course also create a Power View report if you want to. All visualizations are up to you, this is just a very simple example of a report.

Enjoy playing around with it!

Downloads:
(please note that I added a filter on the Database name as a last step of the Power Query to only show Adventure Works databases! In order to get all databases you need to remove this filter!)

SSAS Disk Analysis Workbook: SSAS_DiskAnalysis.xlsx