Some time ago Bob Duffy blogged about on how to use Power Pivot to analyze the disk usage of multidimensional Analysis Services models (here). He uses a an VBA macro to pull meta data like filename, path, extension, etc. from the file system or to be more specific from the data directory of Analysis Services. Analysis Services stores all its data in different files with specific extensions so it is possible to link those files to multidimensional objects in terms of attributes, facts, aggregations, etc. Based on this data we can analyze how our data is distributed. Do we have too big dimensions? Which attribute uses the most space? Do our facts consume most of the space (very likely)? If yes, how much of it is real data and how big are my aggregations – if they are processed at all?!? – These are very common and also important things to know for an Analysis Services developer.
So Bob Duffy’s solution can be really useful. The only thing I did not like about it was the fact that it uses a VBA macro to get the data. This made me think and I came up with the idea of using Power Query to get this data. Btw, make sure to check out the latest release, there have been a lot of improvements recently!
With Power Query you have to option to load multiple files from a folder and also from its sub folders. When we do this on our Analysis Services data directory, we get a list of ALL files together with their full path, filename, extension and most important in this case their size which can be found by expanding the Attributes-record:
The final Power Query does also a lot of other things to prepare the data so it can be later joined to our FileExtensions-table that holds detailed information for each file extension. This table currently looks like below but can be extended by any other columns that may be necessary and/or useful for you:
FileType | FileType_Description | ObjectType | ObjectTypeSort | ObjectTypeDetails |
ahstore | Attribute Hash Store | Dimensions | 20 | Attribute |
asstore | Attribute String Store | Dimensions | 20 | Attribute |
astore | Attribute Store | Dimensions | 20 | Attribute |
bsstore | BLOB String Store | Dimensions | 20 | BLOB |
bstore | BLOB Store | Dimensions | 20 | BLOB |
dstore | Hierarchy Decoding Store | Dimensions | 20 | Hierarchy |
khstore | Key Hash Store | Dimensions | 20 | Key |
ksstore | Key String Store | Dimensions | 20 | Key |
kstore | Key Store | Dimensions | 20 | Key |
lstore | Structure Store | Dimensions | 20 | Others |
ostore | Order Store | Dimensions | 20 | Others |
sstore | Set Store | Dimensions | 20 | Others |
ustore | ustore | Dimensions | 20 | Others |
xml | XML | Configuration | 999 | Configuration |
fact.data | Basedata | Facts | 10 | Basedata |
fact.data.hdr | Basedata Header | Facts | 10 | Basedata |
fact.map | Basedata Index | Facts | 10 | Basedata |
fact.map.hdr | Basedata Index Header | Facts | 10 | Basedata |
rigid.data | Rigid Aggregation Data | Facts | 10 | Aggregations |
rigid.data.hdr | Rigid Aggregation Data Header | Facts | 10 | Aggregations |
rigid.map | Rigid Aggregation Index | Facts | 10 | Aggregations |
rigid.map.hdr | Rigid Aggregation Index Header | Facts | 10 | Aggregations |
flex.data | Flexible Aggregation Data | Facts | 10 | Aggregations |
flex.data.hdr | Flexible Aggregation Data Header | Facts | 10 | Aggregations |
flex.map | Flexible Aggregation Index | Facts | 10 | Aggregations |
flex.map.hdr | Flexible Aggregation Index Header | Facts | 10 | Aggregations |
string.data | String Data (Distinct Count?) | Facts | 10 | Basedata |
cnt.bin | Binary | Configuration | 999 | Binaries |
mrg.ccmap | mrg.ccmap | DataMining | 999 | DataMining |
mrg.ccstat | mrg.ccstat | DataMining | 999 | DataMining |
nb.ccmap | nb.ccmap | DataMining | 999 | DataMining |
nb.ccstat | nb.ccstat | DataMining | 999 | DataMining |
dt | dt | DataMining | 999 | DataMining |
dtavl | dtavl | DataMining | 999 | DataMining |
dtstr | dtstr | DataMining | 999 | DataMining |
dmdimhstore | dmdimhstore | DataMining | 999 | DataMining |
dmdimstore | dmdimstore | DataMining | 999 | DataMining |
bin | Binary | Configuration | 999 | Binaries |
OTHERS | Others | Others | 99999 | Others |
As you can see the extension may contain 1, 2 or 3 parts. The more parts the more specific this file extension is. If you checked the result of the Power Query it also contains 3 columns, FileExtension1, FileExtension2 and FileExtension3. To join the two tables we first need to load both tables into Power Pivot. The next step is to create a proper column on which we can base our relationship. If the 3-part extension is found in the file extensions table, we use it, otherwise we check the 2-part extension and afterwards the 1-part extension and in case nothing matches we use “OTHERS”:
CONTAINS(FileTypes, FileTypes[FileType], [FileExtension3]), [FileExtension3],
CONTAINS(FileTypes, FileTypes[FileType], [FileExtension2]), [FileExtension2],
CONTAINS(FileTypes, FileTypes[FileType], [FileExtension1]), [FileExtension1],
"OTHERS")
Then we can create a relationship between or PQ table and our file extension table. I also created some other calculated columns, hierarchies and measures for usability. And this is the final outcome:
You can very easily see, how big your facts are, the distribution between base-data and Aggregations, the Dimensions sizes and you can drill down to each individual file! You can of course also create a Power View report if you want to. All visualizations are up to you, this is just a very simple example of a report.
Enjoy playing around with it!
Downloads:
(please note that I added a filter on the Database name as a last step of the Power Query to only show Adventure Works databases! In order to get all databases you need to remove this filter!)
SSAS Disk Analysis Workbook: SSAS_DiskAnalysis.xlsx
Pingback: Using Power Query to analyze SSAS Disk Usage (Redirect) | Gerhard Brueckl's BI Blog
Great use of PowerQuery Sir – I’m going to try this approach.
I did update the FileType linked table to differentiate between bitmap indexes and aggregations/Basedata. I think anything with a “MAP” in the file extension should be a bitmap index.
Hi Bob,
I just added some new files types, seems that I have forgotten all *.data*-extensions for the aggregations 🙂
I also added string.data which seems to be used to stored distinct count measures that are based on char/string-columns
Add any new column as required to the file-extension list to fulfill your reporting needs
-gerhard