About Gerhard Brueckl

Gerhard has been working with Microsoft BI tools since 2006 mainly focusing on Microsoft SQL Server and its components. As a consultant and architect he designed various enterprise BI solutions primary in the German-speaking areas. His personal interest has always been on analytical databases and their capabilities. From the start on he has always been active in the Analysis Services community and in 2012 he achieved the SSAS Maestro certification. This was also about the same time when he started his own blog and began speaking at international conferences all over the world.

Power BI – Dynamic TopN + Others with Drill-Down

A very common requirement in reporting is to show the Top N items (products, regions, customers, …) and this can also be achieved in Power BI quite easily.

But lets start from the beginning and show how this requirement usually evolves and how to solve the different stages.

The easiest thing to do is to simply resize the visual (e.g. table visual) to only who 5 rows and sort them descending by your measure:

This is very straight forward and I do not think it needs any further explanation.

The next requirement that usually comes up next is that the customer wants to control, how many Top items to show. So they implement a slicer and make the whole calculation dynamic as described here:
SQL BI – Use of RANKX in a Power BI measure
FourMoo – Dynamic TopN made easy with What-If Parameter

Again, this works pretty well and is explained in detail in the blog posts.

Once you have implemented this change the business users usually complain that Total is wrong. This depends on how you implemented the TopN measure and what the users actually expect. I have seen two scenarios that cause confusion:
1) The Total is the SUM of the TopN items only – not reflecting the actual Grand Total
2) The Total is NOT the SUM of the TopN items only – people complaining that Power BI does not sum up correctly

As I said, this pretty much depends on the business requirements and after discussing that in length with the users, the solution is usually to simply add an “Others” row that sums up all values which are not part of the TopN items. For regular business users this requirement sounds really trivial because in Excel the could just add a new row and subtract the values of the TopN items from the Grand Total.

However, they usually will not understand the complexity behind this requirement for Power BI. In Power BI we cannot simply add a new “Others” row on the fly. It has to be part of the data model and as the TopN calculations is already dynamic, also the calculation for “Others” has to be dynamic. As you probably expected, also this has been covered already:
Oraylis – Show TopN and rest in Power BI
Power BI community – Dynamic Top N and Others category

These work fine even if I do not like the DAX as it is unnecessarily complex (from my point of view) but the general approach is the same as the one that will I show in this blog post and follows these steps:
1) create a new table in the data model (either with Power Query or DAX) that contains all our items that we want to use in our TopN calculation and an additional row for “Others”
2) link the new table also to the fact table, similar to the original table that contains your items
3) write a measure that calculates the rank for each item, filters the TopN items and assigns the rest to the “Others” item
4) use the new measure in combination with the new table/column in your visual

Step 1 – Create table with “Others” row

I used a DAX calculated table that does a UNION() of the existing rows for the TopN calculation and a static row for “Others”. I used ROW() first so I can specify the new column names directly. I further use ALLNOBLANKROW() to remove to get rid of any blank rows.

Step 2 – Create Relationship

The new table is linked to the same table to which the original table was linked to. This can be the fact-table directly or an intermediate table that then filters the facts in a second step (as shown below)

Step 3 – Create DAX measure

That’s actually the tricky part about this solution, but I think the code is still very easy to read and understand:

Step 4 – Build Visual

One of the benefits of this approach is that it also allows you to use the “Others” value in slicers, for cross-filtering/-highlight and even in drill-downs. To do so we need to configure our visual with two levels. The first one is the column that contains the “Others” item and the second level is the original column that contains the items. The DAX measure will take care of the rest.

And that’s it! You can now use the column that contains the artificial “Others” in combination with the new measure wherever you like. In a slicer, in a chart or in a table/matrix!

The final PBIX workbook can also be downloaded: TopN_Others.pbix

Using Power BI Desktop Direct Query with Parameters

I frequently work on projects where we have multiple tiers on which our solution is deployed to using continuous integration / continuous deployment (CI / CD) pipelines in Azure DevOps. Once everything is deployed, you also need to monitor these different environments and check the status of the data or ETL pipelines. My tool of choice is usually Power BI desktop as it allows me to connect to e.g. SQL databases very easily. However, I always ended up creating a multiple Power BI files – one for each environment.

Having multiple files results in a lot of overhead when it comes to maintenance and also managing these files. Fortunately, I came across this little trick when I was investigating in composite models and aggregations that I am going to explain in this blog post.

To be honest, I barely used Power BI Direct Query in past and so maybe this feature has been there for quite some time without me realizing it but It may also be that it was introduced just recently with composite models.
So the “feature” is, that you can also use Query Parameters to parameterize your Direct Query queries. This is pretty awesome if you think of it for a second:

  • easy switching between databases
  • use one file for all environments
  • only maintain a single file
  • no need to import/load any data

Power BI DirectQuery with Parameters

The configuration within Power Query is also quite easy – simply replacing the hard coded values with the ones from the parameters:

Power Query configuration using Parameters instead of hard-coded values

And that’s it already! you can now easily switch between different databases by just using Power BI parameters and the Direct Query connection will change automatically to the new server/database.
Of course, all the target servers/databases have to have the same schema otherwise, you will get an error.

Caveats:
Even though this looks quite trivial, there are some caveats which makes me believe this is not fully supported yet. You may noticed above already that in Power Query, when going to the step that actually queries the database, it complains about that this step would cause the whole table to be converted to Import Mode. However, you can just ignore it and go on with the next step to remain in Direct Query Mode.

Ignore warning and DO NOT convert to Import Mode

It seems like Power BI keeps track from where a table was originally imported. So if you want to add a new table, make sure to copy an existing Direct Query table and change it accordingly instead of going to “New Source > …” !

Also, you need to make sure that you have entered the credentials for the different source databases at least once – otherwise Power BI will ask you when you query the database the first time. This is also the reason why this does not work so well in the Power BI service as changing the parameters there is not as simple as it is in Power BI desktop.

As I said, I do not know if this is a new feature (or a feature at all), but it is definitely helpful for certain scenarios.

Downloads:
Power BI Workbook: DirectQuery_wParameters.pbix

PowerShell module for Databricks on Azure and AWS

Over the last year I worked a lot with Databricks on Azure and I have to say that I was (and still am) very impressed how well it works and how it integrates with other services of the Microsoft Azure Data Platform like Data Lake Store, Data Factory, etc.

Some of the projects I worked on also included CI/CD like pipelines using Azure DevOps where Databricks did not really shine so bright in the beginning. There are no native tasks for it or anything. But this is OK as for those scenarios, where you need to automate/script something, Databricks offers a REST API (Azure, AWS).

As most of our deployments use PowerShell I wrote some cmdlets to easily work with the Databricks API in my scripts. These included managing clusters (create, start, stop, …), deploying content/notebooks, adding secrets, executing jobs/notebooks, etc. After some time I ended up having 20+ single scripts which was not really maintainable any more. So I packed them into a PowerShell module and also published it to the PowerShell Gallery (https://www.powershellgallery.com/packages/DatabricksPS) for everyone to use!

The module works for Databricks on Azure and also if you run Databricks on AWS – fortunately the API endpoints are almost identical.
The usage is quite simple as for any other PowerShell module:

  1. Install it using Install-Module cmdlet
  2. Setup the Databricks environment using API key and endpoint URL
  3. run the actual cmdlets (e.g. to start a cluster)


Here is the same code for you to copy&paste:

At the moment, the module supports the following APIs:

These APIs are not yet implemented but will be added in the near future:

All the cmdlets are documented and contain links to official documentation of the Rest API call used by the cmdlet. Some API endpoints support different variations of parameters – this was implemented using different parameter sets in PowerShell. There are still some ongoing tests (especially on AWS) and improvements but I general all cmdlets work as expected. I hope this helps anyone else who also has to deal with the Databricks APIs frequently or has to integrate it in a CI/CD pipeline.

The whole source code is also available from my Git-repository (https://github.com/gbrueckl/Databricks.API.PowerShell). If you want to provide any feedback, please use the Git-repository to do so.

Using Parameters and hidden Properties in Azure Data Factory v2

Azure Data Factory v2 is Microsoft Azure’s Platform as a Service (PaaS) solution to schedule and orchestrate data processing jobs in the cloud. As the name implies, this is already the second version of this kind of service and a lot has changed since its predecessor. One of these things is how datasets and pipelines are parameterized and how these parameters are passed between the different objects. The basic concepts behind this process are well explained by the MSDN documentation – for example Create a trigger that runs a pipeline on a schedule. In this example an trigger is created that runs a pipeline every 15 minute and passes the property “scheduledTime” of the trigger to the pipeline. This is the JSON expression that is used:

"parameters": {
  "scheduledRunTime": "@trigger().scheduledTime"
}

@trigger() basically references the object that is returned by the trigger and it seems that this object has a property called “scheduledTime”. So far so good, this is documented and fulfills the basic needs. Some of these properties are also documented here: System variables supported by Azure Data Factory but unfortunately not all of them.

So sometimes this trigger objects can be much more complex and also contain additional information that may not be documented. This makes it pretty hard for the developer to actually know which properties exist and how they could be used. A good example are Event-Based Triggers which were just recently introduced where the documentation only mentions the properties “fileName” and “folderPath” but it contains much more (details see further down). For simplicity I will stick to scheduled triggers at this point but the very same concept applies to all kinds of triggers and actually also to all other internal objects like @pipeline(), @dataset() or @activity() as well!

So how can you investigate those internal objects like @trigger() and see what they actually look like? Well, the answer is quite simple – just pass the object itself without any property to the pipeline. The target parameter of the pipeline can either be of type String or Object.
ADFv2_Set_Pipeline_Parameter_from_Trigger
This allows you to see the whole object on the Monitoring-page once the pipeline is triggered:
ADFv2_Monitor_Parameter_Value

For the Scheduled-trigger, the object looks like this:

@trigger() – Schedule-Trigger
{
  "name": "Trigger_12348CAF-BE66-42CF-83DA-E3028693F304",
  "startTime": "2018-09-25T18:00:22.4180978Z",
  "endTime": "2018-09-25T18:00:22.4180978Z",
  "scheduledTime": "2018-09-25T18:00:22.507Z",
  "trackingId": "1234a112-7bb9-4ba6-b032-6189d6dd8b73",
  "clientTrackingId": "12346637084630521889360938860CU33",
  "code": "OK",
  "status": "Succeeded"
}

And as you can guess, you can pass any of these properties to the pipeline using the syntax
“@trigger().<property_name>” or even the whole object! The syntax can of course also be combined with all the built-in expressions.

This should hopefully make it easier for you to build and debug more complex Azure Data Factory v2 pipelines!

Below you can find an example of the object that a Event-Based Trigger creates:

@trigger() – Event-Trigger
{
  "name": "Trigger_12348CAF-BE66-42CF-83DA-E3028693F304",
  "outputs": {
    "headers": {
      "Host": "prod-1234.westeurope.logic.azure.com",
      "x-ms-client-tracking-id": "1234c153-fc96-4b8e-9002-0f5096bcd744",
      "Content-Length": "52",
      "Content-Type": "application/json; charset=utf-8"
    },
    "body": {
      "folderPath": "data",
      "fileName": "myFile.csv"
    }
  },
  "startTime": "2018-09-25T18:22:54.8383112Z",
  "endTime": "2018-09-25T18:22:54.8383112Z",
  "trackingId": "07b3d1a1-8735-4ff0-9cc6-c83d95046101",
  "clientTrackingId": "56dcc153-fc96-4b8e-9002-0f5096bcd744",
  "status": "Succeeded"
}

Note that right now, it does not say whether the trigger fired because the file was created, updated or deleted! But I hope this will be fixed by the product team in the near future.

Showing OLAP UniqueNames in PowerBI

I just had the request to expose the UniqueNames of an Analysis Services Multidimensional cube in PowerBI. You may ask why I would want to do this and the answer is actually pretty simple: In SSAS MD the caption of elements/members even within the same attribute is not necessarily unique. This is because of the Key/Name concept where the elements/members are grouped by the Key but for the end-user a proper Name is displayed. So if you happen to have duplicate Names in your cube, import the values into PowerBI you will end up with less rows (and wrong values!) compared to the original SSAS MD cube because PowerBI (and also Analysis Services Tabular) does not have a Key/Name concept and therefore the grouping and what is display is always the same.

Having worked quite a lot with SSAS MD in the past I knew that every attribute member contains various internal properties, one of them being the UniqueName, which, as the name implies, is the unique identifier for each member regardless of the caption displayed for that member. And that’s exactly what I needed in this scenario. So the question is how to get this information in PowerBI as this is nothing that should usually be exposed to an end-user.

There is very little information in the internet about SSAS MD connectivity with PowerBI for in general (talking about the import-mode here and not the live-connection!).
One of the few blog posts I found from Chris Webb is already 3 years old: https://blog.crossjoin.co.uk/2015/01/13/a-closer-look-at-power-queryssas-integration/. The other resource is the official documentation on MSDN (scroll down to the “Cube” functions): https://msdn.microsoft.com/en-us/query-bi/m/accessing-data-functions which does not really provide a lot of information except for the syntax of the functions.

Anyway, I started to dig into this topic and made some this. Basically this is what I want to achieve:
PowerQuery_Cube_AttributeMemberId_Output

For my sample I used to Adventure Works MD cube, opened it in PowerBI using Import-Mode and just selected the [Product].[Subcategory] hierarchy:
PowerQuery_Cube_Transform_Product_Subcategory

The UI is quite limited here and you can only select hierarchies and measures.
However, getting the UniqueName of a given hierarchy can be achieved quite easily in a subsequent step by adding a new custom column:
PowerQuery_Cube_AttributeMemberId

And that’s already all you need to do. The column [Product.Subcategory] contains various information, one of them being the UniqueName of the product subcategory which can be accessed by the Cube.AttributeMemberId function.

My next step was to try to get some other properties in a similar way using the Cube.AttributeMemberProperty function. According to the documentation it is quite similar to Cube.AttributeMemberId but takes an additional parameter where you can define which property you want to retrieve. As the [Product].[Subcategory] hierarchy has a property called “Category” I tried this:
PowerQuery_Cube_AttributeMemberProperty

This caused a huge error in PowerBI desktop and so I tried different styles to define the property:

  • “Category”
  • “[Category]”
  • “[Product].[Subcategory].[Subcategory].[Category]”

I also tried to access internal properties:

  • “MEMBER_KEY”
  • “CAPTION”
  • “UNIQUE_NAME”

None of these worked though, neither for the regular properties nor for the internal ones. The main problem seems to be that the MDX query executed does not query any other properties except for the UniqueName not even if you specify them manually in your PowerQuery script. This means that so far there is no way to access member properties from within PowerBI. There is already a user voice where you can vote for this: https://ideas.powerbi.com/forums/265200-power-bi-ideas/suggestions/12443955-member-properties-ssas

Download: PowerBI_UniqueNames.pbix
This PowerBI Desktop model contains all samples from above including the my failed tries for the properties!

Storing Images in a PowerBI/Analysis Services Data Models

As some of you probably remember, when PowerPivot was still only available in Excel and Power Query did not yet exist, it was possible to load images from a database (binary column) directly into the data model and display them in PowerView. Unfortunately, this feature did not work anymore in PowerBI Desktop and the only way to display images in a visual was to provide the URL of the image which is public accessible. The visual would then grab the image on-the-fly from the URL and render it. This of course has various drawbacks:

  • The image needs to be available via a public URL (e.g. upload it first to an Azure Blob Store)
  • The image cannot be displayed when you are offline
  • The link may break in the future or point to a different image as initially when the model was built

There is also a  feedback items about this issue which I encourage you to vote for: https://ideas.powerbi.com/forums/265200-power-bi-ideas/suggestions/7340150-data-model-image-binary-support-in-reports

Until today I was sure that we have to live with this limitation but then I came across this blog post from Jason Thomas aka SqlJason. He shows a workaround to store images directly in the PowerBI data model and display them in the report as if they were regular images loaded from an URL. This is pretty awesome and I have to dedicate at least 99.9% of this blog post to Jason and his solution!

However, with this blog post I would like to take Jasons’ approach a step further. He creates the Base64 string externally and hardcodes it in the model using DAX. This has some advantages (static image, no external dependency anymore, …) but also a lot of disadvantages (externally create the Base64 string, manually copy&paste the Base64 string for each image, hard to maintain, cannot dynamically add images …). For scenarios where you have a local folder with images, a set of [private] URLs pointing to images or images stored in a SQL table (as binary) which you want to load into your PowerBI data model, this whole process should be automated and ideally done within PowerBI.

PowerBI_Images_Stored_Sample

Fortunately, this turns out to be quite simple! Power Query provides a native function to convert any binary to a Base64 encoded string: Binary.ToText() . The important part to point out here is to use the second parameter which allows you to set the encoding of the resulting text. It supports two values: BinaryEncoding.Base64 (default) and BinaryEncoding.Hex. Once we have the Base64 string, we simply need to prefix it with the following meta data: “data:image/jpeg;base64, “

To make it easy, I wrote to two custom PowerQuery functions which convert and URL or a binary image to the appropriate string which can be used by PowerBI:

Function: UrlToPbiImage
  1. let
  2.     UrlToImage = (ImageUrl as text) as text =>
  3. let
  4.     BinaryContent = Web.Contents(ImageUrl),
  5.     Base64 = “data:image/jpeg;base64, “ & Binary.ToText(BinaryContent, BinaryEncoding.Base64)
  6. in
  7.     Base64
  8. in
  9.     UrlToImage
Function: BinaryToPbiImage
  1. let
  2.     BinaryToPbiImage = (BinaryContent as binary) as text=>
  3. let
  4.     Base64 = “data:image/jpeg;base64, “ & Binary.ToText(BinaryContent, BinaryEncoding.Base64)
  5. in
  6.     Base64
  7. in
  8.     BinaryToPbiImage

If your images reside in a local folder, you can simply load them using the “Folder” data source. This will give you a list of all images and and their binary content as separate column. Next add a new Custom Column where you call the above function to convert the binary to a prefixed Base64 string which can then be displayed in PowerBI (or Analysis Services) as a regular image. Just make sure to also set the Data Category of the column to “Image URL”:PowerBI_Image_URL_Base64

And that’s it, now your visual will display the image stored in the data model without having to access any external resources!

Caution: As Jason also mentions at the end of his blog post, there is an internal limitation about the size of a text column. So this may cause issues when you try to load high-resolution images! In this case, simply lower the size/quality of the images before you load them.
UPDATE May 2019: Chris Webb provides much more information and a solution(!) to this issue in his blog post: https://blog.crossjoin.co.uk/2019/05/19/storing-large-images-in-power-bi-datasets

Download: StoreImageInPbiModel.pbix
This PowerBI Desktop model contains all samples from above including the PowerQuery functions!

Processing Azure Analysis Services with OAuth Sources (like Azure Data Lake Store)

As you probably know from my last blog post, I am currently upgrading the PowerBI reporting platform of one of my customer from a PowerBI backend (dataset hosted in PowerBI service) to an Azure Analysis Services backend. The upgrade/import of the dataset into Azure Analysis Services itself worked pretty flawless and after switching the connection of the reports everything worked as expected and everyone was happy. However, things got a bit tricky when it came to automatically refreshing the Azure Analysis Services database which was based on an Azure Data Lake Store. For the original PowerBI dataset, this was pretty straight forward as a scheduled refresh from an Azure Data Lake store data source works out of the box. For Azure Analysis Services this is a bit different.

When you build and deploy your data model from Visual Studio, your are prompted for the credentials to access ADLS which are then stored in the data source object of AAS. As you probably know, AAS uses OAuth authentication to access data from ADLS. And this also causes a lot of problems. OAuth is based on tokens and those tokens are only valid for a limited time, by default this is 2 hours. This basically means, that you can process your database for the next 2 hours and it will fail later on with an error message saying that the token expired. (The above applies to all OAuth sources!)
This problem is usually solved by using an Azure Service Principal instead of a regular user account where the token does not expire. Unfortunately, this is not supported at the moment for ADLS data sources and you have to work around this issue.


IMPORTANT NOTE: NONE OF THE FOLLOWING IS OFFICIALLY SUPPORTED BY MICROSOFT !!!


So the current situation that we need to solve is as follows:

  • we can only use regular user accounts to connect AAS to ADLS as service principals are not supported yet
  • the token expires after 2 hours
  • the database has to be processed on a regular basis (daily, hourly, …) without any manual interaction
  • manually updating the token is (of course) not an option

Before you continue here, make sure that you read this blog post first: https://blogs.msdn.microsoft.com/dataaccesstechnologies/2017/09/01/automating-azure-analysis-service-processing-using-azure-automation-account/
It describes the general approach of using Azure Automation to process an Azure Analysis Services model and most of the code in this blog post if based on this!
Also this older blog post will be a good read as some concepts and code snippets are reused here.

Back to our example – as we were already using Azure Automation for some other tasks, we decided to also use it here. Also, PowerShell integrates very well with other Azure components and was the language of choice for us. To accomplish our goal we had to implement 3 steps:

  1. acquire a new OAuth token
  2. update the ADLS data source with the new token
  3. run our processing script

I could copy the code for the first step more or less from one of my older blog post (here) where I used PowerShell to acquire an OAuth token to trigger a refresh in PowerBI.

The second step is to update ADLS data source of our Azure Analysis Services model. To get started, the easiest thing to do is to simply open the AAS database in SQL Server Management Studio and let it script the existing datasource for you: AAS_Script_OAuth_DataSource
The resulting JSON will look similar to this one:

Update ADLS DataSource
{
“createOrReplace”: {
“object”: {
“database”: “Channel Analytics”,
“dataSource”: “DS_ADLS”
},
“dataSource”: {
“type”: “structured”,
“name”: “DS_ADLS”,
“connectionDetails”: {
“protocol”: “data-lake-store”,
“address”: {
“url”: “https://mydatalake.azuredatalakestore.net”
}
},
“credential”: {
“AuthenticationKind”: “OAuth2”,
“token_type”: “********”,
“scope”: “********”,
“expires_in”: “********”,
“ext_expires_in”: “********”,
“expires_on”: “********”,
“not_before”: “********”,
“resource”: “********”,
“id_token”: “********”,
“kind”: “DataLake”,
“path”: “https://mydatalake.azuredatalakestore.net/”,
“RefreshToken”: “********”,
“AccessToken”: “********”
}
}
}
}

The important part for us is the “credential” field. It contains all the information necessary to authenticate against our ADLS store. However, most of this information is sensitive so only asterisks are displayed in the script. The rest of the JSON (except for the “credential” field) is currently hardcoded in the PowerShell cmdlet so if you want to use it, you need to change this manually!
The PowerShell cmdlet then combines the hardcoded part with an updated “credential”-field which is obtained by invoking a REST request to retrieve a new OAuth token. The returned object is modified a bit in order to match the required JSON for the datasource.
Once we have our final JSON created, we can send it to our Azure Analysis Services instance by calling the Invoke-ASCmd cmdlet from the SqlServer module.
Again, please see the original blog post mentioned above for the details of this approach.

After we have updated our datasource, we can simply call our regular processing commands which will then be executed using the newly updated credentials.
The script I wrote allows you to specify which objects to process in different ways:

  • whole database (by leaving AASTableName and AASPartitionName empty)
  • a single or multiple table and all its partitions (by leaving only AASPartitionName empty)
  • or multiple partitions of a single table (by specifying exactly one AASTableName and multiple AASPartitionNames

If multiple tables or partitions are specified, the elements are separated by commas (“,”)

So to make the Runbook work in your environment, follow all the initial steps as described in the original blog post from Microsoft. In addition, you also need to create an Application (Type = “Native”) in your Azure Active Directory to obtain the OAuth token programmatically. This application needs the “Sign in and read user profile” permission from the API “Windows Azure Active Directory (Microsoft.Azure.ActiveDirectory)”:
AAD_App_Permissions
Also remember the ApplicationID, it will be used as a parameter for the final PowerShell Runbook (=parameter “ClientID”!
When it comes to writing the PowerShell code, simply use the code from the download at the end of this blog post.

For the actual credential that you are using, make sure that it has the following permissions:

  • to update the AAS datasource (can be set in the AAS model or for the whole server)
  • has access to the required ADLS files/folders which are processed (can be set e.g. via ADLS Data Explorer)
  • (if you previously used your own account to do all the AAS and ADLS development, this should work just fine)

In general, a similar approach should work for all kinds of datasources that require OAuth authentication but so far I have only tested it with Azure Data Lake Store!

Download: AAS_Process_OAuth_Runbook.ps1

Upgrading your reports from PowerBI to Azure Analysis Services

Since April 2017 it is possible to build reports on top of datasets that are hosted in the PowerBI service. This was announced and described here and here in more detail. This might not seem like a big deal at first sight, but it can have a huge impact on how you work with PowerBI. By separating the data model from the report, you can have two or more independent people working with the same dataset. Also, the people who build the reports in the end most not necessarily have the knowledge to build a data model – the just need to use it. So, there are some clear advantages when you split up your workbook:

  • separation of duty (data modeler vs. report builder)´
  • any number of reports on top of the same model
  • easy control over reports as the files are quite small (they only contain the definition of the report)

This is all pretty cool and, from my point of view, the way to go once you want to use the reports in production and/or have several people working on/with the same reports.

But lets go a step further. After some time, as your data model grows, you realize that the reports get slow and also the processing takes a considerable amount of time to finish. The official upgrade path will then guide you to Azure Analysis Services and you will migrate your data model to deal with the larger data volumes and make use of the flexibility in processing you gained by your upgrade. This migration process is very well described here.

So far so good, but what happens to your reports? Last week I was in exactly the position described above and we had to migrate the existing reports (which were base on a dataset hosted in PowerBI) to Azure Analysis Services. As of now, there is now simple way to simply change the connection string from PowerBI to Azure Analysis Services neither in PBI Desktop nor in the Service. But we could think of some options how it might work:

  • rebuild all reports
  • use the REST API to update the connection string of the existing reports
  • modify the .pbix file manually (NOT OFFICIALLY SUPPORTED)

As you can imagine, rebuilding all reports was not really an option.

The next option, the PowerBI REST API looked pretty promising at first sight. It allows you to retrieve and set the dataset that is used by your report. So the idea is to simply create a new dataset which points to Azure Analysis Services in Live Query mode, take the existing report and use the Rebind API call to bind it to the new AAS dataset. Even though this is supposed to work, I could not make it work in my environment. I tried all things that I could think of but nothing work and I also gave up on this.

So I was stuck there but knew that the information of the data source has to be somewhere in the .pbix file. In the past I already did something similar with Excel/PowerPivot files (“Restoring a SSAS Tabular model to PowerPivot”) so I thought I would also give it a try for .pbix files. And it turns out that they are quite similar. For those of you who are new to this, most (if not all) of the files that are associated with a Microsoft tool and end with “x” (e.g. .xlsx/docx/…) are just ZIP-files in the end. To unzip them, simply rename them to .zip and use your favorite zip-tool to open them. You will see a file-structure similar to the one below:
pbix zip file content

(If your file contains a data model, you see a file called “DataModelSchema” instead of “Connections”. The next steps will not work in this case!). However, in our case, as the report is linked to a dataset hosted in the PowerBI service, our file does not contain any data itself but only the connection information to our data source. As you can guess, this information is stored in the “Connections” file.

To see what a connection to an Azure Analysis Services dataset looks like, I simply created a new PowerBI desktop model and established a Live Connection. Saved it and opened it again as zip file. The Connection file itself is just a JSON but the details are not really relevant here. I simply replaced the Connections file from my original report with the one from my new workbook linked to AAS. Renamed it back to pbix, opened it and voilà, my report was connected to AAS!

This saved us a lot of time and we could move all of our reports within a couple of hours!

Please keep in mind, that this is not officially supported and might break your model. So make sure to always create a backup before you modify the contents of a pbix file manually!
I do not take any responsibility for any broken models or anything else that might happen!

Refresh PowerBI Datasets using PowerShell and Azure Runbooks

In June 2017, Microsoft announced a new set of API function to manage data refreshes in PowerBI. The new API basically allows you to trigger a refresh or retrieve the history of previously executed refreshes. The full specification can be found in the official MSDN documentation, or using this direct links: Refresh dataset and Get dataset refresh history

So besides the scheduled and manual refreshes from within the PowerBI service directly, we now have a third option to trigger refreshes but this time also from an external caller! This itself is already pretty awesome and some people already did some cool stuff leveraging the new API functions:

Charles Sterling: Running the Power BI Refresh API’s Headless
Sirui Sun: Git-Repository powerbi-powershell

The basic idea is to use object from pre-built Azure Management DLLs to generate the OAuth Access token that is necessary to use the API. This works very well locally but cannot be used in the cloud – e.g. in combination with Azure Automation Runbooks or Azure Functions where you cannot install or reference any custom DLLs.

In this blog post I will show you how you can accomplish exactly this  – create an Azure Automation Runbook to refresh your PowerBI dataset!
But first of all there are some things that you need to keep in mind:

  1. There are no service accounts in PowerBI so we will always use a “real” user
  2. you need to supply the credentials of a “real” user
  3. The user needs to have appropriate access to the dataset in order to refresh it
  4. the dataset refresh must succeed if you do it manually in PowerBI
  5. you are still limited to 8 refreshes/day through the API

OK, so lets get started. First of all we need an Azure Application which has permissions in PowerBI. The easiest way to do this is to use the navigate to https://dev.powerbi.com/apps, log in with your account and simply follow the steps on the screen. The only import thing is to select the App Type “Native app”. At the end, you will receive a ClientID and a ClientSecret – Please remember the ClientID for later use!

Next step is to create the Azure Runbook. There are plenty of tutorials out there on how to do this: My first PowerShell workflow runbook or Creating or importing a runbook in Azure Automation so I will no go into much more detail here. Besides the runbook itself you also need to create an Automation Credential to store the username and password in a secure way – here is a tutorial for this: Credential Assets in Azure Automation

Now lets take a look at the PowerShell code. Instead of using any pre-built DLLs I removed all unnecessary code and do all the communication using Invoke-RestMethod. This is a very low-level function and is part of the standard PowerShell modules so there is no need to install anything! The tricky part is to acquire an Authentication Token using username/password as it is nowhere documented (at least I could not find it) what the REST call has to look like. So I used Fiddler to track the REST calls that the pre-built DLLs use and rebuilt them using Invoke-RestMethod. This is what I came up with:

Get Authentication Token
$authUrl = "https://login.windows.net/common/oauth2/token/"
$body = @{
    "resource" =https://analysis.windows.net/powerbi/api";
    "client_id" = $clientId;
    "grant_type" = "password";
    "username" = $pbiUsername;
    "password" = $pbiPassword;
    "scope" = "openid"
}

$authResponse = Invoke-RestMethod -Uri $authUrlMethod POST -Body $body

$clientId is the ClientID of the Azure AD Application
$pbiUsername is the email address of the PowerBI user.
$pbiPassword is the password of the PowerBI user.
The $authRepsonse then contains our Authentication token which we can use to make our subsequent calls:

Trigger Refresh in PowerBI
$restURL = "https://api.powerbi.com/v1.0/myorg/datasets/$pbiDatasetId/refreshes"
$headers = @{
    "Content-Type" = "application/json";
    "Authorization" = $authResponse.token_type + " " + $authResponse.access_token
}

$restResponse = Invoke-RestMethod -Uri $restURLMethod POST -Headers $headers

And that’s all you need. I wrapped everything into a PowerShell function that can be used as an Azure Runbook. The username/password is derived from an Azure Automation Credential.

The final runbook can be found here: PowerBI_Refresh_Runbook.ps1

Refresh_PowerBI_Dataset_Azure_Runbook

It takes 4 Parameters:

  1. CredentialName – the name of the Azure Automation credential that you created and which stores the PowerBI username and password
  2. ClientID – the ID of your Azure Active Directory Application which you created in the first step
  3. PBIDatasetName – the name of the PowerBI dataset that you want to refresh
  4. PBIGroupName – (optional) the name of the group/workspace in which the PowerBI dataset from 3) resides

When everything is working as expected, you can create custom schedules or even create webhooks to trigger the script and refresh you PowerBI dataset! As you probably know, this is really powerful as you can now make the refresh of the PowerBI dataset part of your daily ETL job!

C# Wrapper for Power BI REST API – Version 2

Some time ago, when Microsoft released the first version of Power BI Rest API I already wrote a wrapper for C# which allowed you to map the object from the API into regular C# objects and work with them locally. However, there have been some major upgrades since then (Actually they were already announced in July 2016 but I did not find anytime to work on this again until now Smile ). Anyway, I just published a new version of my C# wrapper on my GitHub site: https://github.com/gbrueckl/PowerBI.API.Client

To use it, you first need to create an Azure AD application and get an ApplicationID – this is very well described here or can be done directly at https://dev.powerbi.com/apps

A new PowerBI API Client object can then be created using the ApplicationID:

Create a PowerBI API Client
PBIAPIClient pbic = new PBIAPIClient(ApplicationID);

Basically, most of the features described in the API reference are also included in the API Wrapper. So you can now use C# to create your PowerBI model locally and deploy it to the PowerBI service! The only to keep in mind is that the dataset you create via the API can only be sourced by pushing data into it using the Push/Streaming API. As this can be quite cumbersome sometimes, I also added the functionality to publish a whole C# DataTable with basically just two lines of code to publish your reference data/dimensions:

Publish DataTable to PowerBI
// create a regular DataTable – but could also be derived from a SQL Database!
DataTable dataTable = new DataTable();
/* populate the dataTable */
// create a PBI table from a regular DataTable object
PBITable productsTable = new PBITable(dataTable);
// publish the table and push the rows from the dataTable to the PowerBI table
productsTable.PublishToPowerBI(true);

 

This snippet basically deploys the table structure to the PowerBI service and populates it with data from the DataTable:
Published_DataTable

 

For your “fact”-data you can also create single rows on your own using the PBIRow-object and publish them manually e.g. for WriteBack-scenarios:

Publish Rows to PowerBI
salesTable.DeleteRowsFromPowerBI();
PBIRow row = salesTable.GetSampleRow();
row.SetValue("ProductKey", 1);
row.SetValue("SalesDate", DateTime.Now);
row.SetValue("Amount_BASE", 100);
salesTable.PushRowToPowerBI(row);

Depending on the type of DataSet you choose (Push, PushStreaming or Streaming), you can also create DAX Measures or Relationships:

Add Measures and Relationships
salesTable.Measures.Add(new PBIMeasure("Sales Amount", "SUM('{0}'[{1}])", tableNameFacts, "Amount_BASE")); // adding a measure
dataset.Relationships.Add(new PBIRelationship("MyRelationship", salesTable.GetColumnByName("ProductKey"), productsTable.GetColumnByName("ProductKey")));

 

Of course, all the features that were already supported in the first version, are still supported:

  • Get Embed-URLs of Reports and Tiles
  • List Reports, Dashboards, Datasets, …

The new version also supports Streaming and PushStreaming datasets in the same way as it does for regular Push datasets. For details on Streaming datasets please take a look at Real-time streaming in PowerBI

I recommend to explore the API on your own by simply building your first PowerBI Push/Streaming Model on your own!
For the latest features and improvements please refer to the GitHub repository which will be updated frequently.

Any feedback and participation in the further development is highly appreciated and will be done via the GitHub repository.