Score whole PowerBI DataSets dynamically in Azure ML

One of the most requested features when it comes to Azure ML is, and has always been, the integration into PowerBI. As of today we are still lacking a native connector in PowerBI that would allow us to query a published Azure ML web service directly and score our datasets. Reason enough for me to dig into this issue and create some Power Query M scripts to do exactly that. But let's first start off with the basics of Azure ML Web Services.

Every Azure ML project can be published as a Web Service with just a single click. Once it is published, it can be used like any other Web Service. Usually we would send a record or a whole dataset to the Web Service, the Azure ML model does some scoring (or any other operation within Azure ML) and then sends the scored result back to the client. This is straightforward and Microsoft even supplies samples for the most common programming languages. The Web Service relies on a standardized REST API which can basically be called by any client – and yes, in our case this client will be PowerBI using Power Query.
Rui Quintino has already written an article on AzureML Web Service Scoring with Excel and Power Query, and Chris Webb wrote a more generic one on POST requests in Power Query: Web Service and POST requests in Power Query. Microsoft also recently published an article here on how you can use the R integration of Power Query to call an Azure ML Web Service.

Having tried these solutions, I have to admit that they have some major issues:
1) very static / hard coded
2) complex to write
3) operate on a row-by-row basis and might run into the API Call Limits as discussed here.
4) need a local R installation

As Azure ML usually deals with tables, which are basically Power Query DataSets, the natural requirement is to use a Power Query DataSet directly. The DataSet has to be converted dynamically into the required JSON structure to be POSTed to Azure ML. The returned result, usually again a table, should then be converted back into a Power Query DataSet. And that's exactly what I did: I wrote a function that does all of this for you. All the information that you have to supply can be found in the configuration of your Azure ML Web Service:
– Request URI of your Web Service
– API Key
– the [Table to Score]

The [Table to Score] can be any Power Query table but of course has to have the very same structure (including column names and data types) as expected by the Web Service input. Then you can simply call my function:
InvokeFunction_PowerBI_DataSet_in_AzureML
Score_PowerBI_DataSet_in_AzureML

The whole process involves a lot of JSON conversions and is kind of complex, but as I encapsulated everything into M functions it should be quite easy to use by simply calling the CallAzureMLService function.
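A call could then look roughly like this (a minimal sketch only – the Request URI, the API key and the table name are placeholders, and the parameter order is just assumed here):

    let
        // placeholders – copy the real values from your Azure ML Web Service configuration
        ServiceUri   = "https://ussouthcentral.services.azureml.net/workspaces/.../execute?api-version=2.0&details=true",
        ApiKey       = "<your API key>",
        TableToScore = MyDataSet,    // any Power Query table matching the Web Service input

        ScoredTable  = CallAzureMLService(ServiceUri, ApiKey, TableToScore)
    in
        ScoredTable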

However, here is a little description of the used functions:
ToAzureMLJson – converts any object that is passed in as an argument into a JSON element. If you pass in a table, it is converted to a JSON array. Dates and numbers are formatted correctly, etc., so the result can then be passed directly to Azure ML.

AzureMLJsonToTable – converts the returned JSON back to a Power Query Table. It obeys column names and also data types as defined in the Azure ML Web Service output. If the output changes (e.g. new columns are added) this will be taken care of dynamically!

CallAzureMLService – uses the two functions from above to convert a table to JSON, POST the JSON to Azure ML and convert the result back into a Power Query table.
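Just to give you an idea of what happens under the hood, a stripped-down version of such a function could look like the sketch below. This is not the exact code from the workbook, but it shows the general pattern of POSTing JSON to a classic Azure ML Request/Response service, which expects the API key as a Bearer token in the Authorization header:

    let
        CallAzureMLServiceSketch = (uri as text, apiKey as text, tableToScore as table) as table =>
            let
                // build the JSON payload from the input table (function described above)
                jsonBody = ToAzureMLJson(tableToScore),
                // supplying Content turns Web.Contents into a POST request
                response = Web.Contents(uri,
                    [
                        Headers = [
                            Authorization   = "Bearer " & apiKey,
                            #"Content-Type" = "application/json"
                        ],
                        Content = Text.ToBinary(jsonBody)
                    ]),
                // parse the JSON answer and convert it back into a table (function described above;
                // whether it expects the raw text or the parsed document may differ in the actual workbook)
                result = AzureMLJsonToTable(Json.Document(response))
            in
                result
    in
        CallAzureMLServiceSketch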

Known Issues:
As the [Table to Score] will probably come from a SQL DB or somewhere else, you may run into issues with Privacy Levels/Settings and the Formula Firewall. In this case make sure to enable Fast Combine for your workbook as described here.

The maximum timeout of a Request/Response call to an Azure ML Web Service is 100 seconds. If your call exceeds this limit, you might get an error message returned. I ran a test and tried to score 60k rows (with 2 numeric columns) at once and it worked just fine, but I would assume that you can run into some Azure ML limits here very easily with bigger data sets. As far as I know, these 100 seconds apply to the Azure ML processing itself only; if it takes several minutes to upload your dataset in the POST request, then this is not part of these 100 seconds. If you are still hitting this limit, you could further try to split your table into different batches, score them separately and combine the results again afterwards (see the sketch below).
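Such a batching approach could be sketched in M like this (hypothetical: the batch size of 10,000 rows as well as the names ServiceUri, ApiKey and TableToScore are just examples):

    let
        // split the input table into chunks of max. 10,000 rows each
        Batches       = Table.Split(TableToScore, 10000),
        // score every chunk separately against the Web Service
        ScoredBatches = List.Transform(Batches, each CallAzureMLService(ServiceUri, ApiKey, _)),
        // append the scored chunks back into one result table
        Combined      = Table.Combine(ScoredBatches)
    in
        Combined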


So these are the steps that you need to do in order to use your Azure ML Web Service together with PowerBI:
1) Create an Azure ML Experiment (or use an existing one)
2) Publish the Experiment as a Web Service
3) Note the URL and the API Key of your Web Service
4) Run PowerBI and load the data that you want to score
5) Make sure that the dataset created in 4) has the exact same structure as expected by Azure ML (column names, data types, …)
6) Call the function “CallAzureMLWebService” with the parameters from 3) and 5)
7) Wait for the Web Service to return the result set
8) Load the final table into PowerBI (or do some further transformations first)

And that’s it!

Download:
You can find a PowerBI workbook which contains all the functions and code here: CallAzureMLWebService.pbix
I used a simple Web Service which takes 2 numeric columns (“Number1” and “Number2”) and returns [Number1] * [Number2] and [Number1] / [Number2].

PS: You will not be able to run the sample as it is, as I changed the API Key and also the URL of my original Azure ML Web Service.

Running Local R Scripts in Power BI

One of the coolest features of Power BI is that it integrates very well with other tools and also offers a lot of interfaces which can be used to extend these capabilities even further. One of those is the R integration, which allows you to run R code from within Power BI. R scripts can either be used as a data source or for visualizing your data. In this post I will focus on the data source component and show how you can use a locally stored R script and execute it directly in Power BI. Compared to the native approach, where you need to embed the R code in the Power BI file, this has several advantages:

  • Develop R script in familiar external tool like RStudio
  • Integration with Source Control
  • Leverage Power BI for publishing and visualizing results

Out of the box, Power BI only supplies one function to call R scripts as a data source, which is R.Execute(text). Usually, when you use the wizard, it simply passes your R script as a hardcoded value to this function. Knowing the power of Power BI and its scripting language M for data integration made me think – “Hey, as R scripts are just text files and Power BI can read text files, I could also dynamically read any R script and execute it!”

Well, it turns out to be true! So I created a little M function where I pass in the file path of an existing R script and which returns a table of the data frames created during the execution of the script. Those can then be used like any other datasets/tables within Power BI:
Power_BI_R_DataSource_dynamic_local_script

And here is the corresponding M code for the Power Query function:
(Thanks also to Imke Feldmann for simplifying my original code to the readable one below)

    let
        LoadLocalRScript = (file_path as text) as table =>
            let
                Source = Csv.Document(File.Contents(file_path), [Delimiter = "#(lf)", Columns = 1, Encoding = 1252, QuoteStyle = QuoteStyle.None]),
                RScript = Text.Combine(Source[Column1], "#(lf)"),
                output = R.Execute(RScript)
            in
                output
    in
        LoadLocalRScript

First we read the R script like any other regular CSV file, but we use a line feed (“#(lf)”) as the delimiter. So we get a table with one column and one row for each line of our original R script.
Then we use Text.Combine() on that column to transform the single lines back into one long text resembling our original R script. This text can then be passed to the R.Execute() function, which returns the list of R data frames created during the execution of the script.
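Calling the function is then as simple as this (the file path is of course just a hypothetical example):

    LoadLocalRScript("C:\R Scripts\Sample_R_Script.r")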

And that’s it! Any further steps are similar to using a regular R script which is embedded in Power BI, so it is up to you how you proceed from here. Just one thing to keep in mind: changing the local R script might break the Power BI load if you changed or deleted any data frames which are referenced in Power BI later on.

One issue that I came across during my tests is that this approach does not work with scheduled refreshes in the Power BI Web Service via the Personal Gateway. The first reason for this is that it is currently not possible to use scheduled refresh if custom functions are involved. Even though you can work around this issue pretty easily by using the code from above directly in Power Query, I still ran into issues with different privacy levels for the location of the R script and the R.Execute() function. But I will investigate those issues further and update this blog post accordingly (see UPDATE below).
For the future I hope that this gets fixed by Microsoft and that Power BI allows you to execute remote scripts natively – but until then, this approach has worked quite well for me.

UPDATE:
To make the refresh via the Personal Gateway work you have to enable “FastCombine”. How to do this is described in more detail here: Turn on FastCombine for Personal Gateway.

In case you are interested in more details on this approach: I am speaking at TugaIT in Lisbon, Portugal this Friday (20th of May 2016) about “Power BI for the Data Scientist”, where I will cover this and lots of other interesting topics about the daily work of a data scientist and how PowerBI can be used to ease it.

Downloads:
Power BI Workbook: Load_Local_R_Script_wFunction.pbix
Sample R Script: Sample_R_Script.r

Visualizing SSAS Calculation Dependencies using PowerBI


UPDATE: This does not work for Tabular Models in Compatibility Level 120 or above as they do not expose the calculation dependencies anymore!


One of my best practices when designing bigger SQL Server Analysis Services (SSAS) Tabular models is to nest calculations whenever possible. The reasons for this should be quite obvious:

  • no duplication of logic
  • easier to develop and maintain
  • (caching)

However, this also comes with a slight drawback: after having created multiple layers of nested calculations, it can be quite hard to tell which measures a top-level calculation actually depends on. Fortunately, the SSAS engine exposes these calculation dependencies in one of its DMVs – DISCOVER_CALC_DEPENDENCY.
This DMV basically contains information about all calculations in the model:

  • Calculated Measures
  • Calculated Columns
  • Relationships
  • Dependencies to Tables/Columns
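If you just want to have a quick look at the raw content of this DMV from within Power BI, a few lines of M like the following could be used (a sketch only – server and database names are placeholders):

    let
        // placeholder server and database – replace with your SSAS Tabular instance
        Source = AnalysisServices.Database("localhost\tabular", "MyTabularModel",
                     [Query = "SELECT * FROM $SYSTEM.DISCOVER_CALC_DEPENDENCY"])
    in
        Source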

Chris Webb already blogged about this DMV some time ago and showed some basic (tabular) visualization within an Excel pivot table (here). My post focuses on PowerBI and how we can make the content of this DMV much more appealing and visualize it in a way that is very easy to understand.
As the DMV is built up like a parent-child hierarchy, I had to use a recursive M function to resolve this self-referencing table, which actually was the hardest part. Each row contains a link to a dependent object, which can have further dependencies of its own. In order to visualize this properly and let the user select a calculation of his choice to see its calculation tree, I needed to expand each row with all of its dependencies, keeping their link to the root node:

Here is a little example:

Object    Referenced_Object
A         B
B         C

The table above is resolved to this table:

Root    Object    Referenced_Object
A       A         B
A       B         C
B       B         C

The Root-column is then used to filter and get all dependent calculations.
The PowerBI file also contains some other M-functions but those are mainly for ease-of-use and to keep the queries simple.
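Just for illustration, a minimal sketch of such a recursive resolution could look like this in M. This is not the exact code from the PowerBI file; it assumes a dependency table with the columns [Object] and [Referenced_Object] and no circular references:

    let
        // returns all direct and indirect dependencies of a given object
        GetDescendants = (dependencies as table, object as text) as table =>
            let
                direct   = Table.SelectRows(dependencies, each [Object] = object),
                children = Table.Combine(
                               List.Transform(direct[Referenced_Object],
                                   each @GetDescendants(dependencies, _))),
                result   = Table.Combine({direct, children})
            in
                result,
        // expands every object into its full dependency tree and adds the [Root] column
        ResolveWithRoot = (dependencies as table) as table =>
            Table.Combine(
                List.Transform(List.Distinct(dependencies[Object]),
                    (root) => Table.AddColumn(GetDescendants(dependencies, root), "Root", each root)))
    in
        ResolveWithRoot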

Once all the data was loaded into the model, I could use one of PowerBI’s custom visuals from the PowerBI Gallery – the Sankey Chart with Labels
SSAS_Visualizing_Tabular_Calc_Dependencies

Here is also an interactive version using the Publishing Feature of Power BI:


You can use the slicers to filter on the Table, the Calculation Type and the Calculation itself, and the visual shows all the dependencies down to the physical objects, being tables and columns. This makes it a lot easier to understand your model and the dependencies that you built up over time.
I attached the sample PowerBI file below. You simply need to change the connection string to your SSAS Tabular server and refresh the data connections.

The PowerBI-file (*.pbix) can be downloaded here: SSAS_CalcDependencies.pbix

Error-handling in Power Query

Data is the daily bread and butter of any analyst. In order to provide good results you also need good data. Sometimes this data is very well prepared beforehand and you can use it as it is, but it is also very common that you need to prepare and transform the data on your own. To do this, Microsoft has introduced Power Query (one tool of the Power BI suite). Power Query can be used to extract, transform and load data into Excel and/or Power Pivot directly.

When using any data, you usually know what the data looks like and what to expect from certain columns – e.g. a value is delivered as a number, a text contains exactly 4 characters, etc.
However, sometimes this does not apply to all rows of that dataset and your transformation logic may cause errors because of that. In order to avoid this and still have a clean data load, you need to handle those errors. Power Query offers several options to do this, which I will elaborate on in this post.

This is the sample data I will use for the following samples:

A     B    C
1     4    AXI23
2     5    TZ560
NA    6    UP945

We will perform simple transformations and type casts on this table to generate some errors:

Prepare_ChangeType

Error-handling on row-level

This is the easiest way of handling errors. Whenever a transformation causes an error, we can simply remove the whole row from the result set:

RowLevel_RemoveErrors

This will generate the following result table:

A     B    C
1     4    AXI23
2     5    TZ560

As you can see, the third row was simply removed. This is the easiest way to remove errors from your result set, but of course this may not be what you want as those removed rows may contain other important information in other columns! Assume you want to calculate the SUM over all values in column B. As we removed the third row, we also removed a value from column B and the SUM is not the same as without the error handling (9 vs. 15)!


Error-handling on cell-level

As we now know that column A may result in an error, we can handle this error during our transformation steps. As “NA” could not be converted to a number, we see Error as the result for that column. Clicking on it gives us some more details on the error itself. To handle this error we need to create a new calculated column where we first check if the value can be converted to a number and otherwise return a default numeric value:

RowLevel_RemoveErrors_CalcColumn

The M function that we use is “try <expression to try> otherwise <default if error>”, which is very similar to a try-catch block in C#. If the expression causes an error, the default will be used instead. Details on the try expression can be found in the Microsoft Power Query for Excel Formula Language Specification (PDF), which can be found here and is a must-read for everyone interested in the M language.
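Applied to our sample, the calculated column could be defined somewhat like this (a sketch only; the name of the previous step and the default value 0 are assumptions):

    // return the numeric value of [A] or 0 if the conversion fails
    Table.AddColumn(Source, "A_cleaned", each try Number.From([A]) otherwise 0)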

CellLevel_HandleError

We could further replace our column A with the new column A_cleaned to hide this error-handling transformation.

A     B    C        A_cleaned
1     4    AXI23    1
2     5    TZ560    2
NA    6    UP945    0


Error-handling on cell-level with error details

There may also be cases where it is OK to have one of these errors, but you need/want to display the cause of the error so that a power user may correct the source data beforehand. Again we can use the try function, but this time without the otherwise clause. This will return a record object for each row:
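A sketch of how this could look in M (again assuming the previous step is called Source): the try-column is added first and the resulting record column is then expanded to get to the error details:

    let
        // wrap the conversion in a try WITHOUT an otherwise-clause -> returns a record per row
        AddedTry  = Table.AddColumn(Source, "A_Try", each try Number.From([A])),
        // expand the record into HasError / Value / Error
        ExpandTry = Table.ExpandRecordColumn(AddedTry, "A_Try",
                        {"HasError", "Value", "Error"},
                        {"A_Try.HasError", "A_Try.Value", "A_Try.Error"}),
        // the Error field is a record again and holds Reason, Message and Detail
        ExpandErr = Table.ExpandRecordColumn(ExpandTry, "A_Try.Error",
                        {"Reason", "Message", "Detail"},
                        {"A_Try.Error.Reason", "A_Try.Error.Message", "A_Try.Error.Detail"})
    in
        ExpandErr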

RowLevel_ShowErrors_CalcColumn

RowLevel_ShowErrors_Result1

After expanding the A_Try column and also the A_Try.Error column, we get all available information on the error:

RowLevel_ShowErrors_Result2

A     B    C        A_Try.HasError    A_Try.Value    A_Try.Error.Reason    A_Try.Error.Message             A_Try.Error.Detail
1     4    AXI23    FALSE             1
2     5    TZ560    FALSE             2
      6    UP945    TRUE                             DataFormat.Error      Could not convert to Number.    NA

As you can see, we get quite a lot of columns here. We could e.g. use A_Try.HasError to filter out error rows (similar to error handling on row-level) or we could use it in a calculated column to mimic error handling on cell-level. What you want to do with all this information is up to you, but in case you don’t need it you should remove all unnecessary columns.
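For example, filtering out the error rows based on this flag could look like this (assuming the expanded table from the sketch above is called ExpandErr):

    // keep only the rows that were converted without an error
    Table.SelectRows(ExpandErr, each [A_Try.HasError] = false)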


Downloads:

Power Query Error Handling Workbook: Power Query Error Handling.xlsx