About Gerhard Brueckl

Gerhard has been working with Microsoft BI tools since 2006 mainly focusing on Microsoft SQL Server and its components. As a consultant and architect he designed various enterprise BI solutions primary in the German-speaking areas. His personal interest has always been on analytical databases and their capabilities. From the start on he has always been active in the Analysis Services community and in 2012 he achieved the SSAS Maestro certification. This was also about the same time when he started his own blog and began speaking at international conferences all over the world.

Processing Azure Analysis Services with OAuth Sources (like Azure Data Lake Store)

As you probably know from my last blog post, I am currently upgrading the PowerBI reporting platform of one of my customer from a PowerBI backend (dataset hosted in PowerBI service) to an Azure Analysis Services backend. The upgrade/import of the dataset into Azure Analysis Services itself worked pretty flawless and after switching the connection of the reports everything worked as expected and everyone was happy. However, things got a bit tricky when it came to automatically refreshing the Azure Analysis Services database which was based on an Azure Data Lake Store. For the original PowerBI dataset, this was pretty straight forward as a scheduled refresh from an Azure Data Lake store data source works out of the box. For Azure Analysis Services this is a bit different.

When you build and deploy your data model from Visual Studio, your are prompted for the credentials to access ADLS which are then stored in the data source object of AAS. As you probably know, AAS uses OAuth authentication to access data from ADLS. And this also causes a lot of problems. OAuth is based on tokens and those tokens are only valid for a limited time, by default this is 2 hours. This basically means, that you can process your database for the next 2 hours and it will fail later on with an error message saying that the token expired. (The above applies to all OAuth sources!)
This problem is usually solved by using an Azure Service Principal instead of a regular user account where the token does not expire. Unfortunately, this is not supported at the moment for ADLS data sources and you have to work around this issue.

So the current situation that we need to solve is as follows:

  • we can only use regular user accounts to connect AAS to ADLS as service principals are not supported yet
  • the token expires after 2 hours
  • the database has to be processed on a regular basis (daily, hourly, …) without any manual interaction
  • manually updating the token is (of course) not an option

Before you continue here, make sure that you read this blog post first: https://blogs.msdn.microsoft.com/dataaccesstechnologies/2017/09/01/automating-azure-analysis-service-processing-using-azure-automation-account/
It describes the general approach of using Azure Automation to process an Azure Analysis Services model and most of the code in this blog post if based on this!
Also this older blog post will be a good read as some concepts and code snippets are reused here.

Back to our example – as we were already using Azure Automation for some other tasks, we decided to also use it here. Also, PowerShell integrates very well with other Azure components and was the language of choice for us. To accomplish our goal we had to implement 3 steps:

  1. acquire a new OAuth token
  2. update the ADLS data source with the new token
  3. run our processing script

I could copy the code for the first step more or less from one of my older blog post (here) where I used PowerShell to acquire an OAuth token to trigger a refresh in PowerBI.

The second step is to update ADLS data source of our Azure Analysis Services model. To get started, the easiest thing to do is to simply open the AAS database in SQL Server Management Studio and let it script the existing datasource for you: AAS_Script_OAuth_DataSource
The resulting JSON will look similar to this one:

Update ADLS DataSource
{
  "createOrReplace": {
    "object": {
      "database": "Channel Analytics",
      "dataSource": "DS_ADLS"
    },
    "dataSource": {
      "type": "structured",
      "name": "DS_ADLS",
      "connectionDetails": {
        "protocol": "data-lake-store",
        "address": {
          "url": "https://mydatalake.azuredatalakestore.net"
        }
      },
      "credential": {
        "AuthenticationKind": "OAuth2",
        "token_type": "********",
        "scope": "********",
        "expires_in": "********",
        "ext_expires_in": "********",
        "expires_on": "********",
        "not_before": "********",
        "resource": "********",
        "id_token": "********",
        "kind": "DataLake",
        "path": "https://mydatalake.azuredatalakestore.net/",
        "RefreshToken": "********",
        "AccessToken": "********"
      }
    }
  }
}

The important part for us is the “credential” field. It contains all the information necessary to authenticate against our ADLS store. However, most of this information is sensitive so only asterisks are displayed in the script. The rest of the JSON (except for the “credential” field) is currently hardcoded in the PowerShell cmdlet so if you want to use it, you need to change this manually!
The PowerShell cmdlet then combines the hardcoded part with an updated “credential”-field which is obtained by invoking a REST request to retrieve a new OAuth token. The returned object is modified a bit in order to match the required JSON for the datasource.
Once we have our final JSON created, we can send it to our Azure Analysis Services instance by calling the Invoke-ASCmd cmdlet from the SqlServer module.
Again, please see the original blog post mentioned above for the details of this approach.

After we have updated our datasource, we can simply call our regular processing commands which will then be executed using the newly updated credentials.
The script I wrote allows you to specify which objects to process in different ways:

  • whole database (by leaving AASTableName and AASPartitionName empty)
  • a single or multiple table and all its partitions (by leaving only AASPartitionName empty)
  • or multiple partitions of a single table (by specifying exactly one AASTableName and multiple AASPartitionNames

If multiple tables or partitions are specified, the elements are separated by commas (“,”)

So to make the Runbook work in your environment, follow all the initial steps as described in the original blog post from Microsoft. In addition, you also need to create an Application (Type = “Native”) in your Azure Active Directory to obtain the OAuth token programmatically. This application needs the “Sign in and read user profile” permission from the API “Windows Azure Active Directory (Microsoft.Azure.ActiveDirectory)”:
AAD_App_Permissions
Also remember the ApplicationID, it will be used as a parameter for the final PowerShell Runbook (=parameter “ClientID”!
When it comes to writing the PowerShell code, simply use the code from the download at the end of this blog post.

For the actual credential that you are using, make sure that it has the following permissions:

  • to update the AAS datasource (can be set in the AAS model or for the whole server)
  • has access to the required ADLS files/folders which are processed (can be set e.g. via ADLS Data Explorer)
  • (if you previously used your own account to do all the AAS and ADLS development, this should work just fine)

In general, a similar approach should work for all kinds of datasources that require OAuth authentication but so far I have only tested it with Azure Data Lake Store!

Download: AAS_Process_OAuth_Runbook.ps1

Upgrading your reports from PowerBI to Azure Analysis Services

Since April 2017 it is possible to build reports on top of datasets that are hosted in the PowerBI service. This was announced and described here and here in more detail. This might not seem like a big deal at first sight, but it can have a huge impact on how you work with PowerBI. By separating the data model from the report, you can have two or more independent people working with the same dataset. Also, the people who build the reports in the end most not necessarily have the knowledge to build a data model – the just need to use it. So, there are some clear advantages when you split up your workbook:

  • separation of duty (data modeler vs. report builder)´
  • any number of reports on top of the same model
  • easy control over reports as the files are quite small (they only contain the definition of the report)

This is all pretty cool and, from my point of view, the way to go once you want to use the reports in production and/or have several people working on/with the same reports.

But lets go a step further. After some time, as your data model grows, you realize that the reports get slow and also the processing takes a considerable amount of time to finish. The official upgrade path will then guide you to Azure Analysis Services and you will migrate your data model to deal with the larger data volumes and make use of the flexibility in processing you gained by your upgrade. This migration process is very well described here.

So far so good, but what happens to your reports? Last week I was in exactly the position described above and we had to migrate the existing reports (which were base on a dataset hosted in PowerBI) to Azure Analysis Services. As of now, there is now simple way to simply change the connection string from PowerBI to Azure Analysis Services neither in PBI Desktop nor in the Service. But we could think of some options how it might work:

  • rebuild all reports
  • use the REST API to update the connection string of the existing reports
  • modify the .pbix file manually (NOT OFFICIALLY SUPPORTED)

As you can imagine, rebuilding all reports was not really an option.

The next option, the PowerBI REST API looked pretty promising at first sight. It allows you to retrieve and set the dataset that is used by your report. So the idea is to simply create a new dataset which points to Azure Analysis Services in Live Query mode, take the existing report and use the Rebind API call to bind it to the new AAS dataset. Even though this is supposed to work, I could not make it work in my environment. I tried all things that I could think of but nothing work and I also gave up on this.

So I was stuck there but knew that the information of the data source has to be somewhere in the .pbix file. In the past I already did something similar with Excel/PowerPivot files (“Restoring a SSAS Tabular model to PowerPivot”) so I thought I would also give it a try for .pbix files. And it turns out that they are quite similar. For those of you who are new to this, most (if not all) of the files that are associated with a Microsoft tool and end with “x” (e.g. .xlsx/docx/…) are just ZIP-files in the end. To unzip them, simply rename them to .zip and use your favorite zip-tool to open them. You will see a file-structure similar to the one below:
pbix zip file content

(If your file contains a data model, you see a file called “DataModelSchema” instead of “Connections”. The next steps will not work in this case!). However, in our case, as the report is linked to a dataset hosted in the PowerBI service, our file does not contain any data itself but only the connection information to our data source. As you can guess, this information is stored in the “Connections” file.

To see what a connection to an Azure Analysis Services dataset looks like, I simply created a new PowerBI desktop model and established a Live Connection. Saved it and opened it again as zip file. The Connection file itself is just a JSON but the details are not really relevant here. I simply replaced the Connections file from my original report with the one from my new workbook linked to AAS. Renamed it back to pbix, opened it and voilà, my report was connected to AAS!

This saved us a lot of time and we could move all of our reports within a couple of hours!

Please keep in mind, that this is not officially supported and might break your model. So make sure to always create a backup before you modify the contents of a pbix file manually!
I do not take any responsibility for any broken models or anything else that might happen!

Refresh PowerBI Datasets using PowerShell and Azure Runbooks

In June 2017, Microsoft announced a new set of API function to manage data refreshes in PowerBI. The new API basically allows you to trigger a refresh or retrieve the history of previously executed refreshes. The full specification can be found in the official MSDN documentation, or using this direct links: Refresh dataset and Get dataset refresh history

So besides the scheduled and manual refreshes from within the PowerBI service directly, we now have a third option to trigger refreshes but this time also from an external caller! This itself is already pretty awesome and some people already did some cool stuff leveraging the new API functions:

Charles Sterling: Running the Power BI Refresh API’s Headless
Sirui Sun: Git-Repository powerbi-powershell

The basic idea is to use object from pre-built Azure Management DLLs to generate the OAuth Access token that is necessary to use the API. This works very well locally but cannot be used in the cloud – e.g. in combination with Azure Automation Runbooks or Azure Functions where you cannot install or reference any custom DLLs.

In this blog post I will show you how you can accomplish exactly this  – create an Azure Automation Runbook to refresh your PowerBI dataset!
But first of all there are some things that you need to keep in mind:

  1. There are no service accounts in PowerBI so we will always use a “real” user
  2. you need to supply the credentials of a “real” user
  3. The user needs to have appropriate access to the dataset in order to refresh it
  4. the dataset refresh must succeed if you do it manually in PowerBI
  5. you are still limited to 8 refreshes/day through the API

OK, so lets get started. First of all we need an Azure Application which has permissions in PowerBI. The easiest way to do this is to use the navigate to https://dev.powerbi.com/apps, log in with your account and simply follow the steps on the screen. The only import thing is to select the App Type “Native app”. At the end, you will receive a ClientID and a ClientSecret – Please remember the ClientID for later use!

Next step is to create the Azure Runbook. There are plenty of tutorials out there on how to do this: My first PowerShell workflow runbook or Creating or importing a runbook in Azure Automation so I will no go into much more detail here. Besides the runbook itself you also need to create an Automation Credential to store the username and password in a secure way – here is a tutorial for this: Credential Assets in Azure Automation

Now lets take a look at the PowerShell code. Instead of using any pre-built DLLs I removed all unnecessary code and do all the communication using Invoke-RestMethod. This is a very low-level function and is part of the standard PowerShell modules so there is no need to install anything! The tricky part is to acquire an Authentication Token using username/password as it is nowhere documented (at least I could not find it) what the REST call has to look like. So I used Fiddler to track the REST calls that the pre-built DLLs use and rebuilt them using Invoke-RestMethod. This is what I came up with:

Get Authentication Token
$authUrl = "https://login.windows.net/common/oauth2/token/"
$body = @{
    "resource" =https://analysis.windows.net/powerbi/api";
    "client_id" = $clientId;
    "grant_type" = "password";
    "username" = $pbiUsername;
    "password" = $pbiPassword;
    "scope" = "openid"
}

$authResponse = Invoke-RestMethod -Uri $authUrlMethod POST -Body $body

$clientId is the ClientID of the Azure AD Application
$pbiUsername is the email address of the PowerBI user.
$pbiPassword is the password of the PowerBI user.
The $authRepsonse then contains our Authentication token which we can use to make our subsequent calls:

Trigger Refresh in PowerBI
$restURL = "https://api.powerbi.com/v1.0/myorg/datasets/$pbiDatasetId/refreshes"
$headers = @{
    "Content-Type" = "application/json";
    "Authorization" = $authResponse.token_type + " " + $authResponse.access_token
}

$restResponse = Invoke-RestMethod -Uri $restURLMethod POST -Headers $headers

And that’s all you need. I wrapped everything into a PowerShell function that can be used as an Azure Runbook. The username/password is derived from an Azure Automation Credential.

The final runbook can be found here: PowerBI_Refresh_Runbook.ps1

Refresh_PowerBI_Dataset_Azure_Runbook

It takes 4 Parameters:

  1. CredentialName – the name of the Azure Automation credential that you created and which stores the PowerBI username and password
  2. ClientID – the ID of your Azure Active Directory Application which you created in the first step
  3. PBIDatasetName – the name of the PowerBI dataset that you want to refresh
  4. PBIGroupName – (optional) the name of the group/workspace in which the PowerBI dataset from 3) resides

When everything is working as expected, you can create custom schedules or even create webhooks to trigger the script and refresh you PowerBI dataset! As you probably know, this is really powerful as you can now make the refresh of the PowerBI dataset part of your daily ETL job!

C# Wrapper for Power BI REST API – Version 2

Some time ago, when Microsoft released the first version of Power BI Rest API I already wrote a wrapper for C# which allowed you to map the object from the API into regular C# objects and work with them locally. However, there have been some major upgrades since then (Actually they were already announced in July 2016 but I did not find anytime to work on this again until now Smile ). Anyway, I just published a new version of my C# wrapper on my GitHub site: https://github.com/gbrueckl/PowerBI.API.Client

To use it, you first need to create an Azure AD application and get an ApplicationID – this is very well described here or can be done directly at https://dev.powerbi.com/apps

A new PowerBI API Client object can then be created using the ApplicationID:

Create a PowerBI API Client
PBIAPIClient pbic = new PBIAPIClient(ApplicationID);

Basically, most of the features described in the API reference are also included in the API Wrapper. So you can now use C# to create your PowerBI model locally and deploy it to the PowerBI service! The only to keep in mind is that the dataset you create via the API can only be sourced by pushing data into it using the Push/Streaming API. As this can be quite cumbersome sometimes, I also added the functionality to publish a whole C# DataTable with basically just two lines of code to publish your reference data/dimensions:

Publish DataTable to PowerBI
// create a regular DataTable – but could also be derived from a SQL Database!
DataTable dataTable = new DataTable();
/* populate the dataTable */
// create a PBI table from a regular DataTable object
PBITable productsTable = new PBITable(dataTable);
// publish the table and push the rows from the dataTable to the PowerBI table
productsTable.PublishToPowerBI(true);

 

This snippet basically deploys the table structure to the PowerBI service and populates it with data from the DataTable:
Published_DataTable

 

For your “fact”-data you can also create single rows on your own using the PBIRow-object and publish them manually e.g. for WriteBack-scenarios:

Publish Rows to PowerBI
salesTable.DeleteRowsFromPowerBI();
PBIRow row = salesTable.GetSampleRow();
row.SetValue("ProductKey", 1);
row.SetValue("SalesDate", DateTime.Now);
row.SetValue("Amount_BASE", 100);
salesTable.PushRowToPowerBI(row);

Depending on the type of DataSet you choose (Push, PushStreaming or Streaming), you can also create DAX Measures or Relationships:

Add Measures and Relationships
salesTable.Measures.Add(new PBIMeasure("Sales Amount", "SUM('{0}'[{1}])", tableNameFacts, "Amount_BASE")); // adding a measure
dataset.Relationships.Add(new PBIRelationship("MyRelationship", salesTable.GetColumnByName("ProductKey"), productsTable.GetColumnByName("ProductKey")));

 

Of course, all the features that were already supported in the first version, are still supported:

  • Get Embed-URLs of Reports and Tiles
  • List Reports, Dashboards, Datasets, …

The new version also supports Streaming and PushStreaming datasets in the same way as it does for regular Push datasets. For details on Streaming datasets please take a look at Real-time streaming in PowerBI

I recommend to explore the API on your own by simply building your first PowerBI Push/Streaming Model on your own!
For the latest features and improvements please refer to the GitHub repository which will be updated frequently.

Any feedback and participation in the further development is highly appreciated and will be done via the GitHub repository.

Deploying an Azure Data Factory project as ARM Template

In my last post I wrote about how to Debug Custom .Net Activities in Azure Data Factory locally. This fixes one of the biggest issues in Azure Data Factory at the moment for developers. The next bigger problem that you will run into is when it comes to deploying your Azure Data Factory project. At the moment, you can only do it manually from Visual Studio which, for bigger projects, can take quite some time. So I extended and advanced the code from my CustomActivityDebugger. Well, actually I rewrote some major parts of it and moved it into a new GitHub repository: Azure.DataFactory.LocalEnvironment

The new code base now includes the functionality to export an existing ADF project to an ARM template which can then be deployed very easily using Azure standard deployment mechanisms.

So basically, these are the changes and new Features that I made:

Export as ARM template:

  • Export all ADF objects and properties
  • Support for configurations
  • obey dependencies between ADF objects
  • parameterized Data Factory name
  • automatic upload of ADF dependencies (e.g. custom activities)
  • specify the region where ADF should be deployed (ADF is not available in all regions yet!)

ADF_LocalEnvironment_DeployARMTemplate

Custom Activity Debugger:

  • simplified usability – just select the pipeline, activity and set the slice-dates
  • Support for configurations
  • no need to add any namespaces
  • no need to add any references
  • write activity log to console output

ADF_LocalEnvironment_DebugActivity_Breakpoing

General:

  • Load from the ADF Project file (.dfproj) instead of a whole folder
  • implemented as Assembly
  • can be used in a Console Application for automation
  • will be published via NuGet in the future! (coming soon)

 

Everything else is described in the Git-Repository itself!

Hope you enjoy it!

Monitoring Azure Data Factory using PowerBI

Some time ago Microsoft released the first preview of a tool which allows you to monitor and control your Azure Data Factory (ADF). It is a web UI where you can select your Pipeline, Activity or Dataset and check its execution state over time. However, from my very personal point of view the UI could be much better, especially much clearer(!) as it is at the moment. But that’s not really a problem as the thing I like the most about ADF is that its quite open for developers (for example Custom C#/.Net Activities) and it also offers a quite comprehensive REST API to control an manage it.
For our monitoring purposes we are mainly interested in the LIST interface but we could do basically every operation using this API. In my example I only used the Dataset API, the Slices API and the Pipeline API.

First we start with the Dataset API to get a list of all data sets in our Data Factory. This is quite simple as we just need to build our URL of the REST web service like this:

  1. https://management.azure.com/subscriptions/{SubscriptionID}/resourcegroups/{ResourceGroupName}/providers/Microsoft.DataFactory/datafactories/{DataFactoryName}/datasets?api-version={api-version}

You can get all of this information for the Azure Portal by simply navigating to your Data Factory and checking the URL which will be similar to this one:

  1. https://portal.azure.com/#resource/subscriptions/1234567832324a04a0a66e44bf2f5d11/resourceGroups/myResourceGroup/providers/Microsoft.DataFactory/dataFactories/myDataFactory

So this would be my values for the API Call:
– {SubscriptionID} would be “12345678-3232-4a04-a0a6-6e44bf2f5d11”
– {ResourceGroupName} would be “myResourceGroup”
– {DataFactoryName} would be “myDataFactory”
– {api-version} would be a fixed value of “2015-10-01”

Once you have your URL you can use PowerBI to query the API using Get Data –> From Web
Next you need to authenticate using your Personal or Organizational Account – the same that you use to sign in to the Portal – and also the level for which you want to use the credentials. I’d recommend you to set it either to the subscription level or to the data-factory itself, depending on your security requirements. This ensures that you are not asked for credentials for each different API:
ADF_PowerBI_Authentication

This works in a very similar way also for the Slices API, the Pipeline API and all other APIs available! The other transformations I used are regular PowerQuery/M steps done via the UI so I am not going to describe them in more detail here. Also, setting up the relationships in our final PowerPivot model should be straight forward.

Now that we have all the required data in place, we can start with our report. I used some custom visuals for the calendar view, some slicers and a simple table to show the details. I also used a Sankey Chart to visualize the dependencies between the datasets.

ADF_PowerBI_Monitoring_Dashboard
ADF_PowerBI_Monitoring_Dependencies

Compared to the standard GUI for monitoring this provides a much better overview of slices and their current states and it also allows easy filtering. I am sure there are a lot of other PowerBI visualizations which would make a lot of sense here, these are just to give you an idea how it could look like, but of course you have all the freedom PowerBI offers you for reporting!

The only drawback at the moment is that you cannot reschedule/reset slices from PowerBI but for my monitoring-use-case this was not a problem at all. Also, I did not include the SliceRun API in my report as this would increase the size of the data model a lot, so detailed log information is not available in my sample report.

The whole PowerBI template is available for download on my GitHub site: https://github.com/gbrueckl/Azure.DataFactory.PowerBIMonitor

Debugging Custom .Net Activities in Azure Data Factory


UPDATE 2017-02-22:
I released a new toolset for Azure Data Factor which also integrates the Customer .Net Activity Debugger from this blob post. Please refer to the new GitHub project: https://github.com/gbrueckl/Azure.DataFactory.LocalEnvironment

(all links have been changed to refer to the new repository!)


Azure Data Factory (ADF) is one of the newer tools of the whole Microsoft Data Platform on Azure. It is Microsoft’s Data Integration tool, which allows you to easily load data from you on-premises servers to the cloud (and also the other way round). It comes with some handy templates to copy data fro various sources to any available destination. However, when the Extract-Transform-Load (ETL) or ELT steps get more complicated you will hit the (current) out-of-the-box limits of Azure Data Factory pretty soon. But this is OK as ADF is a very open platform and allows you to integrate so called “Custom Activities”. These can either be .Net/C# Activities or HDInsight Activities. In this post we will focus on .Net Activities and how to develop and debug them in an efficient way.

A .Net Activity is basically just a .dll which implements a specific Interface (IDotNetActivity)and is then executed by the Azure Data Factory. To be more precise here, the .dll (and all dependencies) are copied to an Azure Batch Node which then executes the code when the .Net Activity is scheduled by ADF. So far so good, but the tricky part is to actually develop the .Net code, test, and debug it. Well, not the code itself but the more or less complex integration with the ADF Interface which you are very likely not familiar with in the beginning. In such cases it usually helps to run the code locally, step into the different code paths and examine the C# objects and their values. The problem is that you do not have a local instance of ADF on your workstation which you could use the start the .Net Activity and debug it interactively in Visual Studio.
So I wrote my own tool which you can add to the Solution that already contains the code of your Custom .Net Activity. Then you can simply link the CustomActivityDebugger to the JSON definitions and configurations of your ADF project, reference your custom code, configure some other things like SliceStart/SliceEnd and you are ready to go.
Once you start the CustomActivityDebugger it will read all ADF files and settings and basically create a local ADF environment which helps you to debug your custom .Net Activity using all settings and parameters as they would be passed in when the code is executed on the Azure Batch Node.

This little picture shows the CustomActivityDebugger in action – debugging custom .Net activities is now like debugging any other code:
Debugger_in_Action

All the sources including a simple ADF Project, a simple Custom Activity and setup instructions are available on my GitHub site:

https://github.com/gbrueckl/Azure.DataFactory.CustomActivityDebugger
https://github.com/gbrueckl/Azure.DataFactory.LocalEnvironment

Feel free to use it as it is and/or extend it to your needs.

Upcoming Conferences – Fall and Winter 2016

The next months are going to be quite busy for me. There will be a lot conferences and events which I will attend and where I am also speaking. I am always happy to attend those events as it is a great place to interact with other community members and learn new stuff about the latest and coolest new tools. So lets get started:

The first conference is the SQL Saturday #546 in Oporto (Portugal) on the 1st of October where I will be speaking about “What’s new in Analysis Services 2016”.
PASS_SQL_Saturday_546_Oporto
It is the first time for me to speak in Porto but from what I heard from other people it is said to be a really great event so I am really looking forward to it!

The week after – on the 6th of October – I will be speaking at the “Virtuelle SQL Server 2016 Konferenz” (German only) where I will be doing a session on SQL Server Polybase. As the name implies, it is virtual and you can attend from wherever you are at the time. Another important thing is that it is for free! You just need to register and you’ll get two full days of sessions around SQL Server and the Microsoft Data Platform.

In the same week, the SQL Saturday #555 in Munich (Germany) is taking place.
PASS_SQL_Saturday_555_Munich
Another free event for all community members. Again, just make sure you register in time to get a ticket! I will do a more advanced session on Polybase there with a more technical focus. So even if you attended the virtual event before, this might still be of great value for you!

Later in October, the world’s biggest conference about the Microsoft Data Platform and all other related technologies – the PASS Summit – will be in Seattle again, attracting 4,000+ attendees every year.
PASS_Summit_2016
It’s probably a bit late but you can still register for this great event. Unfortunately, I forgot to submit some sessions for this conference so I won’t be there this year. (Note to myself: submit sessions for 2017!) Having been there already several times, I can only recommend this conference to anyone who is into Microsoft, it’s Data Platform and everything around it.

For January 2017, I am very happy to announce that our SQL Saturday #579 in Vienna (Austria) is taking place the 4th year in a row now.
PASS_SQL_Saturday_579_Vienna
We had 250+ attendees last year and we are definitely aiming to hit the 300 mark this year! Again, it is free and you just need to register as soon as possible. (In case you cannot make it later on, please unregister as early as possible as otherwise you take away the slot of someone else). The Call for Papers is still open so if you want to speak at the event, simply submit your session details there.

Last but not least, the SQL Server Konferenz 2017 will be back in Darmstadt as in the previous years.
PASS_SQL_Konferenz_2017
This means 3-4 full days of sessions and trainings around the Microsoft Data Platform. Overall an awesome event organized by PASS Germany. It is also the 5th time in a row this year and believe me, they have organized and optimized everything down to the tiniest detail making it one of the best and biggest conferences in whole Europe!
I already submitted some sessions and I am hoping to be selected to speak there.

As you can see, a lot of things are ahead but I am really looking forward to all of them!

I just received my first Microsoft MVP Award!

Almost exactly 10 years ago I started my career in the field of Data Warehousing and Business Intelligence using the Microsoft Data Platform. Every since I have been active in the community, sharing my experiences and engaged with other people in various forums.
I have to admit that most of the things that I know today I originally learned from my strong interaction with the community and by solving (or trying to solve) problems of other community members that I came across when browsing e.g. the MSDN forums. This exchange is always a win-win situation for both sides as you grow with every challenge you master and that’s what the community is all about – having someone to discuss your issues with and solve them together!

Therefore I am very proud that I got awarded with the Microsoft MVP Award for my work in the community which includes blogging, answering forum questions, speaking at conferences and supporting local Microsoft events – or, to say it in a more general way, my engagement in the community!

Microsoft_MVP_Logo_Horizontal_Preferred_Cyan300_CMYK_300ppi

Thanks to everyone who I ever worked with, letting me learn on their problems, inspiring me to start my blog and keep being active in the community over the last years! Thank You!

Score whole PowerBI DataSets dynamically in Azure ML

One of the most requested features when it comes to Azure ML is and has always been the integration into PowerBI. By now we are still lacking a native connector in PowerBI which would allow us to query a published Azure ML web service directly and score our datasets. Reason enough for me to dig into this issue and create some Power Query M scripts to do this. But lets first start off with the basics of Azure ML Web Services.

Every Azure ML project can be published as a Web Service with just a single click. Once its published, it can be used like any other Web Service. Usually we would send a record or a whole dataset to the Web Service, the Azure ML models does some scoring (or any other operation within Azure ML) and then sends the scored result back to the client. This is straight forward and Microsoft even supplies samples for the most common programming languages. The Web Service relies on a standardized REST API which can basically be called by any client. Yes, in our case this client will be PowerBI using Power Query.
Rui Quintino has already written an article on AzureML Web Service Scoring with Excel and Power Query and also Chris Webb wrote a more generic one on POST Request in Power Query in general Web Service and POST requests in Power Query. Even Microsoft recently published an article how you can use the R Integration of Power Query to call a Azure ML Web Service here.

Having tried these solutions, I have to admit that they have some major issues:
1) very static / hard coded
2) complex to write
3) operate on row-by-row basis and might run into the API Call Limits as discussed here.
4) need a local R installation

As Azure ML usually deal with tables, which are basically Power Query DataSets, a requirement would be to directly use a Power Query DataSet. The DataSet has to be converted dynamically into the required JSON structure to be POSTed to Azure ML. The returned result, usually a table again, should be converted back to a Power Query DataSet. And that’s what I did, I wrote a function that does all this for you. All information that you have to supply can be found in the configuration of your Azure ML Web Service:
– Request URI of your Web Service
– API Key
– the [Table to Score]

the [Table to Score] can be any Power Query table but of course has to have the very same structure (including column names and data types) as expected by the Web Service Input. Then you can simply call my function:
InvokeFunction_PowerBI_DataSet_in_AzureML
Score_PowerBI_DataSet_in_AzureML

The whole process involves a lot of JSON conversions and is kind of complex but as I encapsulated everything into M functions it should be quite easy to use by simply calling the CallAzureMLService-function.

However, here is a little description of the used functions:
ToAzureMLJson – converts any object that is passed in as an argument to a JSON element. If you pass in a table, it is converted to a JSON-array. Dates and Numbers are formatted correctly, etc. so the result can the be passed directly to Azure ML.

AzureMLJsonToTable – converts the returned JSON back to a Power Query Table. It obeys column names and also data types as defined in the Azure ML Web Service output. If the output changes (e.g. new columns are added) this will be taken care of dynamically!

CallAzureMLService – uses the two function from above to convert a table to JSON, POST the JSON to Azure ML and convert the result back to a Power Query Table.

Known Issues:
As the [Table to Score] will probably come from a SQL DB or somewhere else, you may run into issues with Privacy Levels/Settings and the Formula Firewall. In this case make sure to enable Fast Combine for your workbook as described here.

The maximum timeout of a Request/Response call to an Azure ML Web Service is 100 seconds. If your call exceeds this limit, you might get an error message returned.I ran a test and tried to score 60k rows (with 2 numeric columns) at once and it worked just fine, but I would assume that you can run into some Azure ML limits here very easily with bigger data sets. As far as I know, these 100 seconds are for the Azure ML itself only. If it takes several minutes to upload your dataset in the POST request, than this is not part of this 100 seconds. If you are still hitting this issue, you could further try to split your table into different batches, score them separately and combine the results again afterwards.

 

So these are the steps that you need to do in order to use your Azure ML Web Service together with PowerBI:
1) Create an Azure ML Experiment (or use an existing)
2) Publish the Experiment as a Web Service
3) note the URL and the API Key of your Web Service
4) run PowerBI and load the data that you want to score
5) make sure that the dataset created in 4) has the exact same structure as expected by Azure ML (column names, data types, …)
6) call the function “CallAzureMLWebService” with the parameters from 3) and 5)
7) wait for the Web Service to return the result set
8) load the final table into PowerBI (or do some further transformations before)

And that’s it!

Download:
You can find a PowerBI workbook which contains all the functions and code here: CallAzureMLWebService.pbix
I used a simple Web Service which takes 2 numeric columns (“Number1” and “Number2”) and returns the [Number1] * [Number2] and [Number1] / [Number2]

PS: you will not be able to run the sample as it is as I changed the API Key and also the URL of my original Azure ML Web Service