Using Parameters and hidden Properties in Azure Data Factory v2

Azure Data Factory v2 is Microsoft Azure’s Platform as a Service (PaaS) solution to schedule and orchestrate data processing jobs in the cloud. As the name implies, this is already the second version of this kind of service and a lot has changed since its predecessor. One of these things is how datasets and pipelines are parameterized and how these parameters are passed between the different objects. The basic concepts behind this process are well explained by the MSDN documentation – for example Create a trigger that runs a pipeline on a schedule. In this example an trigger is created that runs a pipeline every 15 minute and passes the property “scheduledTime” of the trigger to the pipeline. This is the JSON expression that is used:

"parameters": {
  "scheduledRunTime": "@trigger().scheduledTime"
}

@trigger() basically references the object that is returned by the trigger and it seems that this object has a property called “scheduledTime”. So far so good, this is documented and fulfills the basic needs. Some of these properties are also documented here: System variables supported by Azure Data Factory but unfortunately not all of them.

So sometimes this trigger objects can be much more complex and also contain additional information that may not be documented. This makes it pretty hard for the developer to actually know which properties exist and how they could be used. A good example are Event-Based Triggers which were just recently introduced where the documentation only mentions the properties “fileName” and “folderPath” but it contains much more (details see further down). For simplicity I will stick to scheduled triggers at this point but the very same concept applies to all kinds of triggers and actually also to all other internal objects like @pipeline(), @dataset() or @activity() as well!

So how can you investigate those internal objects like @trigger() and see what they actually look like? Well, the answer is quite simple – just pass the object itself without any property to the pipeline. The target parameter of the pipeline can either be of type String or Object.
ADFv2_Set_Pipeline_Parameter_from_Trigger
This allows you to see the whole object on the Monitoring-page once the pipeline is triggered:
ADFv2_Monitor_Parameter_Value

For the Scheduled-trigger, the object looks like this:

@trigger() – Schedule-Trigger
{
  "name": "Trigger_12348CAF-BE66-42CF-83DA-E3028693F304",
  "startTime": "2018-09-25T18:00:22.4180978Z",
  "endTime": "2018-09-25T18:00:22.4180978Z",
  "scheduledTime": "2018-09-25T18:00:22.507Z",
  "trackingId": "1234a112-7bb9-4ba6-b032-6189d6dd8b73",
  "clientTrackingId": "12346637084630521889360938860CU33",
  "code": "OK",
  "status": "Succeeded"
}

And as you can guess, you can pass any of these properties to the pipeline using the syntax
“@trigger().<property_name>” or even the whole object! The syntax can of course also be combined with all the built-in expressions.

This should hopefully make it easier for you to build and debug more complex Azure Data Factory v2 pipelines!

Below you can find an example of the object that a Event-Based Trigger creates:

@trigger() – Event-Trigger
{
  "name": "Trigger_12348CAF-BE66-42CF-83DA-E3028693F304",
  "outputs": {
    "headers": {
      "Host": "prod-1234.westeurope.logic.azure.com",
      "x-ms-client-tracking-id": "1234c153-fc96-4b8e-9002-0f5096bcd744",
      "Content-Length": "52",
      "Content-Type": "application/json; charset=utf-8"
    },
    "body": {
      "folderPath": "data",
      "fileName": "myFile.csv"
    }
  },
  "startTime": "2018-09-25T18:22:54.8383112Z",
  "endTime": "2018-09-25T18:22:54.8383112Z",
  "trackingId": "07b3d1a1-8735-4ff0-9cc6-c83d95046101",
  "clientTrackingId": "56dcc153-fc96-4b8e-9002-0f5096bcd744",
  "status": "Succeeded"
}

Note that right now, it does not say whether the trigger fired because the file was created, updated or deleted! But I hope this will be fixed by the product team in the near future.

Refresh PowerBI Datasets using PowerShell and Azure Runbooks

In June 2017, Microsoft announced a new set of API function to manage data refreshes in PowerBI. The new API basically allows you to trigger a refresh or retrieve the history of previously executed refreshes. The full specification can be found in the official MSDN documentation, or using this direct links: Refresh dataset and Get dataset refresh history

So besides the scheduled and manual refreshes from within the PowerBI service directly, we now have a third option to trigger refreshes but this time also from an external caller! This itself is already pretty awesome and some people already did some cool stuff leveraging the new API functions:

Charles Sterling: Running the Power BI Refresh API’s Headless
Sirui Sun: Git-Repository powerbi-powershell

The basic idea is to use object from pre-built Azure Management DLLs to generate the OAuth Access token that is necessary to use the API. This works very well locally but cannot be used in the cloud – e.g. in combination with Azure Automation Runbooks or Azure Functions where you cannot install or reference any custom DLLs.

In this blog post I will show you how you can accomplish exactly this  – create an Azure Automation Runbook to refresh your PowerBI dataset!
But first of all there are some things that you need to keep in mind:

  1. There are no service accounts in PowerBI so we will always use a “real” user
  2. you need to supply the credentials of a “real” user
  3. The user needs to have appropriate access to the dataset in order to refresh it
  4. the dataset refresh must succeed if you do it manually in PowerBI
  5. you are still limited to 8 refreshes/day through the API

OK, so lets get started. First of all we need an Azure Application which has permissions in PowerBI. The easiest way to do this is to use the navigate to https://dev.powerbi.com/apps, log in with your account and simply follow the steps on the screen. The only import thing is to select the App Type “Native app”. At the end, you will receive a ClientID and a ClientSecret – Please remember the ClientID for later use!

Next step is to create the Azure Runbook. There are plenty of tutorials out there on how to do this: My first PowerShell workflow runbook or Creating or importing a runbook in Azure Automation so I will no go into much more detail here. Besides the runbook itself you also need to create an Automation Credential to store the username and password in a secure way – here is a tutorial for this: Credential Assets in Azure Automation

Now lets take a look at the PowerShell code. Instead of using any pre-built DLLs I removed all unnecessary code and do all the communication using Invoke-RestMethod. This is a very low-level function and is part of the standard PowerShell modules so there is no need to install anything! The tricky part is to acquire an Authentication Token using username/password as it is nowhere documented (at least I could not find it) what the REST call has to look like. So I used Fiddler to track the REST calls that the pre-built DLLs use and rebuilt them using Invoke-RestMethod. This is what I came up with:

Get Authentication Token
$authUrl = "https://login.windows.net/common/oauth2/token/"
$body = @{
    "resource" =https://analysis.windows.net/powerbi/api";
    "client_id" = $clientId;
    "grant_type" = "password";
    "username" = $pbiUsername;
    "password" = $pbiPassword;
    "scope" = "openid"
}

$authResponse = Invoke-RestMethod -Uri $authUrlMethod POST -Body $body

$clientId is the ClientID of the Azure AD Application
$pbiUsername is the email address of the PowerBI user.
$pbiPassword is the password of the PowerBI user.
The $authRepsonse then contains our Authentication token which we can use to make our subsequent calls:

Trigger Refresh in PowerBI
$restURL = "https://api.powerbi.com/v1.0/myorg/datasets/$pbiDatasetId/refreshes"
$headers = @{
    "Content-Type" = "application/json";
    "Authorization" = $authResponse.token_type + " " + $authResponse.access_token
}

$restResponse = Invoke-RestMethod -Uri $restURLMethod POST -Headers $headers

And that’s all you need. I wrapped everything into a PowerShell function that can be used as an Azure Runbook. The username/password is derived from an Azure Automation Credential.

The final runbook can be found here: PowerBI_Refresh_Runbook.ps1

Refresh_PowerBI_Dataset_Azure_Runbook

It takes 4 Parameters:

  1. CredentialName – the name of the Azure Automation credential that you created and which stores the PowerBI username and password
  2. ClientID – the ID of your Azure Active Directory Application which you created in the first step
  3. PBIDatasetName – the name of the PowerBI dataset that you want to refresh
  4. PBIGroupName – (optional) the name of the group/workspace in which the PowerBI dataset from 3) resides

When everything is working as expected, you can create custom schedules or even create webhooks to trigger the script and refresh you PowerBI dataset! As you probably know, this is really powerful as you can now make the refresh of the PowerBI dataset part of your daily ETL job!