Deploying an Azure Data Factory project as ARM Template

In my last post I wrote about how to Debug Custom .Net Activities in Azure Data Factory locally. This fixes one of the biggest issues in Azure Data Factory at the moment for developers. The next bigger problem that you will run into is when it comes to deploying your Azure Data Factory project. At the moment, you can only do it manually from Visual Studio which, for bigger projects, can take quite some time. So I extended and advanced the code from my CustomActivityDebugger. Well, actually I rewrote some major parts of it and moved it into a new GitHub repository: Azure.DataFactory.LocalEnvironment

The new code base now includes the functionality to export an existing ADF project to an ARM template which can then be deployed very easily using Azure standard deployment mechanisms.

So basically, these are the changes and new Features that I made:

Export as ARM template:

  • Export all ADF objects and properties
  • Support for configurations
  • obey dependencies between ADF objects
  • parameterized Data Factory name
  • automatic upload of ADF dependencies (e.g. custom activities)
  • specify the region where ADF should be deployed (ADF is not available in all regions yet!)

ADF_LocalEnvironment_DeployARMTemplate

Custom Activity Debugger:

  • simplified usability – just select the pipeline, activity and set the slice-dates
  • Support for configurations
  • no need to add any namespaces
  • no need to add any references
  • write activity log to console output

ADF_LocalEnvironment_DebugActivity_Breakpoing

General:

  • Load from the ADF Project file (.dfproj) instead of a whole folder
  • implemented as Assembly
  • can be used in a Console Application for automation
  • will be published via NuGet in the future! (coming soon)

 

Everything else is described in the Git-Repository itself!

Hope you enjoy it!

Monitoring Azure Data Factory using PowerBI

Some time ago Microsoft released the first preview of a tool which allows you to monitor and control your Azure Data Factory (ADF). It is a web UI where you can select your Pipeline, Activity or Dataset and check its execution state over time. However, from my very personal point of view the UI could be much better, especially much clearer(!) as it is at the moment. But that’s not really a problem as the thing I like the most about ADF is that its quite open for developers (for example Custom C#/.Net Activities) and it also offers a quite comprehensive REST API to control an manage it.
For our monitoring purposes we are mainly interested in the LIST interface but we could do basically every operation using this API. In my example I only used the Dataset API, the Slices API and the Pipeline API.

First we start with the Dataset API to get a list of all data sets in our Data Factory. This is quite simple as we just need to build our URL of the REST web service like this:

  1. https://management.azure.com/subscriptions/{SubscriptionID}/resourcegroups/{ResourceGroupName}/providers/Microsoft.DataFactory/datafactories/{DataFactoryName}/datasets?api-version={api-version}

You can get all of this information for the Azure Portal by simply navigating to your Data Factory and checking the URL which will be similar to this one:

  1. https://portal.azure.com/#resource/subscriptions/1234567832324a04a0a66e44bf2f5d11/resourceGroups/myResourceGroup/providers/Microsoft.DataFactory/dataFactories/myDataFactory

So this would be my values for the API Call:
– {SubscriptionID} would be “12345678-3232-4a04-a0a6-6e44bf2f5d11”
– {ResourceGroupName} would be “myResourceGroup”
– {DataFactoryName} would be “myDataFactory”
– {api-version} would be a fixed value of “2015-10-01”

Once you have your URL you can use PowerBI to query the API using Get Data –> From Web
Next you need to authenticate using your Personal or Organizational Account – the same that you use to sign in to the Portal – and also the level for which you want to use the credentials. I’d recommend you to set it either to the subscription level or to the data-factory itself, depending on your security requirements. This ensures that you are not asked for credentials for each different API:
ADF_PowerBI_Authentication

This works in a very similar way also for the Slices API, the Pipeline API and all other APIs available! The other transformations I used are regular PowerQuery/M steps done via the UI so I am not going to describe them in more detail here. Also, setting up the relationships in our final PowerPivot model should be straight forward.

Now that we have all the required data in place, we can start with our report. I used some custom visuals for the calendar view, some slicers and a simple table to show the details. I also used a Sankey Chart to visualize the dependencies between the datasets.

ADF_PowerBI_Monitoring_Dashboard
ADF_PowerBI_Monitoring_Dependencies

Compared to the standard GUI for monitoring this provides a much better overview of slices and their current states and it also allows easy filtering. I am sure there are a lot of other PowerBI visualizations which would make a lot of sense here, these are just to give you an idea how it could look like, but of course you have all the freedom PowerBI offers you for reporting!

The only drawback at the moment is that you cannot reschedule/reset slices from PowerBI but for my monitoring-use-case this was not a problem at all. Also, I did not include the SliceRun API in my report as this would increase the size of the data model a lot, so detailed log information is not available in my sample report.

The whole PowerBI template is available for download on my GitHub site: https://github.com/gbrueckl/Azure.DataFactory.PowerBIMonitor

Debugging Custom .Net Activities in Azure Data Factory


UPDATE 2017-02-22:
I released a new toolset for Azure Data Factor which also integrates the Customer .Net Activity Debugger from this blob post. Please refer to the new GitHub project: https://github.com/gbrueckl/Azure.DataFactory.LocalEnvironment

(all links have been changed to refer to the new repository!)


Azure Data Factory (ADF) is one of the newer tools of the whole Microsoft Data Platform on Azure. It is Microsoft’s Data Integration tool, which allows you to easily load data from you on-premises servers to the cloud (and also the other way round). It comes with some handy templates to copy data fro various sources to any available destination. However, when the Extract-Transform-Load (ETL) or ELT steps get more complicated you will hit the (current) out-of-the-box limits of Azure Data Factory pretty soon. But this is OK as ADF is a very open platform and allows you to integrate so called “Custom Activities”. These can either be .Net/C# Activities or HDInsight Activities. In this post we will focus on .Net Activities and how to develop and debug them in an efficient way.

A .Net Activity is basically just a .dll which implements a specific Interface (IDotNetActivity)and is then executed by the Azure Data Factory. To be more precise here, the .dll (and all dependencies) are copied to an Azure Batch Node which then executes the code when the .Net Activity is scheduled by ADF. So far so good, but the tricky part is to actually develop the .Net code, test, and debug it. Well, not the code itself but the more or less complex integration with the ADF Interface which you are very likely not familiar with in the beginning. In such cases it usually helps to run the code locally, step into the different code paths and examine the C# objects and their values. The problem is that you do not have a local instance of ADF on your workstation which you could use the start the .Net Activity and debug it interactively in Visual Studio.
So I wrote my own tool which you can add to the Solution that already contains the code of your Custom .Net Activity. Then you can simply link the CustomActivityDebugger to the JSON definitions and configurations of your ADF project, reference your custom code, configure some other things like SliceStart/SliceEnd and you are ready to go.
Once you start the CustomActivityDebugger it will read all ADF files and settings and basically create a local ADF environment which helps you to debug your custom .Net Activity using all settings and parameters as they would be passed in when the code is executed on the Azure Batch Node.

This little picture shows the CustomActivityDebugger in action – debugging custom .Net activities is now like debugging any other code:
Debugger_in_Action

All the sources including a simple ADF Project, a simple Custom Activity and setup instructions are available on my GitHub site:

https://github.com/gbrueckl/Azure.DataFactory.CustomActivityDebugger
https://github.com/gbrueckl/Azure.DataFactory.LocalEnvironment

Feel free to use it as it is and/or extend it to your needs.