Debugging Custom .Net Activities in Azure Data Factory


UPDATE 2017-02-22:
I released a new toolset for Azure Data Factory which also integrates the Custom .Net Activity Debugger from this blog post. Please refer to the new GitHub project: https://github.com/gbrueckl/Azure.DataFactory.LocalEnvironment

(all links have been changed to refer to the new repository!)


Azure Data Factory (ADF) is one of the newer tools of the Microsoft Data Platform on Azure. It is Microsoft’s data integration tool, which allows you to easily load data from your on-premises servers to the cloud (and also the other way round). It comes with some handy templates to copy data from various sources to any available destination. However, when the Extract-Transform-Load (ETL) or ELT steps get more complicated, you will hit the (current) out-of-the-box limits of Azure Data Factory pretty soon. But this is OK, as ADF is a very open platform and allows you to integrate so-called “Custom Activities”. These can either be .Net/C# Activities or HDInsight Activities. In this post we will focus on .Net Activities and how to develop and debug them in an efficient way.

A .Net Activity is basically just a .dll which implements a specific interface (IDotNetActivity) and is then executed by the Azure Data Factory. To be more precise, the .dll (and all its dependencies) is copied to an Azure Batch Node, which then executes the code when the .Net Activity is scheduled by ADF. So far so good, but the tricky part is to actually develop, test, and debug the .Net code. Well, not the code itself, but the more or less complex integration with the ADF interface, which you are very likely not familiar with in the beginning. In such cases it usually helps to run the code locally, step into the different code paths and examine the C# objects and their values. The problem is that you do not have a local instance of ADF on your workstation which you could use to start the .Net Activity and debug it interactively in Visual Studio.
So I wrote my own tool which you can add to the Solution that already contains the code of your Custom .Net Activity. Then you can simply link the CustomActivityDebugger to the JSON definitions and configurations of your ADF project, reference your custom code, configure some other things like SliceStart/SliceEnd and you are ready to go.
Once you start the CustomActivityDebugger it will read all ADF files and settings and basically create a local ADF environment which helps you to debug your custom .Net Activity using all settings and parameters as they would be passed in when the code is executed on the Azure Batch Node.

This little picture shows the CustomActivityDebugger in action – debugging custom .Net activities is now like debugging any other code:
[Screenshot: Debugger_in_Action]

All the sources including a simple ADF Project, a simple Custom Activity and setup instructions are available on my GitHub site:

https://github.com/gbrueckl/Azure.DataFactory.CustomActivityDebugger
https://github.com/gbrueckl/Azure.DataFactory.LocalEnvironment

Feel free to use it as it is and/or extend it to your needs.

18 Replies to “Debugging Custom .Net Activities in Azure Data Factory”

  1. Saved me another day at not knowing what was happening. This should be part of the data factory project templates.

  2. Hi,

    Your article and code really helped me, thanks a lot.

    But I am getting the error “No valid combination of account information found” in the following code

    CloudStorageAccount.Parse(linkedService.ConnectionString)

    in the method GetCloudStorageAccount().

    I googled it and it listed many results. I think I tried all of them, but without success.
    The options I tried are:
    1. UseDevelopmentStorage=true
    2. Checked case sensitivity.

    Did you face such an error? Thanks in advance.

    • Hi Ravi,
      This looks like you have a general issue in your code. Make sure your linked service uses a proper connection string and you don’t have any typos in there.
      Are you using SAS or a regular connection string using storage account name and key?
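
The error quoted above typically means the connection string does not contain a complete set of settings. As a rough illustration only, the following Python sketch splits a connection string into its settings and reports which of three common combinations it recognizes. The function names are made up for this example, the rules are a simplified approximation rather than the storage library's actual validation logic, and setting names are assumed to be matched by exact spelling.

```python
# Hypothetical helper, not part of any SDK: a simplified approximation of
# the combinations a storage connection string is expected to contain.
ENDPOINT_KEYS = ("BlobEndpoint", "QueueEndpoint", "TableEndpoint", "FileEndpoint")


def parse_settings(connection_string):
    """Split 'Key=Value;Key=Value' into a dict of settings."""
    settings = {}
    for segment in connection_string.split(";"):
        segment = segment.strip()
        if not segment:
            continue  # tolerate a trailing ';'
        # Split on the first '=' only: base64 account keys end in '=' padding.
        key, separator, value = segment.partition("=")
        if not separator or not key.strip() or not value.strip():
            raise ValueError(f"Malformed segment: {segment!r}")
        settings[key.strip()] = value.strip()
    return settings


def describe_combination(connection_string):
    """Name the combination of settings found, or report that none matches."""
    settings = parse_settings(connection_string)
    if settings.get("UseDevelopmentStorage", "").lower() == "true":
        return "development storage"
    if "AccountName" in settings and "AccountKey" in settings:
        return "account name and key"
    if "SharedAccessSignature" in settings and any(
        key in settings for key in ENDPOINT_KEYS
    ):
        return "shared access signature"
    return "no valid combination"


if __name__ == "__main__":
    # Placeholder values only; substitute the string from your linked service.
    sample = "DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=abc123=="
    print(describe_combination(sample))
```

Running a check like this against the exact string stored in the linked service JSON can reveal a missing setting or a stray character before stepping into the debugger.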

  3. Hi Gerhard,

    Very useful tool – thanks for sharing. When I run Program.cs I am getting an error that the .ZIP containing custom dll is not contained in my ADF project (pasted below). Do I need to package all the artefacts together in the same ADF project for this to work? I’m not a VS expert but I just want to check if that’s a pre-requisite before I re-configure my projects. Currently I have in my VS solution:
    1 project for ADF pipelines etc
    1 project for Custom C# activity (from which I get dll build)
    1 project for your custom activity debugger

    Severity: Error
    Description: Reference ADFDataDownloader.zip was not found in the solution
    Project: MaaSBI4MADF
    File: C:\Users\\Documents\Visual Studio 2015\Projects\MaaSBI4MADF\MaaSBI4MADF\P001b-CSVDataDownloader.json
    Line: 12

    Thanks,
    Claire

    • Hi Claire,
      This is basically a precondition that has to be met.
      In your activity you reference the binaries of your custom code at some point. This is usually a .ZIP file which contains the whole build output of your custom activity. This file must be present in the Dependencies folder of your ADF project so that it is also deployed later on with your ADF deployment.
      I usually use a post-build script to zip the build output and copy it to that folder.
      -gerhard

    • That's the code that I usually use in my Post-Build events:
      "C:\Program Files\7-Zip\7z.exe" a "$(TargetDir)$(ProjectName).zip" "$(TargetDir)*"
      copy /Y "$(TargetDir)$(ProjectName).zip" "$(SolutionDir)MyADFProject\Dependencies\"

      Note: you need to have 7-Zip installed for this to work

      hope that helps,
      -gerhard
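
If 7-Zip is not available on the build machine, the same two steps (zip the build output, then copy the archive to the Dependencies folder) can be sketched with the Python standard library. This is only an illustrative alternative, not the script used in the project; the function name and the example paths are assumptions.

```python
import shutil
import zipfile
from pathlib import Path


def package_build_output(target_dir, project_name, dependencies_dir):
    """Zip everything in target_dir and copy the archive to dependencies_dir.

    Hypothetical helper mirroring the two post-build commands above.
    Returns the path of the copied archive.
    """
    target = Path(target_dir)
    archive = target / f"{project_name}.zip"

    # Collect the files first and skip an archive left over from a previous
    # build, so the zip never tries to include itself.
    files = [p for p in sorted(target.rglob("*")) if p.is_file() and p != archive]

    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            # Store paths relative to target_dir so the .dll sits at the zip root.
            zf.write(path, path.relative_to(target).as_posix())

    destination = Path(dependencies_dir)
    destination.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(archive, destination))


if __name__ == "__main__":
    # Example paths are placeholders; adjust them to your own solution layout.
    copied = package_build_output(
        "MyCustomActivity/bin/Debug", "MyCustomActivity", "MyADFProject/Dependencies"
    )
    print(f"Copied package to {copied}")
```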

          • Hi Gerhard,

            I still keep getting the “zip file not found in the solution” error after saving the zip file in the Dependencies folder. I also tried to add references through the solution, and it gave me the “operation is not valid due to its current state” error. Can you help with this?

            • make sure to also check the output in the console – it should contain all relevant information about the ZIP-file and its path. Here is a sample output:

              The following Dependencies/References have been found:
              ‘MyOtherExternalReference.zip’ from path ‘D:\Work\SourceControl\GitHub\Azure.DataFactory.LocalEnvironment\MyADFProject\Dependencies\MyOtherExternalReference.zip’
              ‘MyCustomActivity.zip’ from path ‘D:\Work\SourceControl\GitHub\Azure.DataFactory.LocalEnvironment\MyADFProject\bin\Debug\Dependencies\MyCustomActivity.zip’
              Debugging Custom Activity ‘DownloadData’ from Pipeline ‘DataDownloaderSamplePipeline’ …
              The Code from the last build of the ADF project will be used (D:\Work\SourceControl\GitHub\Azure.DataFactory.LocalEnvironment\MyADFProject\Dependencies). Make sure to rebuild the ADF project if it does not reflect your latest changes!
              The Custom Activity refers to the following ZIP-file: ‘adfcontainer/package/MyCustomActivity.zip’
              Using ‘MyCustomActivity.dll’ from ZIP-file ‘D:\Work\SourceControl\GitHub\Azure.DataFactory.LocalEnvironment\MyADFProject\bin\Debug\Dependencies\MyCustomActivity.zip’!

              kindly let me know if you need any more information!

              regards,
              -gerhard

          • Tried every permutation and combination but can't get this to work.

            Pipeline code.

            "type": "DotNetActivity",
            "typeProperties": {
              "assemblyName": "RefreshFailureCustomAssembly.dll",
              "entryPoint": "RefreshFailureCustomAssembly.MasterClass",
              "packageLinkedService": "AzureStorageLinkedService",
              "packageFile": "customassembly/RefreshFailureCustomAssembly.zip",

            OUTPUT

            Validate references: Failed

            This happens after all the JSON files have been validated.

            ERROR

            RefreshFailureCustomAssembly.zip was not found in the solution.

            I have added the RefreshFailureCustomAssembly.dll in the references and also added RefreshFailureCustomAssembly.zip in the dependencies folder.

            • There are two ways to reference a custom assembly.
              Either put the .zip in the Dependencies folder, or use the References folder in your ADF project and use “Add Reference” pointing to a project within the same VS solution. The sample provided on GitHub shows both approaches.
              The console output shows you which references have been found in the ADF project (both types from above, shown as white text in the output). When you execute/debug a custom activity, the yellow output shows you where the tool tries to load the .zip from, which should give you a good indicator of what is wrong with your location or why it cannot find your zip file.
              [Screenshot: SampleOutput]

          • My solution follows the same steps as mentioned in the GitHub project and I still get the error. Will give it some thought over the weekend and respond back if I'm not able to resolve this. Thanks for all the help.

          • Hi Gerhard, I got this thing to work. I had added the zip directly to the Dependencies folder on disk instead of adding it through VS.

            • Great to hear that it's working now!
              The ADFLocalEnvironment reads all the files based on the VS project. This also applies to all JSON files: they need to be part of the VS project in order to be processed.

              -gerhard
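
Since files that exist only on disk are not picked up, a quick check can list them. The Python sketch below is an illustration under assumptions: it assumes the ADF project file follows the usual MSBuild layout, where every included file is named in an Include attribute, and it ignores wildcard includes. The function names and the project file name in the example are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path, PureWindowsPath


def included_items(project_file):
    """Return all paths named in Include attributes, normalized for comparison."""
    root = ET.parse(project_file).getroot()
    items = set()
    for element in root.iter():
        include = element.get("Include")
        if include:
            # Project files use backslashes; compare with forward slashes, lowercased.
            items.add(PureWindowsPath(include).as_posix().lower())
    return items


def files_missing_from_project(project_file, extensions=(".json", ".zip")):
    """List files on disk (relative paths) that the project file does not include."""
    project_path = Path(project_file)
    project_dir = project_path.parent
    known = included_items(project_path)
    missing = []
    for path in sorted(project_dir.rglob("*")):
        relative = path.relative_to(project_dir)
        # Build output folders are not expected to be part of the project.
        if relative.parts[0].lower() in ("bin", "obj"):
            continue
        if path.is_file() and path.suffix.lower() in extensions:
            if relative.as_posix().lower() not in known:
                missing.append(relative.as_posix())
    return missing


if __name__ == "__main__":
    # Placeholder path; point this at your own ADF project file.
    for name in files_missing_from_project("MyADFProject/MyADFProject.dfproj"):
        print(f"On disk but not in the project: {name}")
```

Any file this reports would need to be added through Visual Studio (for example via "Add Existing Item") before the tool can see it.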
