DatabricksPS and Azure AD Authentication

Avaiilable via PowerShell Gallery: DatabricksPS

Databricks recently announced that it is now also supporting Azure Active Directory Authentication for the REST API which is now in public preview. This may not sound super exciting but is actually a very important feature when it comes to Continuous Integration/Continuous Delivery pipelines in Azure DevOps or any other CI/CD tool. Previously, whenever you wanted to deploy content to a new Databricks workspace, you first needed to manually create a user-bound API access token. As you can imagine, manual steps are also bad for otherwise automated processes like a CI/CD pipeline. With Databricks REST API finally supporting Azure Active Directory Authentication of regular users and service principals, this last manual step is finally also gone!

As I had this issue at many of my customers where we had already fully automated the deployment of our data platform based on Azure and Databricks, I also wanted to use this new feature there. The deployment of regular Databricks objects (clusters, notebooks, jobs, …) was already implemented in the CI/CD pipeline using my PowerShell module DatabricksPS and of course I did not want to rewrite any of those steps. So, I simply extend the module’s authentication methods to also support Azure Active Directory Authentication. The only thing that actually changed was the call to Set-DatabricksEnvironment which now supports additional parameter sets and parameters:

The first thing you will realize is that it is now necessary to specify the Databricks Workspace explicitly either using SubscriptionID/ResourceGroupName/WorkspaceName to uniquely identify the Databricks workspace within Azure or using the OrganizationID that you see displayed in the URL of your Databricks Workspace. For the actual authentication the parameters -ClientID, -TenantID, -Credential and the switch -ServicePrincipal are used.

Regardless of whether you use regular username/password authentication with an AAD user or an AAD service principal, the first thing you need to do in both cases is to create an AAD Application as described in the official docs from Databricks:
Using Azure Active Directory Authentication Library
Using a service principal

Once you have ensured all prerequisites exist, you can use the samples below to authenticate with your AAD username/password with DatabricksPS:

Here is another sample using a regular service principal authentication and the OrganizationID with DatabricksPS:

As you can see, once the environment is set up using the new authentication methods, the rest of the script stays the same and there is not much more you need to do fully automate your CI/CD pipeline with DatabricksPS!

I have not yet fully tested all cmdlets of the module so if you experience any issues, please contact me or open a ticket in the GIT repository.

How-To: Migrating Databricks workspaces

Foreword:
The approach described in this blog post only uses the Databricks REST API and therefore should work with both, Azure Databricks and also Databricks on AWS!

It recently had to migrate an existing Databricks workspace to a new Azure subscription causing as little interruption as possible and not loosing any valuable content. So I thought a simple Move of the Azure resource would be the easiest thing to do in this case. Unfortunately it turns out that moving an Azure Databricks Service (=workspace) is not supported:

Resource move is not supported for resource types ‘Microsoft.Databricks/workspaces’. (Code: ResourceMoveNotSupported)

I do not know what is/was the problem here but I did not have time to investigate but instead needed to come up with a proper solution in time. So I had a look what needs to be done for a manual export. Basically there are 5 types of content within a Databricks workspace:

  • Workspace items (notebooks and folders)
  • Clusters
  • Jobs
  • Secrets
  • Security (users and groups)

For all of them an appropriate REST API is provided by Databricks to manage and also exports and imports. This was fantastic news for me as I knew I could use my existing PowerShell module DatabricksPS to do all the stuff without having to re-invent the wheel again.
So I basically extended the module and added new Import and Export functions which automatically process all the different content types:

  • Export-DatabricksEnvironment
  • Import-DatabricksEnvironment

They can be further parameterized to only import/export certain artifacts and how to deal with updates to already existing items. The actual output of the export looks like this and of course you can also modify it manually to your needs – all files are in JSON except for the notebooks which are exported as .DBC file by default:

A very simple sample code doing and export and an import into a different environment could look like this:

Having those scripts made the whole migration a very easy task.
In addition, these new cmdlets can also be used in your Continuous Integration/Continuous Delivery (CI/CD) pipelines in Azure DevOps or any other CI/CD tool!

So just download the latest version from the PowerShell gallery and give it a try!