How-To: Migrating Databricks workspaces

Foreword:
The approach described in this blog post only uses the Databricks REST API and should therefore work with both Azure Databricks and Databricks on AWS!

I recently had to migrate an existing Databricks workspace to a new Azure subscription, causing as little interruption as possible and without losing any valuable content. So I thought a simple Move of the Azure resource would be the easiest thing to do in this case. Unfortunately, it turns out that moving an Azure Databricks Service (= workspace) is not supported:

Resource move is not supported for resource types 'Microsoft.Databricks/workspaces'. (Code: ResourceMoveNotSupported)

I do not know what the problem was, and I did not have time to investigate; instead, I needed to come up with a proper solution in time. So I had a look at what needs to be done for a manual export. Basically, there are five types of content within a Databricks workspace:

  • Workspace items (notebooks and folders)
  • Clusters
  • Jobs
  • Secrets
  • Security (users and groups)

For all of them, Databricks provides an appropriate REST API to manage them and also to export and import them. This was fantastic news for me, as I knew I could use my existing PowerShell module DatabricksPS to do all of this without having to re-invent the wheel.
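For illustration, a single notebook could be exported directly against the Workspace REST API like this (the workspace URL, the token and the notebook path are placeholders you would have to replace):

# All values below are placeholders: replace them with your workspace URL, a personal access token and a notebook path
$apiRootUrl = "https://westeurope.azuredatabricks.net/api/2.0"
$headers    = @{ Authorization = "Bearer <your-personal-access-token>" }

# Export a single notebook in DBC format via the Workspace API
$response = Invoke-RestMethod -Method Get -Headers $headers -Uri "$apiRootUrl/workspace/export?path=/Shared/MyNotebook&format=DBC"

# The notebook content comes back Base64-encoded
[System.IO.File]::WriteAllBytes("C:\Temp\MyNotebook.dbc", [System.Convert]::FromBase64String($response.content))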
So I basically extended the module and added new Import and Export functions which automatically process all the different content types:

  • Export-DatabricksEnvironment
  • Import-DatabricksEnvironment

They can be further parameterized to import/export only certain artifacts and to control how updates to already existing items are handled. The actual output of the export looks like this, and of course you can also modify it manually to your needs; all files are in JSON, except for the notebooks, which are exported as a .DBC file by default:

A very simple sample code doing an export and an import into a different environment could look like this:
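(The snippet below is only a sketch: the URLs, tokens and the local path are placeholders, and the parameters of Import-DatabricksEnvironment are assumed to mirror those of Export-DatabricksEnvironment.)

# Connect to the source workspace and export everything to a local folder
Set-DatabricksEnvironment -AccessToken "<old-workspace-token>" -ApiRootUrl "https://westeurope.azuredatabricks.net"
Export-DatabricksEnvironment -LocalPath "C:\Databricks\Export" -CleanLocalPath

# Connect to the target workspace and import the previously exported content
Set-DatabricksEnvironment -AccessToken "<new-workspace-token>" -ApiRootUrl "https://northeurope.azuredatabricks.net"
Import-DatabricksEnvironment -LocalPath "C:\Databricks\Export"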

Having those scripts made the whole migration a very easy task.
In addition, these new cmdlets can also be used in your Continuous Integration/Continuous Delivery (CI/CD) pipelines in Azure DevOps or any other CI/CD tool!

So just download the latest version from the PowerShell gallery and give it a try!
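For example (Install-Module may need -Scope CurrentUser, depending on your permissions):

Install-Module -Name DatabricksPS
# or, if it is already installed:
Update-Module -Name DatabricksPS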

10 Replies to “How-To: Migrating Databricks workspaces”

  1. Can I use your DatabricksPS module to add environment variables to clusters? Also, can I restrict the export to just workspaces (i.e. no clusters or jobs) using Export-DatabricksEnvironment?

  2. I have the Owner role on my Databricks workspace that I want to export and import, and when I try to export the whole workspace, I get the following errors:

    My PowerShell version – 7.1.0, running on Windows 10 Enterprise

    Commands Run –
    PS C:\Windows\System32> Set-DatabricksEnvironment -AccessToken -ApiRootUrl "https://westeurope.azuredatabricks.net"
    PS C:\Windows\System32> Export-DatabricksEnvironment -LocalPath 'C:\Databricks\Export' -CleanLocalPath
    WARNING: This feature is EXPERIMENTAL and still UNDER DEVELOPMENT!
    WARNING: LIBRARY found at /Users//spark-xml_2.12-0.9.0 – Exporting Libraries is currently not supported!
    WARNING: It is not possible to extract secret values via the Databricks REST API.
    This export only exports the names of SecretScopes and their Secrets but not the values!
    WARNING: It is not possible to donwload the whole DBFS.
    This export will only download files from DBFS that already exist locally and overwrite them!
    Export-DatabricksEnvironment: Local DBFS path C:\Databricks\Export\DBFS does not exist so the DBFS export cannot work properly!

    How do I import all the artefacts for a complete workspace migration?

    • Correction: I noticed that content put in angle brackets gets dropped. So reposting the warning message regarding libraries from above.


      WARNING: LIBRARY found at /Users/another-user/spark-xml_2.12-0.9.0 – Exporting Libraries is currently not supported!

        • How do I resolve the warning below?

          WARNING: It is not possible to donwload the whole DBFS.
          This export will only download files from DBFS that already exist locally and overwrite them!
          Export-DatabricksEnvironment: Local DBFS path C:\Databricks\Export\DBFS does not exist so the DBFS export cannot work properly!

          Right now I’m performing this from my local desktop.

          Also, a small correction to your logging module: the spelling of "download" is incorrect.

          • You can use the parameter "-Artifacts" and provide e.g. only Workspace, Clusters and Jobs as an array, so DBFS and Secrets would not be exported and you would also not get a warning. See the sketch below.
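            As a sketch (the exact values accepted by -Artifacts are assumptions here, please check the module's help), this could look like:

            Export-DatabricksEnvironment -LocalPath "C:\Databricks\Export" -CleanLocalPath -Artifacts @("Workspace", "Clusters", "Jobs")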
