Foreword:
The approach described in this blog post only uses the Databricks REST API and therefore should work with both Azure Databricks and Databricks on AWS!
I recently had to migrate an existing Databricks workspace to a new Azure subscription, causing as little interruption as possible and not losing any valuable content. So I thought a simple move of the Azure resource would be the easiest thing to do in this case. Unfortunately, it turns out that moving an Azure Databricks Service (=workspace) is not supported:
Resource move is not supported for resource types ‘Microsoft.Databricks/workspaces’. (Code: ResourceMoveNotSupported)
I do not know what the problem is/was here, and I did not have time to investigate; instead, I needed to come up with a proper solution in time. So I had a look at what needs to be done for a manual export. Basically, there are 5 types of content within a Databricks workspace:
- Workspace items (notebooks and folders)
- Clusters
- Jobs
- Secrets
- Security (users and groups)
For all of them, Databricks provides an appropriate REST API to manage them and also to export and import them. This was fantastic news for me, as I knew I could use my existing PowerShell module DatabricksPS to do all of this without having to re-invent the wheel.
So I basically extended the module and added new Import and Export functions which automatically process all the different content types:
- Export-DatabricksEnvironment
- Import-DatabricksEnvironment
They can be further parameterized to only import/export certain artifacts and to control how updates to already existing items are handled. The actual output of the export looks like this and of course you can also modify it manually to your needs – all files are in JSON except for the notebooks, which are exported as a .DBC file by default:
A very simple sample script doing an export and an import into a different environment could look like this:
Set-DatabricksEnvironment -AccessToken $accessTokenExport -ApiRootUrl "https://westeurope.azuredatabricks.net"
Export-DatabricksEnvironment -LocalPath 'D:\Desktop\MyExport' -CleanLocalPath
Set-DatabricksEnvironment -AccessToken $accessTokenImport -ApiRootUrl "https://westeurope.azuredatabricks.net"
Import-DatabricksEnvironment -LocalPath 'D:\Desktop\MyExport'
Having those scripts made the whole migration a very easy task.
In addition, these new cmdlets can also be used in your Continuous Integration/Continuous Delivery (CI/CD) pipelines in Azure DevOps or any other CI/CD tool!
So just download the latest version from the PowerShell gallery and give it a try!
Can I use your DatabricksPS module to add environmental variables to clusters? Also, can I restrict the export to just workspaces (i.e. no clusters or jobs) using the Export-DatabricksEnvironment?
You can specify the parameter "-Artifacts" to only export specific objects
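A minimal sketch for the workspace-only case (the local path here is just a placeholder):
# export only the workspace items (notebooks and folders); clusters, jobs, secrets and DBFS are skipped
Export-DatabricksEnvironment -LocalPath 'D:\MyExport' -CleanLocalPath -Artifacts Workspace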
What do you mean by environment variables? Databricks Secrets?
No, I mean environmental variables to clusters.
You can specify them as part of your cluster definition
https://docs.databricks.com/dev-tools/api/latest/clusters.html#sparkenvpair
Thanks. I’ll check it out.
I have the Owner role on the Databricks workspace that I want to export and import, and when I try to export the whole workspace, I get the following errors.
My PowerShell version: 7.1.0, running on Windows 10 Enterprise
Commands Run –
PS C:\Windows\System32> Set-DatabricksEnvironment -AccessToken -ApiRootUrl “https://westeurope.azuredatabricks.net”
PS C:\Windows\System32> Export-DatabricksEnvironment -LocalPath ‘C:\Databricks\Export’ -CleanLocalPath
WARNING: This feature is EXPERIMENTAL and still UNDER DEVELOPMENT!
WARNING: LIBRARY found at /Users//spark-xml_2.12-0.9.0 – Exporting Libraries is currently not supported!
WARNING: It is not possible to extract secret values via the Databricks REST API.
This export only exports the names of SecretScopes and their Secrets but not the values!
WARNING: It is not possible to donwload the whole DBFS.
This export will only download files from DBFS that already exist locally and overwrite them!
Export-DatabricksEnvironment: Local DBFS path C:\Databricks\Export\DBFS does not exist so the DBFS export cannot work properly!
How do I import all the artefacts, for a complete workspace migration?
Correction: I noticed that content put in angle brackets gets dropped, so I am reposting the warning message regarding libraries from above.
…
WARNING: LIBRARY found at /Users/another-user/spark-xml_2.12-0.9.0 – Exporting Libraries is currently not supported!
…
Currently the REST API does not support creation of workspace libraries.
The only workaround is to add those libraries to the clusters directly
How do I resolve the below Warning?
WARNING: It is not possible to donwload the whole DBFS.
This export will only download files from DBFS that already exist locally and overwrite them!
Export-DatabricksEnvironment: Local DBFS path C:\Databricks\Export\DBFS does not exist so the DBFS export cannot work properly!
Right now I’m performing this from my local desktop.
Also, a small correction to your logging module: the spelling of "download" is incorrect.
You can use the parameter "-Artifacts" and provide e.g. only Workspace, Clusters and Jobs as an array, so DBFS and Secrets would not be exported and you would also not get a warning.
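A minimal example of that, reusing the local path from the export above:
# export only Workspace, Clusters and Jobs; the DBFS and Secrets warnings no longer appear
Export-DatabricksEnvironment -LocalPath 'C:\Databricks\Export' -CleanLocalPath -Artifacts Workspace,Clusters,Jobs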
Can I export the notebooks as a python / SQL file?
My plan is to commit the code to Azure repo and sync it with azure pipeline
yes, you can definitely do that
take a look at the Export-DatabricksEnvironment and Import-DatabricksEnvironment cmdlets described in the blog post.
they allow you to export and import various items like the workspace (which includes the notebooks) via PowerShell which can then be triggered via Azure DevOps for example
regards
-gerhard
How to migrate my DBFS files?
the Import-DatabricksEnvironment cmdlet can also be used to import DBFS items
for this to work you need to place the files you want to upload in a subfolder called “DBFS” under your -LocalPath parameter
Assuming you have your files under C:\Content\DBFS, you can use the following code snippet to upload them to DBFS:
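A minimal sketch of such a snippet, assuming a PAT in $accessToken and the West Europe API URL used earlier in this post:
Set-DatabricksEnvironment -AccessToken $accessToken -ApiRootUrl "https://westeurope.azuredatabricks.net"
# everything below C:\Content\DBFS is uploaded to the corresponding path in DBFS
Import-DatabricksEnvironment -LocalPath 'C:\Content' -Artifacts DBFS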
Sounds good Mate, I will try, thanks!
the Export-DatabricksEnvironment cmdlet works slightly differently: as we obviously cannot just download the whole DBFS, it will only download the files that already exist in the local folder
@gerhard Brueckl How to export the DBFS items to my local system?
currently the export via Export-DatabricksEnvironment takes the local files in the DBFS subfolder of your -LocalPath and downloads them again
so if you have \DBFS\myfile.txt in there, it will download myfile.txt from the root of your DBFS when you run Export-DatabricksEnvironment
so basically the locally existing files control what is downloaded
you can also create an empty local file to force a download
This is by design as it does not make sense to download the whole DBFS
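A short sketch of forcing a download by creating an empty local file (the file name is purely illustrative):
# create an empty placeholder to force the download of /myfile.txt from the DBFS root
New-Item -ItemType File -Path 'D:\MyExport\DBFS\myfile.txt' -Force | Out-Null
# the export then overwrites the placeholder with the actual file content from DBFS
Export-DatabricksEnvironment -LocalPath 'D:\MyExport' -Artifacts DBFS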
@gerhard brueckl I have some DBFS files locally, but after running this command no output related to DBFS appears
what's the actual PS command you are running?
Export-DatabricksEnvironment -Artifacts \DBFS\temp\LoanStats3a.csv -LocalPath ‘D:\MyExport’
I think what you are looking for is Download-DatabricksFSFile
also, your syntax for Export-DatabricksEnvironment is wrong; you would need to use Export-DatabricksEnvironment -Artifacts DBFS -LocalPath 'D:\MyExport'
It doesn’t work.
it would be really helpful if you could provide more information than that, it's pretty hard to debug otherwise …
so what's the PS cmd you executed?
do you get any error messages?
have you tried using the -Verbose flag?
do the other cmdlets work?
does the DBFS file exist?
And also, about the databases, do you know what the best way to migrate them is?
to be honest, I am not aware of a good solution for databases/SQL objects
you cannot import/export them as there are no APIs – the only way would be to execute some code to get the definitions of the SQL objects
but I have not automated something like this yet
Try Data Studio's SQL Database Project extension. I think it works pretty well.
With Databricks SQL objects? Does it work? I mean syncing of objects etc.?
I’m sorry I thought you were talking about regular SQL objects. This would not work with Databricks.
Hi, is there a way to export the models out?
what do you mean by models?
SQL databases/Tables or ML models/MLFLow?
Is there a way to migrate the ML models/MLflow from one Azure Databricks workspace to another?
not that I am aware of
there is a REST API for MLflow on Databricks which can probably be used, but I have not worked with it yet
though, it is on my long backlog to also write a PowerShell wrapper for it
How do I move my existing Databricks notebooks to an Azure Synapse workspace?
Any help would be much appreciated.
honestly, I have not thought of this yet. But there is also a REST API for Synapse Analytics which you can probably use to upload the same notebooks again: https://docs.microsoft.com/en-us/rest/api/synapse/
but that's out of scope for this post, as you also need to make sure that you do not use any Databricks-proprietary features in your notebooks
-gerhard
I am not able to import the users. It is only importing the groups and not adding the members to them, giving me the below error.
Invoke-RestMethod : {“error_code”:”RESOURCE_DOES_NOT_EXIST”,”message”:”User
‘surbhi.bhan@abc.com’ does not exist”}
At C:\Program Files\WindowsPowerShell\Modules\DatabricksPS\1.9.6.1\Public\General.ps1:87 char:13
+ … $result = Invoke-RestMethod -Uri $apiUrl -Method $Method -Headers $ …
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (System.Net.HttpWebRequest:HttpWebRequest) [Invoke-
RestMethod], WebException
+ FullyQualifiedErrorId : WebCmdletWebResponseException,Microsoft.PowerShell.Commands.InvokeRes
tMethodCommand
well, the error message basically tells you what's wrong: ”User ‘surbhi.bhan@abc.com’ does not exist”
maybe you have moved a workspace from one tenant to another where this user has not been added yet?
-gerhard
Hi,
Is there any way by which we can export or download the complete data from DBFS?
Hi Praveen,
no, there is no built-in way to do this. However, you can use a combination of the DBFS listing cmdlet and Download-DatabricksFSFile to download DBFS content recursively, roughly like the sketch below
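A rough sketch of that approach; the Get-DatabricksFSContent cmdlet name, the parameter names and the returned properties are assumptions and may differ from the actual module:
# recursively download everything under a DBFS folder (cmdlet and parameter names are assumptions)
function Copy-DbfsFolder([string]$dbfsPath, [string]$localRoot) {
    foreach ($item in (Get-DatabricksFSContent -Path $dbfsPath)) {
        if ($item.is_dir) {
            # descend into sub-folders
            Copy-DbfsFolder -dbfsPath $item.path -localRoot $localRoot
        }
        else {
            # map the DBFS path to a local path and download the file
            $target = Join-Path $localRoot ($item.path.TrimStart("/") -replace "/", "\")
            New-Item -ItemType Directory -Path (Split-Path $target) -Force | Out-Null
            Download-DatabricksFSFile -Path $item.path -LocalPath $target
        }
    }
}
Copy-DbfsFolder -dbfsPath "/" -localRoot "C:\DBFSExport"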
-gerhard
Hi,
I’ve tried to use the Import-DatabricksEnvironment cmdlet in an Azure DevOps pipeline and it gives me the following error:
Cannot process command because of one or more missing mandatory parameters: NodeTypeId SparkVersion.
Specifically, I did this:
Import-DatabricksEnvironment `
-LocalPath “$(Agent.BuildDirectory)/s/[FOLDER]” `
When I ran this locally, I had to provide the values for NodeTypeId and SparkVersion via the prompt after executing the above cmdlet.
I executed the cmdlet locally like this:
Import-DatabricksEnvironment `
-Artifacts Workspace,Clusters,Jobs `
-LocalPath “[LOCAL DIRECTORY]”
Do you have any ideas on how I can execute Import-DatabricksEnvironment in an Azure DevOps pipeline?
Many Thanks!
this must be related to the Clusters being imported – so I guess in your cluster definitions the NodeTypeID and SparkVersion are missing – can you check for this?
can you share the definition of the cluster(s)?
My cluster definition is this:
{
  "autoscale": {
    "min_workers": 2,
    "max_workers": 8
  },
  "cluster_name": "Test",
  "spark_version": "9.1.x-scala2.12",
  "spark_conf": {},
  "azure_attributes": {
    "first_on_demand": 1,
    "availability": "ON_DEMAND_AZURE",
    "spot_bid_max_price": -1
  },
  "node_type_id": "Standard_DS3_v2",
  "driver_node_type_id": "Standard_DS3_v2",
  "ssh_public_keys": [],
  "custom_tags": {},
  "spark_env_vars": {
    "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
  },
  "autotermination_minutes": 120,
  "enable_elastic_disk": true,
  "cluster_source": "UI",
  "init_scripts": [],
  "cluster_id": "0106-173902-81dw5o61"
}
It’s just a blank cluster I created for the purpose of testing the import and export of the Databricks’ contents from one resource to another.
So when I did run this locally like mentioned in my first message, the process did work and I could see the cluster in the other resource.
So would there be a way for me to provide values for the nodeTypeId and sparkVersion in an Azure DevOps pipeline? I did try this and got the following error (since there aren’t parameters for the nodeTypeId/sparkVersion):
A parameter cannot be found that matches parameter name ‘NodeTypeId’.
FYI the PowerShell used:
Set-DatabricksEnvironment `
-AccessToken [dbw-001-token] `
-ApiRootUrl [dbw-001-URL]
Export-DatabricksEnvironment `
-LocalPath ‘$(Agent.BuildDirectory)/s/[DIRECTORY]’ `
-CleanLocalPath `
-Artifacts Workspace,Clusters,Jobs
Set-DatabricksEnvironment `
-AccessToken [dbw-002-token] `
-ApiRootUrl [dbw-002-URL]
Import-DatabricksEnvironment `
-LocalPath “$(Agent.BuildDirectory)/s/[DIRECTORY]” `
-NodeTypeId “Standard_DS3_v2” `
-SparkVersion “9.1.x-scala2.12”
Let me know your thoughts and thank you for your reply 🙂
I just published v1.9.8.1 which should fix this issue – can you please give it another try?
Good news!
I ran the same pipeline in Azure DevOps and it worked 🙂
Thank you so much for your help and contributions.
If I do need anything I’ll let you know.
Many thanks and best of luck,
Alim
great to hear that it works now!
for the future, please simply open a bug ticket in the GitHub repo where it can be tracked much better
https://github.com/gbrueckl/Databricks.API.PowerShell/issues
I have used this before but I am obviously doing something wrong.
I have created an Access Token in Databricks, connected to Azure in PowerShell using Connect-AzAccount and set the token.
However I am getting this when trying to export
Export-DatabricksEnvironment -LocalPath ‘C:\Testing\MyExport’ -CleanLocalPath
WARNING: This feature is EXPERIMENTAL and still UNDER DEVELOPMENT!
Invoke-RestMethod :
Error 403 Invalid access token.
HTTP ERROR 403
Problem accessing /api/2.0/workspace/list. Reason:
Invalid access token.
Is there a specific way to export the token that I am missing?
I think you are mixing up different tokens here. The DatabricksPS module always expects a Databricks Personal Access Token (PAT, https://docs.databricks.com/dev-tools/api/latest/authentication.html)
This is totally independent of the other Azure modules and cmdlets like Connect-AzAccount.
A Databricks PAT will usually start with "dapi" followed by a long hexadecimal string.
Thanks for the reply. I figured out what I did wrong. When I created the token initially I forgot to copy it and then went back and probably copied the GUID or something. Recreated the token and it worked.
I still get some errors but I am guessing it has more to do with the Databricks Environment that was setup.
please open a ticket at https://github.com/gbrueckl/Databricks.API.PowerShell/issues if you encounter any other issues
Hi,
This is great. Thanks for sharing! I was able to successfully import the cluster and workspace. However, I get an error when I try to import jobs. Please see the call stack below. Any suggestions on how to resolve the issue? I would appreciate it!
InvalidOperation: \PowerShell\Modules\DatabricksPS\1.11.0.8\Private\General.ps1:396:5
Line |
396 | … $hash[$object.Key] = ConvertTo-Hashtable $object.Value
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| Index operation failed; the array index evaluated to null.
Invoke-RestMethod: PowerShell\Modules\DatabricksPS\1.11.0.8\Public\General.ps1:96:13
Line |
96 | … $result = Invoke-RestMethod -Uri $apiUrl -Method $Method -Headers $ …
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| {“error_code”:”MALFORMED_REQUEST”,”message”:”Could not parse request object: Expected Scalar value for String field ‘on_failure’\n at
| [Source: (ByteArrayInputStream); line: 1, column: 458]\n at [Source: java.io.ByteArrayInputStream@248fbab6; line: 1, column: 458]”}
Invoke-RestMethod: \PowerShell\Modules\DatabricksPS\1.11.0.8\Public\General.ps1:96:13
Line |
96 | … $result = Invoke-RestMethod -Uri $apiUrl -Method $Method -Headers $ …
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| {“error_code”:”INVALID_PARAMETER_VALUE”,”message”:”One of job_cluster_key, new_cluster, or existing_cluster_id must be specified.”}
please open a ticket at https://github.com/gbrueckl/Databricks.API.PowerShell
We have a new environment in Databricks where the repository is cloned to Azure DevOps. When we do the bulk import from the lower environment, is there a way to bulk import into the Databricks repository also?
if you already have your code in a repo, why would you import it again into your new workspace and not simply attach the repo in the new workspace too?
I’m running the latest version of DatabricksPS, I have admin on my Databricks workspace, and Owner on my Azure subscription. I can run set-databricksenvironment with my access token and API root URL for East US, and the script appears to be able to view the users and folders, but when I run Export-databricksenvironment -localpath “c:\dbricksexport” -cleanlocalpath it throws the same error continuously as it runs:
Exception calling “WriteAllBytes” with “2” argument(s): “The given path’s format is not supported.”
At C:\Users\myusername\OneDrive – mycompanyname\Documents\WindowsPowerShell\Modules\DatabricksPS\1.12.0.1\Public\WorkspaceAPI.ps1:91 char:3
+ [IO.File]::WriteAllBytes($LocalPath, $exportBytes)
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [], MethodInvocationException
+ FullyQualifiedErrorId : NotSupportedException
Anything I can attempt to fix?
please open a bug at the repository (https://github.com/gbrueckl/Databricks.API.PowerShell/issues)
if you are on Windows PowerShell you might try PowerShell Core (or the other way round)