Over the last year I worked a lot with Databricks on Azure and I have to say that I was (and still am) very impressed how well it works and how it integrates with other services of the Microsoft Azure Data Platform like Data Lake Store, Data Factory, etc.
Some of the projects I worked on also included CI/CD like pipelines using Azure DevOps where Databricks did not really shine so bright in the beginning. There are no native tasks for it or anything. But this is OK as for those scenarios, where you need to automate/script something, Databricks offers a REST API (Azure, AWS).
As most of our deployments use PowerShell I wrote some cmdlets to easily work with the Databricks API in my scripts. These included managing clusters (create, start, stop, …), deploying content/notebooks, adding secrets, executing jobs/notebooks, etc. After some time I ended up having 20+ single scripts which was not really maintainable any more. So I packed them into a PowerShell module and also published it to the PowerShell Gallery (https://www.powershellgallery.com/packages/DatabricksPS) for everyone to use!
The module works for Databricks on Azure and also if you run Databricks on AWS – fortunately the API endpoints are almost identical.
The usage is quite simple as for any other PowerShell module:
- Install it using Install-Module cmdlet
- Setup the Databricks environment using API key and endpoint URL
- run the actual cmdlets (e.g. to start a cluster)
Here is the same code for you to copy&paste:
Install-Module -Name DatabricksPS
$accessToken = "dapi123456789be672c4007052d4694a7c51"
$apiUrl = "https://westeurope.azuredatabricks.net"
Set-DatabricksEnvironment -AccessToken $accessToken -ApiRootUrl $apiUrl
Start-DatabricksCluster -ClusterID "1202-211320-brick1"
At the moment, the module supports the following APIs:
- Clusters API (Azure, AWS)
- Groups API (Azure, AWS)
- Jobs API (Azure, AWS)
- Secrets API (Azure, AWS)
- Token API (Azure, AWS)
- Workspace API (Azure, AWS)
- Libraries API (Azure, AWS)
- DBFS API (Azure, AWS)
- Instance Profiles API (AWS)
These APIs are not yet implemented but will be added in the near future:
All the cmdlets are documented and contain links to official documentation of the Rest API call used by the cmdlet. Some API endpoints support different variations of parameters – this was implemented using different parameter sets in PowerShell. There are still some ongoing tests (especially on AWS) and improvements but I general all cmdlets work as expected. I hope this helps anyone else who also has to deal with the Databricks APIs frequently or has to integrate it in a CI/CD pipeline.
The whole source code is also available from my Git-repository (https://github.com/gbrueckl/Databricks.API.PowerShell). If you want to provide any feedback, please use the Git-repository to do so.