
Automating Azure Databricks SCIM Provisioning with Terraform

21 Feb, 2025

Great news! Automatic Identity Management for Azure Databricks is in public preview and on its way to general availability. You can read more about it in Databricks’ announcement blog.

Until then, the recommended approach is to use the Microsoft Entra ID SCIM enterprise application for one-way, automatic synchronization of the users in selected groups. However, SCIM provisioning has several drawbacks compared to the upcoming Automatic Identity Management solution:

| Feature | Automatic identity management | SCIM provisioning |
|---|---|---|
| Sync delay | Instant | Every 40 minutes |
| Sync users | ✓ | ✓ |
| Sync groups | ✓ | ✓ (direct members only) |
| Sync nested groups | ✓ | |
| Sync service principals | ✓ | |
| Manage Entra ID application | | ✓ (negative) |
| Requires Microsoft Entra ID Premium | | ✓ (neutral) |
| Requires Microsoft Entra ID Cloud Application Administrator role | | ✓ (neutral) |
| Requires identity federation | ✓ (neutral) | |

During the public preview phase, Automatic Identity Management has some UI limitations to be aware of.

Despite these limitations, this new method simplifies integration with Entra ID and should be preferred over SCIM provisioning.

Account-level SCIM provisioning

It’s likely that SCIM provisioning will be deprecated once Automatic Identity Management becomes generally available (GA). Keep in mind that SCIM provisioning is only a short-term solution and comes with several limitations.

Why write a blog about the SCIM provisioning app? Because setting up Terraform automation for SCIM is poorly documented and can be hard to figure out. Some Stack Overflow threads lead to dead ends, and even ChatGPT didn’t provide a clear solution.

It might feel like this blog is three years too late, but I hope some of you still find value in the working example.

In this guide, we use Terraform to automate the selection of groups that will be synced to Unity Catalog at the account level.

[Figure: overview of Terraform-driven SCIM provisioning]

One-time setup of the SCIM app

Follow the procedure to set up the SCIM app as explained in Step 2 – Configure SCIM provisioning using Microsoft Entra ID. Connect the app’s provisioning to Databricks at the account level, not to an individual workspace. Since this is a one-time operation, you may prefer to do it manually.

Once the SCIM app is operational, you need to select which groups will be synchronized. Instead of manually updating the SCIM app, as described in Step 3 – Assign groups to the application, we can automate this process using Terraform.

Unity Catalog account-level groups become external groups when they are linked to an Entra ID object ID. Group managers and account admins can then no longer update these groups through the UI or the workspace admin settings page. To enforce this, enable Immutable external groups on the Previews page of the account console.

Automation with Terraform

Because the list of groups may change frequently, the process should be automated with the following goals:

  • Straightforward input: a list of Entra ID group display names.
  • Add the groups to the SCIM app, which synchronizes them every 40 minutes.
  • Ensure the groups in Unity Catalog are created as “external” (not modifiable through the UI).
  • Pre-create the groups at the Unity Catalog account level to avoid waiting for the SCIM app to run.
    • This requires account admin privileges in the account console and is optional.
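The first goal, a plain list of display names as input, can be met with the `azuread_group` data source, which resolves a display name to the group's object ID. The sketch below is minimal. It assumes a hypothetical variable `scim_group_names` and display names that are unique among the tenant's security groups. It builds the same name => object ID map as the hard-coded `included_groups` local in the example below, so it would replace that local rather than sit next to it.

```hcl
# Hypothetical input: Entra ID display names of the groups to synchronize.
variable "scim_group_names" {
  type    = list(string)
  default = ["group1", "group2"]
}

# Look up each group by display name; the data source is expected to error
# when a name does not resolve to a single security group.
data "azuread_group" "scim" {
  for_each         = toset(var.scim_group_names)
  display_name     = each.value
  security_enabled = true
}

locals {
  # Mapping of the group name => object_id, derived from the lookups above
  # (replaces the hard-coded map).
  included_groups = {
    for name, group in data.azuread_group.scim : name => group.object_id
  }
}
```

With this in place, adding a group to the synchronization is a one-line change to the list of names.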

New groups are initially created empty, and once SCIM has run, their members become available. The key advantage of directly creating groups at the account level is that they can be used immediately after the Terraform apply process completes. These groups can then be used to assign permissions or grant workspace access.
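The resources in the example below come from two providers. The sketch below shows one possible configuration. It assumes credentials are picked up from the environment (for example `az login` or service principal environment variables) and uses a hypothetical `databricks_account_id` variable. The key detail is that `databricks_group` must target the Azure Databricks account console host, not a workspace URL, to create account-level groups.

```hcl
terraform {
  required_providers {
    azuread = {
      source = "hashicorp/azuread"
    }
    databricks = {
      source = "databricks/databricks"
    }
  }
}

# Hypothetical input: the Databricks account ID shown in the account console.
variable "databricks_account_id" {
  type = string
}

# Entra ID provider; authentication is taken from the environment (e.g. Azure CLI).
provider "azuread" {}

# Account-level Databricks provider, needed for account-level groups.
provider "databricks" {
  host       = "https://accounts.azuredatabricks.net"
  account_id = var.databricks_account_id
}
```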

# The GitHub gist has an extended example with providers, variables, lookups, etc.

locals {
  # Mapping of the group name => object_id
  included_groups = {
    "group1": "5f47aa77-1efd-4496-badc-e41861795ab5"
    "group2": "34194136-7796-4c15-8b33-cbd9a7930cc8"
  }

  # The GitHub gist has a proper lookup example; hard-coded here for brevity.
  scim_app_user_role_id = "f02bd5f2-5ea8-41a4-901d-c20021767e96"
}

# Create (empty) external groups at the Databricks account level.
# permissions: requires Account Admin
resource "databricks_group" "scim" {
  for_each      = local.included_groups
  display_name  = each.key
  external_id   = each.value
  force         = true  # don't fail when the group already exists.
}

# Assign the groups to the SCIM app for synchronization
# permissions: requires Owner on Enterprise Application
resource "azuread_app_role_assignment" "scim_group" {
  for_each            = local.included_groups
  app_role_id         = local.scim_app_user_role_id
  principal_object_id = each.value
  resource_object_id  = "8bac0fbd-d71f-4d2f-8e7a-b4b0f82f2f60" # SCIM-app Object Id
}
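The two hard-coded GUIDs, the app role ID and the SCIM app's object ID, could also be looked up. The sketch below is minimal and rests on two assumptions. First, the enterprise app's display name is `Databricks SCIM Provisioning`, which is hypothetical, so use the name you gave the app. Second, the role to assign has the display name `User`, the usual default role of gallery apps; check the app's roles in the portal. These locals would replace the hard-coded `scim_app_user_role_id` in the example.

```hcl
# Hypothetical display name of the SCIM enterprise application (service principal).
data "azuread_service_principal" "scim_app" {
  display_name = "Databricks SCIM Provisioning"
}

locals {
  # Assumption: synchronized groups get the app role named "User".
  # one() returns null if no role matches and errors if several do.
  scim_app_user_role_id = one([
    for role in data.azuread_service_principal.scim_app.app_roles : role.id
    if role.display_name == "User"
  ])

  # Object ID of the enterprise application.
  scim_app_object_id = data.azuread_service_principal.scim_app.object_id
}
```

The `resource_object_id` of `azuread_app_role_assignment.scim_group` would then reference `local.scim_app_object_id` instead of the literal GUID.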

Check out the full Terraform code in this GitHub gist.
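Because the groups exist as soon as `terraform apply` finishes, they can be granted workspace access in the same run. The sketch below uses the Databricks provider's `databricks_mws_permission_assignment` resource. It assumes three things: the account-level provider that creates `databricks_group.scim`, an identity-federated workspace, and a hypothetical `databricks_workspace_id` variable holding the numeric workspace ID. `USER` grants regular access, while `ADMIN` would make the groups workspace admins.

```hcl
# Hypothetical input: numeric ID of the target workspace.
variable "databricks_workspace_id" {
  type = number
}

# Give every pre-created group regular (non-admin) access to the workspace.
# Runs against the account-level databricks provider, like databricks_group.scim.
resource "databricks_mws_permission_assignment" "scim_groups" {
  for_each     = databricks_group.scim
  workspace_id = var.databricks_workspace_id
  principal_id = each.value.id
  permissions  = ["USER"]
}
```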

Add a service principal as owner of the SCIM app

For CI/CD automation, the Service Principal that runs Terraform must be an owner of the SCIM app so it can manage the group assignments. As the app owner, you can add additional owners.

However, while the Azure Portal allows you to add other users as owners, it does not support adding Service Principals. Fortunately, you can assign non-user owners using the Microsoft Graph API.

Check out the Azure CLI command in this GitHub gist.
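The underlying Microsoft Graph call is `POST /servicePrincipals/{id}/owners/$ref`, with the new owner referenced as a `directoryObjects` URL in the request body. One way to keep this step in Terraform is to wrap `az rest` in a `terraform_data` resource (Terraform 1.4 or later). This is a sketch, not the gist's approach. It assumes a Unix shell, that the person applying it is already an owner of the SCIM app and logged in with `az login`, and two hypothetical variables. Graph is expected to return an error if the principal is already an owner. Because this is a one-time bootstrap, running the same `az rest` command by hand works just as well.

```hcl
# Hypothetical inputs: object ID of the SCIM enterprise app (service principal)
# and object ID of the CI/CD service principal that should become an owner.
variable "scim_app_object_id" {
  type = string
}

variable "cicd_sp_object_id" {
  type = string
}

# One-time bootstrap: runs `az rest` locally with the caller's Azure CLI login.
resource "terraform_data" "scim_app_owner" {
  # Re-run the command only when one of the object IDs changes.
  triggers_replace = [var.scim_app_object_id, var.cicd_sp_object_id]

  provisioner "local-exec" {
    # Microsoft Graph: POST /servicePrincipals/{id}/owners/$ref
    # (\$ keeps the shell from expanding $ref inside the double quotes).
    command = <<-EOT
      az rest --method POST \
        --uri "https://graph.microsoft.com/v1.0/servicePrincipals/${var.scim_app_object_id}/owners/\$ref" \
        --body '{"@odata.id": "https://graph.microsoft.com/v1.0/directoryObjects/${var.cicd_sp_object_id}"}'
    EOT
  }
}
```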

Conclusion

Automating SCIM provisioning with Terraform simplifies user and group synchronization for Unity Catalog at the account level. While Automatic Identity Management is on the horizon, SCIM remains the go-to solution for now despite its limitations.

By leveraging Terraform, we eliminate the need for manual group assignments, ensure immediate availability of groups in Unity Catalog, and streamline the provisioning process. Additionally, assigning a Service Principal to the SCIM app via the Microsoft Graph API allows for full CI/CD automation.

As Databricks continues to improve identity management, it’s worth keeping an eye on future developments. But until then, this approach ensures a more efficient and automated way to manage identity synchronization.

🚀 Have you implemented a similar solution, or do you have any questions? Feel free to share your thoughts!
