Added Purview Role Assignment & Storage Access Rules #213

Merged
merged 7 commits into from Dec 6, 2021
Changes from 5 commits
5 changes: 3 additions & 2 deletions .github/linters/.arm-ttk.psd1
@@ -5,9 +5,10 @@
# Test = @( )
Skip = @(
'Template Should Not Contain Blanks',
'DeploymentTemplate Must Not Contain Hardcoded Uri'
'DeploymentTemplate Must Not Contain Hardcoded Uri',
'DependsOn Best Practices',
'Outputs Must Not Contain Secrets',
'IDs Should Be Derived From ResourceIDs'
'IDs Should Be Derived From ResourceIDs',
'apiVersions Should Be Recent'
)
}
2 changes: 1 addition & 1 deletion .github/workflows/lint.yml
@@ -2,7 +2,7 @@ name: Lint Code Base

on:
push:
branches-ignore: [master]
branches: [main]
pull_request:
branches: [main]

7 changes: 4 additions & 3 deletions docs/EnterpriseScaleAnalytics-ServicePrincipal.md
@@ -41,9 +41,10 @@ Additional required role assignments include:

| Role Name | Description | Scope |
|:----------|:------------|:------|
| [Private DNS Zone Contributor](https://docs.microsoft.com/azure/role-based-access-control/built-in-roles#private-dns-zone-contributor) | We expect you to deploy all Private DNS Zones for all data services into a single subscription and resource group. Therefore, the service principal needs to be Private DNS Zone Contributor on the global DNS resource group which was created during the Data Management Zone deployment. This is required to deploy A-records for the respective private endpoints. | <div style="width: 31ch">(Resource Group Scope) `/subscriptions/{{datamanagement}subscriptionId}/resourceGroups/{resourceGroupName}`</div> |
| [Network Contributor](https://docs.microsoft.com/azure/role-based-access-control/built-in-roles#network-contributor) | To set up VNet peering between the Data Landing Zone VNet and the Data Management Landing Zone VNet, the service principal needs **Network Contributor** access rights on the resource group of the remote VNet. | <div style="width: 31ch">(Resource Group Scope) `/subscriptions/{{datamanagement}subscriptionId}/resourceGroups/{resourceGroupName}`</div> |
| [User Access Administrator](https://docs.microsoft.com/azure/role-based-access-control/built-in-roles#user-access-administrator) | To share the self-hosted integration runtime that gets deployed into the `integration-rg` resource group with other Data Factories, like the one in the `shared-integration-rg` resource group, the service principal needs **User Access Administrator** rights on the Data Factory that gets deployed into the `integration-rg` resource group. These rights are also required to assign the Data Factory and Synapse managed identities access on the respective storage account file systems. | <div style="width: 31ch">(Resource Scope) `/subscriptions/{{datalandingzone}subscriptionId}`</div> |
| [Private DNS Zone Contributor](https://docs.microsoft.com/azure/role-based-access-control/built-in-roles#private-dns-zone-contributor) | We expect you to deploy all Private DNS Zones for all data services into a single subscription and resource group. Therefore, the service principal needs to be Private DNS Zone Contributor on the global DNS resource group which was created during the Data Management Zone deployment. This is required to deploy A-records for the respective private endpoints. | <div style="width: 31ch">(Resource Group Scope) `/subscriptions/{datamanagement-subscriptionId}/resourceGroups/{resourceGroupName}`</div> |
| [Network Contributor](https://docs.microsoft.com/azure/role-based-access-control/built-in-roles#network-contributor) | To set up VNet peering between the Data Landing Zone VNet and the Data Management Landing Zone VNet, the service principal needs **Network Contributor** access rights on the resource group of the remote VNet. | <div style="width: 31ch">(Resource Group Scope) `/subscriptions/{datamanagement-subscriptionId}/resourceGroups/{resourceGroupName}`</div> |
| [User Access Administrator](https://docs.microsoft.com/azure/role-based-access-control/built-in-roles#user-access-administrator) | To share the self-hosted integration runtime that gets deployed into the `integration-rg` resource group with other Data Factories, like the one in the `shared-integration-rg` resource group, the service principal needs **User Access Administrator** rights on the Data Factory that gets deployed into the `integration-rg` resource group. These rights are also required to assign the Data Factory and Synapse managed identities access on the respective storage account file systems. | <div style="width: 31ch">(Resource Scope) `/subscriptions/{datalandingzone-subscriptionId}`</div> |
| [Reader](https://docs.microsoft.com/azure/role-based-access-control/built-in-roles#reader) | Required to read properties of the Purview account from the Data Management Zone and then grant the managed identity of Purview **Reader** and **Storage Blob Data Reader** rights on the subscription. | <div style="width: 31ch">(Resource Scope) `/subscriptions/{datamanagement-subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Purview/accounts/{purviewAccountName}`</div> |

To add these role assignments, you can use the [Azure Portal](https://portal.azure.com/) or run the following commands using Azure CLI/Azure PowerShell:

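The exact commands are collapsed in this view. As a rough illustration, a minimal Azure PowerShell sketch of the **Reader** assignment (all identifiers below are placeholders, not values from this repository) could look like:

```powershell
# Sketch only: grant the deployment service principal Reader rights
# on the Purview account in the Data Management Zone.
Connect-AzAccount

# Placeholder scope; substitute your own subscription, resource group and account names.
$scope = '/subscriptions/<datamanagement-subscriptionId>/resourceGroups/<resourceGroupName>/providers/Microsoft.Purview/accounts/<purviewAccountName>'

New-AzRoleAssignment `
    -ApplicationId '<servicePrincipalAppId>' `
    -RoleDefinitionName 'Reader' `
    -Scope $scope
```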
@@ -8,7 +8,7 @@ Cluster tags allow to easily monitor the cost of cloud resources used by various

> **Note:** By default, Azure Databricks applies the following tags to each cluster: **Vendor**, **Creator**, **ClusterName** and **ClusterId**

In this implementation, a **Regex policy** is used to enforce cost center tag definitions when a user creates a new Databricks cluster. As a result, Azure Databricks applies these tags to the cloud resources, like VMs and disk volumes, associated with the specific cluster.
In this implementation, a **Regular expression policy** is used to enforce cost center tag definitions when a user creates a new Databricks cluster. As a result, Azure Databricks applies these tags to the cloud resources, like VMs and disk volumes, associated with the specific cluster.

In this case, the custom policy JSON section will be the following:

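The policy definition itself is collapsed in this view; a representative cluster policy of this kind (the tag name and pattern below are illustrative, not taken from this repository) might look like:

```json
{
  "custom_tags.costCenter": {
    "type": "regex",
    "pattern": "[A-Z]{2}-[0-9]{4}"
  }
}
```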
@@ -4,13 +4,13 @@

Due to the following service limitations, Databricks needs to be set up manually:

- Creating a Key Vault backed Databricks secret scope is not possible via Service Principle. The issue can be tracked [here](https://github.com/databricks/databricks-cli/issues/338).
- Creating a Key Vault backed Databricks secret scope is not possible via Service Principal. The issue can be tracked [here](https://github.com/databricks/databricks-cli/issues/338).

## Manual Databricks configuration

Due to the issue mentioned above, we cannot rely on the application workflow, but only on the on-behalf-of workflow. This means that, instead of using a Service Principal for authentication, we need to rely on a user being authenticated against the workspace. Only then can we use the Databricks API to create Key Vault backed secret scopes. If you are ok with Databricks backed secret scopes, then you can already automate the complete setup end-to-end. However, for manageability reasons, we are recommending to use Key Vaults for storing secrets.
Due to the issue mentioned above, we cannot rely on the application workflow, but only on the on-behalf-of workflow. This means that, instead of using a Service Principal for authentication, we need to rely on a user being authenticated against the workspace. Only then can we use the Databricks API to create Key Vault backed secret scopes. If you are OK with Databricks backed secret scopes, then you can already automate the complete setup end-to-end. However, for manageability reasons, we recommend using Azure Key Vaults for storing secrets.

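For reference, creating a Key Vault backed secret scope through the Databricks REST API with a signed-in user's AAD token looks roughly like the following sketch (workspace URL, scope name, and Key Vault identifiers are placeholders):

```powershell
# Sketch: create a Key Vault backed secret scope on behalf of a signed-in user.
# $aadToken must be an AAD access token issued for the Azure Databricks resource.
$body = @{
    scope                  = 'my-secret-scope'
    scope_backend_type     = 'AZURE_KEYVAULT'
    backend_azure_keyvault = @{
        resource_id = '/subscriptions/<subscriptionId>/resourceGroups/<resourceGroupName>/providers/Microsoft.KeyVault/vaults/<vaultName>'
        dns_name    = 'https://<vaultName>.vault.azure.net/'
    }
} | ConvertTo-Json -Depth 3

Invoke-RestMethod -Method Post `
    -Uri 'https://<workspace-url>/api/2.0/secrets/scopes/create' `
    -Headers @{ Authorization = "Bearer $aadToken" } `
    -Body $body `
    -ContentType 'application/json'
```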
To simplify the manual setup and configuration of Databricks, we provide a PowerShell script (`SetupDatabricksManually.ps1`) as well as pre-defined commands in the DevOps and GitHub workflows. You can copy and paste these commands into your PowerShell console to set up your Databricks workspaces manually by executing a single script. The PowerShell script will perform the following actions in your Databricks workspace:
To simplify the manual setup and configuration of Databricks, we provide a PowerShell script (`SetupDatabricksManually.ps1`) as well as predefined commands in the DevOps and GitHub workflows. You can copy and paste these commands into your PowerShell console to set up your Databricks workspaces manually by executing a single script. The PowerShell script will perform the following actions in your Databricks workspace:

1. Setup of Key Vault backed secret scopes and the respective ACLs. These secret scopes store the credentials that are required for connecting to the external Hive metastore as well as the Log Analytics workspace.
1. Execution of a Databricks Notebook to achieve the following:
@@ -60,7 +60,7 @@ After deploying the Data Landing Zone successfully, execute the following steps:

| Value | Target |
|:------------|:------------|
| {userEmail} | Replace with your user E-Mail address. |
| {userEmail} | Replace with your user email address. |
| {password} | Replace with your user password. |
| {clientId} | Replace with the **Application (client) ID** of your AAD application. |
| {tenantId} | Replace with the **Directory (tenant) ID** of your AAD application. |
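With the placeholders replaced, a hypothetical invocation of the script could look like this (the parameter names are assumptions derived from the placeholders above, not verified against the script):

```powershell
# Hypothetical invocation; parameter names are illustrative only.
.\SetupDatabricksManually.ps1 `
    -UserEmail 'jane.doe@contoso.com' `
    -Password (Read-Host -AsSecureString -Prompt 'Password') `
    -ClientId '<application-client-id>' `
    -TenantId '<directory-tenant-id>'
```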
21 changes: 21 additions & 0 deletions infra/main.bicep
@@ -210,6 +210,7 @@ module storageServices 'modules/storage.bicep' = {
prefix: name
tags: tagsJoined
subnetId: networkServices.outputs.servicesSubnetId
purviewId: purviewId
privateDnsZoneIdBlob: privateDnsZoneIdBlob
privateDnsZoneIdDfs: privateDnsZoneIdDfs
}
@@ -230,6 +231,7 @@ module externalStorageServices 'modules/externalstorage.bicep' = {
location: location
prefix: name
tags: tagsJoined
purviewId: purviewId
subnetId: networkServices.outputs.servicesSubnetId
privateDnsZoneIdBlob: privateDnsZoneIdBlob
}
@@ -359,6 +361,25 @@ resource dataProduct002ResourceGroup 'Microsoft.Resources/resourceGroups@2021-01
properties: {}
}

// Role assignment
module purviewSubscriptionRoleAssignmentReader 'modules/auxiliary/purviewRoleAssignmentSubscription.bicep' = if(!empty(purviewId)) {
name: 'purviewSubscriptionRoleAssignmentReader'
scope: subscription()
params: {
purviewId: purviewId
role: 'Reader'
}
}

module purviewSubscriptionRoleAssignmentStorageBlobReader 'modules/auxiliary/purviewRoleAssignmentSubscription.bicep' = if(!empty(purviewId)) {
name: 'purviewSubscriptionRoleAssignmentStorageBlobReader'
scope: subscription()
params: {
purviewId: purviewId
role: 'StorageBlobDataReader'
}
}

// Outputs
output vnetId string = networkServices.outputs.vnetId
output nsgId string = networkServices.outputs.nsgId
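
The `purviewRoleAssignmentSubscription.bicep` module referenced above is not part of this diff. A minimal sketch of what such a module could contain (the role GUIDs are the well-known built-in role definition IDs; the reference-based principal lookup is an assumption):

```bicep
// Assumed content of modules/auxiliary/purviewRoleAssignmentSubscription.bicep.
targetScope = 'subscription'

param purviewId string

@allowed([
  'Reader'
  'StorageBlobDataReader'
])
param role string

// Well-known built-in role definition IDs.
var roleDefinitionIds = {
  Reader: 'acdd72a7-3385-48ef-bd42-f606fba81ae7'
  StorageBlobDataReader: '2a2b9908-6ea1-4ae2-8e65-a410df84e7d1'
}

// Grant the Purview account's system-assigned identity the given role
// on the current subscription.
resource roleAssignment 'Microsoft.Authorization/roleAssignments@2020-04-01-preview' = {
  name: guid(subscription().id, purviewId, role)
  properties: {
    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', roleDefinitionIds[role])
    principalId: reference(purviewId, '2021-07-01', 'Full').identity.principalId
    principalType: 'ServicePrincipal'
  }
}
```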