---
subcategory: "Deployment"
---
-> Note Initialize the provider with alias = "mws" and host = "https://accounts.cloud.databricks.com" (AWS) or host = "https://accounts.gcp.databricks.com" (GCP), and use provider = databricks.mws for this resource.
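For example, an account-level provider configuration for AWS might look like the following minimal sketch (authentication arguments are omitted, since they depend on your environment):

```hcl
# Account-level provider used by the databricks_mws_* resources on this page.
provider "databricks" {
  alias = "mws"
  host  = "https://accounts.cloud.databricks.com" # accounts.gcp.databricks.com on GCP
}
```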
The databricks_mws_vpc_endpoint resource enables you to register aws_vpc_endpoint resources or Google Cloud Private Service Connect (PSC) endpoints with Databricks, so that they can be used as part of a databricks_mws_networks configuration.
It is strongly recommended that customers read the Enable AWS Private Link or the Enable GCP Private Service Connect documentation before using this resource.
Before using this resource, you will need to create the necessary VPC Endpoints as per your VPC endpoint requirements. You can use the aws_vpc_endpoint resource for this, for example:
resource "aws_vpc_endpoint" "workspace" {
vpc_id = module.vpc.vpc_id
service_name = local.private_link.workspace_service
vpc_endpoint_type = "Interface"
security_group_ids = [module.vpc.default_security_group_id]
subnet_ids = [aws_subnet.pl_subnet.id]
depends_on = [aws_subnet.pl_subnet]
private_dns_enabled = true
}
resource "aws_vpc_endpoint" "relay" {
vpc_id = module.vpc.vpc_id
service_name = local.private_link.relay_service
vpc_endpoint_type = "Interface"
security_group_ids = [module.vpc.default_security_group_id]
subnet_ids = [aws_subnet.pl_subnet.id]
depends_on = [aws_subnet.pl_subnet]
private_dns_enabled = true
}
Depending on your use case, you may need or choose to add VPC Endpoints for the AWS Services Databricks uses. See Add VPC endpoints for other AWS services (recommended but optional) for more information. For example:
resource "aws_vpc_endpoint" "s3" {
vpc_id = module.vpc.vpc_id
route_table_ids = module.vpc.private_route_table_ids
service_name = "com.amazonaws.${var.region}.s3"
depends_on = [module.vpc]
}
resource "aws_vpc_endpoint" "sts" {
vpc_id = module.vpc.vpc_id
service_name = "com.amazonaws.${var.region}.sts"
vpc_endpoint_type = "Interface"
subnet_ids = module.vpc.private_subnets
security_group_ids = [module.vpc.default_security_group_id]
depends_on = [module.vpc]
private_dns_enabled = true
}
resource "aws_vpc_endpoint" "kinesis-streams" {
vpc_id = module.vpc.vpc_id
service_name = "com.amazonaws.${var.region}.kinesis-streams"
vpc_endpoint_type = "Interface"
subnet_ids = module.vpc.private_subnets
security_group_ids = [module.vpc.default_security_group_id]
depends_on = [module.vpc]
}
Once you have created the necessary endpoints, you need to register each of them via the databricks_mws_vpc_endpoint resource, which calls out to the Databricks Account API:
resource "databricks_mws_vpc_endpoint" "workspace" {
provider = databricks.mws
account_id = var.databricks_account_id
aws_vpc_endpoint_id = aws_vpc_endpoint.workspace.id
vpc_endpoint_name = "VPC Relay for ${module.vpc.vpc_id}"
region = var.region
depends_on = [aws_vpc_endpoint.workspace]
}
resource "databricks_mws_vpc_endpoint" "relay" {
provider = databricks.mws
account_id = var.databricks_account_id
aws_vpc_endpoint_id = aws_vpc_endpoint.relay.id
vpc_endpoint_name = "VPC Relay for ${module.vpc.vpc_id}"
region = var.region
depends_on = [aws_vpc_endpoint.relay]
}
Typically, the next steps are to create a databricks_mws_private_access_settings and a databricks_mws_networks configuration, and then pass databricks_mws_private_access_settings.pas.private_access_settings_id and databricks_mws_networks.this.network_id into a databricks_mws_workspaces resource.
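Those two resources are not shown on this page; a minimal sketch, assuming the same VPC module, prefix local, and endpoint registrations used in the examples above, could look like this:

```hcl
# Sketch only: argument values follow the examples above and may need adjusting.
resource "databricks_mws_private_access_settings" "pas" {
  provider = databricks.mws
  # account_id may also be required on older provider versions
  private_access_settings_name = "Private Access Settings for ${local.prefix}"
  region                       = var.region
  public_access_enabled        = true
}

resource "databricks_mws_networks" "this" {
  provider           = databricks.mws
  account_id         = var.databricks_account_id
  network_name       = "${local.prefix}-network"
  security_group_ids = [module.vpc.default_security_group_id]
  subnet_ids         = module.vpc.private_subnets
  vpc_id             = module.vpc.vpc_id

  # Attach the registered endpoints by their Databricks-side IDs.
  vpc_endpoints {
    dataplane_relay = [databricks_mws_vpc_endpoint.relay.vpc_endpoint_id]
    rest_api        = [databricks_mws_vpc_endpoint.workspace.vpc_endpoint_id]
  }
}
```

With these in place, the workspace itself can be created: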
resource "databricks_mws_workspaces" "this" {
provider = databricks.mws
account_id = var.databricks_account_id
aws_region = var.region
workspace_name = local.prefix
credentials_id = databricks_mws_credentials.this.credentials_id
storage_configuration_id = databricks_mws_storage_configurations.this.storage_configuration_id
network_id = databricks_mws_networks.this.network_id
private_access_settings_id = databricks_mws_private_access_settings.pas.private_access_settings_id
pricing_tier = "ENTERPRISE"
depends_on = [databricks_mws_networks.this]
}
On GCP, before using this resource you will need to create the necessary Private Service Connect (PSC) connections on your Google Cloud VPC networks. See Enable Private Service Connect for your workspace for more details.
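Creating the PSC endpoints themselves is outside the scope of this resource, but a rough sketch using the Google provider might look like the following. The resource names and the vpc_id, subnet_id, and workspace_service_attachment variables are illustrative assumptions; the region-specific service attachment URIs are listed in the Databricks PSC documentation.

```hcl
# Illustrative sketch: names and the variables referenced here are assumptions,
# not values defined elsewhere on this page.
resource "google_compute_address" "workspace_psc_ip" {
  name         = "workspace-psc-ip"
  project      = var.google_project
  region       = var.subnet_region
  subnetwork   = var.subnet_id # an existing subnet in your VPC network
  address_type = "INTERNAL"
}

resource "google_compute_forwarding_rule" "workspace_psc_endpoint" {
  name                  = "workspace-psc-endpoint" # referenced as psc_endpoint_name when registering
  project               = var.google_project
  region                = var.subnet_region
  network               = var.vpc_id
  ip_address            = google_compute_address.workspace_psc_ip.id
  target                = var.workspace_service_attachment # region-specific URI from Databricks docs
  load_balancing_scheme = "" # must be empty for PSC endpoints
}
```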
Once you have created the necessary PSC connections, you need to register each of them via this Terraform resource, which calls out to the Databricks Account API.
variable "databricks_account_id" {
description = "Account Id that could be found in https://accounts.gcp.databricks.com/"
}
variable "databricks_google_service_account" {}
variable "google_project" {}
variable "subnet_region" {}
provider "databricks" {
alias = "mws"
host = "https://accounts.gcp.databricks.com"
}
resource "databricks_mws_vpc_endpoint" "workspace" {
provider = databricks.mws
account_id = var.databricks_account_id
vpc_endpoint_name = "PSC Rest API endpoint"
gcp_vpc_endpoint_info {
project_id = var.google_project
psc_endpoint_name = "PSC Rest API endpoint"
endpoint_region = var.subnet_region
}
}
resource "databricks_mws_vpc_endpoint" "relay" {
provider = databricks.mws
account_id = var.databricks_account_id
vpc_endpoint_name = "PSC Relay endpoint"
gcp_vpc_endpoint_info {
project_id = var.google_project
psc_endpoint_name = "PSC Relay endpoint"
endpoint_region = var.subnet_region
}
}
Typically, the next steps are to create a databricks_mws_private_access_settings and a databricks_mws_networks configuration, and then pass databricks_mws_private_access_settings.pas.private_access_settings_id and databricks_mws_networks.this.network_id into a databricks_mws_workspaces resource.
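As on AWS, those two resources are not shown here; a minimal sketch, assuming the variables above plus illustrative vpc_id and subnet_id variables for your VPC network, might be:

```hcl
# Sketch only: vpc_id and subnet_id are assumed variables, not defined above.
resource "databricks_mws_private_access_settings" "pas" {
  provider                     = databricks.mws
  private_access_settings_name = "PSC private access settings"
  region                       = var.subnet_region
  public_access_enabled        = true
}

resource "databricks_mws_networks" "this" {
  provider     = databricks.mws
  account_id   = var.databricks_account_id
  network_name = "psc-network"

  gcp_network_info {
    network_project_id = var.google_project
    vpc_id             = var.vpc_id
    subnet_id          = var.subnet_id
    subnet_region      = var.subnet_region
    # older GKE-based setups may also require pod_ip_range_name / service_ip_range_name
  }

  vpc_endpoints {
    dataplane_relay = [databricks_mws_vpc_endpoint.relay.vpc_endpoint_id]
    rest_api        = [databricks_mws_vpc_endpoint.workspace.vpc_endpoint_id]
  }
}
```

The workspace then references both: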
resource "databricks_mws_workspaces" "this" {
provider = databricks.mws
account_id = var.databricks_account_id
workspace_name = "gcp workspace"
location = var.subnet_region
cloud_resource_container {
gcp {
project_id = var.google_project
}
}
gke_config {
connectivity_type = "PRIVATE_NODE_PUBLIC_MASTER"
master_ip_range = "10.3.0.0/28"
}
network_id = databricks_mws_networks.this.network_id
private_access_settings_id = databricks_mws_private_access_settings.pas.private_access_settings_id
pricing_tier = "PREMIUM"
depends_on = [databricks_mws_networks.this]
}
The following arguments are required:
- account_id - Databricks account ID, which can be found in the Accounts Console for AWS or GCP
- aws_vpc_endpoint_id - (AWS only) ID of the configured aws_vpc_endpoint
- vpc_endpoint_name - Name of the VPC endpoint in the Databricks account
- region - (AWS only) Region of the AWS VPC
- gcp_vpc_endpoint_info - (GCP only) A block consisting of Google Cloud-specific information for this PSC endpoint. It has the following fields:
  - project_id - The Google Cloud project ID of the VPC network where the PSC connection resides.
  - psc_endpoint_name - The name of the PSC endpoint in the Google Cloud project.
  - endpoint_region - Region of the PSC endpoint.
In addition to all arguments above, the following attributes are exported:
- id - The ID of the VPC endpoint, in the form account_id/vpc_endpoint_id
- vpc_endpoint_id - Canonical unique identifier of the VPC endpoint in the Databricks account
- aws_endpoint_service_id - (AWS only) The ID of the Databricks endpoint service that this VPC endpoint is connected to. The list of endpoint service IDs for each supported region can be found in the Databricks PrivateLink documentation
- state - (AWS only) State of the VPC endpoint
- gcp_vpc_endpoint_info - (GCP only) A block consisting of Google Cloud-specific information for this PSC endpoint. The following fields are exported:
  - psc_connection_id - The unique ID of this PSC connection.
  - service_attachment_id - The service attachment this PSC connection connects to.
-> Note Importing this resource is not currently supported.
The following resources are used in the same context:
- Provisioning Databricks on AWS guide.
- Provisioning Databricks on AWS with PrivateLink guide.
- Provisioning AWS Databricks E2 with a Hub & Spoke firewall for data exfiltration protection guide.
- Provisioning Databricks workspaces on GCP with Private Service Connect guide.
- databricks_mws_networks to configure VPC & subnets for new workspaces within AWS.
- databricks_mws_private_access_settings to create a Private Access Setting that can be used as part of a databricks_mws_workspaces resource to create a Databricks Workspace that leverages AWS PrivateLink.
- databricks_mws_workspaces to set up workspaces in E2 architecture on AWS.