
[ISSUE] CreateCluster is missing data_security_mode attribute #225

Closed
judahrand opened this issue Jul 10, 2023 · 7 comments
Labels
OpenAPI issues related to metadata across all SDKs

Comments

@judahrand
Contributor

judahrand commented Jul 10, 2023

Description
We have policies in place which require data_security_mode to be set when creating a cluster. Because this attribute is missing, we cannot create clusters with the SDK.

Expected behavior
One should be able to set data_security_mode when calling ClustersAPI.create.

Debug Logs
The SDK logs helpful debugging information when debug logging is enabled. Set the log level to debug by adding logging.basicConfig(level=logging.DEBUG) to your program, and include the logs here.

Other Information

  • OS: [e.g. macOS]
  • Version: [e.g. 0.1.0]

Additional context
Add any other context about the problem here.

@judahrand judahrand changed the title [ISSUE] Can't create CreateCluster is missing data_security_mode attribute [ISSUE] CreateCluster is missing data_security_mode attribute Jul 11, 2023
@judahrand
Contributor Author

@mgyucht could you please look at this issue? Does this mean that this attribute is missing from the OpenAPI spec? It definitely is accepted and used by the actual endpoint.

@narquette

Agree with adding this to the class (ClusterCreate) and related method (clusters.create).
Since the edit method (clusters.edit) does accept data_security_mode as an argument, my workflow calls the create method, saves the response in a variable, and then follows the create step with an edit using the returned cluster ID and the same arguments as the create. It is not ideal, but it works.

Example:

import time

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.compute import (AutoScale, AwsAttributes, AwsAvailability,
                                            ClusterSource, DataSecurityMode, RuntimeEngine)

w = WorkspaceClient(profile='DEFAULT')

cluster_policies = [pol for pol in w.cluster_policies.list() if pol.name == 'HIPAA_intelli_curvgh']
cluster_policy = cluster_policies[0]
spark_version = '13.2.x-cpu-ml-scala2.12'

cluster_info = {
    'spark_version': spark_version,
    'autoscale': AutoScale(min_workers=2, max_workers=8),
    'autotermination_minutes': 30,
    'aws_attributes': AwsAttributes(
        availability=AwsAvailability('SPOT_WITH_FALLBACK'),
        ebs_volume_count=0,
        first_on_demand=1,
        instance_profile_arn='<add_arn_role>',
        spot_bid_price_percent=100,
        zone_id='auto'
    ),
    'cluster_name': 'Nick Cluster Copy',
    'cluster_source': ClusterSource('API'),
    'data_security_mode': DataSecurityMode('SINGLE_USER'),
    'driver_node_type_id': 'i3en.2xlarge',
    'enable_elastic_disk': True,
    'enable_local_disk_encryption': False,
    'enable_unity_catalog': True,
    'node_type_id': 'i3en.2xlarge',
    'policy_id': cluster_policy.policy_id,
    'runtime_engine': RuntimeEngine('STANDARD'),
    'single_user_name': '<add_your_user_name>',
    'spark_conf': {'spark.databricks.service.port': '8787', 'spark.databricks.service.server.enabled': 'true'},
    'spark_env_vars': None,
    'ssh_public_keys': None
}

# create() does not accept data_security_mode yet, so drop it (and the
# dependent single_user_name) from the initial call
create_info = {k: v for k, v in cluster_info.items()
               if k not in ('data_security_mode', 'single_user_name')}

resp = w.clusters.create(**create_info)

# wait until the cluster is running
while w.clusters.get(resp.response.cluster_id).state.name == 'PENDING':
    time.sleep(60)

# edit() does accept data_security_mode, so apply the full configuration now
w.clusters.edit(cluster_id=resp.response.cluster_id, **cluster_info)

@nfx
Contributor

nfx commented Aug 2, 2023

@narquette w.clusters.create(..).get() should wait until the cluster is properly running or fail. Please update your code.
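For context, the waiter returned by create behaves roughly like the sketch below: .get() polls the cluster until it reaches RUNNING, raises if the cluster lands in a failed state, and times out otherwise. This is an illustrative stand-in, not the SDK's actual Wait class, and the state names and polling interval are simplified assumptions.

```python
import time

class Wait:
    """Illustrative stand-in for the SDK's waiter object (not the real class)."""

    def __init__(self, poll, response):
        self.response = response  # the immediate create response (has cluster_id)
        self._poll = poll         # callable returning the current cluster state

    def get(self, timeout=1200):
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            state = self._poll()
            if state == 'RUNNING':
                return self.response
            if state in ('TERMINATED', 'ERROR', 'UNKNOWN'):
                raise RuntimeError(f'cluster failed to start: {state}')
            time.sleep(0.01)  # the real SDK polls far less aggressively
        raise TimeoutError('cluster did not reach RUNNING in time')

# simulate a cluster that is PENDING twice, then RUNNING
states = iter(['PENDING', 'PENDING', 'RUNNING'])
waiter = Wait(poll=lambda: next(states), response={'cluster_id': 'abc-123'})
details = waiter.get(timeout=5)
```

The point of the one-liner `w.clusters.create(..).get()` is that this polling loop already lives inside the SDK, so the manual `while ... PENDING: sleep(60)` loop above is redundant.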

@nfx nfx added the OpenAPI issues related to metadata across all SDKs label Aug 2, 2023
@judahrand
Contributor Author

judahrand commented Aug 2, 2023

@narquette w.clusters.create(..).get() should wait until the cluster is properly running or fail. Please update your code.

This is true but is also kind of weird behaviour from Databricks imo. It isn't clear that creating a cluster should also start it.

Once #227 is merged I'd argue that the obvious (though you're correct that it is unnecessary) code would be:

resp = w.clusters.create(**cluster_info)
w.clusters.ensure_cluster_is_running(resp.response.cluster_id)

But Databricks has a lot of unintuitive behaviour 🤷

@nfx
Contributor

nfx commented Aug 2, 2023

This is true but is also kind of weird behaviour from Databricks imo. It isn't clear that creating a cluster should also start it.

@judahrand It's starting a cluster, yes. We'll need to make that clear in the documentation.

But Databricks has a lot of unintuitive behaviour 🤷

SDK docs will get improved over time. Please keep an eye on them :)

@judahrand
Contributor Author

More importantly, is this issue likely to be fixed any time soon? It isn't one that the community can help with since the OpenAPI spec isn't publicly available (I'm still somewhat unclear as to why).

@mgyucht
Contributor

mgyucht commented Aug 14, 2023

Hi @judahrand, sorry I missed your tag. In the meantime, this field was added to the OpenAPI spec. It is included in the latest release of the SDK: https://github.com/databricks/databricks-sdk-py/blob/main/databricks/sdk/service/compute.py#L4090.

As for the OpenAPI spec, we will eventually make the spec public but have not prioritized it yet. We understand that your ability to contribute to the SDK is very limited without the spec. For now, we've primarily focused on improving the SDK development cycle for internal contributors, but over time we expect that others will be able to contribute. Thank you for your understanding.
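For reference, a heavily trimmed sketch of the shape the generated request class now has. This is not the SDK's actual code: only a handful of the real fields are shown, only the DataSecurityMode values discussed in this thread are included, and the example user name is made up. See the compute.py link above for the full generated definitions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class DataSecurityMode(Enum):
    # trimmed sketch: the SDK defines additional (legacy) values
    NONE = 'NONE'
    SINGLE_USER = 'SINGLE_USER'
    USER_ISOLATION = 'USER_ISOLATION'

@dataclass
class CreateCluster:
    # trimmed sketch: the generated class carries many more fields
    spark_version: str
    cluster_name: Optional[str] = None
    node_type_id: Optional[str] = None
    num_workers: Optional[int] = None
    data_security_mode: Optional[DataSecurityMode] = None
    single_user_name: Optional[str] = None

req = CreateCluster(spark_version='13.2.x-cpu-ml-scala2.12',
                    node_type_id='i3en.2xlarge',
                    num_workers=2,
                    data_security_mode=DataSecurityMode.SINGLE_USER,
                    single_user_name='someone@example.com')
```

With the field generated, the create-then-edit workaround earlier in this thread collapses to a single clusters.create call that passes data_security_mode directly.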

@mgyucht mgyucht closed this as completed Aug 14, 2023