DBUtils implementation for Volumes #623

Merged
merged 13 commits into from
Apr 24, 2024

@mgyucht mgyucht commented Apr 22, 2024

Changes

UC Volumes has been available for some time, but users cannot yet use it with dbutils.fs from the SDK. This PR adds support for /Volumes paths to DBUtils in the SDK.

I've done this primarily by extending DbfsExt to work with UC Volumes paths. A new class, _VolumePath, implements the set of base operations defined on the abstract _Path parent class for volume paths. An accompanying _VolumesIO class provides a consistent interface for reading from and writing to UC Volumes files. It matters most for writing: the download API already returns a BinaryIO, but the upload API accepts a BinaryIO as input, so this adapter lets a user "open" a Volumes path for writing.
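To illustrate the write-side adapter idea, here is a minimal sketch and not the code from this PR. It assumes the simplest strategy: buffer writes in memory and pass the whole payload to an upload function on close. The class name `BufferedUploadWriter` and the `upload` callable are hypothetical stand-ins for whatever the SDK actually uses.

```python
import io
from typing import BinaryIO, Callable, Dict


class BufferedUploadWriter(io.BytesIO):
    """Hypothetical adapter: collects writes in memory and hands the full
    payload to an upload callable when the stream is closed."""

    def __init__(self, path: str, upload: Callable[[str, BinaryIO], None]):
        super().__init__()
        self._path = path
        self._upload = upload

    def close(self) -> None:
        if self.closed:
            return  # closing twice must not upload twice
        try:
            self.seek(0)  # rewind so the uploader reads from the start
            self._upload(self._path, self)
        finally:
            super().close()


# Usage with a fake uploader that records what it receives.
uploaded: Dict[str, bytes] = {}


def fake_upload(path: str, stream: BinaryIO) -> None:
    uploaded[path] = stream.read()


with BufferedUploadWriter("/Volumes/main/default/vol/hello.txt", fake_upload) as f:
    f.write(b"hello ")
    f.write(b"volumes")
```

Buffering everything in memory keeps the sketch short; a real adapter would likely need a different approach for large files.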

To implement ls properly, I changed list in the _Path subclasses to return a generator of FileInfo objects. This lets ls, cp and mv share this common functionality.
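The reuse described above might look roughly like the following sketch. It is only an illustration: the `FileInfo` dataclass here is a simplified stand-in, and `list_entries`, `ls`, and `copy_plan` are hypothetical names, not the SDK's functions. The listing is modelled as a plain dict from path to size.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class FileInfo:
    """Simplified stand-in for the SDK's file metadata record."""
    path: str
    size: int


def list_entries(listing: Dict[str, int]) -> Iterator[FileInfo]:
    # A generator: entries are produced one at a time, on demand.
    for path, size in listing.items():
        yield FileInfo(path=path, size=size)


def ls(listing: Dict[str, int]) -> List[FileInfo]:
    # ls wants the full result, so it materializes the generator.
    return list(list_entries(listing))


def copy_plan(listing: Dict[str, int], dst_root: str) -> Iterator[Tuple[str, str]]:
    # A copy can consume the same generator lazily, one file at a time.
    for info in list_entries(listing):
        name = info.path.rsplit("/", 1)[-1]
        yield info.path, f"{dst_root.rstrip('/')}/{name}"


listing = {"/Volumes/m/d/v/a.txt": 3, "/Volumes/m/d/v/b.txt": 5}
entries = ls(listing)
plan = list(copy_plan(listing, "/tmp/out/"))
```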

Open questions:

  • Should we do anything to distinguish between a path named /Volumes on disk and UC Volumes? E.g. via a different scheme?

Tests

  • make test run locally
  • make fmt applied
  • relevant integration tests applied


github-actions bot commented Apr 22, 2024

This PR breaks backwards compatibility for databrickslabs/ucx downstream. See build logs for more details.

Running from downstreams #65

return
queue = [self]
while queue:
next_path, queue = queue[0], queue[1:]

Slicing the list creates a copy on every iteration, so each dequeue is O(N). Using collections.deque and calling .popleft() to take the first element makes it O(1).
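A minimal sketch of the suggested change, assuming a generic breadth-first traversal. The `walk_breadth_first` function and the `children` callable are hypothetical and only stand in for the listing logic under review.

```python
from collections import deque
from typing import Callable, Iterable, Iterator


def walk_breadth_first(root: str, children: Callable[[str], Iterable[str]]) -> Iterator[str]:
    """Yield paths in breadth-first order using a deque as the work queue."""
    queue = deque([root])
    while queue:
        next_path = queue.popleft()  # O(1); no copy of the remaining items
        yield next_path
        queue.extend(children(next_path))


# Usage with an in-memory tree standing in for a remote listing.
tree = {"/a": ["/a/b", "/a/c"], "/a/b": ["/a/b/d"]}
order = list(walk_breadth_first("/a", lambda p: tree.get(p, [])))
```

With `queue[0], queue[1:]`, every iteration copies the rest of the list, so draining N items costs O(N²) overall. The deque version drains them in O(N).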

return
queue = [self]
while queue:
next_path, queue = queue[0], queue[1:]

Same here.

return _DbfsPath(self, src)
if src.startswith('dbfs:'):
src = src[len('dbfs:'):]
if str(src).startswith('/Volumes'):

Should we account for the typo variants described in the documentation? See https://docs.databricks.com/en/connect/unity-catalog/volumes.html:

Paths are also reserved for potential typos for these paths from Apache Spark APIs and dbutils, including /volumes, /Volume, /volume, whether or not they are preceded by dbfs:/. The path /dbfs/Volumes is also reserved, but cannot be used to access volumes.

Contributor Author


I think those paths are reserved but may not be usable themselves for looking up volumes (hopefully). Given that we use the REST API, we may need to be more narrow in what we accept than in DBR, since these paths correspond directly to REST API parameters.
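The narrower matching discussed in this thread could be sketched as follows. This is an illustration only, not the SDK's code: `classify_path` is a hypothetical helper, the `file:` scheme for local paths is an assumption, and matching `/Volumes` only as a whole path segment is a stricter rule than the `startswith('/Volumes')` check in the snippet above.

```python
DBFS_SCHEME = "dbfs:"
LOCAL_SCHEME = "file:"  # assumption: local paths are marked with this scheme
VOLUMES_ROOT = "/Volumes"


def classify_path(src: str) -> str:
    """Return 'local', 'volume', or 'dbfs' for a dbutils-style path."""
    if src.startswith(LOCAL_SCHEME):
        return "local"
    if src.startswith(DBFS_SCHEME):
        src = src[len(DBFS_SCHEME):]  # drop the optional scheme prefix
    # Accept only the exact, case-sensitive /Volumes segment. Typo variants
    # such as /volumes or /Volume are deliberately not treated as volumes.
    if src == VOLUMES_ROOT or src.startswith(VOLUMES_ROOT + "/"):
        return "volume"
    return "dbfs"
```

In this sketch the reserved typo variants fall through to the DBFS branch. Whether they should be rejected outright instead is a separate decision that the thread leaves open.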

@mgyucht mgyucht enabled auto-merge April 24, 2024 15:21
@mgyucht mgyucht added this pull request to the merge queue Apr 24, 2024
Merged via the queue into main with commit 68feadf Apr 24, 2024
8 of 9 checks passed
@mgyucht mgyucht deleted the dbutils-for-volumes branch April 24, 2024 15:31
github-merge-queue bot pushed a commit that referenced this pull request Apr 26, 2024
…#631)

## Changes
#623 introduced DBUtils support for volumes but also caused a small
regression in listing behavior: the results of `dbutils.fs.ls()` should
not include the `dbfs:` scheme. This PR fixes that. It also fixes a small
bug in recursive listing of volumes so that only file paths are included,
matching the behavior with DBFS.

## Tests

- [ ] `make test` run locally
- [ ] `make fmt` applied
- [ ] relevant integration tests applied
mgyucht added a commit that referenced this pull request May 3, 2024
### New Features

* DBUtils implementation for Volumes ([#623](#623), [#634](#634), [#631](#631)).

### Bug Fixes

* Fixed codecov for repository ([#636](#636)).

API Changes:

 * Added `ingestion_definition` field for `databricks.sdk.service.pipelines.CreatePipeline`.
 * Added `ingestion_definition` field for `databricks.sdk.service.pipelines.EditPipeline`.
 * Added `ingestion_definition` field for `databricks.sdk.service.pipelines.PipelineSpec`.
 * Added `databricks.sdk.service.pipelines.IngestionConfig` dataclass.
 * Added `databricks.sdk.service.pipelines.ManagedIngestionPipelineDefinition` dataclass.
 * Added `databricks.sdk.service.pipelines.SchemaSpec` dataclass.
 * Added `databricks.sdk.service.pipelines.TableSpec` dataclass.
 * Changed `create()` method for [w.apps](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/apps.html) workspace-level service. New request type is `databricks.sdk.service.serving.CreateAppRequest` dataclass.
 * Changed `create()` method for [w.apps](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/apps.html) workspace-level service to return `databricks.sdk.service.serving.App` dataclass.
 * Removed `delete_app()` method for [w.apps](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/apps.html) workspace-level service.
 * Removed `get_app()` method for [w.apps](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/apps.html) workspace-level service.
 * Removed `get_app_deployment_status()` method for [w.apps](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/apps.html) workspace-level service.
 * Removed `get_apps()` method for [w.apps](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/apps.html) workspace-level service.
 * Removed `get_events()` method for [w.apps](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/apps.html) workspace-level service.
 * Added `create_deployment()` method for [w.apps](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/apps.html) workspace-level service.
 * Added `delete()` method for [w.apps](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/apps.html) workspace-level service.
 * Added `get()` method for [w.apps](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/apps.html) workspace-level service.
 * Added `get_deployment()` method for [w.apps](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/apps.html) workspace-level service.
 * Added `get_environment()` method for [w.apps](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/apps.html) workspace-level service.
 * Added `list()` method for [w.apps](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/apps.html) workspace-level service.
 * Added `list_deployments()` method for [w.apps](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/apps.html) workspace-level service.
 * Added `stop()` method for [w.apps](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/apps.html) workspace-level service.
 * Added `update()` method for [w.apps](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/apps.html) workspace-level service.
 * Added `get_open_api()` method for [w.serving_endpoints](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/serving_endpoints.html) workspace-level service.
 * Removed `databricks.sdk.service.serving.AppEvents` dataclass.
 * Removed `databricks.sdk.service.serving.AppManifest` dataclass.
 * Removed `databricks.sdk.service.serving.AppServiceStatus` dataclass.
 * Removed `databricks.sdk.service.serving.DeleteAppResponse` dataclass.
 * Removed `databricks.sdk.service.serving.DeployAppRequest` dataclass.
 * Removed `databricks.sdk.service.serving.DeploymentStatus` dataclass.
 * Removed `databricks.sdk.service.serving.DeploymentStatusState` dataclass.
 * Removed `databricks.sdk.service.serving.GetAppDeploymentStatusRequest` dataclass.
 * Removed `databricks.sdk.service.serving.GetAppResponse` dataclass.
 * Removed `databricks.sdk.service.serving.GetEventsRequest` dataclass.
 * Removed `databricks.sdk.service.serving.ListAppEventsResponse` dataclass.
 * Changed `apps` field for `databricks.sdk.service.serving.ListAppsResponse` to `databricks.sdk.service.serving.AppList` dataclass.
 * Added `databricks.sdk.service.serving.App` dataclass.
 * Added `databricks.sdk.service.serving.AppDeployment` dataclass.
 * Added `databricks.sdk.service.serving.AppDeploymentState` dataclass.
 * Added `databricks.sdk.service.serving.AppDeploymentStatus` dataclass.
 * Added `databricks.sdk.service.serving.AppEnvironment` dataclass.
 * Added `databricks.sdk.service.serving.AppState` dataclass.
 * Added `databricks.sdk.service.serving.AppStatus` dataclass.
 * Added `databricks.sdk.service.serving.CreateAppDeploymentRequest` dataclass.
 * Added `databricks.sdk.service.serving.CreateAppRequest` dataclass.
 * Added `databricks.sdk.service.serving.EnvVariable` dataclass.
 * Added `databricks.sdk.service.serving.GetAppDeploymentRequest` dataclass.
 * Added `databricks.sdk.service.serving.GetAppEnvironmentRequest` dataclass.
 * Added `databricks.sdk.service.serving.GetOpenApiRequest` dataclass.
 * Added `any` dataclass.
 * Added `databricks.sdk.service.serving.ListAppDeploymentsRequest` dataclass.
 * Added `databricks.sdk.service.serving.ListAppDeploymentsResponse` dataclass.
 * Added `databricks.sdk.service.serving.ListAppsRequest` dataclass.
 * Added `databricks.sdk.service.serving.StopAppRequest` dataclass.
 * Added `any` dataclass.
 * Added `databricks.sdk.service.serving.UpdateAppRequest` dataclass.
 * Removed [w.csp_enablement](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/settings/csp_enablement.html) workspace-level service.
 * Removed [w.esm_enablement](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/settings/esm_enablement.html) workspace-level service.
 * Added [w.compliance_security_profile](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/settings/compliance_security_profile.html) workspace-level service.
 * Added [w.enhanced_security_monitoring](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/settings/enhanced_security_monitoring.html) workspace-level service.
 * Removed `databricks.sdk.service.settings.CspEnablement` dataclass.
 * Removed `databricks.sdk.service.settings.CspEnablementSetting` dataclass.
 * Removed `databricks.sdk.service.settings.EsmEnablement` dataclass.
 * Removed `databricks.sdk.service.settings.EsmEnablementSetting` dataclass.
 * Removed `databricks.sdk.service.settings.GetCspEnablementSettingRequest` dataclass.
 * Removed `databricks.sdk.service.settings.GetEsmEnablementSettingRequest` dataclass.
 * Removed `databricks.sdk.service.settings.UpdateCspEnablementSettingRequest` dataclass.
 * Removed `databricks.sdk.service.settings.UpdateEsmEnablementSettingRequest` dataclass.
 * Added `databricks.sdk.service.settings.ComplianceSecurityProfile` dataclass.
 * Added `databricks.sdk.service.settings.ComplianceSecurityProfileSetting` dataclass.
 * Added `databricks.sdk.service.settings.EnhancedSecurityMonitoring` dataclass.
 * Added `databricks.sdk.service.settings.EnhancedSecurityMonitoringSetting` dataclass.
 * Added `databricks.sdk.service.settings.GetComplianceSecurityProfileSettingRequest` dataclass.
 * Added `databricks.sdk.service.settings.GetEnhancedSecurityMonitoringSettingRequest` dataclass.
 * Added `databricks.sdk.service.settings.UpdateComplianceSecurityProfileSettingRequest` dataclass.
 * Added `databricks.sdk.service.settings.UpdateEnhancedSecurityMonitoringSettingRequest` dataclass.
 * Added `tags` field for `databricks.sdk.service.sql.DashboardEditContent`.
 * Added `tags` field for `databricks.sdk.service.sql.QueryEditContent`.
 * Added `catalog` field for `databricks.sdk.service.sql.QueryOptions`.
 * Added `schema` field for `databricks.sdk.service.sql.QueryOptions`.
 * Added `tags` field for `databricks.sdk.service.sql.QueryPostContent`.
 * Added `query` field for `databricks.sdk.service.sql.Visualization`.

OpenAPI SHA: 9bb7950fa3390afb97abaa552934bc0a2e069de5, Date: 2024-05-02
@mgyucht mgyucht mentioned this pull request May 3, 2024
github-merge-queue bot pushed a commit that referenced this pull request May 3, 2024
### New Features

* DBUtils implementation for Volumes
([#623](#623),
[#634](#634),
[#631](#631)). You
can now use `w.dbutils.fs` with UC volumes paths. Error handling for
non-UC, non-DBFS and non-local paths has also been improved.

### Bug Fixes

* Fixed codecov for repository
([#636](#636)).
