Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Support Models in dbutils.fs operations #750

Merged
merged 1 commit into from
Sep 12, 2024

Conversation

shichengzhou-db
Copy link
Contributor

@shichengzhou-db shichengzhou-db commented Sep 5, 2024

Changes

  • Support files operations in WorkspaceClient.Files for Databricks UC Model artifacts so that user can use databricks sdk to download UC model artifacts.
  • This PR is part of the work to migrate mlflow client towards using databricks sdk for model artifacts download/upload operations for better security.

Tests

  • Existing tests in test_dbfs_mixins.py, similar to how _VolumesPath is tested
  • The following code works
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
resp = w.files.download("/Models/system/ai/dbrx_instruct/3/MLmodel")
  • make test run locally
  • make fmt applied
  • relevant integration tests applied

@shichengzhou-db shichengzhou-db changed the title support Models in files operations [Internal] support Models in files operations Sep 5, 2024
Copy link
Contributor

@mgyucht mgyucht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these basically duplicate _VolumesPath and _VolumesIO. Let's reuse those instead of defining duplicate implementations. We can rename those to _FilesPath and _FilesIO to indicate that they can be used generally for resources backed by the Files API.

@@ -467,6 +565,75 @@ def __repr__(self) -> str:
return f'<_VolumesPath {self._path}>'


class _ModelsPath(_Path):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a new Path implementation? It seems like this just uses the Files API under the hood. From what I can tell, the only difference is open(), which returns ModelsIO now. That is in turn almost identical to _FilesIO.

If we changed https://github.com/databricks/databricks-sdk-py/pull/750/files#diff-469b96ba8e49ef7cbe8d3b4577d2a0ddf3a3552638b76d87bfccb936fda65d18R759 to check whether the path started with either /Volumes or /Models, I think the existing implementation would work. Is this correct? If so, we should just rename that to _FilesPath (as it uses the Files API) and reuse it.

Ultimately, that _path method should probably use some kind of prefix allowlist to determine whether the _FilesPath should be used, and we can add /Volumes and /Models as the initial prefixes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mgyucht that sounds better. thanks for the suggestion, updated.

(I am not in the osp team but offered to do this work so we can speed up mlflow adoption of the databricks sdk, but I think since it's all Files API / Presigned URLs behind this sdk, we should be able to reuse the same path and io class for all securables)

@mgyucht mgyucht changed the title [Internal] support Models in files operations [Feature] Support Models in dbutils.fs operations Sep 9, 2024
Copy link
Contributor

@mgyucht mgyucht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for this contribution!

@shichengzhou-db
Copy link
Contributor Author

jenkins merge

@shichengzhou-db
Copy link
Contributor Author

hey @mgyucht how you know how to merge this pr? not seeing anything on my UI for merging

auto-merge was automatically disabled September 10, 2024 22:55

Head branch was pushed to by a user without write access

@shichengzhou-db shichengzhou-db force-pushed the support_Models branch 3 times, most recently from a26ecae to c7367a5 Compare September 11, 2024 22:56
@mgyucht mgyucht added this pull request to the merge queue Sep 12, 2024
Merged via the queue into databricks:main with commit 3162545 Sep 12, 2024
14 checks passed
tanmay-db added a commit that referenced this pull request Sep 16, 2024
### New Features and Improvements

 * Support Models in `dbutils.fs` operations ([#750](#750)).

### Bug Fixes

 * Do not specify --tenant flag when fetching managed identity access token from the CLI ([#748](#748)).
 * Fix deserialization of 401/403 errors ([#758](#758)).
 * Use correct optional typing in `WorkspaceClient` for `mypy` ([#760](#760)).
github-merge-queue bot pushed a commit that referenced this pull request Sep 17, 2024
### New Features and Improvements

* Support Models in `dbutils.fs` operations
([#750](#750)).


### Bug Fixes

* Do not specify --tenant flag when fetching managed identity access
token from the CLI
([#748](#748)).
* Fix deserialization of 401/403 errors
([#758](#758)).
* Use correct optional typing in `WorkspaceClient` for `mypy`
([#760](#760)).
aravind-segu pushed a commit to aravind-segu/databricks-sdk-py that referenced this pull request Sep 18, 2024
### New Features and Improvements

* Support Models in `dbutils.fs` operations
([databricks#750](databricks#750)).


### Bug Fixes

* Do not specify --tenant flag when fetching managed identity access
token from the CLI
([databricks#748](databricks#748)).
* Fix deserialization of 401/403 errors
([databricks#758](databricks#758)).
* Use correct optional typing in `WorkspaceClient` for `mypy`
([databricks#760](databricks#760)).
shichengzhou-db added a commit to shichengzhou-db/databricks-sdk-py that referenced this pull request Oct 3, 2024
github-merge-queue bot pushed a commit that referenced this pull request Oct 7, 2024
)

This reverts commit 3162545.
Verified that /Models download still work correctly.

## Changes
<!-- Summary of your changes that are easy to understand -->

## Tests
<!-- 
How is this tested? Please see the checklist below and also describe any
other relevant tests
-->

- [ ] `make test` run locally
- [ ] `make fmt` applied
- [ ] relevant integration tests applied
parthban-db added a commit that referenced this pull request Oct 7, 2024
### Bug Fixes

 * Fix Model Serving Test ([#781](#781)).
 * Include package name for external types when deserializing responses ([#786](#786)).

### Internal Changes

 * Refactor ApiClient into `_BaseClient` and `ApiClient` ([#785](#785)).
 * Update to latest OpenAPI spec ([#787](#787)).
 * revert Support Models in `dbutils.fs` operations ([#750](#750)) ([#778](#778)).

### API Changes:

 * Added [w.disable_legacy_dbfs](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/settings/disable_legacy_dbfs.html) workspace-level service.
 * Added `default_source_code_path` and `resources` fields for `databricks.sdk.service.apps.App`.
 * Added `resources` field for `databricks.sdk.service.apps.CreateAppRequest`.
 * Added `resources` field for `databricks.sdk.service.apps.UpdateAppRequest`.

OpenAPI SHA: bc17b474818138f19b78a7bea0675707dead2b87, Date: 2024-10-07
github-merge-queue bot pushed a commit that referenced this pull request Oct 7, 2024
### Bug Fixes

* Fix Model Serving Test
([#781](#781)).
* Include package name for external types when deserializing responses
([#786](#786)).


### Internal Changes

* Refactor ApiClient into `_BaseClient` and `ApiClient`
([#785](#785)).
* Update to latest OpenAPI spec
([#787](#787)).
* revert Support Models in `dbutils.fs` operations
([#750](#750))
([#778](#778)).


### API Changes:

* Added
[w.disable_legacy_dbfs](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/settings/disable_legacy_dbfs.html)
workspace-level service.
* Added `default_source_code_path` and `resources` fields for
`databricks.sdk.service.apps.App`.
* Added `resources` field for
`databricks.sdk.service.apps.CreateAppRequest`.
* Added `resources` field for
`databricks.sdk.service.apps.UpdateAppRequest`.

OpenAPI SHA: bc17b474818138f19b78a7bea0675707dead2b87, Date: 2024-10-07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants