
Update databricks-sdk requirement from ~=0.37.0 to >=0.37,<0.39 #3329

Closed

Conversation

dependabot[bot] (Contributor) commented on behalf of github, Nov 18, 2024

Updates the requirements on databricks-sdk to permit the latest version.
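The practical effect of the widened constraint can be checked with the `packaging` library (a sketch; assumes `packaging` is installed, which it is in most Python packaging environments):

```python
from packaging.specifiers import SpecifierSet

# The old compatible-release pin ~=0.37.0 means >=0.37.0,<0.38.0;
# the new range also admits the 0.38.x line.
old = SpecifierSet("~=0.37.0")
new = SpecifierSet(">=0.37,<0.39")

# v0.38.0 is rejected by the old pin but allowed by the new range.
print("0.38.0" in old)  # False
print("0.38.0" in new)  # True
```

This is why the PR title reads "permit the latest version": v0.38.0 falls outside the old specifier but inside the new one.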

Changelog

Sourced from databricks-sdk's changelog.

Version changelog

[Release] Release v0.37.0

Bug Fixes

  • Correctly generate classes with nested body fields (#808).

Internal Changes

  • Add cleanrooms package (#806).
  • Add test instructions for external contributors (#804).
  • Always write message for manual test execution (#811).
  • Automatically trigger integration tests on PR (#800).
  • Better isolate ML serving auth unit tests (#803).
  • Move templates in the code generator (#809).

API Changes:

  • Added w.aibi_dashboard_embedding_access_policy workspace-level service and w.aibi_dashboard_embedding_approved_domains workspace-level service.
  • Added w.credentials workspace-level service.
  • Added app_deployment field for databricks.sdk.service.apps.CreateAppDeploymentRequest.
  • Added app field for databricks.sdk.service.apps.CreateAppRequest.
  • Added app field for databricks.sdk.service.apps.UpdateAppRequest.
  • Added table field for databricks.sdk.service.catalog.CreateOnlineTableRequest.
  • Added azure_aad field for databricks.sdk.service.catalog.GenerateTemporaryTableCredentialResponse.
  • Added full_name field for databricks.sdk.service.catalog.StorageCredentialInfo.
  • Added dashboard field for databricks.sdk.service.dashboards.CreateDashboardRequest.
  • Added schedule field for databricks.sdk.service.dashboards.CreateScheduleRequest.
  • Added subscription field for databricks.sdk.service.dashboards.CreateSubscriptionRequest.
  • Added warehouse_id field for databricks.sdk.service.dashboards.Schedule.
  • Added dashboard field for databricks.sdk.service.dashboards.UpdateDashboardRequest.
  • Added schedule field for databricks.sdk.service.dashboards.UpdateScheduleRequest.
  • Added page_token field for databricks.sdk.service.oauth2.ListServicePrincipalSecretsRequest.
  • Added next_page_token field for databricks.sdk.service.oauth2.ListServicePrincipalSecretsResponse.
  • Added connection_name field for databricks.sdk.service.pipelines.IngestionGatewayPipelineDefinition.
  • Added is_no_public_ip_enabled field for databricks.sdk.service.provisioning.CreateWorkspaceRequest.
  • Added external_customer_info and is_no_public_ip_enabled fields for databricks.sdk.service.provisioning.Workspace.
  • Added last_used_day field for databricks.sdk.service.settings.TokenInfo.
  • Changed create() method for w.apps workspace-level service with new required argument order.
  • Changed execute_message_query() method for w.genie workspace-level service. New request type is databricks.sdk.service.dashboards.GenieExecuteMessageQueryRequest dataclass.
  • Changed create(), create_schedule(), create_subscription() and update_schedule() methods for w.lakeview workspace-level service with new required argument order.
  • Removed w.clean_rooms workspace-level service.
  • Removed deployment_id, mode and source_code_path fields for databricks.sdk.service.apps.CreateAppDeploymentRequest.
  • Removed description, name and resources fields for databricks.sdk.service.apps.CreateAppRequest.
  • Removed description and resources fields for databricks.sdk.service.apps.UpdateAppRequest.
  • Removed name and spec fields for databricks.sdk.service.catalog.CreateOnlineTableRequest.

... (truncated)
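Several of the additions above (the oauth2 page_token/next_page_token fields) follow the usual token-pagination pattern. A minimal sketch with a stand-in fetch function — hypothetical, not the real SDK signature:

```python
def iterate_pages(fetch):
    """Drain a token-paginated listing. `fetch` stands in for a real
    list call that accepts page_token and returns a dict with "items"
    and an optional "next_page_token"."""
    token = None
    while True:
        page = fetch(page_token=token)
        yield from page["items"]
        token = page.get("next_page_token")
        if not token:
            return

# Hypothetical two-page backend, purely for illustration.
pages = {
    None: {"items": [1, 2], "next_page_token": "t1"},
    "t1": {"items": [3]},
}
result = list(iterate_pages(lambda page_token: pages[page_token]))
print(result)  # [1, 2, 3]
```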

Commits
  • 197b5f9 [Internal] Bump release number to 0.38.0 (#828)
  • d516d1e [Release] Release v0.38.0 (#826)
  • e8b7916 [Fix] Rewind seekable streams before retrying (#821)
  • ee6e70a [Internal] Reformat SDK with YAPF 0.43. (#822)
  • 271502b [Internal] Update Jobs GetRun API to support paginated responses for jobs and...
  • 2143e35 [Feature] Read streams by 1MB chunks by default. (#817)
  • f7f9a68 [Internal] Update PR template (#814)
  • See full diff in compare view

Most Recent Ignore Conditions Applied to This Pull Request

| Dependency Name | Ignore Conditions |
| --- | --- |
| databricks-sdk | [>= 0.25.a, < 0.26] |

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Updates the requirements on [databricks-sdk](https://github.com/databricks/databricks-sdk-py) to permit the latest version.
- [Release notes](https://github.com/databricks/databricks-sdk-py/releases)
- [Changelog](https://github.com/databricks/databricks-sdk-py/blob/main/CHANGELOG.md)
- [Commits](databricks/databricks-sdk-py@v0.37.0...v0.38.0)

---
updated-dependencies:
- dependency-name: databricks-sdk
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
@dependabot dependabot bot requested a review from a team as a code owner November 18, 2024 16:34
@dependabot dependabot bot added the dependencies (Pull requests that update a dependency file) and python (Pull requests that update Python code) labels Nov 18, 2024
@JCZuurmond JCZuurmond self-requested a review November 19, 2024 08:58
@JCZuurmond (Member) commented:
Prefer #3332 to run CI

@JCZuurmond JCZuurmond closed this Nov 19, 2024
dependabot[bot] (Contributor, Author) commented on behalf of github, Nov 19, 2024

OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting @dependabot ignore this major version or @dependabot ignore this minor version. You can also ignore all major, minor, or patch releases for a dependency by adding an ignore condition with the desired update_types to your config file.

If you change your mind, just re-open this PR and I'll resolve any conflicts on it.
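The ignore condition with update_types that the bot mentions lives in `.github/dependabot.yml`. A sketch of what such an entry looks like — the ecosystem, directory, and schedule values here are assumptions for illustration, not taken from this repository:

```yaml
version: 2
updates:
  - package-ecosystem: "pip"
    directory: "/"
    schedule:
      interval: "weekly"
    ignore:
      - dependency-name: "databricks-sdk"
        # Skip patch releases only; major and minor update PRs are still opened.
        update-types: ["version-update:semver-patch"]
```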

@dependabot dependabot bot deleted the dependabot/pip/databricks-sdk-gte-0.37-and-lt-0.39 branch November 19, 2024 09:49
github-merge-queue bot pushed a commit that referenced this pull request Nov 20, 2024
Copy of #3329 as @dependabot cannot run the integration CI, thus
blocking merging that PR

Updates the requirements on
[databricks-sdk](https://github.com/databricks/databricks-sdk-py) to
permit the latest version.

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
gueniai added a commit that referenced this pull request Dec 2, 2024
* Added `assign-owner-group` command ([#3111](#3111)). The Databricks Labs Unity Catalog Exporter (UCX) tool now includes a new `assign-owner-group` command, allowing users to assign an owner group to the workspace. This group will be designated as the owner for all migrated tables and views, providing better control and organization of resources. The command can be executed in the context of a specific workspace or across multiple workspaces. The implementation includes new classes, methods, and attributes in various files, such as `cli.py`, `config.py`, and `groups.py`, enhancing ownership management functionality. The `assign-owner-group` command replaces the functionality of issue [#3075](#3075) and addresses issue [#2890](#2890), ensuring proper schema ownership and handling of crawled grants. Developers should be aware that running the `migrate-tables` workflow will result in assigning a new owner group for the Hive Metastore instance in the workspace installation.
* Added `opencensus` to known list ([#3052](#3052)). In this release, we have added OpenCensus to the list of known libraries in our configuration file. OpenCensus is a popular set of tools for distributed tracing and monitoring, and its inclusion in our system will enhance support and integration for users who utilize this tool. This change does not affect existing functionality, but instead adds a new entry in the configuration file for OpenCensus. This enhancement will allow our library to better recognize and work with OpenCensus, enabling improved performance and functionality for our users.
* Added default owner group selection to the installer ([#3370](#3370)). A new class, AccountGroupLookup, has been added to the AccountGroupLookup module to select the default owner group during the installer process, addressing previous issue [#3111](#3111). This class uses the workspace_client to determine the owner group, and a pick_owner_group method to prompt the user for a selection if necessary. The ownership selection process has been improved with the addition of a check in the installer's `_static_owner` method to determine if the current user is part of the default owner group. The GroupManager class has been updated to use the new AccountGroupLookup class and its methods, `pick_owner_group` and `validate_owner_group`. A new variable, `default_owner_group`, is introduced in the ConfigureGroups class to configure groups during installation based on user input. The installer now includes a unit test, "test_configure_with_default_owner_group", to demonstrate how it sets expected workspace configuration values when a default owner group is specified during installation.
* Added handling for non UTF-8 encoded notebook error explicitly ([#3376](#3376)). A new enhancement has been implemented to address the issue of non-UTF-8 encoded notebooks failing to load by introducing explicit error handling for this case. A UnicodeDecodeError exception is now caught and logged as a warning, while the notebook is skipped and returned as None. This change is implemented in the load_dependency method in the loaders.py file, which is a part of the assessment workflow. Additionally, a new unit test has been added to verify the behavior of this change, and the assessment workflow has been updated accordingly. The new test function in test_loaders.py checks for different types of exceptions, specifically PermissionError and UnicodeDecodeError, ensuring that the system can handle notebooks with non-UTF-8 encoding gracefully. This enhancement resolves issue [#3374](#3374), thereby improving the overall robustness of the application.
* Added migration progress documentation ([#3333](#3333)). In this release, we have updated the `migration-progress-experimental` workflow to track the migration progress of a subset of inventory tables related to workspace resources being migrated to Unity Catalog (UCX). The workflow updates the inventory tables and tracks the migration progress in the UCX catalog tables. To use this workflow, users must attach a UC metastore to the workspace, create a UCX catalog, and ensure that the assessment job has run successfully. The `Migration Progress` section in the documentation has been updated with a new markdown file that provides details about the migration progress, including a migration progress dashboard and an experimental migration progress workflow that generates historical records of inventory objects relevant to the migration progress. These records are stored in the UCX UC catalog, which contains a historical table with information about the object type, object ID, data, failures, owner, and UCX version. The migration process also tracks dangling Hive or workspace objects that are not referenced by business resources, and the progress is persisted in the UCX UC catalog, allowing for cross-workspace tracking of migration progress.
* Added note about running assessment once ([#3398](#3398)). In this release, we have introduced an update to the UCX assessment workflow, which will now only be executed once and will not update existing results in repeated runs. To accommodate this change, we have updated the README file with a note clarifying that the assessment workflow is a one-time process. Additionally, we have provided instructions on how to update the inventory and findings by uninstalling and reinstalling the UCX. This will ensure that the inventory and findings for a workspace are up-to-date and accurate. We recommend that software engineers take note of this change and follow the updated instructions when using the UCX assessment workflow.
* Allowing skipping TACLs migration during table migration ([#3384](#3384)). A new optional flag, "skip_tacl_migration", has been added to the configuration file, providing users with more flexibility during migration. This flag allows users to control whether or not to skip the Table Access Control Language (TACL) migration during table migrations. It can be set when creating catalogs and schemas, as well as when migrating tables or using the `migrate_grants` method in `application.py`. Additionally, the `install.py` file now includes a new variable, `skip_tacl_migration`, which can be set to `True` during the installation process to skip TACL migration. New test cases have been added to verify the functionality of skipping TACL migration during grants management and table migration. These changes enhance the flexibility of the system for users managing table migrations and TACL operations in their infrastructure, addressing issues [#3384](#3384) and [#3042](#3042).
* Bump `databricks-sdk` and `databricks-labs-lsql` dependencies ([#3332](#3332)). In this update, the `databricks-sdk` and `databricks-labs-lsql` dependencies are upgraded to versions 0.38 and 0.14.0, respectively. The `databricks-sdk` update addresses conflicts, bug fixes, and introduces new API additions and changes, notably impacting methods like `create()`, `execute_message_query()`, and others in workspace-level services. While `databricks-labs-lsql` updates ensure compatibility, its changelog and specific commits are not provided. This pull request also includes ignore conditions for the `databricks-sdk` dependency to prevent future Dependabot requests. It is strongly advised to rigorously test these updates to avoid any compatibility issues or breaking changes with the existing codebase. This pull request mirrors another ([#3329](#3329)), resolving integration CI issues that prevented the original from merging.
* Explain failures when cluster encounters Py4J error ([#3318](#3318)). In this release, we have made significant improvements to the error handling mechanism in our open-source library. Specifically, we have addressed issue [#3318](#3318), which involved handling failures when the cluster encounters Py4J errors in the `databricks/labs/ucx/hive_metastore/tables.py` file. We have added code to raise noisy failures instead of swallowing the error with a warning when a Py4J error occurs. The functions `_all_databases()` and `_list_tables()` have been updated to check if the error message contains "py4j.security.Py4JSecurityException", and if so, log an error message with instructions to update or reinstall UCX. If the error message does not contain "py4j.security.Py4JSecurityException", the functions log a warning message and return an empty list. These changes also resolve the linked issue [#3271](#3271). The functionality has been thoroughly tested and verified on the labs environment. These improvements provide more informative error messages and enhance the overall reliability of our library.
* Rearranged job summary dashboard columns and make job_name clickable ([#3311](#3311)). In this update, the job summary dashboard columns have been improved and the need for the `30_3_job_details.sql` file, which contained a SQL query for selecting job details from the `inventory.jobs` table, has been eliminated. The dashboard columns have been rearranged, and the `job_name` column is now clickable, providing easy access to job details via the corresponding job ID. The changes include modifying the dashboard widget and adding new methods for making the `job_name` column clickable and linking it to the job ID. Additionally, the column titles have been updated to display more relevant information. These improvements have been manually tested and verified in a labs environment.
* Refactor refreshing of migration-status information for tables, eliminate another redundant refresh ([#3270](#3270)). This pull request refactors the way table records are enriched with migration-status information during encoding for the history log in the `migration-progress-experimental` workflow. It ensures that the refresh of migration-status information is explicit and under the control of the workflow, addressing a previously expressed intent. A redundant refresh of migration-status information has been eliminated and additional unit test coverage has been added to the `migration-progress-experimental` workflow. The changes include modifying the existing workflow, adding new methods for refreshing table migration status without updating the history log, and splitting the crawl and update-history-log tasks into three steps. The `TableMigrationStatusRefresher` class has been introduced to obtain the migration status of a table, and new tests have been added to ensure correctness, making the `migration-progress-experimental` workflow more efficient and reliable.
* Safe read files in more places ([#3394](#3394)). This release introduces significant improvements to file handling, addressing issue [#3386](#3386). A new function, `safe_read_text`, has been implemented for safe reading of files, catching and handling exceptions and returning None if reading fails. This function is utilized in the `is_a_notebook` function and replaces the existing `read_text` method in specific locations, enhancing error handling and robustness. The `databricks labs ucx lint-local-code` command and the `assessment` workflow have been updated accordingly. Additionally, new test files and methods have been added under the `tests/integration/source_code` directory to ensure comprehensive testing of file handling, including handling of unsupported file types, encoding checks, and ignorable files.
gueniai added a commit that referenced this pull request Dec 2, 2024
* Added `assign-owner-group` command
([#3111](#3111)). The
Databricks Labs Unity Catalog Exporter (UCX) tool now includes a new
`assign-owner-group` command, allowing users to assign an owner group to
the workspace. This group will be designated as the owner for all
migrated tables and views, providing better control and organization of
resources. The command can be executed in the context of a specific
workspace or across multiple workspaces. The implementation includes new
classes, methods, and attributes in various files, such as `cli.py`,
`config.py`, and `groups.py`, enhancing ownership management
functionality. The `assign-owner-group` command replaces the
functionality of issue
[#3075](#3075) and addresses
issue [#2890](#2890),
ensuring proper schema ownership and handling of crawled grants.
Developers should be aware that running the `migrate-tables` workflow
will result in assigning a new owner group for the Hive Metastore
instance in the workspace installation.
* Added `opencensus` to known list
([#3052](#3052)). In this
release, we have added OpenCensus to the list of known libraries in our
configuration file. OpenCensus is a popular set of tools for distributed
tracing and monitoring, and its inclusion in our system will enhance
support and integration for users who utilize this tool. This change
does not affect existing functionality, but instead adds a new entry in
the configuration file for OpenCensus. This enhancement will allow our
library to better recognize and work with OpenCensus, enabling improved
performance and functionality for our users.
* Added default owner group selection to the installer
([#3370](#3370)). A new
class, AccountGroupLookup, has been added to the AccountGroupLookup
module to select the default owner group during the installer process,
addressing previous issue
[#3111](#3111). This class
uses the workspace_client to determine the owner group, and a
pick_owner_group method to prompt the user for a selection if necessary.
The ownership selection process has been improved with the addition of a
check in the installer's `_static_owner` method to determine if the
current user is part of the default owner group. The GroupManager class
has been updated to use the new AccountGroupLookup class and its
methods, `pick_owner_group` and `validate_owner_group`. A new variable,
`default_owner_group`, is introduced in the ConfigureGroups class to
configure groups during installation based on user input. The installer
now includes a unit test, "test_configure_with_default_owner_group", to
demonstrate how it sets expected workspace configuration values when a
default owner group is specified during installation.
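The selection flow described above can be sketched roughly as follows.
This is a simplified, hypothetical model: the real `AccountGroupLookup`
in UCX queries the workspace API, which is replaced here by plain
sequences and an injected prompt callable; only the `pick_owner_group`
name comes from the description above.

```python
# Hypothetical sketch of default owner-group selection; the real
# AccountGroupLookup talks to the Databricks workspace API, stubbed out here.
from typing import Callable, Sequence


def pick_owner_group(
    all_groups: Sequence[str],
    current_user_groups: Sequence[str],
    prompt: Callable[[Sequence[str]], str],
) -> str:
    """Pick the default owner group, prompting only when it is ambiguous."""
    candidates = [g for g in all_groups if g in current_user_groups]
    if len(candidates) == 1:
        return candidates[0]  # unambiguous: no prompt needed
    choice = prompt(candidates or list(all_groups))
    if choice not in all_groups:
        raise ValueError(f"unknown group: {choice}")
    return choice
```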
* Added explicit handling for non-UTF-8 encoded notebooks
([#3376](#3376)). A new
enhancement has been implemented to address the issue of non-UTF-8
encoded notebooks failing to load by introducing explicit error handling
for this case. A UnicodeDecodeError exception is now caught and logged
as a warning, while the notebook is skipped and returned as None. This
change is implemented in the load_dependency method in the loaders.py
file, which is a part of the assessment workflow. Additionally, a new
unit test has been added to verify the behavior of this change, and the
assessment workflow has been updated accordingly. The new test function
in test_loaders.py checks for different types of exceptions,
specifically PermissionError and UnicodeDecodeError, ensuring that the
system can handle notebooks with non-UTF-8 encoding gracefully. This
enhancement resolves issue
[#3374](#3374), thereby
improving the overall robustness of the application.
* Added migration progress documentation
([#3333](#3333)). In this
release, we have updated the `migration-progress-experimental` workflow
to track the migration progress of a subset of inventory tables related
to workspace resources being migrated to Unity Catalog (UCX). The
workflow updates the inventory tables and tracks the migration progress
in the UCX catalog tables. To use this workflow, users must attach a UC
metastore to the workspace, create a UCX catalog, and ensure that the
assessment job has run successfully. The `Migration Progress` section in
the documentation has been updated with a new markdown file that
provides details about the migration progress, including a migration
progress dashboard and an experimental migration progress workflow that
generates historical records of inventory objects relevant to the
migration progress. These records are stored in the UCX UC catalog,
which contains a historical table with information about the object
type, object ID, data, failures, owner, and UCX version. The migration
process also tracks dangling Hive or workspace objects that are not
referenced by business resources, and the progress is persisted in the
UCX UC catalog, allowing for cross-workspace tracking of migration
progress.
* Added note about running assessment once
([#3398](#3398)). In this
release, we have introduced an update to the UCX assessment workflow,
which will now only be executed once and will not update existing
results in repeated runs. To accommodate this change, we have updated
the README file with a note clarifying that the assessment workflow is a
one-time process. Additionally, we have provided instructions on how to
update the inventory and findings by uninstalling and reinstalling
UCX, which ensures that a workspace's inventory and findings stay
up-to-date and accurate. Users should follow the updated instructions
when refreshing assessment results.
* Allow skipping TACL migration during table migration
([#3384](#3384)). A new
optional flag, "skip_tacl_migration", has been added to the
configuration file, providing users with more flexibility during
migration. This flag allows users to control whether or not to skip the
Table Access Control Language (TACL) migration during table migrations.
It can be set when creating catalogs and schemas, as well as when
migrating tables or using the `migrate_grants` method in
`application.py`. Additionally, the `install.py` file now includes a new
variable, `skip_tacl_migration`, which can be set to `True` during the
installation process to skip TACL migration. New test cases have been
added to verify the functionality of skipping TACL migration during
grants management and table migration. These changes enhance the
flexibility of the system for users managing table migrations and TACL
operations in their infrastructure, addressing issues
[#3384](#3384) and
[#3042](#3042).
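A minimal sketch of how such a flag might gate grants migration is
shown below; the `skip_tacl_migration` key mirrors the description
above, while the `WorkspaceConfig` and `migrate_grants` shapes here are
hypothetical simplifications, not the actual UCX classes.

```python
# Hypothetical sketch of a skip_tacl_migration flag gating grants migration.
from dataclasses import dataclass


@dataclass
class WorkspaceConfig:
    inventory_database: str
    skip_tacl_migration: bool = False  # optional flag, defaults to migrating TACLs


def migrate_grants(config: WorkspaceConfig, grants: list[str]) -> list[str]:
    """Return the grants that were applied; none when TACL migration is skipped."""
    if config.skip_tacl_migration:
        return []
    return [f"APPLIED {g}" for g in grants]
```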
* Bump `databricks-sdk` and `databricks-labs-lsql` dependencies
([#3332](#3332)). In this
update, the `databricks-sdk` and `databricks-labs-lsql` dependencies are
upgraded to versions 0.38 and 0.14.0, respectively. The `databricks-sdk`
update addresses conflicts, bug fixes, and introduces new API additions
and changes, notably impacting methods like `create()`,
`execute_message_query()`, and others in workspace-level services. While
`databricks-labs-lsql` updates ensure compatibility, its changelog and
specific commits are not provided. This pull request also includes
ignore conditions for the `databricks-sdk` dependency to prevent future
Dependabot requests. It is strongly advised to rigorously test these
updates to avoid any compatibility issues or breaking changes with the
existing codebase. This pull request mirrors another
([#3329](#3329)), resolving
integration CI issues that prevented the original from merging.
* Explain failures when cluster encounters Py4J error
([#3318](#3318)). In this
release, we have made significant improvements to the error handling
mechanism in our open-source library. Specifically, we have addressed
issue [#3318](#3318), which
involved handling failures when the cluster encounters Py4J errors in
the `databricks/labs/ucx/hive_metastore/tables.py` file. We have added
code to raise noisy failures instead of swallowing the error with a
warning when a Py4J error occurs. The functions `_all_databases()` and
`_list_tables()` have been updated to check if the error message
contains "py4j.security.Py4JSecurityException", and if so, log an error
message with instructions to update or reinstall UCX. If the error
message does not contain "py4j.security.Py4JSecurityException", the
functions log a warning message and return an empty list. These changes
also resolve the linked issue
[#3271](#3271). The
functionality has been thoroughly tested and verified on the labs
environment. These improvements provide more informative error messages
and enhance the overall reliability of our library.
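The pattern described, surfacing Py4J security errors loudly while
downgrading other listing failures to a warning and an empty result,
can be sketched as follows. This is a simplified stand-in for the
behavior described for `_all_databases()` and `_list_tables()`, not the
actual UCX code.

```python
# Sketch: raise noisily on Py4J security errors, warn and skip otherwise.
import logging

logger = logging.getLogger(__name__)

PY4J_MARKER = "py4j.security.Py4JSecurityException"


def list_tables_safely(list_tables_fn) -> list:
    try:
        return list(list_tables_fn())
    except Exception as e:  # the real code inspects the error message
        if PY4J_MARKER in str(e):
            logger.error("Spark connector blocked by Py4J security; update or reinstall UCX")
            raise
        logger.warning(f"Listing tables failed, skipping: {e}")
        return []
```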
* Rearranged job summary dashboard columns and made `job_name` clickable
([#3311](#3311)). In this
update, the job summary dashboard columns have been improved and the
need for the `30_3_job_details.sql` file, which contained a SQL query
for selecting job details from the `inventory.jobs` table, has been
eliminated. The dashboard columns have been rearranged, and the
`job_name` column is now clickable, providing easy access to job details
via the corresponding job ID. The changes include modifying the
dashboard widget and adding new methods for making the `job_name` column
clickable and linking it to the job ID. Additionally, the column titles
have been updated to display more relevant information. These
improvements have been manually tested and verified in a labs
environment.
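Rendering a clickable job name typically comes down to emitting a
markdown link for the dashboard cell; a hypothetical sketch is below.
The `/jobs/<id>` path is an assumption for illustration, and the real
wiring lives in the UCX dashboard widget definitions.

```python
# Hypothetical sketch: render a job name as a markdown link to its job ID.
def job_name_link(job_name: str, job_id: int) -> str:
    """Return a markdown link so the dashboard cell is clickable."""
    return f"[{job_name}](/jobs/{job_id})"
```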
* Refactor refreshing of migration-status information for tables,
eliminate another redundant refresh
([#3270](#3270)). This pull
request refactors the way table records are enriched with
migration-status information during encoding for the history log in the
`migration-progress-experimental` workflow. It ensures that the refresh
of migration-status information is explicit and under the control of the
workflow, addressing a previously expressed intent. A redundant refresh
of migration-status information has been eliminated and additional unit
test coverage has been added to the `migration-progress-experimental`
workflow. The changes include modifying the existing workflow, adding
new methods for refreshing table migration status without updating the
history log, and splitting the crawl and update-history-log tasks into
three steps. The `TableMigrationStatusRefresher` class has been
introduced to obtain the migration status of a table, and new tests have
been added to ensure correctness, making the
`migration-progress-experimental` workflow more efficient and reliable.
* Safely read files in more places
([#3394](#3394)). This
release introduces significant improvements to file handling, addressing
issue [#3386](#3386). A new
function, `safe_read_text`, has been implemented for safe reading of
files, catching and handling exceptions and returning None if reading
fails. This function is utilized in the `is_a_notebook` function and
replaces the existing `read_text` method in specific locations,
enhancing error handling and robustness. The `databricks labs ucx
lint-local-code` command and the `assessment` workflow have been updated
accordingly. Additionally, new test files and methods have been added
under the `tests/integration/source_code` directory to ensure
comprehensive testing of file handling, including handling of
unsupported file types, encoding checks, and ignorable files.
* Track `DirectFsAccess` on `JobsProgressEncoder`
([#3375](#3375)). In this
release, the open-source library has been updated with new features
related to tracking Direct File System Access (DirectFsAccess) in the
JobsProgressEncoder. This change includes the addition of a new
`_direct_fs_accesses` method, which detects direct filesystem access by
code used in a job and generates corresponding failure messages. The
DirectFsAccessCrawler object is used to crawl and track file system
access for directories and queries, providing more detailed tracking and
encoding of job progress. Additionally, new methods `make_job` and
`make_dashboard` have been added to create instances of Job and
Dashboard, respectively, and new unit and integration tests have been
added to ensure the proper functionality of the updated code. These
changes improve the functionality of JobsProgressEncoder by providing
more comprehensive job progress information, making the code more
modular and maintainable for easier management of jobs and dashboards.
This release resolves issue
[#3059](#3059) and enhances
the tracking and encoding of job progress in the system, ensuring more
comprehensive and accurate reporting of job status and issues.
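Generating failure messages from detected direct filesystem accesses
can be sketched as follows; the `DirectFsAccess` record shape and the
message wording here are assumptions for illustration, simplified from
what `_direct_fs_accesses` is described as doing.

```python
# Simplified sketch: turn detected direct-FS accesses into failure messages.
from dataclasses import dataclass


@dataclass
class DirectFsAccess:
    path: str
    is_write: bool


def direct_fs_failures(accesses: list[DirectFsAccess]) -> list[str]:
    """One human-readable failure message per detected access."""
    return [
        f"Direct filesystem access ({'write' if a.is_write else 'read'}) to {a.path}"
        for a in accesses
    ]
```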
* Track `UsedTables` on `TableProgressEncoder`
([#3373](#3373)). In this
release, the tracking of `UsedTables` has been implemented on the
`TableProgressEncoder` in the `tables_progress` function, addressing
issue [#3061](#3061). The
workflow `migration-progress-experimental` has been updated to
incorporate this change. New objects,
`self.used_tables_crawler_for_paths` and
`self.used_tables_crawler_for_queries`, have been added as instances of
a class responsible for crawling used tables. A `full_name` property has
been introduced as a read-only attribute for a source code class,
providing a more convenient way of accessing and manipulating the full
name of the source code object. A new integration test for the
`TableProgressEncoder` component has also been added, specifically
testing table failure scenarios. The `TableProgressEncoder` class has
been updated to track `UsedTables` using the `UsedTablesCrawler` class,
and a new class, `UsedTable`, has been introduced to represent the
catalog, schema, and table name of a table. Two new unit tests have been
added to ensure the correct functionality of this feature.
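The read-only `full_name` property on a `UsedTable`-style record, as
described above, can be sketched like this; the field names mirror the
description (catalog, schema, table), while the exact class layout in
UCX may differ.

```python
# Sketch of a UsedTable record with a read-only full_name property.
from dataclasses import dataclass


@dataclass(frozen=True)
class UsedTable:
    catalog: str
    schema: str
    table: str

    @property
    def full_name(self) -> str:
        """Fully qualified name in catalog.schema.table form."""
        return f"{self.catalog}.{self.schema}.{self.table}"
```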