Release v0.34.0
* Added a check for No isolation shared clusters and MLR ([#2484](#2484)). This change introduces a check for `No Isolation shared clusters` using Machine Learning Runtime (MLR) in the assessment workflow and cluster crawler, resolving issue [#846](#846). A new function has been added to verify if a Spark version is an MLR and another function checks for No Isolation shared clusters with MLR. The `_check_cluster_failures` method now includes an additional condition to verify if the data security mode of the cluster is set to NONE and if the cluster has MLR enabled. New unit tests have been added to ensure the proper functioning of the new feature. The Assessment workflow and cluster crawler have been modified to include the new checks. No new documentation, CLI command, or tables have been added as part of this change.
* Added a section in migration dashboard to list the failed tables, etc ([#2406](#2406)). In this enhancement, we have added a `failed-to-migrate` warning message to notify users when specific table migration and ACL migration operations in `table_migrate.py` fail. This change is part of the resolution of issue [#1754](#1754) and includes modifications to existing workflows. We have also introduced a new SQL file, `05_1_failed_table_migration.sql`, located in `src/databricks/labs/ucx/queries/migration/main/`, which lists the failed tables during a migration process. The SQL file contains a query that retrieves messages from the `inventory.logs` table indicating a failed migration and displays the relevant message. The changes have been manually tested on a staging environment, ensuring reliable and consistent performance. No new commands, tables, or user documentation have been added as part of this update.
* Added clean up activities when `migrate-credentials` cmd fails intermittently ([#2479](#2479)). This pull request includes enhancements to the `databricks/labs/ucx` module, specifically for Azure management functionalities. The primary changes involve improving error handling during the creation of access connectors and storage credentials for storage accounts. Two new methods, `delete_storage_credentials` and `delete_access_connectors`, have been introduced to handle deletion of resources when errors occur. Additionally, the `migrate-credentials` command has been improved to ensure that any created but unvalidated access connectors and storage credentials are deleted when intermittent failures happen. A new `delete_access_connector` method has been added to the `resources.py` file to improve cleanup activities. Moreover, the test suite has been updated with new test cases and fixtures to enhance Azure-specific migration functionality test coverage. These changes aim to increase the overall robustness and consistency of the system when handling failures and to avoid leaving orphaned resources in the Azure environment.
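The rollback pattern described here can be sketched as follows; all names (`migrate_credentials`, `IntermittentError`, the injected callables) are illustrative stand-ins, not the real ucx or Azure APIs:

```python
# Hedged sketch of the cleanup-on-failure pattern: track every resource
# created so far, and delete all of them if an intermittent error occurs
# before the migration completes, so no orphaned resources remain.

class IntermittentError(Exception):
    """Stand-in for a transient Azure failure."""

def migrate_credentials(storage_accounts, create_connector, create_credential,
                        delete_connectors, delete_credentials):
    created_connectors, created_credentials = [], []
    try:
        for account in storage_accounts:
            connector = create_connector(account)
            created_connectors.append(connector)
            credential = create_credential(connector)
            created_credentials.append(credential)
    except IntermittentError:
        # Roll back everything created but not yet validated.
        delete_credentials(created_credentials)
        delete_connectors(created_connectors)
        raise
```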
* Added standalone migrate ACLs ([#2284](#2284)). In this commit, the team has added a new `migrate-acls` command to the `labs.yml` file, which allows for the migration of Access Control Lists (ACLs) from a legacy metastore to a Unity Catalog (UC) metastore. The command includes optional flags for specifying the target catalog and handling HMS-FED ACLs. Additionally, the `migrate-dbsql-dashboards` command has been updated to include a new flag for specifying the target workspace ID. A new `ACLMigrator` class has been introduced to manage the migration of ACLs for tables and databases, and a new `migrate_acls` command has been added to the `cli.py` file for migrating ACLs in table migration scenarios involving HMS federation. The commit also includes a new test file, `test_migrate_acls.py`, with unit tests for the `migrate_acls` function in the `hive_metastore` package, as well as a new test function, `test_migrate_acls_should_produce_proper_queries`, for testing the behavior of the `migrate_acls` function and ensuring that it produces the correct SQL queries.
* Appends metastore_id or location_name to roles for uniqueness ([#2471](#2471)). In this release, we have introduced a new method `_generate_role_name` in the `access.py` module of the AWS package to generate a unique role name for the `create-missing-principals` functionality by appending the `metastore_id` or `location_name` to the role name. This addresses issue [#233](#233).
* Cache workspace content ([#2497](#2497)). This commit introduces a caching mechanism for workspace content to improve load times and bypass rate limits, implemented through the new `WorkspaceCache` class which stores cached instances of various objects. The `_CachedPath` class, a subclass of `WorkspacePath`, is used to cache the content of the workspace path using an LRU cache. New classes and methods, such as `_CachedIO`, `_PathLruCache`, and `WorkspaceCache.get_path()`, have been added for handling caching of input/output operations. The `TaskRunner` class has been updated to use `WorkspaceCache` to retrieve workspace paths. Unit tests have been added to verify the functionality of the new caching mechanism. This change is expected to significantly improve the user experience by reducing the time taken to load workspace content.
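A minimal sketch of the caching idea using `functools.lru_cache`; this `WorkspaceCache` is a simplified stand-in for illustration, not the real ucx class:

```python
# Hedged sketch of an LRU path-content cache: repeated reads of the same
# workspace path hit the cache instead of the API, improving load times
# and helping stay under rate limits.
from functools import lru_cache

class WorkspaceCache:
    def __init__(self, loader, maxsize: int = 128):
        # `loader` fetches content for a path (e.g. via the workspace API).
        self._cached = lru_cache(maxsize=maxsize)(loader)

    def get_path_content(self, path: str) -> str:
        return self._cached(path)
```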
* Changes the security mode for assessment cluster ([#2472](#2472)). In this update, we have enhanced the security of the `main` cluster for assessment jobs. We have modified the `_job_clusters` function in the `workflows.py` file, changing the `data_security_mode` parameter in the `compute.ClusterSpec` constructor from `LEGACY_SINGLE_USER` to `LEGACY_SINGLE_USER_STANDARD`. This change resolves issue [#1717](#1717), since the `LEGACY_SINGLE_USER_STANDARD` mode disables passthrough. The modification has been manually tested to ensure correct implementation. This improvement is backward-compatible, as it modifies existing functionality without introducing new methods, and strengthens the security settings for the assessment `main` cluster.
* Do not normalize cases when reformatting SQL queries in CI check ([#2495](#2495)). In this release, we have made a modification to the `Reformat SQL queries` job in our continuous integration (CI) workflow to address case normalization issues that were causing blocks. The `databricks labs lsql fmt` command has been updated to include the `--normalize-case false` flag, which prevents case normalization during query reformatting. This change ensures that case-sensitive columns are not altered during the reformatting process, thus avoiding CI blockages. A specific example of this change can be seen in the SQL query used for cluster summary assessment in the interactive mode, where the case sensitivity of the `cluster_name` and `cluster_id` columns has been preserved. This modification enhances the adoption experience for software engineers working with the project by ensuring that SQL queries are formatted without altering case-sensitive elements, enabling a smoother CI check process. No new methods have been added, and existing functionality has only been changed to exclude case normalization.
* Drop source table after successful table move not before ([#2430](#2430)). A fix has been implemented to address an issue where the source table was unintentionally dropped before a new table could be successfully created during a table move operation in the Hive metastore. This resulted in the process failing with a [DELTA_CREATE_TABLE_WITH_DIFFERENT_PROPERTY] error. To resolve this, the source table is now dropped after the new table creation, preserving the source table in case of any issues. This change includes an updated order of dropping and creating the table within the `_recreate_table` method, as well as an added integration test (test_move_tables_table_properties_mismatch_preserves_original) to simulate table move with mismatched table properties and verify if the original table is preserved. The test uses the TableMove class and creates catalogs, schemas, tables, and access groups to perform the test. The import section has been updated to include pytest and BadRequest from the sdk.errors module.
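The corrected ordering can be sketched like this (a simplified stand-in for `_recreate_table`, with an injected `execute` callable as an assumption):

```python
# Sketch of the fix: create the target table first, and only drop the
# source once creation succeeded. If CREATE fails (e.g. with a
# DELTA_CREATE_TABLE_WITH_DIFFERENT_PROPERTY error), the source survives.

def recreate_table(execute, create_sql: str, drop_sql: str) -> None:
    execute(create_sql)  # may raise; source table is untouched in that case
    execute(drop_sql)    # reached only after the new table exists
```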
* Enabled `principal-prefix-access` command to run as collection ([#2450](#2450)). In this release, we have introduced several enhancements to our open-source library aimed at improving functionality and maintainability for software engineers. The `principal-prefix-access` command can now be run as a collection, allowing for more flexible and efficient execution. We have also introduced a new `get_workspace_context` function that consolidates common functionalities, simplifying the codebase and improving maintainability. Additionally, we have added a `run-as-collection` flag to the `aws-subscription-scan` command, allowing users to specify whether to run the command as a collection or not. The `create-missing-principals` command for AWS has also been improved to identify all S3 locations missing a UC-compatible role more effectively. Overall, these changes enable the `principal-prefix-access` command to run as a collection, enhance the functionality of the codebase, and make it easier for users to manage their AWS subscriptions and identities.
* Fixed Driver OOM error by increasing the min memory requirement for node from 16GB to 32 GB ([#2473](#2473)). In this update, the `policy.py` file in the `databricks/labs/ucx/installer` directory has been modified to increase the minimum memory requirement for the node type from 16 GB to 32 GB. This change is intended to prevent driver crashes during assessment runs by providing additional memory for the workflow job. The function `_definition` has been updated to incorporate the new minimum memory requirement in the `node_type_id` configuration. No new methods have been added, and the existing functionality remains unchanged beyond the updated memory requirement.
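A sketch of how a minimum-memory floor might drive node selection; the dictionary shape and function name are assumptions for illustration, not the installer's actual `_definition` logic:

```python
# Illustrative: pick the smallest node type that meets the raised memory
# floor (32 GB instead of 16 GB), to give the assessment driver headroom.

def pick_node_type(node_types, min_memory_gb: int = 32, min_cores: int = 4) -> str:
    eligible = [n for n in node_types
                if n["memory_gb"] >= min_memory_gb and n["num_cores"] >= min_cores]
    if not eligible:
        raise ValueError(f"no node type with at least {min_memory_gb} GB memory")
    smallest = min(eligible, key=lambda n: (n["memory_gb"], n["num_cores"]))
    return smallest["node_type_id"]
```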
* Fixed issue when running create-missing-credential cmd tries to create the role again if already created ([#2456](#2456)). A modification has been implemented in the `list_uc_roles` method within the `access.py` file of the `databricks/labs/ucx/aws` directory to address an issue with the `create-missing-credential` command. Previously, the command would try to recreate a role even if it had already been created due to a mismatch in the comparison of the `location` attribute of an `external_location` object with the `resource_path` attribute of a `role` object. This comparison has now been updated to use the `startswith` method instead of the `match` method. An early return has also been added if there are no missing paths to avoid unnecessary processing. These changes resolve issue [#2413](#2413) and have been tested through unit tests. However, there is no mention of integration tests or staging environment verifications, and more information about the testing performed would be helpful to ensure the changes are functioning as intended. No user documentation, CLI commands, workflows, or tables have been added, modified, or removed as part of this change.
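The fixed comparison can be illustrated as a prefix match with an early return (names and list shapes are assumptions for illustration, not the actual `list_uc_roles` code):

```python
# Sketch of the fix: a role's resource_path covers an external location
# when the location *starts with* that path (prefix match), not only when
# it matches exactly; an early return skips work when nothing is missing.

def missing_paths(external_locations, role_resource_paths):
    missing = [loc for loc in external_locations
               if not any(loc.startswith(path) for path in role_resource_paths)]
    if not missing:
        return []  # early return: no roles need to be created
    return missing
```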
* Fixed issue with Interactive Dashboard not showing output ([#2476](#2476)). In this release, we have resolved an issue in the Interactive Dashboard feature where the output was not being displayed due to a case sensitivity bug in an SQL query. The query was incorrectly using `request_params.clusterid` instead of `request_params.clusterId` in the join, select, and having clauses, causing no output to be displayed. We have fixed this issue by capitalizing the `i` (`clusterid` becomes `clusterId`) in the affected clauses. This change is limited to the SQL file used in the Interactive Dashboard feature and does not affect other parts of the system. The changes have been manually tested, but no unit or integration tests have been mentioned. Additionally, there is a modification to the test case for selecting node type in the `test_job_cluster_policy` function, which now selects a node type with min_memory_gb as 32 instead of 16. No new documentation, CLI commands, workflows, tables, or existing functionalities have been changed in this release.
* Fixed support for table/schema scope for the revert table cli command ([#2428](#2428)). The recent change to the open-source library includes an update to the `revert table` CLI command to support table/schema scoping. This modification allows users to specify a schema and table for reverting migrations, with a default value of None. The `revert_migrated_tables` function and `print_revert_report` method have been updated to include schema and table as keyword arguments with a default value of None, enabling more precise control over the revert operation and preventing unintended actions. Additionally, a new dictionary, 'reverse_seen', has been implemented to store the mapping from original table keys to new keys after migration, improving support for reverting table migrations in specific schemas. The `print_revert_report` method now accepts optional `schema` and `table` parameters for filtering the report based on a specific schema and/or table, enhancing the overall user experience.
* Make lint log references clickable ([#2474](#2474)). This commit makes lint log references clickable for easier navigation and reference. The method that formats a message relative to a path now prepends a leading `.` so the path is relative to the current directory, addressing issue [#2474](#2474) and also closing issue [#2408](#2408). Messages now include the path, start line, start column, code, and message of the lint advice in the format `<path>:<line>:<column>: [<code>] <message>`. The Advisory class remains unchanged. This improvement simplifies navigating and referencing lint logs, ultimately providing a better user experience.
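A sketch of the resulting format; the function name and exact `./` prefix are assumptions based on the description above:

```python
# Illustrative formatter for the clickable pattern
# "<path>:<line>:<column>: [<code>] <message>", with a leading "./" so
# terminals and IDEs resolve the path relative to the current directory.

def format_lint_reference(path: str, start_line: int, start_col: int,
                          code: str, message: str) -> str:
    return f"./{path}:{start_line}:{start_col}: [{code}] {message}"
```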
* Refactor view sequencing and return sequenced views if recursion is found ([#2499](#2499)). In this release, the `_migrate_views` function in the `table_migrate.py` file has been refactored to improve view sequencing during table migrations, resolving issue [#2494](#2494). The `ViewsMigrationSequencer` object now takes a `migration_index` parameter and returns sequenced views if recursion is found during the migration process. The `table-migration` workflow has been modified to sequence views based on their dependencies, and a new `sql_migrate_view` method has been added to the `ViewToMigrate` class. Unit tests have been updated to reflect these changes and include a variety of scenarios, such as empty sequences, direct and indirect views, deep indirect views, and invalid view queries. Additionally, several fixtures have been added to simplify view migration testing. This refactoring enhances view migration functionality, making it more robust and flexible.
* Updated databricks-labs-lsql requirement from <0.9,>=0.5 to >=0.5,<0.10 ([#2489](#2489)). In this pull request, we update the requirement on the `databricks-labs-lsql` package from a version greater than or equal to 0.5 and less than 0.9, to a version greater than or equal to 0.5 and less than 0.10. This update allows for the use of the latest version of the package and avoids any potential compatibility issues. The change includes updates to the requirements.txt file, and adds the `normalize-case` option to the `databricks labs lsql fmt` command, allowing users to control the normalization of query text to lowercase. The `deploy_dashboard` method has been removed and replaced with the `create` method of the `lakeview` attribute of the WorkspaceClient object. A new test function, 'test_dashboards_creates_dashboard_with_replace_database', has been added, which is currently marked to be skipped due to missing permissions to create a schema. Additionally, the project has been updated to use Databricks Python SDK version 0.30.0, with changes to the `execute` and `fetch_value` functions to use the new `StatementResponse` type instead of 'ExecuteStatementResponse'. Please refer to the release notes and changelog for the `databricks-labs-lsql` package version 0.9.0 for more information on the changes.
* Updated databricks-sdk requirement from ~=0.29.0 to >=0.29,<0.31 ([#2417](#2417)). In this update, the requirement for the `databricks-sdk` package has been changed from '~=0.29.0' to '>=0.29,<0.31', allowing for the use of the latest version of the package while ensuring compatibility. The new version includes features such as DataPlane support, partner support in the SDK, and various bug fixes and improvements. The update also includes changes to the 'redash.py' file, modifying import statements and updating the `Query` class to use the new `LegacyQuery` class. Additionally, there have been changes to the unit tests in the 'test_access.py' file to ensure compatibility with the updated package. The commit includes release notes, a changelog, and a list of commits for the updated version.
* Updated sqlglot requirement from <25.12,>=25.5.0 to >=25.5.0,<25.13 ([#2431](#2431)). In this pull request, we have updated the version range of the sqlglot dependency in the pyproject.toml file from >=25.5.0,<25.12 to >=25.5.0,<25.13. This change allows us to use the latest version of the sqlglot library, which provides parsing and analysis functionality for SQL code, while also specifying a maximum version to avoid any potential breaking changes that may be introduced in future releases. By keeping our dependencies up-to-date, we can ensure that our project is making use of the latest features and bug fixes, and is compatible with the most recent versions of other libraries and tools.
* Updated sqlglot requirement from <25.13,>=25.5.0 to >=25.5.0,<25.15 ([#2453](#2453)). In this pull request, we have updated the version range constraint for the sqlglot requirement in the pyproject.toml file from `>=25.5.0,<25.13` to `>=25.5.0,<25.15`, allowing the latest version of sqlglot (v25.14.0). This update includes several bug fixes and new features related to ClickHouse support and optimizer functionality. However, it also includes a couple of breaking changes related to schema and database substitution, and nullable comparison in is_type, so it is crucial to thoroughly test your codebase to ensure compatibility with the new version of sqlglot.
* Updated sqlglot requirement from <25.15,>=25.5.0 to >=25.5.0,<25.17 ([#2480](#2480)). In this update, we are upgrading the `sqlglot` dependency in our `pyproject.toml` file from version `>=25.5.0,<25.15` to `>=25.5.0,<25.17`. This change resolves issues [#2452](#2452) and [#2451](#2451), which were caused by bugs in the previous version of `sqlglot`. `sqlglot` is a library used for parsing, analyzing, and rewriting SQL queries. The new version of `sqlglot` includes bug fixes, new features, and some breaking changes. The specific details of these changes can be found in the commit history. Once this pull request is merged and the new version of `sqlglot` is installed, the aforementioned issues should be resolved.
* Updated sqlglot requirement from <25.17,>=25.5.0 to >=25.5.0,<25.18 ([#2488](#2488)). In this update, we have modified the requirements for the sqlglot library to allow the most recent version. The previous requirement of '<25.17,>=25.5.0' has been changed to '>=25.5.0,<25.18'. This change permits the adoption of the latest improvements and bug fixes made to the sqlglot library. The commit message includes a reference to the sqlglot changelog and a list of commits since the last permitted version. As a software engineer implementing this update, you can be assured of the compatibility of the project with the newest version of sqlglot, including all the added enhancements and bug fixes.
* Updated sqlglot requirement from <25.18,>=25.5.0 to >=25.5.0,<25.19 ([#2509](#2509)). In the latest update, the sqlglot dependency has been upgraded to version 25.18.1, introducing several new features and addressing a variety of issues. The new version includes support for the IS JSON predicate in PostgreSQL, the GLOB table function in DuckDB, and the table statement in INSERT for Spark. Several bug fixes are also included, such as improvements to the SQLite IS parser, proper handling of LTRIM/RTRIM usage in Oracle, and fixes for DIV0 case handling in Snowflake. Additionally, there are changes to the default naming of STRUCT fields in Spark and a fix in the binding of TABLESAMPLE to exp.Subquery instead of the top-level exp.Select. This release aims to improve the overall functionality and reliability of the library for software engineers working with various SQL databases.
* [chore] make `GRANT` migration logic isolated to `MigrateGrants` component ([#2492](#2492)). In this release, the `MigrateGrants` component has been introduced to handle the migration logic related to grants, improving code organization and maintainability. This component is responsible for applying grants to a Unity Catalog (UC) table based on a given source table, using instances of `SqlBackend`, `GroupManager`, and a list of grant loaders. The `ACLMigrator` class has also been introduced, which takes instances of `TablesCrawler`, `WorkspaceInfo`, `MigrationStatusRefresher`, and `MigrateGrants`, and applies ACLs to the migrated tables in the given catalog. The `Mapping` class has been updated to include return type annotations of `str` for the `as_uc_table_key` and `as_hms_table_key` methods. Additionally, the `migrate_tables` method in the `TableMigration` class has been modified to remove the `acl_strategy` parameter in several methods and instead, interacts with the new `MigrateGrants` component, reducing code duplication and simplifying the implementation of the `migrate_tables` method. The `PrincipalACL` class has been removed, and the `MigrateGrants` class has been introduced, which handles the remapping of group names and the migration of grants.
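A rough sketch of the isolated component's shape; the method signature and the bare callables here are assumptions for illustration, since the real `MigrateGrants` takes `SqlBackend` and `GroupManager` instances and a list of grant loaders:

```python
# Hedged sketch: MigrateGrants collects grants for a source table from a
# list of loaders and applies each to the corresponding UC table, keeping
# all grant-migration logic in one component.
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass(frozen=True)
class Grant:
    principal: str
    action: str

class MigrateGrants:
    def __init__(self, execute_sql: Callable[[str], None],
                 grant_loaders: list[Callable[[str], Iterable[Grant]]]):
        self._execute_sql = execute_sql
        self._grant_loaders = grant_loaders

    def apply(self, src_table: str, uc_table: str) -> None:
        for loader in self._grant_loaders:
            for grant in loader(src_table):
                self._execute_sql(
                    f"GRANT {grant.action} ON TABLE {uc_table} TO `{grant.principal}`")
```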

Dependency updates:

 * Updated databricks-sdk requirement from ~=0.29.0 to >=0.29,<0.31 ([#2417](#2417)).
 * Updated sqlglot requirement from <25.12,>=25.5.0 to >=25.5.0,<25.13 ([#2431](#2431)).
 * Updated sqlglot requirement from <25.13,>=25.5.0 to >=25.5.0,<25.15 ([#2453](#2453)).
 * Updated sqlglot requirement from <25.15,>=25.5.0 to >=25.5.0,<25.17 ([#2480](#2480)).
 * Updated databricks-labs-lsql requirement from <0.9,>=0.5 to >=0.5,<0.10 ([#2489](#2489)).
 * Updated sqlglot requirement from <25.17,>=25.5.0 to >=25.5.0,<25.18 ([#2488](#2488)).
 * Updated sqlglot requirement from <25.18,>=25.5.0 to >=25.5.0,<25.19 ([#2509](#2509)).
nfx committed Aug 30, 2024 (commit cd61b46, parent cedad88)
Showing 2 changed files with 38 additions and 1 deletion.