Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect conflicts between workspaces in the account for groups and tables #299

Closed
wants to merge 1 commit into from

Conversation

nfx
Copy link
Collaborator

@nfx nfx commented Sep 26, 2023

Fixes #83
Fixes #22

@nfx nfx added to be discussed feat/account-level cross-workspace installations labels Sep 26, 2023
@codecov
Copy link

codecov bot commented Sep 26, 2023

Codecov Report

Attention: 111 lines in your changes are missing coverage. Please review.

Comparison is base (4fe6ed8) 83.27% compared to head (f910b1f) 79.24%.
Report is 122 commits behind head on main.

Files Patch % Lines
src/databricks/labs/ucx/account/conflicts.py 0.00% 94 Missing ⚠️
src/databricks/labs/ucx/workspace_access/groups.py 32.00% 17 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #299      +/-   ##
==========================================
- Coverage   83.27%   79.24%   -4.03%     
==========================================
  Files          30       31       +1     
  Lines        2146     2265     +119     
  Branches      366      394      +28     
==========================================
+ Hits         1787     1795       +8     
- Misses        279      390     +111     
  Partials       80       80              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@nfx
Copy link
Collaborator Author

nfx commented Sep 26, 2023

Depends on #288, #249,

@FastLee
Copy link
Contributor

FastLee commented Sep 27, 2023

Should we consider a "warehouse" for assessment/migration? Pulling from all workspaces?

@nfx
Copy link
Collaborator Author

nfx commented Sep 27, 2023

@FastLee so how warehouse should work across 1000 workspaces? Do we want to select a warehouse from each of those?.. what if warehouses have different names or not existing?

We also may need to create a "main workspace" for migration, but i am not sure if it's simpler than storing configuration on a laptop

@zpappa
Copy link

zpappa commented Sep 27, 2023

Customers will not love the idea of creating a workspace to run a tool, and no customer thus far has EVER upgraded all their workspaces at once. This will probably never happen.

Therefore, we should aim to make this chunkable using the workspace as a chunking point and we should follow simple and predictable logic when trying to reconcile workspace local groups with account groups, and document this so people know how it works.

Here are the scenarios that come to mind.

Scenario 1
Workspace Local SCIM is configured
Workspace Local Group Is an External Group
Workspace Local Group Does not exist in the account console
Result: Migrate workspace local group, notify user that this group needs to be added to account console SCIM connector

Scenario 2
Workspace Local SCIM is configured
Workspace Local Group is an External Group
Workspace Local Group exists in the account console and is an external group
Result: Do not migrate workspace local group, but use the account group that already exists

Scenario 3
Workspace Local SCIM is not configured
Workspace local group is not an external group
Workspace Local group does not exist in the account console
Result: Create the workspace local group in the account console with the same members

Scenario 4
Workspace Local SCIM is not configured
Workspace local group is not an external group
Workspace Local group exists in the account console and does not have the same members
Result: Create a new group with the workspace name as a prefix, and create in the account console

Scenario 5
Workspace Local SCIM is not configured
Workspace local group is not an external group
Workspace Local group exists in the account console and has the same members
Result: Use the existing group in the account console

@nfx
Copy link
Collaborator Author

nfx commented Sep 27, 2023

We have no way of knowing if SCIM sync is configured, because it's the external tool calling our APIs. Scim sync is something done by identity admins

@nfx
Copy link
Collaborator Author

nfx commented Sep 28, 2023

upgraded all their workspaces at once
The requirement is to run assessment job to collect metadata about HMS and flag inconveniences/conflicts. Then we collect accout/local groups and flag inconveniences/conflicts. It's an easier single ticket for Identity Engineering (or Administrative) team about making changes to Okta/Aad/Terraform for user management (i'd say, majority of UC target audience).

(identity admins) Create the workspace local group in the account console with the same members
I think that the single ticket to that team has to be as simple as "install Databricks CLI and run the 'databricks labs ucx report-groups' and modify okta/aad/tf to change groups according to output".

We need an issue for this scenario, please create.

(identity admins) Create the workspace local group in the account console with the same members
We need an issue for this scenario, please create. More than that - it's exactly the same scenario as "Create a new group with the workspace name as a prefix, and create in the account console", but without the need to create a prefix in most of the cases.

Use the existing group in the account console

Migration probably would happen in workspace-by-workspace job-in-UI triggering. We can automate that also, if customers asks/votes.

That's for groups.


We didn't cover service principals at all 😉 on aws, account-level and service level spns may have the same/purpose, but different application_id, that is recorded in both grants and generic permissions.


Now for tables, there also needs to be a report on table/db inconsistency - like
A: db1.tbl1, db1.tbl3
B: db1.tbl2

And the team(s) that are driving UC Migration within account would make a decision after some time in review (of excel spreadsheet). By the way, we can split UCX installation across different Azure Subscriptions. And every installation would just focus on defining target catalog mapping per database. But here are unanswered questions:

  • two workspaces, same dbs, all different tables and columns (all managed tables, effectively)
  • two workspaces, same dbs, 90% same tables, 10% are different tables
  • two workspaces, two different dbs

We can technically support both db_to_catalog and workspace_to_catalog, and even at the same time, but db_to_catalog will override workspace_to_catalog. We also need default_catalog_for_workspace, if workspace_to_catalog is set (default catalog for all workspaces is set per metastore)..

We can also do another override for tables, but we have unanswered questions:

  • what if same db, same workspace, same table, but different columns/order/types? Ignore and keep in hive metastore? And then rerun the scan for tables and grants?
  • what if during migration catalog/database/table were deleted either from hms and/or uc?

Speaking of metastores, in the beginning, there needs to be workspace_to_metastore mapping with default_metastore_for_workspace. Can we come up with a good default mapping here? Coarse or fine grained? Select between the two? Ask for inline input? How many conflicts we expect to justify the need to create/support custom mapping?

the last very important question is what future-proof configuration format might we need for this mapping. That's why I don't want any configuration after the assessment step.

@nfx
Copy link
Collaborator Author

nfx commented Sep 28, 2023

Workspace to metastore mapping can be solved with region:

We should ideally be checking if there is a metastore in region, and in region only to the workspace, not in general if there is one setup.

@pohlposition pohlposition added the step/assessment go/uc/upgrade - Assessment Step label Sep 28, 2023
@CLAassistant
Copy link

CLAassistant commented Nov 27, 2023

CLA assistant check
All committers have signed the CLA.

@nfx
Copy link
Collaborator Author

nfx commented Mar 4, 2024

stale PR, will need to start from scratch

@nfx nfx closed this Mar 4, 2024
@nfx nfx deleted the account/users branch March 4, 2024 16:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat/account-level cross-workspace installations step/assessment go/uc/upgrade - Assessment Step
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Groups with same name across workspaces Add ability to compare workspace and account group size and members
5 participants