
Ensure fixed casing for object names in inventory tables. #288

Closed
larsgeorge-db opened this issue Sep 25, 2023 · 6 comments · Fixed by #684
Labels
enhancement New feature or request step/assessment go/uc/upgrade - Assessment Step

Comments

@larsgeorge-db
Contributor

Names such as catalogs, databases, and tables should be uppercased in tables.scala. This makes them easier to process downstream in the jobs.

@larsgeorge-db larsgeorge-db added the enhancement New feature or request label Sep 25, 2023
@larsgeorge-db larsgeorge-db added this to the 1 month milestone Sep 25, 2023
@nfx
Collaborator

nfx commented Sep 25, 2023

@larsgeorge-db maybe they should be lowercased?

@pohlposition pohlposition added the step/assessment go/uc/upgrade - Assessment Step label Sep 28, 2023
@nfx nfx modified the milestones: 1 month, 1 week Oct 2, 2023
@zpappa zpappa removed this from the 1 week milestone Oct 2, 2023
@nfx nfx added this to UCX Oct 3, 2023
@nfx nfx moved this to Triage in UCX Oct 3, 2023
@pohlposition
Contributor

This is 2 weeks old.

Is this something we need to do? Do we have naming conventions for things like this?

@nfx nfx moved this from Triage to Active Backlog in UCX Oct 10, 2023
@larsgeorge-db
Contributor Author

I do not think we do... I am actually looking at this and was asking myself the same question. Which column should be converted to what?

| col_name | data_type | casing |
| --- | --- | --- |
| catalog | string | lower |
| database | string | lower |
| name | string | lower |
| object_type | string | upper |
| table_format | string | upper |
| location | string | leave |
| view_text | string | leave |
| upgraded_to | string | leave |

Looking at what we have so far, object_type is already uppercase, so matching table_format to it makes sense. Since DBSQL treats SQL object names as case-insensitive, we could lowercase the names, mostly to make them easier on the eyes. This does not hold for every RDBMS, though: some, like Oracle, are case-sensitive, so the names of federated tables are likely case-sensitive too. Assuming we only deal with the HMS side here, even a federated table's remote name would live in the connection/location details, so case-insensitivity still applies to the names we store.
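To make the proposal concrete, here is a minimal Python sketch of applying the per-column casing rules from the table above to a single inventory row. It is not the code in tables.scala; the `CASING` mapping and the `normalize_row` helper are hypothetical names introduced only for illustration, and the sample row is made up.

```python
# Hypothetical casing rules, copied from the proposal table in this thread.
CASING = {
    "catalog": "lower",
    "database": "lower",
    "name": "lower",
    "object_type": "upper",
    "table_format": "upper",
    "location": "leave",
    "view_text": "leave",
    "upgraded_to": "leave",
}


def normalize_row(row: dict) -> dict:
    """Return a copy of row with each column cased according to CASING.

    Columns without a rule, and None values, are passed through unchanged.
    """
    out = {}
    for col, value in row.items():
        rule = CASING.get(col, "leave")
        if value is None or rule == "leave":
            out[col] = value
        elif rule == "lower":
            out[col] = value.lower()
        else:  # rule == "upper"
            out[col] = value.upper()
    return out


# Made-up sample row, only to show the effect of the rules.
sample = {
    "catalog": "Hive_Metastore",
    "database": "Sales",
    "name": "Orders",
    "object_type": "managed",
    "table_format": "delta",
    "location": "dbfs:/Mnt/Data/Orders",
    "view_text": None,
    "upgraded_to": None,
}
print(normalize_row(sample))
```

Paths and view text are left alone in this sketch because changing their case could change their meaning, which matches the "leave" entries in the table.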

@larsgeorge-db
Contributor Author

Looking at our dashboard queries, the only column we have to UPPER() is table_format. In the code, we LOWER() the database and table names in a few places. Do we expect any impact from applying the above conversions directly while scanning HMS and taking stock of its data structures?
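The trade-off between converting at query time and converting at scan time can be sketched with an in-memory SQLite table. This is only an illustration under stated assumptions: the two-column `tables` schema, the sample rows, and the `format_counts` helper are hypothetical, and SQLite stands in for the real inventory storage and dashboard engine.

```python
import sqlite3


def format_counts(rows, query):
    """Load (name, table_format) rows into an in-memory table and run query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tables (name TEXT, table_format TEXT)")
    conn.executemany("INSERT INTO tables VALUES (?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Made-up rows with inconsistent casing, as a raw scan might record them.
mixed = [("sales", "delta"), ("orders", "DELTA"), ("events", "Parquet")]

PLAIN = (
    "SELECT table_format, COUNT(*) FROM tables "
    "GROUP BY table_format ORDER BY table_format"
)
WITH_UPPER = (
    "SELECT UPPER(table_format) AS fmt, COUNT(*) FROM tables "
    "GROUP BY fmt ORDER BY fmt"
)

# Without normalization, 'delta' and 'DELTA' land in separate groups.
split_groups = format_counts(mixed, PLAIN)

# Converting at query time: every dashboard query has to remember UPPER().
query_time = format_counts(mixed, WITH_UPPER)

# Converting at scan time: the plain query is enough afterwards.
normalized = [(name, fmt.upper()) for name, fmt in mixed]
scan_time = format_counts(normalized, PLAIN)

print(split_groups)
print(query_time)
print(scan_time)
```

In this sketch the scan-time variant gives the same grouping as the query-time variant, while the readers no longer need to apply any conversion themselves.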

@HariGS-DB HariGS-DB self-assigned this Dec 4, 2023
@nfx
Collaborator

nfx commented Dec 6, 2023

@HariGS-DB what's the progress here?

@HariGS-DB
Contributor

@nfx I started working on it today and have one question to confirm:

  • Is the requirement to convert all fields of all tables created by ucx to lowercase, and to ensure that read operations don't apply UPPER() anywhere, such as in the dashboard queries?
  • Or does it cover only the tables used by tables.scala (tables and table_failures)?

@HariGS-DB HariGS-DB linked a pull request Dec 14, 2023 that will close this issue
@github-project-automation github-project-automation bot moved this from Active Backlog to Archive in UCX Dec 14, 2023