-
Notifications
You must be signed in to change notification settings - Fork 13.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: remove character set and collate column info by default #9316
Conversation
@@ -161,6 +161,7 @@ class BaseEngineSpec: # pylint: disable=too-many-public-methods | |||
utils.DbColumnType.STRING: ( | |||
re.compile(r".*CHAR.*", re.IGNORECASE), | |||
re.compile(r".*STRING.*", re.IGNORECASE), | |||
re.compile(r".*TEXT.*", re.IGNORECASE), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mostly unrelated to this PR, but I noticed that the TEXT
types were not being identified as string column types (I'm surprised this hadn't come up previously).
@villebro looking at the example data (using a MySQL database),
it seems that the column are defined with the >>> from superset.utils.core import get_example_database
>>> example_db = get_example_database()
>>> sqla_table = example_db.get_table("energy_usage")
>>> col = next(iter(sqla_table.columns))
>>> col
Column('source', VARCHAR(collation='utf8_unicode_ci', length=255), table=<energy_usage>) and thus I'm a little perplexed as to why we need |
@john-bodley what you're saying makes total sense. The columns from which these strings are created are constructed here: I'm thinking we should loop through these columns prior to returning the Edit: on second thought best to not mutate the original table, probably wise to do in |
@villebro just to clarify what is the issue with having these collate and character set defined at the column level? |
@john-bodley right now the column type column is |
CATEGORY
Choose one
SUMMARY
It is common for SQL engines to offer the option to define a custom character set and collation scheme for columns. The convention is to define this using the following syntax (example from MySQL):
VARCHAR(255) CHARACTER SET LATIN1 COLLATE UTF8MB4_GENERAL_CI
. A few examples:Currently Superset already removes collation info, but retains the character set info. This often causes an overflow when saving metadata due to the column type only being 32 characters wide. This PR makes removal of collation and character set info the new default behaviour.
TEST PLAN
Local test + CI
ADDITIONAL INFORMATION
REVIEWERS
@john-bodley @mistercrunch