feat: create dtype option for csv upload #23716

Merged
merged 3 commits into from
Apr 24, 2023

Conversation

eschutho
Member

@eschutho eschutho commented Apr 17, 2023

SUMMARY

Redshift automatically converts all text columns to varchar(256). For uploads, and CSV uploads in particular, this means the only way to load a column with a large text field is to first create the table in SQL Lab with the correct column definitions, then upload to that existing table and replace the data. This PR adds a new dtype field to the upload form; any column given the "string" type is converted to varchar(max) on Redshift. Currently all string types are uploaded as "object", so this change shouldn't affect any upload that doesn't explicitly pass "string". We could extend this to other databases, but it looks like their default behavior is to convert uploaded string columns to text.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

After: (added a new field)
[Screenshot: _DEV__Superset]

In this example, I changed only the Question column's type to nvarchar(max) by passing {"Question":"string"} into the upload form. The other string field remains varchar(256).
[Screenshot: _DEV__Superset]
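
The behavior described above can be sketched roughly as follows. This is a minimal illustration, not Superset's actual implementation: the function name `resolve_text_column_types`, the two type constants, and the second column name `Answer` are all assumptions made for the example. Only the `{"Question":"string"}` mapping and the varchar(256) / varchar(max) outcome come from the description.

```python
from typing import Dict, Iterable

# Assumed type names, taken from the PR description rather than Superset's code.
DEFAULT_TEXT_TYPE = "VARCHAR(256)"
WIDE_TEXT_TYPE = "VARCHAR(MAX)"


def resolve_text_column_types(
    text_columns: Iterable[str], dtype: Dict[str, str]
) -> Dict[str, str]:
    """Return the assumed Redshift type for each text column (hypothetical helper).

    Only columns explicitly marked "string" are widened; every other column
    keeps the default, so uploads that pass no dtype are unaffected.
    """
    return {
        column: WIDE_TEXT_TYPE if dtype.get(column) == "string" else DEFAULT_TEXT_TYPE
        for column in text_columns
    }


# "Answer" is a made-up name standing in for the other string field.
types = resolve_text_column_types(["Question", "Answer"], {"Question": "string"})
print(types)  # {'Question': 'VARCHAR(MAX)', 'Answer': 'VARCHAR(256)'}
```

Keying the change on an explicit "string" entry is what keeps it opt-in: a column that is absent from the mapping falls through to the default.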

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@eschutho eschutho marked this pull request as draft April 17, 2023 23:37
@eschutho eschutho changed the title feature: create dtype option for csv upload feat: create dtype option for csv upload Apr 17, 2023

codecov bot commented Apr 17, 2023

Codecov Report

Merging #23716 (2b6a084) into master (42e8d1b) will increase coverage by 2.01%.
The diff coverage is 65.54%.

❗ Current head 2b6a084 differs from pull request most recent head ef00cd0. Consider uploading reports for the commit ef00cd0 to get more accurate results

@@            Coverage Diff             @@
##           master   #23716      +/-   ##
==========================================
+ Coverage   65.96%   67.98%   +2.01%     
==========================================
  Files        1907     1936      +29     
  Lines       73590    74928    +1338     
  Branches     7982     8140     +158     
==========================================
+ Hits        48546    50942    +2396     
+ Misses      22996    21894    -1102     
- Partials     2048     2092      +44     
| Flag | Coverage Δ |
|---|---|
| hive | 53.00% <ø> (+0.26%) ⬆️ |
| mysql | 78.80% <ø> (+0.39%) ⬆️ |
| postgres | 78.87% <ø> (+0.37%) ⬆️ |
| presto | 52.92% <ø> (+0.25%) ⬆️ |
| python | 82.67% <ø> (+3.74%) ⬆️ |
| sqlite | 77.39% <ø> (?) |
| unit | 52.81% <ø> (?) |

| Impacted Files | Coverage Δ |
|---|---|
| ...ackages/superset-ui-chart-controls/src/fixtures.ts | 100.00% <ø> (ø) |
| ...t-ui-chart-controls/src/shared-controls/mixins.tsx | 16.66% <ø> (ø) |
| ...d/packages/superset-ui-chart-controls/src/types.ts | 100.00% <ø> (ø) |
| .../packages/superset-ui-core/src/chart/types/Base.ts | 100.00% <ø> (ø) |
| ...s/superset-ui-core/src/components/SafeMarkdown.tsx | 85.71% <0.00%> (+19.04%) ⬆️ |
| ...ackages/superset-ui-core/src/query/types/Filter.ts | 100.00% <ø> (ø) |
| ...ackages/superset-ui-core/src/utils/featureFlags.ts | 100.00% <ø> (ø) |
| ...s/legacy-plugin-chart-country-map/src/countries.ts | 100.00% <ø> (ø) |
| ...plugins/legacy-plugin-chart-heatmap/src/Heatmap.js | 0.00% <0.00%> (ø) |
| ...gins/legacy-plugin-chart-world-map/src/WorldMap.js | 0.00% <0.00%> (ø) |

... and 141 more

... and 332 files with indirect coverage changes

@eschutho eschutho force-pushed the elizabeth/redshift-text-csv branch from 2472ebe to 2bed125 Compare April 19, 2023 00:13
@eschutho eschutho force-pushed the elizabeth/redshift-text-csv branch from 2bed125 to e76b921 Compare April 19, 2023 00:25
Member

@betodealmeida betodealmeida left a comment

This looks great!

dtype = StringField(
_("Column Data Types"),
description=_(
"A dictionary with column names and their data types if you need to change the defaults. Example: {“Column”:“data type”}"

I would use a concrete example here, something like:

Suggested change
- "A dictionary with column names and their data types if you need to change the defaults. Example: {“Column”:“data type”}"
+ "A dictionary with column names and their data types if you need to change the defaults. Example: {“user_id”:“integer”}"

Member Author

good idea!
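
For context on what a value like the suggested example would look like once submitted, here is a rough sketch of parsing the "Column Data Types" text into a dictionary. It assumes the field is plain JSON text; `parse_dtype_field` is a hypothetical helper written for this illustration, not a function from Superset.

```python
import json
from typing import Dict


def parse_dtype_field(raw: str) -> Dict[str, str]:
    """Parse the text of the dtype field into a column -> type mapping.

    Hypothetical helper: an empty field means "keep the defaults" and
    yields an empty mapping.
    """
    if not raw or not raw.strip():
        return {}
    # json.loads raises json.JSONDecodeError (a ValueError subclass) on bad input.
    parsed = json.loads(raw)
    if not isinstance(parsed, dict):
        raise ValueError('dtype must be a JSON object such as {"user_id": "integer"}')
    for column, type_name in parsed.items():
        if not isinstance(type_name, str):
            raise ValueError(f"data type for column {column!r} must be a string")
    return parsed


print(parse_dtype_field('{"user_id": "integer"}'))  # {'user_id': 'integer'}
```

Under this assumption the value has to use straight double quotes; typographic quotes such as “user_id” are not valid JSON and `json.loads` rejects them.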

@pull-request-size pull-request-size bot added size/L and removed size/M labels Apr 20, 2023
@eschutho eschutho force-pushed the elizabeth/redshift-text-csv branch 2 times, most recently from 322178e to 8055592 Compare April 20, 2023 00:15
@eschutho eschutho force-pushed the elizabeth/redshift-text-csv branch from 8055592 to 17792e2 Compare April 20, 2023 00:16
@eschutho eschutho marked this pull request as ready for review April 20, 2023 00:16
@eschutho eschutho force-pushed the elizabeth/redshift-text-csv branch from bd36bce to ef00cd0 Compare April 24, 2023 18:59
@eschutho eschutho merged commit 71106cf into apache:master Apr 24, 2023
@eschutho eschutho deleted the elizabeth/redshift-text-csv branch April 24, 2023 19:53
jinghua-qa pushed a commit to preset-io/superset that referenced this pull request Apr 27, 2023
sebastianliebscher pushed a commit to sebastianliebscher/superset that referenced this pull request Apr 28, 2023
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 3.0.0 labels Mar 13, 2024
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels preset:2023.17 size/L 🚢 3.0.0
4 participants