Fix array casting #8253

betodealmeida · 2019-09-18T20:20:23Z

SUMMARY

The fix introduced in #8226 is not working for some types. We have a query returning the following error:

<U10 cannot be converted to an IntegerDtype

This happens because we cast the data into a NumPy array before creating the pandas dataframe, and NumPy casts all columns to a single common type. I fixed it by creating the array with dtype "object", which leaves each value as its original Python type.
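A minimal sketch of the idea, not the actual change in superset/dataframe.py: the `rows` data and the variable names are made up for illustration, and it assumes a pandas version that supports the nullable `Int64` dtype. With dtype "object" the integer column still holds Python ints (and `None`), so it can be converted to a nullable integer column.

```python
import numpy as np
import pandas as pd

# Hypothetical query result: a string column and an integer column with a NULL.
rows = [("a", 1), ("b", None), ("c", 10)]

# dtype=object stops NumPy from coercing every cell to one common type.
array = np.array(rows, dtype=object)

# Build each column separately; the integer column uses the nullable Int64 dtype.
names = pd.Series(array[:, 0], dtype=object)
counts = pd.Series(array[:, 1], dtype="Int64")

print(counts)
```

Here `counts` keeps the missing value as a null entry rather than forcing the column to float or string.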

TEST PLAN

Query now runs successfully.

ADDITIONAL INFORMATION

REVIEWERS

@khtruong

codecov-io · 2019-09-18T20:43:26Z

Codecov Report

Merging #8253 into master will not change coverage.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master    #8253   +/-   ##
=======================================
  Coverage   65.68%   65.68%           
=======================================
  Files         481      481           
  Lines       23348    23348           
  Branches     2572     2572           
=======================================
  Hits        15335    15335           
  Misses       7875     7875           
  Partials      138      138

Impacted Files	Coverage Δ
superset/dataframe.py	`94.48% <100%> (ø)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 12fb8e7...02f51d4.

DiggidyDave · 2019-09-18T20:54:42Z

what are the cases where they would not be the same type? is that suggestive of a bigger issue?

otherwise, LGTM

betodealmeida · 2019-09-18T21:30:51Z

> what are the cases where they would not be the same type? is that suggestive of a bigger issue?
>
> otherwise, LGTM

@DiggidyDave, we're casting the results from the DB (a list of tuples) into a NumPy array so we can address each column efficiently:

>>> import numpy as np
>>> data = [("a", 1), ("b", 10)]
>>> np.array(data)
array([['a', '1'],
       ['b', '10']], dtype='<U2')
>>> np.array(data)[:,0]  # first column
array(['a', 'b'], dtype='<U2')
>>> np.array(data)[:,1]  # second column
array(['1', '10'], dtype='<U2')

Note that the numbers were cast to unicode, since that's the common type between int and unicode.

If we use "object", though:

>>> np.array(data, dtype='object')
array([['a', 1],
       ['b', 10]], dtype=object)
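To make the difference concrete, here is a small sketch that checks which Python types end up in each column under both approaches. The `column_types` helper is hypothetical and only for illustration; it is not part of Superset. It deliberately avoids checking the exact string width of the coerced array, since that can differ between NumPy versions.

```python
import numpy as np

# Same sample rows as above: a string column and an integer column.
data = [("a", 1), ("b", 10)]

# Default casting: NumPy picks one common dtype, so the ints become strings.
coerced = np.array(data)

# dtype=object keeps the original Python object in every cell.
preserved = np.array(data, dtype=object)


def column_types(array):
    """Return, per column, the sorted names of the Python types it contains."""
    return [
        sorted({type(value).__name__ for value in array[:, i].tolist()})
        for i in range(array.shape[1])
    ]


print(column_types(coerced))    # every column holds str
print(column_types(preserved))  # second column still holds int
```

With the object array, each column can then be converted on its own to whatever dtype suits it.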

Fix array casting

02f51d4

pull-request-size bot added the size/XS label Sep 18, 2019

khtruong approved these changes Sep 18, 2019

betodealmeida merged commit 8e1fc2b into apache:master Sep 18, 2019

betodealmeida mentioned this pull request Sep 20, 2019

Fix no data in Presto #8268

Merged

12 tasks

DanyRay420 mentioned this pull request Feb 23, 2024

[Snyk] Upgrade deck.gl from 8.8.27 to 8.9.34 DanyRay420/superset#2

Open

mistercrunch added the 🏷️ bot and 🚢 0.35.0 labels Feb 28, 2024
