Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix array casting #8253

Merged
merged 1 commit into from
Sep 18, 2019
Merged

Fix array casting #8253

merged 1 commit into from
Sep 18, 2019

Conversation

betodealmeida
Copy link
Member

CATEGORY

Choose one

  • Bug Fix
  • Enhancement (new features, refinement)
  • Refactor
  • Add tests
  • Build / Development Environment
  • Documentation

SUMMARY

The fix introduced in #8226 is not working for some types. We have a query returning the following error:

<U10 cannot be converted to an IntegerDtype

This happens because before creating the Pandas dataframe we cast the data into a Numpy array, and Numpy is casting all columns to the same type. I fixed it by keeping the dtype as "object".

TEST PLAN

Query now runs successfully.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

REVIEWERS

@khtruong

@betodealmeida betodealmeida merged commit 8e1fc2b into apache:master Sep 18, 2019
@codecov-io
Copy link

codecov-io commented Sep 18, 2019

Codecov Report

Merging #8253 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #8253   +/-   ##
=======================================
  Coverage   65.68%   65.68%           
=======================================
  Files         481      481           
  Lines       23348    23348           
  Branches     2572     2572           
=======================================
  Hits        15335    15335           
  Misses       7875     7875           
  Partials      138      138
Impacted Files Coverage Δ
superset/dataframe.py 94.48% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 12fb8e7...02f51d4. Read the comment docs.

@DiggidyDave
Copy link
Contributor

what are the cases where they would not be the same type? is that suggestive of a bigger issue?

otherwise, LGTM

@betodealmeida
Copy link
Member Author

what are the cases where they would not be the same type? is that suggestive of a bigger issue?

otherwise, LGTM

@DiggidyDave, we're casting the results from the DB — a list of tuples — into a Numpy array so we can address each column efficiently:

>>> data = [("a", 1), ("b", 10)]
>>> np.array(data)
array([['a', '1'],
       ['b', '10']], dtype='<U2')
>>> np.array(data)[:,0]  # first column
array(['a', 'b'], dtype='<U2')
>>> np.array(data)[:,1]  # second column
array(['1', '10'], dtype='<U2')

Note that the numbers were cast to unicode, since that's the common type between int and unicode.

If we use "object", though:

>>> np.array(data, dtype='object')
array([['a', 1],
       ['b', 10]], dtype=object)

@betodealmeida betodealmeida mentioned this pull request Sep 20, 2019
12 tasks
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.35.0 labels Feb 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/XS 🚢 0.35.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants