Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix COPY data types issues #1190

Merged
merged 25 commits into from
Nov 14, 2019
Merged

Fix COPY data types issues #1190

merged 25 commits into from
Nov 14, 2019

Conversation

Jesus89
Copy link
Member

@Jesus89 Jesus89 commented Nov 12, 2019

Fixes #784.

This PR contains important fixes related to uploading and downloading data types 🎉

  • Fix upload types: the old implementation it was iterating by the row. To get a row, panda unifies the type of all the columns in the row, so, if for example, you have a column with int and another with float it assumes that the row it type float so it will add your int column data as float causing a non-deterministic bug. Now the iteration is made using the index of the row, so the final type of the value is the correct one.
  • Encode numeric types NaN, Infinity, -Infinity: tt adds a map between numpy types np.nan, np.inf and -np.inf.
  • Fix dtypes to pgtypes and pgtypes to dtypes conversion: using PostgreSQL types instead: https://www.postgresql.org/docs/7.4/datatype.html. Using pgtype allows a better mapping for the DataFrame types. The PG types aliases have been added too (int4, float8, etc).
  • Allow upload NULL values: in the old implementation, null values were ignored or converted to ' ' . In this PR NaN and None valued in the DataFrame are properly converted to NaN and null in the DB.
  • Fix download NULL and Infinite values: it allows to download null, NaN, Infinite, -Infinite values from the CARTO table.
  • Fix upload coordinates (0, 0): there is also a small fix to enable uploading the geometry POINT(0 0).

@andy-esch
Copy link
Contributor

❤️Love this one! Bugs like these are maybe the most annoying! Beyond the small refactor suggestion, I don't see any recommended changes. Most of it is over my head unless I spend more time thinking through it.

Copy link
Contributor

@alrocar alrocar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments.

About tests, are we adding a e2e test to check Inifinity and NaN are properly uploaded/downloaded? Same for 0,0 coordinates.

cartoframes/utils/columns.py Show resolved Hide resolved
cartoframes/utils/utils.py Outdated Show resolved Hide resolved
@Jesus89
Copy link
Member Author

Jesus89 commented Nov 13, 2019

We have just unit tests for that. I'll add a couple of e2e tests.

@Jesus89 Jesus89 requested a review from alrocar November 13, 2019 17:44
Copy link
Contributor

@alrocar alrocar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alrocar
Copy link
Contributor

alrocar commented Nov 14, 2019

Acceptance 🍏

@alrocar alrocar merged commit c9c3b54 into develop Nov 14, 2019
@alrocar alrocar deleted the 784-convert-inf branch November 14, 2019 09:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

numpy 'inf' need to be converted to postgresql infinity
3 participants