Support uploading DataFrames with non-ascii texts in Python 2 #1001

Merged
merged 6 commits into from
Sep 25, 2019


jgoizueta
Contributor

@jgoizueta jgoizueta commented Sep 17, 2019

Closes #1000

The fix I propose here accumulates the CSV data in unicode form and encodes it only at the end.
Attributes with non-ASCII text can be passed either as unicode or as UTF-8-encoded bytes, so plain string literals '...' containing UTF-8 characters work in both Python 2 and 3. Explicit unicode literals u'...' also work.

@jgoizueta
Contributor Author

Note: the test I have added also fails on Python 3 (without the fix), because it also checks that attributes can be passed already encoded (as a bytes string).

@jgoizueta
Contributor Author

jgoizueta commented Sep 17, 2019

So, to be clear: as a bonus, we now support not only (unicode) strings such as 'año' (u'año' in Python 2), but also strings that have already been encoded as bytes using UTF-8, like 'año'.encode('utf-8') (or just 'año' in Python 2).
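The equivalence described above can be sketched as a small check that runs on both Python 2 and 3, since `bytes` is an alias of `str` in Python 2. The names `normalize_text` and `check_accepts_text_and_bytes` are hypothetical and are not the test added in this PR.

```python
# -*- coding: utf-8 -*-


def normalize_text(value, encoding='utf-8'):
    # Hypothetical helper: accept text ('año', or u'año' in Python 2) or
    # UTF-8-encoded bytes ('año'.encode('utf-8'), or plain 'año' in Python 2)
    # and always return text.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def check_accepts_text_and_bytes():
    as_text = u'año'
    as_bytes = u'año'.encode('utf-8')
    # 'ñ' is two bytes (0xC3 0xB1) in UTF-8.
    assert as_bytes == b'a\xc3\xb1o'
    # Both forms must end up as the same text value.
    assert normalize_text(as_text) == normalize_text(as_bytes) == u'año'


check_accepts_text_and_bytes()
```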

@jgoizueta jgoizueta changed the title Add test for uploading DataFrames with non-ascii texts Support uploading DataFrames with non-ascii texts in Python 2 Sep 17, 2019
@alrocar
Contributor

alrocar commented Sep 23, 2019

Thanks!

Let's put the PR in the Review column, unassigned, so anyone can pick it up for review.

@simon-contreras-deel
Contributor

I would like to merge #974 first, so that we have some tests in place before this one. We are also going to have some conflicts, and they will be easier to solve here than there.

@alrocar
Contributor

alrocar commented Sep 24, 2019

Removing the blocked label now that #974 is merged.

# Conflicts:
#	cartoframes/data/dataset/registry/dataframe_dataset.py
#	test/data/dataset/test_dataset.py
@jgoizueta jgoizueta removed their assignment Sep 25, 2019
@simon-contreras-deel simon-contreras-deel self-assigned this Sep 25, 2019
@simon-contreras-deel
Contributor

Acceptance ok

@simon-contreras-deel simon-contreras-deel merged commit 7d048e7 into develop Sep 25, 2019
@Jesus89 Jesus89 deleted the fix/1000-nonascii branch September 30, 2019 16:48
Successfully merging this pull request may close these issues.

Can't upload non-ascii texts in a DataFrame with Python 2
4 participants