upload removes double quotes #1133

Closed
alrocar opened this issue Oct 29, 2019 · 4 comments · Fixed by #1142

@alrocar
Contributor

alrocar commented Oct 29, 2019

Added by @arredond

When uploading data to CARTO via CartoFrames' Dataset.upload() method, double quotes are removed from string values, while single quotes are kept. This is probably related to https://github.com/CartoDB/support/issues/2219.

An example can be seen here:

import pandas as pd

from cartoframes.data import Dataset

df = pd.DataFrame({
    'a': [
        'This is a string with "double quotes"',
        "This is a string with 'single quotes'"
    ]
})

Dataset(df).upload(table_name='testing_quotes', if_exists='replace')

In the original pandas DataFrame, the double quotes are intact:

print(df['a'])
0    This is a string with "double quotes"
1    This is a string with 'single quotes'
Name: a, dtype: object

If we use the DataFrame's own .to_csv() method, the double quotes are correctly escaped by doubling them:

df.to_csv()
',a\n0,"This is a string with ""double quotes"""\n1,This is a string with \'single quotes\'\n'
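The doubling that .to_csv() applies is the standard CSV quoting convention. As a minimal sketch, Python's built-in csv module produces the same encoding for the first value and decodes it back to the original string; the '\n' line terminator is chosen here only to keep the output readable.

```python
import csv
import io

value = 'This is a string with "double quotes"'

# Encode one row. QUOTE_MINIMAL (the default) quotes the field because it
# contains the quote character, and doublequote=True doubles the inner quotes.
buf = io.StringIO()
csv.writer(buf, lineterminator='\n').writerow([value])
encoded = buf.getvalue()
print(repr(encoded))  # '"This is a string with ""double quotes"""\n'

# Decoding reverses the doubling and restores the original value.
decoded = next(csv.reader(io.StringIO(encoded)))[0]
print(decoded == value)  # True
```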

However, when we look at the data in CARTO, the double quotes have disappeared (single quotes are preserved):

print(Dataset('testing_quotes').download()['a'])
cartodb_id
1      This is a string with double quotes
2    This is a string with 'single quotes'
Name: a, dtype: object

The same happens when trying to escape the double quotes, either with backslashes (\)...

df = pd.DataFrame({
    'a': [
        'This is a string with \"escaped double quotes\"',
        "This is a string with 'single quotes'"
    ]
})

Dataset(df).upload(table_name='testing_quotes', if_exists='replace')

print(Dataset('testing_quotes').download()['a'])
cartodb_id
1    This is a string with escaped double quotes
2          This is a string with 'single quotes'
Name: a, dtype: object
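One detail worth noting: in Python source code, \" inside a single-quoted literal is just an escaped ", so the backslashes never reach the DataFrame. The value uploaded in this example is therefore the same kind of string as in the first one, as this quick check shows:

```python
plain = 'This is a string with "escaped double quotes"'
escaped = 'This is a string with \"escaped double quotes\"'

# The backslash only affects how the literal is parsed; it is not stored.
print(plain == escaped)   # True
print('\\' in escaped)    # False
```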

...or by doubling them up manually:

df = pd.DataFrame({
    'a': [
        'This is a string with ""escaped double quotes""',
        "This is a string with 'single quotes'"
    ]
})

Dataset(df).upload(table_name='testing_quotes', if_exists='replace')

print(Dataset('testing_quotes').download()['a'])
cartodb_id
1    This is a string with escaped double quotes
2          This is a string with 'single quotes'
Name: a, dtype: object

alrocar added this to the [1.0rc1] Stabilization milestone Oct 29, 2019
alrocar added the bug label Oct 29, 2019
Jesus89 self-assigned this Oct 30, 2019
@Jesus89
Member

Jesus89 commented Oct 31, 2019

Hello!

I checked cartoframes, carto-python and pyrestcli, and everything looks fine on that side.

Then I tested the COPY directly with curl. Contents of data.csv:

This is a string with "double quotes"|1

curl -v \
     -X POST \
     -H 'Transfer-Encoding: chunked' \
     -H 'Content-Type: application/octet-stream' \
     --data @data.csv \
     "https://arroyo-carto.carto.com/api/v2/sql/copyfrom?q=COPY+testing_quotes(a,cartodb_id)+FROM+stdin+WITH+(FORMAT+csv,DELIMITER+'|')&api_key=xxx"

And this is the result: the double quotes are skipped.

I'm not sure if this is a bug in the COPY command, or if we need to pass more parameters to keep the double quotes.
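The skipped quotes are consistent with how PostgreSQL (which backs the CARTO SQL API) parses FORMAT csv input: the quote character (" by default) switches quoted mode on and off anywhere in a field, not only at its start, and the quote characters themselves are dropped. Inside quoted mode, a doubled quote stands for one literal quote. The following is a simplified model of that behaviour written for illustration only (it ignores NULL handling, custom QUOTE/ESCAPE options and fields spanning several lines); it is not PostgreSQL's code, and parse_pg_csv_line is a hypothetical name. It reproduces the stripped values seen in the issue:

```python
def parse_pg_csv_line(line: str, delimiter: str = '|', quote: str = '"') -> list:
    """Simplified model of PostgreSQL COPY ... (FORMAT csv) field splitting.

    Assumes ESCAPE defaults to the QUOTE character, as it does in PostgreSQL.
    """
    fields = []
    out = []
    in_quote = False
    i = 0
    while i < len(line):
        c = line[i]
        if in_quote:
            if c == quote and i + 1 < len(line) and line[i + 1] == quote:
                # Doubled quote inside quoted mode: emit one literal quote.
                out.append(quote)
                i += 2
                continue
            if c == quote:
                # Closing quote: leave quoted mode and drop the character.
                in_quote = False
            else:
                out.append(c)
        elif c == delimiter:
            # Unquoted delimiter ends the current field.
            fields.append(''.join(out))
            out = []
        elif c == quote:
            # A quote anywhere in an unquoted field opens quoted mode.
            in_quote = True
        else:
            out.append(c)
        i += 1
    if in_quote:
        raise ValueError('unterminated CSV quoted field')
    fields.append(''.join(out))
    return fields


print(parse_pg_csv_line('This is a string with "double quotes"|1'))
# ['This is a string with double quotes', '1']
print(parse_pg_csv_line('This is a string with ""escaped double quotes""|1'))
# ['This is a string with escaped double quotes', '1']
```

In this model the unquoted field loses its quotes exactly as in CARTO, while a field wrapped in quotes with inner quotes doubled comes through intact.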

cc @alrocar

@Jesus89
Member

Jesus89 commented Oct 31, 2019

I have just found that we need to pass the data in the following format:

"This is a string with 'simple quotes' and ""double quotes"""|1

With the data in that format, it works.
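As a sanity check, the line above is ordinary CSV with | as the delimiter. A sketch with Python's standard csv module (an illustration, not part of the thread) decodes it back to the intended value:

```python
import csv
import io

line = '"This is a string with \'simple quotes\' and ""double quotes"""|1\n'

# Parse with the same delimiter the COPY command uses.
row = next(csv.reader(io.StringIO(line), delimiter='|'))
print(row)
```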

@alrocar
Contributor Author

alrocar commented Oct 31, 2019

good catch! I totally forgot about that param

@Jesus89
Member

Jesus89 commented Oct 31, 2019

However, that parameter already defaults to ", so the key is the format: wrap the value in "..." if it contains ", | or \n, and double every " inside it.
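The rule above can be written as a small encoder. This is a minimal sketch: quote_field is a hypothetical helper (not part of CARTOframes), and it assumes the | delimiter and (a, cartodb_id) column order from the curl example:

```python
def quote_field(value: str, delimiter: str = '|', quote: str = '"') -> str:
    """Encode one value for COPY ... (FORMAT csv) following the rule above.

    Hypothetical helper: quote the value when it contains the quote
    character, the delimiter or a newline, doubling any inner quotes.
    """
    if quote in value or delimiter in value or '\n' in value:
        return quote + value.replace(quote, quote * 2) + quote
    return value


rows = [
    ('This is a string with "double quotes"', 1),
    ("This is a string with 'single quotes'", 2),
]
for text, cartodb_id in rows:
    # Same column order as the COPY command: (a, cartodb_id).
    print(quote_field(text) + '|' + str(cartodb_id))
# "This is a string with ""double quotes"""|1
# This is a string with 'single quotes'|2
```

Quoting only when needed keeps plain values unchanged; quoting every field unconditionally would also be valid CSV for COPY.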


2 participants