Read using copy #570
Implementation looks good 👍. I just added two minor comments.
For the ongoing write PR I finally decided to:
- Keep the current context.py code as it is.
- Create a new API (see datasets.py and cartoframes.py in the PR) with the new code.
Let's see how the other PR goes. We might want to merge that one first, then adapt this PR to include the new read code in the Dataset class and wrap it in the CartoFrames class.
What do you think?
cartoframes/context.py
Outdated
if 'cartodb_id' in df.columns:
    df.set_index('cartodb_id', inplace=True)

for column_name in table_columns:
Besides dates, are you sure that the schema will be faithfully represented from PG -> pandas.DataFrame? I know that with postcodes/zipcodes the leading zeros are sometimes removed when the column is erroneously converted to a numeric column instead of a string column.
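To make the concern concrete, here is a minimal sketch of how leading zeros can get lost when pandas infers column types from CSV text. The CSV payload and column names are made up for illustration and are not taken from the PR.

```python
import io
import pandas as pd

# Made-up CSV payload standing in for a table export; not taken from the PR.
csv_text = "cartodb_id,postcode\n1,01234\n2,08001\n"

# With default type inference the postcode column is parsed as integers,
# so the leading zeros are dropped.
inferred = pd.read_csv(io.StringIO(csv_text))

# Declaring the column as str keeps each value exactly as it appears in the text.
explicit = pd.read_csv(io.StringIO(csv_text), dtype={'postcode': str})

print(inferred['postcode'].tolist())  # [1234, 8001]
print(explicit['postcode'].tolist())  # ['01234', '08001']
```

If the read path always passes the column type explicitly, this kind of silent conversion should not happen.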
Since we are now using COPY instead of the Import API, we create the table with the same schema as the DataFrame (this is done in the write method, still in CR). So when reading the table back, the DataFrame should have the same structure it had before writing.
In summary, no more type/content guessing.
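As a rough illustration of reading without guessing, the sketch below builds the pandas `dtype` and `parse_dates` arguments from declared column types. The shape of `table_columns`, the type names, and the `read_with_schema` helper are assumptions made for this example; the actual code in the PR may look different.

```python
import io
import pandas as pd

# Hypothetical column metadata; the real structure of table_columns
# in the PR may differ.
table_columns = {
    'cartodb_id': {'type': 'number'},
    'postcode': {'type': 'string'},
    'created_at': {'type': 'date'},
}


def read_with_schema(csv_stream, columns):
    """Read CSV text using declared column types instead of inference."""
    # String columns are read as str so values such as '01234' survive.
    dtypes = {name: str for name, meta in columns.items()
              if meta['type'] == 'string'}
    # Date columns are parsed explicitly rather than left as plain text.
    date_cols = [name for name, meta in columns.items()
                 if meta['type'] == 'date']
    df = pd.read_csv(csv_stream, dtype=dtypes, parse_dates=date_cols)
    if 'cartodb_id' in df.columns:
        df.set_index('cartodb_id', inplace=True)
    return df


# Made-up CSV text standing in for the output of a COPY ... TO STDOUT query.
csv_text = "cartodb_id,postcode,created_at\n1,01234,2019-03-01\n"
df = read_with_schema(io.StringIO(csv_text), table_columns)
```

The point of the design is that the column types come from the table definition, so the result does not depend on what the first rows of data happen to look like.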
- I have tested dates and they work fine (we are doing the same in both cases).
- About the postcodes/zipcodes case, I want to check that I understand it correctly: in the db we have a string field with something like 01234, and in the DataFrame we should get the same string, right? This case works fine too.
🚀
Didn't realize the tests were not passing. Let's wait to unblock the related PR.
This one needs to wait until the write part is merged.
Solves: #563
Also solves: #212
It needs https://github.com/CartoDB/carto-python/pull/103/files before merging