Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Snowflake pandas #83

Merged
merged 3 commits into from
May 29, 2020
Merged

Snowflake pandas #83

merged 3 commits into from
May 29, 2020

Conversation

fdosani
Copy link
Member

@fdosani fdosani commented May 23, 2020

Tagging @jfyu also

Would love to have a bit of a discussion about this PR before taking any action to see if folks align on the approach here. I've overridden the to_pandas function for Snowflake. If no size is provided it will use the internal snowflake function to get the pandas df. If size is provided to will use the existing inherited database functionality.

I was reluctant to change to the insert_dataframe_to_table to use write_pandas(). The 2 are slightly different in their approach. Our method is not using COPY like the internal snowflake method but inserting directly via a INSERT statement, and provides table creation and schema abilities. I updated the docs to explain this. I think if people are really keep on using write_pandas they should be able to call it directly from the cursor.

Any thoughts on the above?

@fdosani fdosani added the enhancement New feature or request label May 23, 2020
@dasjyao
Copy link
Contributor

dasjyao commented May 24, 2020

Instead of replacing, I think it might be a better idea to combine insert_dataframe_to_table and write_pandas() . More specifically, we should keep the functionality of table creation, but replace the INSERT statements to use write_pandas(). Also, we need to do some performance tests to see how write_pandas() improves the speed.

@jfyu
Copy link
Contributor

jfyu commented May 27, 2020

I really like the create table feature because that's a very big use case. If it's between one and the other, then I choose to keep the create table feature that we currently have. However, an alternative is to add an argument in insert_dataframe_to_table for large operations using COPY, that could be the simplest solution.

As an aside, it does look like snowflake-connector-python has an insertion method as well (https://docs.snowflake.com/en/user-guide/python-connector-api.html#pd_writer) but only for SQLAlchemy

Copy link
Contributor

@jborchma jborchma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Two small comments for the docstrings

locopy/snowflake.py Show resolved Hide resolved
locopy/snowflake.py Outdated Show resolved Hide resolved
@fdosani fdosani requested a review from elzzhu May 29, 2020 00:37
Copy link
Contributor

@jborchma jborchma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@fdosani
Copy link
Member Author

fdosani commented May 29, 2020

There is a bigger question about the insert_dataframe_to_table method. Should we add a flag like use_snowflake_copy which will use the underlying snowflake feature if selected vs our version? It should be simple enough and can do it in the next PR if we all agree on that?

This would provide the best of both worlds :)

@fdosani fdosani merged commit 621c639 into develop May 29, 2020
@fdosani fdosani deleted the snowflake-pandas branch June 8, 2020 13:21
@fdosani fdosani mentioned this pull request Jun 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants