
awswrangler.postgresql.to_sql is too slow, inserting row-by-row #599

Closed
ilyanoskov opened this issue Mar 14, 2021 · 4 comments · Fixed by #600
Labels: enhancement (New feature or request), ready to release

Comments


ilyanoskov commented Mar 14, 2021

I am using this library quite extensively in my pipelines, and I have noticed that even small dataframes (44K rows) take a VERY long time to get uploaded to Postgres. Would it be possible to introduce some bulk upload support? Something like what is described here: https://stackoverflow.com/questions/29706278/python-pandas-to-sql-with-sqlalchemy-how-to-speed-up-exporting-to-ms-sql

Otherwise, I think I will be forced to write my own custom method for bulk uploading, the waiting times are too much. Thank you very much in advance, and thanks for such an amazing project 💪
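[Editor's note] The approach in the linked Stack Overflow thread boils down to replacing one INSERT per row with one parameterized multi-row INSERT per chunk. A minimal, illustrative sketch of that technique follows; the table name, columns, and `build_bulk_insert` helper are hypothetical and are not part of awswrangler's API:

```python
def build_bulk_insert(table, columns, rows, chunksize):
    """Group rows into chunks and build one parameterized multi-row
    INSERT statement per chunk, instead of one statement per row."""
    # One "(%s, %s, ...)" tuple of placeholders per row.
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    statements = []
    for start in range(0, len(rows), chunksize):
        chunk = rows[start:start + chunksize]
        sql = (
            f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
            + ", ".join([row_placeholder] * len(chunk))
        )
        # Flatten the chunk's values into one parameter list for the driver.
        params = [value for row in chunk for value in row]
        statements.append((sql, params))
    return statements

rows = [(i, f"name-{i}") for i in range(5)]
stmts = build_bulk_insert("my_table", ["id", "name"], rows, chunksize=2)
print(len(stmts))  # 3 statements instead of 5
```

Each statement is then sent to the database in a single round trip, which is where the speedup comes from.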

@ilyanoskov ilyanoskov added the enhancement New feature or request label Mar 14, 2021
@ilyanoskov ilyanoskov changed the title awswrangler.postgresql.to_sql is too slow when working with many rows awswrangler.postgresql.to_sql is too slow, inserting row-by-row Mar 14, 2021
maxispeicher (Contributor) commented
I've added a chunksize parameter to the to_sql function, which controls how many rows are inserted in a single SQL query. In a local test it decreased the insert time from 120 seconds to 1 second for me. Could you test whether it works for you too?

```shell
pip uninstall awswrangler -y
pip install git+https://github.com/maxispeicher/aws-data-wrangler.git@to_sql_add_batching
```

```python
import awswrangler as wr
...
wr.postgresql.to_sql(..., chunksize=500)
```

Note that the default value is 1, so you have to explicitly set it.
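[Editor's note] To illustrate the scale involved: the number of round trips to the database is roughly the row count divided by chunksize. This arithmetic sketch is illustrative only, not awswrangler internals:

```python
import math

def num_queries(n_rows, chunksize):
    """Approximate number of INSERT statements issued for n_rows
    when rows are batched in groups of chunksize."""
    return math.ceil(n_rows / chunksize)

# With the default chunksize=1, the 44K-row DataFrame from the issue
# means 44,000 separate INSERTs; chunksize=500 cuts that to 88.
print(num_queries(44_000, 1))    # 44000
print(num_queries(44_000, 500))  # 88
```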

ilyanoskov (Author) commented

Hi @maxispeicher, thanks a lot for such a quick response! I won't be able to test this new feature this week, but I did look at your pull request and it looks good to me! 🚀

igorborgest (Contributor) commented
@jaidisido jaidisido linked a pull request Mar 16, 2021 that will close this issue
jaidisido (Contributor) commented
Covered in release 2.6.0
