
awswrangler.postgresql.to_sql is too slow, inserting row-by-row #599

Closed
ilyanoskov opened this issue Mar 14, 2021 · 4 comments · Fixed by #600
Labels: enhancement (New feature or request), ready to release

Comments


ilyanoskov commented Mar 14, 2021

I am using this library quite extensively in my pipelines, and I have noticed that even small dataframes (44K rows) take a VERY long time to get uploaded to Postgres. Would it be possible to introduce some bulk upload support? Something like what is described here: https://stackoverflow.com/questions/29706278/python-pandas-to-sql-with-sqlalchemy-how-to-speed-up-exporting-to-ms-sql

Otherwise, I think I will be forced to write my own custom method for bulk uploading, the waiting times are too much. Thank you very much in advance, and thanks for such an amazing project 💪
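[Editor's note] The approach in the linked Stack Overflow thread boils down to replacing one INSERT per row with one parameterized multi-row INSERT per chunk. A minimal, illustrative sketch of that technique follows; the table name, columns, and `build_bulk_insert` helper are hypothetical and are not part of awswrangler's API:

```python
def build_bulk_insert(table, columns, rows, chunksize):
    """Group rows into chunks and build one parameterized multi-row
    INSERT statement per chunk, instead of one statement per row."""
    # One "(%s, %s, ...)" tuple of placeholders per row.
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    statements = []
    for start in range(0, len(rows), chunksize):
        chunk = rows[start:start + chunksize]
        sql = (
            f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
            + ", ".join([row_placeholder] * len(chunk))
        )
        # Flatten the chunk's values into one parameter list for the driver.
        params = [value for row in chunk for value in row]
        statements.append((sql, params))
    return statements

rows = [(i, f"name-{i}") for i in range(5)]
stmts = build_bulk_insert("my_table", ["id", "name"], rows, chunksize=2)
print(len(stmts))  # 3 statements instead of 5
```

Each statement is then sent to the database in a single round trip, which is where the speedup comes from.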

@ilyanoskov ilyanoskov added the enhancement New feature or request label Mar 14, 2021
@ilyanoskov ilyanoskov changed the title awswrangler.postgresql.to_sql is too slow when working with many rows awswrangler.postgresql.to_sql is too slow, inserting row-by-row Mar 14, 2021
maxispeicher (Contributor) commented
I've added a chunksize parameter to the to_sql function, which controls how many rows are inserted in a single SQL query. In a local test it decreased the insert time from 120 seconds to 1 second for me. Could you test whether it works for you too?

```shell
pip uninstall awswrangler -y
pip install git+https://github.com/maxispeicher/aws-data-wrangler.git@to_sql_add_batching
```

```python
import awswrangler as wr
...
wr.postgresql.to_sql(..., chunksize=500)
```

Note that the default value is 1, so you have to explicitly set it.
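[Editor's note] To illustrate the scale involved: the number of round trips to the database is roughly the row count divided by chunksize. This arithmetic sketch is illustrative only, not awswrangler internals:

```python
import math

def num_queries(n_rows, chunksize):
    """Approximate number of INSERT statements issued for n_rows
    when rows are batched in groups of chunksize."""
    return math.ceil(n_rows / chunksize)

# With the default chunksize=1, the 44K-row DataFrame from the issue
# means 44,000 separate INSERTs; chunksize=500 cuts that to 88.
print(num_queries(44_000, 1))    # 44000
print(num_queries(44_000, 500))  # 88
```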

ilyanoskov (Author) commented

Hi @maxispeicher, thanks a lot for such a quick response! I won't be able to test this new feature this week, but I did look at your pull request and it looks good to me! 🚀

igorborgest (Contributor) commented
@jaidisido jaidisido linked a pull request Mar 16, 2021 that will close this issue
jaidisido (Contributor) commented
Covered in release 2.6.0
