
Execute_many optimisation for inserts? #123

Closed
simplesteph opened this issue Apr 13, 2017 · 4 comments

@simplesteph commented Apr 13, 2017

It seems that executemany would run an INSERT as many times as I have rows to insert. Is there an optimisation available to perform a multi-row insert as part of one INSERT statement?

insert into table values (1,1), (1,2), (1,3), (2,1);

Or would I have to manually build that statement with the desired batch size? A helper for this would be amazing, ideally for better insert performance.
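For illustration, here is a minimal sketch of the manual approach the question describes: building a single multi-row INSERT with numbered placeholders for a batch of rows. The table name and the two-column row shape are placeholders for this example, not anything asyncpg provides.

def build_multirow_insert(table, rows):
    # rows: a list of equally sized tuples, e.g. [(1, 1), (1, 2), (1, 3), (2, 1)]
    width = len(rows[0])
    values = ", ".join(
        "(" + ", ".join(f"${i * width + j + 1}" for j in range(width)) + ")"
        for i in range(len(rows))
    )
    args = [v for row in rows for v in row]
    return f"INSERT INTO {table} VALUES {values}", args

sql, args = build_multirow_insert("mytable", [(1, 1), (1, 2), (1, 3), (2, 1)])
# sql == 'INSERT INTO mytable VALUES ($1, $2), ($3, $4), ($5, $6), ($7, $8)'
# await conn.execute(sql, *args)

Note that every distinct batch size produces a different query text, which is relevant to the prepared-statement caching point raised in the next comment.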

@vitaly-burovoy (Contributor) commented

That works only for simple INSERTs, and parsing the query to decide whether it can be rewritten or not is the wrong approach.

For example, I can write a DML like:

UPDATE table SET value='on' WHERE other_value = $1

and run it like

await conn.executemany(sql, [('cat',), ('dog',), ('pig',)])

just because sometimes it is faster than

UPDATE table 
SET value='on' 
WHERE other_value = 'cat' OR other_value = 'dog' OR other_value = 'pig'

With your proposal, parsing the SQL would be a waste of time; that optimisation can be done at a higher level by the user's code or by some sort of ORM library.
Moreover, asyncpg now caches queries as prepared statements by default, so each distinct number of argument rows would produce a new query to cache, which is inefficient.

PG allows inserting multiple rows with a single, static DML query, so there is no extra per-statement penalty.
Since PG 9.4 [1] it is possible to write something like

INSERT INTO tbl(col1, col2 [...])
SELECT val1, val2...
FROM ROWS FROM(
  unnest($1::type4col1[]),
  unnest($2::type4col2[]) [...]
) AS t(val1, val2 [...])

and pass one array of values per column. In your case it would be:

col1 = (1, 1, 1, 2)
col2 = (1, 2, 3, 1)
await conn.execute(sql, col1, col2)

[1] https://www.postgresql.org/docs/9.4/static/sql-select.html
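To make the unnest-based approach concrete, here is a self-contained sketch with asyncpg. The column types (int), the table definition tbl(col1 int, col2 int), and the connection DSN are assumptions for illustration.

import asyncio
import asyncpg

async def main():
    conn = await asyncpg.connect(dsn='postgresql://localhost/test')  # assumed DSN

    sql = """
        INSERT INTO tbl (col1, col2)
        SELECT val1, val2
        FROM ROWS FROM (
            unnest($1::int[]),
            unnest($2::int[])
        ) AS t(val1, val2)
    """

    col1 = [1, 1, 1, 2]
    col2 = [1, 2, 3, 1]

    # One statement inserts all four rows; the query text stays the same
    # no matter how many rows are passed, so the prepared statement is reused.
    await conn.execute(sql, col1, col2)

    await conn.close()

asyncio.run(main())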

@elprans (Member) commented Apr 13, 2017

I think the issue of inserting large numbers of rows will be best served by COPY. Alas, we don't support that part of the protocol yet, but that's something I plan for the next release.

@vitaly-burovoy (Contributor) commented

@elprans I mostly agree with you. Moreover, the COPY command allows sending binary data, which asyncpg already handles well enough.

Nevertheless, it has some limitations:

  1. Rules [1] are not invoked.
  2. IDENTITY columns [2] (a new feature intended to replace the *SERIAL macros) are not filled [3] if the COPY data does not provide values for those columns, and the statement fails with "null value in column "..." violates not-null constraint" (as of 2017-04-17).

In my view, the best way to insert new values is the INSERT command. COPY is perfect for transferring data from one database to another or for dump/restore processes.

P.S.: +1 for COPY support in asyncpg; it is very useful for restoring fixtures (a sketch follows the references below).

[1] https://www.postgresql.org/docs/9.6/static/sql-copy.html#AEN77698
[2] https://www.postgresql.org/docs/devel/static/sql-createtable.html#sql-createtable-examples
[3] https://www.postgresql.org/docs/devel/static/sql-copy.html#idp107065728
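As a concrete example of the fixture-restore use case, here is a sketch using the copy_to_table() method added in the commit referenced below. The table name, CSV path, and DSN are placeholders.

import asyncio
import asyncpg

async def restore_fixture():
    conn = await asyncpg.connect(dsn='postgresql://localhost/test')  # assumed DSN
    # Streams the file to the server via COPY ... FROM STDIN.
    status = await conn.copy_to_table(
        'users', source='fixtures/users.csv', format='csv', header=True)
    print(status)  # e.g. 'COPY 42'
    await conn.close()

asyncio.run(restore_fixture())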

elprans added a commit that referenced this issue May 10, 2017
This commit adds two new Connection methods: copy_to_table() and
copy_records_to_table() that allow copying data to the specified
table either in text or, in the latter case, record form.

Related-To #123.
Closes #21.
elprans added a commit that referenced this issue May 11, 2017
This commit adds two new Connection methods: copy_to_table() and
copy_records_to_table() that allow copying data to the specified
table either in text or, in the latter case, record form.

Closes #123.
Closes #21.
@1st1 (Member) commented May 12, 2017

Please try out the 0.11 release with COPY IN support!
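A minimal sketch of the record-based variant, copy_records_to_table(), added alongside copy_to_table(); it assumes the same illustrative table tbl(col1 int, col2 int) and DSN as the earlier example.

import asyncio
import asyncpg

async def main():
    conn = await asyncpg.connect(dsn='postgresql://localhost/test')  # assumed DSN
    records = [(1, 1), (1, 2), (1, 3), (2, 1)]
    # Sends the records over the COPY protocol in binary form.
    status = await conn.copy_records_to_table(
        'tbl', records=records, columns=['col1', 'col2'])
    print(status)  # e.g. 'COPY 4'
    await conn.close()

asyncio.run(main())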
