[Bug]: UPDATE is hundreds of times less effective as UPSERT statement #7010
Comments
Just to clarify one thing: the explain output you showed for the "ordinary UPDATE" shows an insert plan. Is this a mistake? That said, INSERTs have special routing which filters out the chunks they need, whereas UPDATEs don't have this ability and have to act on all chunks. This makes them inherently slower than UPSERTs, which do take advantage of tuple routing. It's a known limitation of the system and there are no plans to change this at the moment. Thanks,
Yes, it is a mistake. I have now updated the original post to fix this issue. Thanks
Taking another look at the UPDATE queries, the main difference here is that the so-called ordinary UPDATE targets a specific timestamp, while the bulk UPDATE is written in a way that makes it hard for us to filter down to only the chunks the UPDATE needs. If you could rewrite the WHERE clause to include an explicit timestamp constraint, chunk exclusion should apply. Can you give it a try?
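A sketch of what such a rewrite might look like. The table name `sensor_data` and the `value` column are hypothetical; only the unique-index columns (dis, enotni_ident_mm, casovna_znacka) come from the report:

```sql
-- Hypothetical table and value column; bulk UPDATE joined against VALUES,
-- with added literal time bounds the planner can see at plan time.
UPDATE sensor_data AS t
SET value = v.value
FROM (VALUES
        (1, 'mm-001', TIMESTAMPTZ '2024-03-01 10:00+00', 42.0),
        (1, 'mm-002', TIMESTAMPTZ '2024-03-01 11:00+00', 17.5)
     ) AS v (dis, enotni_ident_mm, casovna_znacka, value)
WHERE t.dis = v.dis
  AND t.enotni_ident_mm = v.enotni_ident_mm
  AND t.casovna_znacka = v.casovna_znacka
  -- constant bounds covering all rows in VALUES, so chunk exclusion can run
  AND t.casovna_znacka >= TIMESTAMPTZ '2024-03-01'
  AND t.casovna_znacka <  TIMESTAMPTZ '2024-03-02';
```

The join conditions alone are not usable for plan-time chunk exclusion; the constant bounds in the last two lines are what would let the planner prune chunks.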
I forgot to mention in the original post that the attached SQLs are simplified examples. The attached UPSERT and UPDATE statements have only 2 rows in VALUES, but in the production SQLs there are exactly 1000 rows in VALUES in the bulk-UPSERT statement (all other parts of the SQL are the same), and we would also like to use 1000 rows in VALUES for the bulk-UPDATE. That is the whole purpose of a "bulk" operation: to have a lot of them. It would be fantastic if all of the rows in the VALUES part of the SQL had the same timestamptz (I have just talked to the developer), but unfortunately they do NOT have the same timestamptz; they only share the same "enotni_ident_mm" column (like a sensorID) and "dis" column (like a companyID). Those are the two remaining columns from the UNIQUE index (the last one is the timestamptz). From the current 1-bulk_UPDATE: I did a test from your last post anyway (despite it not being really useful in our case). From the WHERE condition of 1_bulk_UPDATE_explain.txt:
If you look at the plans you are generating for UPDATEs, you can see there is a join with the materialized VALUES you are using for the update. We cannot run chunk exclusion based on joins, so basically I don't see a way this can even be implemented on our side. Is there a way you can use UPSERTs here? It seems like that could give you the performance benefits you are looking for. As a workaround, maybe using the materialized VALUES to check which rows already exist under your unique constraint, and then doing an UPSERT to update the values, might give you decent enough results. This is the best suggestion I can offer here. I hope it helps.
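The two-step workaround described above could be sketched like this (table name and `value` column are hypothetical; the conflict target matches the unique index from the report):

```sql
-- Step 1: check which of the incoming rows already exist under the
-- unique constraint (hypothetical table/value names).
SELECT v.dis, v.enotni_ident_mm, v.casovna_znacka
FROM (VALUES
        (1, 'mm-001', TIMESTAMPTZ '2024-03-01 10:00+00'),
        (1, 'mm-002', TIMESTAMPTZ '2024-03-01 11:00+00')
     ) AS v (dis, enotni_ident_mm, casovna_znacka)
JOIN sensor_data t
  ON  t.dis = v.dis
  AND t.enotni_ident_mm = v.enotni_ident_mm
  AND t.casovna_znacka = v.casovna_znacka;

-- Step 2: UPSERT only the rows found in step 1, so no rows are
-- inserted unintentionally; the UPSERT benefits from tuple routing.
INSERT INTO sensor_data (dis, enotni_ident_mm, casovna_znacka, value)
VALUES (1, 'mm-001', TIMESTAMPTZ '2024-03-01 10:00+00', 42.0)
ON CONFLICT (dis, enotni_ident_mm, casovna_znacka)
DO UPDATE SET value = EXCLUDED.value;
```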
We have now changed the logic from bulk-UPDATE to bulk-UPSERT for our application and it works great. Thanks for the help. If this issue is of any future value, maybe leave it open and label it as an "enhancement" request; if not, it can be closed. I just thought I needed to check with experts that we are doing it right, because, for example, the article How Does PostgreSQL Implement Batch Update, Deletion, and Insertion recommends, in its "Bulk Update" section, using bulk-UPDATE. It would be interesting to test how vanilla PostgreSQL really handles this bulk-UPDATE, e.g. using native partitions, and whether the query plan really changes.
I can't say I know exactly what would happen with native partitioning and bulk UPDATEs. If you do end up trying that, feel free to respond to this ticket, but I don't think we can do anything for this on our end.
What type of bug is this?
Performance issue
What subsystems and features are affected?
Query planner
What happened?
Architecture
In our case, sensors send data to accumulated storage (not in TimescaleDB), and from there we bulk-insert the data into a self-hosted on-prem TimescaleDB database.
Updates are required
We have found out that some sensors from time to time do not send data at all, or sometimes send incorrect data into the accumulated non-TimescaleDB storage. Unfortunately, this is often recognized only after the data is already inserted into TimescaleDB. To overcome this incorrect or missing data problem, we changed the TimescaleDB INSERT statement to an UPSERT statement (INSERT+UPDATE in the same SQL statement), and this has been working perfectly well for months. In most cases the UPSERT statement executes INSERTs, but in roughly 10% of cases it executes UPDATEs. A single bulk-UPSERT statement executes in sub-second time.
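The shape of such an UPSERT might look like this. The table name `sensor_data` and the `value` column are hypothetical stand-ins (the real statements are attached); the conflict target matches the unique index described further below:

```sql
-- Hypothetical sketch of the bulk-UPSERT: insert new rows,
-- update existing ones that hit the unique constraint.
INSERT INTO sensor_data (dis, enotni_ident_mm, casovna_znacka, value)
VALUES
  (1, 'mm-001', TIMESTAMPTZ '2024-03-01 10:00+00', 42.0),
  (1, 'mm-002', TIMESTAMPTZ '2024-03-01 10:05+00', 17.5)
ON CONFLICT (dis, enotni_ident_mm, casovna_znacka)
DO UPDATE SET value = EXCLUDED.value;
```

In production there are 1000 rows in VALUES rather than 2, as clarified in the comments.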
The problem
So far so good. But now we have found out that some data from past months is also incorrect, and we need to bulk-UPDATE past values. In this case we thought there was no reason to execute a bulk-UPSERT; instead we could execute a bulk-UPDATE, because it should be faster (our assumption). But... we were surprised: the bulk-UPDATE statement just kept executing in Exclusive Mode, locking thousands of database objects, and after one hour (with more than 50 thousand objects locked and more than 5 thousand applications waiting) I terminated the bulk-UPDATE. We are talking about an equivalent bulk-UPDATE that executed for more than an hour (we don't know how much more, because it was terminated after 1 hour) compared to an equivalent bulk-UPSERT that executes in sub-second time.
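For comparison, the bulk-UPDATE was written as an UPDATE joined against a VALUES list, roughly like this (hypothetical table/value names; the real statement is attached):

```sql
-- Hypothetical sketch of the bulk-UPDATE: join the target table
-- against a materialized VALUES list on the unique-index columns.
UPDATE sensor_data AS t
SET value = v.value
FROM (VALUES
        (1, 'mm-001', TIMESTAMPTZ '2024-03-01 10:00+00', 42.0),
        (1, 'mm-002', TIMESTAMPTZ '2024-03-01 10:05+00', 17.5)
     ) AS v (dis, enotni_ident_mm, casovna_znacka, value)
WHERE t.dis = v.dis
  AND t.enotni_ident_mm = v.enotni_ident_mm
  AND t.casovna_znacka = v.casovna_znacka;
```

Because the only timestamp constraint is a join condition, the planner has no constant it can use to exclude chunks, which matches the behavior described next.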
Explains show the problem
I did EXPLAIN on the bulk-UPSERT and I see only one chunk is affected, as expected. Every day is in one chunk, and because the data in the bulk-UPSERT is from a single day, only a single chunk is affected. This works perfectly, just as expected. But from the bulk-UPDATE explain I see that, for some unknown reason, all of the chunks in the hypertable are affected. I expected the bulk-UPDATE to also affect only one chunk, according to our one-day chunk partitioning, just like the bulk-UPSERT. I expected the bulk-UPDATE to be faster than the bulk-UPSERT, but it is like a million times slower.
Details
BTW, on the hypertable where the bulk-UPSERT and bulk-UPDATE are executed, a UNIQUE INDEX is created on the columns used in the SQLs' WHERE conditions, in this order: dis, enotni_ident_mm, casovna_znacka DESC. Table definition using "\dt table": table_definition.txt
My question is: why does the bulk-UPDATE (from point 1) not execute equally well as, or (as I expected) even better than, the bulk-UPSERT?
TimescaleDB version affected
2.14.2
PostgreSQL version used
15.6
What operating system did you use?
Red Hat Enterprise Linux v9.3
What installation method did you use?
RPM
What platform did you run on?
On prem/Self-hosted
Relevant log output and stack trace
No response
How can we reproduce the bug?
I suppose you can create a table, do some bulk inserts, and then compare a bulk-UPSERT with a bulk-UPDATE statement. Our hypertable has 80 billion rows; at least a portion of this could be tested.
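A minimal repro along those lines might look like this (hypothetical table/column names matching the sketches above; a tiny data volume rather than 80 billion rows):

```sql
-- Hypothetical hypertable with one-day chunks and the unique index
-- shape from the report.
CREATE TABLE sensor_data (
    dis              INTEGER          NOT NULL,
    enotni_ident_mm  TEXT             NOT NULL,
    casovna_znacka   TIMESTAMPTZ      NOT NULL,
    value            DOUBLE PRECISION
);
SELECT create_hypertable('sensor_data', 'casovna_znacka',
                         chunk_time_interval => INTERVAL '1 day');
CREATE UNIQUE INDEX ON sensor_data (dis, enotni_ident_mm, casovna_znacka DESC);

-- Seed several days of data (several chunks).
INSERT INTO sensor_data
SELECT 1, 'mm-001', t, random()
FROM generate_series(TIMESTAMPTZ '2024-03-01', TIMESTAMPTZ '2024-03-10',
                     INTERVAL '1 minute') AS t;

-- Bulk-UPSERT: EXPLAIN should show only the one matching chunk.
EXPLAIN INSERT INTO sensor_data VALUES
  (1, 'mm-001', TIMESTAMPTZ '2024-03-05 10:00+00', 1.0)
ON CONFLICT (dis, enotni_ident_mm, casovna_znacka)
DO UPDATE SET value = EXCLUDED.value;

-- Bulk-UPDATE via VALUES join: EXPLAIN should show all chunks.
EXPLAIN UPDATE sensor_data AS t SET value = v.value
FROM (VALUES (1, 'mm-001', TIMESTAMPTZ '2024-03-05 10:00+00', 2.0))
     AS v (dis, enotni_ident_mm, casovna_znacka, value)
WHERE t.dis = v.dis AND t.enotni_ident_mm = v.enotni_ident_mm
  AND t.casovna_znacka = v.casovna_znacka;
```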