Fix ERROR when acquiring a tuple lock on OSM chunks on replica #7035
Conversation
Codecov Report
Attention: Patch coverage is

@@            Coverage Diff             @@
##             main    #7035      +/-   ##
==========================================
+ Coverage   80.06%   81.74%    +1.67%
==========================================
  Files         190      199        +9
  Lines       37181    37011      -170
  Branches     9450     9676      +226
==========================================
+ Hits        29770    30253      +483
+ Misses       2997     2859      -138
+ Partials     4414     3899      -515

☔ View full report in Codecov by Sentry.
The commit description is not very clear. It does fix an error, but a better description would be a functional one about what the pull request actually does, something along the lines of "Do not lock tuples in OSM chunks on replica servers".
It would also be helpful to add comments describing why, so that your future self knows why you added it.
src/hypertable.c (outdated)

@@ -2560,6 +2563,7 @@ ts_hypertable_osm_range_update(PG_FUNCTION_ARGS)
	bool overlap = false, range_invalid = false;

	/* Lock tuple FOR UPDATE */
	Assert(!RecoveryInProgress());
The assert cannot be used as a check in production deployments and is no guarantee that it will never be reached.
What happens if we run this code on a secondary that is in recovery? Will that cause a production issue?
If so, we should have a check in place so that the tuple is not locked when the server is in recovery, but locked otherwise.
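For illustration only, here is a minimal sketch of the kind of guard being described, built around PostgreSQL's `RecoveryInProgress()` (declared in `access/xlog.h`). The `OptionalTupLock` struct and `scan_slice` callback are hypothetical stand-ins, not the actual TimescaleDB scanner API:

```c
#include "postgres.h"
#include "access/xlog.h"        /* RecoveryInProgress() */
#include "nodes/lockoptions.h"  /* LockTupleMode, LockWaitPolicy, LockWaitBlock */

/* Hypothetical stand-in for the scanner's tuple-lock request. */
typedef struct OptionalTupLock
{
	LockTupleMode lockmode;
	LockWaitPolicy waitpolicy;
} OptionalTupLock;

/*
 * Hypothetical helper: request the tuple lock only on a primary.  A hot
 * standby cannot take tuple locks (they need a transaction id, which
 * cannot be assigned during recovery), so it scans without one.
 */
static void
scan_slice_with_optional_lock(void (*scan_slice)(const OptionalTupLock *),
							  LockTupleMode lockmode)
{
	OptionalTupLock tuplock = {
		.lockmode = lockmode,
		.waitpolicy = LockWaitBlock,
	};

	scan_slice(RecoveryInProgress() ? NULL : &tuplock);
}
```

In the actual patch the decision ends up inside `ts_chunk_get_osm_slice_and_lock`, as the later hunk in this conversation shows.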
Thanks for the review! I changed the commit message and added a comment.

> The assert cannot be used as a check in production deployments and is no guarantee that it will never be reached.

It wasn't meant as a runtime check, but rather as documentation for myself and other devs (production images are not compiled with asserts enabled, right?). I added a comment there for clarity. This function would not normally be reached from the OSM code: on a replica it would fail before this call. But even when called directly from psql, it would not cause a crash; it would fail with the same error as in the original issue, `ERROR: cannot assign TransactionIds during recovery`, when trying to update the catalog.

I can remove the assert altogether, though. I only added it because this function calls `ts_chunk_get_osm_slice_and_lock` and that should not happen in recovery mode, so that we could catch it early if it happens.
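As background (and as a sketch rather than the exact PostgreSQL definition): `Assert` expands to nothing unless the server is built with assertion checking enabled, which is why it works as documentation rather than a production safeguard. A simplified illustration of that idea follows; the real macro lives in PostgreSQL's `c.h` and reports failures through `ExceptionalCondition()` instead of `abort()`:

```c
/*
 * Simplified sketch of how an Assert-style macro disappears in builds
 * without assertion checking.  Not the exact PostgreSQL definition.
 */
#include <stdlib.h>

#ifdef USE_ASSERT_CHECKING
#define MyAssert(condition) \
	do { \
		if (!(condition)) \
			abort(); /* assert-enabled build: stop immediately */ \
	} while (0)
#else
#define MyAssert(condition) ((void) 0) /* production build: compiled out */
#endif
```

So in a non-assert (production) build the `Assert(!RecoveryInProgress())` line compiles away entirely, and it is the catalog update that raises the error quoted above.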
> Thanks for the review! I changed the commit message and added a comment.
>
> It wasn't meant as a runtime check, but rather as documentation for myself and other devs (production images are not compiled with asserts enabled, right?). I added a comment there for clarity. This function would not normally be reached from the OSM code: on a replica it would fail before this call. But even when called directly from psql, it would not cause a crash; it would fail with the same error as in the original issue, `ERROR: cannot assign TransactionIds during recovery`, when trying to update the catalog.

Thank you for explaining it.

> I can remove the assert altogether, though. I only added it because this function calls `ts_chunk_get_osm_slice_and_lock` and that should not happen in recovery mode, so that we could catch it early if it happens.

No, keeping asserts is good for documentation purposes, declaring intentions, and helping with debugging, as you mentioned above. I was just unsure whether this meant that the code after the assert would execute in production builds, causing havoc. If that cannot happen, a comment on why it cannot happen and a test (if possible) are good enough to ensure that we do not try to execute bad code in production builds.
Force-pushed 97ca75d to 35380e7
src/hypertable.c (outdated)

	/* this function is not meant to be run on a read-only secondary */
	Assert(!RecoveryInProgress());
For your future self, could you extend the comment to describe why we can't reach this point in release builds?
Thanks! Will do
Force-pushed 35380e7 to b9566b6
@@ -2461,14 +2461,23 @@ ts_chunk_get_osm_slice_and_lock(int32 osm_chunk_id, int32 time_dim_id, LockTuple
		.lockmode = tuplockmode,
		.waitpolicy = LockWaitBlock,
	};
	/*
	 * We cannot acquire a tuple lock when running in recovery mode
Suggested change:
-	 * We cannot acquire a tuple lock when running in recovery mode
+	 * It is not possible to acquire a tuple lock when running in recovery mode, as tuple locks require a transaction id and transaction ids cannot be created in recovery mode, so we only acquire the tuple lock if we are not in recovery mode.
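Putting the hunk and the suggested comment together, the guarded lock set-up might look roughly like the sketch below. This is a reconstruction for illustration, not the exact merged diff: `tuplockmode` and `LockWaitBlock` appear in the hunk above, `RecoveryInProgress()` is core PostgreSQL, and the struct type name `ScanTupLock` as well as the `tuplock_ptr` variable are assumed here; the surrounding scanner call is omitted.

```c
	/*
	 * Tuple locks require assigning a transaction id, and transaction ids
	 * cannot be created while the server is in recovery, so only request
	 * the tuple lock when we are not running on a read-only secondary.
	 */
	const ScanTupLock tuplock = {
		.lockmode = tuplockmode,
		.waitpolicy = LockWaitBlock,
	};
	const ScanTupLock *tuplock_ptr = RecoveryInProgress() ? NULL : &tuplock;
```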
Added some more detail to the comment to explain why tuple locks cannot be acquired in recovery mode.
Added a clarification
The planner attempts to acquire a `LockTupleKeyShare` tuple lock on an OSM chunk's dimension slice, which results in an error on a replica. This patch removes the tuple lock when in recovery mode.
Force-pushed b9566b6 to 673b7d4
For PRs timescale#7088 and timescale#7035: some more edits for docs compliance.
This release contains bug fixes since the 2.15.2 release. Best practice is to upgrade at the next available opportunity.

**Migrating from self-hosted TimescaleDB v2.14.x and earlier**

After you run `ALTER EXTENSION`, you must run [this SQL script](https://github.com/timescale/timescaledb-extras/blob/master/utils/2.15.X-fix_hypertable_foreign_keys.sql). For more details, see the following pull request [#6797](#6797). If you are migrating from TimescaleDB v2.15.0, v2.15.1 or v2.15.2, no changes are required.

**Bugfixes**
* #7061: Fix the handling of multiple unique indexes in a compressed INSERT.
* #7080: Fix the `corresponding equivalence member not found` error.
* #7088: Fix the leaks in the DML functions.
* #7035: Fix the error when acquiring a tuple lock on the OSM chunks on the replica.
* #7091: Fix ORDER BY/GROUP BY expression not found in targetlist on PG16.

**Thanks**
* @Kazmirchuk for reporting the issue about leaks with the functions in DML.

---------

Signed-off-by: Sven Klemm <31455525+svenklemm@users.noreply.github.com>
Co-authored-by: Sven Klemm <31455525+svenklemm@users.noreply.github.com>
The planner attempts to acquire a `LockTupleKeyShare` tuple lock on an OSM chunk's dimension slice, which results in an error on a replica. This patch fixes the issue by avoiding tuple locking when in recovery mode.
Fix for timescale/Support-Dev-Collab#1847
Disable-check: force-changelog-file