
Fix segfault after second ANALYZE #5054

Merged: 1 commit merged into main from analyze_crash on Dec 12, 2022

Conversation

sb230132 (Contributor) commented Dec 6, 2022

Issue occurs in extended query protocol mode only, where every query goes through a PREPARE and EXECUTE phase. The first time ANALYZE is executed, a list of relations to be vacuumed is extracted and saved in a list. This list is referenced in the parsetree node. Once execution of ANALYZE is complete, this list is cleaned up; however, the reference to it in the parsetree node is not. When ANALYZE is executed a second time, a segfault happens as we access an invalid memory location.

Fixed the issue by restoring the original value in the parsetree node once ANALYZE completes its execution.

Fixes #4857
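
For readers following along, here is a minimal sketch of the save/restore pattern this PR applies. The function and variable names (`process_vacuum_sketch`, `saved_rels`, `chunk_rels`) are illustrative assumptions, not the exact TimescaleDB code:

```c
#include "postgres.h"
#include "commands/vacuum.h"
#include "nodes/parsenodes.h"
#include "nodes/pg_list.h"
#include "parser/parse_node.h"

static void
process_vacuum_sketch(ParseState *pstate, VacuumStmt *stmt, bool is_top_level)
{
	/* Save the relation list owned by the (possibly cached) parse tree. */
	List *saved_rels = list_copy(stmt->rels);
	List *chunk_rels = NIL;

	/* ... expand hypertables into their chunks, filling chunk_rels ... */

	/* Run the command on the expanded list. */
	stmt->rels = list_concat(chunk_rels, stmt->rels);
	ExecVacuum(pstate, stmt, is_top_level);

	/*
	 * Restore the saved list before returning: the expanded list goes away
	 * with this execution's memory, but the parse tree may be cached and
	 * re-executed (extended query protocol), so it must not be left
	 * holding a dangling pointer.
	 */
	stmt->rels = saved_rels;
}
```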

@sb230132 sb230132 requested a review from jnidzwetzki December 6, 2022 09:24
@sb230132 sb230132 self-assigned this Dec 6, 2022
@sb230132 sb230132 marked this pull request as ready for review December 6, 2022 09:30
@sb230132 sb230132 requested a review from mkindahl December 6, 2022 09:31
codecov bot commented Dec 6, 2022

Codecov Report

Merging #5054 (0910f45) into main (d927390) will decrease coverage by 0.01%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##             main    #5054      +/-   ##
==========================================
- Coverage   89.61%   89.61%   -0.01%     
==========================================
  Files         227      227              
  Lines       51608    51609       +1     
==========================================
- Hits        46251    46247       -4     
- Misses       5357     5362       +5     
Impacted Files             Coverage Δ
src/process_utility.c      90.20% <100.00%> (+<0.01%) ⬆️
src/loader/bgw_launcher.c  89.51% <0.00%> (-2.55%) ⬇️
src/bgw/job_stat.c         91.87% <0.00%> (-0.32%) ⬇️
tsl/src/bgw_policy/job.c   88.31% <0.00%> (-0.05%) ⬇️
src/bgw/scheduler.c        85.16% <0.00%> (+1.27%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d927390...0910f45.

VacuumRelation list is freed up; however, VacuumStmt is not
cleaned up, which causes a crash.
*/
stmt->rels = list_copy(stmt_rels);
jnidzwetzki (Contributor) commented Dec 6, 2022

I am curious: do you know in which places stmt->rels is changed? Is there any change besides the one in line 963 (stmt->rels = list_concat(ctx.chunk_rels, vacuum_rels);)?

Maybe we could also fix the problem by creating a new list there and using it for the comparison a few lines later. Then we would save the copy-and-restore operation, and stmt->rels would always contain valid entries.

sb230132 (Contributor, Author) replied

You are right, we can fix it either way.
I tried the same approach initially: I created a new list instead of modifying stmt->rels, but I could not pass this new list to ExecVacuum(), as that API expects a VacuumStmt as its second parameter. That in turn would require creating a new VacuumStmt object and setting the new list inside it.

Thus I took the alternative approach of saving the original list and restoring the saved list afterwards.

@sb230132 sb230132 requested a review from jnidzwetzki December 6, 2022 10:11
fabriziomello (Contributor) commented

@sb230132 isn't this issue similar to 74ca546?

sb230132 (Contributor, Author) commented Dec 7, 2022

@fabriziomello The issue you pointed out is the same as this one. I did try to allocate a new memory context for the stmt rel list and free it up; however, at the end of process_vacuum() I need to restore the stmt rel list, since changes to this data structure should not be visible to the caller of this function.
Something similar is done in src/backend/commands/vacuum.c:vacuum(): the rel list passed to vacuum() is effectively passed by value, so any changes made inside vacuum() are lost once the call returns.

Thanks for pointing to this issue. It was very informative.

@mkindahl mkindahl removed their request for review December 7, 2022 13:06
fabriziomello (Contributor) commented

> @fabriziomello The issue you pointed out is the same as this one. I did try to allocate a new memory context for the stmt rel list and free it up; however, at the end of process_vacuum() I need to restore the stmt rel list, since changes to this data structure should not be visible to the caller of this function. Something similar is done in src/backend/commands/vacuum.c:vacuum(): the rel list passed to vacuum() is effectively passed by value, so any changes made inside vacuum() are lost once the call returns.

Didn't get your point. Where exactly does Postgres do it in src/backend/commands/vacuum.c:vacuum()? The problem here is subsequent executions of a utility command: the first execution is prepared/cached (extended protocol and plpgsql do this), but the changed relation list is allocated in another memory context, so the subsequent execution will segfault due to a memory violation. Am I missing something here?

sb230132 (Contributor, Author) commented Dec 7, 2022

> > @fabriziomello The issue you pointed out is the same as this one. […]
>
> Didn't get your point. Where exactly does Postgres do it in src/backend/commands/vacuum.c:vacuum()? The problem here is subsequent executions of a utility command: the first execution is prepared/cached (extended protocol and plpgsql do this), but the changed relation list is allocated in another memory context, so the subsequent execution will segfault due to a memory violation. Am I missing something here?

You are right.
Let me give more description.
Check the function below:

void
ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)

In this function there is a call to

vacuum(vacstmt->rels, &params, NULL, isTopLevel);

vacstmt->rels holds the list of relations which need to be vacuumed.
For example:
ANALYZE t1,t2,t3;
vacstmt->rels will have 3 entries, representing t1, t2 and t3.

ANALYZE;
In this case vacstmt->rels has 0 entries.
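
For reference, here is a paraphrased sketch of how vacuum() in src/backend/commands/vacuum.c handles that list (abridged from PG 14-era sources; details vary by version). The key point is that it builds a new local list and reassigns only its local relations parameter, so the caller's VacuumStmt is never mutated:

```c
/* Abridged sketch of vacuum(), src/backend/commands/vacuum.c (PG 14-era). */
void
vacuum(List *relations, VacuumParams *params,
	   BufferAccessStrategy bstrategy, bool isTopLevel)
{
	/* ... setup elided ... */

	/* Build the list of relations to process, unless the caller gave one. */
	if (relations != NIL)
	{
		List	   *newrels = NIL;
		ListCell   *lc;

		foreach(lc, relations)
		{
			VacuumRelation *vrel = lfirst_node(VacuumRelation, lc);
			List	   *sublist = expand_vacuum_rel(vrel, params->options);
			MemoryContext old_context = MemoryContextSwitchTo(vac_context);

			newrels = list_concat(newrels, sublist);
			MemoryContextSwitchTo(old_context);
		}
		/* Only the local parameter changes; vacstmt->rels is untouched. */
		relations = newrels;
	}
	else
		relations = get_all_vacuum_rels(params->options);

	/* ... vacuum/analyze each entry in relations ... */
}
```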

When the timescaledb extension is installed:

The first time ANALYZE is executed:

  1. In src/process_utility.c::process_vacuum(), stmt->rels is filled with the list of relations which need to be vacuumed.
  2. This list is then passed on to ExecVacuum().
  3. Once process_vacuum() finishes execution, the list which holds all relations to be vacuumed is freed.

The second time ANALYZE is executed:

  1. In src/process_utility.c::process_vacuum(), stmt->rels still holds the same list, which has already been deallocated.
  2. That list is accessed in ExecVacuum(), and thus causes a crash.
  3. Thus we need to reset stmt->rels every time at the end of process_vacuum() execution (see the standalone sketch below).
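
To make this lifecycle concrete, here is a standalone illustration in plain C (not TimescaleDB code; the struct and function merely stand in for VacuumStmt and process_vacuum()):

```c
#include <stdio.h>
#include <stdlib.h>

struct vacuum_stmt
{
	int   *rels;				/* stands in for VacuumStmt->rels */
	size_t nrels;
};

static void
process_vacuum(struct vacuum_stmt *stmt)
{
	/* On the second run, this reads memory freed by the first run. */
	for (size_t i = 0; i < stmt->nrels; i++)
		printf("analyzing rel %d\n", stmt->rels[i]);

	/* Build the expanded per-execution relation list. */
	int *expanded = malloc(3 * sizeof(*expanded));
	for (int i = 0; i < 3; i++)
		expanded[i] = i + 1;
	stmt->rels = expanded;		/* the cached statement now points at it */
	stmt->nrels = 3;

	free(expanded);				/* freed, but stmt->rels still points here */
}

int
main(void)
{
	struct vacuum_stmt cached = {NULL, 0};	/* cached plan, reused across runs */

	process_vacuum(&cached);	/* first ANALYZE: fine */
	process_vacuum(&cached);	/* second ANALYZE: use-after-free */
	return 0;
}
```

The fix in this PR corresponds to restoring the original rels pointer (and count) before process_vacuum() returns, so the cached statement never holds a dangling reference.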

src/process_utility.c: 3 review threads (outdated, resolved)
jnidzwetzki (Contributor) left a review

One comment, rest looks good to me.

CHANGELOG.md: 1 review thread (outdated, resolved)

Issue occurs in extended query protocol mode only, where every
query goes through a PREPARE and EXECUTE phase. The first time
ANALYZE is executed, a list of relations to be vacuumed is
extracted and saved in a list. This list is referenced in the
parsetree node. Once execution of ANALYZE is complete, this list
is cleaned up; however, the reference to it in the parsetree node
is not. When ANALYZE is executed a second time, a segfault happens
as we access an invalid memory location.

Fixed the issue by restoring the original value in the parsetree
node once ANALYZE completes its execution.

Fixes timescale#4857
@sb230132 sb230132 enabled auto-merge (rebase) December 12, 2022 11:09
@sb230132 sb230132 merged commit dd65a6b into timescale:main Dec 12, 2022
svenklemm pushed a commit that referenced this pull request Dec 15, 2022
This release adds major new features since the 2.8.1 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:
* Hierarchical Continuous Aggregates (aka Continuous Aggregate on top of another Continuous Aggregate)
* Improve `time_bucket_gapfill` function to allow specifying the timezone to bucket
* Introduce fixed schedules for background jobs and the ability to check job errors.
* Use `alter_data_node()` to change the data node configuration. This function introduces the option to configure the availability of the data node.

This release also includes several bug fixes.

**Features**
* #4476 Batch rows on access node for distributed COPY
* #4567 Exponentially backoff when out of background workers
* #4650 Show warnings when not following best practices
* #4664 Introduce fixed schedules for background jobs
* #4668 Hierarchical Continuous Aggregates
* #4670 Add timezone support to time_bucket_gapfill
* #4678 Add interface for troubleshooting job failures
* #4718 Add ability to merge chunks while compressing
* #4786 Extend the now() optimization to also apply to CURRENT_TIMESTAMP
* #4820 Support parameterized data node scans in joins
* #4830 Add function to change configuration of a data node
* #4966 Handle DML activity when datanode is not available
* #4971 Add function to drop stale chunks on a datanode

**Bugfixes**
* #4663 Don't error when compression metadata is missing
* #4673 Fix now() constification for VIEWs
* #4681 Fix compression_chunk_size primary key
* #4696 Report warning when enabling compression on hypertable
* #4745 Fix FK constraint violation error while insert into hypertable which references partitioned table
* #4756 Improve compression job IO performance
* #4770 Continue compressing other chunks after an error
* #4794 Fix degraded performance seen on timescaledb_internal.hypertable_local_size() function
* #4807 Fix segmentation fault during INSERT into compressed hypertable
* #4822 Fix missing segmentby compression option in CAGGs
* #4823 Fix a crash that could occur when using nested user-defined functions with hypertables
* #4840 Fix performance regressions in the copy code
* #4860 Block multi-statement DDL command in one query
* #4898 Fix cagg migration failure when trying to resume
* #4904 Remove BitmapScan support in DecompressChunk
* #4906 Fix a performance regression in the query planner by speeding up frozen chunk state checks
* #4910 Fix a typo in process_compressed_data_out
* #4918 Cagg migration orphans cagg policy
* #4941 Restrict usage of the old format (pre 2.7) of continuous aggregates in PostgreSQL 15.
* #4955 Fix cagg migration for hypertables using timestamp without timezone
* #4968 Check for interrupts in gapfill main loop
* #4988 Fix cagg migration crash when refreshing the newly created cagg
* #5054 Fix segfault after second ANALYZE
* #5086 Reset baserel cache on invalid hypertable cache

**Thanks**
* @byazici for reporting a problem with segmentby on compressed caggs
* @jflambert for reporting a crash with nested user-defined functions.
* @jvanns for reporting hypertable FK reference to vanilla PostgreSQL partitioned table doesn't seem to work
* @kou for fixing a typo in process_compressed_data_out
* @kyrias for reporting a crash when ANALYZE is executed on extended query protocol mode with extension loaded.
* @tobiasdirksen for requesting Continuous aggregate on top of another continuous aggregate
* @Xima for reporting a bug in Cagg migration
* @xvaara for helping reproduce a bug with bitmap scans in transparent decompression
@sb230132 sb230132 deleted the analyze_crash branch December 20, 2022 10:06
Successfully merging this pull request may close these issues:

[Bug]: Segfault after second ANALYZE with PG 14.5/TimescaleDB 2.8.1