Exponentially backoff when out of background workers #4567
Conversation
Force-pushed from de6cd06 to 6877d50
Codecov Report

```
@@            Coverage Diff             @@
##             main    #4567      +/-   ##
==========================================
- Coverage   90.81%   90.81%   -0.01%
==========================================
  Files         224      224
  Lines       42272    42298      +26
==========================================
+ Hits        38391    38412      +21
- Misses       3881     3886       +5
```

Continue to review the full report at Codecov.
Force-pushed from bae852d to f6a7043
In general, this looks good, but it would be good if you could add an outline to the commit message describing how:
- Jobs that fail while executing are retried
- Jobs that fail to start the background worker are retried (IIRC, this is different from the one above)
- Jobs that succeed are retried
Force-pushed from add0155 to 6983392
IIRC we had exponential backoff initially and then switched to additive backoff. The problem is with jobs that have fairly long schedules. Say a job is scheduled to run once a day: with exponential backoff, you will quickly schedule a failed job too far into the future. A better way might be to add a hard upper bound on the next start time after a failure, i.e. if the job failed:

```
new_start_time = last_failure_time + schedule_interval (with some backoff)
ub_start_time  = last_failure_time + '1 h'
if ub_start_time < new_start_time:
    new_start_time = ub_start_time
```

Also make sure to test for and catch potential overflow with exponential backoffs.
Thank you so much for reviewing! Currently there is an upper bound, but it is probably too high (10 times the schedule interval; for jobs that fail to launch due to a background worker shortage, the upper bound is 1 minute). I will change it to something fixed and smaller, '1 h' as you suggest.
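For illustration, here is a minimal sketch of that clamping in plain C with int64 microsecond timestamps; the helper name `next_start_after_failure` and the fixed `USECS_PER_HOUR` bound are hypothetical stand-ins, not the actual TimescaleDB identifiers:

```c
#include <stdint.h>

#define USECS_PER_HOUR ((int64_t) 3600 * 1000000)

/* Hypothetical helper: clamp the next start of a failed job to a fixed
 * '1 h' upper bound past the last failure, per the review suggestion. */
static int64_t
next_start_after_failure(int64_t last_failure_time, int64_t schedule_interval,
						 int64_t backoff)
{
	int64_t new_start = last_failure_time + schedule_interval + backoff;
	int64_t upper_bound = last_failure_time + USECS_PER_HOUR;

	/* never push a failed job past the fixed upper bound */
	return new_start < upper_bound ? new_start : upper_bound;
}
```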
Force-pushed from b02e65e to faed7cc
Force-pushed from 132023a to ee27e02
Force-pushed from ee27e02 to e2ed742
src/bgw/job_stat.c (Outdated)

```c
else
{
	// retry every 2 seconds
	ival = IntervalPGetDatum(&retry_ival);
}
```
Suggested change:

```c
else
	// retry every 2 seconds
	ival = IntervalPGetDatum(&retry_ival);
```
nit
src/bgw/job_stat.c (Outdated)

```c
else
{
	ival_max = IntervalPGetDatum(&interval_max);
}
```
Suggested change:

```c
else
	ival_max = IntervalPGetDatum(&interval_max);
```
nit
src/bgw/job_stat.c (Outdated)

```c
/* arbitrarily choose 2 for exponential growth */
for (i = 0; i < exponent - 1; i++)
{
	ival = DirectFunctionCall2(interval_mul, ival, Float8GetDatum(2));
```
Exponential backoff works when the scheduled time is picked randomly between `now()` and `now() + ival` (regardless of jitter). If we always pick `now() + ival`, then we risk livelocking the conflicting jobs ad infinitum.

This is orthogonal to jitter, which helps with scheduling at non-integer multiples of time units, so as to not generate high load spikes in the system (e.g. due to everything happening at integer seconds instead of being spread out).
You are right, we are risking that. So for the first case (out of background workers) I will change the backoff calculation to randomly pick a backoff value in [0, 2^f - 1], where f is the number of consecutive failed launches of the job.
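A hedged sketch of that randomized backoff in plain C; `launch_retry_wait_usecs` and its constants are illustrative, not the code that landed in src/bgw/job_stat.c:

```c
#include <stdint.h>
#include <stdlib.h>

#define USECS_PER_SEC ((int64_t) 1000000)
#define MAX_BACKOFF_USECS (60 * USECS_PER_SEC) /* 1 minute cap */

/* Hypothetical helper: a 2 s base retry plus a random backoff drawn
 * uniformly from [0, 2^f) seconds at microsecond granularity, where f
 * is the number of consecutive failed launches. */
static int64_t
launch_retry_wait_usecs(int consecutive_failed_launches)
{
	/* cap the exponent so the shift cannot overflow; 2^6 s already
	 * exceeds the 1 minute cap anyway */
	int f = consecutive_failed_launches < 6 ? consecutive_failed_launches : 6;
	int64_t max_slots = (int64_t) 1 << f;
	int64_t rand_backoff = random() % (max_slots * USECS_PER_SEC);
	int64_t wait = 2 * USECS_PER_SEC + rand_backoff;

	return wait < MAX_BACKOFF_USECS ? wait : MAX_BACKOFF_USECS;
}
```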
Force-pushed from 1c5a38f to 2cf5891
src/bgw/job_stat.c (Outdated)

```c
int64 micros_per_minute = 1000000;
// will get a random int in [0, (2^f - 1) * 1000]
// this represents a random amount of microseconds to backoff
int64 rand_backoff = random() % (max_slots * micros_per_minute);
```
Suggested change:

```c
int64 rand_backoff = random() % (max_slots * USECS_PER_SEC);
```

I don't see anything to do with minutes here.
Indeed
Force-pushed from 669792d to 07b074f
Force-pushed from 07b074f to 8053f1b
src/bgw/job_stat.c
Outdated
Interval interval_max = { .time = 60000000 }; | ||
Interval retry_ival = { .time = 2000000 }; | ||
retry_ival.time += rand_backoff; | ||
if (!launch_failure) |
Suggested change:

```c
if (launch_failure)
```

I think?
No, that should be the logic there. A launch failure is when the job couldn't be started because of a lack of background workers, and in that case we use the random backoff in seconds calculated above.
In the other cases (the job threw a runtime error) we have a retry period of 5 minutes, so the backoff grows like 5, 10, 20 minutes until it reaches the schedule interval, to take the retry period into account. Unless you meant to put the launch_failure case first, which would be more readable, so I'll do that.
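A sketch of the two regimes described above, reusing the hypothetical `launch_retry_wait_usecs` helper and `USECS_PER_SEC` constant from the earlier sketch; the doubling schedule and cap follow this description, not the exact shipped code:

```c
#include <stdbool.h>
#include <stdint.h>

static int64_t
retry_wait_usecs(bool launch_failure, int consecutive_failures,
				 int64_t schedule_interval_usecs)
{
	/* out of background workers: seconds-scale randomized backoff */
	if (launch_failure)
		return launch_retry_wait_usecs(consecutive_failures);

	/* runtime error: 5, 10, 20, ... minute retry period, capped at the
	 * schedule interval */
	int64_t wait = 5 * 60 * USECS_PER_SEC;
	for (int i = 1; i < consecutive_failures && wait < schedule_interval_usecs; i++)
		wait *= 2;

	return wait < schedule_interval_usecs ? wait : schedule_interval_usecs;
}
```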
Force-pushed from 8053f1b to b122b46
Force-pushed from 3ef6794 to 68f49eb
Force-pushed from 68f49eb to 5d22571
src/bgw/scheduler.c (Outdated)

```c
// if sjob->consecutive_failed_launches > 0, give the system some time to breathe,
// do not attempt to immediately re-run the job.
```
This comment is inside ts_bgw_job_stat_next_start(), maybe remove?
Force-pushed from 5d22571 to 56928c5
This release adds major new features since the 2.8.1 release. We deem it moderate priority for upgrading.

This release includes these noteworthy features:
* Hierarchical Continuous Aggregates (aka Continuous Aggregate on top of another Continuous Aggregate)
* Improve `time_bucket_gapfill` function to allow specifying the timezone to bucket
* Introduce fixed schedules for background jobs and the ability to check job errors.
* Use `alter_data_node()` to change the data node configuration. This function introduces the option to configure the availability of the data node.

This release also includes several bug fixes.

**Features**
* #4476 Batch rows on access node for distributed COPY
* #4567 Exponentially backoff when out of background workers
* #4650 Show warnings when not following best practices
* #4664 Introduce fixed schedules for background jobs
* #4668 Hierarchical Continuous Aggregates
* #4670 Add timezone support to time_bucket_gapfill
* #4678 Add interface for troubleshooting job failures
* #4718 Add ability to merge chunks while compressing
* #4786 Extend the now() optimization to also apply to CURRENT_TIMESTAMP
* #4820 Support parameterized data node scans in joins
* #4830 Add function to change configuration of data nodes
* #4966 Handle DML activity when datanode is not available
* #4971 Add function to drop stale chunks on a datanode

**Bugfixes**
* #4663 Don't error when compression metadata is missing
* #4673 Fix now() constification for VIEWs
* #4681 Fix compression_chunk_size primary key
* #4696 Report warning when enabling compression on hypertable
* #4745 Fix FK constraint violation error on insert into a hypertable that references a partitioned table
* #4756 Improve compression job IO performance
* #4770 Continue compressing other chunks after an error
* #4794 Fix degraded performance seen on timescaledb_internal.hypertable_local_size() function
* #4807 Fix segmentation fault during INSERT into compressed hypertable
* #4822 Fix missing segmentby compression option in CAGGs
* #4823 Fix a crash that could occur when using nested user-defined functions with hypertables
* #4840 Fix performance regressions in the copy code
* #4860 Block multi-statement DDL command in one query
* #4898 Fix cagg migration failure when trying to resume
* #4904 Remove BitmapScan support in DecompressChunk
* #4906 Fix a performance regression in the query planner by speeding up frozen chunk state checks
* #4910 Fix a typo in process_compressed_data_out
* #4918 Cagg migration orphans cagg policy
* #4941 Restrict usage of the old format (pre 2.7) of continuous aggregates in PostgreSQL 15
* #4955 Fix cagg migration for hypertables using timestamp without timezone
* #4968 Check for interrupts in gapfill main loop
* #4988 Fix cagg migration crash when refreshing the newly created cagg
* #5054 Fix segfault after second ANALYZE
* #5086 Reset baserel cache on invalid hypertable cache

**Thanks**
* @byazici for reporting a problem with segmentby on compressed caggs
* @jflambert for reporting a crash with nested user-defined functions
* @jvanns for reporting that a hypertable FK reference to a vanilla PostgreSQL partitioned table doesn't seem to work
* @kou for fixing a typo in process_compressed_data_out
* @kyrias for reporting a crash when ANALYZE is executed in extended query protocol mode with the extension loaded
* @tobiasdirksen for requesting Continuous aggregate on top of another continuous aggregate
* @Xima for reporting a bug in Cagg migration
* @xvaara for helping reproduce a bug with bitmap scans in transparent decompression
The scheduler detects the following three types of job failures:
1. Jobs that fail to launch (due to a shortage of background workers)
2. Jobs that throw a runtime error
3. Jobs that crash due to a process crashing

In cases 2 and 3, additive backoff is applied when calculating the next start time of a failed job. In case 1, we previously retried launching all jobs that had failed to launch simultaneously.

This commit introduces exponential backoff in case 1, randomly selecting a wait time in [2, 2 + 2^f] seconds at microsecond granularity, where f is the number of consecutive failed launches. The aim is to reduce the collision probability for jobs that compete for a background worker. The maximum backoff value is 1 minute. The behavior in cases 2 and 3 is unchanged.

Fixes #4562
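As a quick worked example of the [2, 2 + 2^f] range from the commit message, this standalone program prints the wait interval for the first few consecutive failed launches (the 1 minute cap would first matter at f = 6):

```c
#include <stdio.h>

int
main(void)
{
	/* after f consecutive failed launches, the wait falls in
	 * [2, 2 + 2^f] seconds (capped at 60 s, which first applies at f = 6) */
	for (int f = 1; f <= 5; f++)
		printf("f=%d: wait in [2, %d] s\n", f, 2 + (1 << f));
	return 0;
}
```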