
Exponentially backoff when out of background workers #4567

Merged: 1 commit, merged Sep 1, 2022


@konskov (Contributor) commented Aug 2, 2022

The scheduler detects the following three types of job failures:

1. Jobs that fail to launch (due to a shortage of background workers)
2. Jobs that throw a runtime error
3. Jobs that crash because their process crashes

In cases 2 and 3, additive backoff is applied when calculating the next
start time of a failed job.
In case 1, all jobs that had failed to launch were previously retried
at the same time.

This commit introduces exponential backoff for case 1: the wait time
is selected randomly from [2, 2 + 2^f] seconds at microsecond
granularity, where f is the number of consecutive failed launches of
the job. The aim is to reduce the probability of collisions between
jobs that compete for a background worker. The maximum backoff value
is 1 minute. It does not change the behavior for cases 2 and 3.

Fixes #4562
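The case-1 scheme above can be sketched in C as follows. This is a minimal sketch, not the TimescaleDB implementation: it assumes f is the number of consecutive failed launches, that the 1-minute maximum caps the whole delay, and it uses the C library's `rand()` in place of whatever random source the scheduler really uses. `launch_retry_delay_usecs` and `random_upto` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define USECS_PER_SEC INT64_C(1000000)
#define BASE_DELAY_USECS (2 * USECS_PER_SEC) /* fixed 2-second floor */
#define MAX_DELAY_USECS (60 * USECS_PER_SEC) /* assumed 1-minute cap on the total delay */

/* Pseudo-random value in [0, bound]; resolution is limited by RAND_MAX. */
static int64_t random_upto(int64_t bound)
{
    double u = rand() / ((double) RAND_MAX + 1.0); /* u is in [0, 1) */
    return (int64_t) (u * (double) (bound + 1));
}

/* Delay in microseconds before re-launching a job whose launch has failed
 * f times in a row: 2 s plus a random amount in [0, 2^f] s, capped at 60 s. */
static int64_t launch_retry_delay_usecs(int f)
{
    int64_t max_extra;
    int64_t delay;

    assert(f >= 0);
    /* Keep 2^f * USECS_PER_SEC far from int64 overflow; for f >= 30 the
     * capped delay is essentially always 60 s anyway. */
    if (f > 30)
        f = 30;
    max_extra = (INT64_C(1) << f) * USECS_PER_SEC;
    delay = BASE_DELAY_USECS + random_upto(max_extra);
    return delay < MAX_DELAY_USECS ? delay : MAX_DELAY_USECS;
}
```

Because each competing job draws its own delay, two jobs that failed to launch together are unlikely to retry at the same microsecond, and the range they draw from doubles with every further failed launch.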

@konskov force-pushed the exponential_backoff branch from de6cd06 to 6877d50 on August 2, 2022 08:43
@konskov self-assigned this Aug 2, 2022
@codecov (bot) commented Aug 2, 2022

Codecov Report

Merging #4567 (56928c5) into main (1fa8373) will decrease coverage by 0.00%.
The diff coverage is 96.87%.

```diff
@@            Coverage Diff             @@
##             main    #4567      +/-   ##
==========================================
- Coverage   90.81%   90.81%   -0.01%     
==========================================
  Files         224      224              
  Lines       42272    42298      +26     
==========================================
+ Hits        38391    38412      +21     
- Misses       3881     3886       +5     
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| tsl/src/nodes/decompress_chunk/decompress_chunk.c | 95.20% <80.00%> (-0.14%) | ⬇️ |
| src/bgw/job.c | 94.58% <100.00%> (ø) | |
| src/bgw/job_stat.c | 90.18% <100.00%> (+0.84%) | ⬆️ |
| src/bgw/scheduler.c | 83.62% <100.00%> (-0.16%) | ⬇️ |
| src/nodes/chunk_append/exec.c | 94.50% <100.00%> (ø) | |
| src/nodes/chunk_insert_state.c | 97.61% <100.00%> (+<0.01%) | ⬆️ |
| src/loader/bgw_message_queue.c | 85.52% <0.00%> (-2.64%) | ⬇️ |
| tsl/src/nodes/data_node_dispatch.c | 96.72% <0.00%> (+0.23%) | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ed212b4...56928c5.

@konskov force-pushed the exponential_backoff branch 2 times, most recently from bae852d to f6a7043 on August 4, 2022 07:13
@konskov marked this pull request as ready for review August 4, 2022 08:18

@mkindahl (Contributor) left a comment

In general, this looks good, but it would be good if you could add an outline in the commit message where you describe how:

  • Jobs that fail while executing are re-tried
  • Jobs that fail to start the background worker are re-tried (IIRC, this is different from the one above)
  • Jobs that succeed are re-tried

@konskov force-pushed the exponential_backoff branch 2 times, most recently from add0155 to 6983392 on August 8, 2022 06:28

@gayyappan (Contributor) left a comment

IIRC we had exponential backoff initially and then switched to additive backoff. The problem is with jobs that have fairly long schedules. Say the job is scheduled to run once a day. With exponential backoff, a failed job is quickly rescheduled too far into the future. A better way might be to add a hard upper bound on the next start time after a failure,
i.e. if the job failed:

```
new_start_time = last_failure_time + schedule_interval (with some backoff)
ub_start_time = last_failure_time + '1 h'
if ub_start_time < new_start_time
    new_start_time = ub_start_time
```

Also make sure to test for and catch potential overflow with exponential backoff.
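A minimal C sketch of the reviewer's upper bound combined with an overflow-checked exponential backoff. It assumes times are int64 microseconds (as PostgreSQL timestamps are) and that the backoff doubles from a caller-supplied base; `capped_exp_backoff` and `next_start_after_failure` are hypothetical names, not functions from the code base.

```c
#include <assert.h>
#include <stdint.h>

#define USECS_PER_SEC INT64_C(1000000)
#define USECS_PER_HOUR (3600 * USECS_PER_SEC)

/* base * 2^(failures - 1), never more than cap. Doubling stops once the
 * cap is reached and checks for int64 overflow, so huge failure counts
 * are safe. */
static int64_t capped_exp_backoff(int64_t base, int failures, int64_t cap)
{
    int64_t ival = base;

    assert(base > 0 && failures >= 1 && cap > 0);
    for (int i = 1; i < failures && ival < cap; i++)
    {
        if (ival > INT64_MAX / 2)
            return cap; /* 2 * ival would exceed INT64_MAX, hence also cap */
        ival *= 2;
    }
    return ival < cap ? ival : cap;
}

/* Clamping the backoff to 1 hour is equivalent to the reviewer's
 * "if ub_start_time < new_start_time" check and keeps the addition small. */
static int64_t next_start_after_failure(int64_t last_failure_time, int64_t base,
                                        int failures)
{
    return last_failure_time + capped_exp_backoff(base, failures, USECS_PER_HOUR);
}
```

With a 5-minute base the backoff goes 5, 10, 20, 40 minutes and then stays at 1 hour, however long the job's schedule interval is.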

@konskov (Contributor Author) commented Aug 10, 2022

Thank you so much for reviewing! There is currently an upper bound, but it is probably too high (10 times the schedule interval; for jobs that fail to launch due to a shortage of background workers, the upper bound is 1 minute). I will change it to something fixed and smaller, '1h' as you suggest.

@konskov force-pushed the exponential_backoff branch 4 times, most recently from b02e65e to faed7cc on August 23, 2022 12:29
@konskov marked this pull request as draft August 23, 2022 13:14
@konskov force-pushed the exponential_backoff branch 5 times, most recently from 132023a to ee27e02 on August 24, 2022 15:34
@konskov marked this pull request as ready for review August 25, 2022 06:33
@konskov force-pushed the exponential_backoff branch from ee27e02 to e2ed742 on August 29, 2022 06:32
Comment on lines 223 to 233
```c
else
{
    // retry every 2 seconds
    ival = IntervalPGetDatum(&retry_ival);
}
```

Contributor:

Suggested change:

```diff
 else
-{
     // retry every 2 seconds
     ival = IntervalPGetDatum(&retry_ival);
-}
```

nit

Comment on lines 242 to 233
```c
else
{
    ival_max = IntervalPGetDatum(&interval_max);
}
```

Contributor:

Suggested change:

```diff
 else
-{
     ival_max = IntervalPGetDatum(&interval_max);
-}
```

nit

```c
/* arbitrarily choose 2 for exponential growth */
for (i = 0; i < exponent - 1; i++)
{
    ival = DirectFunctionCall2(interval_mul, ival, Float8GetDatum(2));
```

Contributor:

Exponential backoff works when the scheduled time is picked randomly between now() and now() + ival (regardless of jitter). If we always pick now() + ival then we risk livelocking the conflicting jobs ad infinitum.

This is orthogonal to jitter, which helps with scheduling at non-integer multiples of time units so as not to generate high load spikes in the system (e.g. due to everything happening at whole seconds instead of being spread out).
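The livelock argument can be made concrete with a toy simulation. It is hypothetical, not scheduler code: two jobs collide, and on every further collision both increment f. A "slot" stands in for whatever time unit the delay is measured in, and `rand()` stands in for the real random source. With a deterministic delay of 2^f slots the jobs keep landing in the same slot; drawing the delay uniformly from [0, 2^f - 1] slots usually separates them within the first couple of rounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Delay for the f-th consecutive collision: always 2^f slots ("now() + ival")
 * or uniform in [0, 2^f - 1] slots ("between now() and now() + ival"). */
static int64_t retry_delay_slots(int f, bool randomize)
{
    int64_t ival;

    /* RAND_MAX is at least 32767 = 2^15 - 1, so rand() still covers every
     * slot for f <= 15. */
    assert(f >= 0 && f <= 15);
    ival = INT64_C(1) << f;
    return randomize ? rand() % ival : ival;
}

/* Two jobs start in the same slot and retry until they land in different
 * slots. Returns the round in which they separated, or -1 if they were
 * still colliding after max_rounds rounds. */
static int rounds_until_separated(bool randomize, int max_rounds)
{
    int64_t t_a = 0, t_b = 0;

    for (int f = 1; f <= max_rounds; f++)
    {
        t_a += retry_delay_slots(f, randomize);
        t_b += retry_delay_slots(f, randomize);
        if (t_a != t_b)
            return f;
    }
    return -1;
}
```

In the deterministic case both jobs always compute the same delay, so they collide forever; with randomization the chance of colliding again in round f is 1/2^f.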

Contributor Author:

You are right, we do risk that. So for the first case (out of background workers), I will change the backoff calculation to randomly pick a backoff value in [0, 2^f - 1], where f is the number of consecutive failed launches of the job.

@konskov force-pushed the exponential_backoff branch 2 times, most recently from 1c5a38f to 2cf5891 on August 31, 2022 06:33
```c
int64 micros_per_minute = 1000000;
// will get a random int in [0, (2^f - 1) * 1000]
// this represents a random amount of microseconds to backoff
int64 rand_backoff = random() % (max_slots * micros_per_minute);
```

Contributor:

Suggested change:

```diff
-int64 rand_backoff = random() % (max_slots * micros_per_minute);
+int64 rand_backoff = random() % (max_slots * USECS_PER_SEC);
```

I don't see anything to do with minutes here.
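With the suggested `USECS_PER_SEC`, each slot is one second, and `random() % (max_slots * 1000000)` yields values in [0, max_slots * 1000000 - 1], so the `* 1000` in the quoted code comment would then read `* 1000000` (and the upper end is exclusive). A standalone sketch of that computation follows; `rand_backoff_usecs` is a hypothetical name, and `USECS_PER_SEC` is redefined locally only so the block compiles on its own. POSIX `random()` returns values in [0, 2^31 - 1], which limits the range to about 35.8 minutes, and the modulo introduces a small bias toward lower values that is harmless here.

```c
#define _XOPEN_SOURCE 700 /* expose POSIX random()/srandom() from <stdlib.h> */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* PostgreSQL provides USECS_PER_SEC in datatype/timestamp.h; it is
 * redefined here only so that this sketch compiles on its own. */
#define USECS_PER_SEC INT64_C(1000000)

/* Random backoff in microseconds, in [0, max_slots * USECS_PER_SEC - 1],
 * where each slot is one second. random() tops out at 2^31 - 1, so
 * max_slots * USECS_PER_SEC must not exceed 2^31 (2147 slots at most). */
static int64_t rand_backoff_usecs(int64_t max_slots)
{
    int64_t range;

    assert(max_slots > 0 && max_slots <= 2147);
    range = max_slots * USECS_PER_SEC;
    return (int64_t) random() % range;
}
```

For example, after f = 3 consecutive failed launches, max_slots = 2^3 - 1 = 7 gives a backoff of just under 7 seconds at most.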

Contributor Author:

Indeed.

Review thread on src/bgw/job_stat.c (outdated, resolved)
@konskov force-pushed the exponential_backoff branch from 669792d to 07b074f on August 31, 2022 09:31
Review thread on src/bgw/job_stat.c (outdated, resolved)
@konskov force-pushed the exponential_backoff branch from 07b074f to 8053f1b on August 31, 2022 10:01
```c
Interval interval_max = { .time = 60000000 };
Interval retry_ival = { .time = 2000000 };
retry_ival.time += rand_backoff;
if (!launch_failure)
```

Contributor:

Suggested change:

```diff
-if (!launch_failure)
+if (launch_failure)
```

I think?

Contributor Author:

No, that is the intended logic. A launch failure means the job could not be started because no background worker was available, and in that case we use the random backoff in seconds calculated above.
In the other cases (the job threw a runtime error), the retry period is 5 minutes, so the backoff grows as 5, 10, 20 minutes, and so on, until it reaches the schedule interval. That said, if you meant that the launch_failure case should come first, that would be more readable, so I'll do that.
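The two branches described in this reply can be summarized in a short C sketch, with the launch_failure case first. It is hypothetical: the function name and parameters are illustrative, times are int64 microseconds, and the 2-second floor, 1-minute cap, 5-minute retry period and doubling are taken from this thread rather than from the merged code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define USECS_PER_SEC INT64_C(1000000)
#define USECS_PER_MINUTE (60 * USECS_PER_SEC)

/* Hypothetical sketch of the retry-delay branches; all times in microseconds. */
static int64_t next_retry_delay(bool launch_failure, int consecutive_failures,
                                int64_t rand_backoff, int64_t schedule_interval)
{
    int64_t ival;

    assert(consecutive_failures >= 1 && rand_backoff >= 0);
    if (launch_failure)
    {
        /* No background worker was free: 2 s plus the random backoff,
         * capped at 1 minute. */
        ival = 2 * USECS_PER_SEC + rand_backoff;
        return ival < USECS_PER_MINUTE ? ival : USECS_PER_MINUTE;
    }

    /* The job ran and failed (e.g. a runtime error): start from a 5-minute
     * retry period and double it per consecutive failure (5, 10, 20, ...
     * minutes) until it reaches the schedule interval. */
    ival = 5 * USECS_PER_MINUTE;
    for (int i = 1; i < consecutive_failures && ival < schedule_interval &&
                    ival <= INT64_MAX / 2;
         i++)
        ival *= 2;
    return ival < schedule_interval ? ival : schedule_interval;
}
```

Putting the launch-failure branch first keeps the short, randomized path visible at a glance, while the longer doubling logic for jobs that actually ran sits below it.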

@konskov force-pushed the exponential_backoff branch from 8053f1b to b122b46 on August 31, 2022 10:18
@konskov requested a review from mfundul August 31, 2022 10:31
@konskov force-pushed the exponential_backoff branch 3 times, most recently from 3ef6794 to 68f49eb on August 31, 2022 16:53
@konskov changed the title from "Exponential backoff for jobs" to "Exponentially backoff when out of background workers" Aug 31, 2022
@konskov force-pushed the exponential_backoff branch from 68f49eb to 5d22571 on August 31, 2022 17:02
Comment on lines 265 to 266
```c
// if sjob->consecutive_failed_launches > 0, give the system some time to breathe,
// do not attempt to immediately re-run the job.
```

Contributor:

This comment is inside ts_bgw_job_stat_next_start(); maybe remove it?

@konskov force-pushed the exponential_backoff branch from 5d22571 to 56928c5 on September 1, 2022 10:02
@konskov enabled auto-merge (rebase) September 1, 2022 12:13
@konskov merged commit fca9078 into timescale:main Sep 1, 2022
SachinSetiya added a commit that referenced this pull request Dec 1, 2022
This release adds major new features since the 2.8.1 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:
* Hierarchical Continuous Aggregates (aka Continuous Aggregate on top of another Continuous Aggregate)
* Improve the `time_bucket_gapfill` function to allow specifying a timezone for bucketing
* Use `alter_data_node()` to change the data node configuration. This function introduces the option to configure the availability of the data node.

This release also includes several bug fixes.

**Features**
* #4476 Batch rows on access node for distributed COPY
* #4567 Exponentially backoff when out of background workers
* #4650 Show warnings when not following best practices
* #4664 Introduce fixed schedules for background jobs
* #4668 Hierarchical Continuous Aggregates
* #4670 Add timezone support to time_bucket_gapfill
* #4678 Add interface for troubleshooting job failures
* #4718 Add ability to merge chunks while compressing
* #4786 Extend the now() optimization to also apply to CURRENT_TIMESTAMP
* #4820 Support parameterized data node scans in joins
* #4830 Add function to change configuration of a data node
* #4966 Handle DML activity when datanode is not available
* #4971 Add function to drop stale chunks on a datanode

**Bugfixes**
* #4663 Don't error when compression metadata is missing
* #4673 Fix now() constification for VIEWs
* #4681 Fix compression_chunk_size primary key
* #4696 Report warning when enabling compression on hypertable
* #4745 Fix FK constraint violation error while inserting into a hypertable that references a partitioned table
* #4756 Improve compression job IO performance
* #4770 Continue compressing other chunks after an error
* #4794 Fix degraded performance seen on timescaledb_internal.hypertable_local_size() function
* #4807 Fix segmentation fault during INSERT into compressed hypertable
* #4822 Fix missing segmentby compression option in CAGGs
* #4823 Fix a crash that could occur when using nested user-defined functions with hypertables
* #4840 Fix performance regressions in the copy code
* #4860 Block multi-statement DDL command in one query
* #4898 Fix cagg migration failure when trying to resume
* #4904 Remove BitmapScan support in DecompressChunk
* #4906 Fix a performance regression in the query planner by speeding up frozen chunk state checks
* #4910 Fix a typo in process_compressed_data_out
* #4918 Cagg migration orphans cagg policy
* #4941 Restrict usage of the old format (pre 2.7) of continuous aggregates in PostgreSQL 15.
* #4955 Fix cagg migration for hypertables using timestamp without timezone
* #4968 Check for interrupts in gapfill main loop
* #4988 Fix cagg migration crash when refreshing the newly created cagg

**Thanks**
* @jflambert for reporting a crash with nested user-defined functions.
* @jvanns for reporting that a hypertable FK reference to a vanilla PostgreSQL partitioned table doesn't seem to work
* @kou for fixing a typo in process_compressed_data_out
* @xvaara for helping reproduce a bug with bitmap scans in transparent decompression
* @byazici for reporting a problem with segmentby on compressed caggs
* @tobiasdirksen for requesting Continuous aggregate on top of another continuous aggregate
* @Xima for reporting a bug in Cagg migration
SachinSetiya added a commit that referenced this pull request Dec 1, 2022
SachinSetiya added a commit that referenced this pull request Dec 1, 2022
SachinSetiya added a commit that referenced this pull request Dec 5, 2022
SachinSetiya added a commit that referenced this pull request Dec 5, 2022
SachinSetiya added a commit that referenced this pull request Dec 6, 2022
SachinSetiya added a commit that referenced this pull request Dec 6, 2022
This release adds major new features since the 2.8.1 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:
* Hierarchical Continuous Aggregates (aka Continuous Aggregate on top of another Continuous Aggregate)
* Improve `time_bucket_gapfill` function allowing specifying timezone to bucket
* Use `alter_data_node()` to change the data node configuration. This function introduces the option to configure the availability of the data node.

This release also includes several bug fixes.

**Features**
* #4476 Batch rows on access node for distributed COPY
* #4567 Exponentially backoff when out of background workers
* #4650 Show warnings when not following best practices
* #4664 Introduce fixed schedules for background jobs
* #4668 Hierarchical Continuous Aggregates
* #4670 Add timezone support to time_bucket_gapfill
* #4678 Add interface for troubleshooting job failures
* #4718 Add ability to merge chunks while compressing
* #4786 Extend the now() optimization to also apply to CURRENT_TIMESTAMP
* #4820 Support parameterized data node scans in joins
* #4830 Add function to change configuration of data nodes
* #4966 Handle DML activity when a data node is not available
* #4971 Add function to drop stale chunks on a datanode
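
#4567 is the change this page describes. Per the PR description, a job that fails to launch because no background worker is free now waits a random time in [2, 2 + 2^f] seconds, drawn at microsecond granularity, with a maximum backoff of one minute. What follows is a minimal Python sketch under stated assumptions, not TimescaleDB's C implementation. It assumes that `f` counts consecutive failed launches and that the one-minute cap bounds the total wait. `next_launch_delay_us` is a hypothetical helper name.

```python
import random
from typing import Optional

MIN_WAIT_US = 2_000_000      # 2-second floor, per the PR description
MAX_WAIT_US = 60_000_000     # 1-minute maximum, per the PR description


def next_launch_delay_us(failed_launches: int, rng: Optional[random.Random] = None) -> int:
    """Hypothetical helper: microseconds to wait before retrying a job that
    failed to launch because no background worker was available.

    Assumption: `failed_launches` (f) counts consecutive launch failures, and the
    1-minute cap bounds the whole wait, giving a window of [2, min(2 + 2^f, 60)] s.
    """
    if failed_launches < 1:
        raise ValueError("failed_launches must be >= 1")
    if rng is None:
        rng = random.Random()
    # Width of the random window: 2^f seconds in microseconds, clipped so that
    # floor + window never exceeds the 1-minute maximum.
    window_us = min((1 << failed_launches) * 1_000_000, MAX_WAIT_US - MIN_WAIT_US)
    # randint is inclusive on both ends, so this is a uniform draw at
    # microsecond granularity over [2 s, 2 s + window].
    return MIN_WAIT_US + rng.randint(0, window_us)


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only to make the demo repeatable
    for f in range(1, 8):
        print(f"f={f}: wait {next_launch_delay_us(f, rng) / 1e6:.6f} s")
```

Because the window doubles with each failure, jobs that collided on one attempt are spread over progressively wider intervals. This lowers the chance that they compete for the same free worker again at the same instant, while the cap keeps any single job from being delayed by more than a minute.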

**Bugfixes**
* #4663 Don't error when compression metadata is missing
* #4673 Fix now() constification for VIEWs
* #4681 Fix compression_chunk_size primary key
* #4696 Report warning when enabling compression on hypertable
* #4745 Fix FK constraint violation error while inserting into a hypertable that references a partitioned table
* #4756 Improve compression job IO performance
* #4770 Continue compressing other chunks after an error
* #4794 Fix degraded performance seen on timescaledb_internal.hypertable_local_size() function
* #4807 Fix segmentation fault during INSERT into compressed hypertable
* #4822 Fix missing segmentby compression option in CAGGs
* #4823 Fix a crash that could occur when using nested user-defined functions with hypertables
* #4840 Fix performance regressions in the copy code
* #4860 Block multi-statement DDL command in one query
* #4898 Fix cagg migration failure when trying to resume
* #4904 Remove BitmapScan support in DecompressChunk
* #4906 Fix a performance regression in the query planner by speeding up frozen chunk state checks
* #4910 Fix a typo in process_compressed_data_out
* #4918 Cagg migration orphans cagg policy
* #4941 Restrict usage of the old format (pre 2.7) of continuous aggregates in PostgreSQL 15.
* #4955 Fix cagg migration for hypertables using timestamp without timezone
* #4968 Check for interrupts in gapfill main loop
* #4988 Fix cagg migration crash when refreshing the newly created cagg

**Thanks**
* @jflambert for reporting a crash with nested user-defined functions.
* @jvanns for reporting that a hypertable FK reference to a vanilla PostgreSQL partitioned table did not work
* @kou for fixing a typo in process_compressed_data_out
* @xvaara for helping reproduce a bug with bitmap scans in transparent decompression
* @byazici for reporting a problem with segmentby on compressed caggs
* @tobiasdirksen for requesting continuous aggregates on top of other continuous aggregates
* @Xima for reporting a bug in Cagg migration
SachinSetiya added a commit that referenced this pull request Dec 8, 2022
svenklemm pushed a commit that referenced this pull request Dec 15, 2022
This release adds major new features since the 2.8.1 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:
* Hierarchical Continuous Aggregates (aka Continuous Aggregate on top of another Continuous Aggregate)
* Improve `time_bucket_gapfill` function to allow specifying the timezone to bucket
* Introduce fixed schedules for background jobs and the ability to check job errors.
* Use `alter_data_node()` to change the data node configuration. This function introduces the option to configure the availability of the data node.

This release also includes several bug fixes.

**Features**
* #4476 Batch rows on access node for distributed COPY
* #4567 Exponentially backoff when out of background workers
* #4650 Show warnings when not following best practices
* #4664 Introduce fixed schedules for background jobs
* #4668 Hierarchical Continuous Aggregates
* #4670 Add timezone support to time_bucket_gapfill
* #4678 Add interface for troubleshooting job failures
* #4718 Add ability to merge chunks while compressing
* #4786 Extend the now() optimization to also apply to CURRENT_TIMESTAMP
* #4820 Support parameterized data node scans in joins
* #4830 Add function to change configuration of data nodes
* #4966 Handle DML activity when a data node is not available
* #4971 Add function to drop stale chunks on a datanode

**Bugfixes**
* #4663 Don't error when compression metadata is missing
* #4673 Fix now() constification for VIEWs
* #4681 Fix compression_chunk_size primary key
* #4696 Report warning when enabling compression on hypertable
* #4745 Fix FK constraint violation error while inserting into a hypertable that references a partitioned table
* #4756 Improve compression job IO performance
* #4770 Continue compressing other chunks after an error
* #4794 Fix degraded performance seen on timescaledb_internal.hypertable_local_size() function
* #4807 Fix segmentation fault during INSERT into compressed hypertable
* #4822 Fix missing segmentby compression option in CAGGs
* #4823 Fix a crash that could occur when using nested user-defined functions with hypertables
* #4840 Fix performance regressions in the copy code
* #4860 Block multi-statement DDL command in one query
* #4898 Fix cagg migration failure when trying to resume
* #4904 Remove BitmapScan support in DecompressChunk
* #4906 Fix a performance regression in the query planner by speeding up frozen chunk state checks
* #4910 Fix a typo in process_compressed_data_out
* #4918 Cagg migration orphans cagg policy
* #4941 Restrict usage of the old format (pre 2.7) of continuous aggregates in PostgreSQL 15.
* #4955 Fix cagg migration for hypertables using timestamp without timezone
* #4968 Check for interrupts in gapfill main loop
* #4988 Fix cagg migration crash when refreshing the newly created cagg
* #5054 Fix segfault after second ANALYZE
* #5086 Reset baserel cache on invalid hypertable cache

**Thanks**
* @byazici for reporting a problem with segmentby on compressed caggs
* @jflambert for reporting a crash with nested user-defined functions.
* @jvanns for reporting that a hypertable FK reference to a vanilla PostgreSQL partitioned table did not work
* @kou for fixing a typo in process_compressed_data_out
* @kyrias for reporting a crash when ANALYZE is executed in extended query protocol mode with the extension loaded.
* @tobiasdirksen for requesting continuous aggregates on top of other continuous aggregates
* @Xima for reporting a bug in Cagg migration
* @xvaara for helping reproduce a bug with bitmap scans in transparent decompression
Successfully merging this pull request may close these issues.

[Enhancement]: Job scheduler - exponential backoffs