-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Storage][Sharding][Pruner] Refactor ledger pruner and cleanups. #8443
Conversation
84a57f4
to
1052bdb
Compare
eb6da45
to
e991a3a
Compare
self.record_progress_impl( | ||
current_batch_target_version, | ||
/*has_pending_work_before_min_readable_version=*/ true, | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
give it its own name, maybe record_pending_progress()
, to be symmetric with record_progress()
, or just use the same record_progress_impl()
call in both places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed.
PRUNER_VERSIONS | ||
.with_label_values(&["ledger_pruner", "min_readable"]) | ||
.set(min_readable_version as i64); | ||
fn target_version(&self) -> Version { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: move this back up before record_progress()
? (why change the order?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
let stored_min_readable_version = self.min_readable_version.load(Ordering::SeqCst); | ||
ensure!( | ||
min_readable_version == stored_min_readable_version || stored_min_readable_version == 0, | ||
"Min readable version for TransactionAccumulator doesn't match, {min_readable_version} vs {stored_min_readable_version}.", | ||
); | ||
|
||
self.min_readable_version | ||
.store(target_version, Ordering::SeqCst); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we extract a helper function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
let min_readable_version = if let Some(v) = transaction_accumulator_db | ||
.get::<DbMetadataSchema>(&DbMetadataKey::TransactionAccumulatorPrunerProgress)? | ||
{ | ||
v.expect_version() | ||
} else { | ||
transaction_accumulator_db.put::<DbMetadataSchema>( | ||
&DbMetadataKey::TransactionAccumulatorPrunerProgress, | ||
&DbMetadataValue::Version(metadata_min_readable_version), | ||
)?; | ||
metadata_min_readable_version | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we extract a helper function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
self.min_readable_version | ||
.store(min_readable_version, Ordering::SeqCst); | ||
batch.put::<DbMetadataSchema>( | ||
&DbMetadataKey::TransactionAccumulatorPrunerProgress, | ||
&DbMetadataValue::Version(min_readable_version), | ||
)?; | ||
|
||
Ok(()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we extract a helper function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
write_set_pruner: Arc<dyn DBSubPruner + Send + Sync>, | ||
|
||
// (min_readable_version, has_pending_work_before_min_readable_version) | ||
progress: Mutex<(Version, bool)>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe try to eliminate the need for the boolean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
75088d6
to
4606761
Compare
544c195
to
ffd75a8
Compare
fn prune(&self, min_readable_version: Version, target_version: Version) -> anyhow::Result<()>; | ||
|
||
// Returns the progress of the pruner. | ||
fn progress(&self) -> Version; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's wrong with min_readable_version()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Want to have two concepts, progress is updated after all work is done, while min_readable_version is updated before (managed by the managers)
@@ -82,3 +80,21 @@ where | |||
{ | |||
Ok(get_progress(state_merkle_db.metadata_db(), &S::tag())?.unwrap_or(0)) | |||
} | |||
|
|||
pub(crate) fn get_and_maybe_update_ledger_subpruner_progress( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"get_or_initialize_ledger_subpruner_progress"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
ledger_metadata_db.get::<DbMetadataSchema>(&DbMetadataKey::LedgerPrunerProgress)? | ||
{ | ||
v.expect_version() | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
document in what case the progress is not written
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not actually quite sure if there is a such case, since we've started writing this progress quite long time ago. Just want to play on a safer side to have this fallback logic.
event_store_pruner: Arc<dyn DBSubPruner + Send + Sync>, | ||
write_set_pruner: Arc<dyn DBSubPruner + Send + Sync>, | ||
|
||
progress: AtomicVersion, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's wrong with min_readable_version
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto, the pending
version is theoretically not readable.
return Ok(self.min_readable_version()); | ||
} | ||
fn prune(&self, max_versions: usize) -> Result<Version> { | ||
let mut progress = self.progress(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why tracking this separately instead of proxying the metadata progress?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The progress is updated after all sub pruner finish.
a5a3e11
to
25e347e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
25e347e
to
0bf0cc4
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
✅ Forge suite
|
✅ Forge suite
|
✅ Forge suite
|
* init * run lint * delete verfiy and add schema * pass linter test * fix lint * merge aptos main * run lint * [python] return int balance of coins * [python] async sleep for async calls * [python] use aptos_account::transfer instead of coin::transfer * [python] expose payload generators for token client * [python] support inserting sequence numbers in transaction helpers * [python] add a transaction management layer This provides a framework for managing as many transactions from a single account at once * The AccountSequenceNumber allocates up to 100 outstanding sequence numbers to maximize the number of concurrent transactions in the happy path. * The transaction manager provides async workers that push a transaction from submission through to validating completion Together they provide the basic harness for scaling transaction submission on the Aptos blockchain from a single account. * [python] Add testing coverage * [python] cleaning up with feedback * [docs] update transaction management * [python] add a modest reliablity layer to transaction management this handles all the failures associated with network congestion, meaning this is ready to ship for now... need more testing on other failure cases.... such as intermittent network connectivity, lost connections, bad upstreams. * [python] remove unnecessary python dependencies * [Execution Benchmark] Calibrate Threshold (#8591) * clean AptosVmImpl::new() up a bit * [Helm] Add charts for kube-state-metrics and prometheus-node-exporter (#8576) * Add a zip functions to iterate over 2 vectors concurrently (#8584) * [TF] Add health check for waypoint service in GCP testnet-addons Also, add a default for "gke_maintenance_policy" in the aptos-node-testnet module. * [Storage][Sharding][Pruner] Restructure ledger pruner. (#8443) * use assume to replace require Cointype != Aptos * fix linter --------- Co-authored-by: David Wolinsky <isaac.wolinsky@gmail.com> Co-authored-by: danielx <66756900+danielxiangzl@users.noreply.github.com> Co-authored-by: aldenhu <msmouse@gmail.com> Co-authored-by: Stelian Ionescu <stelian@aptoslabs.com> Co-authored-by: Kevin <105028215+movekevin@users.noreply.github.com> Co-authored-by: Guoteng Rao <3603304+grao1991@users.noreply.github.com>
* init * run lint * delete verfiy and add schema * pass linter test * fix lint * merge aptos main * run lint * [python] return int balance of coins * [python] async sleep for async calls * [python] use aptos_account::transfer instead of coin::transfer * [python] expose payload generators for token client * [python] support inserting sequence numbers in transaction helpers * [python] add a transaction management layer This provides a framework for managing as many transactions from a single account at once * The AccountSequenceNumber allocates up to 100 outstanding sequence numbers to maximize the number of concurrent transactions in the happy path. * The transaction manager provides async workers that push a transaction from submission through to validating completion Together they provide the basic harness for scaling transaction submission on the Aptos blockchain from a single account. * [python] Add testing coverage * [python] cleaning up with feedback * [docs] update transaction management * [python] add a modest reliablity layer to transaction management this handles all the failures associated with network congestion, meaning this is ready to ship for now... need more testing on other failure cases.... such as intermittent network connectivity, lost connections, bad upstreams. * [python] remove unnecessary python dependencies * [Execution Benchmark] Calibrate Threshold (#8591) * clean AptosVmImpl::new() up a bit * [Helm] Add charts for kube-state-metrics and prometheus-node-exporter (#8576) * Add a zip functions to iterate over 2 vectors concurrently (#8584) * [TF] Add health check for waypoint service in GCP testnet-addons Also, add a default for "gke_maintenance_policy" in the aptos-node-testnet module. * [Storage][Sharding][Pruner] Restructure ledger pruner. (#8443) * use assume to replace require Cointype != Aptos * fix linter --------- Co-authored-by: David Wolinsky <isaac.wolinsky@gmail.com> Co-authored-by: danielx <66756900+danielxiangzl@users.noreply.github.com> Co-authored-by: aldenhu <msmouse@gmail.com> Co-authored-by: Stelian Ionescu <stelian@aptoslabs.com> Co-authored-by: Kevin <105028215+movekevin@users.noreply.github.com> Co-authored-by: Guoteng Rao <3603304+grao1991@users.noreply.github.com>
* init * run lint * delete verfiy and add schema * pass linter test * fix lint * merge aptos main * run lint * [python] return int balance of coins * [python] async sleep for async calls * [python] use aptos_account::transfer instead of coin::transfer * [python] expose payload generators for token client * [python] support inserting sequence numbers in transaction helpers * [python] add a transaction management layer This provides a framework for managing as many transactions from a single account at once * The AccountSequenceNumber allocates up to 100 outstanding sequence numbers to maximize the number of concurrent transactions in the happy path. * The transaction manager provides async workers that push a transaction from submission through to validating completion Together they provide the basic harness for scaling transaction submission on the Aptos blockchain from a single account. * [python] Add testing coverage * [python] cleaning up with feedback * [docs] update transaction management * [python] add a modest reliablity layer to transaction management this handles all the failures associated with network congestion, meaning this is ready to ship for now... need more testing on other failure cases.... such as intermittent network connectivity, lost connections, bad upstreams. * [python] remove unnecessary python dependencies * [Execution Benchmark] Calibrate Threshold (#8591) * clean AptosVmImpl::new() up a bit * [Helm] Add charts for kube-state-metrics and prometheus-node-exporter (#8576) * Add a zip functions to iterate over 2 vectors concurrently (#8584) * [TF] Add health check for waypoint service in GCP testnet-addons Also, add a default for "gke_maintenance_policy" in the aptos-node-testnet module. * [Storage][Sharding][Pruner] Restructure ledger pruner. (#8443) * use assume to replace require Cointype != Aptos * fix linter --------- Co-authored-by: David Wolinsky <isaac.wolinsky@gmail.com> Co-authored-by: danielx <66756900+danielxiangzl@users.noreply.github.com> Co-authored-by: aldenhu <msmouse@gmail.com> Co-authored-by: Stelian Ionescu <stelian@aptoslabs.com> Co-authored-by: Kevin <105028215+movekevin@users.noreply.github.com> Co-authored-by: Guoteng Rao <3603304+grao1991@users.noreply.github.com>
* init * run lint * delete verfiy and add schema * pass linter test * fix lint * merge aptos main * run lint * [python] return int balance of coins * [python] async sleep for async calls * [python] use aptos_account::transfer instead of coin::transfer * [python] expose payload generators for token client * [python] support inserting sequence numbers in transaction helpers * [python] add a transaction management layer This provides a framework for managing as many transactions from a single account at once * The AccountSequenceNumber allocates up to 100 outstanding sequence numbers to maximize the number of concurrent transactions in the happy path. * The transaction manager provides async workers that push a transaction from submission through to validating completion Together they provide the basic harness for scaling transaction submission on the Aptos blockchain from a single account. * [python] Add testing coverage * [python] cleaning up with feedback * [docs] update transaction management * [python] add a modest reliablity layer to transaction management this handles all the failures associated with network congestion, meaning this is ready to ship for now... need more testing on other failure cases.... such as intermittent network connectivity, lost connections, bad upstreams. * [python] remove unnecessary python dependencies * [Execution Benchmark] Calibrate Threshold (#8591) * clean AptosVmImpl::new() up a bit * [Helm] Add charts for kube-state-metrics and prometheus-node-exporter (#8576) * Add a zip functions to iterate over 2 vectors concurrently (#8584) * [TF] Add health check for waypoint service in GCP testnet-addons Also, add a default for "gke_maintenance_policy" in the aptos-node-testnet module. * [Storage][Sharding][Pruner] Restructure ledger pruner. (#8443) * use assume to replace require Cointype != Aptos * fix linter --------- Co-authored-by: David Wolinsky <isaac.wolinsky@gmail.com> Co-authored-by: danielx <66756900+danielxiangzl@users.noreply.github.com> Co-authored-by: aldenhu <msmouse@gmail.com> Co-authored-by: Stelian Ionescu <stelian@aptoslabs.com> Co-authored-by: Kevin <105028215+movekevin@users.noreply.github.com> Co-authored-by: Guoteng Rao <3603304+grao1991@users.noreply.github.com>
* init * run lint * delete verfiy and add schema * pass linter test * fix lint * merge aptos main * run lint * [python] return int balance of coins * [python] async sleep for async calls * [python] use aptos_account::transfer instead of coin::transfer * [python] expose payload generators for token client * [python] support inserting sequence numbers in transaction helpers * [python] add a transaction management layer This provides a framework for managing as many transactions from a single account at once * The AccountSequenceNumber allocates up to 100 outstanding sequence numbers to maximize the number of concurrent transactions in the happy path. * The transaction manager provides async workers that push a transaction from submission through to validating completion Together they provide the basic harness for scaling transaction submission on the Aptos blockchain from a single account. * [python] Add testing coverage * [python] cleaning up with feedback * [docs] update transaction management * [python] add a modest reliablity layer to transaction management this handles all the failures associated with network congestion, meaning this is ready to ship for now... need more testing on other failure cases.... such as intermittent network connectivity, lost connections, bad upstreams. * [python] remove unnecessary python dependencies * [Execution Benchmark] Calibrate Threshold (#8591) * clean AptosVmImpl::new() up a bit * [Helm] Add charts for kube-state-metrics and prometheus-node-exporter (#8576) * Add a zip functions to iterate over 2 vectors concurrently (#8584) * [TF] Add health check for waypoint service in GCP testnet-addons Also, add a default for "gke_maintenance_policy" in the aptos-node-testnet module. * [Storage][Sharding][Pruner] Restructure ledger pruner. (#8443) * use assume to replace require Cointype != Aptos * fix linter --------- Co-authored-by: David Wolinsky <isaac.wolinsky@gmail.com> Co-authored-by: danielx <66756900+danielxiangzl@users.noreply.github.com> Co-authored-by: aldenhu <msmouse@gmail.com> Co-authored-by: Stelian Ionescu <stelian@aptoslabs.com> Co-authored-by: Kevin <105028215+movekevin@users.noreply.github.com> Co-authored-by: Guoteng Rao <3603304+grao1991@users.noreply.github.com>
Description
Make ledger pruner to use separate batch for each data type.
Remove prune_genesis.
Test Plan
Existing UTs.