Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Storage][Sharding][Pruner] Refactor ledger pruner and cleanups. #8443

Merged
merged 1 commit into from
Jun 9, 2023

Conversation

grao1991
Copy link
Contributor

@grao1991 grao1991 commented May 31, 2023

Description

Make ledger pruner to use separate batch for each data type.
Remove prune_genesis.

Test Plan

Existing UTs.

@grao1991 grao1991 force-pushed the grao_split_ledger_db branch 7 times, most recently from 84a57f4 to 1052bdb Compare June 1, 2023 20:58
@grao1991 grao1991 marked this pull request as ready for review June 1, 2023 21:02
@grao1991 grao1991 requested a review from areshand June 1, 2023 21:02
@grao1991 grao1991 force-pushed the grao_split_ledger_db branch from eb6da45 to e991a3a Compare June 5, 2023 19:14
@grao1991 grao1991 requested a review from zekun000 June 5, 2023 20:20
Comment on lines 72 to 75
self.record_progress_impl(
current_batch_target_version,
/*has_pending_work_before_min_readable_version=*/ true,
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

give it its own name, maybe record_pending_progress(), to be symmetric with record_progress(), or just use the same record_progress_impl() call in both places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed.

PRUNER_VERSIONS
.with_label_values(&["ledger_pruner", "min_readable"])
.set(min_readable_version as i64);
fn target_version(&self) -> Version {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: move this back up before record_progress()? (why change the order?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 23 to 31
let stored_min_readable_version = self.min_readable_version.load(Ordering::SeqCst);
ensure!(
min_readable_version == stored_min_readable_version || stored_min_readable_version == 0,
"Min readable version for TransactionAccumulator doesn't match, {min_readable_version} vs {stored_min_readable_version}.",
);

self.min_readable_version
.store(target_version, Ordering::SeqCst);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we extract a helper function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 60 to 70
let min_readable_version = if let Some(v) = transaction_accumulator_db
.get::<DbMetadataSchema>(&DbMetadataKey::TransactionAccumulatorPrunerProgress)?
{
v.expect_version()
} else {
transaction_accumulator_db.put::<DbMetadataSchema>(
&DbMetadataKey::TransactionAccumulatorPrunerProgress,
&DbMetadataValue::Version(metadata_min_readable_version),
)?;
metadata_min_readable_version
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we extract a helper function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 84 to 91
self.min_readable_version
.store(min_readable_version, Ordering::SeqCst);
batch.put::<DbMetadataSchema>(
&DbMetadataKey::TransactionAccumulatorPrunerProgress,
&DbMetadataValue::Version(min_readable_version),
)?;

Ok(())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we extract a helper function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

write_set_pruner: Arc<dyn DBSubPruner + Send + Sync>,

// (min_readable_version, has_pending_work_before_min_readable_version)
progress: Mutex<(Version, bool)>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe try to eliminate the need for the boolean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@grao1991 grao1991 force-pushed the grao_split_ledger_db branch 3 times, most recently from 75088d6 to 4606761 Compare June 7, 2023 23:36
@grao1991 grao1991 changed the title [Storage][Sharding][Pruner] Restructure ledger pruner. [Storage][Sharding][Pruner] Refactoring. Jun 7, 2023
@grao1991 grao1991 changed the title [Storage][Sharding][Pruner] Refactoring. [Storage][Sharding][Pruner] Refactor ledger pruner and cleanups. Jun 7, 2023
@grao1991 grao1991 force-pushed the grao_split_ledger_db branch 2 times, most recently from 544c195 to ffd75a8 Compare June 8, 2023 00:59
@grao1991 grao1991 requested a review from msmouse June 8, 2023 00:59
fn prune(&self, min_readable_version: Version, target_version: Version) -> anyhow::Result<()>;

// Returns the progress of the pruner.
fn progress(&self) -> Version;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's wrong with min_readable_version()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Want to have two concepts, progress is updated after all work is done, while min_readable_version is updated before (managed by the managers)

@@ -82,3 +80,21 @@ where
{
Ok(get_progress(state_merkle_db.metadata_db(), &S::tag())?.unwrap_or(0))
}

pub(crate) fn get_and_maybe_update_ledger_subpruner_progress(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"get_or_initialize_ledger_subpruner_progress"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

ledger_metadata_db.get::<DbMetadataSchema>(&DbMetadataKey::LedgerPrunerProgress)?
{
v.expect_version()
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

document in what case the progress is not written

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not actually quite sure if there is a such case, since we've started writing this progress quite long time ago. Just want to play on a safer side to have this fallback logic.

event_store_pruner: Arc<dyn DBSubPruner + Send + Sync>,
write_set_pruner: Arc<dyn DBSubPruner + Send + Sync>,

progress: AtomicVersion,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's wrong with min_readable_version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, the pending version is theoretically not readable.

return Ok(self.min_readable_version());
}
fn prune(&self, max_versions: usize) -> Result<Version> {
let mut progress = self.progress();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why tracking this separately instead of proxying the metadata progress?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The progress is updated after all sub pruner finish.

@grao1991 grao1991 force-pushed the grao_split_ledger_db branch 2 times, most recently from a5a3e11 to 25e347e Compare June 8, 2023 21:51
Copy link
Contributor

@msmouse msmouse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@grao1991 grao1991 force-pushed the grao_split_ledger_db branch from 25e347e to 0bf0cc4 Compare June 8, 2023 23:42
@grao1991 grao1991 enabled auto-merge (squash) June 8, 2023 23:42
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions
Copy link
Contributor

github-actions bot commented Jun 9, 2023

✅ Forge suite compat success on testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 0bf0cc47e96e10498602f3b61159eedaa4488416

Compatibility test results for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 0bf0cc47e96e10498602f3b61159eedaa4488416 (PR)
1. Check liveness of validators at old version: testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b
compatibility::simple-validator-upgrade::liveness-check : committed: 9217 txn/s, latency: 3664 ms, (p50: 3600 ms, p90: 4900 ms, p99: 6100 ms), latency samples: 304180
2. Upgrading first Validator to new version: 0bf0cc47e96e10498602f3b61159eedaa4488416
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 4309 txn/s, latency: 7339 ms, (p50: 7500 ms, p90: 10500 ms, p99: 10700 ms), latency samples: 168080
3. Upgrading rest of first batch to new version: 0bf0cc47e96e10498602f3b61159eedaa4488416
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 4383 txn/s, latency: 7285 ms, (p50: 7300 ms, p90: 10000 ms, p99: 10700 ms), latency samples: 162200
4. upgrading second batch to new version: 0bf0cc47e96e10498602f3b61159eedaa4488416
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 6982 txn/s, latency: 4694 ms, (p50: 4800 ms, p90: 6800 ms, p99: 8700 ms), latency samples: 244380
5. check swarm health
Compatibility test for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 0bf0cc47e96e10498602f3b61159eedaa4488416 passed
Test Ok

@github-actions
Copy link
Contributor

github-actions bot commented Jun 9, 2023

✅ Forge suite land_blocking success on 0bf0cc47e96e10498602f3b61159eedaa4488416

performance benchmark : committed: 5716 txn/s, submitted: 5717 txn/s, expired: 1 txn/s, latency: 6907 ms, (p50: 6000 ms, p90: 9600 ms, p99: 19900 ms), latency samples: 2440929
Max round gap was 1 [limit 4] at version 778271. Max no progress secs was 3.767721 [limit 10] at version 1471217.
Test Ok

@github-actions
Copy link
Contributor

github-actions bot commented Jun 9, 2023

✅ Forge suite framework_upgrade success on aptos-node-v1.3.0_3fc3d42b6cfe27460004f9a0326451bcda840a60 ==> 0bf0cc47e96e10498602f3b61159eedaa4488416

Compatibility test results for aptos-node-v1.3.0_3fc3d42b6cfe27460004f9a0326451bcda840a60 ==> 0bf0cc47e96e10498602f3b61159eedaa4488416 (PR)
Upgrade the nodes to version: 0bf0cc47e96e10498602f3b61159eedaa4488416
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 4846 txn/s, latency: 6564 ms, (p50: 6600 ms, p90: 9200 ms, p99: 16300 ms), latency samples: 184160
5. check swarm health
Compatibility test for aptos-node-v1.3.0_3fc3d42b6cfe27460004f9a0326451bcda840a60 ==> 0bf0cc47e96e10498602f3b61159eedaa4488416 passed
Test Ok

@grao1991 grao1991 merged commit 13aaec4 into main Jun 9, 2023
@grao1991 grao1991 deleted the grao_split_ledger_db branch June 9, 2023 00:24
UIZorrot pushed a commit to movebit/aptos-core that referenced this pull request Jun 9, 2023
rahxephon89 pushed a commit that referenced this pull request Jun 12, 2023
* init

* run lint

* delete verfiy and add schema

* pass linter test

* fix lint

* merge aptos main

* run lint

* [python] return int balance of coins

* [python] async sleep for async calls

* [python] use aptos_account::transfer instead of coin::transfer

* [python] expose payload generators for token client

* [python] support inserting sequence numbers in transaction helpers

* [python] add a transaction management layer

This provides a framework for managing as many transactions from a
single account at once
* The AccountSequenceNumber allocates up to 100 outstanding sequence numbers to maximize the number of concurrent transactions in the happy path.
* The transaction manager provides async workers that push a transaction from submission through to validating completion
Together they provide the basic harness for scaling transaction
submission on the Aptos blockchain from a single account.

* [python] Add testing coverage

* [python] cleaning up with feedback

* [docs] update transaction management

* [python] add a modest reliablity layer to transaction management

this handles all the failures associated with network congestion,
meaning this is ready to ship for now... need more testing on other
failure cases.... such as intermittent network connectivity, lost
connections, bad upstreams.

* [python] remove unnecessary python dependencies

* [Execution Benchmark] Calibrate Threshold (#8591)

* clean AptosVmImpl::new() up a bit

* [Helm] Add charts for kube-state-metrics and prometheus-node-exporter (#8576)

* Add a zip functions to iterate over 2 vectors concurrently (#8584)

* [TF] Add health check for waypoint service in GCP testnet-addons

Also, add a default for "gke_maintenance_policy" in the
aptos-node-testnet module.

* [Storage][Sharding][Pruner] Restructure ledger pruner. (#8443)

* use assume to replace require Cointype != Aptos

* fix linter

---------

Co-authored-by: David Wolinsky <isaac.wolinsky@gmail.com>
Co-authored-by: danielx <66756900+danielxiangzl@users.noreply.github.com>
Co-authored-by: aldenhu <msmouse@gmail.com>
Co-authored-by: Stelian Ionescu <stelian@aptoslabs.com>
Co-authored-by: Kevin <105028215+movekevin@users.noreply.github.com>
Co-authored-by: Guoteng Rao <3603304+grao1991@users.noreply.github.com>
vusirikala pushed a commit that referenced this pull request Jun 21, 2023
* init

* run lint

* delete verfiy and add schema

* pass linter test

* fix lint

* merge aptos main

* run lint

* [python] return int balance of coins

* [python] async sleep for async calls

* [python] use aptos_account::transfer instead of coin::transfer

* [python] expose payload generators for token client

* [python] support inserting sequence numbers in transaction helpers

* [python] add a transaction management layer

This provides a framework for managing as many transactions from a
single account at once
* The AccountSequenceNumber allocates up to 100 outstanding sequence numbers to maximize the number of concurrent transactions in the happy path.
* The transaction manager provides async workers that push a transaction from submission through to validating completion
Together they provide the basic harness for scaling transaction
submission on the Aptos blockchain from a single account.

* [python] Add testing coverage

* [python] cleaning up with feedback

* [docs] update transaction management

* [python] add a modest reliablity layer to transaction management

this handles all the failures associated with network congestion,
meaning this is ready to ship for now... need more testing on other
failure cases.... such as intermittent network connectivity, lost
connections, bad upstreams.

* [python] remove unnecessary python dependencies

* [Execution Benchmark] Calibrate Threshold (#8591)

* clean AptosVmImpl::new() up a bit

* [Helm] Add charts for kube-state-metrics and prometheus-node-exporter (#8576)

* Add a zip functions to iterate over 2 vectors concurrently (#8584)

* [TF] Add health check for waypoint service in GCP testnet-addons

Also, add a default for "gke_maintenance_policy" in the
aptos-node-testnet module.

* [Storage][Sharding][Pruner] Restructure ledger pruner. (#8443)

* use assume to replace require Cointype != Aptos

* fix linter

---------

Co-authored-by: David Wolinsky <isaac.wolinsky@gmail.com>
Co-authored-by: danielx <66756900+danielxiangzl@users.noreply.github.com>
Co-authored-by: aldenhu <msmouse@gmail.com>
Co-authored-by: Stelian Ionescu <stelian@aptoslabs.com>
Co-authored-by: Kevin <105028215+movekevin@users.noreply.github.com>
Co-authored-by: Guoteng Rao <3603304+grao1991@users.noreply.github.com>
banool pushed a commit that referenced this pull request Jul 7, 2023
* init

* run lint

* delete verfiy and add schema

* pass linter test

* fix lint

* merge aptos main

* run lint

* [python] return int balance of coins

* [python] async sleep for async calls

* [python] use aptos_account::transfer instead of coin::transfer

* [python] expose payload generators for token client

* [python] support inserting sequence numbers in transaction helpers

* [python] add a transaction management layer

This provides a framework for managing as many transactions from a
single account at once
* The AccountSequenceNumber allocates up to 100 outstanding sequence numbers to maximize the number of concurrent transactions in the happy path.
* The transaction manager provides async workers that push a transaction from submission through to validating completion
Together they provide the basic harness for scaling transaction
submission on the Aptos blockchain from a single account.

* [python] Add testing coverage

* [python] cleaning up with feedback

* [docs] update transaction management

* [python] add a modest reliablity layer to transaction management

this handles all the failures associated with network congestion,
meaning this is ready to ship for now... need more testing on other
failure cases.... such as intermittent network connectivity, lost
connections, bad upstreams.

* [python] remove unnecessary python dependencies

* [Execution Benchmark] Calibrate Threshold (#8591)

* clean AptosVmImpl::new() up a bit

* [Helm] Add charts for kube-state-metrics and prometheus-node-exporter (#8576)

* Add a zip functions to iterate over 2 vectors concurrently (#8584)

* [TF] Add health check for waypoint service in GCP testnet-addons

Also, add a default for "gke_maintenance_policy" in the
aptos-node-testnet module.

* [Storage][Sharding][Pruner] Restructure ledger pruner. (#8443)

* use assume to replace require Cointype != Aptos

* fix linter

---------

Co-authored-by: David Wolinsky <isaac.wolinsky@gmail.com>
Co-authored-by: danielx <66756900+danielxiangzl@users.noreply.github.com>
Co-authored-by: aldenhu <msmouse@gmail.com>
Co-authored-by: Stelian Ionescu <stelian@aptoslabs.com>
Co-authored-by: Kevin <105028215+movekevin@users.noreply.github.com>
Co-authored-by: Guoteng Rao <3603304+grao1991@users.noreply.github.com>
vusirikala added a commit that referenced this pull request Jul 25, 2023
* init

* run lint

* delete verfiy and add schema

* pass linter test

* fix lint

* merge aptos main

* run lint

* [python] return int balance of coins

* [python] async sleep for async calls

* [python] use aptos_account::transfer instead of coin::transfer

* [python] expose payload generators for token client

* [python] support inserting sequence numbers in transaction helpers

* [python] add a transaction management layer

This provides a framework for managing as many transactions from a
single account at once
* The AccountSequenceNumber allocates up to 100 outstanding sequence numbers to maximize the number of concurrent transactions in the happy path.
* The transaction manager provides async workers that push a transaction from submission through to validating completion
Together they provide the basic harness for scaling transaction
submission on the Aptos blockchain from a single account.

* [python] Add testing coverage

* [python] cleaning up with feedback

* [docs] update transaction management

* [python] add a modest reliablity layer to transaction management

this handles all the failures associated with network congestion,
meaning this is ready to ship for now... need more testing on other
failure cases.... such as intermittent network connectivity, lost
connections, bad upstreams.

* [python] remove unnecessary python dependencies

* [Execution Benchmark] Calibrate Threshold (#8591)

* clean AptosVmImpl::new() up a bit

* [Helm] Add charts for kube-state-metrics and prometheus-node-exporter (#8576)

* Add a zip functions to iterate over 2 vectors concurrently (#8584)

* [TF] Add health check for waypoint service in GCP testnet-addons

Also, add a default for "gke_maintenance_policy" in the
aptos-node-testnet module.

* [Storage][Sharding][Pruner] Restructure ledger pruner. (#8443)

* use assume to replace require Cointype != Aptos

* fix linter

---------

Co-authored-by: David Wolinsky <isaac.wolinsky@gmail.com>
Co-authored-by: danielx <66756900+danielxiangzl@users.noreply.github.com>
Co-authored-by: aldenhu <msmouse@gmail.com>
Co-authored-by: Stelian Ionescu <stelian@aptoslabs.com>
Co-authored-by: Kevin <105028215+movekevin@users.noreply.github.com>
Co-authored-by: Guoteng Rao <3603304+grao1991@users.noreply.github.com>
gedigi pushed a commit that referenced this pull request Aug 2, 2023
* init

* run lint

* delete verfiy and add schema

* pass linter test

* fix lint

* merge aptos main

* run lint

* [python] return int balance of coins

* [python] async sleep for async calls

* [python] use aptos_account::transfer instead of coin::transfer

* [python] expose payload generators for token client

* [python] support inserting sequence numbers in transaction helpers

* [python] add a transaction management layer

This provides a framework for managing as many transactions from a
single account at once
* The AccountSequenceNumber allocates up to 100 outstanding sequence numbers to maximize the number of concurrent transactions in the happy path.
* The transaction manager provides async workers that push a transaction from submission through to validating completion
Together they provide the basic harness for scaling transaction
submission on the Aptos blockchain from a single account.

* [python] Add testing coverage

* [python] cleaning up with feedback

* [docs] update transaction management

* [python] add a modest reliablity layer to transaction management

this handles all the failures associated with network congestion,
meaning this is ready to ship for now... need more testing on other
failure cases.... such as intermittent network connectivity, lost
connections, bad upstreams.

* [python] remove unnecessary python dependencies

* [Execution Benchmark] Calibrate Threshold (#8591)

* clean AptosVmImpl::new() up a bit

* [Helm] Add charts for kube-state-metrics and prometheus-node-exporter (#8576)

* Add a zip functions to iterate over 2 vectors concurrently (#8584)

* [TF] Add health check for waypoint service in GCP testnet-addons

Also, add a default for "gke_maintenance_policy" in the
aptos-node-testnet module.

* [Storage][Sharding][Pruner] Restructure ledger pruner. (#8443)

* use assume to replace require Cointype != Aptos

* fix linter

---------

Co-authored-by: David Wolinsky <isaac.wolinsky@gmail.com>
Co-authored-by: danielx <66756900+danielxiangzl@users.noreply.github.com>
Co-authored-by: aldenhu <msmouse@gmail.com>
Co-authored-by: Stelian Ionescu <stelian@aptoslabs.com>
Co-authored-by: Kevin <105028215+movekevin@users.noreply.github.com>
Co-authored-by: Guoteng Rao <3603304+grao1991@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants