
[Storage] Reorganize calculation and commit tasks in save_transactions/save_transaction_block. #8664

Merged: 1 commit into main from grao_commit, Jun 14, 2023

Conversation

grao1991 (Contributor)

Description

To get better performance; there is still room to tune/improve.

Test Plan

@grao1991 grao1991 requested review from zekun000 and areshand June 14, 2023 19:42
@grao1991 grao1991 marked this pull request as ready for review June 14, 2023 19:42
msmouse (Contributor) left a comment:

Looks legit

.par_chunks(chunk_size)
.enumerate()
.try_for_each(|(chunk_index, txns_in_chunk)| -> Result<()> {
let batch = SchemaBatch::new();
Contributor:

Does it really matter to use multiple batches? (And why does it matter only for transactions?)

Contributor Author (grao1991):

It's ~10% faster in my benchmark. (I only did it for txns because they are relatively large compared to the other data types.)

Btw, I don't like it.

.with_label_values(&["save_transactions__work"])
.start_timer();
// TODO(grao): Write progress for each of the following databases, and handle the
// inconsistency at the startup time.
Contributor:

Not related to this PR, but can't we recover if it crashes in the middle? Or do we just ignore anything that's not covered by the ledger info?

Contributor Author (grao1991):

I have a TODO to truncate the versions that are larger than the ledger info version. It's probably not that critical if we assume there is no fork.

(It seems some places call get_latest_transaction_info_option to get the latest version; we probably want the ledger info version instead. I will work on it in a separate PR.)

Contributor:

Truncation is not high priority, but what happens if, for example, a txn has a different version than an event? I thought state sync uses the last transaction info to fetch, instead of the last ledger info?

Contributor Author (grao1991):

Yes, that's the problem I mentioned in the parenthetical above. I need to get it done before the 1.6 cut. (It's not in 1.5.)

Contributor:

> get the ledger info version instead

The overall progress, you mean?

Contributor Author (grao1991):

Yes ^

@grao1991 grao1991 enabled auto-merge (squash) June 14, 2023 22:28

@github-actions
✅ Forge suite land_blocking success on 5f098d329bf531005a868c8f829c343fedda8750

performance benchmark : committed: 5490 txn/s, latency: 7237 ms, (p50: 5200 ms, p90: 12900 ms, p99: 27200 ms), latency samples: 2344400
Max round gap was 1 [limit 4] at version 615660. Max no progress secs was 3.380342 [limit 10] at version 913212.
Test Ok

@github-actions
✅ Forge suite compat success on testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 5f098d329bf531005a868c8f829c343fedda8750

Compatibility test results for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 5f098d329bf531005a868c8f829c343fedda8750 (PR)
1. Check liveness of validators at old version: testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b
compatibility::simple-validator-upgrade::liveness-check : committed: 9168 txn/s, latency: 3737 ms, (p50: 3700 ms, p90: 5200 ms, p99: 6700 ms), latency samples: 311720
2. Upgrading first Validator to new version: 5f098d329bf531005a868c8f829c343fedda8750
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 4627 txn/s, latency: 7007 ms, (p50: 7600 ms, p90: 8500 ms, p99: 8800 ms), latency samples: 175860
3. Upgrading rest of first batch to new version: 5f098d329bf531005a868c8f829c343fedda8750
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 5007 txn/s, latency: 6435 ms, (p50: 7200 ms, p90: 7900 ms, p99: 8300 ms), latency samples: 185260
4. Upgrading second batch to new version: 5f098d329bf531005a868c8f829c343fedda8750
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 6795 txn/s, latency: 4750 ms, (p50: 4500 ms, p90: 7000 ms, p99: 9600 ms), latency samples: 251420
5. check swarm health
Compatibility test for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 5f098d329bf531005a868c8f829c343fedda8750 passed
Test Ok

@github-actions
✅ Forge suite framework_upgrade success on aptos-node-v1.3.0_3fc3d42b6cfe27460004f9a0326451bcda840a60 ==> 5f098d329bf531005a868c8f829c343fedda8750

Compatibility test results for aptos-node-v1.3.0_3fc3d42b6cfe27460004f9a0326451bcda840a60 ==> 5f098d329bf531005a868c8f829c343fedda8750 (PR)
Upgrade the nodes to version: 5f098d329bf531005a868c8f829c343fedda8750
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 3043 txn/s, latency: 7314 ms, (p50: 8100 ms, p90: 9800 ms, p99: 11400 ms), latency samples: 167380
5. check swarm health
Compatibility test for aptos-node-v1.3.0_3fc3d42b6cfe27460004f9a0326451bcda840a60 ==> 5f098d329bf531005a868c8f829c343fedda8750 passed
Test Ok

@grao1991 grao1991 merged commit 42e7c40 into main Jun 14, 2023
@grao1991 grao1991 deleted the grao_commit branch June 14, 2023 23:08
vusirikala pushed a commit that referenced this pull request Jun 21, 2023
banool pushed a commit that referenced this pull request Jul 7, 2023
xbtmatt pushed a commit to xbtmatt/aptos-core that referenced this pull request Jul 25, 2023
xbtmatt pushed a commit to xbtmatt/aptos-core that referenced this pull request Jul 25, 2023
gedigi pushed a commit that referenced this pull request Aug 2, 2023

let ledger_batch = SchemaBatch::new();
self.save_ledger_info(new_root_hash, ledger_info_with_sigs, &ledger_batch)?;
Contributor:

This is missing in the new state sync flow (until a ledger info is saved)?

.unwrap();
self.ledger_db
.metadata_db()
.write_schemas(ledger_schema_batches.ledger_metadata_batch)
Contributor:

Hmm... doesn't this already make the LedgerCommitProgress meaningless?

3 participants