Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Manually add lookup table addresses instead of sanitizing #33273

Merged
merged 11 commits into from
Oct 4, 2023

Conversation

apfitzge
Copy link
Contributor

@apfitzge apfitzge commented Sep 15, 2023

Problem

#30165

Summary of Changes

Manually load and add lookup table addresses instead of using sanitize.

Fixes #30165

@codecov
Copy link

codecov bot commented Sep 15, 2023

Codecov Report

Merging #33273 (52fbcde) into master (c6ee69e) will increase coverage by 0.0%.
Report is 1 commits behind head on master.
The diff coverage is 0.0%.

@@           Coverage Diff           @@
##           master   #33273   +/-   ##
=======================================
  Coverage    81.7%    81.7%           
=======================================
  Files         802      803    +1     
  Lines      217835   217869   +34     
=======================================
+ Hits       178125   178158   +33     
- Misses      39710    39711    +1     

@apfitzge
Copy link
Contributor Author

Was able to test this with a script that spammed creating and using an ALT. This makes it so that some ALTs used in txs do not exist at our beginning state but are used in a transaction. That transaction cannot be sanitized with the state the beginning slot bank has.

To create the snapshot: ./cargo run --release --bin solana-ledger-tool -- --ledger ./test-ledger create-snapshot --minimized --ending-slot 670 607

Master:

[2023-09-18T19:01:58.431801175Z INFO  solana_core::accounts_hash_verifier] AccountsHashVerifier has stopped
[2023-09-18T19:01:58.436755228Z INFO  solana_metrics::metrics] datapoint: calculate_accounts_hash_from_index accounts_scan=366i hash=278i hash_total=402i collect=2470i
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: AddressLookupTableNotFound', thread 'ledger/src/blockstore.rs<unnamed>:' panicked at '2943called `Result::unwrap()` on an `Err` value: AddressLookupTableNotFound:', 82ledger/src/blockstore.rs
:note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
2943:82
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: AddressLookupTableNotFound', ledger/src/blockstore.rs:2943:82
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: AddressLookupTableNotFound', ledger/src/blockstore.rs:2943:82
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: AddressLookupTableNotFound', ledger/src/blockstore.rsthread ':<unnamed>2943' panicked at ':called `Result::unwrap()` on an `Err` value: AddressLookupTableNotFound82',
ledger/src/blockstore.rs:2943:82
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: AddressLookupTableNotFound', ledger/src/blockstore.rs:2943:82
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: AddressLookupTableNotFound', ledger/src/blockstore.rs:2943:82
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: AddressLookupTableNotFound', ledger/src/blockstore.rs:2943:82
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: AddressLookupTableNotFound', ledger/src/blockstore.rs:2943:82

Branch:

[2023-09-18T19:11:44.732909031Z INFO  solana_ledger_tool] ledger tool took 1.4s

@apfitzge apfitzge marked this pull request as ready for review September 18, 2023 19:13
};

for address in lookup_table.addresses.iter() {
result.insert(*address);
Copy link
Member

@ryoqun ryoqun Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will work for newly created alts.

however, if the existing alt is updated to contain more entries in the slot range, i think this will make minimized snapshot unreplayable silently, if there's a alt-ed tx using the newly-added entries.

i think this is confusing and that's why i originally took the .fully_verify_transaction() approach. if Ok() is returned, this guarantees that tx is at least loadable.

that said, i see the merit of more lenient behavior like for benches.

how about plumbing a flag --non-strict-alt-scanning and panic! as before by default with mention of the flag?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wish ExtendLookupTable accepted addresses in the form of account references, not instruction data with serialized Vec<Pubkey>....

if so, things would be simpler...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, manually parsing ixes in a tx won't work. that's because ExtendLookupTable could be invoked via cpi....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh didn't realize it was through ix data.
I can scan for extend ixs and manually add those as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wish ExtendLookupTable accepted addresses in the form of account references, not instruction data with serialized Vec....

Agree this would make things much simpler (for us). But I can understand why it was not doen that way - since there's no need to load those, possibly large, accounts to add them to an ALT.

Copy link
Contributor Author

@apfitzge apfitzge Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about plumbing a flag --non-strict-alt-scanning and panic! as before by default with mention of the flag?

I think these are common enough that I'd like to consider some alternatives first, because otherwise users will use default and hit this often, or have to run twice.

I'm more in favor of doing our best, and logging a warning if we detect a possible issue...which I think is possible?

@ryoqun what do you think about this:

  1. We manually scan for "simple" extension instructions
  • any txs w/ native-only programs/ixs
  • we can directly rip pubkeys from the extension ix data
  1. We try detecting potential CPI:
  • even if they CPI the ALT extension, the tx must have the ALT program as an input
  • if we see some tx which uses non-native programs & has the ALT program this means it's possible it will CPI to extend.
  • if we detect this case, we log some warning (at the end) with a description

I think this makes the usage a lot easier and robust than it currently is. And it still allows us to print a warning that it may not work, rather than just silently not working.
I'm not sure how common CPI extensions are, but I think we certainly can do better for "simple" ALT create & extends.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ryoqun what do you think about this:
...

sounds nice. a little concern is that people won't notice those warnings as ledger-tool create-snapshot is already quite verbose. maybe we can arrange to display the warning at the very last of command termination for maximum discoverability as you said.

Comment on lines +588 to +590
address_lookup_table::program::id(),
loader_v4::id(),
stake::program::id(),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

found some missing native programs that should have always been in this list 😦

@apfitzge
Copy link
Contributor Author

Rebased on master to fix the cargo audit for quinn-proto 0.10.4

ledger-tool/src/main.rs Outdated Show resolved Hide resolved
ledger/src/blockstore.rs Outdated Show resolved Hide resolved
ledger/src/blockstore.rs Outdated Show resolved Hide resolved
ledger/src/blockstore.rs Outdated Show resolved Hide resolved
ledger/src/blockstore.rs Outdated Show resolved Hide resolved
Comment on lines 33 to 41
if !transaction
.get_message()
.message
.static_account_keys()
.iter()
.any(address_lookup_table::program::check_id)
{
return ScanResult::NotFound;
}
Copy link
Member

@ryoqun ryoqun Oct 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sadly, this isn't precise.. the alt address itself could be loaded via another existing alt. this is okay as long as non-cpi ix isn't touching the alt-loaded alt program. cpi ix is allowed to execute alt-loaded programs (on-chain & native, including the alt).

Copy link
Member

@ryoqun ryoqun Oct 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about SanitizedTransaction construction as was before then only fall back to static_account_keys()?

this will reduce otherwise unavoidable false positive warnings greatly...

let account_keys = bank
    .fully_verify_transaction(transaction)
    .map(|tx| tx.message.account_keys()).unwrap_or_else(|| {
        possibly_incomplete = true;
        transaction.get_message().message.static_account_keys()
    });
if !possibly_incompete &&
    account_keys.iter().any(address_lookup_table::program::check_id) {
    return ScanResult::NotFound;
}
...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did something similar, outside the scanning.
i.e. we only scan if we cannot verify.

However...it requires cloning the tx because fully_verify_transaction consumes.

Copy link
Member

@ryoqun ryoqun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for working on this...

I'm moderately sure you're starting to realize why I didn't do this originally. xD

.iter()
.any(address_lookup_table::program::check_id)
{
return ScannedLookupTableExtensions::default();
Copy link
Member

@ryoqun ryoqun Oct 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is really edge of edge case.

but, i think we still need to return from here with possibly_incomplete: true conditionally for the possibility of alt extension via cpi, even considering that this scanning now only occurs when fully_verify_transaction() failed.

that's because the possibility of cpi-based alt extension cannot be negated unless all outermost (non-cpi) iexes are of native programs.

how about removing this if itself and remove #[derive(Default)] from ScannedLookupTableExtensions?

considering this is now fallback codepath, false positive warning should be rare... and i'd really avoid false negatives as there's no more fearful thing than unwarned bank hash mismatches.. ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's true. With the change from enum to a struct, we no longer need this if at all.
Simplifies implementation, and makes the check more correct.

ryoqun
ryoqun previously approved these changes Oct 3, 2023
Copy link
Member

@ryoqun ryoqun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with nit

thanks for improving on my past lazy work.

although i wrote somewhat long nit code review comment, i don't want to block this pr anymore for my super picky nit. ;)

@apfitzge
Copy link
Contributor Author

apfitzge commented Oct 3, 2023

although i wrote somewhat long nit code review comment, i don't want to block this pr anymore for my super picky nit. ;)

I removed the if in the function, and Default impl. Only change since your last review...however, I rebased since there were a few cargo vulnerabilities which were fixed on master and flagging the CI.

Copy link
Member

@ryoqun ryoqun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

much much better. :)

thanks very much for addressing multi round of code reviews..

@apfitzge apfitzge merged commit 5a95e56 into solana-labs:master Oct 4, 2023
41 checks passed
@apfitzge apfitzge deleted the bugfix/minimized_alts branch October 4, 2023 15:04
tao-stones pushed a commit to tao-stones/solana that referenced this pull request Oct 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ledger-tool snapshot --minimized option may break on ALT updates in slot range
2 participants