Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bank: Add function to replace empty account with upgradeable program on feature activation #32783

Merged
merged 33 commits into from
Oct 4, 2023

Conversation

buffalojoec
Copy link
Contributor

@buffalojoec buffalojoec commented Aug 9, 2023

Problem

Currently, replace_program_account will only replace a program's account, but not the program's data account.

In order to complete Step 1 of #32780, we need to be able to replace an empty account with an upgradeable program on feature activation.

Summary of Changes

This PR adds a function replace_empty_account_with_upgradeable_program to replace an empty account - such as Feature111111111 with a program account and a data account, effectively replacing this account with an upgradeable program.

@mergify mergify bot added community Community contribution need:merge-assist labels Aug 9, 2023
@mergify mergify bot requested a review from a team August 9, 2023 23:35
@codecov
Copy link

codecov bot commented Aug 10, 2023

Codecov Report

Merging #32783 (91bfec7) into master (c9d04bc) will increase coverage by 0.0%.
Report is 1 commits behind head on master.
The diff coverage is 92.8%.

@@           Coverage Diff            @@
##           master   #32783    +/-   ##
========================================
  Coverage    81.7%    81.7%            
========================================
  Files         802      804     +2     
  Lines      217872   218079   +207     
========================================
+ Hits       178144   178344   +200     
- Misses      39728    39735     +7     

@mvines
Copy link
Member

mvines commented Aug 10, 2023

hey @buffalojoec, you should get triage and than write access to the monorepo per https://github.com/solana-labs/contributor-access-policy so that CI will run on your PRs automatically. I'll vouch for ya and I'm pretty sure Jon will too

@mvines mvines added the CI Pull Request is ready to enter CI label Aug 10, 2023
@solana-grimes solana-grimes removed the CI Pull Request is ready to enter CI label Aug 10, 2023
Copy link
Contributor

@joncinque joncinque left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I'm thinking through what would happen, I'm worried that this would break things unfortunately.

Let's imagine we have an upgradeable program on address A, and we want to replace it with an upgradeable program on B. Here's the initial situation:

  • A's data is PDA(A)
  • PDA(A)'s data is the old program
  • B's data is PDA(B)
  • PDA(B)'s data is the new program

If we replace A with B, and PDA(A) with PDA(B), then we get:

  • A's data is just PDA(B) <-- Uh oh!
  • PDA(A)'s data is the new program
  • B is gone
  • PDA(B) is gone

So if we want to replace an upgradeable program with another, we only need to replace the account at PDA(A) with PDA(B).

The feature-gate situation is a little bit strange though, since we want to create a whole new program at Feature111111. I'm thinking you will need to replace A with B, but you'll also need to rewrite the data at A to be PDA(A).

Let me know if this makes sense. Could you add some more to the test to reflect the data written into the accounts to check the situation that I'm worried about?

@buffalojoec
Copy link
Contributor Author

Let me know if this makes sense. Could you add some more to the test to reflect the data written into the accounts to check the situation that I'm worried about?

Yeah, I think that makes sense. In other words, replace the data account instead of the program account.

I guess what's more confusing is how this even works to upgrade programs in the current state, without the data account being involved 🧐

Regardless, happy to revise and write some more testing !

@joncinque
Copy link
Contributor

Yeah, I think that makes sense. In other words, replace the data account instead of the program account.

Yes, and also rewrite the program account with the correct address if needed.

I guess what's more confusing is how this even works to upgrade programs in the current state, without the data account being involved 🧐

It only works with non-upgradeable programs, which just have the one account 😉

@buffalojoec
Copy link
Contributor Author

I just pushed a modified method and some fleshed out tests evaluating your points above @joncinque. Let me know if this aligns with what you were thinking.

TL/DR: The test_non_upgradable_program_replace_with_data test is the use case for the Feature Gate program to be deployed at Feature111111.

I've added verbose comments in order to fuel discussion about how to handle each of the four cases identified in the changes.

@buffalojoec
Copy link
Contributor Author

buffalojoec commented Aug 16, 2023

I wanted to create this updated comment to flesh out some of the logic articulated by d646a07.

There are four scenarios where we want to consider how to handle program replacements, now that we're involving the data account as well. Each of these four scenarios has a corresponding test and is handled by some conditional block in the replace_program_account function itself.

Definitions

  • existence: Account currently exists on-chain and is not empty
  • old: The existence of these accounts defines the program's current state
  • new: The existence of these accounts serves to replace the associated current state at these accounts' counterparts

Scenarios

Scenario 1: Program Account Swap

The existing program and the proposed new program both just have program accounts, no data accounts.

Swap the existing program account for the new one.

Current Proposed Result
Old: [Old program data] New: [*New program data] Old: [*New program data]

Scenario 2: Program Account Swap and Create Data

The existing program does not have a data account, but the proposed new one does.

Swap the existing program account for the PDA of the new data account, and create the new data account.

Current Proposed Result
Old: [Old program data] New: PDA(NewData) Old: PDA(OldData)
NewData: [*New program data] OldData: [*New program data]

Scenario 3: Program Account Swap and Remove Data

The existing program has a data account, but the proposed new one does not.

Swap the existing program account for the new program account, and remove the existing data account.

Current Proposed Result
Old: PDA(OldData) New: [*New program data] Old: [*New program data]
OldData: [Old program data]

Scenario 4: Program Account Swap

The existing program and the proposed new program both have data accounts.

Swap the existing data account for the new one, leaving the existing program account (with the PDA of the data account) unchanged.

Current Proposed Result
Old: PDA(OldData) New: PDA(NewData) Old: PDA(OldData)
Old: [Old program data] New: [*New program data] Old: [*New program data]

Do these four scenarios and their articulated results make sense and sound safe?

My initial thoughts:

  • Scenario 1 is the same functionality that exists now
  • Scenario 2 will allow us to create the feature gate program at Feature11111111
  • Scenario 3 seems sketchy
  • Scenario 4 addresses @joncinque's concerns above

@buffalojoec
Copy link
Contributor Author

After offline discussion with @joncinque I've removed Scenario 3 from the supported operations!

Copy link
Contributor

@joncinque joncinque left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is on the right track, thanks for all the comments and tests. Now we just need to make sure it'll cover the feature gate program situation

runtime/src/bank.rs Outdated Show resolved Hide resolved
runtime/src/bank.rs Outdated Show resolved Hide resolved
runtime/src/bank.rs Outdated Show resolved Hide resolved
runtime/src/bank.rs Outdated Show resolved Hide resolved
runtime/src/bank.rs Outdated Show resolved Hide resolved
runtime/src/bank/tests.rs Outdated Show resolved Hide resolved
runtime/src/bank/tests.rs Outdated Show resolved Hide resolved
runtime/src/bank/tests.rs Outdated Show resolved Hide resolved
runtime/src/bank/tests.rs Outdated Show resolved Hide resolved
runtime/src/bank/tests.rs Outdated Show resolved Hide resolved
@buffalojoec
Copy link
Contributor Author

Just pushed up some commits addressing your feedback.

A few questions came to mind while working on them:

  • Do we want to enforce the proper BPF loader owns a "new" account? ie. if we are tasked with replacing a program and data account with another program and data account, should we have any checks that those two new accounts are owned by the upgradable loader?
  • Do we want to enforce that a new program account contains the PDA for it's associated data account? Despite the fact that when both are proposed as replacements, the new program account is all but completely ignored
  • Do we want to keep excess lamports in accounts or burn them? If either type of account is replaced with smaller data, what should happen to those lamports that are excess of rent-exemption?

@joncinque
Copy link
Contributor

Do we want to enforce the proper BPF loader owns a "new" account? ie. if we are tasked with replacing a program and data account with another program and data account, should we have any checks that those two new accounts are owned by the upgradable loader?

We just need to be sure that we're setting the owner that's in the new account, we don't need to check it necessarily. The issue to watch out for is swapping accounts that are on different loaders, and not setting the old account to the new program's loader.

Do we want to enforce that a new program account contains the PDA for it's associated data account? Despite the fact that when both are proposed as replacements, the new program account is all but completely ignored

We could simply read the data from the new program account and use that rather than every dealing with the new PDA, but it's up to you. Ignoring is probably fine.

Do we want to keep excess lamports in accounts or burn them? If either type of account is replaced with smaller data, what should happen to those lamports that are excess of rent-exemption?

Burn 'em! no need to have more lamports than necessary

Side note: It's funny, I'm just realizing that scenario 3 is possible with loader v4, but we don't need to address it here.

@joncinque
Copy link
Contributor

Looking through the code and all the cases, I'm starting to get cold feet about a generic replace function. This code is all really dangerous since it goes around normal consensus to move data and lamports around, so it should be as simple and limited as possible. All of the possible branches make me worry that we're going to miss a case and end up taking down the network because the capitalization is wrong.

What do you think about keeping the refactor that you've done with replace_account and creating a new function like replace_empty_account_with_upgradeable_program for the Feature Gate program? Maybe later we can add a more general replace_program, which can call into the more specific versions as needed.

I do appreciate all of the great work and all of the tests, but I think we should be as precise as possible to support only the cases that we need to. What do you think?

@buffalojoec
Copy link
Contributor Author

What do you think about keeping the refactor that you've done with replace_account and creating a new function like replace_empty_account_with_upgradeable_program for the Feature Gate program? Maybe later we can add a more general replace_program, which can call into the more specific versions as needed.

Yeah it got sketchy real quick with all the different scenarios.

Where is this function invoked anyway? I guess I didn't realize creating a new separate bank method was on the table

@joncinque
Copy link
Contributor

Where is this function invoked anyway? I guess I didn't realize creating a new separate bank method was on the table

You can see how it's used in #31764 in apply_feature_activations, you'll need to add a new feature gate which replaces Feature11111... with another program address that contains a no-op program

@buffalojoec buffalojoec force-pushed the bank-program-replace branch 3 times, most recently from 6ec0e55 to afcfa55 Compare August 18, 2023 17:40
@buffalojoec
Copy link
Contributor Author

buffalojoec commented Aug 18, 2023

This should be ready for another look. Let me know if I should squash and put up a new PR.

No-op program:

Feature:

  • 8GdovDzVwWU5edz2G697bbB7GZjrUc6aQZLWyNNAtHdg

Copy link
Contributor

@joncinque joncinque left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was so much easier to review, thanks a ton for reducing the scope!

I looked at the program you uploaded, and even though the extend program instruction is meant to go out with in 1.16, could you add some huge static string to make the size bigger? That way we can't get stuck with a program data account that's too small.

Also, is it a verifiable build? I'd like it to be possible for anyone in the multisig to check it to make sure it's correct before signing off on an upgrade.

sdk/src/feature_set.rs Outdated Show resolved Hide resolved
runtime/src/bank.rs Outdated Show resolved Hide resolved
runtime/src/bank.rs Outdated Show resolved Hide resolved
runtime/src/bank.rs Outdated Show resolved Hide resolved
runtime/src/bank.rs Outdated Show resolved Hide resolved
runtime/src/bank/tests.rs Outdated Show resolved Hide resolved
runtime/src/bank/tests.rs Outdated Show resolved Hide resolved
runtime/src/inline_feature_gate_program.rs Outdated Show resolved Hide resolved
@buffalojoec
Copy link
Contributor Author

buffalojoec commented Oct 4, 2023

Great! warning handling in bank.rs looks good. Some suggestions to make the replace_account methods more readable, plus the clippy lint. Then I think this is good to land!

Nice suggestions! Made things much cleaner. I also added another test!

Copy link
Contributor

@CriesofCarrots CriesofCarrots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the polish!

@buffalojoec buffalojoec merged commit 25460f7 into solana-labs:master Oct 4, 2023
41 checks passed
mergify bot pushed a commit that referenced this pull request Oct 4, 2023
…on feature activation (#32783)

* replace program account

* modify for all cases

* remove non-data swap

* address tests & conditional feedback

* get the rent involved

* mix in owner & executable

* feature-related cases

* stripped back to feature-specific case only

* added feature

* address initial feedback

* added more lamport checks

* condense tests

* using test_case

* add fail cases to tests

* more cleanup

* add verifiably built program

* update program account state

* cleaned up serializing logic

* use full word capitalization

* rename old & new to dst & src

* swap src and dst in parameters

* add warnings and errors

* rename feature to programify

* test suite description clarity

* remove strings from datapoints

* spell out source and destination

* more verbose comments in account replace functions

* move lamport calculation

* swap lamport check for state check

* move replace functions to helper module

* make replace_account methods fallible

* refactor error handling

* add test for source program state

(cherry picked from commit 25460f7)
buffalojoec added a commit that referenced this pull request Oct 5, 2023
…rogram on feature activation (backport of #32783) (#33527)

Bank: Add function to replace empty account with upgradeable program on feature activation (#32783)

* replace program account

* modify for all cases

* remove non-data swap

* address tests & conditional feedback

* get the rent involved

* mix in owner & executable

* feature-related cases

* stripped back to feature-specific case only

* added feature

* address initial feedback

* added more lamport checks

* condense tests

* using test_case

* add fail cases to tests

* more cleanup

* add verifiably built program

* update program account state

* cleaned up serializing logic

* use full word capitalization

* rename old & new to dst & src

* swap src and dst in parameters

* add warnings and errors

* rename feature to programify

* test suite description clarity

* remove strings from datapoints

* spell out source and destination

* more verbose comments in account replace functions

* move lamport calculation

* swap lamport check for state check

* move replace functions to helper module

* make replace_account methods fallible

* refactor error handling

* add test for source program state

(cherry picked from commit 25460f7)

Co-authored-by: Joe C <jcaulfield135@gmail.com>
Comment on lines +8064 to +8065
&feature::id(),
&inline_feature_gate_program::noop_program::id(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these arguments swapped? My impression is that this code is trying to copy code from somewhere into the account with id of feature::id() but this code uses feature_id() as the source instead of the destination.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They do appear to be swapped. Great catch, thanks! 😅

#33894

@jstarry
Copy link
Member

jstarry commented Oct 27, 2023

I personally don't think this approach is a good idea. I think the feature program should be added as a builtin program rather than an upgradeable program. The feature program is very sensitive and managing the upgrade authority for it sounds like a headache (speaking of, I'm guessing the upgrade authority keypair (4ktVfu5Kfw6kWYRhVjLfBhpEicgX8XekXLZgmooJm5Tu) is currently owned by @buffalojoec?). I think it's better to update the program using feature gates. That way we don't need to verify the bytecode every time it's upgraded.. and if the program is a builtin program, everyone can read the code easily.

This approach also adds a lot of scary methods like Bank::move_account, Bank::replace_empty_account_with_upgradeable_program, and Bank::replace_non_upgradeable_program_account which don't have adequate test coverage right now and could easily be misused in the future. These code paths are not on the hot path so it's easy for other contributors to forget they exist when adding/changing behavior on the hot path.

I strongly think this change should be reverted.

@buffalojoec
Copy link
Contributor Author

I personally don't think this approach is a good idea. I think the feature program should be added as a builtin program rather than an upgradeable program.
...

Some fair points for sure, thanks for weighing in!

I've actually pulled the feature activation from the schedule in order to open up the conversation on this exact concern to a SIMD. We had similar feedback last week on the community call and in some of the SIMD 0072 reviews for the architecture this PR sets up a base for.

I'd love to have some more discussion on your thoughts here in SIMD 0077, which corresponds to this change!

@jstarry
Copy link
Member

jstarry commented Oct 27, 2023

I can weigh in later, but for now, I'd like these changes to be reverted due to the lack of test coverage for the new bank methods added here. Any objection to that?

@buffalojoec
Copy link
Contributor Author

I can weigh in later, but for now, I'd like these changes to be reverted due to the lack of test coverage for the new bank methods added here. Any objection to that?

No, not from me! Seems like a good idea considering we have to wait for the SIMD to be approved now anyway.

@jstarry
Copy link
Member

jstarry commented Oct 27, 2023

Yeah that's fair too, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community Community contribution do-not-close Add this tag to exempt an issue/PR from being closed by the stalebot need:merge-assist v1.17 PRs that should be backported to v1.17
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants