Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: make RuntimeExt: Send #11634

Merged
merged 6 commits into from
Jun 21, 2024
Merged

refactor: make RuntimeExt: Send #11634

merged 6 commits into from
Jun 21, 2024

Conversation

nagisa
Copy link
Collaborator

@nagisa nagisa commented Jun 20, 2024

In a world where we have pipelined compilation, instantiation and execution, VMLogic will have to move between threads, which requires that it becomes Send. It in turn has required some other types to become not only Send but also Sync due to them currently being stored as a & reference (which allows for multiple copies, there are better places to explain why Sync becomes necessary here...)

I'm not sure if all of these types will continue requiring Sync. In particular TrieUpdate that's stored in RuntimeExt is now by reference, but I eventually want to also make VMLogic: 'static, which would require finding some owning pointer structure that would work for TrieUpdate... Or I might be able to use scoped threads... in which case we're looking at Sync anyway...

I think the changes here are largely straightforward enough, but overall things are shaping to be pretty involved, eh?

Part of #11319

In a world where we have pipelined compilation, instantiation and
execution, `VMLogic` will have to move between threads, which requires
that it becomes `Send`. It in turn has required some other types to
become not only `Send` but also `Sync` due to them currently being
stored as a `&` reference (which allows for multiple copies, there are
better places to explain why `Sync` becomes necessary here...)

I think the changes here are largely straightforward enough, but overall
things are shaping to be pretty involved, eh?
@nagisa nagisa requested review from Longarithm and Ekleog-NEAR June 20, 2024 15:54
@nagisa nagisa requested a review from a team as a code owner June 20, 2024 15:54
Copy link
Member

@Longarithm Longarithm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing ownership for TrieUpdate looks challenging. I didn't fully understand the plan though. Before execution it is not needed; if execution for the same shard will happen in parallel, then one must add support of parallel read-write access to prospective and committed changes.

@nagisa
Copy link
Collaborator Author

nagisa commented Jun 21, 2024

I don't anticipate parallel writes to TrieUpdate at all. The pipelined compilation/instantiation only needs access to potentially read out the contract code. While this may happen in a different thread, the timeline is strictly disjoint between threads seeing that specific TrieUpdate...

In fact ideally we wouldn't deal with TrieUpdates here at all, but instantiation requires us to construct a full-featured VMLogic that is going to be used for contract execution (including the entirety of the knowledge of how to read and write to the storage), so it kinda needs to exist early on...

@nagisa nagisa enabled auto-merge June 21, 2024 08:30
Copy link

codecov bot commented Jun 21, 2024

Codecov Report

Attention: Patch coverage is 79.14439% with 39 lines in your changes missing coverage. Please review.

Project coverage is 71.64%. Comparing base (e5ad448) to head (571a314).
Report is 11 commits behind head on master.

Files Patch % Lines
tools/state-parts-dump-check/src/cli.rs 0.00% 6 Missing ⚠️
tools/state-viewer/src/commands.rs 0.00% 5 Missing ⚠️
chain/client/src/debug.rs 0.00% 4 Missing ⚠️
chain/client/src/sync/state.rs 50.00% 0 Missing and 3 partials ⚠️
chain/epoch-manager/src/lib.rs 89.28% 2 Missing and 1 partial ⚠️
chain/chain/src/test_utils/kv_runtime.rs 80.00% 2 Missing ⚠️
core/store/src/trie/prefetching_trie_storage.rs 33.33% 2 Missing ⚠️
runtime/runtime/src/ext.rs 88.88% 2 Missing ⚠️
tools/fork-network/src/cli.rs 0.00% 2 Missing ⚠️
tools/state-viewer/src/latest_witnesses.rs 0.00% 2 Missing ⚠️
... and 8 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #11634      +/-   ##
==========================================
+ Coverage   71.46%   71.64%   +0.17%     
==========================================
  Files         788      787       -1     
  Lines      160921   160945      +24     
  Branches   160921   160945      +24     
==========================================
+ Hits       115008   115308     +300     
+ Misses      40893    40594     -299     
- Partials     5020     5043      +23     
Flag Coverage Δ
backward-compatibility 0.23% <0.00%> (-0.01%) ⬇️
db-migration 0.23% <0.00%> (-0.01%) ⬇️
genesis-check 1.36% <0.00%> (+<0.01%) ⬆️
integration-tests 37.76% <64.17%> (+0.14%) ⬆️
linux 69.05% <77.00%> (+0.15%) ⬆️
linux-nightly 71.11% <79.14%> (+0.15%) ⬆️
macos 52.64% <51.07%> (+0.06%) ⬆️
pytests 1.59% <0.00%> (+<0.01%) ⬆️
sanity-checks 1.39% <0.00%> (+<0.01%) ⬆️
unittests 66.21% <66.84%> (+0.08%) ⬆️
upgradability 0.28% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@Ekleog-NEAR Ekleog-NEAR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing the new locks makes me a bit sad, but we can come back to them later and see whether there’s a chance to remove them when we have more time :)

@@ -307,7 +306,7 @@ pub trait TrieStorage {
#[derive(Default)]
pub struct TrieMemoryPartialStorage {
pub(crate) recorded_storage: HashMap<CryptoHash, Arc<[u8]>>,
pub(crate) visited_nodes: RefCell<HashSet<CryptoHash>>,
pub(crate) visited_nodes: std::sync::RwLock<HashSet<CryptoHash>>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My gut feeling would have been Mutex to be a better match here, as it’s probably write-many-read-once. But I didn’t check all the uses and it’s likely unrelevant optimization anyway, so I’ll just leave that as an informational message if you want to think more about it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reasoning here was that RwLock is a direct match to RefCell with regards to its API surface (borrow => read/borrow_mut => write.) We may indeed be able to revert this eventually; we'll see.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM :)

core/store/src/trie/trie_tests.rs Show resolved Hide resolved
@nagisa nagisa added this pull request to the merge queue Jun 21, 2024
Merged via the queue into near:master with commit b6f6288 Jun 21, 2024
24 of 25 checks passed
@nagisa nagisa deleted the sends-RuntimeExt branch June 21, 2024 14:15
marcelo-gonzalez added a commit that referenced this pull request Jul 8, 2024
This was added in #11634 and
is required by the previous cherry pick of #11689 (36ab9db)
because otherwise the build fails
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants