This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Runtime worker threads #7089

Merged
63 commits merged into from
Oct 20, 2020

NikVolf
Contributor

@NikVolf NikVolf commented Sep 11, 2020

Closes #1459

  • examples
  • many more tests

@NikVolf NikVolf added A0-please_review Pull request needs code review. B7-runtimenoteworthy C3-medium PR touches the given topic and has a medium impact on builders. labels Sep 11, 2020

type StorageValue = Vec<u8>;

impl Externalities for AsyncExternalities {
Contributor Author

This part could be much simpler if we refactored storage access and stuff that is now in extensions into capability-based externalities.

Contributor

@pepyakin pepyakin left a comment

First brief review.

Resolved review threads (outdated):
  • primitives/io/src/tasks.rs (5 threads)
  • client/executor/src/native_executor.rs (3 threads)
Contributor

@cheme cheme left a comment

I started looking through this PR a bit, super nice stuff 👍
But I did not really follow how native is using AsyncExternalities (see comments).
I am also starting to wonder, since we are sharing a SpawnedNamed for handling the different calls: should we try to implement some sync at the end of the RuntimeInstanceSpawn lifetime (e.g. to kill sibling threads on a panic from with_externalities_safe)?
Similarly, should we wait for all threads before completion (if a thread panics but there is no join to wait for it, we could possibly get failure or success depending on the scheduling)?

Resolved review threads (outdated):
  • client/executor/runtime-test/src/lib.rs (2 threads)
  • client/executor/src/async_externalities.rs (3 threads)
  • primitives/io/src/lib.rs (1 thread)
  • primitives/io/src/tasks.rs (4 threads)
@NikVolf NikVolf added this to the 2.x series milestone Sep 15, 2020
NikVolf and others added 4 commits October 19, 2020 07:13
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
Member

@bkchr bkchr left a comment

Besides moving the stuff that was added to sp-io to a new crate


impl EntryPoint {
/// Call this entry point.
pub fn call(&self, data_ptr: Pointer<u8>, data_len: WordSize) -> anyhow::Result<u64> {
Member

You are converting it to a String on the calling side anyway. Instead of pulling in another dependency you could do the same here as well.

@NikVolf
Contributor Author

NikVolf commented Oct 20, 2020

bot merge

@ghost

ghost commented Oct 20, 2020

Trying merge.

@ghost ghost merged commit 1845278 into master Oct 20, 2020
@ghost ghost deleted the nv-parallel-runtime branch October 20, 2020 12:41
@xlc
Contributor

xlc commented Oct 20, 2020

Good to see this is merged. Looking forward to using this feature in our runtime!

Some questions:
What is the overhead of spawning a new runtime thread? We want to evaluate in which cases we can actually benefit from this.
What are the limitations of using a runtime thread? From my understanding, it only has access to the data that was passed into the worker, right? Will it be able to access other data, such as the Trait constants?
Any example use cases?
Any future plans to improve this, e.g. message passing between threads?
What happens if we don't join a thread? Can I spawn a thread in on_initialize and join it in on_finalize, i.e. a background worker thread running while the main thread continues processing transactions?

@kianenigma
Contributor

Any example use cases?

I am eager to try it out in the NPOS stuff.

Hard use cases are concurrent phragmen and concurrent feasibility check. But an easier use case is this: I have a PR ready for a new test called PJR check. Each solution should ideally be checked to be PJR and feasible. These two checks can execute purely in parallel afaik.

These are just off the top of my head; I haven't looked into them in detail yet, though.
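A minimal sketch of the "two independent checks in parallel" idea, using plain `std::thread` as a stand-in for runtime workers. The `Solution` type and both predicates are hypothetical placeholders, not the real feasibility or PJR checks.

```rust
use std::thread;

/// Hypothetical stand-in for a solution to be checked; not a Substrate type.
#[derive(Clone)]
struct Solution {
    assignments: Vec<u32>,
}

/// Toy "feasibility" check (assumption): every assignment is below a bound.
fn is_feasible(s: &Solution) -> bool {
    s.assignments.iter().all(|&a| a < 100)
}

/// Toy "PJR" check (assumption): a placeholder predicate, not the real PJR test.
fn is_pjr(s: &Solution) -> bool {
    !s.assignments.is_empty()
}

/// Run both checks in parallel and accept only if both pass.
fn check_solution(s: Solution) -> bool {
    // Each worker gets its own copy, so the two checks share no state.
    let for_pjr = s.clone();
    let feasible = thread::spawn(move || is_feasible(&s));
    let pjr = thread::spawn(move || is_pjr(&for_pjr));
    // Join both workers before combining; a panicking worker counts as a failed check.
    let feasible_ok = feasible.join().unwrap_or(false);
    let pjr_ok = pjr.join().unwrap_or(false);
    feasible_ok && pjr_ok
}
```

Because neither check reads the other's result, the order in which the workers finish does not affect the outcome.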

@NikVolf
Contributor Author

NikVolf commented Oct 21, 2020

What is the overhead of spawning a new runtime thread? We want to evaluate in which cases we can actually benefit from this.

At the moment it should be quite big, but with #7354 it should be about as much as any additional runtime call.

What are the limitations of using a runtime thread? From my understanding, it only has access to the data that was passed into the worker, right? Will it be able to access other data, such as the Trait constants?

Trait constants can change during a runtime upgrade, so they are not really constants.
As with any storage access, they are not available. AFAIK @cheme is working on ideas about data parallelism on top of the current low-level stuff.

Any future plans to improve this, e.g. message passing between threads?

We need to explore a deterministic story for message passing. But it is definitely on the list.

What happens if we don't join a thread? Can I spawn a thread in on_initialize and join it in on_finalize, i.e. a background worker thread running while the main thread continues processing transactions?

You can safely drop handles without joining threads.
And yes, you can do on_initialize spawn + on_finalize join.

Thanks for the great questions, I'll add examples/tests so that the answers are persisted.
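A minimal sketch of the spawn-then-join-later pattern discussed above. It models a worker as a plain function from bytes to bytes and uses `std::thread` as a stand-in; the `spawn`, `sum_bytes`, and `run_example` names here are illustrative assumptions, since the runtime's actual spawn API is not shown in this thread.

```rust
use std::thread::{self, JoinHandle};

/// Stand-in for a worker entry point: a plain function from bytes to bytes.
type EntryPoint = fn(Vec<u8>) -> Vec<u8>;

/// Spawn a worker with an owned payload (std::thread here, not the runtime API).
fn spawn(entry: EntryPoint, payload: Vec<u8>) -> JoinHandle<Vec<u8>> {
    thread::spawn(move || entry(payload))
}

/// Example worker: sums the payload bytes into a single wrapping byte.
fn sum_bytes(data: Vec<u8>) -> Vec<u8> {
    vec![data.iter().fold(0u8, |acc, b| acc.wrapping_add(*b))]
}

fn run_example() -> Vec<u8> {
    // Spawned early, joined later: the caller can keep working in between.
    let handle = spawn(sum_bytes, vec![1, 2, 3]);
    // Dropping a handle without joining simply detaches the worker in this model.
    drop(spawn(sum_bytes, vec![9, 9]));
    handle.join().expect("worker panicked")
}
```

The worker sees only the payload it was given, which mirrors the "no storage access" limitation described in the answers.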

@pepyakin
Contributor

First of all, I would say that the API is experimental and I don't think there are any guarantees about it.

What are the limitations of using a runtime thread? From my understanding, it only has access to the data that was passed into the worker, right? Will it be able to access other data, such as the Trait constants?

Yeah, you got it right. The storage is neither readable nor writable. Lifting this limitation would involve massive design work, as far as I understand. The workers share the same binary though, so they do have access to any code and data that reside within the binary. Source-level stuff like Trait constants should also be accessible.

Any example use cases?

It might be useful for concurrent and/or batch signature verification. Stateless contracts might also work (that is not even on the horizon though).

Any future plans to improve this, e.g. message passing between threads?

I think it would be cool if storage could be accessed, preferably mutably. That could be achieved near-term if a worker could only operate on an isolated child trie. Message passing: I feel it would be hard to use efficiently (I think the goal for the most efficient use is to make the workers as independent as possible while having the widest forks possible), but at the same time I feel there is potential to explore there.

What happens if we don't join a thread?

That's a good question. I think I'd prefer trapping in this case: since the workers are pure functions right now, not joining one is basically a no-op. If we ever get to giving them effects, we could lift this restriction easily and allow other behavior.

Can I spawn a thread in on_initialize and join it in on_finalize?

Well, yes and no, but mostly no. And this is a very good point. During block import it's possible, as long as you can carry the handles between on_initialize and on_finalize. But that won't work due to the fact that block building spawns a separate runtime instance for each call.
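A minimal sketch of the batch verification use case mentioned above, again with `std::thread` standing in for runtime workers. The `Signed` type and the XOR "signature" are hypothetical placeholders and not a real signature scheme; the point is only that each worker receives its own chunk and shares nothing with the others.

```rust
use std::thread;

/// Hypothetical item: a message and a claimed one-byte "signature".
#[derive(Clone)]
struct Signed {
    message: Vec<u8>,
    signature: u8,
}

/// Placeholder verification (assumption): signature equals the XOR of all message bytes.
fn verify(item: &Signed) -> bool {
    item.message.iter().fold(0u8, |acc, b| acc ^ b) == item.signature
}

/// Verify a batch by giving each worker its own owned chunk.
fn verify_batch(items: Vec<Signed>, workers: usize) -> bool {
    let workers = workers.max(1);
    // Ceiling division, so at most `workers` chunks are produced.
    let chunk_len = ((items.len() + workers - 1) / workers).max(1);
    let handles: Vec<_> = items
        .chunks(chunk_len)
        .map(|chunk| {
            let owned = chunk.to_vec();
            thread::spawn(move || owned.iter().all(verify))
        })
        .collect();
    // Join every worker (no short-circuit); the batch is valid only if all chunks are.
    handles
        .into_iter()
        .map(|h| h.join().unwrap_or(false))
        .fold(true, |acc, ok| acc && ok)
}
```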

@xlc
Contributor

xlc commented Oct 21, 2020

Thanks for the answers. They are really helpful.

Just one more comment, this will make weights & benchmarking very interesting...

@NikVolf
Contributor Author

NikVolf commented Oct 21, 2020

Just one more comment, this will make weights & benchmarking very interesting...

As long as you don't have unbounded parallelism and the reference machine used for benchmarking has a specified number of cores, benchmarks should be valid.
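A minimal sketch of what bounded parallelism could look like: the number of workers is capped by a constant rather than by the input size or the host's core count. `MAX_WORKERS` and its value are assumptions for illustration, and `std::thread` stands in for runtime workers.

```rust
use std::thread;

/// Assumed fixed upper bound, chosen to match the benchmarking reference machine.
const MAX_WORKERS: usize = 4;

/// Number of workers to use: never more than MAX_WORKERS, whatever the input size.
fn worker_count(items: usize) -> usize {
    items.min(MAX_WORKERS).max(1)
}

/// Sum of squares computed by a bounded number of workers.
fn sum_squares(values: Vec<u64>) -> u64 {
    let workers = worker_count(values.len());
    // Ceiling division, so at most `workers` chunks are produced.
    let chunk_len = ((values.len() + workers - 1) / workers).max(1);
    let handles: Vec<_> = values
        .chunks(chunk_len)
        .map(|chunk| {
            let owned = chunk.to_vec();
            thread::spawn(move || owned.iter().map(|v| v * v).sum::<u64>())
        })
        .collect();
    // Join all workers and combine their partial sums.
    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .sum()
}
```

With a fixed cap, the worst-case amount of parallel work is known up front, which is what makes a benchmark taken on a machine with a known core count meaningful.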

@NikVolf
Contributor Author

NikVolf commented Oct 21, 2020

But that won't work due to the fact that block building spawns a separate runtime instance for each call.

It can be fixed in principle (by keeping tasks alive during block production)

But anyway, the problem with persisting handles will probably not be solved until we ditch the native runtime

@cheme
Contributor

cheme commented Oct 21, 2020

What happens if we don't join a thread?

On that, I was thinking that forcing a join would be good (but it requires managing a pool of threads for the runtime call).

The case I don't like much is a panicking worker that is not joined: its extrinsic evaluation can then non-deterministically panic or not. One could even include a panicking extrinsic in a block by mistake.

@pepyakin
Contributor

How could it be non-deterministic? When the runtime finishes its execution there are two outcomes: one where the runtime joined the worker and one where it didn't. Which outcome takes place depends solely on the actions of the runtime, which is assumed to be deterministic.

Sure, strictness is not necessary, but there are zero reasons to want to skip the join, so if that happens it is certainly a programming error, which should be reported ASAP IMO. This would also prevent users from relying on automatic joining, leaving us the possibility to give this event our own semantics later.
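A minimal sketch of the strict option argued for above: a handle that treats being dropped without a join as a programming error. `MustJoin` is a hypothetical type, the panic is only a stand-in for a runtime trap, and `std::thread` stands in for runtime workers.

```rust
use std::thread::{self, JoinHandle};

/// Hypothetical strict handle: dropping it without joining is treated as a bug.
struct MustJoin<T> {
    inner: Option<JoinHandle<T>>,
}

impl<T: Send + 'static> MustJoin<T> {
    fn spawn<F>(f: F) -> Self
    where
        F: FnOnce() -> T + Send + 'static,
    {
        MustJoin { inner: Some(thread::spawn(f)) }
    }

    /// Joining takes the inner handle, so the drop check passes afterwards.
    fn join(mut self) -> T {
        let handle = self.inner.take().expect("handle is present until joined");
        handle.join().expect("worker panicked")
    }
}

impl<T> Drop for MustJoin<T> {
    fn drop(&mut self) {
        // Stand-in for a trap; skipped while already unwinding to avoid an abort.
        if self.inner.is_some() && !thread::panicking() {
            panic!("worker handle dropped without join");
        }
    }
}
```

Under this design a forgotten join fails loudly on every execution, so it cannot be relied on as an implicit behavior.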

@cheme
Contributor

cheme commented Oct 21, 2020

I was wrong about the panicking behavior: it all runs behind a panic handler, so that is fine.

I still think we should either join or terminate workers early :)

This pull request was closed.
8 participants