Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Deferred Service Builder #4587

Closed
gnunicorn opened this issue Jan 10, 2020 · 18 comments
Closed

Deferred Service Builder #4587

gnunicorn opened this issue Jan 10, 2020 · 18 comments
Assignees
Labels
J0-enhancement An additional feature request.
Milestone

Comments

@gnunicorn
Copy link
Contributor

Right now, the ServiceBuilder creates the client, backend and all else right at the beginning and executes the configured functions directly. This has some implicatations, we do not want:

  1. you can out the ServiceBuilder into an unexpected/unrecommended State by calling a service method multiple times with different callbacks. As these callbacks typically have side-effects, this can lead to inconsistent and unreliable cases–even to race conditions if, for example, an authoring mechanism was initiated.
  2. some of these functions actually conflict and shouldn''t both be called. However right now, we have no way of preventing that.
  3. It is up to the user to call the functions in the right order, which is unreliable and an unnecessary foodgun – especially if we had to change that order.
  4. We can't provide sane defaults – like the longest chain selector or the light client transaction pool, both are the same for all known usages of our node-template – without requiring to the do 1 and cause unexpected side effects.
  5. Certain aspects, like the transaction pool or externality extensions, are actually created during client creation and thus have a hacky API so we can change them after the fact, see Allow node to add externalities extensions #4583.

Instead the pattern should only keep the references to the fns and execute all in one go on .build only. This would allow us to provide defaults, enforce the order in which set up takes place and would ensure any fn is only called once.

Further more this might allow us to clean up the api, so you can have one global builder and then call build_light and build_full on them rather than having to specify two builder...

@gnunicorn gnunicorn added the J0-enhancement An additional feature request. label Jan 10, 2020
@gnunicorn gnunicorn added this to the 3.0 milestone Jan 10, 2020
@gnunicorn gnunicorn self-assigned this Jan 10, 2020
@tomaka
Copy link
Contributor

tomaka commented Jan 15, 2020

The original idea behind the ServiceBuilder is that the callbacks would be completely removed.
It doesn't make sense to me that the user has to call a function and pass a closure, only for this closure to be immediately called by that function and that's it.

After all, the parameters that the builder passes to the stack are very arbitrary. If for instance you want to use an implementation of a TxPool that requires a reference to the network, you would need to modify the ServiceBuilder to pass the NetworkService to the closure that creates the TxPool.

I'm not sure how feasible that is, but my original idea was to create everything on the stack, and then pass that to the ServiceBuilder as values. Without having any closure/callback involved.

@cecton
Copy link
Contributor

cecton commented Mar 30, 2020

This is mine now

Cat owning food

@cecton
Copy link
Contributor

cecton commented Apr 2, 2020

I got an idea. Let me know if that sounds good:

  • I realized that what the ServiceBuilder will do after the change is: a. create a struct with whatever you pass to it; b. provide functions to add all the builders; c. and finally allow you to build() the service or client and the queue (which is the relevant part).
  • The idea of @tomaka to remove all the builders makes perfect sense to me.
  • @gnunicorn told me we should actually drop the concept of client.

This makes me conclude that I can probably drop the entire ServiceBuilder and replace it by a few functions (build_service, build_for_import, build_client). This will also simplify greatly the typing.

@cecton
Copy link
Contributor

cecton commented Apr 17, 2020

So I have to drop it for now to focus on cumulus... I did some progress as I found out that the builder itself don't actually help much and can be entirely removed instead of making it defer the calls to the with_* methods.

This is actually more flexible AND it doesn't reduce at all the readability nor the simplicity in the service.rs files that were using it.

@gnunicorn
Copy link
Contributor Author

I suppose the only way to actually reduce the code for it would be by using something like derive_builder.

@tomaka
Copy link
Contributor

tomaka commented Apr 27, 2020

To give my two cents: I don't even know why there is a Service struct at all.

The Service as it exists is a complete pot-pourri of multiple design attempts.

As far as I know, it was initially supposed to be the container that contains all the sub-components, and you would create it in a kind-of-declarative way with a macro.

During the six months or so when I worked on it, to me it was supposed to coordinate the various components of Substrate (#2536), but that is obviously not the case, as these components "coordinate each other" in a spaghetti-ish way at the moment (which also contributes to all the issues with unbounded channels that we have, but that's off-topic). Actually reworking that without rewriting Substrate is too big of a task at this point.

Now, almost the only thing that sc-service does is plug the Legos together at initialization. Once initialization is done, the only reason why the service exists at all is to pass to the informant and to detect whether essential tasks have crashed (both of which can be very easily implemented differently).

@expenses
Copy link
Contributor

expenses commented Jun 9, 2020

Hey, I'm gonna be taking over this issue I think. I've made a new PR: #6309.

@expenses
Copy link
Contributor

expenses commented Jun 9, 2020

@gnunicorn do you think you could assign this to me please?

@expenses
Copy link
Contributor

expenses commented Jun 9, 2020

I definitely agree with @tomaka that the Service shouldn't really exist and that it would be much better to have a few smaller structs that do it's job.

I've been using wgpu-rs a fair bit recently, and I think it has a really nice API for constructing complex things (swap chains, rendering pipelines etc compared to our import queues and finality proof providers). It puts everything in the top level of the crate so you don't have to go looking in a specific module for what you want. I think it would be good to have a kind of 'substrate prelude' crate that does the same thing.

It also uses 'descriptor' structs for a lot of the initialization, which would be a lot better than the following, which is in the current PR:

sc_service::build(
	config,
	client,
	backend,
	task_manager,
	keystore,
	None,
	Some(select_chain),
	import_queue,
	Some(finality_proof_request_builder),
	Some(finality_proof_provider),
	transaction_pool,
	(),
	Some(remote_blockchain),
	background_tasks,
	None,
	Box::new(|_| ()),
	String::new()
)

@gnunicorn
Copy link
Contributor Author

gnunicorn commented Jun 9, 2020

I am not generally opposed to getting rid of service, but one important distinction needs to be made and seen here: substrate is not a library but rather a framework of crates, many of which are optional but can – under specific conditions – work together nicely. Just think of all different consensus authoring mechanism.

Secondly, this also means it isn't being used directly, but rather plugged together by third party developers to their needs – and this is exactly where what is currently sc-service comes in, as it provides tooling to plug things together in an easier, more reliable and convenient way than if you had to do it all by yourself. But surely you can do everything that sc-service does without it – no other crate imports it anymore (other than for testing) and relies upon it, only on APIs defined in sc-api.

Thus, the way I currently think of it going forward is less of just expose everything libraries, but a tokio-0.2-meta-crate, where features becomes available through setting them in Cargo.toml, guarding all potential internal semver-breakage and offer convenience functionality would help the developers build the chain they want. But also allowing anyone to wrap the helper into their own struct(s) and do things ontop or do all the things without it entirely.

@expenses
Copy link
Contributor

So I've been thinking about the systems that we can replace the service builder with, and I've come up with two that I think are alright.

  1. The first is to remove the service builder completely, and have the user build the components needed to call the sc_service::build function up with lots of let bindings, as I mentioned in Deferred Service Builder #4587 (comment). I've created a branch for this here: Remove the service builder #6557. This is appealing to me because, as it's defined entirely in pretty simple code, it's super customizable and a lot clearer than the service builder was. It has some downsides though:

1.a. There's no way to make sure the full node service and light node service builds are compatible, meaning that someone could try using a BABE block importer for the full node and an AURA block importer for the light node and they either just wouldn't work together or, worse, cause something weird to happen. This was a potential problem with the service builder as well, but it'd be nice to solve it, ideally by having a single build system that you can specialize into full node/light node after you've defined everything.

1.b. The generics can get very very complicated, as lots of things depend on a client type, which itself depends on the block type, etc. There are ways around this problem though:

1.b.i. We setup in type definitions in the node/node-template service.rs files. This works, but it's a bit too verbose for me, and I'd rather find a way where end-users don't have to worry about those definitions.

1.b.ii. We create a macro that takes a Block, RuntimeApi and Executor and generates the types for us. I'd be happy with this but it might go against our goal of trying to avoid macros as much as possible.

1.b.iii. We do this t:rick

pub trait PreludeT {
	type FullClient;
	type LightClient;
	type FullBackend;
	...
}

pub struct Prelude<Block, RuntimeApi, Executor>(std::marker::PhantomData<(Block, RuntimeApi, Executor)>);

impl<Block, RuntimeApi, Executor> PreludeT for Prelude<Block, RuntimeApi, Executor> {
	type FullClient = TFullClient<Block, RuntimeApi, Executor>;
	type LightClient = TLightClient<Block, RuntimeApi, Executor>;
	type FullBackend = TFullBackend<Block>;
	...
}

which can then be used like:

type Prelude = sc_service::Prelude<MyBlock, MyRuntimeApi, MyExecutor>;

fn whatever() -> <Prelude as PreludeT::FullClient> {
    ...
}

This would be a lot easier with Inherent associated types.

  1. A second idea that I had in this conversation: Remove the service builder #6557 (comment) was to reimagine the service builder as a Builder trait with type definitions for the different components. You'd implement this trait for your own zero-sized-type and would be able to call build_full/build_light/build_ops on it. I've got an implementation of this here: [WIP] Replace the service builder with an experimental builder trait #6600. This solves 1.a because the component types are intended to work for both full and light nodes, and 1.b because if needed, you can retrieve the types from the Builder trait. This looks something like the following:
pub struct Builder;

impl sc_service::builder::Builder for Builder {
	type Block = Block;
	type RuntimeApi = RuntimeApi;
	type Executor = Executor;
	type TransactionPoolBuilder = BasicPoolBuilder;
	type BlockImportBuilder = GrandpaBlockImportBuilder<Self::SelectChainBuilder>;
	type ImportQueueBuilder = AuraImportQueueBuilder<Self::BlockImportBuilder>;
	type FinalityProofProviderBuilder = GrandpaFinalityProofProviderBuilder;
	type SelectChainBuilder = LongestChainBuilder;
	type RpcExtensions = NoRpc<Self>;
}

This would work really well in the simple cases, where you have a defined builder type for everything you want.

2.a. The main problem with this is that we're defining things in terms of types, not code. While it's pretty extendible as you can replace any of the types or the build_x functions with your own implementations, the whole thing might still be a bit too strict. If you wanted to use your custom import queue type in this builder, you'd have to create your your own builder type and implement ImportQueueBuilder for it. This could be difficult or impossible, as the ImportQueueBuilder functions have a fixed set of arguments (i.e. a client, a task manager, ..) that are passed to them. It'd be hard to change this so that you can pass in a different set of arguments instead.

Any feedback on these ideas would be much appreciated!

@expenses
Copy link
Contributor

There's also nothing stopping us from removing the service builder now, then adding my builder trait later.

@expenses
Copy link
Contributor

expenses commented Jul 10, 2020

@seunlanlege @joshua-mir @bkchr I'm interested in hearing your thoughts as well!

@tomaka
Copy link
Contributor

tomaka commented Jul 10, 2020

There is a pretty huge hack in sc-network at the moment.
sc-network needs to know the list of protocols that the node supports. Normally, one would pass that list as part of NetworkConfiguration, but because of the general structure of the code we just pass an empty list of protocols, and a hacky register_notifications_protocol method is provided that lets you add more protocols even after the network has actually started.

Not only the code in sc_network needs to be adjusted for this hack to work, but it is completely possible right now that sc-network connects to the bootnodes, then doesn't do everything that it is supposed to do because the rest of the Substrate codebase hasn't registered its protocols yet.
register_notifications_protocol is a complete hack and should not exist.

@tomaka
Copy link
Contributor

tomaka commented Jul 10, 2020

I'm not asking for this problem to be solved, but I would like people to be aware that this situation exists, and that we need some structure that would make it possible to solve this problem eventually.

@bkchr
Copy link
Member

bkchr commented Jul 13, 2020

Can we not just almost entirely drop the service builder and just have normal function that we call to initialize the service? Something like:

fn build() -> TaskManager {
    start_transaction_pool;
    do_something_else;

}

Did you tried something like this?

I think we need to get rid off these traits or whatever that try to force a certain structure of the service. In the end the service is just futures that are spawned and that is something we don't need to combine in struct.

@expenses
Copy link
Contributor

I'm going to remove the service builder now, and if we want to go through with the builder trait later, we'll do that.

@expenses
Copy link
Contributor

We can probably go ahead and close this issue now that #6557 has been merged.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
J0-enhancement An additional feature request.
Projects
None yet
Development

No branches or pull requests

6 participants