Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use the Prometheus Registry behind Mutex to share mutable references #2518

Closed
wants to merge 1 commit into from

Conversation

ParthDesai
Copy link
Contributor

@ParthDesai ParthDesai commented Feb 8, 2024

Description

This PR modifies Prometheus registry usage to be behind Arc<Mutex. This allows clients like Pulsar to use one registry for both Farmer and Node instance.

I considered having simply Option<&mut Registry> but since we eventually use Arc<Mutex for RegistryAdapter and the mutable reference will need to be around longer (since it is running in background task), it's better to have Arc<Mutex at Registry and remove it from RegistryAdapter.

Code contributor checklist:

Copy link
Member

@nazar-pc nazar-pc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered having simply Option<&mut Registry> but since we eventually use Arc<Mutex for RegistryAdapter and the mutable reference will need to be around longer (since it is running in background task), it's better to have Arc<Mutex at Registry and remove it from RegistryAdapter.

I'm not sure it is better. What background task are you talking about exactly?

In fact I don't believe Arc<Mutex<SharedRegistry>> is necessary in the first place, it should be changed to Arc<SharedRegitry> since I do not see or expect any mutation there. Also unsafe usage there is not necessary either and should be removed.

@ParthDesai
Copy link
Contributor Author

ParthDesai commented Feb 8, 2024

I considered having simply Option<&mut Registry> but since we eventually use Arc<Mutex for RegistryAdapter and the mutable reference will need to be around longer (since it is running in background task), it's better to have Arc<Mutex at Registry and remove it from RegistryAdapter.

I'm not sure it is better. What background task are you talking about exactly?

In fact I don't believe Arc<Mutex<SharedRegistry>> is necessary in the first place, it should be changed to Arc<SharedRegitry> since I do not see or expect any mutation there. Also unsafe usage there is not necessary either and should be removed.

I am talking about the start_prometheus_worker background task. While monorepo does not need to use registry after it is passed to the worker, in pulsar since we run farmer after node, the reference that is passed to the node builder does need to be passed to the farmer, where FarmerMetrics expects a mutable reference and then it is passed to the farmer's Prometheus worker.

Reason why we have two prometheus worker is because the node's prometheus worker is started inside the service and I cannot customize/combine it.

@ParthDesai
Copy link
Contributor Author

I considered having simply Option<&mut Registry> but since we eventually use Arc<Mutex for RegistryAdapter and the mutable reference will need to be around longer (since it is running in background task), it's better to have Arc<Mutex at Registry and remove it from RegistryAdapter.

I'm not sure it is better. What background task are you talking about exactly?
In fact I don't believe Arc<Mutex<SharedRegistry>> is necessary in the first place, it should be changed to Arc<SharedRegitry> since I do not see or expect any mutation there. Also unsafe usage there is not necessary either and should be removed.

I am talking about the start_prometheus_worker background task. While monorepo does not need to use registry after it is passed to the worker, in pulsar since we run farmer after node, the reference that is passed to the node builder does need to be passed to the farmer, where FarmerMetrics expects a mutable reference and then it is passed to the farmer's Prometheus worker.

Reason why we have two prometheus worker is because the node's prometheus worker is started inside the service and I cannot customize/combine it.

Maybe using RwLock might be more efficient than Mutex.

@nazar-pc
Copy link
Member

nazar-pc commented Feb 8, 2024

To be clear, what you are proposing works. However it is a hack on top of another hack and increases technical debt, which is why I'm looking for ways to improve situation instead and upgrade architecture to something better.


I am talking about the start_prometheus_worker background task. While monorepo does not need to use registry after it is passed to the worker, in pulsar since we run farmer after node, the reference that is passed to the node builder does need to be passed to the farmer, where FarmerMetrics expects a mutable reference and then it is passed to the farmer's Prometheus worker.

Not sure what is start_prometheus_worker, but I'm quite sure you can start it after both node and farmer started. Also you can create FarmerMetrics early and pass it down to farmer that starts later. Either way you don't need mutable registry in two places at the same time.

Reason why we have two prometheus worker is because the node's prometheus worker is started inside the service and I cannot customize/combine it.

That is incorrect as well, it should be pulled out of there into subspace-node. I think the reason it is done that way right now is because if we don't pass down prometheus config then Substrate's registry will not be created (but we want it to), however if we do pass it down then sc_service::spawn_tasks will start its own server (but we don't want this to happen, though we'll be able to fix that with upcoming Substrate update thanks to paritytech/polkadot-sdk#3166). So we do pass prometheus config down, but pull it out of the configuration just before handing it over to sc_service::spawn_tasks, which while is a hack, does achieve what we want.

So we need to pull it out of the config, but leave to subspace-node to create (or not) the server itself. This will also remove subspace-metrics from subspace-service's dependencies.

Also we can try to construct Substrate's registry explicitly ourselves, but need to check carefully whether something in Substrate internals tries to access registry via config (I hope there isn't anything, but I didn't check). Then we can avoid having prometheus config.

Maybe using RwLock might be more efficient than Mutex.

Should be strictly worse unless under lock contention, which I don't expect to be the case in this case. Though for the same exact reason it doesn't matter what is used.

@ParthDesai
Copy link
Contributor Author

ParthDesai commented Feb 8, 2024

To be clear, what you are proposing works. However it is a hack on top of another hack and increases technical debt, which is why I'm looking for ways to improve situation instead and upgrade architecture to something better.

I am talking about the start_prometheus_worker background task. While monorepo does not need to use registry after it is passed to the worker, in pulsar since we run farmer after node, the reference that is passed to the node builder does need to be passed to the farmer, where FarmerMetrics expects a mutable reference and then it is passed to the farmer's Prometheus worker.

Not sure what is start_prometheus_worker, but I'm quite sure you can start it after both node and farmer started. Also you can create FarmerMetrics early and pass it down to farmer that starts later. Either way you don't need mutable registry in two places at the same time.

Reason why we have two prometheus worker is because the node's prometheus worker is started inside the service and I cannot customize/combine it.

That is incorrect as well, it should be pulled out of there into subspace-node. I think the reason it is done that way right now is because if we don't pass down prometheus config then Substrate's registry will not be created (but we want it to), however if we do pass it down then sc_service::spawn_tasks will start its own server (but we don't want this to happen, though we'll be able to fix that with upcoming Substrate update thanks to paritytech/polkadot-sdk#3166). So we do pass prometheus config down, but pull it out of the configuration just before handing it over to sc_service::spawn_tasks, which while is a hack, does achieve what we want.

So we need to pull it out of the config, but leave to subspace-node to create (or not) the server itself. This will also remove subspace-metrics from subspace-service's dependencies.

Also we can try to construct Substrate's registry explicitly ourselves, but need to check carefully whether something in Substrate internals tries to access registry via config (I hope there isn't anything, but I didn't check). Then we can avoid having prometheus config.

Maybe using RwLock might be more efficient than Mutex.

Should be strictly worse unless under lock contention, which I don't expect to be the case in this case. Though for the same exact reason it doesn't matter what is used.

Apart from FarmerMetrics, node also uses it while creating the SubspaceNetworking node (it requires Option<&mut Registry>. That is why just using it behind Arc is difficult. By start_prometheus_worker, I am referring to the background worker that keeps the prometheus server running and exposes /metrics endpoint.

Okay. Let's improve the architecture here, I can work on this.

@nazar-pc
Copy link
Member

nazar-pc commented Feb 8, 2024

Apart from FarmerMetrics, node also uses it while creating the SubspaceNetworking node (it requires Option<&mut Registry>.

Right, you just need to pass Option<&mut Registry> everywhere and start your metrics server at the very end. It is not possible right now, but possible with a bit of effort in general. We just didn't consider Pulsar use case previously when designing API and Space Acres doesn't start metrics server, so it wasn't an issue there either.

@ParthDesai ParthDesai closed this Feb 8, 2024
@nazar-pc nazar-pc deleted the fix-registry-ref branch February 8, 2024 16:00
@nazar-pc nazar-pc mentioned this pull request Feb 8, 2024
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants