
Support manual pool termination #299

Open

sasa1977 wants to merge 3 commits into main

Conversation

@sasa1977 commented Feb 6, 2025

This PR introduces Finch.stop_pool/2, which can be used to manually stop the connection pool of a given SHP (scheme/host/port).
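A minimal usage sketch (the Finch instance name and URL are illustrative, and the exact shape of the second argument and the return value of stop_pool/2 are my assumptions, not taken from this PR):

```elixir
{:ok, _} = Finch.start_link(name: MyFinch)

request = Finch.build(:get, "https://internal.example.com/health")
{:ok, _response} = Finch.request(request, MyFinch)

# Once we know no further requests will target this SHP, stop its pool.
# (Argument shape and return value assumed for illustration.)
Finch.stop_pool(MyFinch, "https://internal.example.com")
```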

We have tried it out in production; it works fine and helped us fix our process leak.

See #294 for details.

It's worth noting that this function might not be enough on its own. The client needs to make sure that no request to the given SHP is taking place while the pool is being stopped, which can actually be tricky. The simple solution of channeling all requests through a single process means that you can only have one request at a time to the given SHP, which might be a problem.

In our case we've implemented a basic reference-counted GC mechanism, which tracks all the processes currently making a request to the given SHP. Once no requests are taking place anymore, the pool is stopped.
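This is not code from the PR, but a rough sketch of that kind of reference-counted tracker, with hypothetical names (PoolTracker, on_stop) and without monitoring of crashed callers:

```elixir
defmodule PoolTracker do
  # Hypothetical module, not part of Finch or of this PR: a rough sketch of
  # the reference-counted tracking described above.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  # Called by a client process right before it issues a request to the tracked SHP.
  def checkout, do: GenServer.call(__MODULE__, :checkout)

  # Called by the client process once its request has finished.
  def checkin, do: GenServer.call(__MODULE__, :checkin)

  # Ask that the pool be stopped as soon as no requests are in flight.
  def stop_when_idle, do: GenServer.call(__MODULE__, :stop_when_idle)

  @impl true
  def init(opts),
    do: {:ok, %{active: 0, stop_requested?: false, on_stop: Keyword.fetch!(opts, :on_stop)}}

  @impl true
  def handle_call(:checkout, _from, state),
    do: {:reply, :ok, %{state | active: state.active + 1}}

  def handle_call(:checkin, _from, state),
    do: {:reply, :ok, maybe_stop(%{state | active: state.active - 1})}

  def handle_call(:stop_when_idle, _from, state),
    do: {:reply, :ok, maybe_stop(%{state | stop_requested?: true})}

  # When a stop was requested and nothing is in flight, run the stop callback,
  # which could for instance call the Finch.stop_pool/2 proposed in this PR.
  defp maybe_stop(%{active: 0, stop_requested?: true} = state) do
    state.on_stop.()
    %{state | stop_requested?: false}
  end

  defp maybe_stop(state), do: state
end
```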

I'd be open to submitting a PR for something like that, but a proper implementation would be much more complicated, because it has to handle the differences between the HTTP/1 and HTTP/2 pool implementations, especially for async requests.

@sneako (Owner) commented Feb 12, 2025

Thank you @sasa1977!

It seems that this implementation will only shut down ONE pool, but Finch allows you to start many pools for a scheme/host/port.
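(For context, the multiple-pools behaviour comes from the `count` option in Finch's pool configuration; the instance name and URL below are illustrative:)

```elixir
{:ok, _} = Finch.start_link(
  name: MyFinch,
  pools: %{
    # count: 2 starts two independent pools for this scheme/host/port
    "https://internal.example.com" => [size: 25, count: 2]
  }
)
```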

What exactly happens if you shut down a pool with an ongoing request? I suppose we could consider sending a terminate message to each pool process instead of immediately telling the supervisor to terminate the child.

If a user is ok with sending all of their requests through a single process, then there is no reason for them to be using Finch in the first place; they should probably just use Mint directly in that case.

I don't really like the idea of adding reference counting. It's probably minimal overhead, but even just a little bit of overhead is not worth it to support this feature IMO as it goes against the original design goals of the library.

@sasa1977 (Author)

> It seems that this implementation will only shut down ONE pool, but Finch allows you to start many pools for a scheme/host/port.

Ah, good catch! I can adapt the implementation to stop them all, assuming you're still ok with adding support for this.

> What exactly happens if you shut down a pool with an ongoing request? I suppose we could consider sending a terminate message to each pool process instead of immediately telling the supervisor to terminate the child.

I'm not really sure, but I suppose the likely outcome is a failed request, or some kind of process crash, since the pool is taken down while the request is running.

> If a user is ok with sending all of their requests through a single process, then there is no reason for them to be using Finch in the first place; they should probably just use Mint directly in that case.

Finch provides additional value with a much simpler API. Another benefit is that we can use req for more convenient goodies and features. Reimplementing all of this with Mint would require a lot of extra work, not to mention that the implementation would not be as mature, stable, or feature-rich as what Finch + req can give us.

Furthermore, in our case we need this behaviour only for some SHPs (essentially short-lived internal servers which are dynamically started in our cluster, à la FLAME). For the other SHPs we don't stop the pool, so we don't need to channel those requests through a single process.

Finally, it's worth noting that we're not channeling the actual requests through that single process. Only an initial process registration is performed in this fashion (before the request is issued). The requests themselves are issued concurrently, and deregistration is again channeled through the single process.
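Using the hypothetical PoolTracker sketched earlier, each caller would wrap its request roughly like this; only the checkout/checkin calls go through the tracker process, while the request itself runs concurrently in the caller's process (url and MyFinch are placeholders):

```elixir
:ok = PoolTracker.checkout()

try do
  Finch.build(:get, url)
  |> Finch.request(MyFinch)
after
  # Always deregister, even if the request raised.
  :ok = PoolTracker.checkin()
end
```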

> I don't really like the idea of adding reference counting. It's probably minimal overhead, but even just a little bit of overhead is not worth it to support this feature IMO as it goes against the original design goals of the library.

That's fair enough; I just thought I'd mention it.

@sneako (Owner) commented Feb 12, 2025

Gotcha!

I'm definitely still on board with adding this feature as long as you update it to terminate all pools associated with a SHP.

I don't think it would be all that complicated to implement this as sending a message to all associated pools asking them to terminate, which would probably be more of a graceful shutdown. But I'm also fine with the current immediate-termination approach, as long as we mention in the docs that terminating a pool that is actively being used could cause some errors, if that sounds reasonable to you.

@sasa1977 (Author)

> as long as we mention in the docs that terminating a pool that is actively being used could cause some errors

I've already mentioned something like that in the docs:

> Note that this function is not safe with respect to concurrent requests. Invoking it while another request to the same SHP is taking place might result in the failure of that request. It is the responsibility of the client to ensure that no request to the same SHP is taking place while this function is being invoked.

Is that enough, or do you need something else?

@sneako (Owner) commented Feb 12, 2025

Right, of course. That's perfect.

@sasa1977 (Author)

> I don't think it would be all that complicated to implement this as sending a message to all associated pools asking them to terminate

This could actually lead to subtle race conditions, because the pool might only be stopped later, after another request has already started. So we really need a synchronous stop.
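A small sketch of the distinction, with placeholder names (pool_pid, pool_sup, pool_id); this only illustrates async versus sync termination in general, not Finch internals:

```elixir
# Asynchronous: returns immediately, so the caller may issue a new request
# against a pool that is only about to be stopped.
send(pool_pid, :terminate)

# Synchronous: returns only once the child has actually been terminated,
# so afterwards no pool for this SHP is still accepting requests.
:ok = Supervisor.terminate_child(pool_sup, pool_id)
```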

@sneako (Owner) commented Feb 12, 2025

> > I don't think it would be all that complicated to implement this as sending a message to all associated pools asking them to terminate
>
> This could actually lead to subtle race conditions, because the pool might only be stopped later, after another request has already started. So we really need a synchronous stop.

Good point! Ok let's roll with the sync route.

Without start_pool_metrics?: true, the final assertion always succeeds. With this option added, the assertion may occasionally fail if run immediately, due to asynchrony (the process may not have been deregistered yet).
@sasa1977 (Author)

> I'm definitely still on board with adding this feature as long as you update it to terminate all pools associated with a SHP.

Did it. Could you please take a look when you find the time?
