-
Notifications
You must be signed in to change notification settings - Fork 439
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix - Add automatic reconnect support for STOMP producers #1099
Fix - Add automatic reconnect support for STOMP producers #1099
Conversation
Yes, indeed. You should not commit lock file in libraries in contrast to application. |
Hmm, I always lock my dependencies regardless of project type, but no matter! 😄 I'm going through some additional testing right now and I'm noticing one other issue with I'll spend some more time today testing and will report back. |
@makasim - So after looking a little deeper into it, it looks like there's another issue with The sequence looks roughly like:
Originally I thought I've added another improvement to the PR which we can discuss further, it unfortunately has the impact of requiring one connection per message. Again though, I'm not sure how to workaround this and in fairness, I can see how this hasn't been observed until now because I could see the majority of projects using this library only dispatching from short-lived web requests. |
To find out that connection is closed ( by the server) you have to read from the socket. You will get EOF or timeout. But you cannot read before you write in client to server commutation. There is no way to detect a closed connection by writing to it. Timeout is the only option. Is there a way to set write timeout ? That could be used as a sign of closed connection and trigger reconnect. |
Oh thanks! I'll tinker around with your suggestion and reply here with what I figure out. Definitely need to be able to come up with a way to gracefully handle this. I'll see how long I have to go before something has to happen upstream 😆 |
After spending a fair bit of time on it, I've been unable to come up with a way to jog or test the connection. Can you think of any concrete solutions that might get around this issue? From my vantage point, I see only a few options:
IMO, I feel like the 3rd option is the most "predictable" philosophy and also aligns message publishing with how it's functioning right now during web requests. |
Just a note in case you're at things early today, will be pushing up a new approach that I think I have working really nicely here... |
pkg/stomp/StompContext.php
Outdated
*/ | ||
public function __construct($stomp, string $extensionType) | ||
public function __construct($stomp, string $extensionType, $transient = true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Connections start out as "transient" which means that they will use some new reconnect behaviour below. I have it here in the constructor as configurable if for whatever reason a connection needs to start out long-lived.
@@ -173,6 +180,8 @@ public function createConsumer(Destination $destination): Consumer | |||
{ | |||
InvalidDestinationException::assertDestinationInstanceOf($destination, StompDestination::class); | |||
|
|||
$this->transient = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As soon as the first consumer is created against a context, we convert to long-lived and disable all transient functionality so that queues with heartbeats don't lose their connections while heartbeating.
pkg/stomp/StompContext.php
Outdated
if ($this->transient && true == $this->stomp) { | ||
$this->stomp->disconnect(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only happens when a connection is not being used for consumption.
Triggering a disconnect prior to a producer being used is enough, as the underlying stomp-php
library can safely rely on its false
state around connections. Connections recover without any additional involvement from enqueue.
private function createStomp(): BufferedStompClient | ||
{ | ||
$stomp = call_user_func($this->stompFactory); | ||
|
||
if (false == $stomp instanceof BufferedStompClient) { | ||
throw new \LogicException(sprintf('The factory must return instance of BufferedStompClient. It returns %s', is_object($stomp) ? get_class($stomp) : gettype($stomp))); | ||
} | ||
|
||
return $stomp; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hope you don't mind, I moved this out. 😄
@makasim - Updates made and I'm really happy with this, it's working great and has greatly improved the stability and flexibility of enqueue STOMP! 😃 |
To clarify: You have to connections: one for consumers and another for publisher. The problem happens when you want to publish a message from a consumer ? |
If so, It might be better to reconnect publisher connection only once before processing a message (assuming that message processing does not take long). |
Hmm, let's see if I can do this justice 😆 I think part of the issue is around the current division of responsibilities. We need to do the correction at the
For the current design:
In theory I could add a Apologies, I'm hopefully explaining this correctly and in such a way that does the current design justice 😆 |
I am not that much into Laravel specifics. In general the idea is next
Once you get a new message do the same. |
If I understand what you're saying correctly, that flow sounds like it's for receiving messages? Which isn't a problem in my scenario. The issue I'm observing is when I try to send to a queue ( I have no reason to open and hold a connection open to the destination queue ( FWIW, this could be an issue without Laravel involved, there's nothing specific to it here. Moreover, I think what I'm adding more explicitly reflects how people use Enqueue during regular web request lifecycle. It just happens to ensure that semantic is now consistently offered for long running processes as well. Sorry, one more edit, but I'm also more than happy to do a screen share to show the difference. Sometimes a picture/video is way better for understanding what's going on. |
@makasim - Is there anything specific I can do to help get this merged? At this point my testing locally has proven the fix in this MR to be quite robust for long lived processes. 😄 |
Hey @makasim, just checking in again. I'd love to get this merged so that we'll be able to use STOMP from PHP in some upcoming work. Everything is still working very smoothly with my local version of this fix. |
Honestly, I am not completely sold on this solution. Let's discuss the issue over a call. What about tomorrow? |
Absolutely, I can definitely do that, I'm in the CDT timezone, happy to accommodate, what's your preference? 😄 It's worth noting that the transient flag is only in effect when the connection is going to be short-lived anyway. So there's very little risk of this breaking consumers. It's just causing the connection to recycle prior to use for non-consumer usage. |
@makasim - If you're still available and unsure about this, I followed you on twitter if you wanna DM any details for a quick screen share or chat. Again, I've been running this patch locally from two different queue workers that interact over the broker since figuring it out. Have made sure to come back and test dispatches. Consumers and heartbeats haven't been negatively impacted. My scenario also tests consuming and dispatching from the same queue, so the I do feel like it's a stable change and am happy to commit long term to assisting if for some reason this causes a regression. In many ways though, I think this could improve stability. A long lived web request publishing to a queue with a short TTL could in theory encounter an issue without this fix. |
@atrauzzi Sent you a message |
} | ||
$stomp = $this->config['lazy'] | ||
? function () { return $this->establishConnection(); } | ||
: $this->establishConnection(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally I indent lines like 79
, but the code style suggested this, apologies!
/** | ||
* @param BufferedStompClient|callable $stomp | ||
*/ | ||
public function __construct($stomp, string $extensionType) | ||
public function __construct($stomp, string $extensionType, bool $detectTransientConnections = false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is where we default to "off".
Good morning @makasim! 😆 As per our chat earlier today, I've wired this up to the configuration obtained in |
@atrauzzi some tests are failing |
@makasim - I think the two that are failing now are okay? |
Wonderful, thank you @makasim. Will this be automatically released? |
Semi-automatically... I'll do it. |
Thank you too! |
Hey @makasim - did the release end up going through? I noticed that I'm still seeing version |
Closes: #1097
To detail my scenario, I'm running a long-lived PHP process (laravel queue worker) that listens on one connection, but can occasionally publish to another connection. The connection used for listening doesn't suffer from any issues because enqueue successfully sets up heartbeats, anticipating the fact that consumers tend to be long lived. For producers however, there might be the assumption that they are transient (web requests) and as such, don't require connections to live any longer than it takes for the request thread to finish all its work.
@makasim - My apologies if I've missed something or if this seems like a shoehorn fix / neglects any other concerns. After doing some digging around however I am starting to wonder if this might actually be a desirable approach.
What I've noticed in the
stomp-php
library is that their implementation ofClient::isConnected
has the potential to report false positives for the state of the connection and that without getting into a more convoluted fix, can only be "tested" by attempting to use it.With the code fix in this PR, failures to write are jogged with one last-ditch attempt to connect. My long lived mixed consumer+producer processes are able to send messages and reestablish connections on-demand with no ceremony or boilerplate.
I'd like to add the caveat that this is not my preferred strategy and would favour something more explicit, but just based on the fact that
StompProducer
isn't set up for configuration and the aforementioned false-positives fromstomp-php
, this may do well for all scenarios.