Fix race condition in MQTT protocol when sending messages #1094
Conversation
/cc @embano1
```go
// publishMsg generates a new paho.Publish message from p.publishOption
func (p *Protocol) publishMsg() *paho.Publish {
	return &paho.Publish{
		QoS:        p.publishOption.QoS,
		Retain:     p.publishOption.Retain,
		Topic:      p.publishOption.Topic,
		Properties: p.publishOption.Properties,
	}
}
```
If I'm understanding correctly, the race condition is that the `publishOption` can be changed before the message is published?
Since there is only one place to set this field in the struct (sdk-go/protocol/mqtt_paho/v2/option.go, lines 29 to 37 in e6a74ef):
```go
func WithPublish(publishOpt *paho.Publish) Option {
	return func(p *Protocol) error {
		if publishOpt == nil {
			return fmt.Errorf("the paho.Publish option must not be nil")
		}
		p.publishOption = publishOpt
		return nil
	}
}
```
+1
Left a comment in the issue to better understand the issue/root cause. What is causing the race? Why is `publishOption` changing during concurrent calls? I'm asking all these questions because I didn't write the implementation and am not familiar with MQ. Furthermore, if we change it as per the above, I wonder what the implication is for existing users, because we're changing the semantics here (copy instead of pointer).
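To make the semantics question concrete, here is a minimal sketch of what copy-instead-of-pointer means for a user who mutates the option after construction; the field copy mirrors the `publishMsg` snippet above, everything else is illustrative:

```go
package main

import (
	"fmt"

	"github.com/eclipse/paho.golang/paho"
)

func main() {
	// The option a user would pass to WithPublish.
	opt := &paho.Publish{Topic: "events", QoS: 1}

	// Old behavior: the protocol keeps and publishes the shared pointer,
	// so later mutations are visible to every in-flight send.
	sharedMsg := opt

	// New behavior: each send snapshots the fields into a fresh value
	// (as publishMsg does), so later mutations don't reach it.
	copiedMsg := &paho.Publish{
		QoS:        opt.QoS,
		Retain:     opt.Retain,
		Topic:      opt.Topic,
		Properties: opt.Properties, // copied as-is, so still shared if it's a pointer
	}

	opt.Topic = "other-topic"

	fmt.Println(sharedMsg.Topic) // "other-topic": the mutation leaked in
	fmt.Println(copiedMsg.Topic) // "events": the snapshot is unaffected
}
```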
FYI @embano1.
Here's the root cause:
The `publishOption` is a pointer referenced by the MQTT message, which holds the CloudEvent payload. So the message (`publishOption`) is shared across multiple goroutines and can be modified by them.
With the suggested changes, users should not perceive any difference.
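To illustrate the failure mode, here is a minimal stand-alone sketch (plain Go, not the sdk-go code): when every send reuses the one message, concurrent sends race on its fields:

```go
package main

import "sync"

// publish is a stand-in for the shared *paho.Publish message.
type publish struct {
	Topic   string
	Payload []byte
}

func main() {
	shared := &publish{Topic: "events"}

	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func(b byte) {
			defer wg.Done()
			// Both goroutines write the payload of the same message, so
			// one event can overwrite or interleave with another; running
			// this with `go run -race` reports a data race here.
			shared.Payload = []byte{b}
		}(byte(i))
	}
	wg.Wait()
}
```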
> and will be changed by them
Can you please point me to the code/line where this happens?
Also, ideally please write a test to surface the current issue (panics) and show how this PR fixes it; perhaps there are more issues due to the current implementation which we can detect this way 🙏
@embano1 No worries!
Here’s the code line where the panic occurs.
I’ve also written an integration test that covers the concurrent panic issue, which you can use to reproduce it.
```go
// publishMsg generates a new paho.Publish message from p.publishOption
func (p *Protocol) publishMsg() *paho.Publish {
	return &paho.Publish{
		QoS:        p.publishOption.QoS,
		Retain:     p.publishOption.Retain,
		Topic:      p.publishOption.Topic,
		Properties: p.publishOption.Properties,
	}
}
```
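For reference, a sketch of what such a concurrency test can look like; the client `c` and its MQTT-backed construction are assumed to happen in test setup and are elided here:

```go
package mqtt_paho_test

import (
	"context"
	"fmt"
	"testing"

	cloudevents "github.com/cloudevents/sdk-go/v2"
	"golang.org/x/sync/errgroup"
)

// c is assumed to be a cloudevents client backed by the MQTT protocol,
// constructed in the test's setup (omitted in this sketch).
var c cloudevents.Client

func TestConcurrentSend(t *testing.T) {
	g, ctx := errgroup.WithContext(context.Background())
	for i := 0; i < 20; i++ {
		i := i
		g.Go(func() error {
			e := cloudevents.NewEvent()
			e.SetID(fmt.Sprintf("id-%d", i))
			e.SetType("com.example.sample")
			e.SetSource("example/uri")
			if err := e.SetData(cloudevents.ApplicationJSON, map[string]int{"n": i}); err != nil {
				return err
			}
			// Before the fix, these concurrent sends raced on the shared
			// paho.Publish and could panic (e.g. "bytes.Buffer: too large").
			if result := c.Send(ctx, e); cloudevents.IsUndelivered(result) {
				return result
			}
			return nil
		})
	}
	if err := g.Wait(); err != nil {
		t.Fatal(err)
	}
}
```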
Great, nice work! Please squash your commits and force-push.
Done.
Thank you!
Signed-off-by: myan <myan@redhat.com>
Resolved: #1093
Summary:
Reason: This issue occurs when a single client sends events across multiple goroutines. Specifically, in the code here (sdk-go/protocol/mqtt_paho/v2/protocol.go, lines 92 to 101 in 1aecb20), at line 98 the MQTT `msg` is designed to hold a single `binding.Message` (or Event) payload. However, in a multi-goroutine environment, multiple `m` values may be written to the shared `msg`, causing a panic like `bytes.Buffer: too large`.

Solution: Instead of sharing the same `msg` across goroutines, we will create a copy of it each time a message is sent.
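In sketch form, the fixed send path then looks roughly like this (simplified: the real implementation's error handling, context-based topic override, and message-finishing details are elided, and `WritePubMessage` stands in for the package's message-writing step):

```go
// Send writes one CloudEvent to MQTT. Simplified sketch; the actual
// implementation in protocol.go differs in details.
func (p *Protocol) Send(ctx context.Context, m binding.Message, transformers ...binding.Transformer) error {
	// publishMsg returns a fresh paho.Publish per call, so concurrent
	// sends no longer write their payloads into one shared message.
	msg := p.publishMsg()

	// Encode the CloudEvent into this call's private message.
	if err := WritePubMessage(ctx, m, msg, transformers...); err != nil {
		return err
	}

	_, err := p.client.Publish(ctx, msg)
	return err
}
```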