Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix racing in Open #171

Merged
merged 2 commits into from
Feb 8, 2023
Merged

Fix racing in Open #171

merged 2 commits into from
Feb 8, 2023

Conversation

Zerpet
Copy link
Contributor

@Zerpet Zerpet commented Feb 8, 2023

Summary

  • Lock allocator assignment in openComplete
  • Add a critical section in openTune

Description

In #170, it was reported a data race in Connection.Open. The situation
described is an edge case, where the connection reader (responsible for reading
data sent from the server/wire) initiates the shutdown sequence in
Connection.shutdown before Connection.Open completes. This creates a data
race for Connection.allocator and Connection.Config.ChannelMax fields.

The shutdown function is already protected by the Connection struct mutex. The
access from Open stack calls is not protected. This PR adds two new, as small
as possible, critical sections protected by the struct mutex during Open.

It is quite challenging to add a reliable test for this edge case, even with a
server fake, since we have to inject a failure just before the frame reader
reads, but right after the Tune frame is sent. Original reporter validated the
fix.

Closes #170

Related to #170. There is an edge case where the background frame
handler receives an error before `Open` completes. In this case, there
is a data race to assign the value of the allocator. The destructor in
shutdown is already a critical section. Adding a tiny critical section
in openComplete to protect the allocator.

Signed-off-by: Aitor Perez Cedres <acedres@vmware.com>
Related to #170. There is an edge case where the TCP connection between
client-server is setup, and AMQP handshake starts, up to the point right
after sending Tune frame. At this point, the background frame reader
receives an error from the TCP socket and starts the shutdown sequence.
At the same time, the Tune function continues (as it has not completed)
and attempts to set the ChannelMax field in the connection struct. At
the same time, the shutdown sequence initiated by the error in the frame
handler reads the ChannelMax field. This creates a race.

A potential solution is to add a critical section in tune to protect
access to ChannelMax field. The destructor in the shutdown sequence is
already a critical section, protected by the struct mutex.

Signed-off-by: Aitor Perez Cedres <acedres@vmware.com>
@Zerpet Zerpet added this to the 1.7.0 milestone Feb 8, 2023
Copy link
Contributor

@lukebakken lukebakken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we bother re-initializing these two fields when the connection is shutdown (and thus should never be used again)?

https://github.com/rabbitmq/amqp091-go/blob/issue-170/connection.go#L554-L555

@Zerpet
Copy link
Contributor Author

Zerpet commented Feb 8, 2023

I wish the past authors of a79bd14 would have left better commit messages to explain why this was necessary 🥲 Earlier, the critical section was smaller and those fields were not modified, looking back at:

amqp091-go/connection.go

Lines 370 to 408 in 425d3a7

func (c *Connection) shutdown(err *Error) {
atomic.StoreInt32(&c.closed, 1)
c.destructor.Do(func() {
c.m.Lock()
closes := make([]chan *Error, len(c.closes))
copy(closes, c.closes)
c.m.Unlock()
if err != nil {
for _, c := range closes {
c <- err
}
}
for _, ch := range c.channels {
c.closeChannel(ch, err)
}
if err != nil {
c.errors <- err
}
// Shutdown handler goroutine can still receive the result.
close(c.errors)
c.conn.Close()
for _, c := range closes {
close(c)
}
for _, c := range c.blocks {
close(c)
}
c.m.Lock()
c.noNotify = true
c.m.Unlock()
})
}

@lukebakken lukebakken merged commit 7b4cb59 into main Feb 8, 2023
@lukebakken lukebakken deleted the issue-170 branch February 8, 2023 17:45
lukebakken added a commit that referenced this pull request Feb 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

#31 resurfacing (?)
2 participants