-
Notifications
You must be signed in to change notification settings - Fork 896
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GODRIVER-2828 Use topology version from Server instead of Connection in ProcessError. #1252
GODRIVER-2828 Use topology version from Server instead of Connection in ProcessError. #1252
Conversation
3e86d65
to
0c97b45
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
// ignore stale error | ||
if desc.TopologyVersion.CompareToIncoming(cerr.TopologyVersion) >= 0 { | ||
// Ignore errors that came from when the database was on a previous topology version. | ||
if topologyVersion.CompareToIncoming(cerr.TopologyVersion) >= 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case where the max{ServerTopology, ConnectionTopology} = ErrorTopology , why do we consider this a "previous topology version"? If this is correct behavior, can we add a comment explaining it?
// In the case where the max{serverDesc.TopologyVersion, connDesc.TopologyVersion} == cerr.TopologyVersion ,
// we consider cerr.Topology version a "previous topology version" because ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, the current comment actually describes slightly different behavior because the logic also ignores errors that came from the current topology version.
I believe the reason we only act on errors where the error topology version is strictly greater than the local topology version is that we only want to process the first instance of that error for a given topology version, and ignore the rest.
For example consider the following sequence of events:
Step | Action | Driver TopologyVersion | DB TopologyVersion |
---|---|---|---|
1 | Initial state | 1 | 1 |
2 | DB elects new primary | 1 | 2 |
3 | Driver starts 4 concurrent writes to secondary, secondary returns "not writable primary" | 1 | 2 |
4 | Driver processes first "not writable primary" error with TopologyVersion=2, marks server as Kind=Unknown, updates its TopologyVersion | 2 | 2 |
5 | Driver processes 3 more "not writable primary" errors with TopologyVersion=2, ignores them | 2 | 2 |
Notice that the driver only processes the first "not writable primary" error, marking the server role as "unknown" until it is updated by the monitoring routine. Subsequent errors are ignored based on topology version, which prevents the driver from re-processing errors for the same topology change event.
Keep in mind that we only check topology version for "node is recovering" or "not primary" error codes, which we expect to increment topology version on the database. The first time the driver sees one of those errors, it knows the database topology has changed and needs to re-discover the topology. The driver ignores topology version for all other kinds of errors.
For additional reference, the topology version comparison logic comes directly from the error handling pseudocode in the SDAM spec.
I'll update the comment with a summary of the above information to increase clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. This great explanation! Thank you
// ignore stale error | ||
if desc.TopologyVersion.CompareToIncoming(wcerr.TopologyVersion) >= 0 { | ||
// Ignore errors that came from when the database was on a previous topology version. | ||
if topologyVersion.CompareToIncoming(wcerr.TopologyVersion) >= 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case where the max{ServerTopology, ConnectionTopology} = ErrorTopology , why do we consider this a "previous topology version"? If this is correct behavior, can we add a comment explaining it?
// In the case where the max{serverDesc.TopologyVersion, connDesc.TopologyVersion} == wcerr.TopologyVersion ,
// we consider cerr.Topology version a "previous topology version" because ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See this comment for a detailed answer.
x/mongo/driver/topology/server.go
Outdated
@@ -434,9 +467,9 @@ func (s *Server) ProcessError(err error, conn driver.Connection) driver.ProcessE | |||
|
|||
res := driver.ServerMarkedUnknown | |||
// If the node is shutting down or is older than 4.2, we synchronously clear the pool | |||
if wcerr.NodeIsShuttingDown() || desc.WireVersion == nil || desc.WireVersion.Max < 8 { | |||
if wcerr.NodeIsShuttingDown() || wireVersion == nil || wireVersion.Max < 8 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are your thoughts on creating a constant for "8"? Even though comparing wireVersions to their literal number is commonplace in the driver, it is a magic number which makes it sort of hard to interpret.
const wireVersionGEQ42 = 8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, using a named constant would help with readability. Will update.
} | ||
|
||
func newProcessErrorTestConn(tv *description.TopologyVersion) *processErrorTestConn { | ||
func newProcessErrorTestConn(wireVersion *description.VersionRange, stale bool) *processErrorTestConn { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we put this and "newServerDescription" in the namespace? i.e. either both as package test functions, or both as closure functions in the test function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The processErrorTestConn
can't be defined in the ProcessError
test function because it needs to define methods to satisfy the driver.Connection
interface. However, I can move newServerDescription
to the package scope.
x/mongo/driver/topology/server.go
Outdated
@@ -415,16 +448,16 @@ func (s *Server) ProcessError(err error, conn driver.Connection) driver.ProcessE | |||
|
|||
res := driver.ServerMarkedUnknown | |||
// If the node is shutting down or is older than 4.2, we synchronously clear the pool | |||
if cerr.NodeIsShuttingDown() || desc.WireVersion == nil || desc.WireVersion.Max < 8 { | |||
if cerr.NodeIsShuttingDown() || wireVersion == nil || wireVersion.Max < 8 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are your thoughts on creating a constant for "8"? Even though comparing wireVersions to their literal number is commonplace in the driver, it is a magic number which makes it sort of hard to interpret.
const wireVersionGEQ42 = 8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, using a named constant would help with readability. Will update.
name: "stale not primary error", | ||
startDescription: newServerDescription(description.RSPrimary, processID, 1, nil), | ||
inputErr: driver.Error{ | ||
Code: 10107, // NotPrimary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: 10107 is "NotWritablePrimary", see here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. That error is documented as "NotPrimary" in many places in the Go driver code. I'll correct it in these tests and then open a subsequent PR to correct it elsewhere in the Go driver.
Co-authored-by: Preston Vasquez <prestonvasquez@icloud.com>
dda23d3
to
af7f854
Compare
…in ProcessError. (#1252) Co-authored-by: Preston Vasquez <prestonvasquez@icloud.com>
GODRIVER-2828
Summary
ProcessError
.ProcessError
test to express the expected initial and resulting state changes more clearly.Background & Motivation
While processing errors, the SDAM error handling function
ProcessError
attempts to compare the topology version on the error to the current topology version known by the Go driver, ignoring errors where the topology version is not newer than the known topology version. However, the Go driver currently uses the connection's topology version as the "known" topology version, which is never updated after the connection is created. As a result,ProcessError
will act on errors from older connections when it should ignore them.