Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor error classes #15

Merged
merged 5 commits into from
Jan 13, 2023

Conversation

casperisfine
Copy link
Contributor

Fix: #11

Opening this as a draft because I hit a few roadblocks, so I figured I should solicit some feedback.

The goals are:

  • Not raise any error that isn't a descendant of Trilogy::Error.
  • Have a clear distinction between transient network errors that may be retried, and invalid queries that shouldn't.

Unsolved issues (yet)

  • Trilogy calls rb_syserr_fail_str in a few places. Meaning it can still raise any of the Errno::* errors.
  • handle_trilogy_error raise all TRILOGY_ERR as ProtocolError which is a ConnectionError, but
    I'm not certain none of the possible errors could be considered client errors.

cc @composerinteralia @matthewd

@adrianna-chang-shopify
Copy link
Collaborator

@jhawthorn @composerinteralia @matthewd we're looking to get the ball rolling on this work again. Do y'all have any feedback in the meantime? Thanks ❤️

cc @paarthmadan

@composerinteralia
Copy link
Contributor

I can take a look early next week if nobody beats me to it. I vaguely remember liking this change, if that helps at all 😄.

@composerinteralia
Copy link
Contributor

This looks promising to me so far. My only concern is compatibility with https://github.com/github/activerecord-trilogy-adapter/blob/000ed42f5023e447b2c28e970fcd1fbac325232c/lib/trilogy_adapter/lost_connection_exception_translator.rb, and one custom bit of translation we've got at GitHub for retrying timeouts on connect, roughly:

def connect
  Trilogy.new(config)
rescue Errno::ETIMEDOUT, Errno::ECONNREFUSED, Errno::ECONNRESET => error
  raise ActiveRecord::ConnectionNotEstablished.new(error.message)

I think this change will ultimately simplify all that, but I'd just want to be careful to not accidentally lose any of our reconnection handling along the way.

Trilogy calls rb_syserr_fail_str in a few places. Meaning it can still raise any of the Errno::* errors.

I know we've got adapter code that rescues and translates EPIPE, ETIMEDOUT, ECONNREFUSED, and ECONNRESET, so having Trilogy-specific for those in particular seems helpful.

handle_trilogy_error raise all TRILOGY_ERR as ProtocolError which is a ConnectionError, but
I'm not certain none of the possible errors could be considered client errors.

Here's an example:
https://github.com/github/trilogy/blob/90f0b701dc77d5eaa2db52b9031b19718baba8dc/test/blocking_test.c#L117-L121 The error_code on that is 1064, a query parse error. There would be no reason to retry that.

In theory we could raise different errors for different error_codes, but I'm not sure how practical that is since there are lots of them.

@adrianna-chang-shopify
Copy link
Collaborator

Hey @composerinteralia , thank you for the feedback! 🙇‍♀️

My only concern is compatibility with

Given @byroot 's approach using modules, I think backwards compatibility should be retained, since all of the error classes will still inherit from the original error classes. In moving towards just Trilogy:Error classes, would adding more extensive test coverage for LostConnectionExceptionTranslator help mitigate the risk?

one custom bit of translation we've got at GitHub for retrying timeouts on connect

I'm curious why this translation bit wasn't upstreamed into the AR adapter for Trilogy. I thought it was odd that underlying errors like Errno::ECONNREFUSED were being translated to ActiveRecord::StatementInvalid. Does it make sense to get this upstreamed first before we start changing the error classes?

Trilogy calls rb_syserr_fail_str in a few places. Meaning it can still raise any of the Errno::* errors.

I know we've got adapter code that rescues and translates EPIPE, ETIMEDOUT, ECONNREFUSED, and ECONNRESET, so having Trilogy-specific for those in particular seems helpful.

@byroot do you remember -- was the issue here that we're calling rb_syserr_fail_str in places where we're not sure what the appropriate wrapper error would be? 🤔 Were you looking for feedback on what the Trilogy::Error mapping should be?

The error_code on that is 1064, a query parse error. There would be no reason to retry that.
In theory we could raise different errors for different error_codes, but I'm not sure how practical that is since there are lots of them.

Would it make sense to do something more similar to what mysql2 does and map explicit error codes to ProtocolError / ConnectionError, and have everything else raised as a more generic BaseError?

@casperisfine
Copy link
Contributor Author

was the issue here that we're calling rb_syserr_fail_str in places where we're not sure what the appropriate wrapper error would be?

It's been a while. But the issue with rb_syserr_fail_str is that you pass an errno, so you need to carefully check the manpage of each of the function that errno might come from to see which are the possible raised exception. And that can sometime differ from platform to platform. Overall it could raise any of SystemCallError.subclasses which contains 107 different errors.

careful auditing might reduce this to a handful, but it's a lot of work and error prone.

@adrianna-chang-shopify
Copy link
Collaborator

So maybe in our first pass at this, we leave rb_syserr_fail_str as is and maintain the translation logic in the adapter code?

@composerinteralia
Copy link
Contributor

composerinteralia commented Nov 29, 2022

Given @byroot 's approach using modules, I think backwards compatibility should be retained, since all of the error classes will still inherit from the original error classes.

👍🏻 I'm probably unnecessarily worried.

I'm curious why this translation bit wasn't upstreamed into the AR adapter for Trilogy.

Good question. I think we were in a rush to fix a bug and then never revisited it. From my commit message for that change:

We could possibly solve this in Rails instead, and we should probably
change Trilogy to raise some more easily rescuable errors so we don't
need this, but making this change in <internal library> seems like the
quickest way to get something deployable.

Would it make sense to ... map explicit error codes to ProtocolError / ConnectionError, and have everything else raised as a more generic BaseError?

Yeah, I think that'd be OK, especially if we have a well-known list we can draw from.

@adrianna-chang-shopify adrianna-chang-shopify force-pushed the base-errors branch 2 times, most recently from 7ed2578 to b81d3ed Compare December 6, 2022 18:43
@adrianna-chang-shopify
Copy link
Collaborator

adrianna-chang-shopify commented Dec 6, 2022

Okay, so following the feedback from our last conversation, we've:

  • Brought in the error code mappings from mysql2, so that certain TRILOGY_ERRs are raised as ConnectionError or TimeoutError, with remaining TRILOGY_ERRs being considered ProtocolErrors.
  • Decided to hold off on touching places where we call rb_syserr_fail_str for now. We can do this in a second pass.

This is ready for another round of feedback. I'll work on changes to the adapter afterwards, but it would be good to have a conversation about how much leeway we have to make breaking changes to translating errors in the adapter / what that process should look like.

@casperisfine casperisfine marked this pull request as ready for review December 6, 2022 20:14
class ProtocolError < BaseError
ERROR_CODES = {
1205 => TimeoutError, # ER_LOCK_WAIT_TIMEOUT
1044 => ConnectionError, # ER_DBACCESS_DENIED_ERROR
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized. We can't raise ConnectionError as it's a module. We'll need to create another error that include that module.

We should also probably try to cover that path.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good catch 👍 On it

@composerinteralia
Copy link
Contributor

it would be good to have a conversation about how much leeway we have to make breaking changes to translating errors in the adapter / what that process should look like.

I think it's fine to make breaking changes—there's probably not too much use of this library outside GitHub yet. It'd be helpful to get a corresponding activerecord-trilogy-adapter branch that passes with these changes so we can run it all against our CI before merging.

@adrianna-chang-shopify
Copy link
Collaborator

It'd be helpful to get a corresponding activerecord-trilogy-adapter branch that passes with these changes

I can absolutely do that!

@composerinteralia
Copy link
Contributor

composerinteralia commented Jan 9, 2023

@haldun is running these changes against GitHub CI. We'll let you know if anything comes up.

@composerinteralia
Copy link
Contributor

GitHub CI passing after renaming a couple things 👍🏻

rows.count
end
class Result
attr_reader :fields, :rows, :query_time
Copy link
Contributor

@composerinteralia composerinteralia Jan 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed these warnings in the activerecord-trilogy-adapter tests:

warning: method redefined; discarding old fields
warning: method redefined; discarding old rows
warning: method redefined; discarding old query_time

Defining these in Ruby seems good to me, but then I assume we'll need to remove the rb_define_attrs in cext.c.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

byroot and others added 4 commits January 11, 2023 15:46
Fix: trilogy-libraries#11

The goals are:

  - Not raise any error that isn't a descendant of `Trilogy::Error`.
  - Have a clear distinction between transient network errors that may be retried, and invalid queries that shouldn't.

Unsolved issues (yet)

  - Trilogy calls `rb_syserr_fail_str` in a few places. Meaning it can still raise any of the `Errno::*` errors.
  - `handle_trilogy_error` raise all `TRILOGY_ERR` as `ProtocolError` which is a `ConnectionError`, but
     I'm not certain none of the possible errors could be considered client errors.
Instead of raising all ProtocolErrors as connection errors,
we use the SQL error code to determine the type of error to raise
when handling the Trilogy error in the Ruby bindings.

The mapping of error codes to error classes was taken from the mysql2
implementation: https://github.com/brianmario/mysql2/blob/master/lib/mysql2/error.rb#L13-L27

Note that we don't need to care about the 2xxx errors because those
are raised by the libmysql client, which Trilogy doesn't deal with.

Co-authored-by: Jean Boussier <jean.boussier@shopify.com>
@composerinteralia
Copy link
Contributor

composerinteralia commented Jan 13, 2023

CI keeps getting cancelled (timing out?) for some reason. Retrying again. Once it's green I'll merge.

Update: I don't think rerunning is going to help, at least at the moment. I'm seeing some weirdness on another branch as well (it's suddenly taking a really long time to install packages), although most of those test did pass eventually: https://github.com/github/trilogy/compare/trigger-ci?expand=1. I'll see if I can run these locally.

@composerinteralia
Copy link
Contributor

composerinteralia commented Jan 13, 2023

I've got a patch that I think should get things passing, but realized I can't push to your branch. Could you apply this and I'll kick off the CI builds again?

Patch
From b7207666d2f646715eaabf0c42d0b2b97524b683 Mon Sep 17 00:00:00 2001
From: Daniel Colson <danieljamescolson@gmail.com>
Date: Fri, 13 Jan 2023 06:48:13 -0500
Subject: [PATCH] Close connections after timeout test

Prior to this commit client_1 would remain open, causing any later test
touching the trilogy_test table to fail after the default
innodb_lock_wait_timeout (I think 50s). This was causing the tests to
timeout in CI.

This commit solves the problem by closing the clients after the test
runs.

This commit also sets innodb_lock_wait_timeout on client_1 instead of
client_2. We want client_2 to timeout as soon as possible so the test
doesn't take too long, but prior to this change it was timing out after
the default 50s.
---
 contrib/ruby/test/client_test.rb | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/contrib/ruby/test/client_test.rb b/contrib/ruby/test/client_test.rb
index 6eca9a6..bba1e93 100644
--- a/contrib/ruby/test/client_test.rb
+++ b/contrib/ruby/test/client_test.rb
@@ -518,14 +518,18 @@ class ClientTest < TrilogyTest
     create_test_table(client_1)
     client_2.change_db("test")
 
-    client_1.query("SET GLOBAL innodb_lock_wait_timeout = 2;")
     client_1.query("INSERT INTO trilogy_test (varchar_test) VALUES ('a')")
     client_1.query("BEGIN")
     client_1.query("SELECT * FROM trilogy_test FOR UPDATE")
 
+    client_2.query("SET SESSION innodb_lock_wait_timeout = 1;")
     assert_raises Trilogy::TimeoutError do
       client_2.query("SELECT * FROM trilogy_test FOR UPDATE")
     end
+
+  ensure
+    ensure_closed(client_1)
+    ensure_closed(client_2)
   end
 
   def test_connection_error
-- 
2.34.1

@casperisfine
Copy link
Contributor Author

Thanks. Applied in 1a9ab90

@composerinteralia
Copy link
Contributor

composerinteralia commented Jan 13, 2023

Looks like test_connection_error is flaky—sometimes we end up with a TRILOGY_PROTOCOL_VIOLATION. I tried running the same tests on main and saw the same flakiness, so I think the new test is exposing an existing bug.

I'm going to merge and look into that separately (it does not reproduce consistently with the same seed, unfortunately). If the flakiness is disruptive in the meantime we may need to remove that test.

@composerinteralia composerinteralia requested a review from a team January 13, 2023 14:52
@composerinteralia composerinteralia merged commit 10aee3b into trilogy-libraries:main Jan 13, 2023
@adrianna-chang-shopify
Copy link
Collaborator

Looks like test_connection_error is flaky—sometimes we end up with a TRILOGY_PROTOCOL_VIOLATION. I tried running the same tests on main and saw the same flakiness, so I think the new test is exposing an existing bug.

Oh interesting that it fails on main 🤔 I thought maybe we needed to add a case to the switch statement in #handle_trilogy_error that would raise TRILOGY_PROTOCOL_VIOLATION as a BaseConnectionError, but seems like that might not be the correct fix if it's flakiness that was apparent on main too.

Anyways, thanks for getting this shipped -- I'm going to start working on the next iteration of this. I think we'll need to look at:

  • Handling TRILOGY_DNS_ERR as BaseConnectionError following the convo here
  • Ensuring IOError is raised as ConnectionClosed
  • Addressing the rest of the syscall errors

@adrianna-chang-shopify adrianna-chang-shopify deleted the base-errors branch January 13, 2023 15:06
@lorint
Copy link

lorint commented Jan 13, 2023

I think the new test is exposing an existing bug

(Hi - I'm Lorin Thwaits, the guy adding AR 6.1 and 7.0 compatibility to Trilogy. Long time litener / first time caller!)

The Python folks had this same PROTOCOL_VIOLATION error unexpectedly surface starting with the release of MySQL 8.0.24 -- a MySQL bug had been fixed at that point and the change notes for that release indicate:
"Prepared statements without parameters violate the MySQL protocol by sending unnecessary extra bytes."

@adrianna-chang-shopify
Copy link
Collaborator

👋 Hi @lorint , thanks for supplying some extra context! Do you know if the Python folks were experiencing flakiness as well, or was the PROTOCOL_VIOLATION error showing up consistently after moving to MySQL 8.0.24?

I suspect this might be something different though, I don't believe Trilogy supports prepared statements yet 🤔

@lorint
Copy link

lorint commented Jan 13, 2023

Do you know if the Python folks were experiencing flakiness as well?

I think it was a consistent failure they were seeing.

With this new patch in place then running our suite on 8.0 I get a perfectly consistent failure. Somewhat fast and multithreaded machine -- Macbook M1.

@adrianna-chang-shopify
Copy link
Collaborator

With this new patch in place then running our suite on 8.0 I get a perfectly consistent failure.

Do you have an example of a test that's failing consistently from your suite? I can't get test_connection_error to fail locally on my end, even with MySQL 8 + Ruby 3.2. I'm also on an M1.

@lorint
Copy link

lorint commented Jan 16, 2023

@adrianna-chang-shopify - Now that I went back to check this out I see that I have mis-spoken! The failure I had seen was after I had merged in your ARTA PR "Translate new error classes" along with my compatibility PR, on this line:
https://github.com/adrianna-chang-shopify/activerecord-trilogy-adapter/blob/ac-translate-new-error-classes/test/lib/active_record/connection_adapters/trilogy_adapter_test.rb#L44

But now it's been fixed and merged upstream here in this PR on the Trilogy gem. Have just done the same thing again and everything works :)

lorint added a commit to lorint/activerecord-trilogy-adapter that referenced this pull request Jan 16, 2023
There is a new generic Trilogy::ClientError class established with
this PR which is useful to provide full AR 7.0 support:
trilogy-libraries/trilogy#15
composerinteralia added a commit that referenced this pull request Feb 28, 2023
#15 changed the error class for
TRILOGY_ERR from `DatabaseError` to `ProtocolError`. This commit
reintroduces `DatabaseError` as an ancestor of `ProtocolError` for
compatibility reasons, so that rescuing `DatabaseError` still works as
before.

Note that even with this change there are a number of error codes (see
ProtocolError::ERROR_CODES) that are no longer `DatabaseError`s, since
we have explicitly recategorized them as `BaseConnectionError`,
`TimeoutError`, or `QueryError`.
composerinteralia added a commit to composerinteralia/rails that referenced this pull request Jan 26, 2024
This commit makes a few changes to our trilogy error translation:

* trilogy-libraries/trilogy#118 introduced
  `Trilogy::EOFError` which we can use instead of matching on
  `TRILOGY_CLOSED_CONNECTION`.
* trilogy-libraries/trilogy#15 introduced
  `Trilogy::ConnectionClosed`, which inherits from `IOError` for
  backwards compatibility. As far as I can tell that's the only
  `IOError` trilogy can raise, so this commit rescues the
  trilogy-specific error instead.
* As far as I can tell Trilogy does not raise `SocketError`, so don't
  bother translating that
* Don't treat TRILOGY_UNEXPECTED_PACKET as a connection error. If we get
  this, it's probably a bug in trilogy that we should fix. I'd like to
  eventually get rid of TRILOGY_INVALID_SEQUENCE_ID too, but we're
  currently relying on it in a few tests (related to trilogy missing
  caching_sha2_password auth support, if I recall correctly)

I'm kinda hoping we'll eventually be able to simplify this to something
like:

```rb
if exception.is_a?(Trilogy::ConnectionError)
  ConnectionFailed.new(message, connection_pool: @pool)
else
  super
end
```

but we'd need more changes to trilogy before that is possible.
composerinteralia added a commit to composerinteralia/rails that referenced this pull request Jan 26, 2024
This commit makes a few changes to our trilogy error translation:

* trilogy-libraries/trilogy#118 introduced
  `Trilogy::EOFError` which we can use instead of matching on
  `TRILOGY_CLOSED_CONNECTION`.
* trilogy-libraries/trilogy#15 introduced
  `Trilogy::ConnectionClosed`, which inherits from `IOError` for
  backwards compatibility. As far as I can tell that's the only
  `IOError` trilogy can raise, so this commit rescues the
  trilogy-specific error instead.
* As far as I can tell Trilogy does not raise `SocketError`, so don't
  bother translating that
* Don't treat TRILOGY_UNEXPECTED_PACKET as a connection error. If we get
  this, it's probably a bug in trilogy that we should fix. I'd like to
  eventually get rid of TRILOGY_INVALID_SEQUENCE_ID too, but we're
  currently relying on it in a few tests (related to trilogy missing
  caching_sha2_password auth support, if I recall correctly)

I'm kinda hoping we'll eventually be able to simplify this to something
like:

```rb
if exception.is_a?(Trilogy::ConnectionError)
  ConnectionFailed.new(message, connection_pool: @pool)
else
  super
end
```

but we'd need more changes to trilogy before that is possible.
viralpraxis pushed a commit to viralpraxis/rails that referenced this pull request Mar 24, 2024
This commit makes a few changes to our trilogy error translation:

* trilogy-libraries/trilogy#118 introduced
  `Trilogy::EOFError` which we can use instead of matching on
  `TRILOGY_CLOSED_CONNECTION`.
* trilogy-libraries/trilogy#15 introduced
  `Trilogy::ConnectionClosed`, which inherits from `IOError` for
  backwards compatibility. As far as I can tell that's the only
  `IOError` trilogy can raise, so this commit rescues the
  trilogy-specific error instead.
* As far as I can tell Trilogy does not raise `SocketError`, so don't
  bother translating that
* Don't treat TRILOGY_UNEXPECTED_PACKET as a connection error. If we get
  this, it's probably a bug in trilogy that we should fix. I'd like to
  eventually get rid of TRILOGY_INVALID_SEQUENCE_ID too, but we're
  currently relying on it in a few tests (related to trilogy missing
  caching_sha2_password auth support, if I recall correctly)

I'm kinda hoping we'll eventually be able to simplify this to something
like:

```rb
if exception.is_a?(Trilogy::ConnectionError)
  ConnectionFailed.new(message, connection_pool: @pool)
else
  super
end
```

but we'd need more changes to trilogy before that is possible.
fractaledmind pushed a commit to fractaledmind/rails that referenced this pull request May 13, 2024
This commit makes a few changes to our trilogy error translation:

* trilogy-libraries/trilogy#118 introduced
  `Trilogy::EOFError` which we can use instead of matching on
  `TRILOGY_CLOSED_CONNECTION`.
* trilogy-libraries/trilogy#15 introduced
  `Trilogy::ConnectionClosed`, which inherits from `IOError` for
  backwards compatibility. As far as I can tell that's the only
  `IOError` trilogy can raise, so this commit rescues the
  trilogy-specific error instead.
* As far as I can tell Trilogy does not raise `SocketError`, so don't
  bother translating that
* Don't treat TRILOGY_UNEXPECTED_PACKET as a connection error. If we get
  this, it's probably a bug in trilogy that we should fix. I'd like to
  eventually get rid of TRILOGY_INVALID_SEQUENCE_ID too, but we're
  currently relying on it in a few tests (related to trilogy missing
  caching_sha2_password auth support, if I recall correctly)

I'm kinda hoping we'll eventually be able to simplify this to something
like:

```rb
if exception.is_a?(Trilogy::ConnectionError)
  ConnectionFailed.new(message, connection_pool: @pool)
else
  super
end
```

but we'd need more changes to trilogy before that is possible.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ruby: Base error types?
5 participants