-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Process hostname
encoding error in sqlite
#143
Conversation
* When `Socket.gethostname` returns a hostname with a special character like apostrophe in `Basecamp’s-Computer`, an encoding error is raised `Encoding::UndefinedConversionError: "\xE2" from ASCII-8BIT to UTF-8` * By forcing a utf-8 encoding, the error goes away and the process row properly inserts. * A similar issue occurred in the newrelic ruby gem, and required the same solution newrelic/newrelic-ruby-agent@e1f65a8
hostname
encoding error in sqlite
This also fixes an issue in MissionControl https://github.com/basecamp/mission_control-jobs/issues/61 |
@@ -55,7 +55,7 @@ def kind | |||
end | |||
|
|||
def hostname | |||
@hostname ||= Socket.gethostname | |||
@hostname ||= Socket.gethostname.force_encoding(Encoding::UTF_8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just have a question: why not try to convert the string to UTF-8 via encode(Encoding::UTF_8)
rather than changing the encoding alone?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm thanks for a good question! I think it's just a habit that i've typically reached for force_encoding
in the past so I did it here. I guess most issues i've encountered have been encoding misinterpretations, vs needing to be re-encoded.
But I think encode
should work just as well, might work for more use-cases, and also doesn't mutate anything which is a nice side effect (or lack of a side effect 🙃 ). I'll change it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, it actually doesn't work.
hostname = "Basecamp’s-Computer"
=> "Basecamp’s-Computer"
hostname.force_encoding("ASCII-8BIT")
=> "Basecamp\xE2\x80\x99s-Computer"
hostname.force_encoding("ASCII-8BIT").encode(Encoding::UTF_8)
(irb):3:in `encode': "\xE2" from ASCII-8BIT to UTF-8 (Encoding::UndefinedConversionError)
It also fails the same way if I use the direct result of Socket.gethostname
on my machine:
Socket.gethostname.force_encoding("ASCII-8BIT") # works
Socket.gethostname.encode(Encoding::UTF_8) # fails
# `encode': "\xE2" from ASCII-8BIT to UTF-8 (Encoding::UndefinedConversionError)
The return value of gethostname
seems to always come back in binary format, but the underlying bytes, at least in my case, are already valid UTF-8 (\xE2\x80\x99
is the byte sequence for "right single quotation mark"). So by forcing the encoding to be UTF-8 with force_encoding
, it reinterprets the string correctly.
When I use encode
it seems to actually convert as if it was binary data to utf-8 and fails.
I don't know for certain forcing encoding couldn't cause an issue on some platform somewhere - but the same format has been in use by the newrelic ruby agent for the past 5 years (newrelic/newrelic-ruby-agent@e1f65a8), so there's a good chance it's ok to keep as force_encoding
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, how odd! The first case, I'd expect it to fail, but it's very strange the string comes from Socket.gethostname
with the wrong encoding, I wonder why that's it...
In any case, ok to keep force_encoding
👍
When
Socket.gethostname
returns a hostname with a special character like apostrophe inBasecamp’s-Computer
, an encoding error is raisedEncoding::UndefinedConversionError: "\xE2" from ASCII-8BIT to UTF-8
.By forcing a utf-8 encoding, the error goes away and the process row properly inserts.
A similar issue occurred in the newrelic ruby gem, and required the same solution newrelic/newrelic-ruby-agent@e1f65a8
I noticed the same type of issue in Postgres, but there I only periodically saw a warning and the code still worked. This seems to resolve that warning as well.