Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Process hostname encoding error in sqlite #143

Merged
merged 1 commit into from
Feb 7, 2024

Conversation

jpcamara
Copy link
Contributor

@jpcamara jpcamara commented Feb 6, 2024

When Socket.gethostname returns a hostname with a special character like apostrophe in Basecamp’s-Computer, an encoding error is raised Encoding::UndefinedConversionError: "\xE2" from ASCII-8BIT to UTF-8.

By forcing a utf-8 encoding, the error goes away and the process row properly inserts.

A similar issue occurred in the newrelic ruby gem, and required the same solution newrelic/newrelic-ruby-agent@e1f65a8

I noticed the same type of issue in Postgres, but there I only periodically saw a warning and the code still worked. This seems to resolve that warning as well.

* When `Socket.gethostname` returns a hostname with a special character like apostrophe in `Basecamp’s-Computer`, an encoding error is raised `Encoding::UndefinedConversionError: "\xE2" from ASCII-8BIT to UTF-8`

* By forcing a utf-8 encoding, the error goes away and the process row properly inserts.

* A similar issue occurred in the newrelic ruby gem, and required the same solution newrelic/newrelic-ruby-agent@e1f65a8
@jpcamara jpcamara changed the title Fix encoding error in sqlite Fix Process hostname encoding error in sqlite Feb 6, 2024
@jpcamara
Copy link
Contributor Author

jpcamara commented Feb 6, 2024

This also fixes an issue in MissionControl https://github.com/basecamp/mission_control-jobs/issues/61

@@ -55,7 +55,7 @@ def kind
end

def hostname
@hostname ||= Socket.gethostname
@hostname ||= Socket.gethostname.force_encoding(Encoding::UTF_8)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just have a question: why not try to convert the string to UTF-8 via encode(Encoding::UTF_8) rather than changing the encoding alone?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm thanks for a good question! I think it's just a habit that i've typically reached for force_encoding in the past so I did it here. I guess most issues i've encountered have been encoding misinterpretations, vs needing to be re-encoded.

But I think encode should work just as well, might work for more use-cases, and also doesn't mutate anything which is a nice side effect (or lack of a side effect 🙃 ). I'll change it

Copy link
Contributor Author

@jpcamara jpcamara Feb 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, it actually doesn't work.

hostname = "Basecamp’s-Computer"
 => "Basecamp’s-Computer"
hostname.force_encoding("ASCII-8BIT")
 => "Basecamp\xE2\x80\x99s-Computer"
hostname.force_encoding("ASCII-8BIT").encode(Encoding::UTF_8)
(irb):3:in `encode': "\xE2" from ASCII-8BIT to UTF-8 (Encoding::UndefinedConversionError)

It also fails the same way if I use the direct result of Socket.gethostname on my machine:

Socket.gethostname.force_encoding("ASCII-8BIT") # works
Socket.gethostname.encode(Encoding::UTF_8) # fails
# `encode': "\xE2" from ASCII-8BIT to UTF-8 (Encoding::UndefinedConversionError)

The return value of gethostname seems to always come back in binary format, but the underlying bytes, at least in my case, are already valid UTF-8 (\xE2\x80\x99 is the byte sequence for "right single quotation mark"). So by forcing the encoding to be UTF-8 with force_encoding, it reinterprets the string correctly.

When I use encode it seems to actually convert as if it was binary data to utf-8 and fails.

I don't know for certain forcing encoding couldn't cause an issue on some platform somewhere - but the same format has been in use by the newrelic ruby agent for the past 5 years (newrelic/newrelic-ruby-agent@e1f65a8), so there's a good chance it's ok to keep as force_encoding.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, how odd! The first case, I'd expect it to fail, but it's very strange the string comes from Socket.gethostname with the wrong encoding, I wonder why that's it...

In any case, ok to keep force_encoding 👍

@rosa rosa merged commit 3413d29 into rails:main Feb 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants