Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

irc: Don't write line ending twice. #1674

Merged
merged 1 commit into from
Oct 27, 2019
Merged

irc: Don't write line ending twice. #1674

merged 1 commit into from
Oct 27, 2019

Conversation

Tomin1
Copy link
Contributor

@Tomin1 Tomin1 commented Aug 3, 2019

When echo message is simulated it is dispatched the message with '\r\n'
appended to the end. PreTrigger expects the message to not to have any
newline or carriage return characters at the end of the message.

Copy link
Contributor

@HumorBaby HumorBaby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR led to some fun digging around the past 10 years of the codebase to answer the question: "why did the humans of the past only care to strip \r, instead of all whitespace??"

I found that it has to do with how asynchat.async_chat parses incoming messages... basically, you set_terminator('\n') for what you believe to be the end of an incoming message, and then found_terminator() is called to handle whatever has been read into the buffer. Since IRC protocol requires that messages end with CRLF (\r\n), the bot's buffer is usually left with a straggling \r, which 10 years ago (and I assume probably even earlier since this is only the earliest commit for this repo) was handled with a:

sopel/irc.py

Lines 92 to 95 in 30cddb2

def found_terminator(self):
line = self.buffer
if line.endswith('\r'):
line = line[:-1]

before being handled downstream. Eventually, PreTrigger included a similar line = line.strip('\r') safety check when it was created (5 years ago).

Anyways, enough of the history... the main point I want to make is that instead of manually stripping the CRLF from the temp variable, just have PreTrigger handle it. Messages being sent with the bot must always have \r\n, while messages being parsed by the bot (i.e., through PreTrigger) never need it. So just have the final stripping be there instead. For example:

diff --git a/sopel/trigger.py b/sopel/trigger.py
index f170b2c4..177a634f 100644
--- a/sopel/trigger.py
+++ b/sopel/trigger.py
@@ -28,7 +28,8 @@ class PreTrigger(object):
     def __init__(self, own_nick, line):
         """own_nick is the bot's nick, needed to correctly parse sender.
         line is the full line from the server."""
-        line = line.strip('\r')
+        strip_num_chars = -2 if line.endswith('\r\n') else -1 if line.endswith('\r') else None
+        line = line[:strip_num_chars]
         self.line = line
 
         # Break off IRCv3 message tags, if present

Something like this would also ensure that a message like hello, world. \n\r\n would retain that first `\n', which is intended (and allowed in a valid IRC message).

EDIT: I lied. Currently, IRC won't support LR or CR alone in a message (maybe with multi-line?).

Something changing the strip('\r')s to strip('\r\n') might be easiest/best.

Just a suggestion though (especially my code sample). Other's may have differing opinions.

@dgw
Copy link
Member

dgw commented Aug 3, 2019

I doubt CR or LF will be allowed even with multi-line. Discussion on that has all been about using BATCH or client-side message-tags, so let's not worry about it here.

@Tomin1 I agree that PreTrigger itself should handle making sure the message is cleaned properly. Thanks to @HumorBaby for digging around proactively, so I didn't have to!

@Tomin1
Copy link
Contributor Author

Tomin1 commented Aug 3, 2019

@dgw Sure, I'm fine doing this some other way. This was easy for someone that doesn't know sopel code base very well (yet) without breaking other use cases. I just noticed that chanlogs module appends extra characters to sopel's own messages when I added echo support to it.

I think I'll make a bit different variation of that example code from @HumorBaby. It's more readable if it's not all on one line IMO.

@Tomin1
Copy link
Contributor Author

Tomin1 commented Aug 4, 2019

After procrastinating a bit (some say it's useful to a point), I decided a bit differently. I think it's better to just add that \n to be stripped in PreTrigger. That is also something @HumorBaby suggested.

Though, I think it would be nicer to give PreTrigger message without \n character but then again this makes less copies of it. I think CR-LF adding code should belong to send instead of write probably but that's not something that would need to be done now.

@dgw
Copy link
Member

dgw commented Aug 6, 2019

I think CR-LF adding code should belong to send instead of write probably

Agreed. If you like, submit a PR for that. But wait until #1676 is merged unless you like getting scared by CI failures that aren't your fault. 😸

Copy link
Contributor

@HumorBaby HumorBaby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Member

@dgw dgw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will fix the immediate issue, so let's do it. Future PRs still very much welcome to fix the underlying issue with the order of operations performed on outbound lines. 😸

@dgw dgw added this to the 7.0.0 milestone Aug 7, 2019
@dgw dgw added the Bugfix Generally, PRs that reference (and fix) one or more issue(s) label Aug 7, 2019
@dgw dgw merged commit 47eacd8 into sopel-irc:master Oct 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bugfix Generally, PRs that reference (and fix) one or more issue(s) Low Priority
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants