
Broken interrupt handling when connection is inactive #1478

Closed
dgw opened this issue Feb 12, 2019 · 5 comments
Labels: Bug (Things to squish; generally used for issues), Low Priority
Comments

dgw (Member) commented Feb 12, 2019

For the record, I'm testing on macOS today because that's what I have available. However, the same behavior occurs with identical or similar tracebacks on my Ubuntu machine at home, so I don't think it's related to the fact that I'm using Homebrew Python or anything. It's probably related to Sopel's sometimes-wonky exception handling.

Here's what I did:

  1. Run sopel
  2. Press Ctrl-C after "Connecting to <server>..."
  3. Run sopel again
  4. Press Ctrl-C while Sopel is waiting to reconnect after an expected SSL failure

Interrupting the connection phase resulted in an AttributeError, probably as expected (since quitting tries to send something to the socket, and the socket object doesn't exist before connecting finishes):

Connecting to irc.network.net:6667...
^CGot quit signal, shutting down.
Traceback (most recent call last):
  File "/Users/dgw/github/sopel/sopel/__init__.py", line 91, in run
    p.run(config.core.host, int(config.core.port))
  File "/Users/dgw/github/sopel/sopel/irc.py", line 167, in run
    self.initiate_connect(host, port)
  File "/Users/dgw/github/sopel/sopel/irc.py", line 177, in initiate_connect
    source_address=source_address))
  File "/usr/local/Cellar/python/3.7.2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/socket.py", line 707, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "/usr/local/Cellar/python/3.7.2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/socket.py", line 748, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
  File "/Users/dgw/github/sopel/sopel/__init__.py", line 73, in signal_handler
    p.quit('Closing')
  File "/Users/dgw/github/sopel/sopel/irc.py", line 199, in quit
    self.write(['QUIT'], message)
  File "/Users/dgw/github/sopel/sopel/bot.py", line 166, in write
    irc.Bot.write(self, args, text=text)
  File "/Users/dgw/github/sopel/sopel/irc.py", line 161, in write
    self.send(temp.encode('utf-8'))
  File "/usr/local/Cellar/python/3.7.2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncore.py", line 360, in send
    result = self.socket.send(data)
AttributeError: 'NoneType' object has no attribute 'send'
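A minimal sketch of the guard dgw describes (the class and method names here are hypothetical stand-ins, not Sopel's actual API): check for a live socket before trying to send QUIT, so an interrupt during connection setup can still shut down cleanly.

```python
# Sketch only: hypothetical stand-in for the bot, not Sopel's real classes.
class Bot:
    def __init__(self):
        self.socket = None  # asyncore assigns this only once connected

    def connect(self):
        self.socket = object()  # stand-in for a connected socket

    def quit(self, message):
        # Guard: before the connection finishes, self.socket is None, and
        # calling .send() on it raises the AttributeError in the traceback
        # above. With the check, we just shut down locally instead.
        if self.socket is None:
            return "closed without sending QUIT"
        return "QUIT :" + message
```

With the guard in place, interrupting before `connect()` completes no longer raises.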

Interrupting the reconnect delay yields a completely different traceback, and additionally requires pressing Ctrl-C twice:

Warning: Disconnected. Reconnecting in 20 seconds...
^CGot quit signal, shutting down.
Traceback (most recent call last):
  File "./sopel.py", line 7, in <module>
    sys.exit(run_script.main())
  File "/Users/dgw/github/sopel/sopel/run_script.py", line 351, in main
    ret = run(config_module, pid_file_path)
  File "/Users/dgw/github/sopel/sopel/__init__.py", line 120, in run
    time.sleep(delay)
  File "/Users/dgw/github/sopel/sopel/__init__.py", line 73, in signal_handler
    p.quit('Closing')
  File "/Users/dgw/github/sopel/sopel/irc.py", line 199, in quit
    self.write(['QUIT'], message)
  File "/Users/dgw/github/sopel/sopel/bot.py", line 166, in write
    irc.Bot.write(self, args, text=text)
  File "/Users/dgw/github/sopel/sopel/irc.py", line 161, in write
    self.send(temp.encode('utf-8'))
  File "/Users/dgw/github/sopel/sopel/irc.py", line 327, in _ssl_send
    result = self.socket.send(data)
OSError: [Errno 9] Bad file descriptor
^CGot quit signal, shutting down.
Exception ignored in: <module 'threading' from '/usr/local/Cellar/python/3.7.2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py'>
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py", line 1273, in _shutdown
    t.join()
  File "/usr/local/Cellar/python/3.7.2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py", line 1032, in join
    self._wait_for_tstate_lock()
  File "/usr/local/Cellar/python/3.7.2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py", line 1048, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
  File "/Users/dgw/github/sopel/sopel/__init__.py", line 73, in signal_handler
    p.quit('Closing')
  File "/Users/dgw/github/sopel/sopel/irc.py", line 199, in quit
    self.write(['QUIT'], message)
  File "/Users/dgw/github/sopel/sopel/bot.py", line 166, in write
    irc.Bot.write(self, args, text=text)
  File "/Users/dgw/github/sopel/sopel/irc.py", line 161, in write
    self.send(temp.encode('utf-8'))
  File "/Users/dgw/github/sopel/sopel/irc.py", line 327, in _ssl_send
    result = self.socket.send(data)
OSError: [Errno 9] Bad file descriptor

I've run into this second issue relatively often while testing things in the last few months, since I often intentionally interrupt Sopel during states other than "Connected to network and running normally".

The exception itself isn't a big deal (though it would be nice not to spit it out). I consider needing to press Ctrl-C twice to be the main bug here. That simply shouldn't be required.

Interrupting Sopel after the "Loading modules..." line (but before it starts to connect) yields a clean exit, though. That's kind of weird, and I haven't had time to look into why.

Since I know we have a few people digging around in Sopel's internals and refactoring things now (and doing damn good work, too!), I'm hoping one of them will look into this at some point. 😹

This isn't a high-severity bug, but I do want to get it fixed if possible, ideally in the next year or two. That need to press Ctrl-C twice carries over into, for example, sopel --quit. Sopel should never need to be told to quit twice, unless something is catastrophically wrong—and waiting to reconnect is a totally normal situation that shouldn't break things the way it does now.

dgw added the Bug, Low Priority, and Patches Welcome labels on Feb 12, 2019
dgw added this to the 7.0.0 milestone on Feb 12, 2019
HumorBaby added a commit to HumorBaby/sopel that referenced this issue Mar 30, 2019
Because of the `time.sleep` during the disconnect phase, the signal
handling is thrown off and a new bot is spawned before the `bot.hasquit`
flag can be checked. Add a check at the top of the loop to see if the
`hasquit` flag was set during a disconnect.

Fixes sopel-irc#1478
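The idea in that commit message can be sketched like this (the `hasquit` flag and the loop shape are illustrative, not Sopel's exact code): re-check the quit flag at the top of the connect loop, so a Ctrl-C that lands during the reconnect delay stops the loop instead of spawning a new bot.

```python
import threading

hasquit = threading.Event()  # a signal handler would set this on Ctrl-C

def connect_loop(max_attempts=5):
    attempts = 0
    while attempts < max_attempts:
        # The added check: if the quit flag was set during the
        # disconnect/sleep phase, bail out before reconnecting.
        if hasquit.is_set():
            break
        attempts += 1
        # ... connect and run until disconnected ...
        hasquit.set()  # simulate Ctrl-C arriving during the reconnect delay
        # time.sleep(delay) would happen here in the real loop
    return attempts
```

Running `connect_loop()` here returns 1: the simulated Ctrl-C during the delay prevents a second connection attempt.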
HumorBaby (Contributor) commented:

I see the fix is "low priority", but it is a straightforward backport to 6.6.x (from CLI back to __init__.py). Is this something worth pursuing?

HumorBaby (Contributor) commented Mar 30, 2019

> Interrupting Sopel after the "Loading modules..." line (but before it starts to connect) yields a clean exit, though. That's kind of weird, and I haven't had time to look into why.

I believe it's because, at this point, the KeyboardInterrupt is caught as-is from the MainThread, as opposed to the socket thread (or any one of the many subsequent threads that are spawned from this point on).
See sopel/sopel/cli/run.py, lines 84-86 at f694b5b:

        p.run(config.core.host, int(config.core.port))
    except KeyboardInterrupt:
        break
vs. waiting for bot.hasquit to be set after a signal from another thread.
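A toy illustration of that difference (not Sopel code): an interrupt raised while the main thread is executing the try body surfaces as a plain KeyboardInterrupt and is caught by the except, which matches the clean exit seen during module loading.

```python
def load_modules():
    # Stand-in for startup work; pretend Ctrl-C arrives here, which in the
    # main thread surfaces as a KeyboardInterrupt at this exact frame.
    raise KeyboardInterrupt

def main():
    try:
        load_modules()
    except KeyboardInterrupt:
        # The interrupt lands inside the try body, so this handler runs
        # and the program exits cleanly.
        return "clean exit"
    return "finished normally"
```

Once the bot is waiting on sockets or on `bot.hasquit` being set from a signal handler, the interrupt instead fires in whatever frame happens to be active, producing the tracebacks above.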

HumorBaby (Contributor) commented:

> I see the fix is "low priority", but it is a straightforward backport to 6.6.x (from CLI back to __init__.py). Is this something worth pursuing?

During the development of #1527 (and, I have now noticed, with Docker containers as well), I found that more often than not, triggering the exceptions described above (^C) would lead to a complete hang (subsequent ^C did nothing) 😢 The only way to kill Sopel after this was ^Z, then kill -9 %1 (for Docker, containers needed to be manually force-removed with docker rm -f ...).

This Docker behavior is not new, but since it's related to this issue, I figured I would mention it.

HumorBaby added a commit to HumorBaby/sopel that referenced this issue Apr 1, 2019
dgw modified the milestone from 7.0.0 to 6.6.6 on Apr 4, 2019
dgw (Member, Author) commented Apr 4, 2019

The broken cases described here are fixed in #1534 (and presumably in #1527 also, on the new "run" CLI, though I haven't specifically tested that yet).

As @HumorBaby and I pinned down on IRC, though, fixing the handling of Ctrl-C during the "waiting to reconnect" state revealed another issue: the quit signal isn't processed until after the time.sleep(delay) period elapses, so Sopel appears to hang for several seconds.

Fixing that would require a fair bit of refactoring, so it's a separate issue. In fact, I'll just make one.
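One common pattern for making such a delay interruptible (a sketch of the general technique under assumed names, not what Sopel's refactor actually did) is to replace `time.sleep(delay)` with `threading.Event.wait`, which returns as soon as the quit signal sets the event:

```python
import threading

quit_requested = threading.Event()  # a signal handler would call .set()

def wait_before_reconnect(delay):
    # Event.wait(timeout) blocks for at most `delay` seconds, but returns
    # True immediately if/when the event is set -- unlike time.sleep,
    # which always blocks for the full duration.
    if quit_requested.wait(timeout=delay):
        return "quitting"
    return "reconnecting"
```

A handler calling `quit_requested.set()` would then cut even a 20-second reconnect delay short instead of leaving the bot apparently hung.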

dgw (Member, Author) commented Apr 10, 2019

As #1534 was merged, I'll close this as fixed. #1527 remains in the 7.0.0 milestone, so it won't get lost.

dgw closed this as completed on Apr 10, 2019
HumorBaby added a commit to HumorBaby/sopel that referenced this issue Apr 14, 2019
HumorBaby added a commit to HumorBaby/sopel that referenced this issue Apr 16, 2019
Fixes sopel-irc#1478. Reapplied because of changes from sopel-irc#1493.