Windows error 10022 kills threads #375

Open
1 of 3 tasks
AdamaTheHutt opened this issue Jul 14, 2021 · 7 comments
Labels
  • bug (Something is broken)
  • reproducer: missing (This PR or issue lacks code, which reproduce the problem described or clearly understandable STR)

Comments

AdamaTheHutt commented Jul 14, 2021

❓ I'm submitting a ...

  • 🐞 bug report
  • 🐣 feature request
  • ❓ question about the decisions made in the repository

🐞 Describe the bug. What is the current behavior?
Sometimes, when closing a connection, we get an error, probably because the connection is already closed or in a similar state. This is addressed in 384e6d0, but on Windows the error may instead be WSAEINVAL (10022), which is not caught.

❓ What is the motivation / use case for changing the behavior?
Because of this, probably in combination with the error seen in #358, we eventually run out of threads and the server stops responding completely.

💡 To Reproduce
I don't know how to reproduce it; we get this error sporadically in production. All threads are typically gone within a few days, or sometimes weeks.

💡 Expected behavior
I expect this error to be ignored, so that the thread keeps on running.

📋 Details
Also tried switching to pyopenssl, but the same error happened there too.

📋 Environment

  • Cheroot version: 8.5.2
  • CherryPy version: -
  • Python version: 3.8.9
  • OS: Microsoft Windows Server 2016 Datacenter, 10.0.14393
  • Browser: [?]

📋 Additional context
A typical error:

Exception in thread CP Server Thread-18:
Traceback (most recent call last):
  File "c:\py\lib\threading.py", line 917, in _bootstrap_inner
    self.run()
  File "c:\py\lib\site-packages\cheroot\workers\threadpool.py", line 130, in run
    conn.close()
  File "c:\py\lib\site-packages\cheroot\server.py", line 1360, in close
    self._close_kernel_socket()
  File "c:\py\lib\site-packages\cheroot\server.py", line 1483, in _close_kernel_socket
    shutdown(socket.SHUT_RDWR)  # actually send a TCP FIN
  File "c:\py\lib\ssl.py", line 1090, in shutdown
    super().shutdown(how)
OSError: [WinError 10022] An invalid argument was supplied

Then similar exceptions for thread after thread, until none are left.

The references added in 9f9de65 also include several examples of handling error 10022, for example python/cpython@83a2c28.

AdamaTheHutt added the bug and triage labels Jul 14, 2021
webknjaz (Member) commented Oct 3, 2022

Looks like this error code is sometimes exposed as EINVAL or WSAEINVAL:

So it seems like maybe it should be covered already. At least in some envs. But somebody needs to check a reproducer under a relevant env. One way to do this is to connect to a GHA worker using https://github.com/marketplace/actions/debugging-with-tmate.
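To see how the code is exposed in a particular environment, a small diagnostic along these lines might help. It is only a sketch: `describe_oserror` is a hypothetical helper name, `errno.WSAEINVAL` is defined only on Windows, and the fallback value 10022 is taken from the report above.

```python
import errno

# errno.WSAEINVAL exists only on Windows; 10022 is the value from the report.
WSAEINVAL = getattr(errno, 'WSAEINVAL', 10022)


def describe_oserror(exc):
    """Collect the codes an OSError carries, e.g. for a log line."""
    return {
        'errno': exc.errno,
        # The winerror attribute does not exist outside Windows.
        'winerror': getattr(exc, 'winerror', None),
        'errno_name': errno.errorcode.get(exc.errno, 'unknown'),
    }
```

Logging the result of `describe_oserror(exc)` from an `except OSError as exc:` block would show whether a given Python build reports the failure through `errno`, `winerror`, or both.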

webknjaz added the "reproducer: missing" label and removed the triage label Oct 3, 2022
webknjaz added a commit to webknjaz/cheroot that referenced this issue Mar 12, 2024
This patch aims to prevent a situation when the working threads are
killed by arbitrary exceptions that bubble up to their entry point
layers that aren't handled properly or at all. This was causing DoS
in many situations, including TLS errors and attempts to close the
underlying sockets erroring out.

Fixes cherrypy#358
Ref cherrypy#375
Resolves cherrypy#365
webknjaz added a commit that referenced this issue Mar 31, 2024
A DoS would happen in many situations, including TLS errors and
attempts to close the underlying sockets erroring out.

This patch aims to prevent a situation when the worker threads are
killed by arbitrary exceptions that bubble up to their entry point
layers that aren't handled properly or at all.

PR #649

Fixes #358
Fixes #354

Ref #310
Ref #346
Ref #375
Ref #599
Ref #641

Resolves #365
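
The general pattern the commit message describes, catching exceptions at the worker thread's entry point so that one failed connection does not end the thread, can be illustrated roughly as follows. This is not cheroot's code: `worker_loop`, `handle`, and `log` are hypothetical names, and a plain queue stands in for the connection pool. The log later in this thread suggests the real worker treats some exceptions as fatal and sets a server interrupt flag, which is not modeled here.

```python
import queue
import threading

_STOP = object()  # sentinel telling the worker to exit


def worker_loop(tasks, handle, log):
    """Process tasks until the sentinel arrives; survive per-task errors."""
    while True:
        task = tasks.get()
        if task is _STOP:
            return
        try:
            handle(task)
        except Exception as exc:  # deliberately broad: keep the thread alive
            log(f'task {task!r} failed: {exc!r}')


tasks = queue.Queue()
errors = []
done = []


def handle(task):
    # Simulate the failure from the report for one of the tasks.
    if task == 'bad':
        raise OSError(22, 'An invalid argument was supplied')
    done.append(task)


worker = threading.Thread(
    target=worker_loop, args=(tasks, handle, errors.append),
)
worker.start()
for item in ('a', 'bad', 'b', _STOP):
    tasks.put(item)
worker.join()
```

The worker handles `'a'` and `'b'`, records one error for `'bad'`, and exits only when it receives the sentinel.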

AzureIP commented Sep 17, 2024

Hi,
I experienced this error, too. What is the current state?

@liquidaty

Same. It is difficult to reproduce but fatal.

@webknjaz (Member)

@liquidaty I thought #649 made it better. No?

@liquidaty

> @liquidaty I thought #649 made it better. No?

I did too, but under certain conditions it appears that threads are still either dying or becoming unresponsive. Might be related to line 1291 of cheroot/server.py:

            req.respond() # <== here
            if not req.close_connection:
                return True

I don't suppose there is an easy way to report, at regular intervals, the number of active and healthy threads?
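
One possible way to get such a periodic count, using only the standard library, is sketched below. It assumes the worker threads are still named with the `CP Server Thread` prefix seen in the first traceback; `count_live_threads` and `start_thread_monitor` are hypothetical helpers, not cheroot APIs. Note that `threading.enumerate()` only reports threads that are alive, so a thread that is stuck but not dead would still be counted.

```python
import threading


def count_live_threads(name_prefix='CP Server Thread'):
    """Count live threads whose name starts with the given prefix."""
    return sum(
        1 for t in threading.enumerate()
        if t.name.startswith(name_prefix)
    )


def start_thread_monitor(report, interval=60.0,
                         name_prefix='CP Server Thread'):
    """Call report(count) every `interval` seconds.

    Returns an Event; setting it stops the monitor.
    """
    stop = threading.Event()

    def _run():
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(interval):
            report(count_live_threads(name_prefix))

    threading.Thread(target=_run, name='thread-monitor', daemon=True).start()
    return stop
```

For example, `stop = start_thread_monitor(print, interval=300)` would print the count every five minutes until `stop.set()` is called.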

liquidaty commented Jan 15, 2025

*** UPDATED ***

> @liquidaty I thought #649 made it better. No?

@webknjaz here's an example of an actual error log:

[14/Jan/2025:15:08:07] ENGINE A fatal exception happened. Setting the server interrupt flag to OSError(22, 'An invalid argument was supplied', None, 10022, None) and giving up.

Please, report this on the Cheroot tracker at <https://github.com/cherrypy/cheroot/issues/new/choose>, providing a full reproducer with as much context and details as possible.
Traceback (most recent call last):
  File "cheroot\workers\threadpool.py", line 119, in run
    self._process_connections_until_interrupted()
  File "cheroot\workers\threadpool.py", line 216, in _process_connections_until_interrupted
    conn.close()
  File "cheroot\server.py", line 1364, in close
    self._close_kernel_socket()
  File "cheroot\server.py", line 1484, in _close_kernel_socket
    shutdown(socket.SHUT_RDWR)  # actually send a TCP FIN
    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "ssl.py", line 1353, in shutdown
OSError: [WinError 10022] An invalid argument was supplied

Maybe this would fix it, in errors.py:

acceptable_sock_shutdown_error_codes = {
    errno.ENOTCONN,
    10022, # <== add this?
    errno.EPIPE, errno.ESHUTDOWN,  # corresponds to BrokenPipeError in Python 3
    errno.ECONNRESET,  # corresponds to ConnectionResetError in Python 3
}
"""Errors that may happen during the connection close sequence.

and in server.py:

        except OSError as e:
            if e.errno == 10022:
                self.server.error_log('Ignoring OSError 10022')
            else:
                raise
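
One caveat about the check on `e.errno`: the log above shows `OSError(22, 'An invalid argument was supplied', None, 10022, None)`, which suggests that in that environment `errno` is 22 (`EINVAL`) and 10022 is carried in `winerror`. A check that looks at both attributes might be more robust. The following is a sketch under that assumption, with hypothetical helper names (`is_ignorable_shutdown_error`, `shutdown_quietly`); it is not a tested patch for cheroot.

```python
import errno

WSAEINVAL = 10022  # Winsock "invalid argument" code from the traceback


def is_ignorable_shutdown_error(exc):
    """Return True if exc looks like the WSAEINVAL seen during shutdown()."""
    # winerror exists only on Windows; elsewhere getattr yields None.
    if getattr(exc, 'winerror', None) == WSAEINVAL:
        return True
    # The thread suggests errno may be reported as either 10022 or EINVAL.
    return exc.errno in {WSAEINVAL, errno.EINVAL}


def shutdown_quietly(sock, how, log):
    """Shut the socket down, logging instead of raising on WSAEINVAL."""
    try:
        sock.shutdown(how)
    except OSError as exc:
        if not is_ignorable_shutdown_error(exc):
            raise
        log(f'Ignoring {exc!r} during socket shutdown')
```

Treating every `EINVAL` as ignorable is a broad choice and could hide a genuinely invalid call, so limiting it to the shutdown path, as here, seems safer than adding it to a general-purpose handler.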

@webknjaz (Member)

Thanks for the traceback! I'm not sure yet. However, I seem to recall having some unfinished improvement attempts in this area somewhere on my laptop. I'll try to remember and see if those might be related.
