
Fix executor shutdown #392

Open

dkropachev wants to merge 1 commit into master from dk/fix-executor-shutdown
Conversation

@dkropachev (Collaborator) commented Dec 19, 2024

Usually Cluster.executor is shut down after Cluster.scheduler, but in some cases the executor is shut down earlier, most likely when Cluster.executor is manipulated directly or when the process is killed while the Cluster instance is still alive.

When it happens, you will see this exception:

Traceback (most recent call last):
  File "lib/python3.13/threading.py", line 1041, in _bootstrap_inner
    self.run()
    ~~~~~~~~^^
  File "cassandra/cluster.py", line 4429, in run
    future = self._executor.submit(fn, *args, **kwargs)
  File "lib/python3.13/concurrent/futures/thread.py", line 171, in submit
    raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown
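The failure is easy to reproduce outside the driver: ThreadPoolExecutor.submit raises exactly this RuntimeError once the pool has been shut down. A stand-alone sketch (plain concurrent.futures, no driver code):

```python
# Stand-alone reproducer: submitting to a ThreadPoolExecutor after
# shutdown() raises the same RuntimeError seen in the traceback above.
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=1)
executor.shutdown()
try:
    executor.submit(print, "too late")
except RuntimeError as exc:
    print(exc)  # cannot schedule new futures after shutdown
```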

This PR handles this case gracefully: when the executor is already shut down, nothing is scheduled and no exception is raised.
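A minimal sketch of that graceful handling, assuming the fix boils down to catching the RuntimeError around the submit call (the helper name below is hypothetical; the actual patch lives in the driver's scheduler loop):

```python
# Hypothetical sketch: if the executor is already shut down, drop the
# task and log, instead of letting the scheduler thread die with an
# unhandled RuntimeError.
import logging
from concurrent.futures import ThreadPoolExecutor

log = logging.getLogger(__name__)

def submit_if_running(executor, fn, *args, **kwargs):
    """Return a Future, or None if the executor is already shut down."""
    try:
        return executor.submit(fn, *args, **kwargs)
    except RuntimeError:
        # ThreadPoolExecutor.submit raises RuntimeError after shutdown().
        log.debug("Executor already shut down; dropping task %r", fn)
        return None
```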

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass tests.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

@dkropachev dkropachev force-pushed the dk/fix-executor-shutdown branch from cd0361c to ad9ffaa Compare December 20, 2024 14:11
@dkropachev dkropachev marked this pull request as ready for review December 20, 2024 17:19
Avoid exception:
Traceback (most recent call last):
  File "lib/python3.13/threading.py", line 1041, in _bootstrap_inner
    self.run()
    ~~~~~~~~^^
  File "cassandra/cluster.py", line 4429, in run
    future = self._executor.submit(fn, *args, **kwargs)
  File "lib/python3.13/concurrent/futures/thread.py", line 171, in submit
    raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown
@dkropachev dkropachev force-pushed the dk/fix-executor-shutdown branch from ad9ffaa to 63110b0 Compare December 20, 2024 19:44
@Lorak-mmk

What is the task that is being scheduled? I know about this error, but I didn't fix it that way before because, in my opinion, it is the wrong way. We should first end all tasks / threads (or whatever they are called here), and only after they are stopped can the scheduler be stopped. Ignoring this error is not fixing the issue, just masking it.

I envision it could hurt us in the future - perhaps queries that were not finished will wait forever? Or there will be something that schedules a task and awaits its result, causing a hang? I'd like to avoid issues similar to cassandra-stress being stuck on shutdown.

@dkropachev (Collaborator, Author) commented Dec 21, 2024

> What is the task that is being scheduled? I know about this error, but I didn't fix it that way before because, in my opinion, it is the wrong way. We should first end all tasks / threads (or whatever they are called here), and only after they are stopped can the scheduler be stopped. Ignoring this error is not fixing the issue, just masking it.
>
> I envision it could hurt us in the future - perhaps queries that were not finished will wait forever? Or there will be something that schedules a task and awaits its result, causing a hang? I'd like to avoid issues similar to cassandra-stress being stuck on shutdown.

It is actually already done; here _Scheduler waits for its thread to complete on shutdown:

def shutdown(self):
    try:
        log.debug("Shutting down Cluster Scheduler")
    except AttributeError:
        # this can happen on interpreter shutdown
        pass
    self.is_shutdown = True
    self._queue.put_nowait((0, 0, None))
    self.join()

And here Cluster shuts down the _Scheduler first and the executor last:

self.scheduler.shutdown()
self.control_connection.shutdown()
for session in tuple(self.sessions):
    session.shutdown()
self.executor.shutdown()

Which means that this issue can happen only when the shutdown procedure does not go as it should: either when a test or a user manipulates the executor directly, or when the process is killed and the Cluster instance is not shut down properly.
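That out-of-order case is easy to simulate: shut the executor down while a loop on another thread is still submitting to it. This is illustrative only (the names below are not driver code), but it produces exactly the error in question:

```python
# Simulating the out-of-order shutdown: the executor is shut down
# while another thread is still submitting to it, which is when
# "cannot schedule new futures after shutdown" appears.
import threading
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=1)
errors = []
first_submit = threading.Event()

def scheduler_loop():
    while True:
        try:
            executor.submit(time.sleep, 0)
            first_submit.set()
        except RuntimeError as exc:  # executor was shut down under us
            errors.append(exc)
            return
        time.sleep(0.001)

t = threading.Thread(target=scheduler_loop)
t.start()
first_submit.wait()   # at least one submit succeeded
executor.shutdown()   # wrong order: the loop is still running
t.join()
print(len(errors))    # 1
```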

@Lorak-mmk

Wait, so the error can happen in one of 3 conditions:

  1. There is some unknown bug in our code
  2. Test does something weird
  3. Process is killed

The third case is not important with regard to warnings; I think it is ok to print them.
In the second case, I think the test should be fixed so that it does not produce the warning, instead of silencing the warning globally.
Perhaps the test needs to do some weird stuff, but if that is the case maybe it can temporarily silence the warnings?

The first case is why the warning should not be ignored, imo. If we fix the second case, don't kill the process, and still see the warning, then it indicates a bug which we should fix, right?

@dkropachev (Collaborator, Author)

> Wait, so the error can happen in one of 3 conditions:
>
>   1. There is some unknown bug in our code
>   2. Test does something weird
>   3. Process is killed
>
> The third case is not important with regard to warnings; I think it is ok to print them. In the second case, I think the test should be fixed so that it does not produce the warning, instead of silencing the warning globally. Perhaps the test needs to do some weird stuff, but if that is the case maybe it can temporarily silence the warnings?
>
> The first case is why the warning should not be ignored, imo. If we fix the second case, don't kill the process, and still see the warning, then it indicates a bug which we should fix, right?

Correct. Let's do this:

  1. Make it log the exception instead of throwing it.
  2. Fix the test case that manipulates the executor and see if this error appears in the CI/CD.

WDYT?
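Step 1 could look something like the following sketch. The helper name is hypothetical and the body is illustrative, not the actual driver patch; the point is only that the exception is logged (with the traceback preserved via exc_info) rather than propagated out of the scheduler thread:

```python
# Hypothetical sketch of step 1: log the exception, keep the thread
# alive, and return None instead of raising.
import logging

log = logging.getLogger("cassandra.cluster")

def run_task(executor, fn, *args, **kwargs):
    # Illustrative helper, not the actual driver code.
    try:
        return executor.submit(fn, *args, **kwargs)
    except RuntimeError:
        # Raised by ThreadPoolExecutor.submit after shutdown().
        log.warning("Not scheduling %r: executor is shut down", fn,
                    exc_info=True)
        return None
```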
