
[mac] Fix callback exceptions when the watcher is deleted but still receiving events #786

Merged: 6 commits into master from fix-fsevent-callback-no-watcher, May 10, 2021

Conversation

@BoboTiG (Collaborator) commented May 6, 2021

The watcher might be cleared on thread stop but the C callback would still send events.
Let's ignore them to prevent unhandled exceptions later:

    AttributeError: 'NoneType' object has no attribute 'is_recursive'
      File "watchdog/observers/fsevents.py", line 299, in callback
        emitter.queue_events(emitter.timeout, events)
      File "watchdog/observers/fsevents.py", line 261, in queue_events
        self._queue_created_event(event, src_path, src_dirname)
      File "watchdog/observers/fsevents.py", line 124, in _queue_created_event
        self.queue_event(DirModifiedEvent(dirname))
      File "watchdog/observers/fsevents.py", line 97, in queue_event
        if self._watch.is_recursive :

Or even:

    AttributeError: 'NoneType' object has no attribute 'path'
      File "watchdog/observers/fsevents.py", line 299, in callback
        emitter.queue_events(emitter.timeout, events)
      File "watchdog/observers/fsevents.py", line 174, in queue_events
        src_path = self._encode_path(event.path)
      File "watchdog/observers/fsevents.py", line 323, in _encode_path
        if isinstance(self.watch.path, bytes):
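
Illustratively, the guard amounts to something like the following; this is a minimal sketch of the approach, not the exact patch (the lock name, the default-argument binding, and the exact structure are assumptions):

    # Minimal sketch: drop late events once the watch was cleared on thread stop.
    def callback(paths, inodes, flags, ids, emitter=emitter):
        with emitter._lock:  # assumed: same lock taken in on_thread_stop
            if emitter.watch is None:
                # The emitter was stopped; the native stream may still deliver
                # one last batch of events. Ignore it instead of crashing later.
                return
            events = [
                _fsevents.NativeEvent(path, inode, event_flags, event_id)
                for path, inode, event_flags, event_id in zip(paths, inodes, flags, ids)
            ]
            emitter.queue_events(emitter.timeout, events)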

Co-authored-by: Romain Grasland <rgrasland@nuxeo.com>

@BoboTiG (Collaborator, Author) commented May 6, 2021

cc @samschott and @CCP-Aporia.

@samschott (Contributor)

Ouch, what a nasty exception.

Just a few questions to understand your changes:

  1. Does the use of the lock in on_thread_stop already fix the race condition by itself? Or do we still need the test for self.watch in the callback?
  2. Do we really want to drop those last events? Technically, since they were emitted while the watch was still running, they should still be processed. But this is just nitpicking...

In any case, moving the callback out of that inline definition is an improvement in itself :)
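
For context, the stop path being discussed looks roughly like this; a sketch assembled from the snippets quoted later in this thread (the lock name is an assumption):

    # Sketch of the emitter's stop path (names assumed, not the exact source).
    def on_thread_stop(self):
        with self._lock:  # assumed lock, shared with the C callback wrapper
            if self.watch:
                _fsevents.remove_watch(self.watch)  # stops the native FSEvents stream
                self._watch = None                  # the reset the callback has to guard against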

@BoboTiG (Collaborator, Author) commented May 6, 2021

  1. It is part of the fix, yes. Both the lock and the check are needed :)
  2. Actually, they cannot be processed: the watcher is deleted, and queue_events() needs several attributes from self.watch, which no longer exists. Even queue_event() needs attributes from self.watch.

I would say pending events are lost in any case; the fix just makes that explicit. As the code processing events relies heavily on self.watch attributes, it is not worth adding checks everywhere to prevent issues. WDYT?

@samschott (Contributor)

Those are fair points. It's just that the explicit check for self.watch somehow strikes me as a roundabout way of fixing this. I can't quite put my finger on why though.

Maybe I just find it odd that the supplied callback, wrapped in the extension module as watchdog_FSEventStreamCallback, is still invoked after FSEventStreamStop was called, i.e., after:

    _fsevents.remove_watch(self.watch)

Similarly, the callback acquires the GIL, so calling _fsevents.remove_watch() should block until the callback has finished. So where does the race come from? I could be missing something obvious here. Maybe @CCP-Aporia knows more about this.

I certainly can't argue with the issue that you are seeing or with the success of the fix. So I'm happy with the changes!

@BoboTiG (Collaborator, Author) commented May 6, 2021

My understanding (and it may be wrong) is that the issue happens in the callback, more specifically here:

    events = [
        _fsevents.NativeEvent(path, inode, event_flags, event_id)
        for path, inode, event_flags, event_id in zip(paths, inodes, flags, ids)
    ]
    emitter.queue_events(emitter.timeout, events)

The watcher is stopped while the events list is being built, before anything reaches queue_events(), so the self._watch attribute is already set to None by then. That's a tricky one for sure, and the change does fix the reported issues. Maybe a better fix is possible, that's why I pinged @CCP-Aporia and you :)
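
A rough timeline of the suspected interleaving (illustrative only; the thread labels are assumptions):

    # callback thread: events = [NativeEvent(...) ...]        # self._watch still set
    # emitter thread:  on_thread_stop() -> self._watch = None
    # callback thread: emitter.queue_events(emitter.timeout, events)
    #                  -> queue_events reads self.watch, which is now None
    #                  -> AttributeError: 'NoneType' object has no attribute ...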

@CCP-Aporia (Contributor)

I'm in agreement with @samschott that the explicit self.watch checks look like a workaround for an underlying deficiency. Let me take a look at the C side of things; it's very possible that the issue is in there.

@BoboTiG (Collaborator, Author) commented May 7, 2021

Of course, go ahead :)
Both tests demonstrate the issue, which should make it easier to take a look.

@CCP-Aporia (Contributor)

These tests were great, indeed. Thank you for those! :-) I've fixed the issue in my fork and opened a PR to update your branch here, see #788

BoboTiG and others added 3 commits May 7, 2021 14:01
@BoboTiG (Collaborator, Author) commented May 7, 2021

Thanks a lot @CCP-Aporia 🍾
I will merge that PR and correctly set the author, so you do not lose the credit ;)

If the review is OK for you both, let's merge once the CI is green.

@BoboTiG BoboTiG closed this May 7, 2021
@BoboTiG BoboTiG reopened this May 7, 2021
@samschott (Contributor) left a comment

Thank you @CCP-Aporia, this looks much cleaner :)

So, just to understand why those changes work:

  1. The workaround of setting self._watch = None was added to prevent issues when on_thread_stop is called twice, because the C extension did not properly handle watchdog_remove_watch being called twice.
  2. This workaround led to queue_events raising errors when self._watch is None.
  3. It is still possible that queue_events is called while the emitter is being stopped, but this no longer raises "unhandled" exceptions.

    PyObject *streamref_capsule = PyDict_GetItem(watch_to_stream, watch);
    if (!streamref_capsule) {
        // A watch might have been removed explicitly before, in which case we can simply early out.
        Py_RETURN_NONE;
    }
Contributor

When exactly does this case occur?

Contributor

As it says in the comment: if someone explicitly calls remove_watch twice for the same watch, then PyDict_GetItem won't find it again. Not checking for the nullptr return here was the main reason for the workarounds in the Python code. 🙂
Effectively, streamref_capsule being a nullptr caused the assertion failures in the three fsevents-related API calls that follow, as well as the error message about returning a value with an exception set.
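
So the crash boiled down to a double removal, along these lines (a hypothetical call sequence):

    # Hypothetical sequence that used to fail before the C fix:
    _fsevents.remove_watch(watch)  # stream found in watch_to_stream: stopped and removed
    _fsevents.remove_watch(watch)  # PyDict_GetItem returns NULL; previously this tripped
                                   # the assertions, now it early-outs and returns None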

@BoboTiG (Collaborator, Author) commented May 7, 2021

You are right about the 3rd statement, @samschott. Without testing right now, I guess it simply works because self._watch is no longer reset to None; the remaining events are then presumably delivered successfully (which was one of your considerations).

I do not have a strong opinion on whether we should process those events or not; I am just thinking out loud.

@BoboTiG (Collaborator, Author) commented May 7, 2021

To make this easier to work on, I could split the PR: a smaller one could be merged ASAP to move the callback out of the inline function. This PR would then only contain the tests and the C fix, which should make it easier to reason about a potential fix.

@samschott (Contributor) commented May 8, 2021

I do not have a strong opinion on whether we should process those events or not; I am just thinking out loud.

I don't really have a strong opinion either. My intuition says that those events should be processed, since they are triggered before the emitter is stopped. There should also be no adverse consequences from unexpected events being dispatched to the handlers after stopping and joining the observer thread. So it's really up to you!

To make this easier to work on, I could split the PR: a smaller one could be merged ASAP to move the callback out of the inline function.

Personally I am quite happy with the current fix. Is there any more work to do that's worth splitting off? You can of course split things off to have a cleaner commit history.

@CCP-Aporia (Contributor)

@samschott - your understanding of the fix is correct. 🙂

@BoboTiG - indeed, resetting self._watch to None was the whole workaround, and it is now completely unnecessary. That kind of reset only introduced edge cases, which are avoided by the fix in the C extension.
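
If so, the Python-side stop path can stay as simple as this (a sketch; the exact post-fix code is not shown in this thread):

    # Sketch: with the C fix, a second remove_watch call is harmless,
    # so the emitter no longer needs to reset self._watch to None.
    def on_thread_stop(self):
        _fsevents.remove_watch(self.watch)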

@BoboTiG BoboTiG merged commit a70b350 into master May 10, 2021
@BoboTiG BoboTiG deleted the fix-fsevent-callback-no-watcher branch May 10, 2021 12:30
@BoboTiG (Collaborator, Author) commented May 10, 2021

Thank you both, let's do a new release with that fix :)
