SEGFAULT on macOS with the latest version of watchdog #762

bstaletic · 2021-02-11T18:07:17Z

Unfortunately, this will be tough to describe, since I don't have a mac and only my CI is complaining.

I'm using watchdog as a submodule and today tried to update it. One specific test consistently makes pytest segfault.

This is the log from a SEGFAULTING run: https://dev.azure.com/borisstaletic/3ce92110-caa5-4c49-b8c3-44a433da676b/_apis/build/builds/1535/logs/64
The way I'm using watchdog is by talking to, in this case, jdt.ls. When I open this file it sends... these patterns:

[
  {
    "globPattern": "**/*.java"
  },
  {
    "globPattern": "**/.project"
  },
  {
    "globPattern": "**/.classpath"
  },
  {
    "globPattern": "**/.settings/*.prefs"
  },
  {
    "globPattern": "**/src/**"
  },
  {
    "globPattern": "**/*.gradle"
  },
  {
    "globPattern": "**/gradle.properties"
  },
  {
    "globPattern": "**/pom.xml"
  },
  {
    "globPattern": "/home/bstaletic/work/ycmd/ycmd/tests/java/testdata/simple_eclipse_project",
    "kind": 4
  }
]

Which we then edit, like this: https://github.com/ycm-core/ycmd/blob/master/ycmd/completers/language_server/language_server_completer.py#L590-L591

And finally use that with this handler: https://github.com/ycm-core/ycmd/blob/master/ycmd/completers/language_server/language_server_completer.py#L3219

Any further information, as well as running the CI, I can gladly provide.

BoboTiG · 2021-02-12T09:18:48Z

Thanks for the report.

For the history, you are talking about the release v2.0.0.

Now, that will be a though one to debug 🤔 Do you think you can reduce the code to ease reproducting the issue? I guess this is releated to some C code that was modified in FSEvents. Maybe @samschott or @CCP-Aporia could have a look at changes introduced in v2.0.0 to see if there is a missing refcount change or something like that?

bstaletic · 2021-02-12T09:45:44Z

I do have an idea how to reduce the code. What I think would repro:

Have directory structure as the simple_eclipse_project
Prepare watchdog with hardcoded patterns that jdt.ls would use.
Start jdt.ls, so that it triggers some events.
Wait.
Hopefully segfault.

I'll try that later today.

samschott · 2021-02-12T10:44:45Z

@bstaletic, could you possibly get the crash report from the CI runner? It should be the latest .crash file in ~/Library/Logs/DiagnosticReports/.

puremourning · 2021-02-12T11:05:54Z

I repro'd this locally. this is what crash reporter says: http://vpaste.net/XouKt

Python_2021-02-12-104711_BeniMac2020.txt

Python_2021-02-12-105901_BeniMac2020.txt

puremourning · 2021-02-12T11:07:07Z

BTW the reproduction occurs when we add raise RuntimeError( 'test' ) in a thread that's handling IO events from a subprocess stdout. not sure if that helps explain it.

samschott · 2021-02-12T12:04:40Z

Thanks @puremourning, that's very helpful! Could you also provide a minimum example for us to reproduce the crash?

If I am reading the report correctly, the segfault happens because NULL gets passed to CFString_AsPyUnicode here:

watchdog/src/watchdog_fsevents.c

Line 407 in 4bddc32

path = CFString_AsPyUnicode(cf_path);

@CCP-Aporia, would you agree?

Now from the FSEvents docs, I would not have expected the event path to ever be null. But maybe the actual issue lies earlier in the code at:

watchdog/src/watchdog_fsevents.c

Lines 388 to 397 in 4bddc32

    
           py_event_paths = PyList_New(num_events); 
        
           py_event_inodes = PyList_New(num_events); 
        
           py_event_flags = PyList_New(num_events); 
        
           py_event_ids = PyList_New(num_events); 
        
           if (G_NOT(event_path_info_array_ref && py_event_flags && py_event_ids)) 
        
           { 
        
               Py_XDECREF(py_event_ids); 
        
               Py_XDECREF(py_event_flags); 
        
               return /*NULL*/; 
        
           }

This check looks a bit odd to me. Shouldn't it be something like the following instead:

    /* Convert event flags and paths to Python ints and strings. */
    py_event_paths = PyList_New(num_events);
    py_event_inodes = PyList_New(num_events);
    py_event_flags = PyList_New(num_events);
    py_event_ids = PyList_New(num_events);
    if (G_NOT(py_event_paths && py_event_inodes && py_event_flags && py_event_ids))
    {
        Py_XDECREF(py_event_paths);
        Py_XDECREF(py_event_inodes);
        Py_XDECREF(py_event_flags);
        Py_XDECREF(py_event_ids);
        return /*NULL*/;
    }

CCP-Aporia · 2021-02-12T12:48:58Z

Hey, that's a great a great bug report - thanks a lot for all the details!

@samschott - I agree that the crash likely happens because of cf_path being a nullptr. CFDictionaryGetValue may return NULL "if no key-value pair matching key exists". Presumably there is no kFSEventStreamEventExtendedDataPathKey item available for the given path info, and we need to cater for that scenario.

About the if (G_NOT( block ... yeah, that also looks like it's not covering everything in its condition and is not freeing up everything. But at the same time, if PyList_New fails then we have bigger problems anyway; in general it only fails when we're out of memory on a machine.

samschott · 2021-02-12T13:06:15Z

@CCP-Aporia, thanks for the explanation. I was wondering which scenario the G_NOT check was guarding against.

Presumably there is no kFSEventStreamEventExtendedDataPathKey item available for the given path info, and we need to cater for that scenario.

It appears like that. I am still confused as to why this would happen and how this is related to raising a RuntimeError in a different thread which handles the stdout from a subprocess.

But if there is a scenario where cf_path can be NULL, we clearly need to accommodate that.

puremourning · 2021-02-12T13:44:18Z

It appears like that. I am still confused as to why this would happen and how this is related to raising a RuntimeError in a different thread which handles the stdout from a subprocess.

Turns out that it's probably not.

The thread that is throwing the exception is actually the same thread that created the observer objects. In this case the throwing of the exception causes us to destroy the observers:

        observer = Observer()
        observer.schedule( self._watchdog_factory( globs ),
                           self._project_directory,
                           recursive = True )
        observer.start()
        self._observers.append( observer )

later (triggered by the exception)...

    for observer in self._observers:
      observer.stop()
      observer.join()

puremourning · 2021-02-16T12:08:07Z

TL;DR - I think the crash happens when there are multiple events in the callback, but I don't know why.

I am able to reproduce this with a debug version of watchdog and a debugger attached.

It seems that it crashes every time it enters this code:

    if (G_IS_NULL(c_string_ptr)) {
        cf_string_length = CFStringGetLength(cf_string_ref);
        CFStringGetCString(cf_string_ref, c_string_buff, cf_string_length, 0);
        py_string = PyUnicode_FromString(c_string_buff);
    } else {

Neither cf_path nor cf_inode are NULL:

Watches: ----
Expression: i
  - Result: 1
Expression: path_info_dict
  - Result: 0x00007ff8815e6f70
    + [0] (__lldb_autogen_nspair): {key:0x00007fff88f72fb8, value:0x00007ff8815e0ef0}
    + [1] (__lldb_autogen_nspair): {key:0x00007fff88f72fd8, value:0x428cef0e47b0896d}
    + [raw] (): CFDictionaryRef
Expression: cf_path
  - Result: 0x00007ff8815e0ef0
Expression: cf_inode
  - Result: 0x428cef0e47b0896d
Expression: cf_string_length
 *- Result: 156
Expression: num_events
 *- Result: 17

If it helps, these are the flags for the one that causes it to break:

Expression: event_flags[i]
 *- Result: 66048

The only useful other thing that I notice is that it crashes consistently when i > 0 (actually always when i == 1). Perhaps this is useful, I couldn't see anything specifically in the code that meant this was obviously broken for event lists with more than 1 entry, but you guys would have a better idea than me.

samschott · 2021-02-16T14:59:46Z

@puremourning, this is very helpful, thanks! It also explains why the segfault didn't show up in our testing: the code block which you show is only called as a backup when CFStringGetCStringPtr returns NULL. I suspect that the string length is incorrectly calculated there.

Could you try applying this patch and see if it fixes the problem:

diff --git a/src/watchdog_fsevents.c b/src/watchdog_fsevents.c
index e9f4e6aced800936af2778cc046fa6382dceae13..92f3ed0c7e067bda0f362cf022d7af2f99c5f69e 100644
--- a/src/watchdog_fsevents.c
+++ b/src/watchdog_fsevents.c
@@ -293,17 +293,27 @@ static void watchdog_pycapsule_destructor(PyObject *ptr)
  */
 PyObject * CFString_AsPyUnicode(CFStringRef cf_string_ref)
 {
-    CFIndex cf_string_length;
-    char *c_string_buff = NULL;
+
+    if (G_IS_NULL(cf_string_ref)) {
+        return PyUnicode_FromString("");
+    }
+
     const char *c_string_ptr;
     PyObject *py_string;
 
-    c_string_ptr = CFStringGetCStringPtr(cf_string_ref, 0);
+    c_string_ptr = CFStringGetCStringPtr(cf_string_ref, kCFStringEncodingUTF8);
 
     if (G_IS_NULL(c_string_ptr)) {
-        cf_string_length = CFStringGetLength(cf_string_ref);
-        CFStringGetCString(cf_string_ref, c_string_buff, cf_string_length, 0);
-        py_string = PyUnicode_FromString(c_string_buff);
+        CFIndex length = CFStringGetLength(cf_string_ref);
+        CFIndex max_length = CFStringGetMaximumSizeForEncoding(length, kCFStringEncodingUTF8) + 1;
+        char *buffer = (char *)malloc(max_length);
+        if (CFStringGetCString(cf_string_ref, buffer, max_length, kCFStringEncodingUTF8)) {
+            py_string = PyUnicode_FromString(buffer);
+        }
+        else {
+            py_string = PyUnicode_FromString("");
+        }
+        free(buffer);
     } else {
         py_string = PyUnicode_FromString(c_string_ptr);
     }

puremourning · 2021-02-16T16:06:56Z

applying that patch does indeed resolve the crash for the test case ! thanks.

samschott · 2021-02-16T18:08:58Z

Great! I'll submit a PR with the changes.

It would still be good to have an example which we can test against, if only to prevent future regressions. But this might be difficult.

Do you know which file path caused the segfault? It will likely fulfil two criteria:

The number of bytes to hold the encoded string exceeds the number of unicode characters.
CFStringGetCStringPtr returns NULL.

The second criterion is pretty vague. According to the documentation for CFStringGetCStringPtr:

Whether or not this function returns a valid pointer or NULL depends on many factors, all of which depend on how the string was created and its properties. In addition, the function result might change between different releases and on different platforms. So do not count on receiving a non-NULL result from this function under any circumstances.

bstaletic · 2021-02-16T18:29:18Z

The number of bytes to hold the encoded string exceeds the number of unicode characters.

If I'm not mistaken, the watched directory is this one. When our tests are run, jdt.ls creates a directory in there named target in which there's only one file with unicode characters.

simple_eclipse_project/target/classes/com/youcompleteme/Test$TéstClass.class

That thing is the compiled version of this class.

samschott · 2021-02-16T19:18:18Z

Yes, that is it. I can reproduce the segfault with that path. Many thanks for your help in debugging this :)

…762) * fix converting CFStringRef to PyUnicode * fix handling when creation of PyList fails * added test case for string conversion * declare c_string_ptr in-line * remove trailing slash from tmpdir * use decomposed e-acute in filename for tests this prevents unicode normalisation on HFS+

BoboTiG · 2021-02-17T14:43:48Z

Thanks everyone for quick analysis and resolution, I love OSS for that kind of actions. ❤️ 🍾

I will release v2.0.1 with the patch.

BoboTiG · 2021-02-17T16:00:09Z

Release v2.0.1 online.

BoboTiG added bug fsevents-c Help welcome :) labels Feb 11, 2021

samschott mentioned this issue Feb 16, 2021

Fix fsevents segfault #763

Merged

BoboTiG closed this as completed in #763 Feb 17, 2021

mrclary mentioned this issue Feb 18, 2021

SystemError when switching projects spyder-ide/spyder#14779

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SEGFAULT on macOS with the latest version of watchdog #762

SEGFAULT on macOS with the latest version of watchdog #762

bstaletic commented Feb 11, 2021

BoboTiG commented Feb 12, 2021 •

edited

Loading

bstaletic commented Feb 12, 2021

samschott commented Feb 12, 2021

puremourning commented Feb 12, 2021

puremourning commented Feb 12, 2021

samschott commented Feb 12, 2021 •

edited by BoboTiG

Loading

CCP-Aporia commented Feb 12, 2021

samschott commented Feb 12, 2021

puremourning commented Feb 12, 2021

puremourning commented Feb 16, 2021

samschott commented Feb 16, 2021 •

edited

Loading

puremourning commented Feb 16, 2021

samschott commented Feb 16, 2021 •

edited

Loading

bstaletic commented Feb 16, 2021

samschott commented Feb 16, 2021 •

edited

Loading

BoboTiG commented Feb 17, 2021 •

edited

Loading

BoboTiG commented Feb 17, 2021

SEGFAULT on macOS with the latest version of watchdog #762

SEGFAULT on macOS with the latest version of watchdog #762

Comments

bstaletic commented Feb 11, 2021

BoboTiG commented Feb 12, 2021 • edited Loading

bstaletic commented Feb 12, 2021

samschott commented Feb 12, 2021

puremourning commented Feb 12, 2021

puremourning commented Feb 12, 2021

samschott commented Feb 12, 2021 • edited by BoboTiG Loading

CCP-Aporia commented Feb 12, 2021

samschott commented Feb 12, 2021

puremourning commented Feb 12, 2021

puremourning commented Feb 16, 2021

samschott commented Feb 16, 2021 • edited Loading

puremourning commented Feb 16, 2021

samschott commented Feb 16, 2021 • edited Loading

bstaletic commented Feb 16, 2021

samschott commented Feb 16, 2021 • edited Loading

BoboTiG commented Feb 17, 2021 • edited Loading

BoboTiG commented Feb 17, 2021

BoboTiG commented Feb 12, 2021 •

edited

Loading

samschott commented Feb 12, 2021 •

edited by BoboTiG

Loading

samschott commented Feb 16, 2021 •

edited

Loading

samschott commented Feb 16, 2021 •

edited

Loading

samschott commented Feb 16, 2021 •

edited

Loading

BoboTiG commented Feb 17, 2021 •

edited

Loading