Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

logging framework can lead to deadlocks #751

Closed
jlaine opened this issue Feb 26, 2021 · 16 comments · Fixed by #1383
Closed

logging framework can lead to deadlocks #751

jlaine opened this issue Feb 26, 2021 · 16 comments · Fixed by #1383
Labels

Comments

@jlaine
Copy link
Member

jlaine commented Feb 26, 2021

Currently PyAV wires up FFmpeg's logging so that messages are sent to Python's traditional loggers instead of going straight to stderr.

One such example is simply trying to open a CodecContext for the h264_omx (hardware accelerated H264) codec on a RaspberryPi.

codec = av.CodecContext.create("h264_omx", "w")
codec.width = 640
codec.height = 480
codec.pix_fmt = "yuv420p"
codec.framerate = fractions.Fraction(30, 1)
codec.time_base = fractions.Fraction(1, 30)
codec.open()

This code will hang forever on the codec.open() call. Setting the environment variable PYAV_LOGGING=off completely solves the issue.

@mikeboers I am starting to wonder whether the logging support should be opt-in instead of opt-out, any thoughts on this?

@github-actions
Copy link

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Mar 26, 2022
@jlaine
Copy link
Member Author

jlaine commented Mar 26, 2022

This is still an issue.

@Breakthrough
Copy link

Breakthrough commented Mar 27, 2022

+1 that this should be opt-in instead of opt-out by default. It would also be very useful to provide some functionality to suppress all log messages by passing a no-op callback rather than relying on log levels.

I'm not very familiar with Cython, however I see the logger callback is marked as nogil and uses with gil where required. It's not really clear how this can deadlock though, unless something is already holding the GIL when this function is called and never releases it, or the thread is destroyed/cancelled while holding the GIL (or another lock, which the code does hold - skip_lock - without any timeout).

Would it make sense to call PyGILState_Check here and manually call PyGILState_Ensure rather than using a with gil block? It's not clear to me from reading the Cython docs at least if using with gil is compatible with threads started outside of cython.parallel, nor does it explain if invoking the same callback multiple times from non-Python created threads is supported.

Another potential point of concern is if the thread is destroyed mid-way through callback execution - this could potentially poison the GIL state, or even the skip_lock used to prevent multiple log messages from being emitted. I'm not sure if that can be resolved without using the underlying threading library directly though, e.g. using pthread_cleanup_push. That seems like a bit of a rabbit hole though, so understandable if that's not feasible from Cython. And again, am not that familiar with the internals of this stuff, so please feel free to correct anything I said that might be incorrect.

@jlaine
Copy link
Member Author

jlaine commented Mar 27, 2022

You have some excellent question, but I'll be honest we're reaching the limits of my Cython knowledge. Below are some thoughts:

Log verbosity

  • FFmpeg's logging system defaults to AV_LOG_INFO
  • PyAV performs it's own filtering on messages it receives, with a default of AV_LOG_VERBOSE
  • PyAV's Python logger is set to logging.WARNING by default

I don't understand the value of filtering messages inside PyAV at all, it would seem a lot more efficient for av.logging.set_level to directly call av_log_set_level and let FFmpeg do the filtering itself so unwanted messages don't even reach PyAV. @mikeboers ?

EDIT: I was wrong. av_log_set_level only has an effect on FFmpeg's default logging callback, so once we install our own callback we need to filter ourselves.

We could also most likely default to AV_LOG_WARNING since that is in effect the only thing we send through to the Python logger by default (unless I'm mistaken..). Judging by our users' feedback if anything they want less logging not more.

Opt-in vs opt-out

I'd also be in favor of making the logging framework opt-in, but there is one very useful usecase: it's what enables us to raise sensible exceptions. If you have a close look the logging framework captures error messages, which can then be used by err_check.

Based on this my impression is that we probably do need a logging callback to capture errors, but for the rest we could probably stay out of the way and just hand off to FFmpeg's default callback.

@jlaine
Copy link
Member Author

jlaine commented Mar 27, 2022

Here's another finding. I tried to cut down the log_callback function in av/logging.pyx as much as possible to see what's causing the hang. The answer is that even this will cause a hang:

cdef void log_callback(void *ptr, int level, const char *format, lib.va_list args) nogil:
    with gil:
        pass

So in short there doesn't seem to be any safe way of running Python in the log callback.

My gut feeling at this stage would be:

  • make our log_callback pure Cython code which only stashes errors and hands off anything else to FFmpeg's default log handler
  • make set_level / get_level proxies to av_log_set_level / av_log_get_level
  • rip out all the rest

@Breakthrough
Copy link

Breakthrough commented Mar 29, 2022

Is there any way to repro this on x86 or does it have to be on an RPi? I had a look at the generated Cython code and it seems to generate an Ensure/Release pair as expected, and you can call it multiple times from the same thread, which makes this hang even more puzzling to be honest. I don't know if the driver source code is public or not, so there might be some threads created there, but interestingly, I don't see the codec itself creating any threads:
https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/omx.c

Are you able to see any log output from that when this happens? What happens if you check lib.Py_IsInitialized() first?

For the related issue I had where the interpreter hung on exit, Py_IsInitialized was sometimes returning 0 after the first time it was called indicating another thread had destroyed the interpreter at that point (or was in the process thereof).

@jlaine
Copy link
Member Author

jlaine commented Mar 29, 2022

Is there any way to repro this on x86 or does it have to be on an RPi?

I am reproducing this using the same code as in #909, on Debian, x86_64.

@Breakthrough
Copy link

Oh sorry I meant specifically in the case where it hangs on open rather than #909 which hangs only when things are being destroyed.

@github-actions
Copy link

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jul 28, 2022
@jlaine
Copy link
Member Author

jlaine commented Jul 28, 2022

Still very much an issue

@github-actions github-actions bot removed the stale label Jul 29, 2022
@github-actions
Copy link

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@maiermic
Copy link

maiermic commented Jun 2, 2023

This is still an issue. It even causes a deadlock if logging is disabled, see aiortc/aiortc#734 (comment)

@hmaarrfk
Copy link
Contributor

shall this issue be reopened?

@jlaine jlaine removed the stale label Jan 14, 2024
@jlaine
Copy link
Member Author

jlaine commented Jan 14, 2024

Yes absolutely!

@jlaine jlaine reopened this Jan 14, 2024
@WyattBlue WyattBlue linked a pull request Apr 24, 2024 that will close this issue
@WyattBlue
Copy link
Member

I've fixed this in master now. The logging is now disabled by default (the callback is a no-op).

import av

# No logs whatso ever (no GIL calls)
av.logging.set_level(None)

# You can change the log level on the fly.
# Set to the old default (detailed error messages).
av.logging.set_level(av.logging.VERBOSE)

@WyattBlue
Copy link
Member

Here's a basic test

import av
print('Starting test.')
container = av.open('example.mp4')
container.streams.video[0].thread_type = 'AUTO'
# Have to decode at least one frame to trigger the bug.
next(container.decode(video=0))
print('Test complete.')

av 12.0.0 locks up 1/2 of the time
av 13.0.0rc1 always works

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants