Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Watchdog timeout when camera thread is busy #888

Closed
jasaw opened this issue Jan 16, 2019 · 5 comments
Closed

Watchdog timeout when camera thread is busy #888

jasaw opened this issue Jan 16, 2019 · 5 comments

Comments

@jasaw
Copy link
Contributor

jasaw commented Jan 16, 2019

  1. Reviewed guide and contributing documents? (Yes/No): Yes
  2. version [x.y.z, hash, other]: 4.1.1
  3. installed as a package or compiled from sources [deb, rpm, git, other]: Compiled from source
  4. standalone or part of third party [motion, MotionEyeOS, other]: MotionEyeOS
  5. video stream source [V4L (card or USB), net cam (mjpeg, rtsp, other), mmal]: mmal and USB
  6. hardware [x86, ARM, other]: Raspberry Pi all variants
  7. operating system [16.04, Stretch, etc, FreeBSD, other]: MotionEyeOS
  8. video encoder: h264_omx
  9. motionEyeOS version: pre-release version 20181129
  10. rpi-firmware version : fbad6408c4596d3d671736ee0571aae444f24e68
  11. rpi-userland version : d574b51a60a075baefe863670466ee24e6c4256e

Brief:
The Raspberry Pi firmware/userland behaviour changed a little bit recently, causing motion to incorrectly trigger a watchdog timeout and killing the camera thread when camera thread is busy.

When motion is configured to push high resolution (1600x1200) at high frame rate (20fps) that chokes the GPU, motion keeps printing EAGAIN message until the watchdog eventually times out. (works perfectly when GPU not choked, e.g. lower resolution or fps)

[1:ml1] [DBG] [ENC] ffmpeg_encode_video: Receive packet threw EAGAIN returning -2 code :Resource temporarily unavailable
[1:ml1] [DBG] [ENC] ffmpeg_put_image: Buffered packet

While printing this message (before watchdog timeout) EAGAIN period, everything works fine, still recording video as expected. Also during this EAGAIN period, if workload is throttled back a little bit (e.g. motion gap period), it resets the watchdog timer and doesn't timeout.

Full log here:

[1:ml1] [INF] [EVT] event_new_video: Source FPS 2
[1:ml1] [NTC] [ENC] ffmpeg_set_codec: Using codec h264_omx
[1:ml1] [INF] [ENC] ffmpeg_set_quality: h264_omx codec vbr/crf/bit_rate: 2250000
[1:ml1] [NTC] [EVT] event_newfile: File of type 8 saved to: /data/output/TestCam/2019-01-16/12-00-31.mp4
[1:ml1] [NTC] [ALL] motion_detected: Motion detected - starting event 1
[1:ml1] [DBG] [ENC] ffmpeg_encode_video: Receive packet threw EAGAIN returning -2 code :Resource temporarily unavailable
[1:ml1] [DBG] [ENC] ffmpeg_put_image: Buffered packet
[1:ml1] [DBG] [ENC] ffmpeg_encode_video: Receive packet threw EAGAIN returning -2 code :Resource temporarily unavailable
[1:ml1] [DBG] [ENC] ffmpeg_put_image: Buffered packet
... Repeat of the same 2 messages
[1:ml1] [DBG] [ENC] ffmpeg_encode_video: Receive packet threw EAGAIN returning -2 code :Resource temporarily unavailable
[1:ml1] [DBG] [ENC] ffmpeg_put_image: Buffered packet
[1:ml1] [DBG] [ENC] ffmpeg_encode_video: Receive packet threw EAGAIN returning -2 code :Resource temporarily unavailable
[1:ml1] [DBG] [ENC] ffmpeg_put_image: Buffered packet
[1:ml1] [DBG] [ENC] ffmpeg_encode_video: Receive packet threw EAGAIN returning -2 code :Resource temporarily unavailable
[1:ml1] [DBG] [ENC] ffmpeg_put_image: Buffered packet
[0:motion] [ERR] [ALL] main: Thread 1 - Watchdog timeout, trying to do a graceful restart
[0:motion] [ERR] [ALL] main: Thread 1 - Watchdog timeout, did NOT restart graceful, killing it!
[0:motion] [ERR] [ALL] main: cleaning Thread 1

With the older rpi-firmware/userland version, motion prints out EAGAIN and buffered packet messages twice at the beginning of each video recording, and continues to work without watchdog timeout. I don't know what has changed in rpi-firmware/userland, and how motion watchdog detects a timeout. Can someone please shed a light on how motion watchdog works?

Note: Motion works fine with:
rpi-firmware version : ce8652e2c743f02f04cb29f23611cbf13765483b
rpi-userland version : a343dcad1dae4e93f4bfb99496697e207f91027e

@tosiara
Copy link
Member

tosiara commented Jan 16, 2019

How do you use h264_omx codec?

@jasaw
Copy link
Contributor Author

jasaw commented Jan 16, 2019

h264_omx encode by having motion call ffmpeg API, i.e. remove h264_omx from blacklist, then select h264_omx as the preferred encoder in the thread-1.conf file.
Also worth mentioning is zero_copy has been disabled in our MotionEyeOS build so it doesn't trigger OMX_EmptyThisBuffer hanging bug.

@Mr-Dave
Copy link
Member

Mr-Dave commented Jan 17, 2019

The main function enters a forever loop in which is spawns as needed the threads for each camera which then go into their own loops grabbing and processing images. In each pass of the main loop, it waits for one second and then calls a watchdog function which decrements the watchdog counter. Each count on watchdog variable is therefore a decrement of one second. The counter is reset to the starting point during each pass of the camera motion loop.

When it gets to the first predefined timeout, it sets the "finish" flag which is the best way to close it all down since it cleans everything up. If that works, then the thread will be restarted and the watchdog reset. If the thread doesn't respond to a "clean" shutdown the watchdog variable continues to count down to the kill timeout. If the thread still hasn't reset the counter then a kill request is sent not only to the camera thread but also to all of the other threads that it may have spawned.

If motion is timing out on the encoding, then it is hanging up trying to flush the buffer or a particular function is no longer returning a value. The camera motion loop calls an event and the event sends the image for encoding. A watchdog means it never returned from that sequence back to the motion loop to reset the watchdog.

I'd note that the buffer flushing situation could occur if a timelapse is requested. We get one image and we must write it out because the next image may not come for hours. So the same image keeps getting sent until the return code says it was not buffered.

References:
Loop in Main function checking all the threads for being alive
Watchdog function checking for whether to cancel or kill
Reset the counter in the motion loop.

@jasaw
Copy link
Contributor Author

jasaw commented Jan 17, 2019

This appears to be caused by ffmpeg omx code not handling multiple buffers as reported here: raspberrypi/firmware#1087
Sorry about the noise.

@jasaw jasaw closed this as completed Jan 17, 2019
@tosiara
Copy link
Member

tosiara commented Jan 17, 2019

Also omx-related: #433

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants