Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gstream omxh264enc causes a stall and locks the raspberry pi 4 UI #1426

Closed
46cv8 opened this issue Jul 15, 2020 · 10 comments
Closed

gstream omxh264enc causes a stall and locks the raspberry pi 4 UI #1426

46cv8 opened this issue Jul 15, 2020 · 10 comments

Comments

@46cv8
Copy link

46cv8 commented Jul 15, 2020

I've tested the following steps on a raspberry pi 3b+ and 4. On the 3b+ everything works fine however on the 4 it crashes after 1-2 minutes. I can trigger the crash faster by unplugging my keyboard, mouse and monitor and re-plugging them back in, but it isn't necessary to cause the issue.

I am hoping that this may be related to the issue fixed for ffmpeg over a year ago #1087
The patch for the change that was required to ffmpeg is linked below, but I can't be certain it's the same issue.
https://trac.ffmpeg.org/ticket/7687
https://trac.ffmpeg.org/attachment/ticket/7687/0001-avcodec-omx-Fix-handling-of-fragmented-buffers.patch

The setup is to install the latest version of raspberry pi os on a new SD card, insert it in a raspberry pi 4 with a logitech c525 webcam attached, and then run the following commands.

sudo apt-get update
sudo apt-get upgrade
sudo rpi-update
<reboot>

sudo apt-get install gstreamer1.0-tools gstreamer1.0-plugins-good gstreamer1.0-plugins-bad gstreamer1.0-plugins-ugly gstreamer1.0-libav

gst-launch-1.0 v4l2src device=/dev/video0 do-timestamp=1 ! video/x-raw,width=800,height=448,framerate=30/1 ! videoconvert ! omxh264enc control-rate=1 target-bitrate=800000 ! h264parse config-interval=1 ! rtph264pay ! udpsink host=234.5.5.5 port=5000 async=false 

The result is that stream goes out fine for a minute or two (one can confirm this by connecting via VLC), but then after 1-2 minutes the raspberry pi UI locks up. It becomes almost impossible to close the program as you can only interact with the UI every few seconds for afew milliseconds. The CPU is not pegged as confirmed by having top running in a parallel window which shows gst-launch-1.0 CPU usage at 0% to 1%. If you do manage to kill gst-launch-1.0 the system will return to normal, but the error will occur much more quickly the second time you attempt to start gst-launch-1.0.

The log file for gst-launch-1.0 when obtained after running "export GST_DEBUG=5" shows the following at the point where the UI was locked.

0:01:15.548700487  1654   0xccbaf0 DEBUG               basesink gstbasesink.c:3577:gst_base_sink_chain_unlocked:<udpsink0> object unref after render 0xb4521a70
0:01:15.548738729  1654   0xccbaf0 DEBUG         GST_SCHEDULING gstpad.c:4341:gst_pad_chain_data_unchecked:<udpsink0:sink> called chainlistfunction &gst_base_sink_chain_list, returned ok
0:01:15.548768044  1654   0xccbaf0 DEBUG             GST_MEMORY gstmemory.c:88:_gst_memory_free: free memory 0xb091f1e0
0:01:15.548803581  1654   0xccbaf0 DEBUG         GST_SCHEDULING gstpad.c:4326:gst_pad_chain_data_unchecked:<rtph264pay0:sink> called chainfunction &0xb5877fe8 with buffer 0xb452bbc8, returned ok
0:01:15.548839433  1654   0xccbaf0 DEBUG              baseparse gstbaseparse.c:3198:gst_base_parse_chain:<h264parse0> not enough data available (only 0 bytes)
0:01:15.548883897  1654   0xccbaf0 DEBUG         GST_SCHEDULING gstpad.c:4326:gst_pad_chain_data_unchecked:<h264parse0:sink> called chainfunction &gst_base_parse_chain with buffer 0xb451dd20, returned ok
0:01:15.548942416  1654   0xccbaf0 DEBUG            omxvideoenc gstomxvideoenc.c:1521:gst_omx_video_enc_loop:<omxh264enc-omxh264enc0> Finished frame: ok
0:01:15.548994528  1654   0xccbaf0 DEBUG                    omx gstomx.c:1519:gst_omx_port_release_buffer:<omxh264enc-omxh264enc0> Releasing buffer 0xb092a7f8 (0xb450bef0) to video_encode port 201
0:01:15.549189937  1654   0xccbaf0 DEBUG                    omx gstomx.c:1563:gst_omx_port_release_buffer:<omxh264enc-omxh264enc0> Released buffer 0xb092a7f8 to video_encode port 201: None (0x00000000)
0:01:15.549212160  1654   0xccbaf0 DEBUG            omxvideoenc gstomxvideoenc.c:1529:gst_omx_video_enc_loop:<omxh264enc-omxh264enc0> Read frame from component
0:01:15.549238345  1654   0xccbaf0 DEBUG                    omx gstomx.c:1361:gst_omx_port_acquire_buffer:<omxh264enc-omxh264enc0> Acquiring video_encode buffer from port 201
0:01:15.549261179  1654   0xccbaf0 DEBUG                    omx gstomx.c:1474:gst_omx_port_acquire_buffer:<omxh264enc-omxh264enc0> Queue of video_encode port 201 is empty
0:01:15.549293957  1654   0xccbaf0 DEBUG                    omx gstomx.c:460:gst_omx_component_wait_message:<omxh264enc-omxh264enc0> video_encode waiting for signal
^Chandling interrupt.
0:01:23.781345691  1654   0xb38000 DEBUG                GST_BUS gstbus.c:318:gst_bus_post:<bus1> [msg 0xb452a178] posting on bus application message: 0xb452a178, time 99:99:99.999999999, seq-num 130, element 'pipeline0', GstLaunchInterrupt, message=(string)"Pipeline\ interrupted";

The last call before the UI locks up appears to be "omx gstomx.c:460:gst_omx_component_wait_message: video_encode waiting for signal".

I am unable to reproduce the issue on the raspberry pi 3b+ even if I unplug my mouse, keyboard and monitor and reconnect them. It appears to keep playing fine.

I am happy to try and roll back to different versions of the raspberry pi firmware to debug the issue as that appears to have allowed for isolation of the issue when debugging #1087 .
But I am unsure which firmware versions to attempt and which would be compatible with the kernel running on my system.
Please let me know how I can provide more information to help you trouble shoot the issue.

I will see if I can generate the same issue when streaming from a file (I suspect it should be possible). It appears that the issue randomly occurs more based on other interactions with the system that may be happening, more so than the nature of the media being encoded.

@6by9
Copy link

6by9 commented Jul 15, 2020

When you create an issue you should be presented with a template, which includes a request for a load of information on firmware and kernel versions being used, and ideally any logs that you can retrieve.
Without at least the version information we can not progress any investigation.

I am hoping that this may be related to the issue fixed for ffmpeg over a year ago #1087

Seeing as it was a bug in the FFmpeg code, why would it change anything in GStreamer?

Which version of GStreamer are you using? Rasbian has a couple of bug fixes in gstreamer-1.0-onx-rpi, including the equivalent to the fix applied to FFmpeg. If you've built GStreamer from scratch then you would want to also pick up that patch.

@46cv8
Copy link
Author

46cv8 commented Jul 15, 2020

Hey 6by9, thanks for your reply. I'm not sure why but I don't remember the "write" section for New issue having a description in it when I made this ticket yesterday. I attempted to make another ticket now and it was indeed populated. Below are some of the details regarding the various versions that it says to provide as well as the gstreamer version you requested.

pi@raspberrypi:~/chat/demo $ cat /etc/rpi-issue
Raspberry Pi reference 2020-05-27
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 825107f04027269db77426046f5085475b1ea22f, stage4
pi@raspberrypi:~/chat/demo $ vcgencmd version
Jul 13 2020 13:55:18 
Copyright (c) 2012 Broadcom
version adcebbdb7b415c623931e80795ba3bae68dcc4fa (clean) (release) (start)
pi@raspberrypi:~/chat/demo $ uname -a
Linux raspberrypi 5.4.51-v7l+ #1325 SMP Mon Jul 13 13:47:17 BST 2020 armv7l GNU/Linux
pi@raspberrypi:~/chat/demo $ gst-launch-1.0 --version
gst-launch-1.0 version 1.14.4
GStreamer 1.14.4
http://packages.qa.debian.org/gstreamer1.0

My initial hope was that this might have been the same issue that was fixed for ffmpeg that had not yet been fixed for gstreamer but this does not appear to be the case. The version I am using is the one installed via regular apt-get install gstreamer1.0-tools (I am not building from source). I have noticed that if I reboot and am careful not to do anything that might result in dropped packets/data I am able to have a stream work for over 1-2 hours. I am going to try and hookup a seperate mouse and monitor to another raspberry pi to test over wifi to see the frequency of the issue. If gstreamer were just crashing or pegging the CPU I would take the issue there, but what happens is a total lockup of the system with a low cpu load of 0% to 1% for gst-launch-1.0

If there are any other details I missed or that might be of assistance please let me know. I included the gstreamer debug output in my original post. I'm unsure what precise output from dmesg would be of interest but I'm happy to try and get some logs from that too.

@46cv8
Copy link
Author

46cv8 commented Jul 15, 2020

So afew more data points.
There is no output to dmesg at the time of the error.
I've been able to reproducibly trigger the issue every time within about 20 seconds if I follow a specific set of steps below.
I don't think the actual steps are necessarily required to cause the issue, but they appear to put the system in a state where it is more likely to occur.

  1. In one terminal start gstreamer capturing video and audio from webcam and streaming over rtp
gst-launch-1.0 v4l2src device=/dev/video0 do-timestamp=1 ! video/x-raw,width=800,height=448,framerate=30/1 ! videoconvert ! omxh264enc control-rate=1 target-bitrate=800000 ! h264parse config-interval=1 ! rtph264pay ! udpsink host=234.5.5.5 port=5000 async=false alsasrc device=hw:2,0 do-timestamp=1 ! queue ! audioconvert ! audio/x-raw,format=S16LE,rate=48000,channels=2 ! opusenc bitrate=96000 ! rtpopuspay ! udpsink host=234.5.5.5 port=5002 async=false
  1. in another window launch vlc and open the sdp file below to display the webcam and play audio
v=0
o=- 0 0 IN IP4 127.0.0.1
s=No Name
t=0 0
a=tool:libavformat 58.48.100
m=video 5000 RTP/AVP 96
c=IN IP4 234.5.5.5
b=AS:800
a=rtpmap:96 H264/90000
a=fmtp:96 packetization-mode=1
m=audio 5002 RTP/AVP 96
c=IN IP4 234.5.5.5
b=AS:96
a=rtpmap:96 opus/48000/2
  1. now open chrome and open a tab and visit a website, if it doesn't crash immediately open a second tab and visit a second website. normally the system starts freezing up after the second tab and it is very difficult to tab back to the original gstream terminal to kill gstream.

Just afew notes about things I tried to see if I could reduce the number of steps that caused the issue to occur or reduce it's frequency significantly enough that it did not occur for at least 5 minutes.

  1. if you don't open chrome and don't touch the system, it can continue to play for hours
  2. if you use "videotestsrc" instead of "v4l2src device=/dev/video0 do-timestamp=1" I was unable to trigger the issue despite opening lots of chrome tabs and even visiting youtube.
  3. if you don't include the audio stream "alsasrc device=hw:2,0 do-timestamp=1 ! queue ! audioconvert ! audio/x-raw,format=S16LE,rate=48000,channels=2 ! opusenc bitrate=96000 ! rtpopuspay ! udpsink host=234.5.5.5 port=5002" the frequency of the error appears to be a lot lower, it didn't occur for at least 5 minutes. I have seen (as mentioned in my original comment) that doing other things that trigger CPU load like unplugging usb mice/keyboards, will still cause the issue, even if only streaming video.

If there are any other data points you want me to obtain by trying to eliminate specific steps from the repro steps above let me know. This seems to be a bit of a tricky issue to reproduce as it appears like it may need the system to be experiencing a specific type of load to end up in the state which causes things to lock up.

I had read something about omxh264enc perhaps using zerocopy that could result in systems locking up, but I'm unsure of what specifically that means and if it might be related.

Also I realize this is an unrelated bug, but I'm throwing the details in here just incase it's indicative of a larger issue that may be causing both problems. I am unable to use gstreamer "omxh264dec" as it appears something may have been built incorrectly relating to it. (this does work fine when plugging the same sdcard into my raspberry pi 3b+ so the issue is specific to raspberry pi 4 and it's firmware)

pi@raspberrypi:~/chat/demo $ gst-launch-1.0 filesrc location=./test.h264 ! h264parse ! omxh264dec ! omxh264enc control-rate=1 target-bitrate=800000 ! h264parse config-interval=1 ! rtph264pay ! udpsink host=234.5.5.5 port=5000 async=false
Setting pipeline to PAUSED ...
0:00:00.109982988  6967  0x101ac80 ERROR                    omx gstomx.c:799:gst_omx_component_new:<omxh264dec-omxh264dec0> Failed to get component handle 'OMX.broadcom.egl_render' from core '/opt/vc/lib/libopenmaxil.so': 0x80001000
0:00:00.110058894  6967  0x101ac80 WARN            videodecoder gstvideodecoder.c:2535:gst_video_decoder_change_state:<omxh264dec-omxh264dec0> error: Failed to open decoder
ERROR: Pipeline doesn't want to pause.
ERROR: from element /GstPipeline:pipeline0/GstOMXH264Dec-omxh264dec:omxh264dec-omxh264dec0: Could not initialize supporting library.
Additional debug info:
gstvideodecoder.c(2535): gst_video_decoder_change_state (): /GstPipeline:pipeline0/GstOMXH264Dec-omxh264dec:omxh264dec-omxh264dec0:
Failed to open decoder
Setting pipeline to NULL ...
Freeing pipeline ...

@46cv8
Copy link
Author

46cv8 commented Jul 15, 2020

One last point, I swapped out the webcam as the video source for a raspberry pi camera. I continued to bring the data in via "v4l2src device=/dev/videoNNN do-timestamp=1" (where N is the dev for the native raspberry pi camera now). I was unable to cause the issue. Even when I tried unplugging keyboard, mouse and monitor and opening many chrome tabs, it continued to function for over 10 minutes. I'm not sure how using the usb webcam could be related but it appears it may be in some way. This also fits with the case I mentioned earlier where I was unable to reproduce the issue using videotestsrc instead of the usb camera.

@6by9
Copy link

6by9 commented Jul 16, 2020

omxh264dec is broken on Pi4 as they try to merge video_decode with the firmware GLES egl_render component, the latter of which is incompatible with the updated 3D block on the Pi4.

The recommendation is to switch from the OMX codec components to the V4L2 ones as they are also more efficient in terms of copying behaviour. It's the same underlying encoder/decoder, but the frameworks allow more efficient buffer passing.
v4l2h264enc extra-controls="controls,video_bitrate=2000000;" should do the same as your omxh264enc fragment.

You may even be able to skip the (software) videoconvert with v4l2h264enc as it supports a greater number of input formats. It depends on exactly what format your webcam is producing. Your command lines don't appear to force any particular format, so your webcam, Pi camera, and videotestsrc could all be producing different formats.

v4l2src device=/dev/video0 do-timestamp=1 io-mode=5 ! video/x-raw,width=800,height=448,framerate=30/1 ! v4l2h264enc output-io-mode=4 extra-controls="controls,video_bitrate=2000000;" ! ..... should avoid a fair chunk of the processing overhead. (UVC cameras have different memory requirements to the codec, therefore it requires the codec to allocate the memory via the io-mode parameters). If I can find my USB webcam I'll confirm that syntax later.

@46cv8
Copy link
Author

46cv8 commented Jul 16, 2020

Thanks so much for your suggestions. The last suggestions reduced gstreamer cpu utilization by between 5% and 10% down to around 20% from 27%. Unfortunately the issue still persists when I use a USB webcamera. I have two identical web cameras and was able to confirm that the issue occurs identically for both cameras so I don't think it is a hardware fault in the camera (and even if it was it wouldn't expect it to cause the UI to lock with low CPU figures).

For reference the latest command I'm using is the one you suggested.

gst-launch-1.0 v4l2src device=/dev/video0 do-timestamp=1 io-mode=5 ! video/x-raw,width=800,height=448,framerate=30/1 ! v4l2h264enc output-io-mode=4 extra-controls="controls,video_bitrate=2000000;" ! h264parse config-interval=1 ! rtph264pay ! udpsink host=234.5.5.5 port=5000 async=false alsasrc device=hw:2,0 do-timestamp=1 ! queue ! audioconvert ! audio/x-raw,format=S16LE,rate=48000,channels=2 ! opusenc bitrate=96000 ! rtpopuspay ! udpsink host=234.5.5.5 port=5002 async=false

I've uploaded the full GST_DEBUG=5 log output to the following zip file https://www.udrop.com/SMc/gst_007.zip it's 24 MB uncompressed. Basically the highlights are the following.

0:00:08.638706373 �[336m 2609�[00m 0xb3403a60 �[37mDEBUG  �[00m �[00m      v4l2bufferpool gstv4l2bufferpool.c:1699:gst_v4l2_buffer_pool_process:<v4l2h264enc0:pool:src>�[00m process buffer 0xb1ffed78
0:00:08.638738947 �[336m 2609�[00m 0xb3403a60 �[37mDEBUG  �[00m �[00m            GST_POLL gstpoll.c:1317:gst_poll_wait:�[00m 0xb3e0ae00: timeout :99:99:99.999999999
##### PI UI starts to lockup and audio stream also locks up, there are no more outputs from v4l2bufferpool in the log until just before I managed to kill the process, the audio stream appears to attempt to process audio data every 4-6 seconds but fails due to missing data presumably because the system is locked. #######
0:00:10.676964733 �[336m 2609�[00m 0xb34043b0 �[37mDEBUG  �[00m �[00m          ringbuffer gstaudioringbuffer.c:2085:gst_audio_ring_buffer_set_timestamp:<audiosrcringbuffer0>�[00m Storing timestamp 99:99:99.999999999 @ 7
0:00:10.677103028 �[336m 2609�[00m 0xb34043b0 �[37mDEBUG  �[00m �[00m          ringbuffer gstaudioringbuffer.c:1935:gst_audio_ring_buffer_advance:<audiosrcringbuffer0>�[00m signal waiter
...
##### process sees ctrl-c interrupt #####
0:00:39.532839794 �[336m 2609�[00m  0x1662000 �[37mDEBUG  �[00m �[00;43m             GST_BUS gstbus.c:318:gst_bus_post:<bus1>�[00m [msg 0xb3424468] posting on bus application message: 0xb3424468, time 99:99:99.999999999, seq-num 164, element 'pipeline0', GstLaunchInterrupt, message=(string)"Pipeline\ interrupted";
...
0:00:39.594693538 �[336m 2609�[00m  0x1662000 �[37mDEBUG  �[00m �[00m      v4l2bufferpool gstv4l2bufferpool.c:960:gst_v4l2_buffer_pool_flush_start:<v4l2h264enc0:pool:sink>�[00m start flushing
0:00:39.594721205 �[336m 2609�[00m  0x1662000 �[37mDEBUG  �[00m �[00m      v4l2bufferpool gstv4l2bufferpool.c:960:gst_v4l2_buffer_pool_flush_start:<v4l2h264enc0:pool:src>�[00m start flushing

I'm starting to wonder why this lock-up only happens when using the usb webcam and not the pi camera through the custom connector. Perhaps it is a USB firmware issue and not graphics?

@6by9
Copy link

6by9 commented Jul 16, 2020

If it crashes only on a Pi4, and only with a USB webcam, then I'm suspecting a USB issue not one in the codec.

Can you confirm the USB chipset firmware in use? See https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=273027

@46cv8
Copy link
Author

46cv8 commented Jul 16, 2020

Thanks for the fast reply.

BCM2711 detected
Dedicated VL805 EEPROM detected
BOOTLOADER: up-to-date
CURRENT: Thu 16 Apr 2020 05:11:26 PM UTC (1587057086)
 LATEST: Thu 16 Apr 2020 05:11:26 PM UTC (1587057086)
 FW DIR: /lib/firmware/raspberrypi/bootloader/critical
VL805: up-to-date
CURRENT: 000137ad
 LATEST: 000137ad

@46cv8
Copy link
Author

46cv8 commented Jul 20, 2020

Hey 6by9 did you want me to attempt to upgrade or downgrade my firmware? Also please let me know if I should close this issue as it appears to be a USB firmware issue and not an issue related to the hardware accelerated encoding. I can close it if you feel it is appropriate. I will probably make use of the pi camera module anyways instead of a webcam due to these issues preventing further development for now.

Also unrelated to this issue but related to your earlier suggestion about using v4l2h264dec. It appears there is some issue with v4l2h264dec where it dies when run over wlan0 due to missing packets I presume (It works fine when run over eth0). This is documented well by another user who hit it over half a year ago https://stackoverflow.com/questions/57946999/problem-when-playing-rtsp-stream-with-gstreamer. I appear to be having the exact same issue. Since omxh264dec doesn't work on pi4 I'm going to need to try and find some way to work around it. Are there any instructions your aware of that talk about how to build v4l2h264dec (and/or the gst-plugins-good repo that it's included in). I wasn't sure if it was good enough to follow any normal build instructions or if some raspberrypi specific patches needed to be applied to the base repo before attempting to build. Perhaps there is a script that already exists that is used to build the deb modules in the raspberrypi distribution. I expect in order to file a bug ticket related to v4l2h264dec and help test any potential fix I am going to need to be able to build gstream from source. (Sorry for the unrelated question)

@46cv8
Copy link
Author

46cv8 commented Jul 26, 2020

So just to update you I did try updating the VL805 firmware to 000138a1 as per your link.
https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=273027
It didn't appear to fix the issue.

Before (and after) attempting the update I also tried setting the usb chipset power saving mode to 0x40, 0x41 and the default 0x43 as per.
#1282
sudo setpci -s 01:00.0 0xD4.B=0x41

I found that setting the power mode whilst on 000137ad didn't appear to make any difference for some reason.
After updating to 000138a1 I found that setting to 0x40 and 0x41 (but not 0x42 or 0x43) yielded a system that would still not work, but would be easier to recover from the hung state.

By this I mean that when using 000137ad (with any mode) or 000138a1 (with 0x43) the UI would freeze up (or all usb input would be dropped, hard to tell which) for about 5-10 seconds at a time then return for afew milliseconds. When using 000138a1 with 0x40 or 0x41 the UI would freeze up for 3 seconds, come back for about 500ms then freeze up for 1-2 second come back for 500 ms then go away for 1-2 seconds, this made it much easier to recover from the hung (dropped USB) state.

One last thing if I run my webcam at 320x240 yuyv instead of 640x480 yuyv or 800x448 yuyv it doesn't cause the USB to fail, so I'm pretty sure it's a throughput issue, perhaps related to how fast the cpu is able to pull data out of some usb chpset queue, due the cpu is being busy doing other things I've asked of it. Maybe the pi 3b+ has some different interrupt scheme or queue or something...

In short it seems the raspberry pi 4 USB support for USB2 hi speed is most likely the cause of the hanging, it's a shame that despite all your teams efforts over the last year the performance for USB2 is still not as good as the pi 3b+. Do you have any idea if it will ever be as good or is there some fundamental change in how the data is handled (beyond the totally different USB chipset).

I was able to build gstreamer from source and the issue with it hanging in 1.42 is resolved in 1.62 and was able to get a system working with the raspberry pi camera module that will do what I need for now. I'm going to close this issue as it seems you and your team are away of the USB issues already and actively working on resolving them still. I wish you the best of luck.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants