Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ISP freezes with new firmware #1915

Open
mchalain opened this issue Aug 14, 2024 · 7 comments
Open

ISP freezes with new firmware #1915

mchalain opened this issue Aug 14, 2024 · 7 comments

Comments

@mchalain
Copy link

Is this the right place for my bug report?
My bug occurs depending on the version of start.elf and fixup.dat and it is available with my application.
The application runs on old raspberry firmwares and on PC. The linux kernel version has not effect on the trouble.
Then I try here to find help.

Describe the bug
My application fastvideo uses an ov5647 camera and the ISP directly with V4L2 API (not the libcamera).

A first process reads video0 and push the data to the ISP video13 with DMA_buf protocol.
A second process configure video14 to read the ISP with DMA_buf and push then to the gpu.
But after the STREAM_ON the first process treats one buffer to push on the ISP and the both process doesn't receive any event from the kernel other than a timer. If we kill the second process, this one is blocking inside the STREAM_OFF.

To reproduce

$ sudo apt install libjansson-dev
$ git clone https://github.com/mchalain/fastvideo
$ cd fastvideo
$ make defconfig
$ make DEBUG=y
$ LD_LIBRARY_PATH=./src ./src/fastvideo -j data/raspicam.json -i cam-ov5647 -o isp-in -D
$ LD_LIBRARY_PATH=./src ./src/fastvideo -j data/raspicam.json -i isp-out -o gpu

Expected behaviour
The camera image should appear on the screen.

Actual behaviour
A frame is opened on the screen but it is empty. CTRL+C inside the second process is blocking inside STREAM_OFF

System
Copy and paste the results of the raspinfo command in to this section. Alternatively, copy and paste a pastebin link, or add answers to the following questions:

  • Which model of Raspberry Pi? e.g. Pi3B+, PiZeroW
    PiZeroW2 + CAM OV5647

  • Which OS and version (cat /etc/rpi-issue)?

Raspberry Pi reference 2024-03-15
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, f19ee211ddafcae300827f953d143de92a5c6624, stage4
  • Which firmware version (vcgencmd version)?
May 24 2024 15:31:28 
Copyright (c) 2012 Broadcom
version 4942b7633c0ff1af1ee95a51a33b56a9dae47529 (clean) (release) (start)

and run on

Jan 17 2022 19:22:03 
Copyright (c) 2012 Broadcom
version bd34f55ef7b01b0a367f131060b561a2a58b80bb (clean) (release) (start)
  • Which kernel version (uname -a)?
Linux raspberrypi 6.6.31+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.31-1+rpt1 (2024-05-29) aarch64 GNU/Linux

Logs
After 1 second I have a kernel oops:

4,443,258717297,-;------------[ cut here ]------------
4,444,258717320,-;Firmware transaction timeout
4,445,258717383,-;WARNING: CPU: 2 PID: 54 at drivers/firmware/raspberrypi.c:68 rpi_firmware_property_list+0x254/0x278
4,446,258717417,-;Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device cmac algif_hash aes_arm64 aes_generic algif_skcipher af_alg bnep brcmfmac_wcc brcmfmac brcmutil hci_uart cfg80211 btbcm bluetooth raspberrypi_hwmon ov5647 ecdh_generic ecc bcm2835_codec(C) bcm2835_unicam bcm2835_v4l2(C) bcm2835_isp(C) rfkill v4l2_dv_timings v4l2_mem2mem bcm2835_mmal_vchiq(C) libaes v4l2_fwnode v4l2_async videobuf2_vmalloc videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 binfmt_misc videodev videobuf2_common vc_sm_cma(C) raspberrypi_gpiomem snd_bcm2835(C) mc uio_pdrv_genirq uio i2c_dev fuse dm_mod ip_tables x_tables ipv6 vc4 snd_soc_hdmi_codec drm_display_helper cec drm_dma_helper drm_kms_helper drm drm_panel_orientation_quirks snd_soc_core snd_compress snd_pcm_dmaengine snd_pcm snd_timer snd backlight i2c_mux_pinctrl i2c_mux i2c_bcm2835
4,447,258717746,-;CPU: 2 PID: 54 Comm: kworker/2:1 Tainted: G         C         6.6.31+rpt-rpi-v8 #1  Debian 1:6.6.31-1+rpt1
4,448,258717759,-;Hardware name: Raspberry Pi Zero 2 W Rev 1.0 (DT)
4,449,258717769,-;Workqueue: events get_values_poll [raspberrypi_hwmon]
4,450,258717795,-;pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
4,451,258717806,-;pc : rpi_firmware_property_list+0x254/0x278
4,452,258717816,-;lr : rpi_firmware_property_list+0x254/0x278
4,453,258717826,-;sp : ffffffc0805ebcb0
4,454,258717831,-;x29: ffffffc0805ebcb0 x28: 0000000000000000 x27: ffffff800249dd40
4,455,258717850,-;x26: 000000000000000a x25: ffffffc080498008 x24: 0000000000001000
4,456,258717867,-;x23: ffffff801ba87dc0 x22: ffffffda3e1caea0 x21: 0000000000000010
4,457,258717884,-;x20: ffffff800249dd00 x19: ffffffc080498000 x18: 0000000000000000
4,458,258717902,-;x17: 0000000000000000 x16: 0000000000000000 x15: 0000007fb4b0d628
4,459,258717919,-;x14: ffffffc080eabc90 x13: 74756f656d697420 x12: 6e6f69746361736e
4,460,258717937,-;x11: fffffffffffe0000 x10: ffffffda3e0a60b0 x9 : ffffffda3cb09594
4,461,258717954,-;x8 : 00000000ffffefff x7 : ffffffda3e0a3710 x6 : 00000000000001bc
4,462,258717971,-;x5 : ffffff801adcdd48 x4 : 0000000000000000 x3 : 0000000000000027
4,463,258717987,-;x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffff80025bdc40
4,464,258718005,-;Call trace:
4,465,258718011,-; rpi_firmware_property_list+0x254/0x278
4,466,258718023,-; rpi_firmware_property+0x78/0x120
4,467,258718032,-; get_values_poll+0x54/0x138 [raspberrypi_hwmon]
4,468,258718049,-; process_one_work+0x148/0x3b8
4,469,258718060,-; worker_thread+0x32c/0x450
4,470,258718070,-; kthread+0x11c/0x128
4,471,258718083,-; ret_from_fork+0x10/0x20
4,472,258718097,-;---[ end trace 0000000000000000 ]---

and after the CTRL+C the following kernel message appears

3,473,258718126,-;hwmon hwmon1: Failed to get throttled (-110)
 SUBSYSTEM=hwmon
 DEVICE=+hwmon:hwmon1
3,474,260445322,-;raspberrypi-clk soc:firmware:clocks: Failed to change fw-clk-arm frequency: -110
 SUBSYSTEM=platform
 DEVICE=+platform:soc:firmware:clocks
3,475,261341303,-;bcm2835_mmal_vchiq: timed out waiting for sync completion
3,476,261341335,-;bcm2835-isp bcm2835-isp: bcm2835_isp_node_stop_streaming: Failed disabling o/p port, ret -62
3,477,263357303,-;bcm2835-isp bcm2835-isp: bcm2835_isp_node_stop_streaming: Timeout waiting for buffers to be returned - 4 outstanding

Additional context

  • Any vcgencmd command doesn't respond.
  • Important thing: libcamera-hello runs well with my camera and the last firmware. but libcamera-hello generates glitch with the old ones.
  • With buildroot system I have the same behaviour with Linux 6.1.74 and 6.6.38
  • the command "LD_LIBRARY_PATH=./src ./src/fastvideo -j data/raspicam.json -i cam-ov5647 -o file:test.raw " stores correctly the stream on the memory. Then the trouble seems to come from the ISP configuration. But impossible to find example or any indication of the ISP new configuration.

Please help

@mchalain
Copy link
Author

Hello every body,
Here my trouble, I may send more information.
Thank you a lot for any help.

@popcornmix
Copy link
Contributor

Might be worth adding "over_voltage=2" to config.txt, rebooting and testing again.

@mchalain
Copy link
Author

mchalain commented Sep 4, 2024

Thx popcornmix! but it doesn't the problem.
I found that the ISP capture (video14) must start (streamon) before to queue buffer into the ISP output (video13).
This limitation is recent, and create a kernel crash and a VideoCore freeze.

@6by9
Copy link

6by9 commented Sep 4, 2024

Recent is relative. bcm2835-isp last had a very minor update in April 2024, and the firmware side was last changed in April 2023.

Memory says that buffers submitted to the ISP input queue (video13) will be sent to the outputs that are enabled at the point of submission. Submitting to video13 with none of video14-16 enabled should therefore just result in a discard and return of the input buffer.
This did change back in Oct 2022 - prior to that it would wait for video14 to be enabled before processing the frame.

I will see if I can find a few moments to run your app (assuming it will build with a simple make.
Otherwise if you could add start_debug=1 to /boot/firmware/config.txt, reboot, run your app, and then sudo vclog -a should list any asserts that have fired on the Videocore side. That might give us a heads up on what is going wrong.

@mchalain
Copy link
Author

mchalain commented Sep 9, 2024

Hello,
your description looks like I see with each firmware.
The new feature may not be an issue (but it's worry to not be able to start the before before choice between preview or record).
But the new feature generates a freeze of GPU and after a kernel crash when it tries to get informations from GPU.
My current application runs correctly on RaspberryOS, but I made some enhancements and now I have to correct to run on this OS (it runs on Buildroot). I will send to you the log when I correct my code.
I saw another issue when I call STREAMOFF if there is buffers in the pipe. But I can give more informations, I was a crash of the unicam driver.
Thank you.

@6by9
Copy link

6by9 commented Oct 1, 2024

I've finally found time to try your app.
I'm on a Pi4 so need to render to "screen" (DRM) instead of "gpu" (GL) as your code isn't looking for a correct DRM render device.

The image is very blue, and torn (it looks like a buffer start address is wrong so we get a wrap about 1/3rd of the way down the buffer), but I can't reproduce a crash or firmware lockup.

This is on the latest firmware and kernel.

$ vcgencmd version
May  3 2024 19:32:42 
Copyright (c) 2012 Broadcom
version fef7849c350cb250be836afc6d43c742f9ebc97a (clean) (release) (start_db)

$ uname -a
Linux bookworm64-1 6.6.51+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.51-1+rpt1 (2024-09-26) aarch64 GNU/Linux

Tried on a Pi3, and get similar results. I get weird images on the display if the -i cam-ov5647 -o isp-in side is started first, but correct images if the -i isp-out -o screen side is started first.

The firmware side is stalling processing as there are no output ports enabled, so it views it as unnecessary to process input buffers - all it could do is return them to the application. They should be returned cleanly if the application quits.

TBH having looked at it, it is not expected for a V4L2 M2M device to be opened by 2 totally different processes and effectively be treated as a Linux pipe between processes. There are discussions being had at the moment which will bind a context to these devices so that you can run multiple instances of the ISP, and not by replicating /dev/video nodes as we do at present.

The weird images do make me wonder if there is a path where the ISP isn't correctly configured. That would have the potential to write to random parts of memory and do bad things, however I'm not convinced it's worth chasing down.
There is something in the logs that does make it look like we're only producing 480 lines on the output instead of the expected 1080, so I'll have a quick look into that.

@6by9
Copy link

6by9 commented Oct 1, 2024

There is something different in the ISP configuration depending on whether input or output is started first.
If input is started first, I'm only getting 480 output lines signaled, and the conversion is taking around 2250usecs.
If output is started first, I'm getting the expected 1080 output lines signaled, and the conversion is taking around 7500usecs.

I think I've identified the issue, so will throw up a test firmware when CI has done the business.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants