
Nvidia Jetson gstreamer support (WIP) #2440

Closed
wants to merge 11 commits

Conversation

yury-sannikov
Contributor

Hi @blakeblackshear,
First of all, thank you for this awesome project!

I also struggled with hardware video decoding on the Jetson Nano. As far as I understand, ffmpeg is going to be sort of a PITA on the Tegra platform, so I made an attempt to set up GStreamer with Frigate.

This work is very far from done; it is mostly intended as a proof of concept and a request for comments/suggestions.

Here is what I have now:

frigate.yml can have a gstreamer: section instead of ffmpeg, like:

cameras:
  farther_cam:
    gstreamer:
      inputs:
        - path: rtsp://admin:123456@192.168.5.95:554/stream0
          roles:
            - detect

This configuration specifies no decoder pipeline, so it falls back to the GStreamer videotestsrc video source. The result looks like this:
[screenshot]
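Just for illustration, the fallback is roughly equivalent to a standalone test pipeline like this (not the exact command Frigate runs):

gst-launch-1.0 videotestsrc ! 'video/x-raw,width=1280,height=720,format=I420' ! videoconvert ! fakesink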

With knowledge of the supported decoders, one can come up with the following configuration:

cameras:
  farther_cam:
    gstreamer:
      decoder_pipeline:
        - rtph265depay
        - h265parse
        - omxh265dec
      inputs:
        - path: rtsp://admin:123456@192.168.5.95:554/stream0
          roles:
            - detect
   
    detect:
      width: 1280
      height: 720

[screenshot]

I'm using a CPU detector, so the CPU usage looks pretty high, though NVDEC is active:
[screenshot]

In order to have full control over GStreamer, one can specify a manual pipeline, which might look something like this:

cameras:
  farther_cam:
    gstreamer:
      manual_pipeline:
        - rtspsrc location="rtsp://admin:123456@192.168.5.95:554/stream0"
        - rtph265depay
        - h265parse
        - omxh265dec
        - video/x-raw,format=(string)NV12
        - videoconvert
        - videoscale
        - video/x-raw,width=(int)1280,height=(int)720,format=(string)I420
        - videoconvert
      inputs:
        - path: rtsp://admin:123456@192.168.5.95:554/stream0
          roles:
            - detect
   
    detect:
      width: 1280
      height: 720
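For reference, the same pipeline can be sanity-checked outside Frigate with gst-launch-1.0. Roughly (assuming the L4T omx plugins are installed, with fakesink standing in for Frigate's frame reader):

gst-launch-1.0 rtspsrc location="rtsp://admin:123456@192.168.5.95:554/stream0" ! \
  rtph265depay ! h265parse ! omxh265dec ! 'video/x-raw,format=NV12' ! \
  videoconvert ! videoscale ! 'video/x-raw,width=1280,height=720,format=I420' ! \
  videoconvert ! fakesink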

This setup is successfully up and running on my Jetson Nano and controls the garden lights. Although I see no issues with GStreamer as a back end, I did not have time to make ffmpeg work, so the live stream is broken. Here is a list of things that make this PR far from ideal:

  • the Dockerfile setup is a complete mess
  • no RTMP support
  • ffmpeg is broken, so there is no live stream or recording
  • configuration validation is far from ideal

As for future improvements, I have a couple of questions:

  • do you feel GStreamer is the way to go, at least for the NVIDIA Tegra device family?
  • if GStreamer is used, is it okay to run it side-by-side with ffmpeg for live streaming, birdseye, and recording?
  • what do you think about a completely separate Dockerfile based on something like https://catalog.ngc.nvidia.com/orgs/nvidia/containers/l4t-tensorrt?
  • given that TensorRT is available, do you think it makes sense to support TensorRT in addition to tflite_runtime?

Thanks


@blakeblackshear
Owner

I don't have any issues with this direction. I looked at gstreamer previously, but ffmpeg is much more widely used and I am much more familiar with it. It seems totally reasonable to run it in parallel when needed.

I also think TensorRT is worth including. I tried a while back, but there were python version incompatibilities that I couldn't resolve. I just haven't had time to come back around to trying again.

These changes would likely be for 0.11.0. I am also planning to try and switch to jellyfin-ffmpeg for that release.

@ozett

ozett commented Dec 13, 2021

Is this the right place to collect ideas?
Since it's always hard to get the right parameters, a GUI section for trying out some predefined options (in a drop-down) would be great. OctoPrint has a simple example of this in its octo-cam section.

But you can guess what this could look like...
Overall, great to go for the Jetson Nano with GStreamer and TensorRT... that could make this small thing a big one.

@yury-sannikov
Contributor Author

Hi @blakeblackshear,
Any plans to make the detector component pluggable?
Since the L4T docker image is based on Bionic and will stay that way for quite a while, I'm looking into ways to run Frigate on it. A version downgrade to Python 3.6 is relatively easy to do now; however, with the 0.10 release it might not be possible anymore. One solution might be running Frigate on Python 3.8 and the detector on Python 3.6 as a separate pluggable process. I would like to get your opinion on that.

@blakeblackshear
Owner

I thought I saw that TensorRT was compatible with python 3.8 a while back. I don't know much about the L4T containers.

@blakeblackshear
Owner

Take a look at watsor to see if it gives you any ideas: https://github.com/asmirnou/watsor/tree/master/docker

@yury-sannikov
Contributor Author

At first look, it seems it's based on Python 3.6: tflite_runtime-2.5.0-cp36-cp36m-linux_aarch64

@ozett

ozett commented Dec 25, 2021

promising performance results for more than 8 cams...
wb666greene/AI-Person-Detector#11 (comment)

@yury-sannikov
Contributor Author

I have pretty decent results running 7 cameras @ 10 fps with software inference and gstreamer with hardware decoding. I also get ~25 fps inference speed at 1080p with the yolov4-tiny-416 model using TensorRT. My goal now is to make a Frigate version that uses gstreamer + TensorRT inference and stress test this setup on my hardware.

I have all the pieces together, though I'm having a hard time compiling some of the dependencies during the Docker build phase. I have to either add "default-runtime": "nvidia" to /etc/docker/daemon.json or do some sort of scripting (you can check out the last 2 commits) to make it work. This really complicates the build process, and I feel the L4T build will be sort of a one-off thing (which I would really like to avoid). I will post an update here whenever it is available for testing.
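For reference, the daemon.json change I'm referring to is roughly this (the runtime path may differ depending on how nvidia-container-runtime is installed; restart the docker daemon after editing):

{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia"
}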

@ozett

ozett commented Dec 27, 2021

As I have no coding skills and can only read some lines of code, I'm following your commits and the refactoring for the Nano (and, with foreign detectors, it looks like some other improvements too).
It all looks great and I'll keep watching it grow.
I have some spare Nanos to test things on as the need arises... 👍

@yury-sannikov
Contributor Author

Hi folks,
I was able to get to the point where the Jetson Nano can use TensorRT for GPU inference and GStreamer for hardware-accelerated decoding.
Though it's too early to consider this effort done, I made a docker image for testing. The image has the yolov4-tiny-288 and yolov4-tiny-416 models baked in. You can pull it with docker pull yury1sannikov/frigate.l4t:latest. Sorry, no build instructions yet.

I'm running this setup on standard JetPack 4.6 [L4T 32.6.1] on the 4GB version of the Jetson Nano. I had to bump my swap up to 4GB to be able to generate the yolov4-416 TensorRT model.
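If you need to do the same, here is a rough sketch of bumping swap on the Nano (the file location is just an example):

sudo fallocate -l 4G /mnt/4GB.swap
sudo mkswap /mnt/4GB.swap
sudo swapon /mnt/4GB.swap
# add an /etc/fstab entry to keep it across reboots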

In order to run it, you might try this command:

sudo docker run --rm --name frigate.l4t \
    --mount type=tmpfs,target=/tmp/cache,tmpfs-size=1000000000 \
    -v $(pwd)/media:/media/frigate:rw \
    -v $(pwd)/frigate.yml:/config/config.yml:ro \
    -v $(pwd)/labelmap.txt:/opt/frigate/labelmap.txt:ro \
    -v $(pwd)/cpu_model.tflite:/opt/frigate/cpu_model.tflite:ro \
    -v $(pwd)/edgetpu_model.tflite:/opt/frigate/edgetpu_model.tflite:ro \
    -v /tmp/argus_socket:/tmp/argus_socket \
    -e FRIGATE_RTSP_PASSWORD='password' \
    -e NVIDIA_VISIBLE_DEVICES=all \
    -e NVIDIA_DRIVER_CAPABILITIES=compute,utility,video \
    -p 5000:5000 \
    -p 1935:1935 \
    --runtime=nvidia \
    --privileged \
    yury1sannikov/frigate.l4t:latest

My frigate.yml looks like this:

mqtt:
  host: x.x.x.x
  user: hassio
  password: ******

detectors:
  JetsonNano:
    type: tensorrt

rtmp:
  # Optional: Enable the RTMP stream (default: True)
  enabled: False


cameras:
  test_cam:
    gstreamer:
      manual_pipeline:
        - rtspsrc location="rtsp://user:pass@x.x.x.x:554/stream0"
        - rtph265depay
        - h265parse
        - omxh265dec
        - video/x-raw,format=(string)NV12
        - videoconvert
        - videoscale
        - video/x-raw,width=(int)1280,height=(int)720,format=(string)I420
        - videoconvert
      inputs:
        - path: rtsp://user:pass@x.x.x.x:554/stream0
          roles:
            - detect
    detect:
      width: 1280
      height: 720


  pole_cam:
    gstreamer:
      manual_pipeline:
        - rtspsrc location="rtsp://user:pass@x.x.x.x:554/cam/realmonitor&subtype=1"
        - rtph264depay
        - h264parse
        - omxh264dec
        - video/x-raw,format=(string)NV12
        - videoconvert
        - videoscale
        - video/x-raw,width=(int)1920,height=(int)1080,format=(string)I420
        - videoconvert
      inputs:
        - path: rtsp://user:pass@x.x.x.x:554/cam/realmonitor&subtype=1
          roles:
            - detect

    zones:
      front_steps:
        coordinates: 1429,337,1140,324,1082,409,770,397,759,588,1710,695,1920,743,1920,369,1738,358
        objects:
          - person
        filters:
          person:
            # Optional: minimum score for the object to initiate tracking (default: shown below)
            min_score: 0.2
            # Optional: minimum decimal percentage for tracked object's computed score to be considered a true positive (default: shown below)
            threshold: 0.5            
    motion:
      threshold: 22
      contour_area: 50
      delta_alpha: 0.15
      frame_alpha: 0.15
      frame_height: 220    
      mask:
        - 0,364,498,384,500,587,736,565,735,406,955,404,972,224,1111,207,1112,0,0,0
    detect:
      width: 1920
      height: 1080

  farther_cam:
    gstreamer:
      decoder_pipeline:
        - rtph264depay
        - h264parse
        - omxh264dec
      inputs:
        - path: rtsp://user:pwd@x.x.x.x:554/cam/realmonitor?channel=0&subtype=0
          roles:
            - detect
    motion:
      mask:
        - 543,0,704,0,704,170,497,166,499,305,393,401,398,470,207,472,209,408,102,341,105,127,0,126,0,0,31,0,107,0,132,0
        - 477,576,704,576,704,542,472,537
    detect:
      width: 704
      height: 576

  street1:
    gstreamer:
      decoder_pipeline:
        - rtph265depay
        - h265parse
        - omxh265dec
      inputs:
        - path: rtsp://user:pwd@x.x.x.x:554/cam/realmonitor?channel=1&subtype=1
          roles:
            - detect
    motion:
      mask:
        - 960,0,960,69,0,89,0,0

    detect:
      width: 960
      height: 480

  street2:
    gstreamer:
      decoder_pipeline:
        - rtph265depay
        - h265parse
        - omxh265dec
      inputs:
        - path: rtsp://user:pwd@x.x.x.x:554/cam/realmonitor?channel=4&subtype=1
          roles:
            - detect
    motion:
      mask:
        - 960,0,0,0,0,104,960,99
        - 130,199,0,156,0,480,227,480
    detect:
      width: 960
      height: 480

  garden_top:
    gstreamer:
      decoder_pipeline:
        - rtph265depay
        - h265parse
        - omxh265dec
      inputs:
        - path: rtsp://user:pwd@x.x.x.x:554/cam/realmonitor?channel=3&subtype=1
          roles:
            - detect
    motion:
      mask:
        - 606,53,511,47,514,0,960,0,960,100,960,154,743,65
    detect:
      width: 960
      height: 480

  walkway:
    gstreamer:
      decoder_pipeline:
        - rtph265depay
        - h265parse
        - omxh265dec
      inputs:
        - path: rtsp://user:pwd@x.x.x.x:554/cam/realmonitor?channel=2&subtype=1
          roles:
            - detect
    motion:
      mask:
        - 917,0,594,0,593,61,919,59
    detect:
      width: 960
      height: 480

objects:
  track:
    - person
    - bicycle
    - car
    - motorcycle
    - bus
    - train


logger:
  default: info

model:
  # Optional: path to the model (default: automatic based on detector)
  path: /yolo4/yolov4-tiny-416.trt
  # Optional: path to the labelmap (default: shown below)
  labelmap_path: /labelmap.txt
  # Required: Object detection model input width (default: shown below)
  width: 416
  # Required: Object detection model input height (default: shown below)
  height: 416



snapshots:
  # Optional: Enable writing jpg snapshot to /media/frigate/clips (default: shown below)
  # This value can be set via MQTT and will be updated in startup based on retained value
  enabled: true
  # Optional: print a timestamp on the snapshots (default: shown below)
  timestamp: true
  # Optional: draw bounding box on the snapshots (default: shown below)
  bounding_box: true
  # Optional: crop the snapshot (default: shown below)
  crop: False
#      # Optional: height to resize the snapshot to (default: original size)
#      height: 175
  # Optional: Camera override for retention settings (default: global values)
  retain:
    # Required: Default retention days (default: shown below)
    default: 10
    # Optional: Per object retention days
    objects:
      person: 100

I tested yolo-tiny-416, yolo-tiny-288, yolo-416 and yolo-288. It seems the tiny models make more sense to use; I did not see any detection improvement from the full Yolo model. Using yolo-tiny-288 gives ~27ms inference time vs ~37ms for yolo-tiny-416, but I found yolo-tiny-288 has slightly worse detection accuracy.

Here is my debug output for yolo-tiny-416:
[screenshot]

Here is my debug output for yolo-tiny-288:
[screenshot]

The same for the full yolo-416 model:
[screenshot]

@blakeblackshear
Owner

This looks like a good start. I don't have any major issues with the direction here. It's probably best if we add gstreamer to all variations of the docker builds. It may be beneficial on other platforms too, and I don't want some config options to only work in some image variants.

Also, you may want to look at Ambianic since it also uses gstreamer. There may be some useful insights there.

@yury-sannikov
Contributor Author

Yup, that makes perfect sense. This work is still in the feasibility-study phase, I think. I have to figure out how to make gstreamer re-stream RTSP, since it can't be backed by ffmpeg.

@blakeblackshear
Owner

What if you use ffmpeg to relay the RTMP stream and gstreamer just for decoding? Can you mix and match ffmpeg and gstreamer? I'm planning to replace ffmpeg for the RTMP relay in the future anyway.
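Something roughly like this might work (untested, just to illustrate the idea; the camera URL and RTMP endpoint are placeholders):

gst-launch-1.0 -q rtspsrc location="rtsp://user:pass@x.x.x.x:554/stream0" ! \
  rtph264depay ! h264parse ! mpegtsmux ! fdsink fd=1 | \
  ffmpeg -f mpegts -i pipe:0 -c copy -f flv rtmp://127.0.0.1/live/test_cam

That way gstreamer only depays the stream and ffmpeg does the relay without re-encoding.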

@yury-sannikov
Contributor Author

Didn't do much for the last couple of days other than running yolo4-tiny-416 on the Jetson Nano with TensorRT and gstreamer (7 cameras) side by side with an Intel NUC running vanilla Frigate (2 cameras). The configuration file for both setups is almost the same, except the NUC has 2 cameras and CPU inference. I feel 7 cams is sort of the limit; I would not run more on such a tiny device as the Nano, though I observed no issues except one: I got a bus error overnight, which was, hopefully, solved by tweaking the shm-size.

What I'm really happy about is that yolo-4 seems to have better sensitivity in low-light/night conditions. Not sure if the inference speed makes any difference, but yolo4-tiny was able to detect a person ~25 meters away in the rain at night. I observed no such events registered by the NUC, which is fed from the same camera.

I'm excited to finalize this PR and I might need some guidance from @blakeblackshear to complete it. Here is the plan I have in mind:

  • make the tflite runtime work on the CPU and finalize the edgetpu refactoring
  • figure out the gstreamer + ffmpeg RTMP relay
  • autodetect the necessary gstreamer pipeline elements using gst-discoverer-1.0 (see the sketch after this list)
  • figure out a better way of converting darknet Yolo weights into a TensorRT model
  • optimize the docker image
  • make gstreamer available on the other platforms as an alternative to ffmpeg decoding

I will appreciate any suggestions on the above.
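For the autodetection item, the idea is to probe the stream with gst-discoverer-1.0 and map the reported codec to the right depay/parse/decoder elements; a rough sketch (credentials are placeholders):

gst-discoverer-1.0 -v "rtsp://user:pass@x.x.x.x:554/stream0"

The verbose output includes the video codec (H.264/H.265), resolution and framerate, which should be enough to pick between rtph264depay/omxh264dec and rtph265depay/omxh265dec and to prefill the detect width/height.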

It seems I won't be able to get away without enabling the Nvidia runtime for the Jetson family builds. It seems other projects have the same issue.

@ozett

ozett commented Dec 30, 2021

yolo-4 seems to have better sensitivity

I share that impression.

It will be interesting to see what yolo5 does. It's a somewhat new architecture based on PyTorch, and there are now really small nano models...
At first glimpse, I think they also detect better compared to the Google models...

@blakeblackshear
Owner

This PR will need to be rebased on the release-0.10.0 branch. It won't be included in that release, but that will update it to be based on the latest code. It's going to take time for me to review all of this in detail. Once it is merged in, I will be on the hook for keeping it functional, so I need to make sure I understand it all.

What inference speeds are you seeing on the yolo-4 model with the Nano?

@yury-sannikov
Contributor Author

The release-0.10.0 branch makes sense to me. Do you see any existing issues or upcoming work that may interfere with what I'm doing? I might hold off if that is the case.

If stats are correct, I'm getting ~37ms inference time for yolo-tiny-416 and ~27ms for yolo-tiny-288.

I haven't looked at yolo5 in detail. I just noticed it's not the yolo4 successor, which is not necessarily bad. Also, I did not see any yolo5-tiny models. It looks like the full model is pretty heavy for the Nano (although okay for more powerful devices).

I ran yolo-416 (218ms inference time) and yolo-tiny-416 (37ms). The full model sees small elements on a camera better and has fewer funny mistakes like detecting a car as a donut, but overall the tiny model works pretty okay for my needs and is almost 6 times faster. I believe it's pretty easy to convert yolo5 into TensorRT format; I can try doing it after this PR is done.
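For reference, once the darknet weights are exported to ONNX (e.g. with the conversion scripts from jkjung-avt/tensorrt_demos), building the engine on the Nano is roughly a one-liner with trtexec (just a sketch; file names are placeholders):

/usr/src/tensorrt/bin/trtexec --onnx=yolov4-tiny-416.onnx \
    --saveEngine=yolov4-tiny-416.trt --fp16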

@ozett

ozett commented Dec 30, 2021

I haven't looked at yolo5 in detail. I just noticed it's not the yolo4 successor, which is not necessarily bad.
Also, I did not see any yolo5-tiny models. It looks like the full model is pretty heavy for the Nano (although okay for more powerful devices).

Jetson Nano is the board, and yolov5 (v6) nano looks like the new yolo-tiny:
a nano (model) that shouldn't be too heavy for a Nano (board) 😄

I found an ncnn (?) port of yolo5 specifically for the Nano:
https://github.com/Qengineering/YoloV5-ncnn-Jetson-Nano

The yolo(4)-tiny models may now be called yolo(5-v6)-nano:
https://github.com/ultralytics/yolov5/releases
[screenshot]

Running yolo5 on the EdgeTPU; some inspiration here:
ultralytics/yolov5#3428
ultralytics/yolov5#3428 (reply in thread)
[screenshot]

@yury-sannikov
Contributor Author

Oh, awesome! Sorry, I'm pretty new to all of this stuff.

@blakeblackshear
Owner

There aren't any in progress changes that will conflict. The only thing that could conflict for the 0.11.0 release will be changes to ffmpeg and the use of RTMP. This will go out with 0.11.0 if ready. I just wish I could get a Jetson Nano for testing.

@ozett

ozett commented Dec 31, 2021

I just wish I could get a Jetson Nano for testing.

Preferably the 4GB RAM version (over the smaller 2GB one):
https://developer.nvidia.com/embedded/jetson-nano-developer-kit#collapseTechSpecs

Out of stock at the moment [chip crisis ⚠️]:
https://www.seeedstudio.com/NVIDIA-Jetson-Nano-Development-Kit-B01-p-4437.html

@ozett

ozett commented Dec 31, 2021

Oh, awesome! Sorry, I'm pretty new to all of this stuff.

I wish I could code, to help find the best road for yolo on the Jetson Nano. 😢

Yolo (v3/4/5) on Coral?

The yolo models could run on the Coral, I guess:

v3
https://github.com/guichristmann/edge-tpu-tiny-yolo
v4
https://wiki.loliot.net/docs/lang/python/libraries/yolov4/python-yolov4-edge-tpu/
v5
https://www.codeproject.com/Articles/5293079/Deploying-YOLOv5-Model-on-Raspberry-Pi-with-Coral
v5 issues
google-coral/edgetpu#405

Yolo5 format options:

[screenshot]
ultralytics/yolov5#251

Yolo (v5 v6?) as TensorRT?

But maybe a better way is to get them running "natively" as TensorRT models (like NVIDIA does itself)?
https://www.amine-hy.com/project/yolo+tensorrt/
https://github.com/enazoe/yolo-tensorrt
https://github.com/jkjung-avt/tensorrt_demos#yolov4
https://github.com/TrojanXu/yolov5-tensorrt

Yolo (v3/4/5 and others) as TensorRTx?

https://github.com/wang-xinyu/tensorrtx

What's needed?

I don't know what's needed for the conversion, and for Frigate to handle the results correctly...

A thought: as this is WIP for the Jetson Nano SBC (or Jetson TX?), which has powerful AI hardware onboard, maybe the Jetson platform should not use the Coral (or at least not depend on it) and should use its own AI capabilities instead? That would make for a single-hardware solution.

@yury-sannikov yury-sannikov deleted the gstreamer branch December 31, 2021 13:08
@yury-sannikov
Contributor Author

yury-sannikov commented Dec 31, 2021

Rebased onto 0.10 in #2548 and apparently screwed up this PR.
