Segmentation fault when running in Docker #46

Open
derkenblosh opened this issue Jul 23, 2020 · 34 comments

@derkenblosh

derkenblosh commented Jul 23, 2020

Using the hosted docker image, B800 camera, and no other clients connecting to the camera

I'm able to test in BI and connect to the subStream without issue (I do get random Segmentation Faults, but they do not seem to affect the stream). But when I attempt to access the mainStream, I get repeated Segmentation Faults every few seconds, and the stream will not load.

#https://github.com/thirtythreeforty/neolink
bind = "0.0.0.0"

[[cameras]]
name = "HD1"
username = "admin"
password = ""
address = "192.168.11.207:9000"
stream = "both"
format = "H264"
timeout = { secs = 5, nanos = 0 }

[2020-07-23T01:03:51Z INFO neolink] HD1: Connected to camera, starting video stream mainStream,
Segmentation fault (core dumped),

@derkenblosh
Author

... Going to set up on the host machine and report back if I receive the same errors. After some research, this looks like it might actually be a docker issue.

@m1k1o
Contributor

m1k1o commented Jul 26, 2020

@derkenblosh maybe related to #37? It looks like the same problem. Try to connect to the stream explicitly using UDP. If it works, then it's that TCP issue.
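
One way to run the UDP-versus-TCP check suggested above is to request each transport explicitly from the client. The following is only a sketch, assuming ffplay is installed; the helper name `play_rtsp` and the `FFPLAY` override variable are made up for illustration and are not part of neolink.

```shell
# Hypothetical helper: open an RTSP stream with an explicit transport.
#   $1 = transport ("udp" or "tcp")
#   $2 = RTSP URL of the neolink stream
# FFPLAY may be set to another command name (for example "echo")
# to preview the arguments instead of starting a player.
play_rtsp() {
  transport="$1"
  url="$2"
  case "$transport" in
    udp|tcp) ;;
    *) echo "transport must be udp or tcp" >&2; return 1 ;;
  esac
  "${FFPLAY:-ffplay}" -rtsp_transport "$transport" "$url"
}

# Usage (the URL is left elided; substitute your own stream address):
#   play_rtsp udp "rtsp://..."
#   play_rtsp tcp "rtsp://..."
```

If the UDP call plays and the TCP call makes neolink crash, that points at the TCP path rather than at the camera or the config.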

@thirtythreeforty
Owner

I'd bet the root cause is GStreamer handling the extra data from #49 and #50 incorrectly. Granted, it shouldn't crash, but I bet when that's implemented this problem will go away.

@thirtythreeforty
Owner

@m1k1o would you try the Docker image for the magic_headers branch?

docker pull thirtythreeforty/neolink:magic_headers

It still has implementation issues but I want to see if it outright crashes on you.

@m1k1o
Contributor

m1k1o commented Aug 2, 2020

@thirtythreeforty nope, still crashes. On top of that, UDP stream can't be watched. It's just gray, 100% loss of picture.

@thirtythreeforty
Owner

Well gross.

@QuantumEntangledAndy
Collaborator

@m1k1o If you have the time could you try my branch quantumentangledandy/neolink:magic_headers

docker pull quantumentangledandy/neolink:magic_headers

docker run -p 8554:8554 "--volume=$(pwd)/my_config.toml:/etc/neolink.toml" quantumentangledandy/neolink:magic_headers

Sorry about all of these tests, but I can't seem to replicate your issues and that makes it difficult to debug.

@QuantumEntangledAndy
Collaborator

QuantumEntangledAndy commented Aug 5, 2020

P.S. I am still having a bit of trouble on UDP (I think that's my connection, though, as I'm testing over an SSH tunnel). If you can, please test TCP.

Update: UDP is fine for me when I set the buffer large enough

ffplay -buffer_size 1M -rtsp_transport udp rtsp://.....

@QuantumEntangledAndy
Collaborator

Update 2: UDP works without docker but not with it. I think the reason is that docker is not forwarding the UDP ports. The docker image is set to expose only TCP port 8554. The UDP ports for me were

  • 63155
  • 63154
  • 63159
  • 63158
  • 60561
  • 60560
  • 60565
  • 60564

These ports were only created after a connection to the TCP port. The ports are dynamic and depend on the number of cameras.

You might be able to get UDP to work with docker using

docker run -p 8554:8554 -p 6000-7000:6000-7000/udp "--volume=$(pwd)/my_config.toml:/etc/neolink.toml" quantumentangledandy/neolink:magic_headers

But I haven't had success with this and it is a slow start up because it maps the ports one by one.

@m1k1o
Contributor

m1k1o commented Aug 5, 2020

@QuantumEntangledAndy UDP is working fine. With a large buffer I was able to mitigate the flickering; the stream is smooth about 95% of the time. Your image is working well too, but unfortunately TCP is failing.

I am connecting to my container within one network, from another container where I'm running ffmpeg, so I don't need to expose any UDP ports to get it working.
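
A setup like the one described above could look roughly like this. It is a sketch under assumptions, not the exact commands used in this thread: the network name `neolink-net`, the container name `neolink`, the function name, and the `DOCKER` override variable are all hypothetical. It relies on Docker resolving container names for containers attached to the same user-defined network, so no ports need to be published between them.

```shell
# Hypothetical helper: start neolink on a user-defined Docker network.
#   $1 = absolute path to the neolink TOML config on the host
#   $2 = image, e.g. quantumentangledandy/neolink:magic_headers
# DOCKER may be set to another command name (for example "echo")
# to preview the commands instead of running them.
start_neolink_on_network() {
  config="$1"
  image="$2"
  # Create the shared network (this fails if it already exists),
  # then start neolink attached to it, detached, with no -p mappings.
  "${DOCKER:-docker}" network create neolink-net &&
  "${DOCKER:-docker}" run -d --name neolink --network neolink-net \
    "--volume=${config}:/etc/neolink.toml" "$image"
}

# Usage:
#   start_neolink_on_network "$(pwd)/my_config.toml" \
#     quantumentangledandy/neolink:magic_headers
# A client container started with "--network neolink-net" can then
# reach the server by container name, at rtsp://neolink:8554/...
```

The trade-off is that only containers on that network can reach the stream; clients on the host or the LAN would still need published ports or host networking.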

@QuantumEntangledAndy
Collaborator

Oh, that's a good way to do the UDP. I am quite new to docker, so I'm not sure of the options. But I suppose you are bridging the networks.

This branch is definitely an improvement and seems to be properly passing the BC format. I was hoping it would fix the TCP issue though.

Do you know if TCP works outside of docker?

@QuantumEntangledAndy
Collaborator

QuantumEntangledAndy commented Aug 5, 2020

@m1k1o So.... I thought I'd search for segfaults on the GStreamer/gst-rtsp-server github page and found this and this

They point to segfaults that are fixed in v1.17.2. The docker image uses 1.16.2. I am wondering if I can compile a docker image with newer gst binaries.

Update: Never mind, I was misreading the tag that it was inside. These were fixed before 1.16.2.

@QuantumEntangledAndy
Collaborator

@m1k1o Could you try this one? docker pull quantumentangledandy/neolink:deb_experimental

@m1k1o
Contributor

m1k1o commented Aug 5, 2020

That won't even start. :(

image

@QuantumEntangledAndy
Collaborator

Wow, really! It worked OK on my machine. What is up...

@QuantumEntangledAndy
Collaborator

I am building two more to test at the moment, and if those don't do it I am going to cry.

@QuantumEntangledAndy
Collaborator

@m1k1o I have a few more dockers to test if you're free

  • These two should cover a few more versions of gst and also test without using alpine (see next notes)
    • quantumentangledandy/neolink:deb_testing
    • quantumentangledandy/neolink:deb_stable
  • This one is alpine but with some linker flag changes (it turns out alpine and rust don't play well together e.g., e.g., e.g.)
    • quantumentangledandy/neolink:docker_alt

@m1k1o
Contributor

m1k1o commented Aug 6, 2020

quantumentangledandy/neolink:deb_testing and quantumentangledandy/neolink:deb_stable can't even start, same as above.
image

quantumentangledandy/neolink:docker_alt started, with UDP running smoothly, but it fails after TCP is requested.
image

Update more logs:

quantumentangledandy/neolink:docker_alt
image

@QuantumEntangledAndy
Collaborator

The failure to start on Debian is very odd; it is just a different OS, and one I've tested many times with neolink... Can you reproduce without docker? I am beginning to suspect this is tied to docker in some way....


also crying in the corner
TwT

@m1k1o
Contributor

m1k1o commented Aug 8, 2020

@QuantumEntangledAndy I might spawn a VM and try it that way when I have some free time. I'll keep you updated.

thirtythreeforty changed the title from "Segmentation fault" to "Segmentation fault when running in Docker" on Sep 16, 2020
@ruimarinho

Hi! Quick update from my side, having tested neolink for the first time today and in docker. I can confirm a segmentation fault when TCP is used. By default, VLC will first try to connect via UDP and, if that fails, attempt TCP, causing neolink to crash.

One workaround is to open multiple ports as suggested by #46 (comment), but this is a guessing game. Another is to run with --network=host, where port binding isn't used.

Lastly, when using UDP, the 4K stream sometimes has intermittent grey key frames. Even after tuning the UDP buffers as suggested on another issue, I am unable to completely avoid them. On the command line, here's what I see when this happens:

[hevc @ 0x7fb7ee0b2200] Could not find ref with POC 0
[hevc @ 0x7fb7ee0b9c00] Could not find ref with POC 4
[hevc @ 0x7fb7ee0bdc00] Could not find ref with POC 7
[hevc @ 0x7fb7ee09e600] Could not find ref with POC 11
[hevc @ 0x7fb7ee0ae200] Could not find ref with POC 13
[hevc @ 0x7fb7ee03de00] Could not find ref with POC 17
[hevc @ 0x7fb7ee0b9c00] Could not find ref with POC 23
[hevc @ 0x7fb7ee0ae200] Could not find ref with POC 29
[hevc @ 0x7fb7ee0c5800] Could not find ref with POC 32
[hevc @ 0x7fb7ee0bcc00] Could not find ref with POC 36
[hevc @ 0x7fb7ee099c00] Could not find ref with POC 38
[hevc @ 0x7fb7ee0b2200] Could not find ref with POC 3
[hevc @ 0x7fb7ee0b9c00] Could not find ref with POC 6
[hevc @ 0x7fb7ee060200] Could not find ref with POC 10
[hevc @ 0x7fb7ee09e600] Could not find ref with POC 12
[hevc @ 0x7fb7ee0ae200] Could not find ref with POC 14
[hevc @ 0x7fb7ee0c5800] Could not find ref with POC 16
[hevc @ 0x7fb7ee03de00] Could not find ref with POC 18

Hopefully it will let you pinpoint the issue more easily!
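
The `--network=host` workaround mentioned above can be sketched as follows. This is a minimal illustration under assumptions: the function name and the `DOCKER` override variable are hypothetical, and the default image is the untagged `thirtythreeforty/neolink`. With host networking the container shares the host's network stack, so the dynamically chosen UDP ports are reachable without any `-p` mappings. Host networking is a Linux-host feature, so this may not behave the same on other platforms.

```shell
# Hypothetical helper: run neolink with host networking.
#   $1 = absolute path to the neolink TOML config on the host
#   $2 = image (optional; defaults to thirtythreeforty/neolink)
# DOCKER may be set to another command name (for example "echo")
# to preview the command instead of running it.
run_neolink_host_network() {
  config="$1"
  image="${2:-thirtythreeforty/neolink}"
  # No -p flags: published ports are not used with --network=host.
  "${DOCKER:-docker}" run --network=host \
    "--volume=${config}:/etc/neolink.toml" "$image"
}

# Usage:
#   run_neolink_host_network "$(pwd)/my_config.toml"
```

Compared with mapping a wide UDP port range, this avoids guessing which ports will be chosen, at the cost of giving up network isolation for the container.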

@m1k1o
Contributor

m1k1o commented Nov 13, 2020

I managed to get rid of the grey key frames, but I am sometimes seeing lines like these. I suspect that the video is somehow parsed incorrectly and that maybe the audio is causing the disturbance. It is just an idea; unfortunately I am not able to prove it.

cam

@Homas

Homas commented Sep 11, 2021

I have the same issue with the default docker image and a D800 (h265 only).
I've pulled the quantumentangledandy/neolink:udp_docker image mentioned in #157.
It looks like it works with the new image.

@rhatguy

rhatguy commented Sep 23, 2021

This branch does "work" on my 410-5mp. It doesn't seem very stable though. I'll keep trying a few things, but my substreams are not connecting and the main stream frequently won't load. It does "work" though.

@QuantumEntangledAndy
Collaborator

I see, some performance issues then. But at least this specific issue with the segfault seems fixed.

5MP I think is a 1920p, which may be a bit much to handle simultaneously with both the HD and SD. We try to keep neolink as optimal as we can and just grab the stream and pass it to the rtsp as best we can, but perhaps there's room for improvement.

Currently we stream all the time, even when no one is listening to the rtsp. Pulling both the SD and HD might be too much. I have a branch in the works that will only stream on motion, and thinking about it, perhaps I can make it only pull the stream from the camera when an rtsp client is active. That might help with this sort of issue. I'll have a go when I get time.

@0dragosh

I'm getting the same error in Docker, using the latest tag as per the docs.

debug logs:

"xmlns": "http://www.w3.org/2000/xmlns/"})
[2022-05-18T08:57:41Z DEBUG yaserde::de] Fetched Characters(0)
[2022-05-18T08:57:41Z DEBUG yaserde::de] Fetched EndElement(recording)
[2022-05-18T08:57:41Z DEBUG yaserde::de] Fetched StartElement(timeStamp, {"": "", "xml": "http://www.w3.org/XML/1998/namespace", "xmlns": "http://www.w3.org/2000/xmlns/"})
[2022-05-18T08:57:41Z DEBUG yaserde::de] Fetched Characters(0)
[2022-05-18T08:57:41Z DEBUG yaserde::de] Fetched EndElement(timeStamp)
[2022-05-18T08:57:41Z DEBUG yaserde::de] Fetched EndElement(AlarmEvent)
[2022-05-18T08:57:41Z DEBUG yaserde::de] Fetched EndElement(AlarmEventList)
[2022-05-18T08:57:41Z DEBUG neolink_core::bc_protocol::connection::bcconn] Ignoring uninteresting message ID 33
Segmentation fault (core dumped)

Is there anything we can do? Is this related to the UDP ports thing? Would something like network mode host solve it?

@QuantumEntangledAndy
Collaborator

@0dragosh it's a known docker issue that doesn't seem to replicate when you run it directly. I suspect it's happening upstream in gstreamer, as all our code is safe rust, which is quite hard to segfault. We've found that if you use a Debian-based docker image the issue goes away (perhaps because of the gstreamer version difference). There's a PR that swaps to this image, but it hasn't been merged.

@0dragosh

@QuantumEntangledAndy Thanks! What's the versioning like? I see two tags on master but the docker image is all over the place.

@QuantumEntangledAndy
Collaborator

There's a docker image per branch. So if you want to use, say, the mqtt branch, you'd pull the image with the same name. I think I've got a recent buster version on my fork but I'm not sure.

@0dragosh

It would be great if someone were to document this, 5 lines of Markdown is enough. It's very confusing for someone that just stumbled upon the app and wants to deploy.

@QuantumEntangledAndy
Collaborator

I can understand that it's a bit all over the place. I'd prefer a more formal versioning scheme and tagging with releases.

The docker image was never our intended delivery method though. It's something a fellow end user setup and shared with us all and we've kinda just rolled with it.

I've had to learn all sorts of docker stuff just to support the issues people have with using it, when running it on bare metal works just fine. The docker image is a whole other level of abstraction, with managing NATs and ports that can confuse all sorts of things. Anyway, we do what we can.

@0dragosh

I am pretty well versed in docker/kubernetes, I can help with that. But I'm not an expert on rtsp/gstreamer/reolink/neolink, it would be nice to have a starting point.

@thirtythreeforty
Owner

@QuantumEntangledAndy do you know of any reason we shouldn't use a Debian based image? I only thought Alpine was used because it ended up being (much) smaller. But I'd rather "larger and working" over "small but crashy."

I'd like to close out all the tickets for this, since this seems to be a recurring thorn in users' sides. Happy to provide a new Dockerfile.

@QuantumEntangledAndy
Collaborator

Debian is fine and preferred for me. The only issue with the debian-docker PR was that it was using musl, which I think might not be necessary.

QuantumEntangledAndy referenced this issue in kevin-david/neolink May 19, 2023
…feature/UidAddr

Permit connect with UID at an Addr