
Filter data sent to gstreamer using magic bytes #50

Merged

Conversation

QuantumEntangledAndy
Collaborator

The data sent to gstreamer includes non-video data, such as audio. This forces some players to perform extra work to filter out invalid packets (see #49).

This PR uses the header of each packet to deduce its type and take the appropriate action.

Fixes #49

Provided the same headers are used throughout the Reolink family, this should work for all cameras, but it would be best to test on a few more.
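As a hedged sketch of the idea (names here are hypothetical, not the actual neolink API; the magic values are the ones tabulated later in this thread), the filtering boils down to classifying each packet by its first four bytes:

```rust
// Hypothetical sketch of magic-byte classification. PacketKind and
// classify() are illustrative names; magic values follow the table
// posted later in this thread.
#[derive(Debug, PartialEq, Eq)]
pub enum PacketKind {
    VideoIFrame, // start of an H264/H265 I-frame
    VideoPFrame, // start of an H264/H265 P-frame
    AudioAac,    // AAC audio frame
    AudioAdpcm,  // ADPCM audio frame
    Info,        // camera info block
    Unknown,     // no known magic: likely a continuation of the last packet
}

pub fn classify(packet: &[u8]) -> PacketKind {
    match packet.get(0..4) {
        Some([0x30, 0x30, 0x64, 0x63]) => PacketKind::VideoIFrame,
        Some([0x30, 0x31, 0x63, 0x63]) => PacketKind::VideoPFrame,
        Some([0x30, 0x35, 0x77, 0x62]) => PacketKind::AudioAac,
        Some([0x30, 0x30, 0x77, 0x62]) => PacketKind::AudioAdpcm,
        Some([0x31, 0x30, 0x30, 0x31] | [0x31, 0x30, 0x30, 0x32]) => PacketKind::Info,
        _ => PacketKind::Unknown,
    }
}

fn main() {
    // Only packets classified as video would be forwarded to gstreamer.
    let pkt = [0x30, 0x30, 0x64, 0x63, 0xAA, 0xBB];
    println!("{:?}", classify(&pkt));
}
```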

@hh1209
Contributor

hh1209 commented Jul 28, 2020

Would it be feasible to stream the "filtered out" audio to an RTSP audio stream?

@QuantumEntangledAndy
Collaborator Author

Yes, that's my eventual plan. First step: filter it out of the video. Second step: put it into its own stream. I'm working on it, no guarantees though.

@thirtythreeforty
Owner

thirtythreeforty commented Jul 29, 2020

Nice sleuthing. If this is the case (and you are probably completely right) then the BC protocol has gone from "obnoxious" to "exceedingly terrible." Why on earth wouldn't you make this two separate message types?

Would you mind explaining each of the magic values in a comment? Even just something like the following would be helpful:

/// Video I-Frame: signals start of H264/H265 I-Frame

etc.

I'm also curious about how you learned these. In particular the "binary packet with no header" is really annoying from an implementation standpoint, since we don't actually know what kind it is except "probably it goes with the last data." This "in band signaling" is problematic for correctness in case a continuation packet happens to have the same magic bytes as a new packet. Is there any indication of how much data goes with each magic value - kind of a "sub packet"? That way the parser could keep track of how much more data goes with the previous video packet.

It's okay if you don't know the answers - that's part of reverse engineering :)

Regarding the implementation, it might be better if the byte parsing was moved to the bc module. Maybe ModernMsg should not contain Vec<u8> but instead contain an enum of VideoData, AudioData, etc:

pub enum BinaryData {
    VideoData(Vec<u8>),
    AudioData(Vec<u8>),
    // do we need Unknown(Vec<u8>) here?
}

pub struct ModernMsg {
    pub xml: Option<BcXml>,
    pub binary: Option<BinaryData>,
}

and then bc_protocol can simply match on the type.
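A minimal sketch of that dispatch (the enum is repeated so the snippet stands alone; the Vec "sinks" are hypothetical stand-ins for the real gstreamer/audio outputs, not neolink API):

```rust
// Sketch only: bc_protocol dispatching on the proposed BinaryData enum.
// The Vec<u8> sinks stand in for the real gstreamer/audio outputs.
pub enum BinaryData {
    VideoData(Vec<u8>),
    AudioData(Vec<u8>),
    Unknown(Vec<u8>),
}

pub fn route(msg: BinaryData, video_sink: &mut Vec<u8>, audio_sink: &mut Vec<u8>) {
    match msg {
        // Video goes straight to the gstreamer pipeline.
        BinaryData::VideoData(data) => video_sink.extend(data),
        // Audio is kept aside; a later PR could feed it to its own stream.
        BinaryData::AudioData(data) => audio_sink.extend(data),
        // Unknown data is dropped (or logged) rather than corrupting video.
        BinaryData::Unknown(_) => {}
    }
}

fn main() {
    let (mut video, mut audio) = (Vec::new(), Vec::new());
    route(BinaryData::VideoData(vec![1, 2, 3]), &mut video, &mut audio);
    route(BinaryData::AudioData(vec![9]), &mut video, &mut audio);
    route(BinaryData::Unknown(vec![7, 7]), &mut video, &mut audio);
    println!("video={:?} audio={:?}", video, audio);
}
```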

@QuantumEntangledAndy
Collaborator Author

The magic codes are based on the work of @twistedddx in issue #4. The format he described matched my own investigations, and I worked it up into this PR.

With regards to extra info: after each magic header there is data such as the number of expected bytes in the packet. I believe that if this number is >40000 the data gets chunked. We could set it up to expect this chunking, but it seems like a lot of extra work.

After the magic header in the video stream there is also the text H264. We may be able to use this to auto-detect the format.

@QuantumEntangledAndy
Collaborator Author

QuantumEntangledAndy commented Jul 29, 2020

With regards to understanding why a packet with no header is a continuation:
I had the program print only the unknown headers and looked for patterns. The output was mostly random, without a pattern, indicating that it was just data. I then made it print known, unknown, and size (its current print statements) and it became clear that the unknown data was the remainder of large packets that were chunked after the video header.

Also, if I told it not to pass the unknown packets to the video stream and tried to play it, only the top half of the video was valid, indicating that the latter half of each frame was missing.

@QuantumEntangledAndy
Collaborator Author

Perhaps this will ultimately require a more complex structure for each type to parse correctly.

E.g. for the video type we extract data such as the video frame size and type. I also think that the PTS and DTS are in the header, so we could maybe attach proper timestamps.

@QuantumEntangledAndy
Collaborator Author

It is probably good for us that they don't use a separate header for continuations.

I believe most players search the stream for the NAL unit separators (info on h264 stream).

This is why passing all the data currently works. Our VLC and ffmpeg are scanning the stream for the NAL start codes, then assuming everything up to the next NAL start code is video data. If there were extra data mid NAL unit we would probably be seeing artifacts in the video.
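To illustrate (a simplified sketch, not how VLC/ffmpeg are actually implemented): scanning an Annex B stream for the 00 00 01 start code looks roughly like this. A 4-byte 00 00 00 01 start code is reported at the offset of its inner 3-byte code.

```rust
// Simplified sketch of Annex B NAL unit start-code scanning, the
// behaviour players rely on. Returns the byte offsets where a
// 00 00 01 sequence begins.
fn find_start_codes(stream: &[u8]) -> Vec<usize> {
    let mut offsets = Vec::new();
    let mut i = 0;
    while i + 3 <= stream.len() {
        if stream[i..i + 3] == [0, 0, 1] {
            offsets.push(i);
            i += 3; // skip past the start code itself
        } else {
            i += 1;
        }
    }
    offsets
}

fn main() {
    // A 3-byte start code, then a 4-byte one (extra leading zero).
    let stream = [0, 0, 1, 0x65, 0, 0, 0, 1, 0x41];
    println!("{:?}", find_start_codes(&stream)); // [0, 5]
}
```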

@QuantumEntangledAndy
Collaborator Author

I've got the basics for putting it into the enum you mentioned. I still have to refactor more if we want to properly handle that overflow issue you mentioned.

@twistedddx
Contributor

Something that may be worth knowing before it costs you hours of time:
The I and P frames' listed data size is up to 16 bytes short of the actual data.
E.g. the header says 234567 bytes, but then I found 234574 bytes of actual data following.
If you stop writing at 234567 bytes you can get video distortion at the bottom of affected frames.
I observe this in the temp file the official client writes to disk when exporting.

Now that I see these chunked packets during transport, maybe the difference is that the official client incorrectly picks up some bytes each time the data is chunked? Maybe there is a continuation byte marker that the client doesn't remove from the video frame.

@QuantumEntangledAndy
Collaborator Author

Thank you for the info, I will keep that in mind when I get around to keeping track of the sizes. The current implementation is lazy and doesn't worry about it. If there is a continuation marker, maybe we are getting small artifacts... Hmmm

@QuantumEntangledAndy
Collaborator Author

I'm wondering if filtering all this out will help improve the UDP connections.

@QuantumEntangledAndy
Collaborator Author

QuantumEntangledAndy commented Jul 30, 2020

@thirtythreeforty could you help level up my Rust here:

My current implementation of parsing more structure from the video data looks like this

#[derive(Debug, PartialEq, Eq)]
pub struct VideoIFrame {
    pub magic: Vec<u8>,
    pub video_type: Vec<u8>,
    pub data_size: Vec<u8>,
    pub unknowna: Vec<u8>,
    pub timestamp: Vec<u8>,
    pub unknownb: Vec<u8>,
    pub clocktime: Vec<u8>,
    pub unknownc: Vec<u8>,
    pub video_data: Vec<u8>,
}

impl VideoIFrame {
    pub fn from_binary(binary: &[u8]) -> VideoIFrame {
        VideoIFrame {
            magic: binary[0..4].to_vec(),
            video_type: binary[4..8].to_vec(),
            data_size: binary[8..12].to_vec(),
            unknowna: binary[12..16].to_vec(),
            timestamp: binary[16..20].to_vec(),
            unknownb: binary[20..24].to_vec(),
            clocktime: binary[24..28].to_vec(),
            unknownc: binary[28..32].to_vec(),
            video_data: binary[32..].to_vec(),
        }
    }

    pub fn len(&self) -> usize {
        self.video_data.len() + 32
    }

    pub fn as_slice(&self) -> &[u8] {
        let mut result = self.magic.clone();
        result.extend(&self.video_type);
        result.extend(&self.data_size);
        result.extend(&self.unknowna);
        result.extend(&self.timestamp);
        result.extend(&self.unknownb);
        result.extend(&self.clocktime);
        result.extend(&self.unknownc);
        result.extend(&self.video_data);
        result.as_slice()
    }
}

#[derive(Debug, PartialEq, Eq)]
pub struct VideoPFrame {
    pub magic: Vec<u8>,
    pub video_type: Vec<u8>,
    pub data_size: Vec<u8>,
    pub unknowna: Vec<u8>,
    pub timestamp: Vec<u8>,
    pub unknownb: Vec<u8>,
    pub video_data: Vec<u8>,
}

impl VideoPFrame {
    pub fn from_binary(binary: &[u8]) -> VideoPFrame {
        VideoPFrame {
            magic: binary[0..4].to_vec(),
            video_type: binary[4..8].to_vec(),
            data_size: binary[8..12].to_vec(),
            unknowna: binary[12..16].to_vec(),
            timestamp: binary[16..20].to_vec(),
            unknownb: binary[20..24].to_vec(),
            video_data: binary[24..].to_vec(),
        }
    }

    pub fn len(&self) -> usize {
        self.video_data.len() + 24
    }

    pub fn as_slice(&self) -> &[u8] {
        let mut result = self.magic.clone();
        result.extend(&self.video_type);
        result.extend(&self.data_size);
        result.extend(&self.unknowna);
        result.extend(&self.timestamp);
        result.extend(&self.unknownb);
        result.extend(&self.video_data);
        result.as_slice()
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum VideoFrame {
    IFrame(VideoIFrame),
    PFrame(VideoPFrame),
}

impl VideoFrame {
    pub fn len(&self) -> usize {
        match self {
            VideoFrame::IFrame(binary) => binary.len(),
            VideoFrame::PFrame(binary) => binary.len(),
        }
    }

    pub fn as_slice(&self) -> &[u8] {
        match self {
            VideoFrame::IFrame(binary) => binary.as_slice(),
            VideoFrame::PFrame(binary) => binary.as_slice(),
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum BinaryData {
    VideoData(VideoFrame),
    AudioData(Vec<u8>),
    InfoData(Vec<u8>),
    Unknown(Vec<u8>),
}

The trouble I am having is with as_slice (used as part of the serialisation and the passing to gstreamer). I think this function is meant to represent the data as a single continuous piece, but when the struct is constructed I break it up. In as_slice I try to reconstruct it as a single piece using extend, but the compiler complains that I am returning a borrow of a local variable:

returns a value referencing data owned by the current function

I am thinking that I could instead write it as something like this:

pub struct VideoPFrame {
    pub binary: Vec<u8>,
}

impl VideoPFrame {
    pub fn from_binary(binary: &[u8]) -> VideoPFrame {
        VideoPFrame {
            binary: binary.to_vec(),
        }
    }
    pub fn magic(&self) -> &[u8] {
        &self.binary[0..4]
    }
    pub fn video_type(&self) -> &[u8] {
        &self.binary[4..8]
    }
}

Instead, but it doesn't feel as elegant. I suppose on the plus side, though, we could make:

enum VideoType {
    H264,
    H265,
    Unknown,
}

pub fn video_type(&self) -> VideoType {
    const H264_MAGIC: &[u8] = &[0x48, 0x32, 0x36, 0x34]; // "H264" in ASCII
    const H265_MAGIC: &[u8] = &[0x48, 0x32, 0x36, 0x35]; // "H265" in ASCII
    match &self.binary[4..8] {
        H264_MAGIC => VideoType::H264,
        H265_MAGIC => VideoType::H265,
        _ => VideoType::Unknown,
    }
}

What do you think would be best?

@QuantumEntangledAndy
Collaborator Author

I'll push my work so far and you can see it in the wild but it will fail to compile

@QuantumEntangledAndy
Collaborator Author

So I decided to go with the getters method, where the struct stores the raw data and there are methods to extract the fields from it.

I am now having fun trying to get the correct size info; there seem to be some extra bytes somewhere when the frame is split.

@QuantumEntangledAndy
Collaborator Author

I am having a sinking feeling about the extra bytes. The h264 format should have a start code of 0, 0, 1 but BC seems to have 0, 0, 0, 1. I am wondering if it is null-terminating things... I really hope not, but testing now.

@QuantumEntangledAndy
Collaborator Author

Ok, on the plus side it seems unrelated to the number of start codes... on the negative side, why do the start codes have an extra 0?

@QuantumEntangledAndy
Collaborator Author

QuantumEntangledAndy commented Jul 30, 2020

So the number of extra bytes in the packet frame seems to be arbitrary. I have seen 0, 3, 4, 5 and 6 extra bytes without a pattern. The number of extra bytes does not scale linearly with the number of chunks, indicating it is not a continuation marker.

Ok this seems a bit dodgy but.....

The frames are split like this:

  • Full data 85000, chunked in 40000:
  • first packet: 5000
  • second packet: 40000
  • last packet: 40000

Example of weird count:

  • Expected bytes 54843
  • Bytes in first packet: 14848
  • Bytes in second packet: 40000
  • Total received bytes: 54848
  • Difference: 5

In other words, the first packet has the odd number of bytes.

I am wondering if their rounding logic for the count is incorrect somehow.

I could get it to work by:

  • Stopping when we get a negative number
  • Using expected_bytes - modulo(expected, 40000) + bytes_in_first_packet to get the correct byte count

I have been looking at the beginning and end of each packet, with and without a header, and there is no pattern that I can see. No numbers near the beginning or end of the packet correspond to anything like the packet size, and the hex values are not ASCII either.
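The correction described above can be sketched like this (names are illustrative; CHUNK_SIZE assumes the 40000-byte chunking seen so far):

```rust
// Sketch of the size workaround: the camera's advertised size appears to
// mis-count the odd-sized first chunk, so rebuild the total from the
// first chunk's actual length. Names here are hypothetical.
const CHUNK_SIZE: usize = 40_000;

fn corrected_total(expected_bytes: usize, bytes_in_first_packet: usize) -> usize {
    expected_bytes - expected_bytes % CHUNK_SIZE + bytes_in_first_packet
}

fn main() {
    // The example from above: expected 54843, first packet actually 14848.
    println!("{}", corrected_total(54_843, 14_848)); // 54848
}
```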

@QuantumEntangledAndy
Collaborator Author

QuantumEntangledAndy commented Jul 30, 2020

So I am having some success with the altered expected bytes, expected_bytes - modulo(expected, 40000) + bytes_in_first_packet. However, every now and then my trace messages seem to come out of order, and then we get a panic because there was an audio packet mid chunked data. Are the messages from the camera coming from some async source, or did the camera perhaps drop some packets...?

@QuantumEntangledAndy
Collaborator Author

QuantumEntangledAndy commented Jul 30, 2020

So at this point I'm feeling that BC will either:

  • drop the rest of the packets if it detects any of the other magic bytes, even if the current one isn't finished, or

  • assume only video is chunked, and process audio while waiting for the next non-magic packet to complete the current one

@QuantumEntangledAndy
Collaborator Author

So I have observed the start of another video frame even before the rest of the chunks were processed. It seems to have dropped the packet for some reason... Sigh.

@twistedddx
Contributor

twistedddx commented Jul 30, 2020

More things I have observed in files written to disk.
The files are coming from an NVR with 8 cameras connected. I have had SWN-NHD805/806/818, RLC-410-4MP/5MP and RLC-520 cameras connected. Obviously Neolink is more interested in data coming direct from cameras rather than from an NVR.
The Swann 805 and 806 cameras are no longer connected but are available for testing.
Swann cameras use ADPCM audio, Reolink use AAC. This appears to just be a change in firmware that Swann never released for their products.

Several frames from another camera can appear mid stream.
E.g. Camera 5 data is exported and Camera 2 data can appear for a few frames mid file. I saw this maybe 3 times within 100 test files. I did have old and "unsupported" cameras connected, which may have caused additional problems.
Different camera types leave different fingerprints in the h264 data because of different encode settings, e.g. width, height, bitrate, profile etc.
For I frames byte 22 was significant and for P frames byte 5 was significant. If that byte changed, the data had come from a different model of camera and I drop the data.

P or I frames may end prematurely. This sounds like your description of BC stopping processing an I or P frame if it is interrupted by the start of another frame.

Audio is not always a single frame at a time, but it very often is.

BC pads the frames with zeros to byte-align to the next 8-byte boundary. But rarely a frame does not start on an 8-byte boundary.

Basically there are no rules, and data corruption is not uncommon.
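The 8-byte zero padding can be checked with a quick round-up helper. Interestingly, padding the earlier 54843-byte example up to an 8-byte boundary gives exactly the 54848 bytes observed, so this may be where those extra bytes come from (speculation, not confirmed):

```rust
// Round a length up to the next 8-byte boundary, matching the zero
// padding observed in BC recordings.
fn padded_len(len: usize) -> usize {
    (len + 7) & !7
}

fn main() {
    println!("{}", padded_len(54_843)); // 54848
    println!("{}", padded_len(40_000)); // 40000: already aligned
}
```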

@thirtythreeforty
Owner

impl VideoIFrame {
    pub fn as_slice(&self) -> &[u8] {
        let mut result = self.magic.clone();
        result.extend(&self.video_type);
        result.extend(&self.data_size);
        result.extend(&self.unknowna);
        result.extend(&self.timestamp);
        result.extend(&self.unknownb);
        result.extend(&self.clocktime);
        result.extend(&self.unknownc);
        result.extend(&self.video_data);
        result.as_slice()
    }
}

The trouble I am having is with as_slice (used as part of the serialisation and the passing to gstreamer). I think this function is meant to represent the data as a single continuous piece, but when the struct is constructed I break it up. In as_slice I try to reconstruct it as a single piece using extend, but the compiler complains that I am returning a borrow of a local variable:

Right - this is basically stopping you from returning a dangling pointer. To do this you would need to return a Vec<u8>, the "owned" version of a byte slice.

I am thinking that I could instead write it as something like this:

pub struct VideoPFrame {
    pub binary: Vec<u8>,
}

impl VideoPFrame {
    pub fn from_binary(binary: &[u8]) -> VideoPFrame {
        VideoPFrame {
            binary: binary.to_vec(),
        }
    }
    pub fn magic(&self) -> &[u8] {
        &self.binary[0..4]
    }
    pub fn video_type(&self) -> &[u8] {
        &self.binary[4..8]
    }
}

Instead, but it doesn't feel as elegant. What do you think would be best?

I would say the latter is more elegant. You're returning a reference with a lifetime based on the original argument's lifetime (a reference to borrowed data). Without lifetime elision, this looks like:

pub fn magic<'a>(&'a self) -> &'a [u8] {
    &self.binary[0..4]
}

-- here 'a is the lifetime of self.

All this said, however, it looks like you're trending further toward writing a real parser, for which I'm currently using Nom. Although I'm not super happy with it, it does give you a couple niceties, like being able to operate directly on a stream. The deal with Nom is that you return functions ("combinators") that do the actual parsing. This lets the Nom library control parsing execution.

Don't worry too much about Nom for now, just wanted to put the bug in your ear. I'd say let's figure out the BC data stream format first, then we can worry about the implementation. I'll put more comments about the format in a second comment since this one is getting long...

@QuantumEntangledAndy
Collaborator Author

I think I've got all your changes in; please let me know what you think.

@QuantumEntangledAndy
Collaborator Author

Also can we remove

#![allow(dead_code)]
#![allow(unused_variables)]

from lib.rs?

Then the linter will pick up unused functions and vars.

@QuantumEntangledAndy
Collaborator Author

Raw data may be useful, especially if we have it from a few different cameras, but for now trace level should tell us when it finds a packet it doesn't recognise.

@QuantumEntangledAndy
Collaborator Author

I'd like to remove the full header check function too. It is there in case of the very small chance that 4 bytes are not enough to pick out the header in the video data, as suggested by @hh1209. However, I've never found it necessary, and my back-of-the-envelope calculation suggests that the odds of it happening randomly are 1 in 4,228,250,625 bytes, so once every 4GB. The advance header check is only really needed during the recovery phase, when we are advancing one byte at a time because we found an invalid packet. 99% of the time (as in, all the time in my hour-long test except during a camera disconnection caused by my WiFi) we are not in the recovery phase, so the chance of getting a magic packet out of place is even more unlikely than 1 in 4GB.

Bottom line: can we remove the advance header check and clean up the code a bit more?

@twistedddx
Contributor

I'd like to remove the full header check function too. It is there in case of the very small chance that 4 bytes are not enough to pick out the header in the video data, as suggested by @hh1209. However, I've never found it necessary, and my back-of-the-envelope calculation suggests that the odds of it happening randomly are 1 in 4,228,250,625 bytes, so once every 4GB. The advance header check is only really needed during the recovery phase, when we are advancing one byte at a time because we found an invalid packet. 99% of the time (as in, all the time in my hour-long test except during a camera disconnection caused by my WiFi) we are not in the recovery phase, so the chance of getting a magic packet out of place is even more unlikely than 1 in 4GB.

Bottom line: can we remove the advance header check and clean up the code a bit more?

That would have been me. The suggestion was born out of my approach, and my code is absolutely terrible. I am a data cabler by trade and have written some code in my spare time.
My code was 100% based on brute-force searching for magic bytes.
Although in my defense, my code didn't get the luxury of the BC network protocol keeping the frames mostly split up in sane ways.

I agree that because your code is not dependent on brute-force searching every byte for magic bytes, this increased resilience to false positives may be over the top.

@thirtythreeforty
Owner

I'd like to remove the full header check function too. It is there in case of the very small chance that 4 bytes are not enough to pick out the header in the video data, as suggested by @hh1209. However, I've never found it necessary, and my back-of-the-envelope calculation suggests that the odds of it happening randomly are 1 in 4,228,250,625 bytes, so once every 4GB. The advance header check is only really needed during the recovery phase, when we are advancing one byte at a time because we found an invalid packet. 99% of the time (as in, all the time in my hour-long test except during a camera disconnection caused by my WiFi) we are not in the recovery phase, so the chance of getting a magic packet out of place is even more unlikely than 1 in 4GB.

Bottom line: can we remove the advance header check and clean up the code a bit more?

Yes, that sounds okay to me. The packet stream seems reasonably well-formed, and your code copes with the cases where it's not.

@thirtythreeforty
Owner

thirtythreeforty commented Aug 12, 2020

Thank you for the changes removing the continuation kind. This looks great!

I would like to rescind my previous feedback about the grey frames and errors. I had managed to get my local Git client into a weird state where I thought I was up to date but I was actually running old code. The tip of this branch is butter smooth for me in ffplay and VLC*. So yes, make any final changes you would like and I am good to merge this!

* VLC in TCP mode, that is. Its buffer size is still too small for UDP.

@QuantumEntangledAndy
Collaborator Author

QuantumEntangledAndy commented Aug 12, 2020

Ah yes, sorry, that's why I shouldn't tag from memory. Anyway, while I agree this is a good idea for brute-forcing every byte, we don't use that as a method and mostly use the size field given in the header.

Also, a revised estimate on the odds, as I forgot to take into account the multiple types of magic header:

| Magic | Byte 1 | Byte 2 | Byte 3 | Byte 4 |
|-------|--------|--------|--------|--------|
| InfoA | 31 | 30 | 30 | 31 |
| InfoB | 31 | 30 | 30 | 32 |
| AAC   | 30 | 35 | 77 | 62 |
| ADP   | 30 | 30 | 77 | 62 |
| Vid-I | 30 | 30 | 64 | 63 |
| Vid-P | 30 | 31 | 63 | 63 |
| Odds  | 2 in 255 | 3 in 255 | 4 in 255 | 4 in 255 |

Total chance: 2/255 * 3/255 * 4/255 * 4/255 ≈ 2.270E-08

Total: 1 in 44,044,277, or about 1 in 44MB
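As a sanity check, the odds can be recomputed from the distinct-value counts per byte position in the table (2, 3, 4 and 4 out of 255); using 1 distinct value for byte 1 instead reproduces the 1 in 88,088,554 figure:

```rust
// Recompute the false-positive odds: the chance that 4 random bytes hit
// any of the magic headers, given how many distinct values each byte
// position can take (counts read off the table above).
fn one_in(counts: [u64; 4]) -> u64 {
    let hits: u64 = counts.iter().product();
    255u64.pow(4) / hits // integer division is close enough here
}

fn main() {
    println!("1 in {}", one_in([2, 3, 4, 4])); // counts from the table
    println!("1 in {}", one_in([1, 3, 4, 4])); // variant behind the 88M figure
}
```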

@QuantumEntangledAndy
Collaborator Author

Ok, so that's it then, all updated! We also removed the Invalid Media kind, since the only thing that really checked it was this brute forcing.

@thirtythreeforty
Owner

Now that we've realized it's an inner packet stream, have you actually seen any occurrences of truncated packets? We can take that probability to zero if it's parsing out well-formed packets :)

Also, if there's a very occasional truncated packet, the scan-for-magic-bytes behavior will smoothly pick the media packet stream back up after the corruption passes by.

@QuantumEntangledAndy
Collaborator Author

I also need a larger buffer, even for my E1; maybe that needs a note in the readme.

@QuantumEntangledAndy
Collaborator Author

I haven't actually seen a truncated packet, even during my hour-long test.

@QuantumEntangledAndy
Collaborator Author

Well actually I did see it but only during a reconnect

@thirtythreeforty
Owner

One last change - can you add a warn!() to yell if we actually hit a truncated packet? If I never see any going forward, I am likely to remove that so there is no chance of accidentally truncating a packet that happens to start with magic bytes.

@thirtythreeforty
Owner

thirtythreeforty commented Aug 12, 2020

During a reconnect, truncated packets would probably be okay. In these kinds of edge cases the Unknown code would kick in. Or the media packet code could simply swallow the garbage until it hits a header. Lots of mitigation behaviors are okay.

@QuantumEntangledAndy
Collaborator Author

We already have mitigation set up where we just drop the buffer coming from the camera until we hit magic again, so I think that's enough for such cases

@thirtythreeforty
Owner

Sweet. This has been quite the journey from "grep the extra junk out of the stream" to "rats there's another packet format" to "make a whole new parsing layer."

Thank you very much @twistedddx and @QuantumEntangledAndy

@thirtythreeforty thirtythreeforty merged commit 9f3181e into thirtythreeforty:master Aug 12, 2020
@QuantumEntangledAndy
Collaborator Author

Yeah it's been quite a journey. But I just want to say "Yay" merged!

@thirtythreeforty
Owner

I think audio will be a piece of cake now. If you have a branch you want to post, you can, or I can start on it

@QuantumEntangledAndy
Collaborator Author

Yes audio and auto detect are done. I am just rebasing onto master

@QuantumEntangledAndy deleted the magic_headers branch September 17, 2020 02:54
entropin pushed a commit to entropin/neolink that referenced this pull request Aug 18, 2021
…c_headers

Filter data sent to gsteamer using magic bytes
Successfully merging this pull request may close these issues.

Invalid NAL units reported by ffmpeg