
Demuxer/DecodeSurfaceFromPacket coming back at some point? #55

Open
rcode6 opened this issue Jul 13, 2024 · 11 comments

Comments

@rcode6

rcode6 commented Jul 13, 2024

Thank you so much for keeping this project going as VALI!

I'm currently using VPF in my project and was working on swapping over to VALI, but it looks like you've removed demuxing into packets & decoding from packets. Is that something you'd consider adding back in at some point? It's currently a large part of my processing pipeline.

@RomanArzumanyan
Owner

RomanArzumanyan commented Jul 13, 2024

Hi @rcode6

For now VALI doesn’t support demuxing.

I understand its usefulness for advanced users, but unfortunately it leaves too many ways to, e.g., break the decoder's internal state by seeking to a particular packet, or to overflow vRAM by sending packets with multiple frames each while receiving frames one by one.

I’m not against the whole demuxing idea; I just don’t have a clear understanding of how to do it right. BTW, can it be done with something like PyAV? Demuxing isn’t computationally expensive and it doesn’t have HW support anyway, so any other alternative to VALI may be just as good.

@rcode6
Author

rcode6 commented Jul 13, 2024

Hi @RomanArzumanyan,

You're right that demuxing can be done with other libraries, but my goal is to keep the latter parts of the pipeline entirely on the GPU: decoding, pixel format conversions, and resizing. That's a bit harder to manage. PyAV does demux into packets, but decoding always ends up on the CPU. And NVIDIA's spinoff of VPF, PyNvVideoCodec, doesn't do surface conversions or resizing.

I completely respect your decision to keep it simpler for users though! Would love it if you'd consider otherwise, since it's already happening behind the scenes.

I think, from reading PyNvVideoCodec's very sparse documentation, that it does support DLPack, so maybe I can use PyNvVideoCodec for demuxing and decoding, then use from_dlpack to jump into VALI for the rest of the work.

@RomanArzumanyan
Owner

Hi @rcode6

Could you share your demux / decode use case? As far as I understand, the only difference between the builtin and standalone modes in PyNvCodec was the ability to extract the elementary stream and seek through the packets.

@rcode6
Author

rcode6 commented Jul 14, 2024

Hi @RomanArzumanyan,

My project processes multiple live camera stream inputs for a security system, so demuxing and timestamping the feeds quickly, with minimal latency, is important; packets are then placed into a queue for slower decoding/processing later.

The processing thread pulls packets off the queue, decodes them into frames, and does pixel format conversions and resizing before passing them on for further processing work. All of that work is also done entirely on the GPU, without ever downloading to the host. There's a combination of selective image processing, motion detection, and object detection going on there, followed by possibly recording back to disk. This queue will fall behind for periods of time and catch back up later.

The primary gain I get from the demuxing process is that I can keep the unprocessed video as packets on the host, which are basically the compressed video feed. If I were to decode earlier instead of just demuxing, the unprocessed video would end up being stored in the queue as decompressed surfaces in vRAM.

Another place I get the same kind of space savings: if I want to keep the last 10 seconds of video footage in memory before a recording event, I can also keep them as compressed packets on the host, instead of decompressed on the GPU. For instance, with a 30 fps feed, that's 300 frames I can keep compressed to decode later, versus using up vRAM for 300 uncompressed surfaces at whatever resolution the feeds were in. So far, I've found the extra load of periodically decoding packets twice to be minimal, while the vRAM savings have been huge.

In a nutshell, my demuxing use case just lets me save a lot of vRAM while avoiding excessive downloads/uploads between device and host.
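The pre-record buffer described above can be sketched in plain Python. This is a hypothetical illustration of the idea, not VALI or PyAV code: `CompressedPacket`, `MAX_PACKETS`, and `on_packet` are names I made up, and the packet bytes stand in for whatever a real demuxer produces.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class CompressedPacket:
    pts: int     # presentation timestamp from the demuxer
    data: bytes  # compressed bitstream for one frame

# Ring buffer holding roughly the last 10 s of a 30 fps feed
# as compressed packets on the host (300 entries); old packets
# fall off the back automatically.
MAX_PACKETS = 300
pre_record = deque(maxlen=MAX_PACKETS)

def on_packet(pts: int, data: bytes) -> None:
    """Called by the demux thread for every packet it produces."""
    pre_record.append(CompressedPacket(pts, data))

# Simulate 400 incoming packets; only the newest 300 survive,
# so host memory stays bounded at 300 compressed frames.
for i in range(400):
    on_packet(pts=i, data=b"\x00" * 16)
```

On a recording event, the buffered packets would be drained and fed to a decoder, trading a second decode pass for not holding 300 uncompressed surfaces in vRAM.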

@RomanArzumanyan
Owner

RomanArzumanyan commented Jul 14, 2024

Thanks for the reply @rcode6

As far as I understand, one missing thing is the ability of PyDecoder to take some sort of file handle (or AVPacket directly) as input.

The rest can be done with e.g. PyAV, which will write demuxed packets to some queue. Am I missing something?

@rcode6
Author

rcode6 commented Jul 14, 2024

Hi @RomanArzumanyan,

Yes, if there was a way for PyDecoder to take in AVPacket directly, that would work. Basically what DecodeSingleSurfaceFromPacket used to do.

I think the problem is what an AVPacket object in Python would look like for PyDecoder. From my understanding, AVPacket isn't just the raw byte data for a single packet; it has also been parsed (pts, dts, flags, etc.). Demuxing with PyAV generates av.packet.Packet objects. The raw bytes look to be exposed along with the individual properties, but then dts, pts, etc. would need to be manually copied over (which sounds prone to a lot of user error, or to possible struct changes), unless PyDecoder demuxes the raw bytes again (which negates the purpose of using another library).

I can't seem to think of an efficient way other than the same library being used for the demux and decode steps, with the AVPacket objects being exposed in between for the user to hold onto. Otherwise, using from_dlpack seems like the best option at the moment as a handover step if a separate library needs to be used.
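To make the manual-copying concern concrete, here is a sketch of what that hand-off might look like. The `PacketShim` layout and `FakePacket` stand-in are my own inventions so the sketch runs without PyAV installed; as I understand it, a real PyAV `av.packet.Packet` exposes `pts`, `dts`, and `is_keyframe` as attributes and its raw bitstream via `bytes(packet)`.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PacketShim:
    """Hand-copied view of a demuxed packet (hypothetical layout)."""
    data: bytes
    pts: Optional[int]
    dts: Optional[int]
    is_keyframe: bool

def shim_from_av_packet(pkt) -> PacketShim:
    # This is the manual field-by-field copying step discussed above;
    # every field here is a place where user error or an upstream
    # struct change could silently corrupt the decode.
    return PacketShim(
        data=bytes(pkt),
        pts=pkt.pts,
        dts=pkt.dts,
        is_keyframe=pkt.is_keyframe,
    )

# Stand-in object mimicking the attributes of a demuxed packet.
class FakePacket:
    pts, dts, is_keyframe = 42, 40, True
    def __bytes__(self):
        return b"\x00\x00\x00\x01"

shim = shim_from_av_packet(FakePacket())
```

Any decoder consuming `PacketShim` would have to trust that these copies stayed in sync with the demuxer, which is exactly the fragility described above.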

@Yves33
Contributor

Yves33 commented Sep 25, 2024

Upvoting for PyAV compatibility. It would be fantastic if we could directly decode AVPackets (and retrieve them from the encoder).
At some point in VPF, what was lacking was bitstream filters in PyAV, but it seems those have been incorporated now.

@rcode6
Author

rcode6 commented Jan 25, 2025

Hi @RomanArzumanyan,

Would you consider having a way to decode using python_vali.PacketData as inputs? Either passing them to DecodeSingleSurface or even to PyDecoder prior to a regular decode call?

I realized that I don't need access to the demuxer since the decode functions do provide PacketData, so if there was just a way to feed those back in to a separate decoder instance that's all I would need.

Also, according to FFmpeg, an AVPacket should only ever contain one compressed frame for video streams (https://ffmpeg.org/doxygen/3.2/structAVPacket.html):

For video, it should typically contain one compressed frame. For audio it may contain several compressed frames. Encoders are allowed to output empty packets, with no compressed data, containing only side data (e.g. to update some stream parameters at the end of encoding).

@RomanArzumanyan
Owner

RomanArzumanyan commented Jan 26, 2025

Hi @rcode6

Honestly I’d like to avoid that at all costs.

PyDecoder is already the single biggest and most complex class, accounting for ~70% of all tests in the codebase (60-something out of 90-something).

I understand the importance of a stand-alone demuxer; however, I’d rather not expose the private FFmpeg-specific AVPacket API as the demuxer API.

E.g., how am I going to test whether the demuxer produces correct output? Is it going to be Annex B for H.264 / H.265 / AV1, or AVCC for H.264? What about VP8 / VP9?

How am I going to explain what’s wrong with someone’s demuxer output? Will I need a bitstream analyzer for that? Can I dump an AVPacket to disk and open it with VLC?

So instead of all that, I’ve added a constructor which accepts any Python object that has a read attribute returning a byte array.

You can demux to a pipe with ffmpeg / PyAV / any other tool and then give the pipe (file handle, or your own adapter class - you name it) to PyDecoder as input.

What’s the difference, you may ask? The answer is quite simple: I don’t have to cripple the API. You call DecodeSingleSurface - you get a surface. No “return True in case of success or None if the decoder needs more data” and such.
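An adapter like the one mentioned above might look as follows. This is a hypothetical sketch: `QueueByteReader` is my own name, and the exact interface PyDecoder expects should be checked against the VALI samples; the only contract assumed here is "an object with a `read` method returning bytes".

```python
import queue

class QueueByteReader:
    """File-like adapter exposing read() over a queue of byte chunks,
    e.g. filled by a separate demux thread."""

    def __init__(self) -> None:
        self._chunks: "queue.Queue[bytes]" = queue.Queue()
        self._buf = b""
        self._eof = False

    def push(self, chunk: bytes) -> None:
        """Called by the demux side for every chunk of bitstream."""
        self._chunks.put(chunk)

    def close(self) -> None:
        self._chunks.put(b"")  # empty chunk signals end of stream

    def read(self, size: int = -1) -> bytes:
        # Block until enough bytes are buffered (or the stream ends),
        # then hand out exactly what was asked for.
        while not self._eof and (size < 0 or len(self._buf) < size):
            chunk = self._chunks.get()
            if not chunk:
                self._eof = True
                break
            self._buf += chunk
        if size < 0:
            out, self._buf = self._buf, b""
        else:
            out, self._buf = self._buf[:size], self._buf[size:]
        return out

# Usage sketch: the demux side pushes elementary-stream chunks.
reader = QueueByteReader()
reader.push(b"\x00\x00\x00\x01" + b"A" * 8)
reader.close()
# dec = python_vali.PyDecoder(reader, {}, gpu_id=0)  # hypothetical call;
# see the VALI samples for the actual constructor signature.
```

The same object works for a pipe or open file descriptor, since those already provide `read`.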

@rcode6
Author

rcode6 commented Jan 26, 2025

Hi @RomanArzumanyan,

I completely understand. And thanks for pointing me to that other PyDecoder constructor, I think you're right and that could do the trick here.

Would you consider exposing the encoded bitstream data in PacketData?

@RomanArzumanyan
Owner

RomanArzumanyan commented Jan 26, 2025

Hi @rcode6

I think I should just rewrite the whole PyNvEncoder class in a similar way to what was done with PyDecoder, with the ability to write to a pipe or an open file descriptor, and proper muxing support.

The initial “bare Annex B output” design was fine back then, but now it’s becoming a headache.

WRT PyDecoder’s ability to read from a pipe or adapter - you can take a closer look at this feature in the VALI samples or in my smallish hobby project called Potion.

https://github.com/RomanArzumanyan/Potion
