CUDA_ERROR_UNKNOWN: unknown error #122

Open
vincentvic opened this issue Dec 19, 2024 · 26 comments

Comments

@vincentvic

vincentvic commented Dec 19, 2024

Hi!

When I run this piece of code:

pyDec = vali.PyDecoder(
    url,
    CONFIG_FFMPEG,
    gpu_id=0)
pkt_data = vali.PacketData()
frame_idx = 0
while True:
    #NV12 surface
    success, details = pyDec.DecodeSingleSurface(surf_nv12, pkt_data)

I get the following error message. What does it mean?

[h264_cuvid @ 0x558a8aac76c0] ctx->cvdl->cuvidParseVideoData(ctx->cuparser, &cupkt) failed -> CUDA_ERROR_UNKNOWN: unknown error
Error while sending a packet to the decoder. Error description: Generic error in an external library

@RomanArzumanyan
Owner

Hi @vincentvic

That’s FFmpeg struggling to communicate with the GPU driver.

Please check that the driver is available by running nvidia-smi. If you are using Docker, make sure that you run the image with video driver capabilities.
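
The driver check suggested above can also be scripted. The following is a minimal sketch, not part of VALI: the helper name `driver_available` is hypothetical, and it only assumes that `nvidia-smi` exits with status 0 when the driver is reachable. For Docker, the NVIDIA Container Toolkit usually needs `NVIDIA_DRIVER_CAPABILITIES` to include `video` (or `all`) for decoding to work; that setting is not verified here.

```python
import shutil
import subprocess


def driver_available(tool: str = "nvidia-smi", timeout: float = 10.0) -> bool:
    """Return True if `tool` is on PATH and exits with status 0."""
    path = shutil.which(tool)
    if path is None:
        # Tool is not installed or not visible (e.g. inside a container).
        return False
    try:
        result = subprocess.run(
            [path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("GPU driver reachable:", driver_available())
```

A `False` result points at the environment rather than at the video file.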

@vincentvic
Author

vincentvic commented Dec 19, 2024

No, I do not think it comes from the driver, because the exact same script works with a different video.
It also works if I re-encode the video with this command:
ffmpeg -i <input.mp4> -c:v h264_nvenc -preset slow -crf 22 'output.mp4'

Thanks

@RomanArzumanyan
Owner

Thanks for the update @vincentvic

If the error is specific to a particular video, I assume it’s not fully conformant with the H.264 standard or there’s a bug somewhere within ffmpeg / Video Codec SDK.

As a workaround, I recommend catching exceptions from the decoder and re-creating it in SW mode. For that you simply need to use gpu_id=-1. The SW decoder is often more resilient to problematic videos.
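
The workaround above can be sketched as a generic retry wrapper. This is only an illustration under stated assumptions: `decode_with_fallback` is a hypothetical helper, and `decode_file` stands for any user-supplied callable that creates a decoder for the given gpu_id, decodes the whole file, and raises on failure. Unlike a seek-based recovery, this version simply restarts from the beginning in SW mode.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def decode_with_fallback(decode_file: Callable[[int], T], gpu_id: int = 0) -> Tuple[T, int]:
    """Run decode_file(gpu_id); if it raises, retry once with gpu_id=-1 (SW)."""
    try:
        return decode_file(gpu_id), gpu_id
    except Exception as err:
        if gpu_id == -1:
            # Already in SW mode, nothing left to fall back to.
            raise
        print(f"HW decoding failed ({err}); retrying with the SW decoder")
        return decode_file(-1), -1


if __name__ == "__main__":
    # Stand-in for a real decoding function: fails on any GPU, works in SW.
    def fake_decode(gid: int) -> int:
        if gid >= 0:
            raise RuntimeError("CUDA_ERROR_UNKNOWN")
        return 100  # pretend 100 frames were decoded

    print(decode_with_fallback(fake_decode, gpu_id=0))
```

With VALI, `decode_file` would wrap the `vali.PyDecoder(url, {}, gpu_id=gid)` construction together with the decode loop.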

@vincentvic
Author

Thanks, that gives me a clearer picture: pyDec.Format is now YUV420 rather than NV12.

@RomanArzumanyan
Owner

@vincentvic

The native Nvdec format is NV12; the SW decoder outputs YUV420.
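
To make the difference between the two formats concrete: both store 4:2:0 data, but NV12 keeps U and V interleaved in one plane while planar YUV420 keeps them in two separate planes. The sketch below repacks a raw buffer in pure Python. It is not VALI code, the function name is hypothetical, and it assumes a tightly packed buffer (no row padding) with even dimensions.

```python
def nv12_to_yuv420(nv12: bytes, width: int, height: int) -> bytes:
    """Repack NV12 (Y plane + interleaved UV plane) as planar YUV420 (Y, U, V)."""
    if width % 2 or height % 2:
        raise ValueError("width and height must be even")
    y_size = width * height
    if len(nv12) != y_size * 3 // 2:
        raise ValueError("unexpected buffer size for a tightly packed NV12 frame")
    y_plane = nv12[:y_size]
    uv_plane = nv12[y_size:]
    # Even bytes of the interleaved plane are U samples, odd bytes are V samples.
    return y_plane + uv_plane[0::2] + uv_plane[1::2]


if __name__ == "__main__":
    # 4x2 frame: 8 luma bytes, then U0 V0 U1 V1.
    frame = bytes(range(8)) + bytes([10, 20, 11, 21])
    print(list(nv12_to_yuv420(frame, 4, 2)))
```

Both layouts have the same total size, which is why only the Format reported by the decoder reveals which one is in use.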

@vincentvic
Author

vincentvic commented Dec 19, 2024

Indeed. For now, I have not found the reason why it does not work for this video.

Do you have any idea which H.264 rules or conformance issues could lead to this error?

Thanks a lot

@RomanArzumanyan
Owner

@vincentvic

Do you have any idea which H.264 rules or conformance issues could lead to this error?

If you're interested in finding out what's possibly wrong with the video, I invite you to move this topic to discussions.

However, if you need to process a multitude of files in a production environment, that won't be very helpful, and the simplest approach would be to decode problematic videos with the SW decoder, e.g. like this:

# Please don't just copy-paste this code.
# It was never properly debugged and only serves as a sample.

import numpy as np
import python_vali as vali

url = "./input.mp4"  # placeholder path; point it at your own video

def decode_impl(py_dec, dec_frame, dec_surf, seek_once=-1):
    # Seek only when a frame number was requested.
    seek_ctx = None
    if seek_once != -1:
        seek_ctx = vali.SeekContext(seek_once)

    # HW decoder outputs Surfaces, SW decoder outputs numpy arrays.
    if py_dec.IsAccelerated:
        return py_dec.DecodeSingleSurface(dec_surf, seek_ctx)
    else:
        return py_dec.DecodeSingleFrame(dec_frame, seek_ctx)

def decode(py_dec, dec_frame, dec_surf, progress, seek_once=-1):
    # progress["frames"] is updated after every decoded frame, so the
    # count survives an exception raised in the middle of decoding.
    success = True

    if seek_once != -1:
        success, details = decode_impl(py_dec, dec_frame, dec_surf, seek_once)
        if success:
            progress["frames"] += 1

    while success:
        success, details = decode_impl(py_dec, dec_frame, dec_surf)
        if success:
            progress["frames"] += 1

    return progress["frames"]

py_dec = vali.PyDecoder(url, {}, gpu_id=0)
surf = vali.Surface.Make(py_dec.Format, py_dec.Width, py_dec.Height, gpu_id=0)
frame = np.ndarray(dtype=np.uint8, shape=(surf.HostSize))
progress = {"frames": 0}

try:
    # Try to decode file as normal
    decode(py_dec, frame, surf, progress)

except Exception as e:
    # Re-create decoder in SW mode, seek to last decoded frame, continue
    py_dec = vali.PyDecoder(url, {}, gpu_id=-1)
    decode(py_dec, frame, surf, progress, progress["frames"])

@vincentvic
Author

Hi!

I still get the same error message with this video:

[h264_cuvid @ 0x558a8aac76c0] ctx->cvdl->cuvidParseVideoData(ctx->cuparser, &cupkt) failed -> CUDA_ERROR_UNKNOWN: unknown error
Error while sending a packet to the decoder. Error description: Generic error in an external library

@vincentvic
Author

Is it possible that it comes from the opts parameters?

CONFIG_FFMPEG = {
'codec': 'h264',
'hwaccel_output_format': 'cuda',
'hwaccel': 'cuda',
'ignore_editlist': 'true',
'preset': 'hq',
}
pyDec = vali.PyDecoder(url, opts=CONFIG_FFMPEG, gpu_id=0)

@RomanArzumanyan
Owner

Hi @vincentvic

According to the log message, you’re still using the GPU decoder. Pass gpu_id=-1 to use the SW decoder instead.

@vincentvic
Author

vincentvic commented Dec 20, 2024

pyDec = vali.PyDecoder(url, opts=CONFIG_FFMPEG, gpu_id=-1)
print(pyDec.Format)
surf_yuv = vali.Surface.Make(format=vali.PixelFormat.YUV420, width=pyDec.Width, height=pyDec.Height, gpu_id=0)
pkt_data = vali.PacketData()
while True:
    success, details = pyDec.DecodeSingleSurface(surf_yuv, pkt_data)
    if not success:
        print(success)
        print(details)
    break

It does not give me any information. I do not really understand the purpose of setting gpu_id to -1. The output is:

PixelFormat.YUV420
False
TaskExecInfo.SUCCESS

@vincentvic
Author

Can we deactivate the cuvid decoder?

@RomanArzumanyan
Owner

RomanArzumanyan commented Dec 20, 2024

Is it possible that it comes from the opts parameters?

CONFIG_FFMPEG = {
'codec': 'h264',
'hwaccel_output_format': 'cuda',
'hwaccel': 'cuda',
'ignore_editlist': 'true',
'preset': 'hq',
}
pyDec = vali.PyDecoder(url, opts=CONFIG_FFMPEG, gpu_id=0)

@vincentvic

You don't need most of those options.
VALI will automatically choose the HW decoding options; you just need to pass gpu_id.
Take a look at the decoding sample: https://github.com/RomanArzumanyan/VALI/blob/main/samples/sample_decode_show.ipynb

Just pass the actual GPU id for HW decoding or -1 for SW decoding, that's it.

@vincentvic
Author

I tested two ffmpeg commands on the video. The first one works well, but the second one fails with the same error message.
ffmpeg -hwaccel cuda -c:v h264 -i input.mp4 output.mp4

ffmpeg -loglevel verbose -hwaccel cuda -c:v h264_cuvid -i input.mp4 output.mp4

@RomanArzumanyan
Owner

RomanArzumanyan commented Dec 20, 2024

@vincentvic

That's an interesting observation, but I'm not sure it's relevant to the PyDecoder behavior.

  • -hwaccel cuda means "keep decoded frames in vRAM".
  • output.mp4 without specifying the encoder means "guess the most relevant encoder". Most probably, ffmpeg will choose libx264 or whatever else it has available that is compatible with the MP4 container. So decoded frames will be kept in vRAM, then downloaded to RAM, then given to the encoder.

VALI works basically like this: ffmpeg -hwaccel cuda -c:v h264 -i input.mp4

  • If gpu_id is meaningful, VALI will automatically select the proper decoder accelerated by Nvdec and will keep decoded frames in vRAM as Surfaces.

Decoder selection happens here:

static const std::map<AVCodecID, std::string>
    hwaccel_codecs({std::make_pair(AV_CODEC_ID_AV1, "av1_cuvid"),
                    std::make_pair(AV_CODEC_ID_HEVC, "hevc_cuvid"),
                    std::make_pair(AV_CODEC_ID_H264, "h264_cuvid"),
                    std::make_pair(AV_CODEC_ID_MJPEG, "mjpeg_cuvid"),
                    std::make_pair(AV_CODEC_ID_MPEG1VIDEO, "mpeg1_cuvid"),
                    std::make_pair(AV_CODEC_ID_MPEG2VIDEO, "mpeg2_cuvid"),
                    std::make_pair(AV_CODEC_ID_MPEG4, "mpeg4_cuvid"),
                    std::make_pair(AV_CODEC_ID_VP8, "vp8_cuvid"),
                    std::make_pair(AV_CODEC_ID_VP9, "vp9_cuvid"),
                    std::make_pair(AV_CODEC_ID_VC1, "vc1_cuvid")});
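
For readers more at home in Python, the lookup in the C++ map above can be mirrored as a plain dictionary. This is only an illustration of the selection idea, not VALI's implementation: `pick_decoder` is a hypothetical function, the keys are the usual FFmpeg codec names, and what VALI does for a codec missing from the map is an assumption here (the sketch falls back to the generic decoder name).

```python
# Codec name -> cuvid decoder name, copied from the C++ map in this thread.
HWACCEL_CODECS = {
    "av1": "av1_cuvid",
    "hevc": "hevc_cuvid",
    "h264": "h264_cuvid",
    "mjpeg": "mjpeg_cuvid",
    "mpeg1video": "mpeg1_cuvid",
    "mpeg2video": "mpeg2_cuvid",
    "mpeg4": "mpeg4_cuvid",
    "vp8": "vp8_cuvid",
    "vp9": "vp9_cuvid",
    "vc1": "vc1_cuvid",
}


def pick_decoder(codec_name: str, gpu_id: int) -> str:
    """Return the decoder name to open for a stream, given the requested gpu_id."""
    if gpu_id >= 0:
        # Assumption: fall back to the generic decoder if no cuvid variant is listed.
        return HWACCEL_CODECS.get(codec_name, codec_name)
    # gpu_id == -1 requests the software decoder.
    return codec_name


if __name__ == "__main__":
    print(pick_decoder("h264", 0))
    print(pick_decoder("h264", -1))
```

This is why an H.264 file opened with gpu_id=0 ends up in h264_cuvid, the decoder named in the error log.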

@vincentvic
Author

vincentvic commented Dec 20, 2024

Yes, so it's logical that the error appears, because VALI will use the h264_cuvid decoder if I refer to your code snippet?
But it does not give the reason.

@RomanArzumanyan
Owner

@vincentvic

Yes, you see the same errors produced by both ffmpeg and VALI because VALI relies on the ffmpeg decoder.

@vincentvic
Author

Sorry, one last point: I noticed that all the problematic videos have metadata on the first frame with a pkt_dts equal to 1536 and a pkt_dts equal to 512. Is it a coincidence?

I use this command to get the information:

command = [ 'ffprobe', '-i', url, '-show_entries', 'frames', '-print_format', 'json', '-select_streams', 'v:0', '-read_intervals', '%+1' ]

@RomanArzumanyan
Owner

@vincentvic

DTS is the decoding timestamp. It's the moment, in stream time base units, when the packet is to be decoded.
PTS is the presentation timestamp. It's similar, but it describes the time when the decoded frame shall be presented to the user (shown in a video player etc.).

If there's frame reordering (e.g. B-frames are present), the PTS and DTS of the same packet may differ.
DTS shall increase monotonically and it doesn't have to start from zero.

So the values 512 and 1536 don't tell much by themselves.
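
The two properties described above (monotonic DTS, PTS differing from DTS under reordering) can be checked mechanically. The sketch below is an illustration with hypothetical helper names; it assumes JSON produced by something like `ffprobe -select_streams v:0 -show_packets -print_format json`, where each packet carries `pts` and `dts` fields, and the sample numbers are made up.

```python
import json
from typing import List, Tuple


def read_timestamps(ffprobe_json: str) -> List[Tuple[int, int]]:
    """Extract (pts, dts) pairs from ffprobe packet output in JSON form."""
    packets = json.loads(ffprobe_json).get("packets", [])
    return [(int(p["pts"]), int(p["dts"])) for p in packets if "pts" in p and "dts" in p]


def dts_is_monotonic(stamps: List[Tuple[int, int]]) -> bool:
    """True if DTS strictly increases from packet to packet."""
    dts = [d for _, d in stamps]
    return all(a < b for a, b in zip(dts, dts[1:]))


def has_reordering(stamps: List[Tuple[int, int]]) -> bool:
    """True if any packet has PTS different from DTS (typical with B-frames)."""
    return any(p != d for p, d in stamps)


if __name__ == "__main__":
    # Made-up sample: DTS starts at 0 and increases, PTS is reordered.
    sample = json.dumps({"packets": [
        {"pts": 1024, "dts": 0},
        {"pts": 2560, "dts": 512},
        {"pts": 1536, "dts": 1024},
        {"pts": 2048, "dts": 1536},
    ]})
    stamps = read_timestamps(sample)
    print(dts_is_monotonic(stamps), has_reordering(stamps))
```

A non-monotonic DTS would be worth reporting; a first DTS that is not zero, or a PTS that differs from it, is not suspicious on its own.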

@vincentvic
Author

vincentvic commented Jan 15, 2025

Hello,

First of all, Happy New Year!
I think I have probably found why some videos do not work, with this message:

[h264 @ 0x56293b426c80] decoder->cvdl->cuvidCreateDecoder(&decoder->decoder, params) failed -> CUDA_ERROR_INVALID_VALUE: invalid argument
[h264 @ 0x56293b426c80] Using more than 32 (33) decode surfaces might cause nvdec to fail.
[h264 @ 0x56293b426c80] Try lowering the amount of threads. Using 5 right now.
[h264 @ 0x56293b426c80] Failed setup for format cuda: hwaccel initialisation returned error.

Do you know what it means exactly, and whether we can fix it?
Thanks a lot

@RomanArzumanyan
Owner

RomanArzumanyan commented Jan 15, 2025

Hi @vincentvic

I think I have probably found why some videos do not work, with this message.

Did you get this exact message from the VALI error logs? If so, under what conditions?
I'm a bit surprised, let me explain below.

Do you know what it means exactly, and whether we can fix it?

VALI uses the cuvid decoder path within libavcodec, which is not the same as the nvdec path.
Some time ago I actually submitted a patch to ffmpeg that sets the minimal possible number of surfaces to be allocated for the decoder's internal pool:

https://github.com/FFmpeg/FFmpeg/blob/4f3c9f2f03378a08692a26532bc3146414717f8c/libavcodec/cuviddec.c#L320

    fifo_size_inc = ctx->nb_surfaces;
    ctx->nb_surfaces = FFMAX(ctx->nb_surfaces, format->min_num_decode_surfaces + 3);

    if (avctx->extra_hw_frames > 0)
        ctx->nb_surfaces += avctx->extra_hw_frames;

What happens here is that cuvid takes the minimal number of surfaces required to store the DPB and adds 3 extra surfaces to deal with async stuff (doing it the other way would harm performance).

To the best of my knowledge, high H.264 / H.265 levels and tiers require up to 16 decoded frames in the internal buffer, so the overall number shall not go higher than 19.
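
The arithmetic in the excerpt above can be replayed in a few lines. This is a direct Python transcription of the three C statements shown, with a hypothetical function name; the input values in the example are illustrative and are not FFmpeg defaults.

```python
def cuvid_surface_count(nb_surfaces: int, min_num_decode_surfaces: int,
                        extra_hw_frames: int = 0) -> int:
    """Mirror of: FFMAX(nb_surfaces, min_num_decode_surfaces + 3), plus extra frames."""
    count = max(nb_surfaces, min_num_decode_surfaces + 3)
    if extra_hw_frames > 0:
        count += extra_hw_frames
    return count


if __name__ == "__main__":
    # A stream that needs 16 frames in the DPB, with a small requested pool.
    print(cuvid_surface_count(nb_surfaces=5, min_num_decode_surfaces=16))
```

With 16 DPB frames the pool comes to 16 + 3 = 19 surfaces, which is the upper bound mentioned above and well below the 33 surfaces reported in the warning.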

@vincentvic
Author

Hi!

I get this specific message when I try to re-encode with this command:
ffmpeg -i <input.mp4> -c:v h264_nvenc -preset slow -crf 22 'output.mp4'

I re-encode because VALI crashes at the first frame with this error message:
[h264_cuvid @ 0x558a8aac76c0] ctx->cvdl->cuvidParseVideoData(ctx->cuparser, &cupkt) failed -> CUDA_ERROR_UNKNOWN: unknown error

Re-encoding the video works, but the ffmpeg message about the number of decode surfaces seems closely linked to the error in VALI.

@RomanArzumanyan
Owner

@vincentvic

I'm afraid the discussion has derailed a bit.
Can you provide me with a minimal reproducible example that illustrates the erroneous VALI behavior?
I'll try to repro it on my machine.

@vincentvic
Author

vincentvic commented Jan 17, 2025

Hi @RomanArzumanyan

This is the script that I'm trying to run. I don't know whether I can share a short cut of the video "input.mp4" with you?

import python_vali as vali

class StopExecution(Exception):
    def _render_traceback_(self):
        return []

# Raised when a frame cannot be decoded (used in the loop below).
class VideoError(Exception):
    pass

CONFIG_FFMPEG = {
    'codec': 'h264',
    'hwaccel_output_format': 'cuda',
    'hwaccel': 'cuda',
    'ignore_editlist': 'true',
    'preset': 'hq',
}
pyDec = vali.PyDecoder('./input.mp4', CONFIG_FFMPEG, gpu_id=0)
surf_nv12 = vali.Surface.Make(format=pyDec.Format, width=pyDec.Width, height=pyDec.Height, gpu_id=0)
surf_yuv = vali.Surface.Make(format=vali.PixelFormat.YUV420, width=pyDec.Width, height=pyDec.Height, gpu_id=0)
surf_rgb = vali.Surface.Make(format=vali.PixelFormat.RGB, width=pyDec.Width, height=pyDec.Height, gpu_id=0)
surf_pln = vali.Surface.Make(format=vali.PixelFormat.RGB_PLANAR, width=pyDec.Width, height=pyDec.Height, gpu_id=0)
to_yuv = vali.PySurfaceConverter(vali.PixelFormat.NV12, vali.PixelFormat.YUV420, gpu_id=0)
to_rgb = vali.PySurfaceConverter(vali.PixelFormat.YUV420, vali.PixelFormat.RGB, gpu_id=0)
to_pln = vali.PySurfaceConverter(vali.PixelFormat.RGB, vali.PixelFormat.RGB_PLANAR, gpu_id=0)
cc_ctx = vali.ColorspaceConversionContext(vali.ColorSpace.BT_601, vali.ColorRange.MPEG)

pkt_data = vali.PacketData()
frame_idx = 0
while True:
    # NV12 surface
    success, details = pyDec.DecodeSingleSurface(surf_nv12, pkt_data,)
    if not success:
        raise VideoError(f'At frame {frame_idx}: {details} => need to analyse/re-encode')

    # NV12 -> YUV420
    success, details = to_yuv.Run(surf_nv12, surf_yuv, cc_ctx)
    if not success:
        raise StopExecution
    # YUV420 -> RGB
    success, details = to_rgb.Run(surf_yuv, surf_rgb, cc_ctx)
    if not success:
        raise StopExecution
    # RGB -> RGB Planar
    success, details = to_pln.Run(surf_rgb, surf_pln, cc_ctx)
    if not success:
        raise StopExecution

    # Count the frame so the decode error message reports the right index.
    frame_idx += 1

This Python script returns this error:
[h264_cuvid @ 0x55a737be82c0] ctx->cvdl->cuvidParseVideoData(ctx->cuparser, &cupkt) failed -> CUDA_ERROR_UNKNOWN: unknown error
Error while sending a packet to the decoder. Error description: Generic error in an external library

@RomanArzumanyan
Owner

RomanArzumanyan commented Jan 17, 2025

Hi @vincentvic

To start with, your CONFIG_FFMPEG parameters are really unusual.
Let me explain:

CONFIG_FFMPEG = {
    # You don't need these 3 lines. They are ffmpeg-specific. VALI will do that under the hood for you.
    'codec': 'h264',
    'hwaccel_output_format': 'cuda',
    'hwaccel': 'cuda',
    # No comments on this, don't know the meaning.
    'ignore_editlist': 'true',
    # This is encoder preset. No need to pass it to decoder.
    'preset': 'hq',
}

Please clean them up and re-check.
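
The clean-up described in the comments above can be written as a small filter. This is a sketch with hypothetical names (`REDUNDANT_KEYS`, `clean_decoder_opts`); the list of keys comes only from the annotated dictionary in this comment and is not an exhaustive VALI rule.

```python
# Keys the annotated dictionary above marks as unnecessary for the decoder.
REDUNDANT_KEYS = {"codec", "hwaccel", "hwaccel_output_format", "preset"}


def clean_decoder_opts(opts: dict) -> dict:
    """Return a copy of opts without the keys that the decoder does not need."""
    return {key: value for key, value in opts.items() if key not in REDUNDANT_KEYS}


if __name__ == "__main__":
    CONFIG_FFMPEG = {
        "codec": "h264",
        "hwaccel_output_format": "cuda",
        "hwaccel": "cuda",
        "ignore_editlist": "true",
        "preset": "hq",
    }
    print(clean_decoder_opts(CONFIG_FFMPEG))
```

The filtered dictionary can then be passed as opts to vali.PyDecoder together with the desired gpu_id.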

@vincentvic
Author

vincentvic commented Jan 17, 2025

We need the ignore_editlist parameter in our case. We can indeed comment out the others, but that does not change anything in the error message.

2 participants