
RuntimeError: The tensor has a non-zero number of elements, but its data is not allocated yet. #25

Open · PCanavelli opened this issue Feb 6, 2025 · 2 comments


@PCanavelli

Hey there,

First things first: massive congrats, and even more massive thanks, for JoyCaption. This is by far the best general-purpose captioner I have ever tried, especially for creating Flux datasets.

In a local Ubuntu environment (WSL 2, python 3.12), the model runs flawlessly. Unfortunately, I have been stuck with a very strange error when trying to run it from within a Docker image.

When calling the generate method, I get:

  File "/opt/src/captioning/oh/captioner.py", line 67, in __call__
    generate_ids = self.model.generate(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2228, in generate
    result = self._sample(
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 3209, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llava/modeling_llava.py", line 491, in forward
    inputs_embeds = self.get_input_embeddings()(input_ids)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/sparse.py", line 164, in forward
    return F.embedding(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py", line 2267, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: The tensor has a non-zero number of elements, but its data is not allocated yet.
If you're using torch.compile/export/fx, it is likely that we are erroneously tracing into a custom kernel. To fix this, please wrap the custom kernel into an opaque custom op. Please see the following for details: https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html
If you're using Caffe2, Caffe2 uses a lazy allocation, so you will need to call mutable_data() or raw_mutable_data() to actually allocate memory.
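
For context, the call that triggers this is nothing exotic; the captioner essentially follows the standard transformers Llava flow, roughly like this (the model path, prompt, and generation parameters below are simplified placeholders, not my actual code):

```python
# Rough sketch of what captioner.py does (simplified; real paths/prompts differ).
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

MODEL_PATH = "fancyfeast/llama-joycaption-alpha-two-hf-llava"  # illustrative

processor = AutoProcessor.from_pretrained(MODEL_PATH)
model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_PATH, torch_dtype=torch.bfloat16, device_map="cuda"
)
model.eval()

image = Image.open("example.jpg")
convo = [
    {"role": "system", "content": "You are a helpful image captioner."},
    {"role": "user", "content": "Write a descriptive caption for this image."},
]
prompt = processor.apply_chat_template(convo, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to("cuda")
inputs["pixel_values"] = inputs["pixel_values"].to(torch.bfloat16)

with torch.no_grad():
    generate_ids = model.generate(**inputs, max_new_tokens=300)  # <-- fails here in Docker

print(processor.batch_decode(generate_ids, skip_special_tokens=True)[0])
```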

This error is wild. I could only find a couple of results for it, none of which had a helpful solution. It looks like it comes from the C++ back-end of Torch. This screams CUDA / dependency issues, but I haven't been able to find any details on the requirements / install best practices for JoyCaption.

Is this a known issue, and is there a recommended environment (OS + CUDA + dependencies) for running JoyCaption?

For more context: my Dockerfile builds on top of nvidia/cuda:12.4.0-runtime-ubuntu22.04, and my requirements.txt is:

accelerate==1.1.1
bitsandbytes==0.43.3
boto3==1.36.11
diffusers==0.30.2
fastapi==0.115.8
gputil==1.4.0
loguru==0.7.2
nest_asyncio==1.6.0
peft==0.14.0
protobuf==3.20.3
pydantic==2.10.6
pytest==7.4.4
pytest-cov==5.0.0
pytest-lazy-fixture==0.6.3
pytest-loguru==0.3.0
runpod==1.7.0
sentencepiece==0.2.0 
sentry-sdk==2.14.0
torch==2.5.1
torchaudio==2.5.1
torchvision==0.20.1
transformers==4.48.0
uvicorn==0.33.0

I'm omitting the bulk of my code / Dockerfile for brevity, since all they do is basically spin up a FastAPI app and forward requests to the captioner, but let me know if they would actually help.

Also, this happens regardless of the model version (alpha one / two), and doesn't happen when running a standard Llava model in the exact same Docker image.

Cheers,

Pierre.

@fpgaminer (Owner)

Yeah, that's a weird one. The only difference between a standard Llava model and JoyCaption would be the vision module, which uses so400m instead of OpenAI CLIP. So maybe it's some weird implementation detail of so400m that's triggering a bug. In either case, my gut reaction is that it's an issue with a specific dependency version or the Docker container. All too common with PyTorch...
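
If you want to double-check that difference on your end, comparing the vision tower configs is quick (the model ids below are just examples):

```python
# Compare the vision towers: JoyCaption uses a SigLIP (so400m) tower, while
# stock Llava 1.5 uses an OpenAI CLIP tower.
from transformers import AutoConfig

joy = AutoConfig.from_pretrained("fancyfeast/llama-joycaption-alpha-two-hf-llava")
stock = AutoConfig.from_pretrained("llava-hf/llava-1.5-7b-hf")
print(joy.vision_config.model_type)    # siglip_vision_model
print(stock.vision_config.model_type)  # clip_vision_model
```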

Give some of the containers here a try: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cuda/tags
I've had better luck with those than with the ones hosted on Docker Hub, for whatever reason. But they tend to use bleeding-edge versions, so you might have to pull a few to find one that works.

You could also try swapping in different versions of either the PyTorch packages or the transformers package.

And finally, it could be a mismatch between the CUDA version running in the Docker container and the driver on the host system. That usually isn't an issue, but Windows might be more sensitive.
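
A quick way to check that from inside the running container:

```python
# What CUDA build PyTorch shipped with, and whether it can actually reach the driver.
import torch

print(torch.__version__, torch.version.cuda)  # e.g. 2.5.1, 12.4
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```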

@PCanavelli (Author)

Hey @fpgaminer, thanks for the reply.

Honestly, this is the weirdest Torch bug I've ever hit.

I tried one of the images you linked (nvcr.io/nvidia/cuda:12.8.0-cudnn-devel-ubuntu22.04), as well as downgrading to 12.4: still no change.

Any chance you could share a pip freeze of the environment you're using? That would save me quite some time trying out way too many version combinations of torch / transformers / diffusers / etc.

I'll gladly post a working Dockerfile + requirements once I get something running, as I suspect I won't be the only one hitting this issue when trying to deploy JoyCaption in a container.
