[Bug] Assertion srcIndex < srcSelectDimSize failed #2971

Closed
Omegastick opened this issue Sep 19, 2023 · 9 comments
Labels
bug Something isn't working

Comments

@Omegastick

Omegastick commented Sep 19, 2023

Describe the bug

Sometimes, XTTS inference will fail with a long list of ../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [0,0,0], thread: [95,0,0] Assertion srcIndex < srcSelectDimSize failed. exceptions, followed by

It seems random: about 1 in 20 calls fails. Longer inputs seem more likely to fail, but I might be imagining it.

Once it fails once, the Python runtime has to be restarted. Any further attempts to use CUDA give RuntimeError: CUDA error: device-side assert triggered.

To Reproduce

Run the example code from the docs a few times:

from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

config = XttsConfig()
config.load_json("/path/to/xtts/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="/path/to/xtts/", eval=True)
model.cuda()

outputs = model.synthesize(
    "It took me quite a long time to develop a voice and now that I have it I am not going to be silent.",
    config,
    speaker_wav="/data/TTS-public/_refclips/3.wav",
    gpt_cond_len=3,
    language="en",
)

Expected behavior

It should run every time without issue.

Logs

../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [0,0,0], thread: [1,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [0,0,0], thread: [2,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [0,0,0], thread: [3,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
...
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [0,0,0], thread: [92,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [0,0,0], thread: [93,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [0,0,0], thread: [94,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [0,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.

And here's the relevant stacktrace:

  File "/home/omega/src/storytime/api/app/tts/xtts.py", line 38, in generate_audio
    outputs = self.model.synthesize(
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 428, in synthesize
    return self.inference_with_config(text, config, ref_audio_path=speaker_wav, language=language, **kwargs)
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 450, in inference_with_config
    return self.inference(text, ref_audio_path, language, **settings)
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 550, in inference
    gpt_codes = gpt.generate(
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/TTS/tts/layers/xtts/gpt.py", line 535, in generate
    gen = self.gpt_inference.generate(
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/transformers/generation/utils.py", line 1648, in generate
    return self.sample(
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/transformers/generation/utils.py", line 2730, in sample
    outputs = self(
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/TTS/tts/layers/xtts/gpt_inference.py", line 97, in forward
    transformer_outputs = self.transformer(
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 900, in forward
    outputs = block(
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 390, in forward
    attn_outputs = self.attn(
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 331, in forward
    attn_output, attn_weights = self._attn(query, key, value, attention_mask, head_mask)
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 201, in _attn
    mask_value = torch.full([], mask_value, dtype=attn_weights.dtype).to(attn_weights.device)
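
CUDA kernels run asynchronously, so the Python frame at the bottom of a trace like this one (here _attn in modeling_gpt2.py) is not necessarily where the out-of-range index was used. A minimal sketch for getting a synchronous trace while debugging, assuming the variable is set before the process makes its first CUDA call:

```python
import os

# Must be set before the first CUDA call in the process,
# so do this before importing torch or TTS.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```

With this set, each kernel launch blocks until it finishes, so the error surfaces at the call that caused it. It slows inference down, so it is only useful for debugging.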

Environment

{
    "CUDA": {
        "GPU": [
            "NVIDIA GeForce RTX 4090"
        ],
        "available": true,
        "version": "11.7"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.0.1+cu117",
        "TTS": "0.17.4",
        "numpy": "1.22.0"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.10.11",
        "version": "#1 SMP Fri Jan 27 02:56:13 UTC 2023"
    }
}

Additional context

No response

@Omegastick Omegastick added the bug Something isn't working label Sep 19, 2023
@WeberJulian
Contributor

Hey, thanks for the bug report. Do you mind sharing the reference as well so we can reproduce it?

@feizi

feizi commented Sep 27, 2023

I'm also getting the same error

@isaac1987a

I think the GPU is running out of memory. I'm using this to make an audiobook from a Royal Road story. I break my content up into chapters and then run text-to-speech on an already initialized GPU. I am going to try re-initializing TTS every chapter. Failing that, I'll try Tortoise or another model.

from TTS.api import TTS
import re
from pydub import AudioSegment
import os
import subprocess
from datetime import datetime

# Initialize TTS

#tts = TTS("tts_models/multilingual/multi-dataset/xtts_v1", gpu=True)

# Load and split the input file

file = open("Primal Hunter, The - Zogarth.txt", "r")
delimiter = r'(Chapter [0-9]{1,4}(?:.[0-9])?)'
text_read = file.read()

text_set = re.split(delimiter, text_read)
text_set = [i for i in text_set if i and i.strip()] # removing empty or whitespace-only strings

# Remove the first item

text_set = text_set[1:]

# Pair up items in the list

chapter_num = [text_set[i] for i in range(0, len(text_set) - 1, 2)]
text_set = [str(text_set[i]) + str(text_set[i + 1]) for i in range(0, len(text_set) - 1, 2)]

# Function to convert wav to mp3

def convert_wav_to_mp3(wav_file, mp3_file):
    audio = AudioSegment.from_wav(wav_file)
    audio.export(mp3_file, format="mp3")

# Function to get the last processed chapter from chapters.txt

def get_last_processed_chapter():
    if os.path.exists("chapters.txt"):
        with open("chapters.txt", "r") as file:
            content = file.readlines()
            if content:
                for line in reversed(content):
                    match = re.search(r'title=(Chapter [0-9]{1,4}(?:.[0-9])?)', line)
                    if match:
                        return match.group(1)
    return None

# Output directory setup

output_dir = "output"
if not os.path.exists(output_dir):
    os.makedirs(output_dir)

last_chapter = get_last_processed_chapter()

# Determine where to start processing

if last_chapter:
    beginning_chapter = chapter_num.index(last_chapter) + 1
else:
    beginning_chapter = 0

# Open both files.txt and chapters.txt for writing (append mode to avoid overwriting)

with open('chapters.txt', 'a') as chapters, open('files.txt', 'a') as files:
    total_duration_ms = 0 if not last_chapter else sum([int(float(subprocess.run(["ffprobe", "-v", "error", "-show_entries", "format=duration", "-of", "default=noprint_wrappers=1:nokey=1", os.path.join(output_dir, f"My Dungeon Life_ Rise of the Slave Harem - whatsawhizzer {chap}.mp3")], stdout=subprocess.PIPE, stderr=subprocess.STDOUT).stdout) * 1000) for chap in chapter_num[:beginning_chapter]])

    for chapter, original_text in zip(chapter_num[beginning_chapter:], text_set[beginning_chapter:]):
        now = datetime.now()
        tts = TTS("tts_models/multilingual/multi-dataset/xtts_v1", gpu=True)
        wav_file_name = "Primal Hunter, The - Zogarth.txt " + str(chapter) + ".wav"
        mp3_file_name = os.path.join(output_dir, wav_file_name.replace(".wav", ".mp3"))

        # Create the .wav file
        tts.tts_to_file(text=original_text, file_path=wav_file_name, speaker_wav="source_voice/Recording0001.wav", language="en")

        # Convert the .wav file to .mp3 and store in output directory
        convert_wav_to_mp3(wav_file_name, mp3_file_name)

        # Delete the .wav file
        os.remove(wav_file_name)

        # Write the MP3 file path to files.txt
        files.write(f"file '{mp3_file_name}'\n")

        # Get the duration of the MP3
        result = subprocess.run(["ffprobe", "-v", "error", "-show_entries", "format=duration", "-of", "default=noprint_wrappers=1:nokey=1", mp3_file_name], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        mp3_duration_ms = int(float(result.stdout) * 1000)

        # Write the chapter info to chapters.txt
        chapters.write("[CHAPTER]\n")
        chapters.write(f"TIMEBASE=1/1000\n")
        chapters.write(f"START={total_duration_ms}\n")
        chapters.write(f"END={total_duration_ms + mp3_duration_ms}\n")
        chapters.write(f"title={chapter}\n")

        # Update total_duration_ms
        total_duration_ms += mp3_duration_ms
        processing_time = datetime.now() - now
        print(f"Processed {mp3_file_name} in {processing_time} seconds")

@Edresson
Contributor

Edresson commented Oct 18, 2023

Hi @feizi @Omegastick @isaac1987a,

It happens because the GPT encoder is able to produce more tokens than the gpt_max_audio_tokens. max_length should be set to self.max_mel_tokens:

max_length=self.max_mel_tokens * 2 + self.max_prompt_tokens + self.max_text_tokens,

I added this fix in a private branch, and it should be included in the next release.

As a workaround, you can split long sentences into shorter ones.
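
The splitting workaround could look roughly like the sketch below. It is only an illustration: split_into_chunks and the max_chars default of 200 are hypothetical and not part of the TTS library, and the safe length for a given model has not been measured here.

```python
import re


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Group sentences into chunks of at most max_chars characters.

    Hypothetical helper; max_chars=200 is an arbitrary assumption.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single sentence longer than max_chars is kept whole.
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to model.synthesize on its own and the resulting audio concatenated afterwards.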

@Edresson
Contributor

I will close this issue because it was fixed in PR #3086, which will be merged soon. Feel free to reopen it if needed.

Edresson added a commit that referenced this issue Oct 21, 2023
@CRochaVox

Hello everyone, I'm having the same problem, but with v2 of the XTTS model.
I applied the change that @Edresson suggested for the v1 model in #2971 (comment), but it did not resolve the problem.

Explanation for having the error

In my setup I worked out how to trigger the error.
For context, I have an endpoint that uses ThreadedHTTPServer, so I receive simultaneous requests.
To trigger the error, two requests must be sent almost simultaneously (I tested with delays from 10 ms up to 900 ms between them), one containing a short text and the other a longer text.
For the logs below, the phrases "hi" and "hi, my name is Caio" were sent.
A short sentence competing on the GPU with a slightly longer sentence sent right after it gives me this error.
I've already made sure that the problem is not the GPU's memory, because I processed five identical sentences in parallel, all the same size and longer than the example I gave, and it worked.
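
One possible mitigation for the concurrent-request case, not confirmed anywhere in this thread, is to serialize inference so only one request uses the model at a time. The sketch below assumes a threaded server; synthesize_serialized and _gpu_lock are hypothetical names, and whether this avoids the assert has not been tested here.

```python
import threading

# One lock shared by every request-handler thread (hypothetical name).
_gpu_lock = threading.Lock()


def synthesize_serialized(synthesize_fn, text, **kwargs):
    """Run synthesize_fn(text, **kwargs) while holding the lock.

    Only one call executes at a time; other threads wait their turn.
    """
    with _gpu_lock:
        return synthesize_fn(text, **kwargs)
```

A handler would then call synthesize_serialized(self.TTS.synthesize, text, config=self.config, ...) instead of calling the model directly. This trades throughput for isolation between requests.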

Logs

../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [120,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [121,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [122,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [123,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [124,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [125,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [126,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Error processTTS Message ( CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
 )
Error processTTS Message ( CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasLtMatmul with transpose_mat1 0 transpose_mat2 0 m 1024 n 51 k 1024 mat1_ld 1024 mat2_ld 1024 result_ld 1024 abcType 0 computeType 68 scaleType 0 )
Error processTTS Message ( CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
 )

The two ways I call the model

out = self.TTS.synthesize(
    text,
    config=self.config,
    language='pt',
    speaker_wav='Audio/audio5.wav',
    gpt_cond_len=self.cond_len,
)
out = self.TTS.inference(
    text,
    "pt",
    self.gpt_cond_latent,
    self.speaker_embedding,
    temperature=0.7,
    enable_text_splitting=True
)

@Poeroz

Poeroz commented May 7, 2024

Same problem with XTTS-v2 model using the latest code.

@davaavirtualplus

@CRochaVox hi did u fix it

@Coastchb

Coastchb commented Oct 7, 2024

@Poeroz @davaavirtualplus @CRochaVox Hi guys, have you ever solved the problem?
