[Bug] Assertion `srcIndex < srcSelectDimSize` failed #2971

Omegastick · 2023-09-19T18:21:08Z

Describe the bug

Sometimes, XTTS inference fails with a long list of ../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [0,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed. exceptions, followed by the stack trace shown under Logs.

It seems random, around 1 in 20 calls fail. Longer inputs seem more likely to fail, but I might be imagining it.

Once it fails once, the Python runtime has to be restarted. Any further attempt to use CUDA gives RuntimeError: CUDA error: device-side assert triggered.
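Because a device-side assert leaves CUDA unusable for the rest of the process, one way to limit the damage is to run each synthesis job in a short-lived child process. The following is a minimal sketch using only the standard library; `run_isolated` is a hypothetical helper name, and the child here only prints a string where a real worker would load the model and call `synthesize`.

```python
import subprocess
import sys


def run_isolated(code: str, timeout: float = 600.0) -> str:
    """Run `code` in a fresh Python interpreter and return its stdout.

    If the child dies (for example after a CUDA device-side assert),
    only that process is lost; the parent can simply start a new one.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"worker failed with exit code {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result.stdout


# Placeholder worker: a real one would load XTTS and call synthesize().
print(run_isolated("print('synthesis done')"))
```

The trade-off is that every child has to load the model again, so this suits batch jobs better than low-latency serving.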

To Reproduce

Run the example code from the docs a few times:

from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

config = XttsConfig()
config.load_json("/path/to/xtts/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="/path/to/xtts/", eval=True)
model.cuda()

outputs = model.synthesize(
    "It took me quite a long time to develop a voice and now that I have it I am not going to be silent.",
    config,
    speaker_wav="/data/TTS-public/_refclips/3.wav",
    gpt_cond_len=3,
    language="en",
)

Expected behavior

It should run every time without issue.

Logs

../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [0,0,0], thread: [1,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [0,0,0], thread: [2,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [0,0,0], thread: [3,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
...
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [0,0,0], thread: [92,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [0,0,0], thread: [93,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [0,0,0], thread: [94,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [0,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.

And here's the relevant stacktrace:

  File "/home/omega/src/storytime/api/app/tts/xtts.py", line 38, in generate_audio
    outputs = self.model.synthesize(
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 428, in synthesize
    return self.inference_with_config(text, config, ref_audio_path=speaker_wav, language=language, **kwargs)
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 450, in inference_with_config
    return self.inference(text, ref_audio_path, language, **settings)
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 550, in inference
    gpt_codes = gpt.generate(
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/TTS/tts/layers/xtts/gpt.py", line 535, in generate
    gen = self.gpt_inference.generate(
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/transformers/generation/utils.py", line 1648, in generate
    return self.sample(
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/transformers/generation/utils.py", line 2730, in sample
    outputs = self(
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/TTS/tts/layers/xtts/gpt_inference.py", line 97, in forward
    transformer_outputs = self.transformer(
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 900, in forward
    outputs = block(
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 390, in forward
    attn_outputs = self.attn(
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 331, in forward
    attn_output, attn_weights = self._attn(query, key, value, attention_mask, head_mask)
  File "/home/omega/.cache/pypoetry/virtualenvs/storytime-zx2SQh-k-py3.10/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 201, in _attn
    mask_value = torch.full([], mask_value, dtype=attn_weights.dtype).to(attn_weights.device)
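CUDA kernels are launched asynchronously, so the last Python frame in a trace like the one above (`torch.full(...)` in `_attn`) is often only where the error was noticed, not where the out-of-range index was used. A minimal sketch for getting a more precise trace while debugging is to set `CUDA_LAUNCH_BLOCKING=1` before CUDA is initialized. `enable_sync_cuda_errors` is a hypothetical helper name, and checking whether `torch` is already imported is only a conservative heuristic.

```python
import os
import sys


def enable_sync_cuda_errors() -> bool:
    """Ask CUDA to run kernels synchronously so errors surface at the failing call.

    Returns False if torch is already imported, because the variable is read
    when CUDA is initialized and setting it later may have no effect.
    """
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return "torch" not in sys.modules


# Call this at the very top of the script, before `import torch` or `import TTS`.
if not enable_sync_cuda_errors():
    print("warning: torch was imported first; restart and set the variable earlier")
```

Synchronous launches slow inference down, so this is meant for reproducing the failure, not for production use.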

Environment

{
    "CUDA": {
        "GPU": [
            "NVIDIA GeForce RTX 4090"
        ],
        "available": true,
        "version": "11.7"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.0.1+cu117",
        "TTS": "0.17.4",
        "numpy": "1.22.0"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.10.11",
        "version": "#1 SMP Fri Jan 27 02:56:13 UTC 2023"
    }
}

Additional context

No response


WeberJulian · 2023-09-20T09:26:34Z

Hey, thanks for the bug report, do you mind sharing the reference as well so we can reproduce?

feizi · 2023-09-27T02:55:01Z

I'm also getting the same error

isaac1987a · 2023-10-09T00:42:24Z

I think the GPU is running out of memory. I'm using this to make an audiobook from a Royal Road story. I break the content up into chapters and then run text-to-speech on an already initialized GPU. I am going to try re-initializing TTS for every chapter. Failing that, I'll try Tortoise or another model.

from TTS.api import TTS
import re
from pydub import AudioSegment
import os
import subprocess
from datetime import datetime

# Initialize TTS
#tts = TTS("tts_models/multilingual/multi-dataset/xtts_v1", gpu=True)

# Load and split the input file
file = open("Primal Hunter, The - Zogarth.txt", "r")
delimiter = r'(Chapter [0-9]{1,4}(?:.[0-9])?)'
text_read = file.read()

text_set = re.split(delimiter, text_read)
text_set = [i for i in text_set if i and i.strip()] # removing empty or whitespace-only strings

# Remove the first item
text_set = text_set[1:]

# Pair up items in the list
chapter_num = [text_set[i] for i in range(0, len(text_set) - 1, 2)]
text_set = [str(text_set[i]) + str(text_set[i + 1]) for i in range(0, len(text_set) - 1, 2)]

# Function to convert wav to mp3
def convert_wav_to_mp3(wav_file, mp3_file):
    audio = AudioSegment.from_wav(wav_file)
    audio.export(mp3_file, format="mp3")

# Function to get the last processed chapter from chapters.txt
def get_last_processed_chapter():
    if os.path.exists("chapters.txt"):
        with open("chapters.txt", "r") as file:
            content = file.readlines()
            if content:
                for line in reversed(content):
                    match = re.search(r'title=(Chapter [0-9]{1,4}(?:.[0-9])?)', line)
                    if match:
                        return match.group(1)
    return None

# Output directory setup
output_dir = "output"
if not os.path.exists(output_dir):
    os.makedirs(output_dir)

last_chapter = get_last_processed_chapter()

# Determine where to start processing
if last_chapter:
    beginning_chapter = chapter_num.index(last_chapter) + 1
else:
    beginning_chapter = 0

# Open both files.txt and chapters.txt for writing (append mode to avoid overwriting)
with open('chapters.txt', 'a') as chapters, open('files.txt', 'a') as files:
    total_duration_ms = 0 if not last_chapter else sum([int(float(subprocess.run(["ffprobe", "-v", "error", "-show_entries", "format=duration", "-of", "default=noprint_wrappers=1:nokey=1", os.path.join(output_dir, f"My Dungeon Life_ Rise of the Slave Harem - whatsawhizzer {chap}.mp3")], stdout=subprocess.PIPE, stderr=subprocess.STDOUT).stdout) * 1000) for chap in chapter_num[:beginning_chapter]])

    for chapter, original_text in zip(chapter_num[beginning_chapter:], text_set[beginning_chapter:]):
        now = datetime.now()
        tts = TTS("tts_models/multilingual/multi-dataset/xtts_v1", gpu=True)
        wav_file_name = "Primal Hunter, The - Zogarth.txt " + str(chapter) + ".wav"
        mp3_file_name = os.path.join(output_dir, wav_file_name.replace(".wav", ".mp3"))

        # Create the .wav file
        tts.tts_to_file(text=original_text, file_path=wav_file_name, speaker_wav="source_voice/Recording0001.wav", language="en")

        # Convert the .wav file to .mp3 and store in output directory
        convert_wav_to_mp3(wav_file_name, mp3_file_name)

        # Delete the .wav file
        os.remove(wav_file_name)

        # Write the MP3 file path to files.txt
        files.write(f"file '{mp3_file_name}'\n")

        # Get the duration of the MP3
        result = subprocess.run(["ffprobe", "-v", "error", "-show_entries", "format=duration", "-of", "default=noprint_wrappers=1:nokey=1", mp3_file_name], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        mp3_duration_ms = int(float(result.stdout) * 1000)

        # Write the chapter info to chapters.txt
        chapters.write("[CHAPTER]\n")
        chapters.write(f"TIMEBASE=1/1000\n")
        chapters.write(f"START={total_duration_ms}\n")
        chapters.write(f"END={total_duration_ms + mp3_duration_ms}\n")
        chapters.write(f"title={chapter}\n")

        # Update total_duration_ms
        total_duration_ms += mp3_duration_ms
        processing_time = datetime.now() - now
        print(f"Processed {mp3_file_name} in {processing_time} seconds")

Edresson · 2023-10-18T18:20:55Z

Hi @feizi @Omegastick @isaac1987a,

This happens because the GPT encoder can produce more tokens than gpt_max_audio_tokens. max_length should be set to self.max_mel_tokens instead of the value it has now:

TTS/TTS/tts/layers/xtts/gpt.py, line 551 in d21f15c:

    max_length=self.max_mel_tokens * 2 + self.max_prompt_tokens + self.max_text_tokens,

I added this fix in a private branch, and it should be included in the next release.
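The assertion itself is a bounds check on an index lookup: some index was at least as large as the table it was used on. Below is a pure-Python sketch of how a generation limit larger than a position table leads to that failure. The sizes are made-up illustration values, not the real XTTS configuration, and `lookup` only mimics what an embedding lookup does; the one-entry-per-audio-token table is an assumption made for the example.

```python
def lookup(table: list, index: int):
    """Mimic an embedding lookup with the same kind of bounds check as the CUDA kernel."""
    if not 0 <= index < len(table):
        raise IndexError(f"srcIndex {index} >= srcSelectDimSize {len(table)}")
    return table[index]


# Hypothetical sizes, chosen only to show the mismatch.
max_mel_tokens = 6
max_prompt_tokens = 2
max_text_tokens = 4

# Assumption: one position entry per audio token the model was built for.
mel_position_table = [f"pos_{i}" for i in range(max_mel_tokens)]

# Generation limit computed like the quoted line: larger than the table.
max_length = max_mel_tokens * 2 + max_prompt_tokens + max_text_tokens

first_bad_step = None
for step in range(max_length):
    try:
        lookup(mel_position_table, step)
    except IndexError as err:
        first_bad_step = step
        print(f"step {step}: {err}")
        break
```

With a limit of `max_mel_tokens`, the loop would stop before reaching an index the table does not have, which matches the direction of the fix described above.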

A workaround in the meantime would be to split long sentences into shorter ones.
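The workaround can be sketched with the standard library alone. `split_text` is a hypothetical helper, and the character limit is an arbitrary illustration value, not a documented XTTS limit. It splits on sentence-ending punctuation and then packs sentences into chunks that stay under the limit.

```python
import re


def split_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text into chunks of at most max_chars, breaking at sentence ends.

    A single sentence longer than max_chars is kept whole rather than cut
    mid-word, so callers may still need a hard limit for extreme inputs.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


for chunk in split_text("First sentence. Second one! A third? Done.", max_chars=30):
    print(chunk)
```

Each chunk would then be synthesized separately and the resulting audio concatenated.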

Edresson · 2023-10-20T11:01:00Z

I will close this issue because it was fixed in PR #3086, which will be merged soon. Feel free to reopen it if needed.

CRochaVox · 2024-04-19T20:47:13Z

Hello everyone, I'm having the same problem, but with v2 of the XTTS model.
I applied the change that @Edresson suggested for the v1 model (#2971 (comment)), but it did not resolve the problem.

How I trigger the error

In my scenario I worked out how to cause the error.
For context: my system has an endpoint that uses ThreadedHTTPServer, so it receives simultaneous requests.
To cause the error, two requests must be sent at nearly the same time (I tested delays from 10 ms to 900 ms between them), one containing a short text and the other a longer text.
In the run logged below, the phrases "hi" and "hi, my name is Caio" were sent.
A short sentence followed by a slightly longer one, both competing for the GPU, gives me this error.
I've already made sure that the problem is the GPU's memory because I've already processed five identical sentences in parallel, with the same size and larger than the example I gave and it worked.

Logs

../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [120,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [121,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [122,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [123,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [124,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [125,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [126,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [0,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Error processTTS Message ( CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
 )
Error processTTS Message ( CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasLtMatmul with transpose_mat1 0 transpose_mat2 0 m 1024 n 51 k 1024 mat1_ld 1024 mat2_ld 1024 result_ld 1024 abcType 0 computeType 68 scaleType 0 )
Error processTTS Message ( CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
 )

Ways I call processing

out = self.TTS.synthesize(
    text,
    config=self.config,
    language='pt',
    speaker_wav='Audio/audio5.wav',
    gpt_cond_len=self.cond_len,
)

out = self.TTS.inference(
    text,
    "pt",
    self.gpt_cond_latent,
    self.speaker_embedding,
    temperature=0.7,
    enable_text_splitting=True
)
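Since the failure in this report only shows up when two requests overlap, one thing to try is serializing access to the model so that only one thread runs inference at a time. This is a sketch under the assumption that the model object is not safe to call from several threads at once, which the thread does not confirm. `SerializedModel` and `fake_synthesize` are hypothetical names standing in for a real wrapper and the real `self.TTS.inference` call.

```python
import threading
import time


class SerializedModel:
    """Wrap a callable so that concurrent callers run it one at a time."""

    def __init__(self, synthesize):
        self._synthesize = synthesize
        self._lock = threading.Lock()

    def __call__(self, *args, **kwargs):
        with self._lock:
            return self._synthesize(*args, **kwargs)


# Tracks how many calls are inside fake_synthesize at once.
stats = {"active": 0, "max": 0}


def fake_synthesize(text: str) -> str:
    """Stand-in for the real model call; records how many calls overlap."""
    stats["active"] += 1
    stats["max"] = max(stats["max"], stats["active"])
    time.sleep(0.01)  # simulate work on the GPU
    stats["active"] -= 1
    return text.upper()


model = SerializedModel(fake_synthesize)
results = {}


def handle(text: str) -> None:
    """Simulated request handler, one per thread."""
    results[text] = model(text)


threads = [
    threading.Thread(target=handle, args=(text,))
    for text in ("hi", "hi, my name is Caio")
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(stats["max"])  # stays at 1 because the lock serializes the calls
```

If the error disappears with a lock like this in place, that points at concurrent use of the model rather than at input length.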

Poeroz · 2024-05-07T06:37:40Z

Same problem with XTTS-v2 model using the latest code.

davaavirtualplus · 2024-08-13T05:13:16Z

@CRochaVox hi did u fix it

Coastchb · 2024-10-07T13:31:45Z

@Poeroz @davaavirtualplus @CRochaVox Hi guys, have you ever solved the problem?

Omegastick added the bug label Sep 19, 2023

Edresson added a commit that referenced this issue Oct 19, 2023

Fix issue #2971

75dc0e1

Edresson mentioned this issue Oct 20, 2023

[Bug] token limit and model capability is not competible #3079

Closed

Edresson closed this as completed Oct 20, 2023

Edresson added a commit that referenced this issue Oct 21, 2023

Fix issue #2971

1f92741
