Draft ONNX export for VITS #2563

Merged: 4 commits into dev from vits_onnxx, May 15, 2023
Conversation

@erogol (Member) commented Apr 27, 2023

Could not get it to output the variable length sequence. Dynamic shapes do not work for some reason. If anyone knows how, feel free to jump in.

Thanks to @manmay-nakhashi, I fixed the issue.

This is how it works:

from TTS.tts.models.vits import Vits
from TTS.tts.configs.vits_config import VitsConfig
from TTS.utils.audio.numpy_transforms import save_wav

import numpy as np

config = VitsConfig()
config.load_json("config.json")
vits = Vits.init_from_config(config)
vits.load_checkpoint(config, "model.pth")

vits.export_onnx()
vits.load_onnx("coqui_vits.onnx")

text = "This is a test"
text_inputs = np.asarray(
    vits.tokenizer.text_to_ids(text, language="en"),
    dtype=np.int64,
)[None, :]

audio = vits.inference_onnx(text_inputs)
print(audio.shape)

save_wav(wav=audio[0], path="coqui_vits.wav", sample_rate=config.audio.sample_rate)
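
If you want to sanity-check the exported graph directly with onnxruntime, outside the Vits wrapper, here is a minimal sketch. It assumes the file name from the snippet above and an onnxruntime install, and it only inspects the graph's input/output names instead of assuming what they are called:

import onnxruntime as ort

# Load the exported graph and list the tensor names it expects;
# the exact names depend on how export_onnx() builds the graph.
sess = ort.InferenceSession("coqui_vits.onnx", providers=["CPUExecutionProvider"])
print([i.name for i in sess.get_inputs()])
print([o.name for o in sess.get_outputs()])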

erogol added 2 commits April 27, 2023 14:36
Could not get it work to output variable length sequence
@erogol requested a review from manmay-nakhashi May 8, 2023 10:44
@NeonBohdan
Copy link

@erogol when I load a model with "init_discriminator": false, it returns an error here

@manmay-nakhashi (Collaborator) left a comment

everything looks good to me.

@erogol (Member, Author) commented May 8, 2023

@NeonBohdan can you give me sample code to reproduce?

@NeonBohdan

@erogol sure
download this model
and run onnx_run.py.txt

Got AttributeError: 'Vits' object has no attribute 'disc'

@erogol (Member, Author) commented May 11, 2023

@NeonBohdan can you post the code here in the thread? I don't want to download a file that I don't know.

@NeonBohdan commented May 11, 2023

With this model

from TTS.tts.models.vits import Vits
from TTS.tts.configs.vits_config import VitsConfig
from TTS.utils.audio.numpy_transforms import save_wav

import numpy as np


config = VitsConfig()
config.load_json("model/config.json")
vits = Vits.init_from_config(config)
vits.load_checkpoint(config, "model/model_file.pth")

vits.export_onnx()
vits.load_onnx("coqui_vits.onnx")

text = "This is a test"
text_inputs = np.asarray(
    vits.tokenizer.text_to_ids(text, language="en"),
    dtype=np.int64,
)[None, :]

audio = vits.inference_onnx(text_inputs)
print(audio.shape)

save_wav(wav=audio[0], path="coqui_vits.wav", sample_rate=config.audio.sample_rate)

@erogol (Member, Author) commented May 11, 2023

I think your file has discriminator layers, but the config sets the discriminator to false. I could not reproduce the issue with default settings.
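
One way to settle this is to inspect the checkpoint directly. A minimal sketch, assuming the usual Coqui TTS checkpoint layout (a dict whose "model" entry is the state_dict, with discriminator parameters prefixed by "disc.") and the file name from the snippet above; adjust the key names if your checkpoint differs:

import torch

# Look for discriminator weights in the saved state_dict.
ckpt = torch.load("model/model_file.pth", map_location="cpu")
state_dict = ckpt["model"] if "model" in ckpt else ckpt
has_disc = any(k.startswith("disc.") for k in state_dict.keys())
print("discriminator weights in checkpoint:", has_disc)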

@NeonBohdan commented May 11, 2023

It's not my file; it's a single-speaker model trained by Eren Gölge (@erogol). It's available in the TTS package model list and weighs 140 MiB, so the discriminator weights were removed.
It cannot be converted with this code because of this line.

@erogol (Member, Author) commented May 11, 2023

I don't know Eren Golge. This sucker needs to do a better job. Let him know.

Which model is this? What's the name?

@NeonBohdan

No problem, maybe the line "With this model" was lost previously.

@alessandropettenuzzo96

Hi @erogol, thank you for this PR, it's saving me a lot of time. Could it also be used with multi-speaker YourTTS?
It gives me an error when executing vits.export_onnx():

RuntimeError: Given groups=1, weight of size [196, 196, 1], expected input[1, 192, 100] to have 196 channels, but got 192 channels instead


Thank you

@NeonBohdan

@alessandropettenuzzo96 a code comment in this PR mentions that it's only for single-speaker models for now.

@karelnagel

No problem, maybe the line "With this model" was lost previously.

Did you get it working? I have the same issue

@erogol merged commit 4de797b into dev May 15, 2023
@erogol deleted the vits_onnxx branch May 15, 2023 23:07
@SystemPanic (Contributor) commented May 17, 2023

@erogol ONNX inference is 6-7 times slower than PyTorch due to the EXHAUSTIVE convolution algorithm search on cuDNN, which is the default mode in ONNX Runtime when CUDAExecutionProvider is specified as the provider on its own.

Can you change from CUDAExecutionProvider to ("CUDAExecutionProvider", {"cudnn_conv_algo_search": "DEFAULT"}) here?

With DEFAULT as the cudnn_conv_algo_search, ONNX Runtime performs ~20% better than the standard VITS PyTorch mode.

This performance issue is detailed in microsoft/onnxruntime#12880 (comment)
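
For reference, a minimal sketch of how that provider option can be passed when building the ONNX Runtime session; the file name is taken from the example above, and an onnxruntime-gpu install is assumed:

import onnxruntime as ort

# Use DEFAULT cuDNN convolution algorithm search instead of the slow
# EXHAUSTIVE default, falling back to CPU if CUDA is unavailable.
providers = [
    ("CUDAExecutionProvider", {"cudnn_conv_algo_search": "DEFAULT"}),
    "CPUExecutionProvider",
]
sess = ort.InferenceSession("coqui_vits.onnx", providers=providers)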

Thanks,

Javier.

@karelnagel

No problem, maybe the line "With this model" was lost previously.

Did you get it working? I have the same issue

If anyone else is in this situation, then changing the vits.py file at line 1771 to this:

# rollback values
_forward = self.forward
disc = None
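# guard: models exported with "init_discriminator": false never create a disc attribute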
if hasattr(self, 'disc'):
    disc = self.disc
training = self.training

worked for me (recommended on Discord by Jpg#0419). Also make sure that you have onnx installed (pip install onnx worked for me) and that your export script isn't named onnx.py.

@jpg-gamepad commented May 21, 2023

@erogol
Hey thanks for the update! The ONNX exporter works great!

@SchweitzerGAO

@alessandropettenuzzo96 hello, have you solved this? I encountered the same problem.

Tindell pushed a commit to pugtech-co/TTS that referenced this pull request Sep 4, 2023
* Draft ONNX export for VITS

Could not get it work to output variable length sequence

* Fixup for onnx constant output

* Make style

* Remove commented code