Support onnx #21

Yosshi999 · 2021-10-19T13:42:28Z

An onnxruntime inference sample corresponding to Hiroshiba/vv_core_inference#1.

Run on Google Colab: https://gist.github.com/Yosshi999/f5c64cb5f0afe7f1c053af0204b1feb1

For the text "親譲りの無鉄砲で小供の時から損ばかりしている。小学校に居る時分学校の二階から飛び降りて一週間ほど腰を抜かした事がある。", the CPU version took 14.8 s and the GPU version took 6.24 s. (The first launch of the GPU version took 34.5 s; the cause is unknown.)

Hiroshiba

Looks great!!!!!!!
I feel I need to study this too, so if you know any good reference material, please let me know!!

Yosshi999 · 2021-10-19T16:08:47Z

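> Looks great!!!!!!! I feel I need to study this too, so if you know any good reference material, please let me know!!

Honestly, there weren't any really good tutorials...
The links below are what I referred to.

I wrote it by guesswork based on these.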

Hiroshiba · 2021-10-19T16:17:25Z

Thank you!!
I'll go through it thoroughly when I can set aside a solid block of time!!

Hiroshiba · 2021-10-19T16:18:25Z

As for not being able to test on a GPU: you may be able to test with Colaboratory, which Google provides and which, for some reason, lets you use a GPU for free.

Yosshi999 · 2021-10-19T16:35:59Z

I was wondering how the CUDA linking works; apparently it dynamically loads onnxruntime_providers_cuda.dll, which is bundled with onnxruntime-gpu.

Hiroshiba · 2021-10-20T08:25:06Z

I see. Even with all the DLLs included it is still far lighter than libtorch, so that's great...!

Yosshi999 · 2021-10-21T12:23:39Z

Verified to work on Windows and Linux (WSL, Google Colab).

https://gist.github.com/Yosshi999/f5c64cb5f0afe7f1c053af0204b1feb1

Hiroshiba · 2021-10-21T13:53:27Z

I'll take a look over the weekend, once I can set aside a solid block of time!!

Hiroshiba · 2021-10-24T18:53:30Z

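It worked on my side too!!!

It also worked in a Windows GPU environment, but two things caught my attention.

First, the GPU version took longer to run than the CPU version.
I tweaked run.py to call forwarder.forward several times in a row and measured the time.
With the CPU version the first call took 0.8 s and later calls 0.7 s, whereas with the GPU version the first call took 1.6 s and later calls 1.0 s.
The GPU is a GTX 1060, which is old, but it should still beat the CPU easily, so this puzzled me.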
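
The measurement described above (calling the same function several times and separating the first call from the later ones) can be sketched as follows. This is a minimal illustration, not the actual change made to run.py; `time_calls` and `summarize` are hypothetical helper names, and the workload in the usage section is a stand-in for something like `forwarder.forward`.

```python
import time
from typing import Callable, List, Optional, Dict


def time_calls(fn: Callable[[], object], repeats: int = 5) -> List[float]:
    """Call fn `repeats` times and return the wall-clock seconds of each call."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, Optional[float]]:
    """Separate the first (cold) call from the mean of the remaining (warm) calls."""
    warm = durations[1:]
    return {
        "first": durations[0],
        "warm_mean": sum(warm) / len(warm) if warm else None,
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a call into the real forwarder.
    print(summarize(time_calls(lambda: sum(range(100_000)), repeats=3)))
```

Reporting the first call separately matters here because the first GPU call includes one-time setup cost, which would otherwise skew an average.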

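Second, the output of yukarin_sa (f0_list) differed between CPU and GPU.
Printing phoneme_length and f0_list for each gives the following.
phoneme_length matches almost exactly, while f0_list differs.
Listening to the generated audio, the CPU version sounds correct, but in the GPU version the pitch seems to keep rising.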

↓ CPU version

phoneme_length
[1.3397267 0.06358747 0.07098808 0.03593455 0.09774899 0.04697677
0.09010375 0.08894082 0.11680754 0.06584772 0.05094525 0.07460123
0.08330087 0.05011118 0.07428109 0.08238048 0.09245145 0.06637056
0.06612038 0.06495053 0.06862129 0.05226339 0.08244184 0.05181761
0.05277015 0.06118139 0.10580397 0.06581997 0.04480745 0.0946921
0.04909584 0.03383401 0.06933177 0.03648905 0.06981966 0.07857117
0.1557335 1.186644 ]

f0_list
[5.571943 5.382277 5.4611235 5.646292 5.6966295 5.7262206 5.7680383
5.7643228 5.720693 5.5305376 5.6798224 5.785994 5.7878523 5.726377
5.7864146 5.7111397 5.5586343 5.413412 5.4239593 5.6004224 5.549212
5.3954773 5.4413342]

↓ GPU version

phoneme_length
[1.3397267 0.06358746 0.07098801 0.03593457 0.09774898 0.04697674
0.09010376 0.08894081 0.11680754 0.06584772 0.05094522 0.07460122
0.08330087 0.05011114 0.0742811 0.0823805 0.09245142 0.06637056
0.06612033 0.06495053 0.06862128 0.05226342 0.08244181 0.05181763
0.05277017 0.06118139 0.10580397 0.06581999 0.04480739 0.0946921
0.04909583 0.0338341 0.06933178 0.03648902 0.06981972 0.07857122
0.1557335 1.1866437 ]

f0_list
[2.074989 3.622395 4.4289107 5.0321183 5.463549 5.764526 5.974276
6.1226244 6.229909 6.309424 6.3697863 6.4166245 6.4536705 6.483452
6.5077105 6.5276814 6.5442557 6.5580945 6.5696974 6.5794516 6.58766
6.594566 6.6003685]
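
To put numbers on the difference, the values printed above can be compared directly. The sketch below copies the two f0_list outputs and the first six phoneme_length values from this comment; the helper names are hypothetical. It shows that phoneme_length agrees to within float noise, while f0_list differs by several units and the GPU values rise monotonically, which is consistent with the "pitch keeps rising" impression.

```python
# Values copied from the CPU and GPU outputs in this comment.
cpu_f0 = [5.571943, 5.382277, 5.4611235, 5.646292, 5.6966295, 5.7262206, 5.7680383,
          5.7643228, 5.720693, 5.5305376, 5.6798224, 5.785994, 5.7878523, 5.726377,
          5.7864146, 5.7111397, 5.5586343, 5.413412, 5.4239593, 5.6004224, 5.549212,
          5.3954773, 5.4413342]
gpu_f0 = [2.074989, 3.622395, 4.4289107, 5.0321183, 5.463549, 5.764526, 5.974276,
          6.1226244, 6.229909, 6.309424, 6.3697863, 6.4166245, 6.4536705, 6.483452,
          6.5077105, 6.5276814, 6.5442557, 6.5580945, 6.5696974, 6.5794516, 6.58766,
          6.594566, 6.6003685]
# First six phoneme_length values only, to keep the example short.
cpu_len_head = [1.3397267, 0.06358747, 0.07098808, 0.03593455, 0.09774899, 0.04697677]
gpu_len_head = [1.3397267, 0.06358746, 0.07098801, 0.03593457, 0.09774898, 0.04697674]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length lists."""
    return max(abs(x - y) for x, y in zip(a, b))


def is_strictly_increasing(values):
    """True if every value is larger than the one before it."""
    return all(x < y for x, y in zip(values, values[1:]))


if __name__ == "__main__":
    print("phoneme_length max diff:", max_abs_diff(cpu_len_head, gpu_len_head))
    print("f0_list max diff:", max_abs_diff(cpu_f0, gpu_f0))
    print("GPU f0 strictly increasing:", is_strictly_increasing(gpu_f0))
    print("CPU f0 strictly increasing:", is_strictly_increasing(cpu_f0))
```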

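If you have any idea why the GPU seems slower, or why the yukarin_sa results differ between GPU and CPU, I'd like to hear it.

By the way, the libtorch version took 2.1 s on CPU and 1.5 s on GPU.
The onnx version is unbelievably fast on CPU; why is that...?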

onnx/python/setup.py

Hiroshiba

I read through the code as well! I don't know C++ at all, so I only have questions, but thanks in advance.

onnx/python/setup.py

onnx/core.cpp

onnx/CMakeLists.txt

onnx/.clang-format

onnx/core.cpp

Yosshi999 · 2021-10-25T07:57:54Z

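> The GPU version took longer to run than the CPU version.

I'll look into it.

> The output of yukarin_sa (f0_list) differed between CPU and GPU.

I see... that doesn't look like mere floating-point error...
One thing unique to yukarin_sa is that it is the only model converted to onnx via torch.jit.script, but I have no idea what causes this.

> The onnx version is unbelievably fast on CPU

It seems some optimizations are run during the onnx conversion. I don't know the details, but for example it appears to fuse batchnorm -> convolution into a single convolution.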
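
As an illustration of that kind of fusion, here is a minimal pure-Python sketch. It assumes the common inference-time case of a convolution followed by batch normalization, and uses a single-channel 1D convolution for brevity; which fusions the converter actually applies to these models is not confirmed in this thread, and all function names are hypothetical.

```python
import math


def conv1d(x, kernel, bias):
    """Single-channel 1D convolution (cross-correlation form), no padding."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch normalization with fixed statistics."""
    scale = gamma / math.sqrt(var + eps)
    return [(v - mean) * scale + beta for v in y]


def fold_batchnorm(kernel, bias, gamma, beta, mean, var, eps=1e-5):
    """Return (kernel, bias) of one convolution equivalent to conv followed by batchnorm."""
    scale = gamma / math.sqrt(var + eps)
    return [w * scale for w in kernel], (bias - mean) * scale + beta


if __name__ == "__main__":
    x = [1.0, 2.0, -1.0, 0.5, 3.0]
    kernel, bias = [0.2, -0.5, 0.1], 0.3
    bn = dict(gamma=1.5, beta=-0.2, mean=0.4, var=2.0)
    two_ops = batchnorm(conv1d(x, kernel, bias), **bn)
    one_op = conv1d(x, *fold_batchnorm(kernel, bias, **bn))
    print(two_ops)
    print(one_op)
```

Because batch normalization at inference time is just a per-channel scale and shift, it can be absorbed into the convolution's weights and bias, removing one pass over the data.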

Yosshi999 · 2021-10-26T12:09:50Z

yukarin_s and yukarin_sa are fast enough on CPU, so they now run on the CPU even in GPU mode.
Only decode, the bottleneck, runs on the GPU.
This works around the yukarin_sa problem while also reducing GPU overhead.
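
The placement policy above can be sketched in Python as follows. This is only an illustration of the idea, not the C++ code in this PR: `providers_for` is a hypothetical helper, and the provider names are the ones that appear in the onnxruntime log later in this thread.

```python
CPU = "CPUExecutionProvider"
CUDA = "CUDAExecutionProvider"


def providers_for(model_name, use_gpu, available):
    """Pick execution providers: only `decode` goes to CUDA, and only in GPU mode."""
    if use_gpu and model_name == "decode" and CUDA in available:
        # CPU is listed as a fallback for nodes CUDA cannot run.
        return [CUDA, CPU]
    return [CPU]


if __name__ == "__main__":
    # With onnxruntime installed, `available` could come from
    # onnxruntime.get_available_providers(), and the result could be passed as
    # onnxruntime.InferenceSession(path, providers=...).
    available = [CUDA, CPU]  # assumed for this demo
    for name in ("yukarin_s", "yukarin_sa", "decode"):
        print(name, providers_for(name, use_gpu=True, available=available))
```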

In addition, the GPU version now runs decode once at initialization with dummy data of length=500, so that memory is allocated up front.

I also added a finalize function to core.h, because in my GPU environment an exception was raised at exit when running the command below. Apparently leaving the destructor to the system causes the exception.
time python run.py --text "hoge" --speaker_id 1 --use_gpu
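
On the caller's side, one way to make sure finalize always runs before interpreter exit is a context manager. The sketch below is an assumption about usage, not the Python binding in this PR: `core_session` is a hypothetical helper, and `FakeCore` is a stand-in for whatever object exposes `finalize()`.

```python
from contextlib import contextmanager


@contextmanager
def core_session(core):
    """Yield the core object and call its finalize() even if the body raises."""
    try:
        yield core
    finally:
        core.finalize()


class FakeCore:
    """Stand-in for the real core wrapper; only records whether finalize ran."""

    def __init__(self):
        self.finalized = False

    def finalize(self):
        self.finalized = True


if __name__ == "__main__":
    core = FakeCore()
    with core_session(core):
        pass  # synthesis calls would go here
    print("finalized:", core.finalized)
```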

Yosshi999 · 2021-10-26T15:46:40Z

Communication is probably a significant cost; here is the log from loading decode.onnx onto CUDA.
It shows that some of the nodes are placed on the CPU.

verbose log

2021-10-25 10:40:38.356702414 [V:onnxruntime:, inference_session.cc:152 VerifyEachNodeIsAssignedToAnEp] Node placements
2021-10-25 10:40:38.356746062 [V:onnxruntime:, inference_session.cc:159 VerifyEachNodeIsAssignedToAnEp]  Provider: [CPUExecutionProvider]: [Gather (Gather_10), Unsqueeze (Unsqueeze_15), Concat (Concat_17), Reshape (Reshape_19), Equal (Equal_24), Where (Where_25), Gather (Gather_35), Unsqueeze (Unsqueeze_36), Concat (Concat_37), Gather (Gather_41), Unsqueeze (Unsqueeze_42), Concat (Concat_43), Gather (Gather_47), Cast (Cast_48), Gather (Gather_60), Gather (Gather_67), Equal (Equal_83), Where (Where_84), Equal (Equal_91), Where (Where_92), Equal (Equal_99), Where (Where_100), Slice (Slice_108), Concat (Concat_109), Gather (Gather_121), Gather (Gather_128), Equal (Equal_144), Where (Where_145), Equal (Equal_152), Where (Where_153), Equal (Equal_160), Where (Where_161), Slice (Slice_169), Concat (Concat_170), Gather (Gather_184), Gather (Gather_191), Equal (Equal_207), Where (Where_208), Equal (Equal_215), Where (Where_216), Equal (Equal_223), Where (Where_224), Slice (Slice_232), Concat (Concat_233), Gather (Gather_247), Gather (Gather_254), Equal (Equal_270), Where (Where_271), Equal (Equal_278), Where (Where_279), Equal (Equal_286), Where (Where_287), Slice (Slice_295), Concat (Concat_296), Gather (Gather_328), Div (Div_330), Gather (Gather_335), Sub (Sub_336), Add (Add_338), Gather (Gather_341), Div (Div_343), Gather (Gather_348), Add (Add_349), Unsqueeze (Unsqueeze_350), Unsqueeze (Unsqueeze_351), Gather (Gather_386), Concat (Concat_390), Concat (Concat_395), Unsqueeze (Unsqueeze_399), Concat (Concat_400), Gather (Gather_405), Unsqueeze (Unsqueeze_407), Concat (Concat_408), Gather (Gather_420), Gather (Gather_423), Gather (Gather_426), Unsqueeze (Unsqueeze_427), Unsqueeze (Unsqueeze_428), Unsqueeze (Unsqueeze_429), Concat (Concat_430), Gather (Gather_435), Gather (Gather_438), Gather (Gather_441), Add (Add_443), Gather (Gather_446), Unsqueeze (Unsqueeze_447), Unsqueeze (Unsqueeze_448), Unsqueeze (Unsqueeze_449), Unsqueeze (Unsqueeze_450), Concat (Concat_451), 
Slice (Slice_464), Squeeze (Squeeze_465), Div (Div_467), Add (Add_471), Unsqueeze (Unsqueeze_472), Gather (Gather_480), Unsqueeze (Unsqueeze_493), Concat (Concat_494), Gather (Gather_583), Concat (Concat_587), Concat (Concat_592), Unsqueeze (Unsqueeze_596), Concat (Concat_597), Gather (Gather_602), Unsqueeze (Unsqueeze_604), Concat (Concat_605), Gather (Gather_617), Gather (Gather_620), Gather (Gather_623), Unsqueeze (Unsqueeze_624), Unsqueeze (Unsqueeze_625), Unsqueeze (Unsqueeze_626), Concat (Concat_627), Gather (Gather_632), Gather (Gather_635), Gather (Gather_638), Add (Add_640), Gather (Gather_643), Unsqueeze (Unsqueeze_644), Unsqueeze (Unsqueeze_645), Unsqueeze (Unsqueeze_646), Unsqueeze (Unsqueeze_647), Concat (Concat_648), Slice (Slice_661), Squeeze (Squeeze_662), Div (Div_664), Add (Add_668), Unsqueeze (Unsqueeze_669), Gather (Gather_677), Unsqueeze (Unsqueeze_690), Concat (Concat_691), ]
2021-10-25 10:40:38.356816599 [V:onnxruntime:, inference_session.cc:159 VerifyEachNodeIsAssignedToAnEp]  Provider: [CUDAExecutionProvider]: [Unsqueeze (Unsqueeze_0), Unsqueeze (Unsqueeze_1), Concat (Concat_2), Gather (Gather_3), Unsqueeze (Unsqueeze_4), Shape (Shape_8), Expand (Expand_26), Concat (Concat_27), MatMul (MatMul_28), Add (Add_29), Shape (Shape_30), ConstantOfShape (ConstantOfShape_31), Squeeze (Squeeze_32), ConstantOfShape (ConstantOfShape_38), ConstantOfShape (ConstantOfShape_44), Range (Range_49), Unsqueeze (Unsqueeze_50), Mul (Mul_52), Sin (Sin_53), Gather (Gather_55), Shape (Shape_56), Expand (Expand_57), Range (Range_64), Range (Range_71), Reshape (Reshape_73), Reshape (Reshape_75), Add (Add_76), Shape (Shape_78), Expand (Expand_85), Unsqueeze (Unsqueeze_86), Expand (Expand_93), Unsqueeze (Unsqueeze_94), Expand (Expand_101), Unsqueeze (Unsqueeze_102), Concat (Concat_103), Shape (Shape_104), Reshape (Reshape_110), ScatterND (ScatterND_111), Mul (Mul_113), Cos (Cos_114), Gather (Gather_116), Shape (Shape_117), Expand (Expand_118), Range (Range_125), Range (Range_132), Reshape (Reshape_134), Reshape (Reshape_136), Add (Add_137), Add (Add_138), Shape (Shape_139), Expand (Expand_146), Unsqueeze (Unsqueeze_147), Expand (Expand_154), Unsqueeze (Unsqueeze_155), Expand (Expand_162), Unsqueeze (Unsqueeze_163), Concat (Concat_164), Shape (Shape_165), Reshape (Reshape_171), ScatterND (ScatterND_172), Mul (Mul_174), Mul (Mul_176), Sin (Sin_177), Gather (Gather_179), Shape (Shape_180), Expand (Expand_181), Range (Range_188), Range (Range_195), Reshape (Reshape_197), Reshape (Reshape_199), Add (Add_200), Shape (Shape_202), Expand (Expand_209), Unsqueeze (Unsqueeze_210), Expand (Expand_217), Unsqueeze (Unsqueeze_218), Expand (Expand_225), Unsqueeze (Unsqueeze_226), Concat (Concat_227), Shape (Shape_228), Reshape (Reshape_234), ScatterND (ScatterND_235), Mul (Mul_237), Mul (Mul_239), Cos (Cos_240), Gather (Gather_242), Shape (Shape_243), Expand (Expand_244), Range 
(Range_251), Range (Range_258), Reshape (Reshape_260), Reshape (Reshape_262), Add (Add_263), Add (Add_264), Shape (Shape_265), Expand (Expand_272), Unsqueeze (Unsqueeze_273), Expand (Expand_280), Unsqueeze (Unsqueeze_281), Expand (Expand_288), Unsqueeze (Unsqueeze_289), Concat (Concat_290), Shape (Shape_291), Reshape (Reshape_297), ScatterND (ScatterND_298), Reshape (Reshape_304), Shape (Shape_305), Reshape (Reshape_310), Slice (Slice_315), Unsqueeze (Unsqueeze_316), Slice (Slice_321), Unsqueeze (Unsqueeze_322), Concat (Concat_323), Mul (Mul_325), Shape (Shape_339), Shape (Shape_346), Slice (Slice_353), Transpose (Transpose_365), Conv (Conv_368), Transpose (Transpose_369), Mul (Mul_371), Add (Add_372), Shape (Shape_384), MatMul (MatMul_387), Add (Add_388), Reshape (Reshape_391), MatMul (MatMul_392), Add (Add_393), Reshape (Reshape_396), MatMul (MatMul_397), Add (Add_398), Reshape (Reshape_401), Transpose (Transpose_402), Shape (Shape_403), MatMul (MatMul_406), Reshape (Reshape_409), Add (Add_410), Transpose (Transpose_411), Add (Add_412), Transpose (Transpose_413), Transpose (Transpose_414), MatMul (MatMul_415), Transpose (Transpose_416), MatMul (MatMul_417), ConstantOfShape (ConstantOfShape_431), Concat (Concat_432), Reshape (Reshape_452), Slice (Slice_457), Reshape (Reshape_459), Shape (Shape_460), Slice (Slice_474), Add (Add_475), Div (Div_477), Shape (Shape_478), Unsqueeze (Unsqueeze_481), Equal (Equal_483), Where (Where_486), Softmax (Softmax_487), Where (Where_490), MatMul (MatMul_491), Transpose (Transpose_492), Reshape (Reshape_495), MatMul (MatMul_496), Add (Add_497), Add (Add_498), Transpose (Transpose_510), Conv (Conv_511), Split (Split_512), Sigmoid (Sigmoid_513), Mul (Mul_514), Conv (Conv_515), Sigmoid (Sigmoid_516), Mul (Mul_517), Conv (Conv_518), Transpose (Transpose_519), Add (Add_520), Transpose (Transpose_532), Conv (Conv_535), Transpose (Transpose_536), Mul (Mul_538), Add (Add_539), Transpose (Transpose_562), Conv (Conv_565), Transpose 
(Transpose_566), Mul (Mul_568), Add (Add_569), Shape (Shape_581), MatMul (MatMul_584), Add (Add_585), Reshape (Reshape_588), MatMul (MatMul_589), Add (Add_590), Reshape (Reshape_593), MatMul (MatMul_594), Add (Add_595), Reshape (Reshape_598), Transpose (Transpose_599), MatMul (MatMul_603), Reshape (Reshape_606), Add (Add_607), Transpose (Transpose_608), Add (Add_609), Transpose (Transpose_610), Transpose (Transpose_611), MatMul (MatMul_612), Transpose (Transpose_613), MatMul (MatMul_614), ConstantOfShape (ConstantOfShape_628), Concat (Concat_629), Reshape (Reshape_649), Slice (Slice_654), Reshape (Reshape_656), Shape (Shape_657), Slice (Slice_671), Add (Add_672), Div (Div_674), Shape (Shape_675), Equal (Equal_680), Where (Where_683), Softmax (Softmax_684), Where (Where_687), MatMul (MatMul_688), Transpose (Transpose_689), Reshape (Reshape_692), MatMul (MatMul_693), Add (Add_694), Add (Add_695), Transpose (Transpose_707), Conv (Conv_708), Split (Split_709), Sigmoid (Sigmoid_710), Mul (Mul_711), Conv (Conv_712), Sigmoid (Sigmoid_713), Mul (Mul_714), Conv (Conv_715), Transpose (Transpose_716), Add (Add_717), Transpose (Transpose_729), Conv (Conv_732), Transpose (Transpose_733), Mul (Mul_735), Add (Add_736), MatMul (MatMul_759), Add (Add_760), Transpose (Transpose_761), Conv (Conv_762), Tanh (Tanh_763), Conv (Conv_764), Tanh (Tanh_765), Conv (Conv_766), Tanh (Tanh_767), Conv (Conv_768), Tanh (Tanh_769), Conv (Conv_770), Transpose (Transpose_771), Add (Add_772), Gather (Gather_774), Transpose (Transpose_775), Unsqueeze (Unsqueeze_776), Conv (Conv_777), LeakyRelu (LeakyRelu_778), ConvTranspose (ConvTranspose_779), Conv (Conv_781), LeakyRelu (LeakyRelu_782), Conv (Conv_783), Add (Add_784), LeakyRelu (LeakyRelu_785), Conv (Conv_786), LeakyRelu (LeakyRelu_787), Conv (Conv_788), Add (Add_789), LeakyRelu (LeakyRelu_790), Conv (Conv_791), LeakyRelu (LeakyRelu_792), Conv (Conv_793), Add (Add_794), Conv (Conv_796), LeakyRelu (LeakyRelu_797), Conv (Conv_798), Add (Add_799), 
LeakyRelu (LeakyRelu_800), Conv (Conv_801), LeakyRelu (LeakyRelu_802), Conv (Conv_803), Add (Add_804), LeakyRelu (LeakyRelu_805), Conv (Conv_806), LeakyRelu (LeakyRelu_807), Conv (Conv_808), Add (Add_809), Add (Add_810), LeakyRelu (LeakyRelu_811), Conv (Conv_812), LeakyRelu (LeakyRelu_813), Conv (Conv_814), Add (Add_815), LeakyRelu (LeakyRelu_816), Conv (Conv_817), LeakyRelu (LeakyRelu_818), Conv (Conv_819), Add (Add_820), LeakyRelu (LeakyRelu_821), Conv (Conv_822), LeakyRelu (LeakyRelu_823), Conv (Conv_824), Add (Add_825), Add (Add_826), Div (Div_827), LeakyRelu (LeakyRelu_828), ConvTranspose (ConvTranspose_829), Conv (Conv_831), LeakyRelu (LeakyRelu_832), Conv (Conv_833), Add (Add_834), LeakyRelu (LeakyRelu_835), Conv (Conv_836), LeakyRelu (LeakyRelu_837), Conv (Conv_838), Add (Add_839), LeakyRelu (LeakyRelu_840), Conv (Conv_841), LeakyRelu (LeakyRelu_842), Conv (Conv_843), Add (Add_844), Conv (Conv_846), LeakyRelu (LeakyRelu_847), Conv (Conv_848), Add (Add_849), LeakyRelu (LeakyRelu_850), Conv (Conv_851), LeakyRelu (LeakyRelu_852), Conv (Conv_853), Add (Add_854), LeakyRelu (LeakyRelu_855), Conv (Conv_856), LeakyRelu (LeakyRelu_857), Conv (Conv_858), Add (Add_859), Add (Add_860), LeakyRelu (LeakyRelu_861), Conv (Conv_862), LeakyRelu (LeakyRelu_863), Conv (Conv_864), Add (Add_865), LeakyRelu (LeakyRelu_866), Conv (Conv_867), LeakyRelu (LeakyRelu_868), Conv (Conv_869), Add (Add_870), LeakyRelu (LeakyRelu_871), Conv (Conv_872), LeakyRelu (LeakyRelu_873), Conv (Conv_874), Add (Add_875), Add (Add_876), Div (Div_877), LeakyRelu (LeakyRelu_878), ConvTranspose (ConvTranspose_879), Conv (Conv_881), LeakyRelu (LeakyRelu_882), Conv (Conv_883), Add (Add_884), LeakyRelu (LeakyRelu_885), Conv (Conv_886), LeakyRelu (LeakyRelu_887), Conv (Conv_888), Add (Add_889), LeakyRelu (LeakyRelu_890), Conv (Conv_891), LeakyRelu (LeakyRelu_892), Conv (Conv_893), Add (Add_894), Conv (Conv_896), LeakyRelu (LeakyRelu_897), Conv (Conv_898), Add (Add_899), LeakyRelu (LeakyRelu_900), Conv 
(Conv_901), LeakyRelu (LeakyRelu_902), Conv (Conv_903), Add (Add_904), LeakyRelu (LeakyRelu_905), Conv (Conv_906), LeakyRelu (LeakyRelu_907), Conv (Conv_908), Add (Add_909), Add (Add_910), LeakyRelu (LeakyRelu_911), Conv (Conv_912), LeakyRelu (LeakyRelu_913), Conv (Conv_914), Add (Add_915), LeakyRelu (LeakyRelu_916), Conv (Conv_917), LeakyRelu (LeakyRelu_918), Conv (Conv_919), Add (Add_920), LeakyRelu (LeakyRelu_921), Conv (Conv_922), LeakyRelu (LeakyRelu_923), Conv (Conv_924), Add (Add_925), Add (Add_926), Div (Div_927), LeakyRelu (LeakyRelu_928), ConvTranspose (ConvTranspose_929), Conv (Conv_931), LeakyRelu (LeakyRelu_932), Conv (Conv_933), Add (Add_934), LeakyRelu (LeakyRelu_935), Conv (Conv_936), LeakyRelu (LeakyRelu_937), Conv (Conv_938), Add (Add_939), LeakyRelu (LeakyRelu_940), Conv (Conv_941), LeakyRelu (LeakyRelu_942), Conv (Conv_943), Add (Add_944), Conv (Conv_946), LeakyRelu (LeakyRelu_947), Conv (Conv_948), Add (Add_949), LeakyRelu (LeakyRelu_950), Conv (Conv_951), LeakyRelu (LeakyRelu_952), Conv (Conv_953), Add (Add_954), LeakyRelu (LeakyRelu_955), Conv (Conv_956), LeakyRelu (LeakyRelu_957), Conv (Conv_958), Add (Add_959), Add (Add_960), LeakyRelu (LeakyRelu_961), Conv (Conv_962), LeakyRelu (LeakyRelu_963), Conv (Conv_964), Add (Add_965), LeakyRelu (LeakyRelu_966), Conv (Conv_967), LeakyRelu (LeakyRelu_968), Conv (Conv_969), Add (Add_970), LeakyRelu (LeakyRelu_971), Conv (Conv_972), LeakyRelu (LeakyRelu_973), Conv (Conv_974), Add (Add_975), Add (Add_976), Div (Div_977), LeakyRelu (LeakyRelu_978), Conv (Conv_979), Tanh (Tanh_980), Squeeze (Squeeze_981), FusedConv (Conv_366_Relu_367), FusedConv (Conv_533_Relu_534), FusedConv (Conv_563_Relu_564), FusedConv (Conv_730_Relu_731), LayerNormalization (LayerNormalization), LayerNormalization (LayerNormalization_token_0), LayerNormalization (LayerNormalization_token_1), LayerNormalization (LayerNormalization_token_2), LayerNormalization (LayerNormalization_token_3), LayerNormalization 
(LayerNormalization_token_4), LayerNormalization (LayerNormalization_token_5), LayerNormalization (LayerNormalization_token_6), LayerNormalization (LayerNormalization_token_7), LayerNormalization (LayerNormalization_token_8), LayerNormalization (LayerNormalization_token_9), ]

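Dropping decode.onnx into https://netron.app/ and comparing it against the log above might reveal something. However, forcing node placement by tweaking onnxruntime is probably impossible, so it will likely come down to splitting the model at onnx conversion time.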
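
To make a log like the one above easier to read, the node placements can be tallied per provider and operator type. The following is a rough sketch using only the standard library; the regular expressions are written against the format shown in this log and are not an official onnxruntime interface. The sample string is a heavily abbreviated excerpt of the log above.

```python
import re
from collections import Counter

# "Provider: [Name]: [Op (Node), Op (Node), ...]"; the body may wrap across lines.
PLACEMENT_RE = re.compile(r"Provider: \[(\w+)\]: \[([^\]]*)\]")
NODE_RE = re.compile(r"(\w+)\s+\((\w+)\)")


def count_ops_by_provider(log_text):
    """Map each execution provider to a Counter of operator types placed on it."""
    result = {}
    for provider, body in PLACEMENT_RE.findall(log_text):
        result[provider] = Counter(op for op, _node in NODE_RE.findall(body))
    return result


sample = (
    "2021-10-25 10:40:38.356746062 [V:onnxruntime:, inference_session.cc:159 "
    "VerifyEachNodeIsAssignedToAnEp]  Provider: [CPUExecutionProvider]: "
    "[Gather (Gather_10), Unsqueeze (Unsqueeze_15), Concat (Concat_17), Gather (Gather_35), ]\n"
    "2021-10-25 10:40:38.356816599 [V:onnxruntime:, inference_session.cc:159 "
    "VerifyEachNodeIsAssignedToAnEp]  Provider: [CUDAExecutionProvider]: "
    "[Unsqueeze (Unsqueeze_0), Conv (Conv_368), Range \n(Range_251), ]"
)

if __name__ == "__main__":
    for provider, ops in count_ops_by_provider(sample).items():
        print(provider, dict(ops))
```

Run against the full log, a tally like this makes it quick to see which operator types were assigned to the CPU provider and may therefore force data to move between devices.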

Hiroshiba

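Thank you for adding the comments and so on; it is very easy to read!
I left just one more comment. Once that is addressed, I'd like to merge for now.

I measured the case where the GPU is slower, and found that the result flips as the text gets longer!

The texts grow like this: the first is "テスト、0", the second is "テスト、0、1", and so on.
It seems that whether CPU or GPU is faster depends on the character count (or rather, on the total phoneme length output by yukarin_s).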
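
The growing test texts described above, and the point at which the GPU overtakes the CPU, can be sketched like this. Both helper names are hypothetical, and the timings themselves have to come from real measurements; none are included here.

```python
def make_texts(count):
    """Return ["テスト、0", "テスト、0、1", ...] with `count` entries."""
    return ["テスト、" + "、".join(str(i) for i in range(n + 1)) for n in range(count)]


def first_gpu_win(cpu_seconds, gpu_seconds):
    """Index of the first text for which the GPU run was faster, or None."""
    for i, (cpu, gpu) in enumerate(zip(cpu_seconds, gpu_seconds)):
        if gpu < cpu:
            return i
    return None


if __name__ == "__main__":
    for text in make_texts(3):
        print(text)
```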

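> Communication is probably a significant cost

Agreed! My impression is that both memory transfers and command dispatch carry odd costs.
There may be ways to speed up CPU→GPU memory transfers and the reverse (pinned memory, in PyTorch terms).
I suspect command dispatch needs CUDA-level optimization. It should be possible to go faster when the text is short.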

onnx/python/core/__init__.py

Hiroshiba

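LGTM...!!!!!!!!!!!

This pull request solves two major problems for VOICEVOX.

The first is the very large download size.
Because checks, encryption, and so on are needed, it can't go into the production VOICEVOX right away, but the size should drop drastically, probably in the minor update after next.

The second is that it provides the first step of a C++ implementation of the core, which had not been open-sourced at all.
As long as the inputs and outputs match, it should work with other models too, and (though it is a very steep road) people should be able to build their own VOICEVOX, so I think this is a big step.

At release I'd like to announce this somewhat prominently and mention that it is @Yosshi999's contribution. (If you don't mind, please tell me your Twitter ID...!)

On the other hand, adding features to this implementation requires the onnx-converted models.
Right now the models in vv_core_inference have to be converted, which makes running checks quite laborious.
If you don't mind, I'd also like to ask for a pull request that adds the models from vv_core_inference to this repository, and for additional documentation.

Thank you so much!!!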

Yosshi999 · 2021-10-28T18:06:40Z

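> If you don't mind, please tell me your Twitter ID

😄 https://twitter.com/__dAi00

> a pull request that adds the models from vv_core_inference to this repository

One of them (decode.onnx) is close to 50 MB; is it OK to just commit it as-is? I do have a gdrive link, but it doesn't feel right to keep it somewhere where sharing could stop at my sole discretion.
How does GitHub LFS work with PRs again?

> additional documentation

👍
You mean fleshing out onnx/README.md?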

Hiroshiba · 2021-10-28T18:59:25Z

Thank you for the Twitter ID!

> One of them (decode.onnx) is close to 50 MB

We could use LFS, but at 50 MB the bandwidth runs out after just 20 pulls a month...
Around 50 MB isn't that heavy, so please just push it as-is for now!
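
The arithmetic behind the 20-pull figure can be written out as below. The monthly bandwidth quota is an assumption: 1 GB (taken here as 1000 MB) is the value the 20-pull figure implies, and the function name is hypothetical.

```python
def pulls_until_quota(file_mb, quota_mb):
    """How many full downloads of a file fit into a monthly bandwidth quota."""
    return quota_mb // file_mb


if __name__ == "__main__":
    # 50 MB model file, assumed 1000 MB monthly LFS bandwidth.
    print(pulls_until_quota(50, 1000))
```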

> fleshing out onnx/README.md

Exactly!
Like the Python code example in README.md, something that runs if you just copy and paste it would be good, I think!

Yosshi999 added 2 commits October 19, 2021 16:11

support onnx

70bf782

add README

ad9c70f

Hiroshiba reviewed Oct 19, 2021

Yosshi999 added 2 commits October 20, 2021 01:38

copy cuda-provider dll

deb7def

dependency: onnxruntime_providers_shared.dll

8581335

Yosshi999 added 5 commits October 21, 2021 17:50

refactoring

a8908cc

python setup

aaea8df

bugfix

27e680a

set rpath

27383cf

bugfix: old gcc can't handle std::filesystem

59c2016

Yosshi999 marked this pull request as ready for review October 21, 2021 12:44

set runtime_lib_dirs to empty for Windows

fc0c344

Hiroshiba reviewed Oct 24, 2021

onnx/python/setup.py (outdated, resolved)

Hiroshiba requested changes Oct 24, 2021

Yosshi999 added 2 commits October 25, 2021 18:22

apply suggestions

1eac345

bugfix: broken variable

bf63a04

Yosshi999 force-pushed the support-onnx branch from c353702 to bf63a04 October 25, 2021 10:07

Yosshi999 added 4 commits October 25, 2021 19:13

bugfix: some compilers can't infer type

03f65d8

leave todo comment and eliminate exception

1a01df5

re-initialization

6d36b02

finalize

5522745

tuning

38ff823

Hiroshiba requested changes Oct 26, 2021

onnx/python/core/__init__.py (outdated, resolved)

Yosshi999 added 2 commits October 27, 2021 22:05

fix broken document

8f95ae7

now finalize is called on user-side

af7de9f

Hiroshiba approved these changes Oct 27, 2021

Hiroshiba merged commit 291ea10 into VOICEVOX:main Oct 27, 2021

This was referenced Oct 27, 2021

Change the file structure so that it can be built #27

Closed

Build with Github Actions #28

Closed

aoirint mentioned this pull request Oct 28, 2021

Add an automated build for the ONNX version #33

Merged

3 tasks

Yosshi999 mentioned this pull request Oct 30, 2021

Expand the onnx version documentation & bundle the models #34

Merged

Hiroshiba mentioned this pull request Nov 10, 2021

Expand GPU support with onnx runtime #39

Closed

Hiroshiba mentioned this pull request Dec 2, 2021

Convert the core to onnx VOICEVOX/voicevox_project#4

Closed

Hiroshiba mentioned this pull request Jan 6, 2022

Synthesis with the preview onnx core sometimes produces odd audio when the leading/trailing silence is set to about 0.3~0.4 s #62

Closed

qryxip pushed a commit to qryxip/voicevox_core that referenced this pull request Jan 19, 2023

Merge pull request VOICEVOX#21 from SHAREVOX/sv-release-0.1

59c0f02

to 0.1.3
