[Feature] Existing streaming latency still takes time #417
PRs welcome.
Please compile the model, or try the quantized version.
@PoTaTo-Mika What do you mean by "compile the model"? And how do I use the quantized version? I have only followed the inference steps in the English documentation: https://speech.fish.audio/en/inference/#2-create-a-directory-structure-similar-to-the-following-within-the-ref_data-folder
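For reference, "the quantized version" in PyTorch projects usually means dynamic int8 quantization of the `Linear` layers, which shrinks weights and typically speeds up CPU inference. A minimal sketch with a generic stand-in module (the actual fish-speech model and its loading code are not shown, and this is not the project's own API):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a TTS decoder; any module with Linear layers
# can be quantized the same way.
model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 64)).eval()

# Dynamic quantization converts Linear weights to int8 at load time;
# activations stay in float and are quantized on the fly per batch.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.no_grad():
    out = qmodel(torch.randn(1, 256))
print(out.shape)  # torch.Size([1, 64])
```

"Compile the model" most likely refers to `torch.compile(model)`, which traces and fuses the forward pass into optimized kernels; it is a one-line wrapper around an existing `nn.Module`, at the cost of a slow first call while compilation happens.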
This issue is stale because it has been open for 30 days with no activity.
Streaming on a 4090 takes more than 2 seconds, depending on the number of tokens. Is there a way to yield/return audio while the engine is still generating?
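What the request describes is generator-style streaming: rather than waiting for the full token sequence, decode and yield an audio chunk as soon as each block of tokens is ready, so playback can begin while generation continues. A minimal pure-Python sketch; `generate_tokens` and `decode_chunk` are hypothetical stand-ins, not the fish-speech API:

```python
import time
from typing import Iterator

def generate_tokens(n: int) -> Iterator[int]:
    """Stand-in for the autoregressive token generator."""
    for i in range(n):
        time.sleep(0.001)  # simulate per-token latency
        yield i

def decode_chunk(tokens: list[int]) -> bytes:
    """Stand-in for vocoder decoding of one token block."""
    return bytes(len(tokens))

def stream_tts(n_tokens: int, chunk_size: int = 8) -> Iterator[bytes]:
    """Yield audio as each chunk of tokens completes, instead of
    blocking until the whole sequence is generated."""
    buf: list[int] = []
    for tok in generate_tokens(n_tokens):
        buf.append(tok)
        if len(buf) >= chunk_size:
            yield decode_chunk(buf)
            buf.clear()
    if buf:  # flush the remaining tail tokens
        yield decode_chunk(buf)

chunks = list(stream_tts(20))
print(len(chunks))  # 3 chunks: 8 + 8 + 4 tokens
```

A caller (e.g. an HTTP handler) can iterate `stream_tts(...)` and send each chunk immediately, which reduces time-to-first-audio even though total generation time is unchanged.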