Get rid of ONNX WeSpeaker in favor of its pytorch implementation #1537
Comments
Among the people who gave this issue a thumbs up, would anyone like to take care of it? |
Hi, I am the initiator of WeSpeaker. Thanks for your interest in our toolkit! |
Thanks @wsstriving! I worked on this a few days ago and already have a working prototype. Instead of adding one more dependency to I am just stuck with the fact that WeSpeaker uses the Apache-2.0 license, while pyannote uses the MIT license. Both are permissive, but I am not quite sure where and how to mention the WeSpeaker license in the pyannote codebase. Would putting it at the top of the Another option that I am considering is adding |
Hi Bredin, I think the first option is just fine. We implemented the CLI support and you can check it here: https://github.com/wenet-e2e/wespeaker/blob/master/docs/python_package.md
Now, it's easy to use the wespeaker model in pytorch as:
import wespeaker
model = wespeaker.load_model('english')
model.set_gpu(0)
print(model.model)
# model.model(feats)
Check https://github.com/wenet-e2e/wespeaker/blob/master/wespeaker/cli/speaker.py#L63 for more details on how to use it. |
Quick update:
Could any of you (who raised their thumbs) try the following:
|
I made a quick test. I haven't checked the results and I'm unsure of the pipeline definition. What I ran:
from pyannote.audio.pipelines import SpeakerDiarization
from pyannote.audio.pipelines.utils.hook import ProgressHook
import torch

pipeline = SpeakerDiarization(
    segmentation="pyannote/segmentation-3.0",
    embedding="pyannote/wespeaker-voxceleb-resnet34-LM",
)
pipeline.instantiate({
    "segmentation": {
        "min_duration_off": 0.0,
    },
    "clustering": {
        "method": "centroid",
        "min_cluster_size": 12,
        "threshold": 0.7045654963945799,
    },
})
pipeline.to(torch.device("mps"))
with ProgressHook() as hook:
    diarization = pipeline("./download/test.wav", hook=hook)
I got this warning: Seems to work on CPU. For GPU (Mac M1 Max) I got this error: |
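Since the run above works on CPU but fails on MPS, one possible stopgap is to try devices in order of preference and fall back to the next one when a run fails. The sketch below is plain Python and makes no claim about pyannote.audio internals: `run_with_fallback` and `run_on` are hypothetical names introduced only for illustration. In practice, `run_on(name)` would presumably call `pipeline.to(torch.device(name))` and then run the pipeline on the file.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def run_with_fallback(
    run_on: Callable[[str], T],
    devices: Sequence[str] = ("mps", "cuda", "cpu"),
) -> Tuple[str, T]:
    """Call `run_on(device)` for each device name in order.

    Returns the first device name that succeeded together with its result.
    `run_on` is a hypothetical, user-supplied callable.
    """
    errors = []
    for name in devices:
        try:
            return name, run_on(name)
        except RuntimeError as exc:
            # NotImplementedError is a subclass of RuntimeError,
            # so "operator not implemented" failures are caught here too.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all devices failed:\n" + "\n".join(errors))
```

Falling back silently can hide a real bug, so it is probably worth logging which device ended up being used.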
Thanks @stygmate for the feedback. To use the same setup as
from pyannote.audio.pipelines import SpeakerDiarization
from pyannote.audio.pipelines.utils.hook import ProgressHook
from pyannote.audio import Audio
import torch

pipeline = SpeakerDiarization(
    segmentation="pyannote/segmentation-3.0",
    segmentation_batch_size=32,
    embedding="pyannote/wespeaker-voxceleb-resnet34-LM",
    embedding_exclude_overlap=True,
    embedding_batch_size=32,
)
# other values of `*_batch_size` may lead to faster processing.
# larger is not necessarily faster.
pipeline.instantiate({
    "segmentation": {
        "min_duration_off": 0.0,
    },
    "clustering": {
        "method": "centroid",
        "min_cluster_size": 12,
        "threshold": 0.7045654963945799,
    },
})

# send the pipeline to your preferred device (keep only one of these)
# device = torch.device("cpu")
# device = torch.device("cuda")
device = torch.device("mps")
pipeline.to(device)

# load audio in memory (usually leads to faster processing)
audio = "audio.wav"  # placeholder: path to your own audio file
io = Audio(mono='downmix', sample_rate=16000)
waveform, sample_rate = io(audio)
file = {"waveform": waveform, "sample_rate": sample_rate}

# process the audio
with ProgressHook() as hook:
    diarization = pipeline(file, hook=hook)
I'd love to get feedback from you all regarding possible algorithmic or speed regressions. |
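To report on speed regressions, and to check the remark that larger batch sizes are not necessarily faster, a small timing harness can help. The following is a minimal sketch under stated assumptions: `time_batch_sizes` and `run` are hypothetical names, and `run(batch_size)` stands for "process the same in-memory file with the pipeline configured for that batch size". How the pipeline gets reconfigured is left to the caller.

```python
import time
from typing import Callable, Dict, Iterable


def time_batch_sizes(
    run: Callable[[int], object],
    batch_sizes: Iterable[int] = (1, 8, 16, 32, 64),
    repeats: int = 3,
) -> Dict[int, float]:
    """Return the best wall-clock time (in seconds) seen for each batch size.

    `run` is a hypothetical callable that processes one file
    with the given batch size.
    """
    timings = {}
    for batch_size in batch_sizes:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run(batch_size)
            best = min(best, time.perf_counter() - start)
        # keep the minimum to reduce the effect of warm-up and system noise
        timings[batch_size] = best
    return timings
```

Model loading should ideally happen outside `run`, so that only the processing time is measured. Running the same harness against the previous ONNX-based version and the pytorch one, on the same file and device, would give comparable numbers.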
@hbredin Give me a wav file to process, I will send you the results. |
Closing as the latest version no longer relies on ONNX runtime. |
It works OK. But I use torch 1.xx, and I made some changes for compatibility with both torch 1.xx and torch 2.xx.
And change Same changes in file \pyannote\audio\models\blocks\pooling.py |
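The exact changes are not shown above. As a generic illustration only, code that must run on both torch 1.xx and torch 2.xx often branches on the installed major version. The sketch below parses a version string such as the one exposed by `torch.__version__`; the helper names `major_minor` and `is_torch2` are hypothetical and are not part of torch or pyannote.audio.

```python
import re
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Parse a version string like '1.13.1+cu117' or '2.1.0' into (major, minor)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_torch2(version: str) -> bool:
    """True when the given version string denotes torch 2.0 or later."""
    return major_minor(version) >= (2, 0)
```

A caller would then write something like `if is_torch2(torch.__version__): ...` around the code path that differs between the two major versions.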
Thanks for the feedback (and the PR!). |
Since its introduction in pyannote.audio 3.x, the ONNX dependency seems to cause lots of problems to pyannote users: #1526 #1523 #1517 #1510 #1508 #1481 #1478 #1477 #1475
WeSpeaker does provide a pytorch implementation of its pretrained ResNet models.
Let's use this!