Inference from pre-trained model #5
Comments
Hi, the pretrained models have only seen English data and have not been verified to work well out of the box on other languages. Even for English, this is still experimental and not a well-established metric. We tried it on some data from other languages: the correlations were OK, but the errors were high. So you can certainly try it and see, but I would recommend interpreting the results with caution.
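To illustrate how correlations can be OK while errors are high, here is a minimal sketch in plain Python. The scores are made-up illustration values, not results from this model, and the helper names `pearson` and `mse` are hypothetical. A predictor that ranks utterances correctly but is shifted by a constant offset gets a near-perfect correlation and still a large squared error.

```python
import math


def pearson(xs, ys):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mse(xs, ys):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up listener scores and predictions (illustration only).
true_mos = [1.5, 2.0, 3.0, 4.0, 4.5]
# Every prediction is exactly one MOS point too low: the ranking is intact.
predicted_mos = [t - 1.0 for t in true_mos]

print("correlation:", pearson(true_mos, predicted_mos))
print("MSE:", mse(true_mos, predicted_mos))
```

In this toy case the correlation is approximately 1 while the MSE is 1.0, so the relative ordering of systems could still be informative even when the absolute predicted scores are not.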
Hi, I am following up on the same question. I would like to know whether it is possible to use your code to run inference on a dataset other than the one you used for the VoiceMOS Challenge. I tried to run the code, but it refers to your val_mos_list.txt. If it is possible, what steps should I follow? For the moment I have run:
python run_inference_for_challenge.py --datadir /mydata/
and the error I get is:
RuntimeError: Error loading audio file: failed to open file /home/aholab/sarah/IMS-Toucan/audios/Mono/Spanish_Aintzane/sys64e2f-uttad5f41e.wav
sys64e2f-uttad5f41e is one of the audio files listed in val_mos_list.txt, so it does not exist in my dataset.
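An error like the one above can be diagnosed by checking which entries of the list file have no matching audio file. The following is a rough sketch only: it assumes each line of the list starts with a wav file name followed by a comma, and that all wav files sit directly in one directory. Both are assumptions about the layout, not something confirmed in this thread, and `find_missing_wavs` is a hypothetical helper.

```python
from pathlib import Path


def find_missing_wavs(list_path, wav_dir):
    """Return file names from the list that do not exist in wav_dir.

    Assumes (hypothetically) one entry per line, with the wav file name
    as the first comma-separated field.
    """
    missing = []
    with open(list_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            name = line.split(",")[0]
            if not (Path(wav_dir) / name).is_file():
                missing.append(name)
    return missing
```

If this returns every entry of val_mos_list.txt, the list still describes the challenge data rather than your own recordings.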
Hi, we don't have straightforward inference scripts set up just yet, but we are in the process of adding some. In the meantime, please try the following. First, you have to download the pretrained models, which it sounds like you probably did already. (In Then you can look at
Replace this with a list of your own wav files. You can just put dummy MOS numbers there: they are only used for computing MSE, correlations, etc. to evaluate the trained MOS prediction model, and they do not affect the predictions themselves. There is also expected to be a file called MOS predictions for each wav file will be written to an output file called By the way, the model was trained for MOS prediction on audio that was downsampled to 16 kHz and normalized using sv56, so it is best if your input is also at a 16 kHz sampling rate and has been sv56-normalized.
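The list replacement and the sampling-rate advice above could be sketched roughly as follows, using only the Python standard library. Several things here are assumptions rather than facts from this thread: the "filename,score" line format, the dummy value 3.0, and the helper names `write_wav_list` and `files_not_at_target_rate`. The `wave` module only reads uncompressed PCM WAV files, and this sketch does not perform sv56 normalization, which needs a separate tool.

```python
import wave
from pathlib import Path

DUMMY_MOS = 3.0      # placeholder score (assumption); only used for evaluation metrics
TARGET_RATE = 16000  # the model was trained on 16 kHz audio


def write_wav_list(wav_dir, list_path, dummy_mos=DUMMY_MOS):
    """Write one 'filename,score' line per wav file found in wav_dir.

    The 'filename,score' format is an assumption; mirror the format of
    the list file you are replacing.
    """
    names = sorted(p.name for p in Path(wav_dir).glob("*.wav"))
    with open(list_path, "w", encoding="utf-8") as f:
        for name in names:
            f.write(f"{name},{dummy_mos}\n")
    return names


def files_not_at_target_rate(wav_dir, target_rate=TARGET_RATE):
    """Return (filename, sample_rate) pairs for files not at target_rate."""
    mismatched = []
    for path in sorted(Path(wav_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as wav_file:
            rate = wav_file.getframerate()
        if rate != target_rate:
            mismatched.append((path.name, rate))
    return mismatched
```

Any file reported by `files_not_at_target_rate` would need to be resampled to 16 kHz before inference for the input to match the training conditions.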
Hello,
I am considering using your pre-trained model to perform an objective evaluation of some Spanish and Basque language models. Could you tell me whether the checkpoint can be used on these languages as well, or whether the model is language-dependent?
Thank you