
Inference from pre-trained model #5

Open
ssolito opened this issue Feb 20, 2023 · 3 comments

Comments

@ssolito

ssolito commented Feb 20, 2023

Hello,

I am considering using your pre-trained model to perform an objective evaluation of some Spanish and Basque language models. Could you tell me whether it is possible to use the checkpoint on these languages as well, or whether the model is language-dependent?
Thank you

@ecooper7
Contributor

Hi,

The pretrained models have only seen English data and haven't been verified to work well out-of-the-box on other languages (and even for English, it's still experimental rather than a well-established metric). We've tried it on some data from other languages: the correlations were OK, but the errors were high. So you can certainly try it and see, but I would recommend using it and interpreting the results with caution.

@ssolito
Author

ssolito commented Mar 14, 2023

Hi,

I am following up on the same question. I would like to know if it is possible to use your code to perform inference on a dataset other than the one you used for the VoiceMOS Challenge. I tried to run the code, but it obviously refers to your val_mos_list.txt.

If it is possible, what would be the steps to follow? For the moment I have run: python run_inference_for_challenge.py --datadir /mydata/

and the error I get is this:

RuntimeError: Error loading audio file: failed to open file /home/aholab/sarah/IMS-Toucan/audios/Mono/Spanish_Aintzane/sys64e2f-uttad5f41e.wav

sys64e2f-uttad5f41e is one of the audio files listed in val_mos_list.txt, so it does not exist in my dataset.

@ecooper7
Contributor

Hi,

We don't have straightforward inference scripts set up just yet, but we are in the process of adding some. In the meantime, please try the following:

First, you have to download the pretrained models, which it sounds like you have probably done already (see steps 1 and 2 in run_inference_for_challenge.py).

Then you can look at predict.py for running inference -- the data directory that you point it to is expected to have a subdirectory called wav, as well as a file called sets/val_mos_list.txt that is just a list of wav files and their MOS ratings, e.g.:

sys64e2f-utt8c3d2b2.wav,4.0
sys64e2f-utt3a1aedf.wav,3.625
sys64e2f-utt549b7c4.wav,4.125
sys64e2f-utt0c4d719.wav,3.75
sys64e2f-utt4eddf90.wav,3.625

Replace this with a list of your own wav files; you can just put dummy MOS numbers there. The ratings are only used for computing MSE, correlations, etc. to evaluate the trained MOS prediction model, and they don't affect the predictions themselves.
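
For example, here is a minimal Python sketch of preparing that layout: it writes sets/val_mos_list.txt with a dummy score for every wav in the wav subdirectory. The /mydata path and the 3.0 placeholder score are just assumptions for illustration.

# Sketch: generate sets/val_mos_list.txt with dummy MOS values
# for every wav under /mydata/wav (paths and the 3.0 score are placeholders).
import os

datadir = "/mydata"                       # the directory passed via --datadir
wav_dir = os.path.join(datadir, "wav")    # predict.py expects the wavs here
sets_dir = os.path.join(datadir, "sets")
os.makedirs(sets_dir, exist_ok=True)

with open(os.path.join(sets_dir, "val_mos_list.txt"), "w") as out:
    for name in sorted(os.listdir(wav_dir)):
        if name.endswith(".wav"):
            out.write(name + ",3.0\n")    # dummy MOS; only used for MSE/correlation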

A file called mydata_system.csv, which holds system-level averaged MOS values, is also expected. You can just comment out the section of code that uses it (from the part that starts with ### SYSTEM up to the part that says ## generate answer.txt for codalab).
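
Alternatively, if you prefer not to edit predict.py, you could generate a dummy mydata_system.csv yourself. The sketch below assumes the challenge-style naming where each wav is named <systemID>-<uttID>.wav and that the file is a simple system,MOS CSV sitting next to the sets directory; check predict.py for the exact path and format it expects.

# Sketch: build a dummy mydata_system.csv by averaging the per-utterance
# dummy scores per system prefix (filename format and CSV layout are assumptions).
import os
from collections import defaultdict

datadir = "/mydata"  # same placeholder path as above
per_system = defaultdict(list)
with open(os.path.join(datadir, "sets", "val_mos_list.txt")) as f:
    for line in f:
        wavname, mos = line.strip().split(",")
        per_system[wavname.split("-")[0]].append(float(mos))

with open(os.path.join(datadir, "mydata_system.csv"), "w") as out:
    for system, values in sorted(per_system.items()):
        out.write(f"{system},{sum(values) / len(values)}\n")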

MOS predictions for each wav file will be written to an output file called answer.txt.

By the way, the model was trained for MOS prediction on audio that was downsampled to 16 kHz and level-normalized using sv56, so it's best if your input matches this: a 16 kHz sampling rate and sv56 normalization.
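
As a rough illustration (not from the original instructions), resampling to 16 kHz could be done along these lines; sv56 level normalization is a separate step that requires the ITU-T sv56demo tool and is not covered here. librosa and soundfile are assumed to be installed.

# Sketch: downsample a wav to 16 kHz before prediction.
import librosa
import soundfile as sf

def resample_to_16k(in_path, out_path):
    audio, sr = librosa.load(in_path, sr=16000)  # load and resample in one step
    sf.write(out_path, audio, 16000)

resample_to_16k("original.wav", "wav/original_16k.wav")  # example paths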
