Add extra arguments to hubert pretrain factory functions #2345
Conversation
@nateanl has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
The factory functions here support the `mask_channel_length` parameter, unlike what the PR summary describes -- which one should it be?
torchaudio/models/wav2vec2/model.py (outdated)
@@ -1096,6 +1114,9 @@ def hubert_pretrain_xlarge(
     encoder_ff_interm_dropout: float = 0.0,
     encoder_dropout: float = 0.0,
     encoder_layer_drop: float = 0.0,
+    mask_prob: float = 0.8,
+    mask_channel_prob: float = 0.0,
+    mask_channel_length: float = 10,
 ) -> HuBERTPretrainModel:
     # Overriding the signature so that the return type is correct on Sphinx
     """hubert_pretrain_xlarge(encoder_projection_dropout: float = 0.0, encoder_attention_dropout: float = 0.0, encoder_ff_interm_dropout: float = 0.0, encoder_dropout: float = 0.0, encoder_layer_drop: float = 0.0) -> torchaudio.models.HuBERTPretrainModel
same as above -- add new params to sphinx signature override
Looks good, thanks
Summary:
In different pre-training and fine-tuning settings, the values of `mask_prob`, `mask_channel_prob`, and `mask_channel_length` differ. For example, the settings in [pre-training](https://github.com/pytorch/fairseq/blob/main/examples/hubert/config/pretrain/hubert_base_librispeech.yaml#L70) and [fine-tuning](https://github.com/pytorch/fairseq/blob/main/examples/hubert/config/finetune/base_10h.yaml#L69-L73) are different. The motivation is to avoid overfitting when fine-tuning on a small dataset (example: [fine-tune on 10 minutes of audio](https://github.com/pytorch/fairseq/blob/main/examples/wav2vec/config/finetuning/vox_10m.yaml#L57-L59)).

This PR adds the required arguments to the factory functions so that they can be tuned for pre-training and fine-tuning. `mask_length` is set to `10` by default in all cases, hence it is not included in the factory functions.

Pull Request resolved: pytorch#2345

Reviewed By: carolineechen, xiaohui-zhang

Differential Revision: D35845117

Pulled By: nateanl

fbshipit-source-id: 0cbb74d09535d189b8258aa8ee0f88779bdb77e7
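For illustration, here is a minimal sketch of how the new arguments might be passed, assuming a torchaudio build that includes this change. Only the parameter names and their defaults come from this PR; the fine-tuning values below are placeholders, not values taken from the linked fairseq configs.

```python
import torchaudio

# Pre-training: rely on the defaults added by this PR
# (mask_prob=0.8, mask_channel_prob=0.0, mask_channel_length=10).
pretrain_model = torchaudio.models.hubert_pretrain_base()

# Fine-tuning on a small dataset: lower mask_prob and enable channel masking
# to reduce overfitting. These numbers are illustrative placeholders, not the
# values from any particular fairseq recipe.
finetune_model = torchaudio.models.hubert_pretrain_base(
    mask_prob=0.65,
    mask_channel_prob=0.5,
    mask_channel_length=64,
)
```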