
Add extra arguments to hubert pretrain factory functions #2345

Closed · wants to merge 2 commits

Conversation

@nateanl (Member) commented on Apr 22, 2022

In different pre-training and fine-tuning settings, the values of `mask_prob`, `mask_channel_prob`, and `mask_channel_length` differ. For example, the fairseq [pre-training](https://github.com/pytorch/fairseq/blob/main/examples/hubert/config/pretrain/hubert_base_librispeech.yaml#L70) and [fine-tuning](https://github.com/pytorch/fairseq/blob/main/examples/hubert/config/finetune/base_10h.yaml#L69-L73) configs use different values. The motivation is to avoid overfitting when fine-tuning on a small dataset (example: [fine-tuning on 10 minutes of audio](https://github.com/pytorch/fairseq/blob/main/examples/wav2vec/config/finetuning/vox_10m.yaml#L57-L59)).
This PR adds these arguments to the factory functions so that they can be tuned separately for pre-training and fine-tuning. `mask_length` is set to `10` by default in all cases, hence it is not included in the factory functions.
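A minimal sketch of how the new arguments might be passed, assuming the `hubert_pretrain_base` factory signature from this PR; the fine-tuning values below are illustrative, taken loosely from the fairseq `base_10h` config linked above, not defaults of the function:

```python
import torchaudio

# Pre-training: the defaults added by this PR already match the fairseq
# pre-training config (mask_prob=0.8, mask_channel_prob=0.0, mask_channel_length=10).
pretrain_model = torchaudio.models.hubert_pretrain_base()

# Fine-tuning on a small dataset: heavier channel masking to reduce overfitting.
# Illustrative values only, roughly following the fairseq base_10h config.
finetune_model = torchaudio.models.hubert_pretrain_base(
    mask_prob=0.65,
    mask_channel_prob=0.5,
    mask_channel_length=64,
)
```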

@facebook-github-bot (Contributor) commented:

@nateanl has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@carolineechen (Contributor) left a comment:

The factory functions here support the mask_channel_length parameter, unlike what the PR summary describes -- which one should it be?

torchaudio/models/wav2vec2/model.py (outdated, resolved)
@@ -1096,6 +1114,9 @@ def hubert_pretrain_xlarge(
encoder_ff_interm_dropout: float = 0.0,
encoder_dropout: float = 0.0,
encoder_layer_drop: float = 0.0,
mask_prob: float = 0.8,
mask_channel_prob: float = 0.0,
mask_channel_length: float = 10,
) -> HuBERTPretrainModel:
# Overriding the signature so that the return type is correct on Sphinx
"""hubert_pretrain_xlarge(encoder_projection_dropout: float = 0.0, encoder_attention_dropout: float = 0.0, encoder_ff_interm_dropout: float = 0.0, encoder_dropout: float = 0.0, encoder_layer_drop: float = 0.0) -> torchaudio.models.HuBERTPretrainModel
@carolineechen (Contributor) commented:

Same as above -- add the new params to the Sphinx signature override.
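The comment refers to the single-line signature string that the torchaudio docstring uses to override what Sphinx renders. A sketch of how that line might read once the new parameters are added (illustrative only; the exact merged text may differ):

```python
# Hypothetical updated Sphinx signature override for hubert_pretrain_xlarge;
# the merged docstring may differ in wording.
"""hubert_pretrain_xlarge(encoder_projection_dropout: float = 0.0, encoder_attention_dropout: float = 0.0, encoder_ff_interm_dropout: float = 0.0, encoder_dropout: float = 0.0, encoder_layer_drop: float = 0.0, mask_prob: float = 0.8, mask_channel_prob: float = 0.0, mask_channel_length: float = 10) -> torchaudio.models.HuBERTPretrainModel"""
```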

torchaudio/models/wav2vec2/model.py (resolved)
@nateanl (Member, Author) commented on Apr 22, 2022:

mask_channel_length is the one that is included. Since the mask_length value doesn't change between pre-training and fine-tuning, it is not included in the argument list.
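To make the distinction concrete, here is a rough, hypothetical simplification of the resulting factory-function shape (the real torchaudio factory builds and returns a HuBERTPretrainModel and forwards many more parameters):

```python
def hubert_pretrain_base_sketch(
    mask_prob: float = 0.8,
    mask_channel_prob: float = 0.0,
    mask_channel_length: float = 10,
) -> dict:
    # mask_length stays fixed at 10 and is deliberately not exposed, since it
    # does not change between the pre-training and fine-tuning recipes.
    return {
        "mask_length": 10,
        "mask_prob": mask_prob,
        "mask_channel_prob": mask_channel_prob,
        "mask_channel_length": mask_channel_length,
    }
```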

@facebook-github-bot (Contributor) commented:

@nateanl has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@carolineechen (Contributor) left a comment:

Looks good, thanks

xiaohui-zhang pushed a commit to xiaohui-zhang/audio that referenced this pull request May 4, 2022
Summary:
In different pre-training and fine-tuning settings, the `mask_prob`, `mask_channel_prob`, and `mask_channel_length` are different. For example, the settings in [pre-training](https://github.com/pytorch/fairseq/blob/main/examples/hubert/config/pretrain/hubert_base_librispeech.yaml#L70) and [fine-tuning](https://github.com/pytorch/fairseq/blob/main/examples/hubert/config/finetune/base_10h.yaml#L69-L73) are different. The motivation is to avoid overfitting when fine-tuning on a small dataset (example: [fine-tune on 10 minutes of audio](https://github.com/pytorch/fairseq/blob/main/examples/wav2vec/config/finetuning/vox_10m.yaml#L57-L59)).
This PR adds the required arguments in the factory functions to make them tunable for pre-training and fine-tuning. `mask_length` is set to `10` by default for all cases, hence it's not included in the factory function.

Pull Request resolved: pytorch#2345

Reviewed By: carolineechen, xiaohui-zhang

Differential Revision: D35845117

Pulled By: nateanl

fbshipit-source-id: 0cbb74d09535d189b8258aa8ee0f88779bdb77e7