Questions about get_downsample_rate() in upstream/mockingjay/expert.py #96

Closed
Sreyan88 opened this issue Mar 1, 2021 · 5 comments
Labels
paper / research / method Questions related to paper or research

Comments

@Sreyan88

Sreyan88 commented Mar 1, 2021

Hi there,

There might be a possible mistake in:

def get_downsample_rate(self):
    return 160

Since each audio clip is downsampled to 16 kHz, I think this should be 16?

I have the same doubt with:

example_wavs = [torch.zeros(160000, dtype=torch.float).to(device) for _ in range(16)]

in upstream/mockingjay/example_extract.py. Should it be 16000 instead?

Please let me know if I am making a mistake anywhere. Thank You.

@leo19941227
Member

16 kHz is the sample rate of the raw audio waveform.
160 is the further downsample rate of Mockingjay.

@Sreyan88
Author

Sreyan88 commented Mar 2, 2021

Thank you for the clarification! :) It would be great if you could provide a link or a short explanation of how the downsample rate of Mockingjay is used, as I am new to this. :)

@Sreyan88 Sreyan88 closed this as completed Mar 2, 2021
@Sreyan88 Sreyan88 reopened this Mar 2, 2021
@andi611
Member

andi611 commented Mar 2, 2021

Hi,

Given 16 kHz audio, every downsample_rate audio samples are encoded into one representation vector. That is what downsample_rate means here.
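
As a rough illustration of what this means in time units, the sketch below converts a downsample rate into a frame duration and a frame rate. The two helper functions are hypothetical and not part of the repository; only the values 16000 and 160 come from this thread.

```python
SAMPLE_RATE = 16000     # samples per second of the raw waveform (from this thread)
DOWNSAMPLE_RATE = 160   # value returned by get_downsample_rate() (from this thread)


def frame_duration_ms(sample_rate: int, downsample_rate: int) -> float:
    """Milliseconds of audio covered by one representation vector."""
    return 1000.0 * downsample_rate / sample_rate


def frames_per_second(sample_rate: int, downsample_rate: int) -> float:
    """Number of representation vectors produced per second of audio."""
    return sample_rate / downsample_rate


print(frame_duration_ms(SAMPLE_RATE, DOWNSAMPLE_RATE))  # 10.0
print(frames_per_second(SAMPLE_RATE, DOWNSAMPLE_RATE))  # 100.0
```

Under these numbers, one representation vector corresponds to about 10 ms of audio, which is why 160 and 16 kHz are different quantities.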

> I have the same doubt with:
> example_wavs = [torch.zeros(160000, dtype=torch.float).to(device) for _ in range(16)]
> in upstream/mockingjay/example_extract.py. Should it be 16000 instead?

As such, the waveform length can be any positive integer, as in torch.zeros(any_num, dtype=torch.float).

The downsampling rate is a property of the feature preprocessing itself, not of Mockingjay.
It just happens that we pre-train Mockingjay on features that have a downsampling rate of 160, as is usually done for reconstruction methods (for example APC, VQ-APC, NPC, etc).
Mockingjay will generate one representation per input acoustic frame.
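
To make the point about input length concrete, here is a minimal sketch that estimates how many representation vectors different waveform lengths would yield. It assumes roughly one frame per 160 samples and uses floor division; the exact count depends on the windowing and padding of the feature extractor, which this thread does not specify. `approx_num_frames` is a hypothetical helper, not a function from the repository.

```python
def approx_num_frames(num_samples: int, downsample_rate: int = 160) -> int:
    """Approximate frame count, assuming one frame per `downsample_rate` samples.

    This is an estimate only: real feature extractors may add or drop a
    frame at the edges depending on window length and padding.
    """
    if num_samples <= 0:
        raise ValueError("num_samples must be a positive integer")
    return num_samples // downsample_rate


# 160000 samples is the length used in example_extract.py (10 s at 16 kHz);
# 16000 samples (1 s) or any other positive length is handled the same way.
for n in (16000, 160000, 12345):
    print(n, approx_num_frames(n))
```

The length passed to torch.zeros only changes how many frames come out, not whether the call is valid.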

I will close this for now since there is no mistake.
You can ask more follow-up questions in this thread if needed.

Andy

@andi611 andi611 closed this as completed Mar 2, 2021
@andi611 andi611 changed the title Possible mistake in get_downsample_rate() in upstream/mockingjay/expert.py Questions about get_downsample_rate() in upstream/mockingjay/expert.py Mar 2, 2021
@andi611 andi611 added the paper / research / method Questions related to paper or research label Mar 2, 2021
@Sreyan88
Author

Sreyan88 commented Mar 9, 2021

Thank you so much for the great explanation, @andi611! I am reading through the repo and paper with the aim of submitting a pull request soon. I will post more questions in this thread! :)

@andi611
Member

andi611 commented Mar 9, 2021

> Thank you so much for the great explanation, @andi611! I am reading through the repo and paper with the aim of submitting a pull request soon. I will post more questions in this thread! :)

That sounds really nice! We look forward to your pull request!

@chenjoachim chenjoachim mentioned this issue Sep 14, 2023