first attempt of mono downmix in the magnitude domain #45
base: master
The specific implementation seems alright in general. A somewhat higher-level question, though: are we going to have a waveform downmix, too? Ugh, all of a sudden I wondered: what if we have data types of
Ok, none of this comment is relevant to this commit :P
Hey, in #49, I tried to have this
torchaudio_contrib/functional.py
Outdated
@@ -47,6 +47,10 @@ def stft(signal, fft_len, hop_len, window,
    return spect


def spectral_downmix(tensor, power=1.0):
In general, I would prefer to add an option axis=-2 that defaults to the expected axis in the case of (batch, channel, time). But why restrict to that? It might happen that some user has (batch, other_stuff, channel, time).
okay, yes. I agree
Note, this is a general comment. If we add the axis option here, a user would expect the same for all other functionals as well. But it should be fine if we start with spectral_downmix and later add it to other functionals.
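To make the axis suggestion concrete, here is a minimal sketch of how such an option could look. It is not the code from this PR: the body of spectral_downmix is not shown in the thread, so the meaning of power (exponent applied to the magnitudes before averaging, undone afterwards) and the axis=-2 default are assumptions taken from the discussion above.

```python
import torch


def spectral_downmix(tensor, power=1.0, axis=-2):
    """Hypothetical sketch: average magnitudes along `axis`.

    Assumption: `power` is the exponent applied to the magnitudes before
    averaging (1.0 = magnitude, 2.0 = power) and is undone afterwards.
    """
    magnitudes = tensor.abs().pow(power)
    return magnitudes.mean(dim=axis).pow(1.0 / power)


# (batch, channel, time): the default axis=-2 is the channel axis.
x = torch.rand(4, 2, 100)
print(spectral_downmix(x).shape)  # torch.Size([4, 100])

# (batch, other_stuff, channel, time): the same default still works.
y = torch.rand(4, 3, 2, 100)
print(spectral_downmix(y).shape)  # torch.Size([4, 3, 100])
```

Counting the axis from the end is what lets the same default cover inputs with extra leading dimensions.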
157eb69 to fad5826
Oops? I have no idea why all the commits I made to a different branch became part of this one. But thanks! Here are some comments.
. Also, for a batch tensor, let's use a plural form!
Should be fine now. There was a misbehaved rebase going on after I pulled in the recent master. Regarding your requests, I will update those.
I don't think so. By having
(icymi, we didn't really put it to a vote, but
ha, yep, sorry... let's vote first ;-)
Are the tests added in master stable? In that case I can provide some for the downmix functions as well.
Yes, I think so.
#54 (comment) We got the names :)
@faroit Hey, I'm a bit lost, but it seems like we resolved all the naming issues as well as the implementations? Is it ready to be merged now?
Well, I want to add unit tests, but I am now unsure whether we can stick with pytest or not.
I see, +1 for unit tests. I think we have to come back to
Two small comments, plus we seem to have agreed to go from "noun+verb" to "verb+noun" naming!
@@ -103,6 +103,30 @@ def stft(waveforms, fft_len, hop_len, window,
    return complex_specgrams


def waveform_downmix(waveforms, ch_dim=1):
I'd just call it dim. There is only one dimension to specify, and this is consistent with pytorch. Someone may want to downmix across the batch dimension. Make sure it's included in the docstring, though. The docstring should also mention that it's downmixing by taking the mean.
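A sketch of what this suggestion could look like, assuming the function simply averages over the given dimension and drops it (whether the merged code keeps the dimension is not visible in this thread):

```python
import torch


def waveform_downmix(waveforms, dim=1):
    """Downmix waveforms by taking the mean over `dim`.

    Args:
        waveforms (Tensor): input, e.g. of shape (batch, channel, time).
        dim (int): dimension to average over. Defaults to 1, the channel
            dimension of a (batch, channel, time) tensor.

    Returns:
        Tensor: the input with `dim` averaged out (assumption: not kept).
    """
    return waveforms.mean(dim=dim)


# One stereo clip with two samples per channel.
stereo = torch.tensor([[[1.0, 2.0], [3.0, 4.0]]])  # (batch=1, channel=2, time=2)
print(waveform_downmix(stereo))  # tensor([[2., 3.]])
```

Naming the parameter dim mirrors torch.mean, so passing dim=0 would average across the batch instead.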
    Wrap torchaudio_contrib.waveform_downmix in an nn.Module.
    """

    def __init__(self):
The dim option should be available (and documented) here as well.
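A minimal sketch of a module wrapper that exposes and documents dim. The class name WaveformDownmix is hypothetical (the thread does not show the class name), and the functional is redefined here only to keep the example self-contained.

```python
import torch
from torch import nn


def waveform_downmix(waveforms, dim=1):
    """Stand-in for the functional: mean over `dim`."""
    return waveforms.mean(dim=dim)


class WaveformDownmix(nn.Module):  # hypothetical name
    """Wrap waveform_downmix in an nn.Module.

    Args:
        dim (int): dimension to average over. Defaults to 1.
    """

    def __init__(self, dim=1):
        super().__init__()
        self.dim = dim

    def forward(self, waveforms):
        return waveform_downmix(waveforms, dim=self.dim)


downmix = WaveformDownmix(dim=1)
print(downmix(torch.rand(4, 2, 100)).shape)  # torch.Size([4, 100])
```

Storing dim on the module keeps the layer and the functional in sync: whatever the functional accepts, the module forwards.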
Downmixing in the magnitude domain is the recommended way to downmix multichannel audio, since it's energy-preserving. Let me know if the API makes sense here.
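One way to illustrate the energy argument is a toy signal whose two channels are phase-inverted copies of each other. This is a sketch and not taken from the PR: it uses torch.fft.rfft on the whole signal rather than an STFT, and the signal is made up for the example.

```python
import math

import torch

# Toy stereo signal: the right channel is the phase-inverted left channel.
t = torch.arange(64, dtype=torch.float32)
left = torch.sin(2 * math.pi * 4 * t / 64)
stereo = torch.stack([left, -left])  # (channel=2, time=64)

# Waveform-domain downmix: the two channels cancel out.
wave_mix = stereo.mean(dim=0)

# Magnitude-domain downmix: average the per-channel magnitudes instead.
mag_mix = torch.fft.rfft(stereo, dim=-1).abs().mean(dim=0)

print(wave_mix.abs().max())  # zero, the signal is gone
print(mag_mix.max())         # clearly non-zero, the energy survives
```

Averaging magnitudes discards the phase, so out-of-phase channels can no longer cancel each other.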