Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

first attempt of mono downmix in the magnitude domain #45

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

faroit
Copy link
Collaborator

@faroit faroit commented May 11, 2019

Downmixing in the magnitude domain is the recommended way for multichannel audio, since its energy preserving. Let me know if the API makes sense here.

@faroit faroit added the Feature label May 11, 2019
@faroit faroit requested review from ksanjeevan and keunwoochoi May 11, 2019 09:57
@keunwoochoi
Copy link
Owner

The specific implementation seems alright in general. But probably a little more high-level question - are we gonna have waveform downmix, too?

Ugh, out of sudden, I felt like what if we have data types of WaveformTensor and SpectrumTensor with which we can do more like functional stuff e.g.,

x = WaveformTensor(batch_audio)
x.spectrogram().downmix().amplitude_to_decibel()

Ok none of this comment is not relevant to this commit :P

torchaudio_contrib/functional.py Show resolved Hide resolved
torchaudio_contrib/layers.py Outdated Show resolved Hide resolved
@keunwoochoi
Copy link
Owner

Hey, in #49, I tried to have this beta_something.py. Maybe without a test, this approach could make sense here, too? (not necessarily though)

@@ -47,6 +47,10 @@ def stft(signal, fft_len, hop_len, window,
return spect


def spectral_downmix(tensor, power=1.0):
Copy link
Contributor

@hagenw hagenw May 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I would prefer to add an option axis=-2 that defaults to the expected axis in the case of (batch, channel, time). But why restrict to that? it might happen that some user has (batch, other_stuff, channel, time).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, yes. I agree

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, this is a general comment. If we add the axis option here a user would expect the same for all other functionals as well. But it should be fine, if we start with spectral_downmix and later add it to other functionals.

@faroit faroit force-pushed the feature_spectrogram_downmix branch from 157eb69 to fad5826 Compare June 4, 2019 15:36
@keunwoochoi
Copy link
Owner

Oops? Have no idea why all the commits I made to a different branch became a part of here. But thanks! Here're some comments.

  • I understood we agreed that modifying Spectrogram API (mono) is a separate issue -- and indeed it is, so it should be completely removed here.
  • On the name, I commented here and am still of the opinion (hence SpectrumDownmix and WaveformDownmix). And there, because 'waveform' is a noun, I'd prefer Spectrum to Spectral.
  • For waveform downmix, given that the current torchaudio is going to be changed a lot + this type conversion doesn't seem alright for me, probably it's better to implement our own functional here? Which will be simply..
return torch.mean(waveforms, ch_dim, True)

. Also, for a batch tensor, let's use a plural form!

@faroit
Copy link
Collaborator Author

faroit commented Jun 4, 2019

Oops? Have no idea why all the commits I made to a different branch became a part of here. But thanks! Here're some comments.

should be fine now. There was a missbehaved rebase going on after I pulled in the recent master.

Regarding you requests, I will update those.

@faroit
Copy link
Collaborator Author

faroit commented Jun 4, 2019

DownmixSpectrum would also work on power_specgrams. Should I rename the input for this to just specgrams then?

@keunwoochoi
Copy link
Owner

I don't think so. By having power= args we're expecting the input to be mag_specgrams.

@keunwoochoi
Copy link
Owner

(icymi, we didn't really put it on a vote, but DownmixSpectrum --> SpectrumDownmix? especially if you agree, which is kinda voting :)

@faroit
Copy link
Collaborator Author

faroit commented Jun 4, 2019

(icymi, we didn't really put it on a vote, but DownmixSpectrum --> SpectrumDownmix? especially if you agree, which is kinda voting :)

ha, yep, sorry... let vote first ;-)

@faroit faroit marked this pull request as ready for review June 4, 2019 16:01
@faroit
Copy link
Collaborator Author

faroit commented Jun 4, 2019

are the tests added in the master stable? In that case I can provide some for the downmix functions as well

@keunwoochoi
Copy link
Owner

Yes I think so.

@keunwoochoi
Copy link
Owner

#54 (comment) We got the names :) DownmixWaveform and DownmixSpectrum.

@keunwoochoi
Copy link
Owner

@faroit Hey, I'm a bit lost, but seems like we resolved all the naming issues as well as the implementations? Is it ready to be merged now?

@faroit
Copy link
Collaborator Author

faroit commented Jun 10, 2019

Well, I want to add unit tests but I am confused now if we could stick with pytest or not?

@keunwoochoi
Copy link
Owner

I see, +1 for unit tests. I think we have to come back to unittest, at least for some 'core' features that we plan to make a PR there, and this one would be definitely core.

Copy link
Collaborator

@f0k f0k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small comments, plus we seem to have agreed to go from "noun+verb" to "verb+noun" naming!

@@ -103,6 +103,30 @@ def stft(waveforms, fft_len, hop_len, window,
return complex_specgrams


def waveform_downmix(waveforms, ch_dim=1):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just call it dim. There is only one dimension to specify, and this is consistent with pytorch. Maybe someone may want to downmix across the batch dimension. Make sure it's included in the docstring, though. The docstring should also mention that it's downmixing by taking the mean.

Wrap torchaudio_contrib.waveform_downmix in an nn.Module.
"""

def __init__(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dim option should be available (and documented) here as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants