Inverse operations, wiener filter, softmask #5

Open
faroit opened this issue Feb 18, 2019 · 5 comments

@faroit
Collaborator

faroit commented Feb 18, 2019

Concerning additional operators, the most valuable at this point would be inverse operators. However, for this, we would have to wait until ISTFT is implemented.

But it might be nice to implement some operations already, such as Wiener filtering, soft masking, and binary masking, preferably all for multichannel spectrograms.

I could add this if you like the idea.
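
A minimal sketch of what soft and binary masking could look like for multichannel spectrograms in PyTorch. The function names `soft_mask` and `binary_mask`, the tensor layout `(sources, channels, freq, frames)`, and the `power` and `eps` parameters are assumptions made for illustration, not an existing API of this project. With `power=2` the soft mask is the single-channel Wiener-like ratio mask applied to each channel independently; a full multichannel Wiener filter would also model spatial covariances, which this sketch does not attempt. It assumes a PyTorch version with complex tensor support.

```python
import torch


def soft_mask(source_mags, mix_stft, power=2.0, eps=1e-10):
    """Ratio (soft) mask; hypothetical helper, not a project API.

    source_mags: (sources, channels, freq, frames), non-negative magnitudes.
    mix_stft:    (channels, freq, frames), complex mixture STFT.
    Returns complex source estimates shaped like source_mags.
    """
    energy = source_mags.clamp(min=0) ** power
    # Each source's share of the total energy in every time-frequency bin.
    mask = energy / (energy.sum(dim=0, keepdim=True) + eps)
    return mask * mix_stft.unsqueeze(0)


def binary_mask(source_mags, mix_stft):
    """Binary mask: each bin is assigned entirely to the loudest source."""
    n_sources = source_mags.shape[0]
    winner = source_mags.argmax(dim=0)  # (channels, freq, frames)
    mask = torch.nn.functional.one_hot(winner, n_sources)  # (..., sources)
    mask = mask.permute(3, 0, 1, 2).to(source_mags.dtype)
    return mask * mix_stft.unsqueeze(0)
```

Both functions keep the mixture phase and only rescale each bin, so the estimates of all sources add up to (approximately, for the soft mask) the mixture again.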

@dansuh17
Contributor

A Wiener filter would be incredible. I would love to have it.

@hagenw
Contributor

hagenw commented Mar 13, 2019

Do all the audio transforms have to work on torch.Tensor? So far I have implemented most of my audio transforms for numpy arrays and converted to torch.Tensor at the end.

For stft and istft this makes it possible to use librosa and to work easily with signals of arbitrary dimensions, e.g. (note: some code is omitted to keep the example short):

import librosa
import numpy as np

def stft(signal, window_size, hop_size, window='hann', axis=-1):
    # Apply librosa.stft to every 1-D slice of `signal` along `axis`.
    fft_config = dict(n_fft=window_size, hop_length=hop_size,
                      win_length=window_size, window=window)
    return np.apply_along_axis(librosa.stft, axis, signal, **fft_config)

def istft(spectrogram, window_size, hop_size, window='hann', axis=-2):
    ifft_config = dict(hop_length=hop_size, win_length=window_size,
                       window=window)
    # ... some reshaping code (omitted; presumably where D, f and t are defined)
    return np.apply_along_axis(_istft, axis, D, f, t, **ifft_config)

def _istft(spectrogram, frequency_bins, time_bins, **config):
    # Restore the 2-D (frequency, time) layout before inverting.
    spectrogram = np.reshape(spectrogram, [frequency_bins, time_bins])
    return librosa.istft(spectrogram, **config)

@faroit
Collaborator Author

faroit commented Mar 13, 2019

> Do all the audio transforms have to work on torch.Tensor?

Yes, the aim is that all audio transforms (not to be confused with augmentation transforms) should be able to run on the GPU as part of the model.

In any case, we would need to wait for the istft.
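
The aim stated above, a transform that runs on the GPU as part of the model, could be sketched as an `nn.Module` that wraps `torch.stft`. The class name `STFTLayer`, its default parameters, and the `(batch, channels, samples)` input layout are assumptions for illustration. The sketch also assumes a PyTorch version in which `torch.stft` accepts `return_complex=True`.

```python
import torch
from torch import nn


class STFTLayer(nn.Module):
    """Hypothetical STFT module that moves with the model to the GPU."""

    def __init__(self, n_fft=1024, hop_length=256):
        super().__init__()
        self.n_fft = n_fft
        self.hop_length = hop_length
        # A buffer follows the module across .to(device) / .cuda() calls.
        self.register_buffer("window", torch.hann_window(n_fft))

    def forward(self, audio):
        # audio: (batch, channels, samples)
        batch, channels, samples = audio.shape
        # torch.stft takes 1-D or 2-D input, so fold channels into the batch.
        flat = audio.reshape(batch * channels, samples)
        spec = torch.stft(flat, n_fft=self.n_fft, hop_length=self.hop_length,
                          window=self.window, return_complex=True)
        # Back to (batch, channels, freq, frames).
        return spec.reshape(batch, channels, spec.shape[-2], spec.shape[-1])


device = "cuda" if torch.cuda.is_available() else "cpu"
layer = STFTLayer(n_fft=512, hop_length=128).to(device)
spec = layer(torch.randn(2, 2, 4096, device=device))
```

Registering the window as a buffer rather than creating it in `forward` means it is moved together with the rest of the model and is not treated as a trainable parameter.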

@hagenw
Contributor

hagenw commented Mar 13, 2019

Ah, ok. I use all my audio transforms as augmentation transforms.

@faroit
Collaborator Author

faroit commented Mar 13, 2019

> Ah, ok. I use all my audio transforms as augmentation transforms.

Yes, currently I do the same. The dataset transforms are bound to the CPU, but maybe that will change. I think we would still benefit from a pure PyTorch implementation.
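
As a rough sketch of what such a pure PyTorch implementation might look like for signals of arbitrary dimensions, the following pair flattens all leading dimensions before calling `torch.stft` and `torch.istft`. The names `stft_nd` and `istft_nd` and the default parameters are hypothetical. It assumes a later PyTorch version that provides `torch.istft` and complex tensors, which was not available when this thread was written.

```python
import torch


def stft_nd(signal, n_fft=512, hop_length=128):
    """STFT over the last axis of a tensor with any number of leading dims."""
    window = torch.hann_window(n_fft, device=signal.device)
    lead = signal.shape[:-1]
    flat = signal.reshape(-1, signal.shape[-1])
    spec = torch.stft(flat, n_fft, hop_length=hop_length, window=window,
                      return_complex=True)
    # Restore the leading dims: (..., freq, frames).
    return spec.reshape(*lead, *spec.shape[-2:])


def istft_nd(spec, n_fft=512, hop_length=128, length=None):
    """Inverse of stft_nd; expects (..., freq, frames) complex input."""
    window = torch.hann_window(n_fft, device=spec.device)
    lead = spec.shape[:-2]
    flat = spec.reshape(-1, *spec.shape[-2:])
    signal = torch.istft(flat, n_fft, hop_length=hop_length, window=window,
                         length=length)
    # Restore the leading dims: (..., samples).
    return signal.reshape(*lead, signal.shape[-1])


x = torch.randn(2, 3, 4096)
y = istft_nd(stft_nd(x), length=x.shape[-1])
```

Passing `length` to the inverse trims the output back to the original number of samples, which makes the round trip directly comparable to the input.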
