Memory footprint in Google Colab #2

Open
ghost opened this issue Dec 14, 2020 · 4 comments
Labels: enhancement (New feature or request)
@ghost commented Dec 14, 2020

Thank you so much for this great model! Wonderful job! I just have a little question about the memory required for the separation. The model seems to use a lot of memory: in Google Colab (the free version, since there is no Pro version for European users) I have to split the audio of a full song (> 1 min / 1 min 30 s) and resample it from hi-res (96000 Hz) to low-res (44100 Hz).

The current Jupyter notebook only shows processing on very short samples (a YouTube video). I've slightly modified the code to allow using audio from Google Drive, but it seems limited to low-resolution / short-duration audio files unless a splitting/merging audio subprocess is used. The same RAM footprint limitation was resolved in Spleeter (Deezer) by a similar method, but with some constraints (zero padding to remove from the audio) (issue here: deezer/spleeter#391 (comment)).

Has someone already done this?

@ws-choi (Owner) commented Dec 14, 2020

Hi MaxC2, thanks for the feedback.
As you mentioned, you have to resample the input file into a 44100 Hz audio file.
I'll add some code for automatic resampling later.
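For now, something like this minimal sketch works (librosa resamples while loading when you pass a target sr; the input path is just illustrative):

    import librosa

    # librosa resamples to the requested rate at load time
    audio, sr = librosa.load('input.wav', sr=44100, mono=False)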

You don't have to manually split and merge the audio yourself, though.
When you call the separate_track function of a pretrained model like

separated = model.separate_track(track.audio, 'vocals')

It automatically splits the given track into several sub-audio segments (each with the same number of samples, the last one zero-padded), separates the target source from each segment, and merges all the separated outputs into the final audio file.

Below is the code for this.


def separate_track(self, input_signal, target) -> torch.Tensor:

    import numpy as np

    self.eval()
    with torch.no_grad():
        # wrap the input in a dataset that yields fixed-size, zero-padded windows
        db = SingleTrackSet(input_signal, self.hop_length, self.num_frame)
        assert target in db.source_names
        separated = []

        # encode the requested source ('vocals', 'drums', ...) as a condition index
        input_condition = np.array(db.source_names.index(target))
        input_condition = torch.tensor(input_condition, dtype=torch.long, device=self.device).view(1)

        for item in db:
            # separate each window, then trim the padded edges off the output
            separated.append(self.separate(item.unsqueeze(0).to(self.device), input_condition)[0]
                             [self.trim_length:-self.trim_length].detach().cpu().numpy())

    # merge the trimmed windows back into one continuous signal
    separated = np.concatenate(separated, axis=0)

    import soundfile
    soundfile.write('temp.wav', separated, 44100)
    return soundfile.read('temp.wav')[0]

The PyTorch Dataset subclass SingleTrackSet automatically splits the given track on the fly.

After iterating over every sub-audio segment, separate_track merges all outputs with separated = np.concatenate(separated, axis=0).
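For illustration, here is a rough sketch of how such an on-the-fly windowing dataset can work (the WindowedTrack name and its constructor arguments are my own, not the actual SingleTrackSet implementation): each window carries trim_length extra samples on both sides, so concatenating the trimmed centers of the model outputs reconstructs the full track.

    import math
    import torch
    import torch.nn.functional as F
    from torch.utils.data import Dataset

    class WindowedTrack(Dataset):
        # Splits a (samples, channels) signal into fixed-size windows.
        # Each window keeps `trim` extra samples on both sides so the
        # model's edge artifacts can be trimmed off; regions before the
        # start and after the end of the signal are zero-padded.
        def __init__(self, signal, window_length, trim_length):
            self.signal = torch.as_tensor(signal, dtype=torch.float32)
            self.window = window_length
            self.trim = trim_length
            self.hop = window_length - 2 * trim_length  # samples kept per window
            self.n = math.ceil(self.signal.shape[0] / self.hop)

        def __len__(self):
            return self.n

        def __getitem__(self, i):
            start = i * self.hop - self.trim
            end = start + self.window
            pad_left = max(0, -start)
            pad_right = max(0, end - self.signal.shape[0])
            chunk = self.signal[max(start, 0):min(end, self.signal.shape[0])]
            # F.pad pads the last dimension, so transpose to (channels, samples)
            return F.pad(chunk.T, (pad_left, pad_right)).T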

Thank you.

@ghost (Author) commented Dec 15, 2020

Yes, it was probably because I tried to load 96 kHz audio with librosa (sr=96000) before calling separate_track, and got kicked out of Google Colab for running out of RAM. I retried with a 44.1 kHz file cut to ~1 min 30 s. So now I will test with a full song resampled to the right sample rate. Thank you very much for your support, and once again, well done on your great model!

@ws-choi ws-choi added the enhancement New feature or request label Dec 15, 2020
@ws-choi ws-choi self-assigned this Dec 15, 2020
@ghost (Author) commented Dec 15, 2020

OK, I've done some tests. The problem comes from the use of the embedded audio player display(Audio(audio, rate=rate)), which seems to duplicate the audio in some manner and uses a lot of RAM. So for a big audio file (e.g., more than 10 minutes) you always get kicked out of Google Colab for exceeding the RAM limit.

The trick is to skip the embedded audio preview and call the separate_track process directly.

In draft form, to use my audio stored in Google Drive, I've written two new cells. The first one is the usual Google Drive mount:

    from google.colab import drive
    drive.mount('/content/gdrive', force_remount=True)

The second loads any audio file, resamples it (best filter quality), and converts it to stereo if needed. Each processed temp.wav is renamed and written to a destination subfolder in Google Drive (separated in my case) in order to facilitate the download (zip file):

    import os
    import shutil
    import librosa
    import numpy as np
    import resampy

    gcolab_root = '/content/Conditioned-Source-Separation-LaSAFT/'
    gdrive_root = '/content/gdrive/My Drive/'

    destination_folder = 'separated'

    default_sample_rate = 44100
    sources = ['vocals', 'drums', 'bass', 'other']

    def load_audio(audio_path):
      # load at the native rate, then resample to 44100 Hz if needed
      audio, rate = librosa.load(audio_path, sr=None, mono=False)
      if rate != default_sample_rate:
        audio = resampy.resample(audio, rate, default_sample_rate, filter='kaiser_best')
      is_mono = audio.ndim == 1
      if is_mono:
        # duplicate the mono channel to get a stereo array
        audio = np.asfortranarray(np.array([audio, audio]))
      return audio, rate, is_mono

    def separate_all_sources(audio, gdrive_path):
      for src in sources:
        print("separate '%s'" % src)
        model.separate_track(audio.T, src)
        # separate_track writes temp.wav; copy it to Google Drive under the source name
        shutil.copy(os.path.join(gcolab_root, 'temp.wav'),
                    os.path.join(gdrive_path, src + '.wav'))

    # prepare the Google Drive destination folder
    path = os.path.join(gdrive_root, destination_folder)
    try:
      os.makedirs(path, exist_ok=True)
    except OSError:
      print("Directory '%s' can not be created" % path)

    print('load audio source')
    audio_file = os.path.join(gdrive_root, 'audio/stairway/center.flac')
    audio, rate, is_mono = load_audio(audio_file)
    separate_all_sources(audio, path)
    print('finished')

I still need to add a fallback to the original audio format (see the sketch below):

  • back to mono
  • back to the original sample rate
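
Something like this illustrative helper could do it (restore_format is my own name, not part of the repo; it assumes the separated audio has shape (samples, channels) as returned by separate_track, and uses resampy as in the cell above):

    import resampy

    def restore_format(audio, was_mono, original_rate, work_rate=44100):
      if was_mono:
        audio = audio.mean(axis=1)  # collapse duplicated channels back to mono
      if original_rate != work_rate:
        # resample from the 44100 Hz working rate back to the source rate
        audio = resampy.resample(audio, work_rate, original_rate,
                                 filter='kaiser_best', axis=0)
      return audio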

But for the moment this process works great on my audio files (> 10 min, 96 kHz / 24-bit) without offline preprocessing.

Maybe the idea would be to add an extra method in the Python code that does not write temp.wav in the root project folder, but instead writes named .wav files (vocals / drums / bass / other) into a temporary project subfolder (separated, for example), and then zips that folder and offers a download link in Google Colab after the separation.

That could help potential users who do not have a Google Drive account.
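A rough sketch of that zip-and-download step, assuming a local separated output folder as proposed above (shutil.make_archive and the standard Colab files.download helper do the rest):

    import shutil
    from google.colab import files

    # zip the output folder and trigger a browser download
    shutil.make_archive('separated', 'zip', 'separated')
    files.download('separated.zip')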

For my part, it's OK and works fine. Thank you very much.

@ws-choi (Owner) commented Dec 16, 2020

Thank you for sharing your experience.
I'll update the code to reflect what you've recommended, sooner or later 👍
