
Here is a bug on linear loss computation #113

Closed · begeekmyfriend opened this issue Jul 25, 2018 · 11 comments

begeekmyfriend (Contributor) commented Jul 25, 2018

In the expression computing the linear loss, num_mels should have been num_freq. See Keith Ito's version. It seems this model does not compute the loss over the full effective bandwidth of the audio.
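For clarity, a minimal sketch of the fix, paraphrasing the line in question (hp.num_mels and hp.num_freq follow the usual hyperparameter names in this repo; the exact surrounding code may differ):

# Buggy: derives the priority band from the mel channel count (80),
# even though the slice is later taken over the linear spectrogram.
n_priority_freq = int(4000 / (hp.sample_rate * 0.5) * hp.num_mels)

# Fixed: derive it from the linear-spectrogram bin count (n_fft / 2 + 1),
# as in Keith Ito's implementation.
n_priority_freq = int(4000 / (hp.sample_rate * 0.5) * hp.num_freq)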

Yeongtae (Contributor) commented Jul 25, 2018

@begeekmyfriend How do we fix it? Just replace num_mels with num_freq?
If we fix it, what improvement do we get compared with the previous version?

In my opinion, the Tacotron part converges well, unlike the wavenet_vocoder part.

begeekmyfriend (Contributor, Author) commented Jul 25, 2018

linear_loss = 0.5 * tf.reduce_mean(l1) + 0.5 * tf.reduce_mean(l1[:,:,0:n_priority_freq])

This expression means we use a weight of 0.5 on the whole frequency bandwidth plus a weight of 0.5 on the priority bandwidth as the complete linear loss for training the model. The num_freq factor controls how much of that bandwidth is covered, so in my humble opinion, with the fix the higher frequencies of the targets also take part in fitting the ground-truth audio. Therefore Keith Ito's version is the one to follow.
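To make the effect concrete, here is a quick numeric check (assuming the LJSpeech-style values sample_rate = 22050, num_mels = 80 and num_freq = 1025 quoted later in this thread):

# Number of the 1025 linear-spectrogram bins that receive the extra 0.5 weight.
sample_rate, num_mels, num_freq = 22050, 80, 1025

buggy = int(4000 / (sample_rate * 0.5) * num_mels)   # 29 bins, i.e. only ~0-312 Hz
fixed = int(4000 / (sample_rate * 0.5) * num_freq)   # 371 bins, i.e. ~0-4 kHz as intended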

Yeongtae (Contributor) commented Jul 25, 2018

@begeekmyfriend Thank you for the explanation.
Did you test it? Did it reduce the noise in results such as 'step-xxxx-wave-from-mels'?

begeekmyfriend (Contributor, Author) commented Jul 25, 2018

It does not affect the mel outputs. That said, the linear outputs typically sound better than the mel ones.
Here are my hyperparameters (still under testing). You can see that I use a 2048 FFT size and 1025 frequency bins with the Griffin-Lim vocoder.

	#Audio
	num_mels = 80, #Number of mel-spectrogram channels and local conditioning dimensionality
	num_freq = 1025, # (= n_fft / 2 + 1) only used when adding linear spectrograms post processing network
	rescale = True, #Whether to rescale audio prior to preprocessing
	rescaling_max = 0.999, #Rescaling value
	trim_silence = True, #Whether to clip silence in Audio (at beginning and end of audio only, not the middle)
	clip_mels_length = True, #For cases of OOM (Not really recommended, working on a workaround)
	max_mel_frames = 960,  #Only relevant when clip_mels_length = True

	# Use LWS (https://github.com/Jonathan-LeRoux/lws) for STFT and phase reconstruction
	# It's preferred to set True to use with https://github.com/r9y9/wavenet_vocoder
	# Does not work if n_fft is not a multiple of hop_size!!
	use_lws=False,
	silence_threshold=2, #silence threshold used for sound trimming for wavenet preprocessing

	#Mel spectrogram
	n_fft = 2048, #Extra window size is filled with 0 paddings to match this parameter
	hop_size = None, #For 22050Hz, 275 ~= 12.5 ms
	win_size = 1100, #For 22050Hz, 1100 ~= 50 ms (If None, win_size = n_fft)
	sample_rate = 22050, #22050 Hz (corresponding to ljspeech dataset)
	frame_shift_ms = 12.5,
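As a quick sanity check on these values (a standalone sketch, not code from the repo), the derived quantities are consistent with the comments above:

# Hypothetical consistency check for the hyperparameters above.
n_fft, sample_rate, frame_shift_ms = 2048, 22050, 12.5

num_freq = n_fft // 2 + 1                             # 1025 linear bins
hop_size = int(frame_shift_ms / 1000 * sample_rate)   # 275 samples ~= 12.5 ms

print(num_freq, hop_size)  # 1025 275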

Yeongtae (Contributor) commented:

[image]
@begeekmyfriend does it affect this part?
Thanks a lot.

begeekmyfriend (Contributor, Author) commented Jul 25, 2018

It definitely does, because I have enlarged both the FFT size and the number of frequency bins of the linear outputs, so the audio signal processing is affected. That means you have to pre-process the whole audio dataset again and train from scratch. Note that these hyperparameters do not match the WaveNet vocoder; they are only for Griffin-Lim.
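As a minimal illustration (assuming librosa for the STFT; the helper name is only for this sketch), the cached features depend directly on these settings, which is why they must be regenerated whenever the settings change:

import numpy as np
import librosa

def linear_spectrogram(wav_path, n_fft=2048, hop_length=275, win_length=1100, sr=22050):
    # Any change to n_fft / hop_length / win_length changes the shape and
    # content of this array, so all pre-computed training features go stale.
    wav, _ = librosa.load(wav_path, sr=sr)
    stft = librosa.stft(wav, n_fft=n_fft, hop_length=hop_length, win_length=win_length)
    return np.abs(stft)  # shape: [n_fft // 2 + 1, frames] == [1025, frames]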

begeekmyfriend (Contributor, Author) commented Jul 25, 2018

We can also use an L2 loss for the linear outputs if you think L2 is better than L1:

n_priority_freq = int(4000 / (hp.sample_rate * 0.5) * hp.num_freq)
linear_loss = 0.5 * tf.losses.mean_squared_error(self.linear_targets, self.linear_outputs) \
        + 0.5 * tf.losses.mean_squared_error(self.linear_targets[:,:,0:n_priority_freq], self.linear_outputs[:,:,0:n_priority_freq])

The reason we prefer L2 is explained here: #4 (comment)

Yeongtae (Contributor) commented Jul 26, 2018

These are my test results.

With num_mels, 10,000 iterations:
step-10000-eval-mel-spectrogram [image]
step-10000-eval-align [image]

With num_freq, 10,000 iterations:
step-10000-eval-mel-spectrogram [image]
step-10000-eval-align [image]

begeekmyfriend (Contributor, Author) commented Jul 26, 2018

I am afraid there might be problems with your dataset. In my test it reached convergence in 4K steps when I adopted the solutions mentioned in the 5th and 8th comments above, i.e. using MSE for the linear loss.
step-4000-align [image]
And below is one of the results from Griffin-Lim at 15K steps:
step-15000-eval-waveform-linear.zip

Starlon87 commented:

@begeekmyfriend Why does your alignment already look so converged at 4,000 steps with loss = 0.59, while on my side at 70,000 steps with loss = 0.37 the alignment curve is still not a clean diagonal? ...

step-70000-eval-align [image]

Rayhane-mamah (Owner) commented:

@begeekmyfriend my good friend, you are correct once more!

I have fixed it. I apologize for the typo :) Thanks for your feedback!
