A project focused on the super-resolution of audio signals, i.e. improving the sound quality of a digital recording, be it a vocal recording or music.
The datasets used will be VCTK and a music dataset, possibly MagnaTagATune or the Million Song Dataset.
- Audio Super-Resolution Using Neural Nets
- Time-frequency Networks For Audio Super-Resolution
- Bandwidth extension on raw audio via generative adversarial networks
- Phase-aware music super-resolution using generative adversarial networks
- A two-stage U-Net for high-fidelity denoising of historical recordings
- Realistic Gramophone Noise Synthesis Using A Diffusion Model
- BEHM-GAN: Bandwidth Extension of Historical Music using Generative Adversarial Networks
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
- Imaging Time-Series to Improve Classification and Imputation
- Data augmentation approaches for improving animal audio classification
- Audiomentations - Python library for audio data augmentation
- Deep Learning for Audio Signal Processing
- Adversarial Training for Speech Super-Resolution
- MidiNet: A Convolutional Generative Adversarial Network for Symbolic-domain Music Generation
- On the evaluation of generative models in music
- INCO-GAN: Variable-Length Music Generation Method Based on Inception Model-Based Conditional GAN
- WSRGlow: A Glow-based Waveform Generative Model for Audio Super-Resolution
- Self-Attention for Audio Super-Resolution
- On Filter Generalization for Music Bandwidth Extension Using Deep Neural Networks
- NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling
- Temporal FiLM: Capturing Long-Range Sequence Dependencies with Feature-Wise Modulation
- Learning Continuous Representation of Audio for Arbitrary Scale Super Resolution
- An investigation of pre-upsampling generative modelling and Generative Adversarial Networks in audio super resolution
- TUNet: A Block-online Bandwidth Extension Model based on Transformers and Self-supervised Pretraining
- Vision-Infused Deep Audio Inpainting
- VoiceFixer: Toward General Speech Restoration with Neural Vocoder
- Sound field reconstruction in rooms: inpainting meets super-resolution
- High-quality Speech Synthesis Using Super-resolution Mel-Spectrogram
- Enabling Real-time On-chip Audio Super Resolution for Bone Conduction Microphones
- Speech bandwidth expansion based on Deep Neural Networks
- Speech Audio Super-Resolution For Speech Recognition
- WaveNet: A Generative Model for Raw Audio
- Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network
- Multi-Scale Inception Based Super-Resolution Using Deep Learning Approach
- Use of recurrence plots for identification and extraction of patterns in humpback whale song recordings
- A Brief Introduction to Nonlinear Time Series Analysis and Recurrence Plots
- Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift
- Fast Fourier Convolution
- Competitive Multi-scale Convolution
- Efficient and Generic 1D Dilated Convolution Layer for Deep Learning
- Multi-Scale Convolutional Neural Networks for Time Series Classification
- Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
- Article on normalizing the root mean square error
- Understanding the Inception architecture
- Ordering of batch normalization and dropout
- Using Deep-Learning to Reconstruct High-Resolution Audio
- Separate Music Tracks with Deep Learning
- A Simple Guide to the Versions of the Inception Network
- Ablation studies video
- What is an ablation study?
- Steps to start training your custom Tensorflow model in AWS SageMaker
- Loss suddenly increasing
- NaN loss
- Gradient clipping in Keras
- How to use Learning Curves to Diagnose Machine Learning Model Performance
- Validation Error less than training error
- How to successfully add large data sets to Google Drive
- Signal-to-noise ratio
- Validation loss when using Dropout
- My validation loss is lower than my training loss, should I get rid of regularization?
- Dropout rate guidance for hidden layers in a convolution neural network
- Your validation loss is lower than the training loss? This is why.
- Dropout makes performance worse
- How many concurrent requests does a single Flask process receive?
- Python: How to create a zip archive from multiple files
- Java - How to download a zip file from a URL
- Improving sound quality of music
- Insanely experimental and slightly unrealistic idea: use Deezer's source-separation model to split a song into stems and apply transfer learning to upsample each stem separately (train one model on low-res/high-res guitar tracks, one on bass tracks, one on drums and another on vocal tracks); a sketch follows this list
- Voice-over-IP applications
- Improving speech recognition
- Remastering audio from old movies
- Reconstructing old phonograph recordings (This one might be a bit far-fetched, since a lot of data is needed. It might actually be impossible.)
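A minimal sketch of the separation step of the stem-upsampling idea above, using Deezer's open-source Spleeter library (the pretrained 4-stem model splits a song into vocals, drums, bass and other; the per-stem upsampling models are not shown and the file names are illustrative):

```python
# Split a song into stems with Spleeter's pretrained 4-stem model; each stem
# could then be fed to its own super-resolution model.
from spleeter.separator import Separator

separator = Separator('spleeter:4stems')
# writes vocals.wav, drums.wav, bass.wav and other.wav under output/song/
separator.separate_to_file('song.mp3', 'output/')
```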
The Python module librosa can transform an audio signal into a spectrogram and, via the Griffin-Lim algorithm, approximately reconstruct an audio signal from a magnitude spectrogram.
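A minimal sketch of that round trip (the n_fft/hop_length values and file name are illustrative):

```python
# Audio -> magnitude spectrogram -> audio; Griffin-Lim iteratively estimates
# the phase that the magnitude-only spectrogram discards.
import librosa
import numpy as np

y, sr = librosa.load('recording.wav', sr=None)           # keep original sample rate
S = np.abs(librosa.stft(y, n_fft=1024, hop_length=256))  # magnitude spectrogram
y_rec = librosa.griffinlim(S, n_iter=32, n_fft=1024, hop_length=256)
```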
- What is convolution? This is the easiest way to understand
- Introducing Convolutions
- Valerio Velardo's Audio Signal Processing for Machine Learning Playlist
- Steve Brunton's Fourier Analysis Playlist
- Sampling, Aliasing & Nyquist Theorem
- Interpolation
- Wave phase
- What is phase in audio?
- Difference between logged-power spectrum and power spectrum
- Linear and logarithmic scales
- Decibel conversion
- Why do we modulate signals?
- Modulation vs. convolution
- Discrete Fourier Transform explained with example
- DC offset
- The FFT algorithm - simple step by step explanation
- What is a good signal-to-noise ratio?
- Log-uniform distribution
- How to Read AI (Audio) Research Papers Like a Rockstar
  - Skimming
    - read the abstract
    - read introduction + conclusion
    - check out figures and tables
    - ignore details
  - Reading the details
    - What's the state-of-the-art?
    - What are the tools/techniques used?
    - How did the authors evaluate their solution?
    - What are the results of the experiments?
  - Question everything
    - Do I understand everything the authors say?
    - Is the paper sound?
    - What would I have done differently?
    - Read referenced research papers
    - Internalise the math
    - Re-think the problem
    - Explain the paper to friends/colleagues
  - Check the code
    - Check the author's implementation
    - Re-implement the proposed solution
    - Run experiments
- How to Conduct Literature Review Effectively
  - Select resource → Read resource → Take notes (topic, approach, results, contributions, limitations/weaknesses) → Keep track of references → Summarise literature review findings
- Introduction
  - Problem statement (what, why, how)
- Theory
  - Neural nets/CNNs
  - Image super-resolution
  - Time series super-resolution
  - Autoencoders and U-Nets (state-of-the-art) - audio super-resolution literature review
  - Time-series analysis - basic concepts (frequency spectrum, Fourier transform, spectrograms, sample rate)
- My contributions
  - Multiscale convolutions, or
  - Encoding the time-domain representation as images (GAFs, MTFs, recurrence plots or some other time-series imaging technique); a sketch follows this outline
- Experiments and results evaluation (metrics, tables etc.) → dataset details (exploratory data analysis: histograms, spectrograms etc.)
- Spring/Angular project description
- Conclusion
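A minimal sketch of the time-series imaging idea from the outline, assuming the pyts library (any GAF/MTF/recurrence-plot implementation would work; the data here is a random placeholder):

```python
# Turn 1D audio chunks into 2D Gramian Angular Field images that a 2D CNN
# could consume.
import numpy as np
from pyts.image import GramianAngularField

chunks = np.random.randn(8, 4800)                  # placeholder audio chunks
gaf = GramianAngularField(image_size=128, method='summation')
images = gaf.fit_transform(chunks)                 # shape (8, 128, 128)
```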
A quick implementation of "Audio Super-Resolution Using Neural Nets" (Kuleshov, Enam, Ermon, 2017)
- Write the data generator scripts (obtaining the low-res/high-res pairs of audio clips)
- Write the training/testing scripts
- read more about why the loss becomes NaN during training
- usually it is caused by an exploding or a vanishing gradient (in my case, I accidentally used NRMSE as the loss function instead of only as a metric; the value was so small that Keras displayed the loss as "nan")
- relevant StackOverflow post: https://stackoverflow.com/questions/37232782/nan-loss-when-training-regression-network
- create plots with the training and validation loss
- train 100 epochs
- adjust the data split to use all of the generated data from the first 100 tracks of VCTK for training/validation/testing
- Finish testing/prediction script:
- Downsample test audio track
- Feed chunks of 256 samples of the audio to the model
- Display spectrogram of the output
- Save the low-res/high-res/super-res numpy arrays as audio and compare them
- Display the 5-number summary / boxplot of the generated dataset
- Create line plot of a couple of random samples (both in time-domain and in frequency-domain)
- Create histogram of a single sample
- Check what size a chunk could have (transform the numpy array chunk of size n to WAV and listen to the result)
- Implement checkpoint restoration
- Generate the dataset again with a chunk size of 4800 samples (100 ms), an overlap of 2400 samples (50 ms) and a downsampling factor of 4 (the paper mentions a similar patch size of 6000 samples used in 400 training epochs); a sketch of the pair generation follows this list
- Use 4 processes to speed up the data generation
- Add 3 more upsampling blocks (a single subpixel layer has no trainable parameters and is not enough to upsample a 1200-sample tensor to 4800 samples)
- Reconstruct the low-resolution signal without applying interpolation (e.g. a 100-sample chunk downsampled by a factor of 4 becomes a 25-sample chunk, which is fed to the network as-is)
- Get evaluation metrics (NRMSE comparisons for the high-res/super-res signals and the high-res/interpolated signals)
- Display some scatterplots
- Compare the 5e-4 learning rate model with the 100-epoch 1e-4 model and the 200-epoch 1e-4 model
- Increase the number of filters in the Conv1D layers from 32 to 64 and train
- Set the filter no. configuration to 64-128-256-256-256-128-64, with the last 2 upsampling blocks retaining the 64-64 arrangement
- Add BatchNorm to the downsampling blocks
- Perform an ablation study to analyze how much each residual block contributes to the noise found in the model output
- Replace the last two upsampling blocks with 1D Inception modules as an experiment
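A minimal sketch of the low-res/high-res pair generation described in the list above (48 kHz VCTK audio is assumed, matching the 4800-sample/100 ms chunks; the function name and file handling are illustrative):

```python
# Cut a track into overlapping high-res chunks and pair each with the
# corresponding low-res chunk, kept at CHUNK // FACTOR samples (no interpolation).
import librosa

SR, CHUNK, OVERLAP, FACTOR = 48000, 4800, 2400, 4

def make_pairs(path):
    hr, _ = librosa.load(path, sr=SR)                              # high-res signal
    lr = librosa.resample(hr, orig_sr=SR, target_sr=SR // FACTOR)  # low-res signal
    pairs = []
    for start in range(0, len(hr) - CHUNK + 1, CHUNK - OVERLAP):
        hr_chunk = hr[start:start + CHUNK]
        lr_chunk = lr[start // FACTOR:(start + CHUNK) // FACTOR]
        pairs.append((lr_chunk, hr_chunk))
    return pairs
```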
The values below are averages over all test samples, computed with tf.keras.Model.evaluate(...).
| Model | MSE | SNR | NRMSE |
|---|---|---|---|
| 3-layer | 67026 | 15.31 | 0.1779 |
| 4-layer | 68838 | 15.45 | 0.1667 |
| 5-layer | 64744 | 15.46 | 0.1689 |
| Model | MSE | SNR | NRMSE |
|---|---|---|---|
| 3-layer (Inception modules, with dropout) | 170675 | 12.88 | 0.4275 |
| 4-layer (Inception modules, with dropout) | 81340 | 14.16 | 0.2289 |
| 5-layer (Inception modules, with dropout) | 75183 | 14.28 | 0.2341 |
- remove the dropout layers from the Inception blocks and notice the effect
| Model | MSE | SNR | NRMSE |
|---|---|---|---|
| 5-layer (Inception modules with dropout) | 75183 | 14.28 | 0.2341 |
| 5-layer (Inception modules without dropout) | 42914 | 16.62 | 0.1116 |
- remove the dropout layers from the last two upsampling blocks of the normal 5-layer model and notice the effect
| Model | MSE | SNR | NRMSE |
|---|---|---|---|
| 5-layer (Inception modules with dropout) | 75183 | 14.28 | 0.2341 |
| 5-layer (Inception modules without dropout) | 42914 | 16.62 | 0.1116 |
| 5-layer (no Inception modules, no dropout in the last 2 blocks) | 45324 | 16.57 | 0.1125 |
- remove the dropout layers from the last two upsampling blocks of the normal 3-layer and 4-layer models and notice the effect
| Configuration | 3-block model (MSE / SNR / NRMSE) | 4-block model (MSE / SNR / NRMSE) | 5-block model (MSE / SNR / NRMSE) |
|---|---|---|---|
| Upsampling blocks with dropout | 67026 / 22.78 / 0.1779 | 66944 / 22.92 / 0.1737 | 64744 / 23.07 / 0.1689 |
| Upsampling blocks without dropout | 51023 / 25.43 / 0.1162 | 47582 / 25.74 / 0.1125 | 45324 / 25.73 / 0.1125 |
| With Inception modules, with dropout | 170675 / 15.23 / 0.4275 | 81340 / 19.79 / 0.2289 | 75183 / 20.27 / 0.2341 |
| With Inception modules, without dropout | 49434 / 25.64 / 0.1143 | 55401 / 25.41 / 0.1194 | 42914 / 25.87 / 0.1116 |
The linear and cubic spline baselines were computed as the mean signal-to-noise ratio over all test samples. The SNR for the model with dropout-free Inception modules was computed with a batch size of 1, since tf.keras.Model.evaluate averages the metrics over each batch, and a larger batch size would skew the per-sample mean.
| Method | Mean SNR (dB) |
|---|---|
| Linear interpolation | 19.67 |
| Cubic spline interpolation | 26.80 |
| 5-layer model with Inception blocks containing no dropout | 27.57 |
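A hedged sketch of the per-sample SNR computation described above (`model` and `test_pairs` are placeholders; the document only specifies that a batch size of 1 avoids the per-batch averaging skew):

```python
# SNR in dB between a high-res reference and a reconstruction, averaged over
# the test set one sample at a time.
import numpy as np

def snr_db(reference, estimate):
    noise = reference - estimate
    return 10 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))

# snrs = [snr_db(hr, model.predict(lr[None, ..., None])[0]) for lr, hr in test_pairs]
# print(np.mean(snrs))
```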