Intelligent-Systems-Phystech/BasicDeepNets

BasicDeepNets

A collection of notebooks with implementations and explanations of basic deep neural networks.

Motivation

Because many neural network libraries now ship ready-to-use models, it is planned to change the introductory course in Machine Learning. Namely, neural networks are going to be introduced along with the basic ML theory, as a possible first approach to solving applied problems: take a relevant model, train it, then analyze the result. For this to work, the problem statement must separate the type of network from the optimized criterion and from the optimization algorithm. It is also important to present the neural network as a mapping that fits the algebraic description of the feature space in which the measurements are made. The statistical nature of the measurements should be analyzed through a Bayesian derivation that ends with an error function.
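A minimal sketch of the separation described above, assuming PyTorch: the network, the criterion, and the optimizer are passed to the training loop as three independent arguments. The helper name `fit`, the synthetic data, and the particular models, losses, and learning rates are illustrative assumptions, not part of the course material.

```python
import torch
from torch import nn

def fit(model, criterion, make_optimizer, x, y, steps=200):
    """Train `model` on (x, y); the three ingredients are supplied independently."""
    optimizer = make_optimizer(model.parameters())
    history = []                      # loss per step, i.e. the learning curve
    for _ in range(steps):
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        history.append(loss.item())
    return history

# Synthetic sample (an assumption for illustration): a noisy straight line.
torch.manual_seed(0)
x = torch.linspace(-1, 1, 64).unsqueeze(1)
y = 3 * x + 0.5 + 0.1 * torch.randn_like(x)

# Any one ingredient can be swapped without touching the other two.
models = {
    "linear": lambda: nn.Linear(1, 1),
    "mlp": lambda: nn.Sequential(nn.Linear(1, 8), nn.Tanh(), nn.Linear(8, 1)),
}
criteria = {"mse": nn.MSELoss(), "mae": nn.L1Loss()}
optimizers = {
    "sgd": lambda params: torch.optim.SGD(params, lr=0.1),
    "adam": lambda params: torch.optim.Adam(params, lr=0.05),
}

results = {}
for m_name, make_model in models.items():
    for c_name, criterion in criteria.items():
        for o_name, make_opt in optimizers.items():
            results[(m_name, c_name, o_name)] = fit(
                make_model(), criterion, make_opt, x, y
            )
```

Each entry of `results` is a learning curve for one combination of model, criterion, and optimizer, so the combinations can be plotted side by side.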

Description

The notebooks in this repository are laboratory works for third-year students. They should help students become acquainted with neural networks and encourage them to use basic neural networks as a first baseline solution for ML problems where possible.

Ideally, the repository should contain notebooks for all basic modern types of neural networks.

Below is a list of networks to be implemented (those that are already available have an associated link).

Notebook Requirements

Each network should be presented in its simplest form so that it is clear how it works. It is advisable to do everything with PyTorch, avoiding out-of-the-box solutions. Notebooks should have a heading, sections, and explanatory comments (in English, if possible).

So, the notebooks must provide

  • clear code for constructing and training the nets
  • descriptive text explanations that express ideas in general terms, using only such notions as model, sample, and error function, so that a beginner in ML can read and understand everything more or less clearly
  • Bayesian analysis
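As one possible illustration of the Bayesian analysis item, the standard derivation for regression is sketched below. The Gaussian noise model, the Gaussian prior, and the symbols $f$, $\mathbf{w}$, $\sigma$, $\alpha$, $\lambda$ are assumptions chosen for this example; other noise models lead to other error functions.

```latex
% Assumed model: y_i = f(x_i, w) + eps_i,  eps_i ~ N(0, sigma^2),  i = 1..m
% Assumed prior: w ~ N(0, alpha^{-1} I)
\begin{align*}
p(\mathbf{y} \mid X, \mathbf{w})
  &= \prod_{i=1}^{m} \mathcal{N}\bigl(y_i \mid f(\mathbf{x}_i, \mathbf{w}),\, \sigma^2\bigr), \\
-\log p(\mathbf{w} \mid X, \mathbf{y})
  &= \frac{1}{2\sigma^2} \sum_{i=1}^{m} \bigl(y_i - f(\mathbf{x}_i, \mathbf{w})\bigr)^2
   + \frac{\alpha}{2}\,\lVert \mathbf{w} \rVert^2 + \text{const}, \\
S(\mathbf{w})
  &= \sum_{i=1}^{m} \bigl(y_i - f(\mathbf{x}_i, \mathbf{w})\bigr)^2
   + \lambda\,\lVert \mathbf{w} \rVert^2,
  \qquad \lambda = \alpha\sigma^2 .
\end{align*}
```

Under these assumptions, maximizing the posterior is equivalent to minimizing the squared error with an L2 penalty. Dropping the prior (the maximum likelihood case) leaves the plain sum of squared errors.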

Structure

Notebooks should have the following sections:

  • Name
  • Brief explanation
  • Data loading (preferably more than one sample)
  • Initial configuration
  • Parameter optimization (indicating the possibility of optimizing the structure)
  • Error analysis (plots)
  • List of links to more detailed sources, tutorials, and alternative solutions

Data

It is advisable to illustrate the network with a simple real task, selecting different tasks for different networks. It is advisable to illustrate the data and the final result with a plot.

Error Analysis

The error analysis should cover, first of all: the variance of the error and of the parameters; the change of the error value during optimization (learning curve); the sufficiency of the sample size (change of variance as the sample is replenished); and the structure of the model (change of error and variance with increasing complexity).
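A minimal sketch of the sample size part of this analysis, assuming PyTorch and a synthetic linear problem: the same model is refitted on many independently drawn samples of each size, and the spread of the estimated parameter and of the test error is recorded. The function name `fit_and_test`, the data-generating line, the noise level, and the sample sizes are illustrative assumptions.

```python
import torch

def fit_and_test(n_train, seed, noise=0.3):
    """Fit y = w*x + b by least squares on n_train points.

    Returns the estimated slope and the mean squared error against the
    noise-free target on a fixed test grid.
    """
    g = torch.Generator().manual_seed(seed)
    x = torch.rand(n_train, 1, generator=g) * 2 - 1
    y = 2 * x + 1 + noise * torch.randn(n_train, 1, generator=g)
    X = torch.cat([x, torch.ones_like(x)], dim=1)       # add a bias column
    theta = torch.linalg.lstsq(X, y).solution           # shape (2, 1)

    x_test = torch.linspace(-1, 1, 200).unsqueeze(1)
    X_test = torch.cat([x_test, torch.ones_like(x_test)], dim=1)
    test_mse = ((X_test @ theta - (2 * x_test + 1)) ** 2).mean().item()
    return theta[0, 0].item(), test_mse

sizes = [20, 80, 320]
n_repeats = 30
summary = {}
for n in sizes:
    runs = [fit_and_test(n, seed) for seed in range(n_repeats)]
    w_hats = torch.tensor([r[0] for r in runs])
    errors = torch.tensor([r[1] for r in runs])
    summary[n] = {
        "w_var": w_hats.var().item(),       # variance of the parameter estimate
        "err_mean": errors.mean().item(),   # average test error
        "err_var": errors.var().item(),     # variance of the test error
    }
```

Plotting the entries of `summary` against the sample size gives one way to judge whether the sample is sufficient: once the variances stop decreasing noticeably, adding more data brings little benefit.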

Notebook Quality Criterion

The quality criteria for the notebooks are simply public benefit and separability from the author: the code and text should be understandable, so that anyone can open a notebook, change the code, and understand the core ideas.

Explain in text everything that happens and why. It is assumed that the notebooks will be used by third-year students. The code should be easy to change: it should be clear how to load another sample, change the network structure, or change the error function.

In general, it is recommended to find ready-made code (a good external source) and simply put it in order, perhaps adding some plots.

References

Linear Models

What can be demonstrated in the practical part: solve a couple of problems and show that they can be solved analytically, with gradient methods, and with the help of PyTorch or TensorFlow. In the theoretical part: compare ordinary linear regression with Bayesian linear regression.
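The three routes mentioned for the practical part can be sketched on one synthetic problem, assuming PyTorch. The data, the learning rate, and the step count are illustrative assumptions; the point is only that all three approaches should arrive at approximately the same parameters.

```python
import torch
from torch import nn

# Synthetic sample (an assumption for illustration): a noisy straight line.
torch.manual_seed(0)
x = torch.rand(100, 1) * 2 - 1
y = 2 * x - 1 + 0.1 * torch.randn(100, 1)
X = torch.cat([x, torch.ones_like(x)], dim=1)   # design matrix with a bias column

# 1) Analytic least-squares solution.
theta_analytic = torch.linalg.lstsq(X, y).solution          # shape (2, 1)

# 2) Hand-written gradient descent on the mean squared error.
theta_gd = torch.zeros(2, 1)
for _ in range(500):
    grad = 2 * X.T @ (X @ theta_gd - y) / len(x)
    theta_gd -= 0.1 * grad

# 3) The same problem through PyTorch's autograd and a built-in optimizer.
model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
for _ in range(500):
    optimizer.zero_grad()
    nn.functional.mse_loss(model(x), y).backward()
    optimizer.step()
theta_torch = torch.cat(
    [model.weight.detach().reshape(1, 1), model.bias.detach().reshape(1, 1)]
)

print(theta_analytic.flatten(), theta_gd.flatten(), theta_torch.flatten())
```

Each vector holds the slope first and the intercept second, so the three printed results can be compared directly.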

Logistic regression

What can be demonstrated (theoretical part): consider how the solution changes depending on the values of hyperparameters, visualize.

One-Layer Net

What can be demonstrated (theoretical part): universal function approximator, overfitting.

MLP

What can be demonstrated (theoretical part): visualize non-linearity and how the decision surface changes depending on depth.

Autoencoder

What can be demonstrated (theoretical part): compare with PCA, show compression, denoising, and sparsity.
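One way the PCA comparison could look, assuming PyTorch: a linear autoencoder with a k-dimensional bottleneck is trained on centered data, and its reconstruction error is compared with that of PCA with k components. The synthetic data, the bottleneck size, and the training settings are illustrative assumptions.

```python
import torch
from torch import nn

torch.manual_seed(0)
# Synthetic data (an assumption): 5-dimensional points near a 2-dimensional subspace.
latent = torch.randn(200, 2)
mixing = torch.randn(2, 5)
data = latent @ mixing + 0.05 * torch.randn(200, 5)
data = data - data.mean(dim=0)          # center the data, as PCA requires

k = 2
# PCA reconstruction from the top-k right singular vectors.
_, _, Vh = torch.linalg.svd(data, full_matrices=False)
components = Vh[:k]                     # shape (k, 5)
pca_error = ((data - data @ components.T @ components) ** 2).mean().item()

# Linear autoencoder with a k-dimensional bottleneck and no nonlinearity.
autoencoder = nn.Sequential(nn.Linear(5, k), nn.Linear(k, 5))
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=0.01)
for _ in range(2000):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(autoencoder(data), data)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    ae_error = nn.functional.mse_loss(autoencoder(data), data).item()

print(f"PCA error: {pca_error:.5f}, autoencoder error: {ae_error:.5f}")
```

PCA gives the best possible rank-k linear reconstruction in the squared-error sense, so the linear autoencoder can approach the PCA error but not fall below it. Adding a nonlinearity or input noise to the same skeleton leads to the compression and denoising demonstrations.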

CNN

RNN

What can be demonstrated (theoretical part): vanishing gradients, attention.
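Vanishing gradients can be made visible with a few lines, assuming PyTorch's `nn.RNN` with its default initialization: the loss depends only on the last time step, and the gradient norm with respect to each input step is measured. The sequence length and hidden size are illustrative assumptions.

```python
import torch
from torch import nn

torch.manual_seed(0)
seq_len, hidden = 50, 16
rnn = nn.RNN(input_size=1, hidden_size=hidden, nonlinearity="tanh")

# Input of shape (time, batch, feature); gradients are tracked with respect to it.
x = torch.randn(seq_len, 1, 1, requires_grad=True)
outputs, _ = rnn(x)
outputs[-1].sum().backward()            # the loss depends on the last step only

# One gradient norm per time step: how strongly each input affects the loss.
grad_norms = x.grad.reshape(seq_len, -1).norm(dim=1)
print(grad_norms[0].item(), grad_norms[-1].item())
```

Plotting `grad_norms` on a logarithmic scale against the time step shows how quickly the influence of early inputs on the final output fades.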

ResNets

Embeddings

Variational Models

GAN

T2T