Introductory machine learning

From linear regression to Bayesian kernel regression

  • Focus on the weight-space view (linear form), numerical approaches (to the extent possible), and NumPy with Matplotlib

Part one: regression

01

  • Fit two 1D data points with a straight line, or linear model
  • Model formulation, model parameters, and matrix notation
  • Matrix inverse
  • Compute inverse of 2x2 matrix
  • Find model parameters through matrix inverse
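
A minimal sketch of 01, assuming two made-up points (1, 2) and (3, 4): build the 2x2 system for y = w0 + w1*x, invert it with the ad − bc formula, and read off the parameters.

```python
import numpy as np

# two made-up 1D data points (x, y)
x = np.array([1.0, 3.0])
y = np.array([2.0, 4.0])

# linear model y = w0 + w1 * x written as A @ w = y
A = np.stack([np.ones_like(x), x], axis=1)  # 2x2 matrix

# inverse of a 2x2 matrix [[a, b], [c, d]] is [[d, -b], [-c, a]] / (ad - bc)
a, b = A[0]
c, d = A[1]
A_inv = np.array([[d, -b], [-c, a]]) / (a * d - b * c)

w = A_inv @ y
print(w)  # [1. 1.], i.e., y = 1 + 1 * x passes through both points
```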

02

  • Fit more 1D data points with a straight line
  • Matrix transpose
  • Analytical solution
  • Geometric interpretation of analytical solution as projection
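
One way the analytical solution can look in code, with made-up noisy data: solve the normal equations w = (AᵀA)⁻¹Aᵀy, then check the projection view by verifying that the residual is orthogonal to the columns of A.

```python
import numpy as np

rng = np.random.default_rng(0)

# made-up noisy data around y = 1 + 2x
x = np.linspace(0, 1, 20)
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(x.size)
A = np.stack([np.ones_like(x), x], axis=1)

# analytical solution: solve (A^T A) w = A^T y
w = np.linalg.solve(A.T @ A, A.T @ y)

# projection view: the residual is orthogonal to the column space of A
residual = y - A @ w
print(w)               # close to [1, 2]
print(A.T @ residual)  # close to [0, 0]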

03

  • Iterative numerical approach to find parameters
  • Loss or cost function to describe behavior we want our model to have
  • Choice of loss function (notes_01)
  • Derivation of gradient of squared loss function with respect to parameters
  • Gradient descent to iteratively find parameters that minimize loss function
  • Compare numerical solution to analytical solution
  • Gradient descent as minimizing regularized 1st order Taylor approximation (notes_02) of loss function
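
A sketch of the iterative route on the same kind of data, with a hypothetical step size: the gradient of the squared loss ||Aw − y||² is 2Aᵀ(Aw − y), and gradient descent should land near the analytical solution.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(x.size)
A = np.stack([np.ones_like(x), x], axis=1)

w = np.zeros(2)
lr = 0.01  # hypothetical step size, small enough to converge here

for _ in range(5000):
    grad = 2 * A.T @ (A @ w - y)  # gradient of squared loss w.r.t. w
    w -= lr * grad

print(w)                                  # numerical solution
print(np.linalg.solve(A.T @ A, A.T @ y))  # analytical solution, for comparison
```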

04

  • Extend linear regression to binary classification
  • Step function to produce discrete output of either 1 or 0 for classification
  • Sigmoid function to approximate step function
  • Derivation of gradient of squared loss function for classification
  • Implement binary classification of 1D data points

05

  • Cross-entropy loss function and intuition
  • Derivation of gradient of cross-entropy loss function
  • Implement binary classification of 1D data points
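
A compact sketch of 05 on synthetic labels: sigmoid model p = σ(Aw), and gradient descent on the cross-entropy loss, whose gradient takes the simple form Aᵀ(p − y).

```python
import numpy as np

rng = np.random.default_rng(1)

# synthetic 1D binary data: class 0 on the left, class 1 on the right
x = np.concatenate([rng.normal(-1.5, 0.5, 30), rng.normal(1.5, 0.5, 30)])
y = np.concatenate([np.zeros(30), np.ones(30)])
A = np.stack([np.ones_like(x), x], axis=1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(2)
lr = 0.5  # hypothetical step size

for _ in range(5000):
    p = sigmoid(A @ w)
    grad = A.T @ (p - y) / len(y)  # averaged gradient of cross-entropy loss
    w -= lr * grad

p = sigmoid(A @ w)
print(np.mean((p > 0.5) == (y == 1)))  # training accuracy, should be near 1.0
```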

06

  • Implement binary classification of 2D data points with cross-entropy loss function

07

  • Use linear model to handle nonlinearity
  • Projection of input into feature space
  • Global and local function behavior
  • An infeasible way of perfectly modeling local function behavior
  • From exact matching to correlation
  • Handle correlation with radial basis function (RBF)
  • Keep linear form of the model
  • Implement binary classification of 1D data points that are not linearly separable
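
One possible sketch of 07: class 1 sits between two class-0 blobs, so no line in x separates them, but projecting each input onto RBF features centered at the training points keeps the model linear in the new features (the helper rbf_features and the gamma value below are illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(2)

# 1D data that is not linearly separable: class 1 between two class-0 blobs
x = np.concatenate([rng.normal(-2, 0.3, 20),
                    rng.normal(0, 0.3, 20),
                    rng.normal(2, 0.3, 20)])
y = np.concatenate([np.zeros(20), np.ones(20), np.zeros(20)])

def rbf_features(x_query, centers, gamma=1.0):
    # one RBF feature per center: exp(-gamma * (x - c)^2)
    return np.exp(-gamma * (x_query[:, None] - centers[None, :]) ** 2)

Phi = rbf_features(x, centers=x)                           # centered at training points
Phi = np.concatenate([np.ones((len(x), 1)), Phi], axis=1)  # add bias column

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(Phi.shape[1])
for _ in range(5000):
    p = sigmoid(Phi @ w)
    w -= 0.5 * Phi.T @ (p - y) / len(y)  # cross-entropy gradient step, as in 05

print(np.mean((sigmoid(Phi @ w) > 0.5) == (y == 1)))  # should be near 1.0
```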

08

  • Extend RBF kernel method to classify 2D data points that are not linearly separable

Part two: uncertainty

09

  • Uncertainty in data
  • Likelihood function and essence of minimizing squared loss function
  • Expression of likelihood interpreted as prediction distribution
  • Prior distribution and maximum-a-posteriori estimator of parameters
  • Uncertainty in model induced by posterior distribution of parameters
  • Sampling from posterior distribution of parameters
  • Taylor approximation (notes_02) and Laplace's method (notes_03)
  • Posterior distribution of prediction via Bayesian marginalization
  • Approximate Bayesian marginalization via Monte Carlo integration
  • Approximate prediction distribution for MC integration using sampling
  • Implement Bayesian linear regression over 1D data points
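
A condensed sketch of the pipeline in 09, with made-up noise level and prior scale: a Gaussian likelihood with a Gaussian prior gives a Gaussian posterior over w, sampling from it expresses model uncertainty, and averaging sampled predictions is the Monte Carlo stand-in for Bayesian marginalization.

```python
import numpy as np

rng = np.random.default_rng(3)

# made-up noisy data
x = np.linspace(0, 1, 15)
y = 1.0 + 2.0 * x + 0.2 * rng.standard_normal(x.size)
A = np.stack([np.ones_like(x), x], axis=1)

sigma2 = 0.2 ** 2  # assumed known noise variance
tau2 = 10.0        # assumed prior variance on each parameter

# Gaussian likelihood + Gaussian prior -> Gaussian posterior over w
S = np.linalg.inv(A.T @ A / sigma2 + np.eye(2) / tau2)  # posterior covariance
m = S @ A.T @ y / sigma2                                # posterior mean

# each posterior sample of w is one plausible model
W = rng.multivariate_normal(m, S, size=2000)            # shape (2000, 2)

# Monte Carlo approximation of the posterior prediction distribution
x_test = np.array([0.0, 0.5, 1.0])
A_test = np.stack([np.ones_like(x_test), x_test], axis=1)
preds = W @ A_test.T                                    # shape (2000, 3)
print(preds.mean(axis=0))                               # close to A_test @ m
print(np.sqrt(preds.var(axis=0) + sigma2))              # predictive std incl. noise
```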

10

  • Extend Bayesian linear regression to Bayesian kernel regression
  • Kernel function and correlation preservation
  • Virtual samples through eigendecomposition (notes_04)
  • Retain linear form with virtual samples for nonlinear regression
  • Vector form of Taylor approximation for sampling from posterior distribution of parameters
  • Correction step for prediction uncertainty (notes_05, notes_06)
  • Implement Bayesian kernel regression over 1D data points
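
A small numerical check of the virtual-sample idea, assuming an RBF kernel: eigendecompose K = VΛVᵀ and set Φ = VΛ^(1/2); then ΦΦᵀ reproduces K, so a model linear in these virtual samples preserves the correlations encoded by the kernel.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(-3, 3, 8)

# RBF kernel matrix over the training inputs
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)

# K is symmetric: real eigenvalues, orthonormal eigenvectors (see notes_04)
lam, V = np.linalg.eigh(K)
lam = np.clip(lam, 0.0, None)       # guard tiny negative values from round-off

Phi = V * np.sqrt(lam)              # virtual samples, one row per data point

print(np.allclose(Phi @ Phi.T, K))  # True: inner products of virtual samples = K
```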

11

  • Revisit Bayesian linear regression from 09
  • Replace Taylor approximation and Laplace's method with the Metropolis algorithm (notes_07) to sample from posterior distribution of parameters

12

  • Revisit Bayesian kernel regression from 10
  • Replace Taylor approximation and Laplace's method with the Metropolis algorithm

13

  • Organize and speed up Bayesian kernel regression from 12
  • Package algorithm as a Python class
  • Vectorize (notes_08) kernel function and Monte Carlo integration in NumPy
  • Avoid computation of full kernel matrix for testing data points
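
A sketch of the last two points with hypothetical shapes: broadcasting (notes_08) builds only the n_test × n_train cross-kernel needed at prediction time, instead of the full kernel matrix over all points together.

```python
import numpy as np

def rbf_cross_kernel(x_test, x_train, gamma=1.0):
    # broadcasting: (n_test, 1) minus (1, n_train) -> (n_test, n_train)
    d2 = (x_test[:, None] - x_train[None, :]) ** 2
    return np.exp(-gamma * d2)

x_train = np.linspace(-3, 3, 200)
x_test = np.array([-1.0, 0.0, 2.5])

K_cross = rbf_cross_kernel(x_test, x_train)
print(K_cross.shape)  # (3, 200): no full 203x203 kernel matrix is ever formed
```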

14

  • Use Bayesian kernel regression from 13 to optimize the underlying (but unknown) function that generates our data
  • Lower confidence bound and the balance between exploitation and exploration in optimization
  • Implement Bayesian optimization to minimize a 1D function
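
The acquisition rule itself fits in a line; a sketch with a hypothetical trade-off weight kappa, assuming the regression model supplies a predictive mean and standard deviation on a grid (the stand-in mu and sigma below are made up).

```python
import numpy as np

def next_query(mu, sigma, x_grid, kappa=2.0):
    # lower confidence bound for minimization: a point is attractive if its
    # predicted mean is low (exploitation) or its uncertainty is high (exploration)
    lcb = mu - kappa * sigma
    return x_grid[np.argmin(lcb)]

# stand-in predictive mean/std, as if from Bayesian kernel regression (13)
x_grid = np.linspace(0, 1, 101)
mu = np.sin(6 * x_grid)
sigma = 0.1 + 0.5 * x_grid  # pretend uncertainty grows away from observed data

print(next_query(mu, sigma, x_grid))
```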

15

  • Extend Bayesian kernel regression from 13 to handle 2D data points

16

  • Implement Bayesian optimization to minimize a 2D function, using Bayesian kernel regression from 15

Part three: look for better virtual samples

17

  • Need for data-independent virtual samples
  • Random features
  • Random Fourier features (RFFs)
  • Matrix notation
  • Dot product of RFFs as approximation to RBF kernel function
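
A sketch of the RFF approximation for the RBF kernel k(x, x') = exp(−‖x − x'‖²/2): drawing ω ~ N(0, I) and b ~ Uniform(0, 2π), the features z(x) = sqrt(2/D) cos(ωᵀx + b) satisfy z(x)ᵀz(x') ≈ k(x, x'), with error shrinking as D grows.

```python
import numpy as np

rng = np.random.default_rng(5)

D = 5000  # number of random features
d = 2     # input dimension

W = rng.standard_normal((D, d))   # omega ~ N(0, I) matches exp(-||x - x'||^2 / 2)
b = rng.uniform(0, 2 * np.pi, D)

def rff(X):
    # rows of X are inputs; rows of the result are their D-dimensional RFFs
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

x1 = np.array([[0.3, -0.5]])
x2 = np.array([[0.8, 0.1]])

exact = np.exp(-np.sum((x1 - x2) ** 2) / 2.0)
approx = (rff(x1) @ rff(x2).T).item()
print(exact, approx)  # the two numbers should be close
```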

18

  • Nystrom method to approximate a (potentially large) kernel matrix with a small sub-portion
  • Random sampling to construct a sub-portion whose eigen-structure is similar to that of the full kernel matrix
  • Virtual samples based on reduced eigendecomposition of the sub-portion (notes_04)
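
A sketch of the Nystrom idea on made-up data: sample m landmark points, and approximate the full n × n kernel matrix as K ≈ C W⁺ Cᵀ, where C is the n × m cross-kernel and W the m × m kernel among the landmarks.

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.uniform(-3, 3, 300)

def rbf(a, b, gamma=0.5):
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

K = rbf(x, x)  # full 300x300 kernel matrix, computed here only for comparison

m = 30
idx = rng.choice(len(x), size=m, replace=False)  # random landmark sample
C = rbf(x, x[idx])                               # 300 x 30 cross-kernel
W = rbf(x[idx], x[idx])                          # 30 x 30 kernel among landmarks

K_approx = C @ np.linalg.pinv(W) @ C.T           # Nystrom approximation

print(np.linalg.norm(K - K_approx) / np.linalg.norm(K))  # small relative error
```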

Notes

notes_01

  • Squared loss function and influence of outliers
  • Absolute value loss function is less sensitive to outliers
  • Comparison of common loss functions (squared, absolute value, deadzone, Huber)
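
The four losses side by side, as functions of the residual r = prediction − target (the deadzone width and Huber threshold below are hypothetical choices):

```python
import numpy as np

def squared(r):
    return r ** 2  # grows quadratically, so outliers dominate the total loss

def absolute(r):
    return np.abs(r)  # grows linearly, less sensitive to outliers

def deadzone(r, eps=0.5):
    return np.maximum(np.abs(r) - eps, 0.0)  # no penalty within +/- eps

def huber(r, delta=1.0):
    # quadratic near zero, linear in the tails
    return np.where(np.abs(r) <= delta,
                    0.5 * r ** 2,
                    delta * (np.abs(r) - 0.5 * delta))

r = np.array([-3.0, -0.3, 0.0, 0.3, 3.0])
for f in (squared, absolute, deadzone, huber):
    print(f.__name__, f(r))
```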

notes_02

  • Taylor approximation of functions with scalar input
  • Taylor approximation of functions with vector input

notes_03

  • Sampling from a distribution
  • Laplace's method for sampling, based on 2nd order Taylor approximation of the log of the distribution
  • Use finite difference to compute second derivatives
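
A sketch for a 1D density known up to a constant: locate the mode, estimate the second derivative of the log-density with a central finite difference, and sample from the matching Gaussian.

```python
import numpy as np

rng = np.random.default_rng(7)

def log_p(w):
    # unnormalized log-density; a slightly non-Gaussian made-up example
    return -0.5 * w ** 2 - 0.1 * w ** 4

# crude mode search on a grid (gradient-based search would also work)
grid = np.linspace(-3, 3, 2001)
w_map = grid[np.argmax(log_p(grid))]

# central finite difference for the 2nd derivative of log p at the mode
h = 1e-3
d2 = (log_p(w_map + h) - 2 * log_p(w_map) + log_p(w_map - h)) / h ** 2

# Laplace's method: Gaussian with mean w_map and variance -1 / d2
samples = rng.normal(w_map, np.sqrt(-1.0 / d2), size=5000)
print(w_map, np.sqrt(-1.0 / d2), samples.std())
```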

notes_04

  • Concept of eigenvalues and eigenvectors
  • Compute eigenvalues and eigenvectors based on definition
  • Diagonalization of a matrix using eigenvalues and eigenvectors
  • Diagonalization simplifies many computations, such as power and exponential
  • Kernel matrix is symmetric
  • Symmetric matrices have real eigenvalues and orthonormal eigenvectors
  • Equivalence between transpose and inverse for orthonormal eigenvectors
  • Power iterations and Rayleigh quotient to numerically compute dominant eigenvalue and eigenvector
  • Generalize power iterations to compute all eigenvalues and eigenvectors for symmetric matrices
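
A sketch of the dominant pair on a random symmetric matrix: repeated multiplication aligns the vector with the dominant eigenvector, and the Rayleigh quotient recovers the matching eigenvalue (largest in magnitude).

```python
import numpy as np

rng = np.random.default_rng(8)

B = rng.standard_normal((5, 5))
A = B + B.T  # a random symmetric matrix

v = rng.standard_normal(5)
for _ in range(500):
    v = A @ v
    v /= np.linalg.norm(v)  # renormalize to avoid overflow/underflow

rayleigh = v @ A @ v  # Rayleigh quotient with ||v|| = 1
print(rayleigh)               # dominant eigenvalue (largest absolute value)
print(np.linalg.eigvalsh(A))  # compare against NumPy's full decomposition
```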

notes_05

  • Transformation of data induced by RBF kernel is infinite-dimensional

notes_06

  • Correction of posterior prediction variance due to finite-dimensional approximation of infinite-dimensional transformation under RBF kernel

notes_07

  • Metropolis algorithm for sampling
  • Proposal distribution to generate new sample candidates
  • Acceptance rate in Metropolis
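
A minimal sketch of the algorithm for a 1D target known up to a constant, using a Gaussian proposal (the proposal width of 1.0 is a hypothetical choice that directly affects the acceptance rate):

```python
import numpy as np

rng = np.random.default_rng(9)

def log_p(w):
    # unnormalized log-density of the target
    return -0.5 * w ** 2 - 0.1 * w ** 4

w = 0.0
samples = []
accepted = 0
n_steps = 20000

for _ in range(n_steps):
    w_new = w + rng.normal(0, 1.0)  # symmetric Gaussian proposal around current w
    # accept with probability min(1, p(w_new) / p(w)), compared in log space
    if np.log(rng.uniform()) < log_p(w_new) - log_p(w):
        w = w_new
        accepted += 1
    samples.append(w)

samples = np.array(samples[5000:])    # drop burn-in
print(accepted / n_steps)             # acceptance rate
print(samples.mean(), samples.std())  # target is symmetric, so mean is near 0
```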

notes_08

  • Vectorization and broadcasting in NumPy
  • Broadcasting rules
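
The rules in two small examples: sizes are compared right to left, and a size-1 axis is stretched to match, which is what makes loop-free pairwise computations possible.

```python
import numpy as np

a = np.arange(3).reshape(3, 1)  # shape (3, 1)
b = np.arange(4).reshape(1, 4)  # shape (1, 4)

# (3, 1) + (1, 4) -> (3, 4): each size-1 axis is stretched to match
print((a + b).shape)
print(a + b)  # an outer "addition table" without any Python loop

# the same pattern vectorizes pairwise quantities, e.g. squared distances
x = np.array([0.0, 1.0, 4.0])
d2 = (x[:, None] - x[None, :]) ** 2
print(d2)  # (3, 3) matrix of all pairwise squared distances
```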

notes_09

  • Hamiltonian Monte Carlo (HMC) sampling
  • Euler-Lagrange equation
  • Lagrangian mechanics
  • Hamiltonian equations
  • Leapfrog method
  • HMC and Metropolis algorithm
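
A compact sketch of HMC for a standard Gaussian target, so U(q) = q²/2: leapfrog integrates the Hamiltonian dynamics, and a Metropolis test on the total energy corrects for integration error (the step size and trajectory length below are hypothetical choices).

```python
import numpy as np

rng = np.random.default_rng(10)

def grad_U(q):
    return q  # potential U(q) = 0.5 * q^2 for a standard Gaussian target

def leapfrog(q, p, step=0.1, n=20):
    # half step in momentum, alternating full steps, half step in momentum
    p = p - 0.5 * step * grad_U(q)
    for _ in range(n - 1):
        q = q + step * p
        p = p - step * grad_U(q)
    q = q + step * p
    p = p - 0.5 * step * grad_U(q)
    return q, p

q = 0.0
samples = []
for _ in range(5000):
    p0 = rng.standard_normal()  # resample momentum each iteration
    q_new, p_new = leapfrog(q, p0)
    # Metropolis test on total energy H = U + K to correct integration error
    dH = (0.5 * q_new ** 2 + 0.5 * p_new ** 2) - (0.5 * q ** 2 + 0.5 * p0 ** 2)
    if np.log(rng.uniform()) < -dH:
        q = q_new
    samples.append(q)

samples = np.array(samples)
print(samples.mean(), samples.std())  # near 0 and 1 for a standard Gaussian
```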