Instance-based label smoothing - Master's thesis, Institute of Computer Science, University of Tartu, May 2020

For the instance-based label smoothing version in neural networks: click HERE

Instance-based Label Smoothing for Logistic Regression using Kernel Density Estimation

  • This repository includes a newly proposed method for instance-based label smoothing in logistic regression based on kernel density estimation. By smoothing the labels of only the more confident instances, without introducing any additional noise to the labels of the less confident ones, the method avoids overconfidence across all instances (a minimal sketch of the idea appears below the method list).

  • Additionally, it includes an implementation of the Bayesian approach for finding the optimal model predictions when the generative-model distribution of the data is known a priori.

  • Besides that, the repository includes Python implementations of different logistic regression fitting approaches, including Bayesian logistic regression using Cauchy priors for the model coefficients, L1/L2 regularization, and label smoothing using Platt scaling, as shown in the table below.

Implemented Methods.
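
As a rough illustration of the instance-based idea, here is an assumption-laden sketch (not the thesis algorithm: the max_smoothing cap and the linear confidence-to-smoothing mapping are illustrative choices):

import numpy as np
from scipy.stats import gaussian_kde

def instance_smoothed_labels(X, y, max_smoothing=0.1):
    # Class-conditional densities estimated with Gaussian KDE (expects X of shape (n, d)).
    kde_pos = gaussian_kde(X[y == 1].T)
    kde_neg = gaussian_kde(X[y == 0].T)
    prior_pos = y.mean()
    p_pos = kde_pos(X.T) * prior_pos          # unnormalized p(y=1 | x)
    p_neg = kde_neg(X.T) * (1 - prior_pos)    # unnormalized p(y=0 | x)
    confidence = np.where(y == 1, p_pos, p_neg) / (p_pos + p_neg)  # posterior of the observed label
    smoothing = max_smoothing * confidence    # the most confident instances are smoothed the most
    return np.where(y == 1, 1.0 - smoothing, smoothing)  # soft labels in [0, 1]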

Requirements

  • Python 3.x
  • pandas
  • numpy
  • scipy

Usage

Datasets

  • 40 open-source datasets collected from OpenML and uploaded to the repository's Datasets/ folder.

File Contents

The project is structured as follows:

├── BayesianCoeffLogisticRegression.py
├── BayesianDataLogisticRegression.py
├── CustomLogisticRegression.py
├── KDELogisticRegression.py
├── DatasetGenerator.py
├── Datasets
│   ├── aecoli.csv
│   ├── balloon.csv
│   ├── ...

BayesianCoeffLogisticRegression.py is the implementation class for training Bayesian logistic regression with a Cauchy prior on the model coefficients (Gelman et al., 2008).
CustomLogisticRegression.py is the implementation class for vanilla logistic regression, optionally with Platt-scaling label smoothing, a custom label smoothing factor, or L1/L2 regularization of the logistic regression coefficients.
KDELogisticRegression.py is the implementation class for logistic regression with instance-based label smoothing using kernel density estimation.
BayesianDataLogisticRegression.py is the implementation class for the derived Bayesian approach to optimal probability predictions for a dataset with a known generative-model distribution (usable with synthetic univariate datasets only).
DatasetGenerator.py is the class for synthetic dataset generation.
Datasets/ includes all real-world datasets used in the evaluation experiments.

Examples:

Synthetic Dataset Generator | Generative Model of Datasets:

Open Example

DatasetGenerator.py

generator = SyntheticGaussianGenerator(p = ratio) # instances of both classes are normally distributed, with p*100% of them in the positive class
generator.set_gaussian_parameters(mu0, mu1, sig, sig) # mu0, mu1 are the means of the negative and positive classes respectively; sig, sig are their standard deviations
X_train, y_train = generator.generate(int(tr_size)) # generate a dataset of size 'tr_size'
################################################################
prior_object = uniform_prior(l0min = a0, l0max = b0, l1min = a1, l1max = b1, lsmin = sigma, lsmax = sigma) # class means are drawn from uniform distributions with parameters a0, b0 for the negative-class mean and a1, b1 for the positive-class mean; the stddev is drawn from a uniform distribution from lsmin to lsmax (here it is fixed to sigma)
# OR
prior_object = beta_prior(a0 = a0, b0 = b0, a1 = a1, b1 = b1, shift = sep, lsmin = sigma, lsmax = sigma) # class means are drawn from beta distributions with parameters a0, b0 for the negative-class mean and a1, b1 for the positive-class mean, shifted by 'shift'; the stddev is drawn from a uniform distribution from lsmin to lsmax (here it is fixed to sigma)

mu0s, mu1s, sigs = prior_object.sample(generative_model_samples) # draw 'generative_model_samples' samples of the class means/stddev from the prior distribution object
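
For reference, a self-contained numpy sketch of what the generator above presumably produces (assumed behavior; the actual implementation lives in DatasetGenerator.py):

import numpy as np

def generate_gaussian_dataset(n, p, mu0, mu1, sig, seed=0):
    rng = np.random.default_rng(seed)
    y = (rng.random(n) < p).astype(int)   # roughly p*100% positive instances
    X = np.where(y == 1, rng.normal(mu1, sig, n), rng.normal(mu0, sig, n))
    return X.reshape(-1, 1), y

X_train, y_train = generate_gaussian_dataset(n=200, p=0.5, mu0=-1.0, mu1=1.0, sig=1.0)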

Bayesian Logistic Regression:

Open Example

BayesianCoeffLogisticRegression.py

model = BayesianCoeffLogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test) #predicted labels
y_pred_proba = model.predict_proba(X_test) # predicted probabilities
ece, loss = model.predict_ece_logloss(X_test) #calibration error, log loss
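
Under the hood, this style of model fits a MAP estimate under a Cauchy prior. A minimal sketch of that objective (Gelman et al., 2008 suggest a default scale of 2.5 on standardized inputs; this sketch is an assumption, not the repository's exact implementation):

import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def fit_cauchy_logreg(X, y, scale=2.5):
    # X of shape (n, d); prepend an intercept column.
    Xb = np.hstack([np.ones((len(X), 1)), X])
    def neg_log_posterior(w):
        p = expit(Xb @ w)
        log_lik = np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
        log_prior = -np.sum(np.log(1 + (w / scale) ** 2))  # Cauchy log-density up to a constant
        return -(log_lik + log_prior)
    return minimize(neg_log_posterior, np.zeros(Xb.shape[1])).x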

Custom Logistic Regression:

Open Example

CustomLogisticRegression.py

model = CustomLogisticRegression(
    smoothing_factor_pos = 0,    # smoothing factor of positive class instances
    smoothing_factor_neg = 0,    # smoothing factor of negative class instances
    tolerance = 1e-3,            # tolerance parameter (when to stop fitting logistic regression coefficients)
    regularization = 'none',     # which regularization to use ('none', 'l1', 'l2')
    regularization_strength = 0, # regularization penalty parameter
    platt_scaling = False)       # whether to use Platt scaling for label smoothing
model.fit(X_train, y_train)
y_pred = model.predict(X_test) #predicted labels
y_pred_proba = model.predict_proba(X_test) # predicted probabilities
ece, loss = model.predict_ece_logloss(X_test) #calibration error, log loss
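
When platt_scaling is enabled, the hard 0/1 targets are replaced by Platt's smoothed targets derived from the class counts (Platt, 1999). A short sketch of the corresponding smoothing factors (the helper name is hypothetical):

import numpy as np

def platt_smoothing_factors(y):
    # Platt's targets: t_pos = (N+ + 1) / (N+ + 2) and t_neg = 1 / (N- + 2),
    # so the per-class smoothing factors are 1 / (N + 2).
    n_pos, n_neg = np.sum(y == 1), np.sum(y == 0)
    return 1.0 / (n_pos + 2), 1.0 / (n_neg + 2)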

Instance-based label smoothing Logistic Regression:

Open Example

KDELogisticRegression.py

model = KDELogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test) #predicted labels
y_pred_proba = model.predict_proba(X_test) # predicted probabilities
ece, loss = model.predict_ece_logloss(X_test) #calibration error, log loss
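
predict_ece_logloss reports the expected calibration error alongside the log loss. A minimal sketch of one common binary ECE with equal-width bins (the repository's binning scheme may differ):

import numpy as np

def expected_calibration_error(y_true, y_prob, n_bins=10):
    bin_ids = np.minimum((y_prob * n_bins).astype(int), n_bins - 1)  # equal-width probability bins
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(y_true[mask].mean() - y_prob[mask].mean())  # |empirical rate - mean confidence|
            ece += mask.mean() * gap                              # weighted by bin frequency
    return ece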

BayesianData Logistic Regression:

Open Example

BayesianDataLogisticRegression.py

##### Posterior Mean of Probability Predictions  #####
prior_object = uniform_prior(l0min = a0, l0max = b0, l1min = a1, l1max = b1, lsmin = sigma, lsmax = sigma) # dataset generative model with uniform distributions: l0min, l0max for the negative-class mean | l1min, l1max for the positive-class mean | lsmin, lsmax for the stddev
clf_bayes_preds = BayesianDataLogisticRegressionMeanPreds(prior_object, # object of the dataset generative-model distribution
    prior_type = 'uniform',  # type of the prior distribution of the dataset generative model
    prior_samples = 10000)   # how many samples to draw from the generative-model distribution to approximate the posterior probability of the optimal logistic regression model
clf_bayes_preds.fit(X_train, y_train)
##### Posterior Mean of Logistic Regression Coefficients #####
clf_bayes_coeffs = BayesianDataLogisticRegressionMeanCoeffs(prior_object, prior_type = 'uniform', prior_samples = 10000)
clf_bayes_coeffs.fit(X_train, y_train)
##### Logistic Regression Model with Highest Posterior Probability  #####
clf_bayes_map = BayesianDataLogisticRegressionMAP(prior_object, prior_type = prior_type, prior_samples = prior_samples)
clf_bayes_map.fit(X_train, y_train)

y_pred = clf_bayes_preds.predict(X_test) #predicted labels
y_pred_proba = clf_bayes_coeffs.predict_proba(X_test) # predicted probabilities
ece, loss = clf_bayes_map.predict_ece_logloss(X_test) #calibration error, log loss
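
A hedged sketch of the posterior-mean prediction idea for the univariate Gaussian case (assumed mechanics; the exact derivation is in the thesis): draw candidate generative parameters from the prior, weight each candidate by the likelihood of the training data, and average the Bayes-optimal predictions under those weights.

import numpy as np
from scipy.stats import norm

def posterior_mean_predictions(X_train, y_train, X_test, mu0s, mu1s, sigs, p=0.5):
    # mu0s, mu1s, sigs are prior samples, e.g. from prior_object.sample(...); X is univariate.
    weights = np.zeros(len(mu0s))
    bayes_preds = np.zeros((len(mu0s), len(X_test)))
    for k, (m0, m1, s) in enumerate(zip(mu0s, mu1s, sigs)):
        # likelihood of the training data under this candidate generative model
        # (log-likelihoods would be safer numerically for large datasets)
        weights[k] = np.prod(np.where(y_train == 1,
                                      p * norm.pdf(X_train, m1, s),
                                      (1 - p) * norm.pdf(X_train, m0, s)))
        # Bayes-optimal p(y = 1 | x) under this candidate model
        f1 = p * norm.pdf(X_test, m1, s)
        f0 = (1 - p) * norm.pdf(X_test, m0, s)
        bayes_preds[k] = f1 / (f0 + f1)
    weights /= weights.sum()      # normalize the importance weights
    return weights @ bayes_preds  # posterior-mean probability predictions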

Results

  • Critical difference diagrams of the evaluated methods on the real datasets, in terms of log loss and expected calibration error, can be found below:

Logistic Regression performance when fitting on whole real-world datasets (Higher rank is better).

Logistic Calibration performance of Linear SVM scores on real-world datasets (Higher rank is better).

  • For the synthetic dataset results, see the thesis text.
