HMS-Harmful Brain Activity Classification

This repository contains a series of experiments that improved classification performance on EEG spectrogram data in the Kaggle competition HMS - Harmful Brain Activity Classification.

Description

The data consists of 50-second EEG samples plus matched spectrograms covering a 10-minute window centered at the same time. The labels apply to the central 10 seconds.
Each sample belongs to one of six categories (Seizure, LPD, GPD, LRDA, GRDA, or Other), as determined by expert voters.
The number of expert votes per sample varies, ranging from 1 to 28.
The competition metric is the KL-Divergence loss between the predicted probabilities and the observed target.
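
As an illustration of the metric, here is a minimal sketch of how vote counts can be turned into a target distribution and scored with KL divergence. The function names and the sample vote counts are hypothetical, and the competition's official implementation may differ in details such as clipping.

```python
import math

def votes_to_probs(votes):
    """Normalize raw expert vote counts into a probability distribution."""
    total = sum(votes)
    return [v / total for v in votes]

def kl_divergence(target, pred, eps=1e-15):
    """KL(target || pred) for one sample; classes with target == 0 contribute 0."""
    return sum(
        t * math.log(t / max(p, eps))
        for t, p in zip(target, pred)
        if t > 0
    )

# Hypothetical sample: 3 votes for Seizure, 1 for LPD, none for the other classes.
target = votes_to_probs([3, 1, 0, 0, 0, 0])
uniform = [1 / 6] * 6

print(kl_divergence(target, target))   # identical distributions give 0.0
print(kl_divergence(target, uniform))  # a uniform guess is penalized
```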

I focused primarily on the spectrogram image data, using both CNN-based and Transformer-based approaches to improve performance.

Configuration

For most of the experiments, I have followed the same configuration as described below.

Model: EfficientNets
Fold: StratifiedGroupKFold (5 Folds)
Epochs: 6
Eval_per_epoch: 2
Optimizer: AdamW
Learning Rate: 1e-3 (CNN) / 1e-4 (Transformers)
Scheduler: One Cycle Policy with MaxLR: 1e-3 (CNN) / 1e-4 (Transformers)
Loss: KLDiv Loss

CNN

1. Baseline Model

Our baseline model processes a spectrogram image composed of four panels stacked vertically: LL, LP, RL, and RP.

Input	OOF-CV	Public LB
Spectrogram Images	0.7287	0.45

2. Global Spectrogram Features

In this approach, rather than directly using the images, we extract the following statistics from four panel images and utilize them as input for our CNN:

 X_min = np.min([LL, LP, RL, RP])
 X_max = np.max([LL, LP, RL, RP])
 X_mean = np.mean([LL, LP, RL, RP])
 # Note: despite its name, X_var is the range (max - min), not the statistical variance
 X_var = X_max - X_min

These can be seen as global spectrogram features.

Input	OOF-CV	Public LB
Global Features	0.7324	0.46

3. Ensemble of 1 + 2

An ensemble can be built in two ways:
1. Model Ensemble: we take the weighted sum of the outputs of the two models to get the final prediction.
2. Input Feature Ensemble: we concatenate the input features from 1 and 2 and then train a single model.

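 # 1. Model Ensemble: weighted average of the two models' predicted probabilities
 pred = 0.5 * pred_1 + 0.5 * pred_2

 # 2. Input Feature Ensemble: concatenate both feature sets, then train one model
 model_input = np.hstack([baseline_features, global_features])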

Type	OOF-CV	Public LB
Model Ensemble	NA	0.42
Input Feature Ensemble	0.7027	0.43

4. EEG Spectrograms

Instead of using Kaggle-provided spectrograms, we generated Spectrograms from EEG Data as described in this notebook.

Note that for the baseline model, we concatenated percentile features with the input features. That gave us a good 0.04 boost on CV and a 0.01 boost on LB.

 # Percentiles
 X_20p = np.percentile(X, q=20, axis=0)
 X_40p = np.percentile(X, q=40, axis=0)
 X_60p = np.percentile(X, q=60, axis=0)
 X_80p = np.percentile(X, q=80, axis=0)
 X_median = np.vstack([X_20p, X_40p, X_60p, X_80p])
 
 input_img = np.hstack([input_img, X_median])

Inputs	OOF-CV	Public LB
Baseline + Percentiles	0.7104	0.45
Global Features	0.7537	0.46
Model Ensemble	NA	0.42

5. Kaggle + EEG Ensemble

This is the ensemble of the models produced in 3 and 4.

Type	OOF-CV	Public LB
Kaggle Ensemble	NA	0.42
EEG Ensemble	NA	0.42
Kaggle + EEG Ensemble	NA	0.38

6. Vote-Weighted KLDiv Loss

So far we had used plain KL-Divergence Loss as the cost function, ignoring the total number of expert votes for a given sample.

The idea here is that samples with more votes are more reliable. So we modify the cost function to weight the KLDiv Loss by the total number of votes:

Loss = KLDiv * torch.log(total_votes + 1)

This alone gave us a 0.02 boost in CV and a 0.02 boost in LB. We further added percentile features as described in 4 and used Same-Class Cutmix Augmentation to get an additional 0.03 boost in CV and 0.01 boost in LB over the baseline described in 1.
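
To make the weighting concrete, here is a minimal pure-Python sketch of the vote-weighted loss. It is only an illustration of the formula above: the function names are hypothetical, and averaging over the batch is an assumption, since the reduction used in training is not stated here.

```python
import math

def kl_divergence(target, pred, eps=1e-15):
    """KL(target || pred) for one sample; classes with target == 0 contribute 0."""
    return sum(
        t * math.log(t / max(p, eps))
        for t, p in zip(target, pred)
        if t > 0
    )

def vote_weighted_loss(targets, preds, total_votes):
    """Per-sample KLDiv scaled by log(total_votes + 1), averaged over the batch.

    The batch mean is an assumption made for this sketch.
    """
    losses = [
        kl_divergence(t, p) * math.log(n + 1)
        for t, p, n in zip(targets, preds, total_votes)
    ]
    return sum(losses) / len(losses)

# Same prediction error, different vote counts: the 27-vote sample weighs more.
t, p = [1.0, 0.0], [0.5, 0.5]
print(vote_weighted_loss([t], [p], [1]))
print(vote_weighted_loss([t], [p], [27]))
```

With this weighting, a 1-vote sample is scaled by log(2) and a 28-vote sample by log(29), so the weight grows with the number of votes but only logarithmically.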

Input	OOF-CV	Public LB
Spectrogram + Percentiles	0.6767	0.42
Global Features	0.6971	0.42
Ensemble	NA	0.40

Table: Kaggle Spectrograms

7. Global Normalization

So far we have been normalizing each spectrogram image by its own mean and standard deviation, as shown below.

# Per-image normalization
m = np.nanmean(img.flatten())
s = np.nanstd(img.flatten())
img = (img - m) / (s + ep)  # ep: small constant to avoid division by zero

Instead of doing this, we derived a single mean and standard deviation from the training data and used them to normalize every image. This gave us a good 0.04 boost in CV and a 0.02 boost in LB. (Thanks to Sandeep Anna for suggesting this idea.)
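
A minimal sketch of global normalization, assuming the training spectrograms are available as a list of NumPy arrays. The helper names, the random stand-in data, and the value of `ep` are assumptions made for illustration only.

```python
import numpy as np

def fit_global_stats(train_images):
    """Mean and std over all pixels of the training set, ignoring NaNs."""
    pixels = np.concatenate([img.ravel() for img in train_images])
    return np.nanmean(pixels), np.nanstd(pixels)

def normalize(img, mean, std, ep=1e-6):
    """Normalize one image with the shared statistics (ep value is an assumption)."""
    return (img - mean) / (std + ep)

# Random stand-in data in place of real spectrograms.
rng = np.random.default_rng(0)
train_images = [rng.normal(5.0, 2.0, size=(8, 16)) for _ in range(10)]

mean, std = fit_global_stats(train_images)
normalized = [normalize(img, mean, std) for img in train_images]
```

The same `mean` and `std` would then be reused at validation and test time, so every image is scaled consistently and relative intensity differences between images are preserved.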

Input	OOF-CV	Public LB
Spectrograms	0.6355	0.40
Global Features	0.6566	0.41
Ensemble	NA	0.38

Table: Kaggle Spectrograms

8. Mosaic Warmup + xloss

Mosaic Warmup: We combined 4 spectrogram images into one image and labeled it with the average of their labels. We used these images and labels for warmup training for 3 epochs.
xloss: We further modified the loss function to

Loss = KLDiv * torch.clamp(total_votes, 10)
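
The Mosaic Warmup step can be sketched as follows. The 2x2 grid layout is an assumption (the arrangement of the four images is not stated here), `make_mosaic` is a hypothetical helper, and any resizing back to the model's input size is left out.

```python
import numpy as np

def make_mosaic(images, labels):
    """Tile four equally sized images into a 2x2 grid and average their labels."""
    if len(images) != 4 or len(labels) != 4:
        raise ValueError("make_mosaic expects exactly four images and four labels")
    top = np.hstack([images[0], images[1]])
    bottom = np.hstack([images[2], images[3]])
    mosaic = np.vstack([top, bottom])
    label = np.mean(labels, axis=0)  # average of the four label distributions
    return mosaic, label

# Stand-in data: four 4x6 "spectrograms" with one-hot labels over six classes.
rng = np.random.default_rng(0)
images = [rng.random((4, 6)) for _ in range(4)]
labels = [np.eye(6)[k] for k in (0, 0, 1, 5)]

mosaic, label = make_mosaic(images, labels)
print(mosaic.shape)
print(label)
```

Because each input label sums to 1, the averaged label is still a valid probability distribution and can be used directly as a KLDiv target.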

Input	OOF-CV	Public LB
Spectrograms	0.6290	0.37
Global Features	0.6402	0.39
Ensemble	NA	0.36

Table: Kaggle Spectrograms

9. 2-Stage Training

We divided the samples into two categories based on the total number of expert votes.
1. 1-3 votes => Noisy Labels
2. 4-28 votes => Good Labels

In Stage 1, we trained the model only on the Noisy Labels (1-3 votes). Later, in Stage 2, we finetuned the models on the Good Labels (4-28 votes).
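
The split behind the two stages can be sketched as below. The sample records and the `total_votes` key are hypothetical. Only the threshold of 4 votes comes from the description above.

```python
def split_by_votes(samples, threshold=4):
    """Split samples into (noisy, good) by total expert votes.

    Samples with fewer than `threshold` votes are treated as noisy labels.
    """
    noisy = [s for s in samples if s["total_votes"] < threshold]
    good = [s for s in samples if s["total_votes"] >= threshold]
    return noisy, good

# Hypothetical records; in practice these would come from the training metadata.
samples = [
    {"id": "a", "total_votes": 1},
    {"id": "b", "total_votes": 3},
    {"id": "c", "total_votes": 4},
    {"id": "d", "total_votes": 28},
]

stage1_data, stage2_data = split_by_votes(samples)
print([s["id"] for s in stage1_data])  # Stage 1: train on noisy labels
print([s["id"] for s in stage2_data])  # Stage 2: finetune on good labels
```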

Input	OOF-CV	Public LB
Kaggle Ensemble	0.4142	0.34
EEG Ensemble	0.4608	0.37
Final Ensemble	NA	0.32

Table: Ensemble

Competition Result

Our final solution achieved a ranking of 329th among 2768 candidates.
