39th Place Detailed Solution: Child Mind Institute - Detect Sleep States
Thanks to Kaggle for hosting this meaningful competition. Thanks to all the Kagglers for your discussions and shared perspectives. This was our first time participating in a formal tabular competition, and we've learned a lot from the experience.
Team Avengers will always continue the journey on Kaggle.
Main GitHub Repo: Here
PrecTime GitHub Repo: Here
Original Discussion: Here
Here we need to thank tubotubo for providing the baseline code. Since we didn't join the competition from the very beginning, this baseline gave us a starting point with ideas and basic model structures.
- We didn't use any methods to handle the dirty data, which might be one reason why we couldn't improve our scores any further.
- On the evening before the competition ended, my teammate found this discussion. We then attempted to clean the data by removing the days where the events were empty, but due to time constraints we didn't make significant progress.
- We believe data cleaning should help: the model using this method showed a smaller difference between its public and private leaderboard scores. A sketch of the cleaning step follows.
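A minimal sketch of that cleaning, assuming the competition's file layout (`train_events.csv`, `train_series.parquet`) and hypothetical intermediate names:

```python
import polars as pl

# Keep only the (series_id, date) pairs that have at least one labeled event.
events = pl.read_csv("train_events.csv").drop_nulls("step")
labeled_days = (
    events.with_columns(pl.col("timestamp").str.slice(0, 10).alias("date"))
    .select(["series_id", "date"])
    .unique()
)

series = pl.read_parquet("train_series.parquet").with_columns(
    pl.col("timestamp").str.slice(0, 10).alias("date")
)
cleaned = series.join(labeled_days, on=["series_id", "date"], how="inner")
```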
We generated new features using shift, diff, and rolling-window functions.
In the final set we used 24 engineered (diff, lag, and rolling) features in addition to the original 4, for a total of 28 features. The new features did not significantly improve the model's score, which was somewhat unexpected for us.
Code: Here
```python
import polars as pl

def add_anglez_features(df: pl.DataFrame, diff_start, diff_end, diff_step,
                        shift_start, shift_end, shift_step, window_steps):
    # Wrapper function added for runnability; the original snippet is the expression list.
    return df.with_columns(
        # first-order differences at several offsets
        *[pl.col("anglez").diff(i).alias(f"anglez_diff_{i}")
          for i in range(diff_start, diff_end, diff_step)],
        # lag features (skip the zero shift, which would duplicate the column)
        *[pl.col("anglez").shift(i).alias(f"anglez_lag_{i}")
          for i in range(shift_start, shift_end, shift_step) if i != 0],
        # rolling statistics over several window sizes
        *[pl.col("anglez").rolling_mean(window_size).alias(f"anglez_mean_{window_size}")
          for window_size in window_steps],
        *[pl.col("anglez").rolling_min(window_size).alias(f"anglez_min_{window_size}")
          for window_size in window_steps],
        *[pl.col("anglez").rolling_max(window_size).alias(f"anglez_max_{window_size}")
          for window_size in window_steps],
        *[pl.col("anglez").rolling_std(window_size).alias(f"anglez_std_{window_size}")
          for window_size in window_steps],
    )
```
WandB sweep is a hyperparameter optimization tool provided by the Weights & Biases machine learning platform. It automatically explores different hyperparameter combinations to improve a model's performance.
Implementation Code: Here
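For illustration, a minimal sweep sketch using the public wandb Python API; the method, metric name, and search space here are placeholders, not our actual config:

```python
import wandb

sweep_config = {
    "method": "bayes",
    "metric": {"name": "val_score", "goal": "maximize"},
    "parameters": {"lr": {"min": 1e-4, "max": 1e-2}},
}

def train():
    # The agent injects the sampled hyperparameters via wandb.config.
    with wandb.init() as run:
        lr = wandb.config.lr
        # ... train the model with this lr ...
        run.log({"val_score": 0.0})  # placeholder metric

sweep_id = wandb.sweep(sweep=sweep_config, project="detect-sleep-states")
wandb.agent(sweep_id, function=train, count=20)
```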
- Used overlap: to improve accuracy when predicting near sequence edges, we used overlapping windows, taking a 10000-step sequence as input to predict the central 8000 steps (a sketch follows this list).
- Implemented the PrecTime model: you can find details in this discussion. We also modified its structure, adding a transformer architecture and residual connections. Our experiments showed that these modifications contribute a measurable improvement to the model's performance.
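Below is a minimal sketch of the overlap idea, assuming a hypothetical `model_fn` that returns one score per step; the repo's actual inference code differs:

```python
import numpy as np

def predict_with_overlap(model_fn, series, in_len=10000, out_len=8000):
    # Pad with edge values so the first/last windows still get full context.
    pad = (in_len - out_len) // 2  # 1000 steps of extra context per side
    padded = np.pad(series, (pad, pad), mode="edge")
    preds = []
    for start in range(0, len(series), out_len):
        window = padded[start:start + in_len]
        if len(window) < in_len:  # the tail window may run short; pad it
            window = np.pad(window, (0, in_len - len(window)), mode="edge")
        scores = model_fn(window)                # per-step scores, shape (in_len,)
        preds.append(scores[pad:pad + out_len])  # keep only the central part
    return np.concatenate(preds)[:len(series)]
```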
We used a dynamic programming algorithm to deal with the overlap problem.
The principle behind this method: to achieve a high MAP (mean average precision), three main criteria need to be met. First, a predicted point should be sufficiently close to the true event. Second, within the tolerance window around a true event, there should be only one predicted point. Third, predicted points outside the tolerance window should score lower than those inside it.
```python
def get_results_slide_window(pred, gap):
    """Greedily keep high-scoring peaks that are at least `gap` steps apart."""
    scores = list(pred)
    stack = [0]                # indices of currently kept peaks
    dp = [-1] * len(scores)    # dp[i]: last kept peak index after seeing step i
    dp[0] = 0
    for i in range(1, len(scores)):
        if i - stack[-1] < gap:
            # i is too close to the last kept peak; keep the higher-scoring one
            if scores[i] >= scores[stack[-1]]:
                stack.pop()
                if i - gap >= 0:
                    # try to restore the peak that was best at position i - gap
                    if stack:
                        if dp[i - gap] != stack[-1]:
                            while stack and dp[i - gap] - stack[-1] < gap:
                                stack.pop()
                            stack.append(dp[i - gap])
                    else:
                        stack.append(dp[i - gap])
                stack.append(i)
        else:
            stack.append(i)
        dp[i] = stack[-1]
    return stack
```
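Hypothetical usage (the array length and `gap` value are made up for illustration):

```python
import numpy as np

pred = np.random.rand(17280)               # fake per-step scores for one series
keep = get_results_slide_window(pred, gap=100)
events = [(i, pred[i]) for i in keep]      # surviving (step, score) candidates
```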
Our final ensemble essentially averaged the different models' outputs. With post-processing and this ensemble combined, our results generally followed the pattern that the more models we used, or the greater their variety, the higher the score. A sketch of the averaging follows.
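A minimal sketch of the averaging, with random arrays standing in for each model's per-step scores:

```python
import numpy as np

# Illustrative only: three fake models' per-step score arrays for one series.
per_model_scores = [np.random.rand(17280) for _ in range(3)]

# Average the stacked scores; post-processing then runs on the blended array.
blended = np.mean(np.stack(per_model_scores, axis=0), axis=0)
```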
Our submissions:
| Models | Public LB Score | Private LB Score | Selected |
|---|---|---|---|
| 2 * 5 folds PrecTime + 1 * 5 folds LSTM-Unet | 0.75 | 0.8 | Yes |
| 2 * 5 folds PrecTime + 2 * 5 folds LSTM-Unet + 10 single models | 0.759 | 0.803 | Yes |
| 1 * 5 folds PrecTime + 1 fold LSTM-Unet + 10 single models | 0.761 | 0.804 | No |
| 1 * 5 folds PrecTime + 1 * 5 folds LSTM-Unet + 10 single models | 0.759 | 0.803 | No |
Everyone is welcome to check out our code on GitHub; we look forward to any discussions.
- Data Cleaning.
- Generate New Features.
- Use Architectures Like Conv1d, RNN, GRU, LSTM or Transformer.
- Write Post-processing Tailored to the Metric.
- Overlap Inference
- PrecTime Structure
- More Post-processing
- Ensemble
- More Loss Functions: now supports `bce`, `mse`, `focal`, and `kl`, plus a weight for positive labels (controlled by `loss.weight`, 0 by default); see the sketch after this list
- More Encoders: U-Net++
- More Features
- Feature Embedding
- WandB Sweep (scores on validation)
- PrecTime Model: the loss is computed in `forward` inside `/src/models/PrecTime.py`
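As a rough sketch of how such name-keyed loss selection could look (a hypothetical factory, not the repo's actual wiring; the focal loss is the standard binary formulation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    # Standard binary focal loss: down-weight easy examples.
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)  # model's probability for the true class
    return ((1.0 - p_t) ** gamma * bce).mean()

def build_loss(name: str, pos_weight: float = 0.0):
    # Hypothetical factory keyed by the loss name from the config.
    if name == "bce":
        w = torch.tensor(1.0 + pos_weight) if pos_weight > 0 else None
        return nn.BCEWithLogitsLoss(pos_weight=w)
    if name == "mse":
        return nn.MSELoss()
    if name == "focal":
        return focal_loss
    if name == "kl":
        return nn.KLDivLoss(reduction="batchmean")
    raise ValueError(f"unknown loss: {name}")
```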
To run a sweep:

- If needed, modify `sweep.yaml` first.
- Initiate the WandB sweep:

```bash
$ wandb sweep wandb_sweep.yaml
```

- Run the agent: creating a sweep returns a `wandb agent` command; invoke the `wandb agent path/to/sweep` command provided in the output of the previous step.
This repository is for the Child Mind Institute - Detect Sleep States competition.
1. Install rye

macOS

```bash
curl -sSf https://rye-up.com/get | bash
echo 'source "$HOME/.rye/env"' >> ~/.zshrc
source ~/.zshrc
```

Linux

```bash
curl -sSf https://rye-up.com/get | bash
echo 'source "$HOME/.rye/env"' >> ~/.bashrc
source ~/.bashrc
```

Windows

See the install documentation.

2. Create and activate the virtual environment

```bash
rye sync
. .venv/bin/activate
```
Rewrite `run/conf/dir/local.yaml` to match your environment:

```yaml
data_dir: /kaggle-detect-sleep/data
processed_dir: /kaggle-detect-sleep/data/processed_data
output_dir: /kaggle-detect-sleep/output
model_dir: /kaggle-detect-sleep/output/train
sub_dir: ./
```
Download the competition data:

```bash
export KAGGLE_USERNAME=your_kaggle_username
export KAGGLE_KEY=your_api_key
cd data
kaggle competitions download -c child-mind-institute-detect-sleep-states
unzip child-mind-institute-detect-sleep-states.zip
```

Prepare the data:

```bash
python run/prepare_data.py phase=train,test
```
- Basic Model

```bash
python run/train.py downsample_rate=2 duration=5760 exp_name=exp001 batch_size=32
```

Because Hydra is used, you can easily run experiments by changing parameters. The following command runs experiments with a downsample_rate of 2, 4, 6, and 8 (Hydra's multirun flag `-m` is needed for comma-separated sweeps):

```bash
python run/train.py -m downsample_rate=2,4,6,8
```
- PrecTime Model

```bash
python run/train_prectime.py
```

You can select a sweep YAML: `sweep_prectime_lstm.yaml` or `sweep_prectime_r_lstm.yaml`.
Upload the dataset:

```bash
rye run python tools/upload_dataset.py
```
The following command reproduces the inference for LB 0.714:

```bash
rye run python run/inference.py dir=kaggle exp_name=exp001 weight.run_name=single downsample_rate=2 duration=5760 model.encoder_weights=null post_process.score_th=0.005 post_process.distance=40 phase=test
```