
Data preprocessing

Data download and decompression

First, download the Coswara and Coughvid datasets.

mkdir datasets
cd datasets
git clone https://github.com/iiscleap/Coswara-Data.git # Coswara
./decompress.sh # Decompress the dataset. Only Coswara needs it. 
git clone https://github.com/virufy/virufy-cdf-coughvid.git # coughvid from virufy
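The contents of decompress.sh are not reproduced here. As an assumption to check against the cloned repository, Coswara ships each date folder as a split .tar.gz archive (parts named like <date>.tar.gz.aa, <date>.tar.gz.ab, ...) that has to be rejoined before extraction. A minimal Python sketch of that step, using a hypothetical helper extract_split_archives, might look like:

```python
import glob
import os
import shutil
import tarfile
import tempfile


def extract_split_archives(root):
    """Rejoin and extract split .tar.gz archives found in each subfolder of root.

    Hypothetical layout (an assumption, not taken from decompress.sh): every
    date folder holds parts named <date>.tar.gz.aa, <date>.tar.gz.ab, ...
    that must be concatenated in lexical order before extraction.
    Returns the list of folders that were extracted.
    """
    extracted = []
    for date_dir in sorted(glob.glob(os.path.join(root, "*"))):
        if not os.path.isdir(date_dir):
            continue
        parts = sorted(glob.glob(os.path.join(date_dir, "*.tar.gz.*")))
        if not parts:
            continue
        # Concatenate the parts into a single temporary .tar.gz file.
        with tempfile.NamedTemporaryFile(suffix=".tar.gz", delete=False) as joined:
            for part in parts:
                with open(part, "rb") as f:
                    shutil.copyfileobj(f, joined)
            joined_path = joined.name
        try:
            # Extract the rejoined archive into the date folder itself.
            with tarfile.open(joined_path, mode="r:gz") as tar:
                tar.extractall(path=date_dir)
        finally:
            os.remove(joined_path)
        extracted.append(date_dir)
    return extracted
```

For example, extract_split_archives("datasets/Coswara-Data") would extract every date folder in place; the real script may place the extracted files elsewhere, so compare with decompress.sh before relying on this.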

Data preparation

Once the datasets are downloaded, run data preparation:

python data_preprocessing.py --dataset coswara # preprocess the Coswara dataset, or
python data_preprocessing.py --dataset coughvid # preprocess the Coughvid dataset

This extracts features from the dataset's WAV files (saving each recording's features as a .npy file) and writes a JSON index for the specified dataset to datasets/. The JSON has the following structure:

# Coswara dataset
{
    "iV3Db6t1T8b7c5HQY2TwxIhjbzD3": { # patient ID
        "feature_paths": [ # the patient's feature files, one per recording
            "datasets/Coswara-Data/20200424/iV3Db6t1T8b7c5HQY2TwxIhjbzD3counting-normal.npy",
            "datasets/Coswara-Data/20200424/iV3Db6t1T8b7c5HQY2TwxIhjbzD3vowel-o.npy",
            "datasets/Coswara-Data/20200424/iV3Db6t1T8b7c5HQY2TwxIhjbzD3vowel-a.npy",
            "datasets/Coswara-Data/20200424/iV3Db6t1T8b7c5HQY2TwxIhjbzD3vowel-e.npy",
            "datasets/Coswara-Data/20200424/iV3Db6t1T8b7c5HQY2TwxIhjbzD3breathing-shallow.npy",
            "datasets/Coswara-Data/20200424/iV3Db6t1T8b7c5HQY2TwxIhjbzD3counting-fast.npy",
            "datasets/Coswara-Data/20200424/iV3Db6t1T8b7c5HQY2TwxIhjbzD3breathing-deep.npy",
            "datasets/Coswara-Data/20200424/iV3Db6t1T8b7c5HQY2TwxIhjbzD3cough-heavy.npy",
            "datasets/Coswara-Data/20200424/iV3Db6t1T8b7c5HQY2TwxIhjbzD3cough-shallow.npy"
        ],
        "pct_test_result": "untested" # patient's PCR test result
    },
    "AxuYWBN0jFVLINCBqIW5aZmGCdu1": { # another patient ID
        ...
    },
    ...
}
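To make the structure above concrete, here is a minimal sketch of grouping feature files into that JSON index. It assumes, as the example paths suggest, that each file name is the patient ID immediately followed by one of the nine Coswara recording types and .npy; the helper names (split_feature_name, build_index, write_index) and the "untested" default for unlabeled patients are hypothetical, not necessarily what data_preprocessing.py does.

```python
import json
import os
from collections import defaultdict

# The nine Coswara recording types seen in the example paths above.
RECORDING_TYPES = (
    "breathing-deep", "breathing-shallow",
    "cough-heavy", "cough-shallow",
    "counting-fast", "counting-normal",
    "vowel-a", "vowel-e", "vowel-o",
)


def split_feature_name(filename):
    """Split '<patient_id><recording_type>.npy' into (patient_id, recording_type)."""
    stem, ext = os.path.splitext(filename)
    if ext != ".npy":
        return None
    for rec in RECORDING_TYPES:
        if stem.endswith(rec) and len(stem) > len(rec):
            return stem[: -len(rec)], rec
    return None


def build_index(feature_paths, labels):
    """Group feature paths by patient ID and attach each patient's test result.

    labels maps patient ID -> result string; patients missing from it are
    marked "untested" (a hypothetical default).
    """
    index = defaultdict(lambda: {"feature_paths": [], "pct_test_result": None})
    for path in feature_paths:
        parsed = split_feature_name(os.path.basename(path))
        if parsed is None:
            continue  # skip anything that is not a recognised feature file
        patient_id, _ = parsed
        index[patient_id]["feature_paths"].append(path)
    for patient_id, entry in index.items():
        entry["pct_test_result"] = labels.get(patient_id, "untested")
    return dict(index)


def write_index(index, out_path):
    """Write the index as indented JSON."""
    with open(out_path, "w") as f:
        json.dump(index, f, indent=4)
```

Matching on a fixed list of suffixes is needed because the example file names have no separator between the patient ID and the recording type.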

There are 417/1331/398 positive/negative/untested speakers in the Coswara dataset.
There are 3734/11926/3574 positive/negative/untested recordings in the Coswara dataset.
There are 588/7129/5400 positive/negative/untested speakers in the Coughvid dataset.
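Counts like these can be tallied from the generated JSON at both the speaker level and the recording level. The sketch below assumes the index layout shown above (a "pct_test_result" label and a "feature_paths" list per patient ID); the function names are hypothetical.

```python
import json
from collections import Counter


def load_index(json_path):
    """Load a dataset index produced by data preparation."""
    with open(json_path) as f:
        return json.load(f)


def label_counts(index):
    """Return (speakers_per_label, recordings_per_label) as Counters."""
    speakers = Counter()
    recordings = Counter()
    for entry in index.values():
        label = entry["pct_test_result"]
        speakers[label] += 1                          # one speaker per patient ID
        recordings[label] += len(entry["feature_paths"])  # one file per recording
    return speakers, recordings
```

Since each Coswara speaker contributes up to nine recordings, the recording-level counts come out at roughly nine times the speaker-level counts.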

Model training

To train the model, use the following command:

python train.py --dataset <dataset> --semi <bool> --split_type <split_type>

Note that "random" is the only supported split_type when training on the Coughvid dataset.
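The command-line constraints above can be expressed with argparse. This is a hypothetical mirror of the documented flags, not the actual train.py parser: the defaults are assumptions, other split types are not listed because this README does not name them, and the str2bool helper is included because argparse's type=bool would treat any non-empty string, including "False", as True.

```python
import argparse


def str2bool(value):
    """Parse 'true'/'false'-style strings into a bool."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes", "y"):
        return True
    if lowered in ("false", "0", "no", "n"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def parse_args(argv=None):
    """Hypothetical parser for: --dataset <dataset> --semi <bool> --split_type <split_type>."""
    parser = argparse.ArgumentParser(description="Sketch of the train.py flags.")
    parser.add_argument("--dataset", choices=["coswara", "coughvid"], required=True)
    parser.add_argument("--semi", type=str2bool, default=False)  # assumed default
    parser.add_argument("--split_type", default="random")        # assumed default
    args = parser.parse_args(argv)
    # Coughvid only supports the random split (see the note above).
    if args.dataset == "coughvid" and args.split_type != "random":
        parser.error("only 'random' split_type is supported for coughvid")
    return args
```

For example, parse_args(["--dataset", "coughvid", "--split_type", "random"]) succeeds, while any other split_type for Coughvid exits with a usage error.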

Kuray107/ECE6254_final_project

Folders and files

Latest commit

History

Repository files navigation

Data preprocessing

Data Download and decompress

Data preparation

Model training:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages