This project is the backend of the M-SENA Platform.
We provide a Docker image of the platform. See the main repo for instructions.
```bash
$ git clone https://github.com/iyuge2/M-SENA-Backend.git
$ cd M-SENA-Backend
```
- Install system requirements

```bash
$ apt install mysql-server default-libmysqlclient-dev libsndfile1 ffmpeg
```
- Install python requirements

```bash
$ conda create --name sena python=3.8
$ conda activate sena
$ pip install -r requirements.txt
```
- Download Bert-Base, Chinese from Google-Bert. Then, convert the TensorFlow checkpoint into PyTorch using `transformers-cli`, and place the converted model under the `MM-Codes/pretrained_model` directory (a conversion sketch follows below).
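A minimal conversion sketch, assuming the checkpoint archive was unpacked to `chinese_L-12_H-768_A-12/` (the paths and the use of `BertForPreTraining` are assumptions; `transformers-cli convert` achieves the same from the command line):

```python
# Sketch: convert the Google-Bert TensorFlow checkpoint to PyTorch.
# Requires both tensorflow and torch to be importable; all paths are assumptions.
from transformers import BertConfig, BertForPreTraining

config = BertConfig.from_json_file("chinese_L-12_H-768_A-12/bert_config.json")
model = BertForPreTraining.from_pretrained(
    "chinese_L-12_H-768_A-12/bert_model.ckpt.index",  # TF index checkpoint
    from_tf=True,
    config=config,
)
model.save_pretrained("MM-Codes/pretrained_model")  # writes pytorch_model.bin + config.json
```

Depending on how the tokenizer is loaded, you may also need to copy `vocab.txt` into the same directory.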
- Install the OpenFace toolkits
- Log in to MySQL as root

```bash
$ mysql -u root -p
```
- Create a database for M-SENA

```
mysql> CREATE DATABASE sena;
```
- Create a user for M-SENA and grant privileges

```
mysql> CREATE USER sena IDENTIFIED BY 'MyPassword';
mysql> GRANT ALL PRIVILEGES ON sena.* TO sena@`%`;
mysql> FLUSH PRIVILEGES;
```
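As an optional sanity check, the new account can be verified from Python with the `mysqlclient` driver (host and password below are assumptions; they should match the values you used above):

```python
# Sketch: confirm the sena account can reach the sena database.
import MySQLdb  # provided by the mysqlclient package

conn = MySQLdb.connect(host="localhost", user="sena", passwd="MyPassword", db="sena")
cur = conn.cursor()
cur.execute("SELECT VERSION()")
print(cur.fetchone())  # e.g. ('8.0.x',)
conn.close()
```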
- Edit `constants.py`. Alter `DATASET_ROOT_DIR`, `DATASET_SERVER_IP`, `OPENFACE_FEATURE_PATH`, `MM_CODES_PATH`, `MODEL_TMP_SAVE`, `AL_CODES_PATH`, and `LIVE_TMP_PATH` to fit your settings (illustrative values are sketched below).
- Edit `config.sh`. Look for `DATABASE_URL` and change it to fit your database settings (for the account created above, something like `mysql://sena:MyPassword@localhost/sena`; the exact form is an assumption, so follow the template already in the file).
- Download datasets and place them under the `DATASET_ROOT_DIR` specified in `constants.py`.
- Add information to the `DATASET_ROOT_DIR/config.json` file to register the new dataset.
- Format datasets with `MM-Codes/data/DataPre.py`. For datasets that need labeling, the config file is located in the `AL-Codes` directory.
```bash
$ python MM-Codes/data/DataPre.py --working_dir $PATH_TO_DATASET --openface2Path $PATH_TO_OPENFACE2_FeatureExtraction_TOOL --language cn/en
```
- The structure of the `DATASET_ROOT_DIR` directory is introduced in the next section.
- Run the server

```bash
$ source config.sh
$ flask run --host=0.0.0.0
```
The structure of the root dataset directory should look like this:
```
.
├── config.json
├── MOSEI
│ ├── label.csv
│ ├── Processed
│ └── Raw
├── MOSI
│ ├── label.csv
│ ├── Processed
│ └── Raw
└── SIMS
├── label.csv
├── Processed
    └── Raw
```
- `config.json`: states necessary information for all datasets, for example `language`, `label_path`, `features`, etc. It is only used when scanning and updating datasets (a hypothetical registration sketch follows below).
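A hypothetical registration sketch; only `language`, `label_path`, and `features` are named in this document, and the dataset name and values below are placeholders (mirror an existing entry in your `config.json` for the full field list):

```python
# Sketch: register a new dataset in DATASET_ROOT_DIR/config.json.
import json
import os

root = "/data/M-SENA/datasets"  # your DATASET_ROOT_DIR
path = os.path.join(root, "config.json")

with open(path) as f:
    config = json.load(f)

config["MYDATA"] = {            # hypothetical dataset name
    "language": "en",
    "label_path": "MYDATA/label.csv",
}

with open(path, "w") as f:
    json.dump(config, f, indent=4)
```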
- `**/label.csv`: stores detailed information for each video clip in the dataset, including `video_id`, `clip_id`, `normal text`, `label value (Float)`, `annotation (String)`, and `mode (training attributes)`. Besides, we define a `label_by` field to indicate the label type, which is necessary for labeling based on active learning (see the reading sketch below).
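For a quick look at a label file, a pandas sketch (pandas is an assumption here, not a stated dependency; the path is illustrative):

```python
# Sketch: inspect a dataset's label.csv.
import pandas as pd

df = pd.read_csv("/data/M-SENA/datasets/MOSI/label.csv")
print(df.columns.tolist())  # should include the fields listed above, e.g. video_id, clip_id, label_by
print(df.head())
```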
- `**/Processed`: stores feature files. We use `pickle` to store processed features, which are organized in the following structure. These files are used in `MM-Codes`.
```python
{
"train": {
"raw_text": [],
"audio": [],
"vision": [],
"id": [], # [video_id$_$clip_id, ..., ...]
"text": [],
"text_bert": [],
"audio_lengths": [],
"vision_lengths": [],
"annotations": [],
"classification_labels": [], # Negative(< 0), Neutral(0), Positive(> 0)
"regression_labels": []
},
"valid": {***}, # same as the "train"
"test": {***}, # same as the "train"
}
```
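To verify that a processed feature file matches this structure (the `.pkl` file name is an assumption; use whichever file `DataPre.py` wrote under `Processed/`):

```python
# Sketch: load and inspect a processed feature file.
import pickle

with open("/data/M-SENA/datasets/MOSI/Processed/features.pkl", "rb") as f:
    data = pickle.load(f)

for split in ("train", "valid", "test"):
    print(split, sorted(data[split].keys()))
    print("  clips:", len(data[split]["id"]))
```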
- `**/Raw`: stores raw videos. The path of each clip should be consistent with `label.csv`.
We provide a download link for the preprocessed SIMS dataset (code: `4aa6`, md5: `3befed5d2f6ea63a8402f5875ecb220d`), which follows the above requirements. You can get more datasets from CMU-MultimodalSDK.
The source code is organized as follows:
```
.
├── AL-Codes # Active learning codes
├── MM-Codes # MSA algorithm codes
├── app.py # Flask main codes
├── config.py # Basic config
├── config.sh # Basic config
├── constants.py # Global variable definition
├── database.py # Database definition & initialization
├── httpServer.py # Dataset server (for video previews)
└── requirements.txt # Python requirements
```
- `MM-Codes`: the MSA code framework. Based on MMSA, all model and dataset parameters are saved in `MM-Codes/config.json`.
- `AL-Codes`: the code framework for labeling based on active learning. Based on MMSA, all model and dataset parameters are saved in `AL-Codes/config.json`.