Much of this project was run in the cloud, either as Terra.bio (Firecloud v2) workflows or on Google VMs. If you need access to the Terra workspaces, please email twood@broadinstitute.org.
User Warning: Many of the bash scripts used to train and evaluate the models use nohup commands. If your CPU cannot handle the number of jobs launched, consider modifying your local copy to run the models serially.
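One possible way to run jobs serially is sketched below in Python. It is only an illustration, not part of this repository: `run_serially` is a hypothetical helper, and the commands passed to it are assumed to be the individual training commands copied from your local version of the bash scripts, with `nohup` and any trailing `&` removed.

```python
import subprocess
import sys


def run_serially(commands, stop_on_failure=True):
    """Run each command to completion before starting the next one.

    `commands` is a list of argument lists, e.g. [["python", "some_script.py"]].
    Returns the exit codes of the commands that were actually run.
    """
    exit_codes = []
    for argv in commands:
        print("Running:", " ".join(argv), file=sys.stderr)
        # subprocess.run blocks until the command exits, so only one
        # job is active at any time.
        result = subprocess.run(argv)
        exit_codes.append(result.returncode)
        if result.returncode != 0 and stop_on_failure:
            print("Command failed; stopping.", file=sys.stderr)
            break
    return exit_codes
```

Note that passing the `run_all_experiments_*.sh` scripts themselves to this helper would probably not serialize anything if they background their jobs internally; the per-model commands inside them are what need to be run one at a time.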
https://app.terra.bio/#workspaces/shipp-dfci/DLBCL_Staudt_TumorOnly_2021_v2/job_history
- Remap the labels from the consensus NMF job above: src_python/remap_labels.py
- Compute q-values per gene: src_python/fisher_5x2_parallel.py
- Generate baseline probabilities: src_python/generate_baseline_probabilities.py
- Create gene footprint table: src_python/calculate_driver_footprint.py
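For readers unfamiliar with q-values, the sketch below shows one common way to turn per-gene p-values into q-values, the Benjamini-Hochberg adjustment. This is an assumption made for illustration: the method actually used in src_python/fisher_5x2_parallel.py may differ, and the function names and gene names here are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as p_values."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


def q_values_per_gene(p_by_gene):
    """Map {gene: p-value} to {gene: q-value} (gene names are placeholders)."""
    genes = list(p_by_gene)
    q_values = benjamini_hochberg([p_by_gene[g] for g in genes])
    return dict(zip(genes, q_values))
```

For example, `q_values_per_gene({"GENE_A": 0.01, "GENE_B": 0.04})` returns a dictionary with one adjusted value per gene.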
conda create --name Classifier
conda activate Classifier
conda install pytorch torchvision -c pytorch
conda install pandas
conda install matplotlib
conda install scikit-learn
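After creating the environment, a quick check like the one below can confirm that the packages are visible to Python. It is a minimal sketch rather than a script shipped with this project; `missing_modules` is a hypothetical helper. The module names are the import names of the packages installed above (`torch` for pytorch, `sklearn` for scikit-learn).

```python
import importlib.util

# Import names of the packages installed with conda above.
REQUIRED_MODULES = ["torch", "torchvision", "pandas", "matplotlib", "sklearn"]


def missing_modules(module_names):
    """Return the names from module_names that cannot be found."""
    # find_spec locates a top-level module without importing it.
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]
```

Running `print(missing_modules(REQUIRED_MODULES))` inside the activated Classifier environment should print an empty list if the installation succeeded.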
- Run model training bash scripts (warning: this will launch many jobs, so do not launch them all at once)
  - run_all_experiments_step1.sh
  - run_all_experiments_step2A.sh
  - run_all_experiments_step2B.sh
  - run_all_experiments_step2C.sh
  - run_all_experiments_step2T.sh
  - run_sens_spec_experiments.sh
- Evaluate all trained models: src_python/evaluate_validation_ensembles.py
- Combine training history: src_python/combine_model_training_history.py
Most plots are generated via R
- Step 1: src_R/plot_step1.R
- Step 2A: src_R/plot_step2A.R
- Step 2B: src_R/plot_step2B.R
- Step 2C: src_R/plot_step2C.R
- Step 2T: src_R/plot_step2T.R
- Sens/Spec experiments: src_R/plot_sensitivity_specificity_experiment.R
- Model training history: src_R/plot_training_history.R