Output format

Anusri Pampari edited this page Dec 29, 2022 · 5 revisions

Here is a detailed description of all the files generated as part of ChromBPNet model training:

models\

	bias_model_scaled.h5 
               This is the final bias model in use for bias correction and is different from the input bias model by a scaling 
               factor on the counts head. The scaling factor accounts for the difference in read-depths of the bias model training dataset and 
               the current observed dataset
	chrombpnet.h5
               This is the bias-factorized ChromBPNet model that trains on the observed accessibility. This model is the combination of
               bias_model_scaled.h5 and chrombpnet_nobias.h5
        chrombpnet_nobias.h5 
               TF model, i.e. the model that predicts the bias-corrected accessibility profile

logs\

	chrombpnet.log
               Train and validation loss per epoch
	chrombpnet.log.batch 
               Train loss per batch per epoch
        chrombpnet.args.json
               Arguments used in training the ChromBPNet model
        chrombpnet_data_params.tsv 
               Stats on training data regions
        chrombpnet_model_params.tsv 
               Model parameters used in training

auxiliary\

	filtered.peaks.bed
               Final peak regions used in training in bed format
	filtered.nonpeaks.bed
               Final non peak regions used in training in bed format
        data_unstranded.bw
               Bigwigs with observed accessibility profiles used as ground truth in training
        motif_to_pwm.tsv
               Bias motifs used in getting marginal footprints
        chrombpnet_nobias_footprints.h5
               Marginal footprints on bias motifs
        30K_subsample_peaks.bed
               30K peaks subsampled for interpretation and TFModisco. See [FAQ](https://github.com/kundajelab/chrombpnet/wiki/FAQ) on why we 
               subsample peaks
        interpret_subsample/chrombpnet_nobias.interpreted_regions.bed
               30K peaks subsampled for interpretation and TFModisco. This is the same as 30K_subsample_peaks.bed
        interpret_subsample/chrombpnet_nobias.profile_scores.h5
               Profile contribution scores on each of the 30K subsampled peak regions
        interpret_subsample/modisco_results_profile_scores.h5               
               TFModisco lite object output generated from interpret_subsample/chrombpnet_nobias.profile_scores.h5
        interpret_subsample/chrombpnet_nobias.interpret.args.json
               Arguments used in running the interpretation script
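
Because `interpret_subsample/chrombpnet_nobias.interpreted_regions.bed` is described as the same set of regions as `30K_subsample_peaks.bed`, one quick consistency check is to compare their region counts. The snippet below is a minimal sketch rather than part of ChromBPNet: the helper name `count_bed_regions` is hypothetical, and it assumes only that the files are plain-text BED files with one region per line.

```python
def count_bed_regions(path):
    """Count data lines in a BED file, skipping blank, comment, and header lines."""
    n_regions = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            # BED files may carry optional "#", "track", or "browser" header lines.
            if not line or line.startswith(("#", "track", "browser")):
                continue
            n_regions += 1
    return n_regions
```

For example, `count_bed_regions("auxiliary/30K_subsample_peaks.bed")` and `count_bed_regions("auxiliary/interpret_subsample/chrombpnet_nobias.interpreted_regions.bed")` would be expected to return the same number, where both paths are relative to your own output directory.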


evaluation\

	overall_report.pdf
               This is a pdf summary report of most of the remaining outputs in this folder. It also has guidelines on how to 
               interpret the results 
	overall_report.html
               This is an HTML summary report of most of the remaining outputs in this folder. It also has guidelines on how to
               interpret the results
	bw_shift_qc.png
               This image should show the enzyme bias motif and is generated from the final shifted bigwigs. If it shows the bias motif, this
               indicates that the BAMs are correctly shifted.
        epoch_loss.png
               Bias factorized ChromBPNet training and validation loss change with epoch
	bias_predictions.h5
               Bias model predictions on peak regions
	bias_metrics.json 
               Bias model metrics on peak regions
        bias_only_peaks.counts_pearsonr.png
               Scatter plot of observed and bias model predicted log counts on peak regions
        bias_only_peaks.profile_jsd.png
               Histogram of pairwise JSD calculated between observed and predicted bias model profiles on peak regions, overlaid with a
               histogram of pairwise JSD calculated between observed and randomized profiles as a worst-case baseline
	chrombpnet_predictions.h5
               Bias factorized ChromBPNet model predictions on peak regions
	chrombpnet_metrics.json
               Bias factorized ChromBPNet model metrics on peak regions
	chrombpnet_only_peaks.counts_pearsonr.png
               Scatter plot of observed and Bias factorized ChromBPNet model predicted log counts on peak regions
	chrombpnet_only_peaks.profile_jsd.png
               Histogram of pairwise JSD calculated between observed and predicted Bias factorized ChromBPNet model profiles on peak regions,
               overlaid with a histogram of pairwise JSD calculated between observed and randomized profiles as a worst-case baseline
	chrombpnet_nobias.....footprint.png
               Marginal footprint profiles of chrombpnet_nobias.h5 on the bias motifs (named tn5_1, tn5_2.. tn5_5 for ATAC and dnase_1, dnase_2
               for DNase)
	chrombpnet_nobias_max_bias_response.txt
               Mean of the maxima of the marginal footprint profiles of chrombpnet_nobias.h5 on the bias motifs, along with the individual
               maximum of each profile (named tn5_1, tn5_2.. tn5_5 for ATAC and dnase_1, dnase_2 for DNase)
	chrombpnet_nobias_profile_motifs.pdf
               TFModisco motifs obtained from the contribution scores of chrombpnet_nobias.h5 profile head in pdf format
        modisco_profile/motifs.html
               TFModisco motifs obtained from the contribution scores of chrombpnet_nobias.h5 profile head in html format
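
After a training run, it can be useful to confirm that the files described above were actually written. The following is a minimal sketch, not a ChromBPNet utility: the helper name `missing_outputs` and its `output_dir` argument are hypothetical, and the list covers only a subset of the fixed file names documented on this page.

```python
from pathlib import Path

# Subset of the documented outputs, relative to the training output directory.
EXPECTED_CHROMBPNET_OUTPUTS = [
    "models/bias_model_scaled.h5",
    "models/chrombpnet.h5",
    "models/chrombpnet_nobias.h5",
    "logs/chrombpnet.log",
    "logs/chrombpnet.args.json",
    "auxiliary/filtered.peaks.bed",
    "auxiliary/filtered.nonpeaks.bed",
    "evaluation/overall_report.html",
    "evaluation/chrombpnet_metrics.json",
]


def missing_outputs(output_dir, expected=EXPECTED_CHROMBPNET_OUTPUTS):
    """Return the expected relative paths that are not files under output_dir."""
    root = Path(output_dir)
    return [rel for rel in expected if not (root / rel).is_file()]
```

Calling `missing_outputs("path/to/output_dir")` returns an empty list when every listed file is present. The same function can be reused for a bias model run by passing a different `expected` list.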

Here is a detailed description of all the files generated as part of bias model training:

models\

	bias.h5
               This is the bias model that trains on the observed accessibility in non peak regions.

logs\

	bias.log
               Train and validation loss per epoch
	bias.log.batch 
               Train loss per batch per epoch
        bias.args.json
               Arguments used in training the bias model
        bias_data_params.tsv 
               Stats on training data regions
        bias_model_params.tsv 
               Model parameters used in training

auxiliary\

	filtered.nonpeaks.bed
               Final non peak regions used in training in bed format
        data_unstranded.bw
               Bigwigs with observed accessibility profiles used as ground truth in training
        30K_subsample_peaks.bed
               30K peaks subsampled for interpretation and TFModisco. See [FAQ](https://github.com/kundajelab/chrombpnet/wiki/FAQ) on why we 
               subsample peaks
        interpret_subsample/bias.interpreted_regions.bed
               30K peaks subsampled for interpretation and TFModisco. This is the same as 30K_subsample_peaks.bed
        interpret_subsample/bias.profile_scores.h5
               Profile contribution scores on each of the 30K subsampled peak regions
        interpret_subsample/modisco_results_profile_scores.h5               
               TFModisco lite object output generated from interpret_subsample/bias.profile_scores.h5
        interpret_subsample/bias.counts_scores.h5
               Counts contribution scores on each of the 30K subsampled peak regions
        interpret_subsample/modisco_results_counts_scores.h5               
               TFModisco lite object output generated from interpret_subsample/bias.counts_scores.h5
        interpret_subsample/bias.interpret.args.json
               Arguments used in running the interpretation script

evaluation\

	overall_report.pdf
               This is a pdf summary report of most of the remaining outputs in this folder. It also has guidelines on how to 
               interpret the results 
	overall_report.html
               This is an HTML summary report of most of the remaining outputs in this folder. It also has guidelines on how to
               interpret the results
	bw_shift_qc.png
               This image should show the enzyme bias motif and is generated from the final shifted bigwigs. If it shows the bias motif, this
               indicates that the BAMs are correctly shifted.
        epoch_loss.png
               Bias model training and validation loss change with epoch
	bias_predictions.h5
               Bias model predictions on peak and non peak regions
	bias_metrics.json 
               Bias model metrics on peak and non peak regions
        bias_only_peaks.counts_pearsonr.png
               Scatter plot of observed and bias model predicted log counts on peak regions
        bias_only_peaks.profile_jsd.png
               Histogram of pairwise JSD calculated between observed and predicted bias model profiles on peak regions, overlaid with a
               histogram of pairwise JSD calculated between observed and randomized profiles as a worst-case baseline
        bias_only_nonpeaks.counts_pearsonr.png
               Scatter plot of observed and bias model predicted log counts on non peak regions
        bias_only_nonpeaks.profile_jsd.png
               Histogram of pairwise JSD calculated between observed and predicted bias model profiles on non peak regions, overlaid with a
               histogram of pairwise JSD calculated between observed and randomized profiles as a worst-case baseline
	bias_profile_motifs.pdf
               TFModisco motifs obtained from the contribution scores of bias.h5 profile head in pdf format
        modisco_profile/motifs.html
               TFModisco motifs obtained from the contribution scores of bias.h5 profile head in html format
	bias_counts_motifs.pdf
               TFModisco motifs obtained from the contribution scores of bias.h5 counts head in pdf format
        modisco_counts/motifs.html
               TFModisco motifs obtained from the contribution scores of bias.h5 counts head in html format
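
The metrics files listed above (`bias_metrics.json`, `chrombpnet_metrics.json`) are JSON, but this page does not document their keys. The sketch below therefore makes no assumption about the schema: it flattens whatever nested structure it finds into dotted key paths so the values can be skimmed or compared across runs. The function names `flatten` and `load_metrics` are hypothetical helpers, not part of ChromBPNet.

```python
import json


def flatten(obj, prefix=""):
    """Yield (dotted_key, value) pairs for every leaf in nested dicts and lists."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten(value, f"{prefix}{key}.")
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from flatten(value, f"{prefix}{index}.")
    else:
        # Leaf value: drop the trailing separator from the accumulated path.
        yield prefix.rstrip("."), obj


def load_metrics(path):
    """Load a JSON metrics file and return a flat {dotted_key: value} dict."""
    with open(path) as handle:
        return dict(flatten(json.load(handle)))
```

For example, `load_metrics("evaluation/bias_metrics.json")` returns a flat dictionary that can be printed line by line, with the path given relative to your own output directory.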