Much of the code in this repo was not used in the final analyses. The most important scripts are listed below.
run_clean_assemble_bin.sh is the master script for read QC, metagenome assembly, contig binning, bin QC, and taxonomic assignment. Parts of this script are hard-coded to work with the Cornell BioHPC SGE scheduler and the Brito Lab server structure.
Genes were called in metagenomic bins using run_prokka.sh. GTF annotations output by Prokka can be converted to R objects using gtf2tibble.R.
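To illustrate what flattening an annotation file into a table involves, here is a minimal shell sketch. It is not the contents of gtf2tibble.R. It assumes nine tab-separated columns with GFF3-style key=value attributes (as in Prokka's .gff output); GTF-style attributes (key "value";) would need a different split. The function name annotations_to_tsv and the file names in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: flatten annotation records into a tab-separated table.
# Assumption: nine tab-separated columns, GFF3-style key=value attributes.
# annotations_to_tsv is a hypothetical helper, not a script in this repo.
annotations_to_tsv() {
  awk -F'\t' -v OFS='\t' '
    BEGIN { print "seqid", "type", "start", "end", "strand", "id", "product" }
    /^##FASTA/ { exit }        # stop at the embedded sequence section, if present
    /^#/ || NF < 9 { next }    # skip comment lines and malformed records
    {
      id = ""; product = ""
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++) {
        eq = index(attrs[i], "=")
        if (eq == 0) continue
        key = substr(attrs[i], 1, eq - 1)
        val = substr(attrs[i], eq + 1)
        if (key == "ID") id = val
        else if (key == "product") product = val
      }
      # Columns: 1 seqid, 3 feature type, 4 start, 5 end, 7 strand
      print $1, $3, $4, $5, $7, id, product
    }
  ' "$1"
}

# Usage (hypothetical file names):
#   annotations_to_tsv bin_001.gff > bin_001.tsv
```

The resulting table has one row per feature and can be read into R with any tab-separated reader.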
The main scripts for cleaning up transcript reads and mapping those reads to references can be found in the Danko Lab proseq2.0 repo. Once you have BAM files, per-base coverage reports can be generated with get_pileup_correct.sh.
EC_peaks.rmd and Stool_PRO-seq.Rmd contain the R code used for the E. coli and human microbiome analyses, respectively. The Rmarkdown documents are ordered by main sections (#) and subsections (##/###).
Post-review, analyses were conducted in separate notebooks. These notebooks can be found in data_processing_and_figures.
E. coli sequencing reads: https://www.ncbi.nlm.nih.gov/sra/PRJNA800038
Microbiome sequencing reads: https://www.ncbi.nlm.nih.gov/sra/PRJNA800070