All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Main workflow figure for ALSO.
- readthedocs.
- Changelog format from @GangLiTarheel.
- Developed Python implementation for filtering bam files by txt-file lists of QNAMEs
- Rewrote/refactored code to handle input for Python-implemented filtering
- Updated `README.md` and `log.md` files
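The QNAME filtering described above can be sketched in pure Python over SAM-format text. This is a simplified, hypothetical stand-in (function names are illustrative; the repo's actual implementation may operate on bam files directly):

```python
def load_qnames(path):
    """Read one QNAME per line from a txt file into a set."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}

def filter_sam(lines, qnames, keep=False):
    """Yield SAM lines, excluding reads whose QNAME (first tab-delimited
    field) is in `qnames`; with keep=True, retain only those reads.
    Header lines (starting with '@') pass through unchanged."""
    for line in lines:
        if line.startswith("@"):
            yield line
            continue
        qname = line.split("\t", 1)[0]
        if (qname in qnames) == keep:
            yield line
```

Streaming line-by-line keeps memory bounded by the QNAME set, not the bam size.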
- #TODO Troubleshoot max memory for the JVM when running `picard FilterSamReads`
- #TODO Consolidate shell and R functions into one script for each language
- #TODO Remove hardcoded path to the GS installation of `picard` from scripts #DONE
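For the JVM-memory TODO above, the usual knob is java's `-Xmx` flag passed before `-jar`. A minimal sketch that assembles such a `picard FilterSamReads` call — the jar path, file names, and heap size are placeholders, not the project's actual settings:

```python
def picard_filter_cmd(in_bam, out_bam, qname_list, max_heap="16g",
                      picard_jar="picard.jar"):
    """Build a picard FilterSamReads command line with an explicit
    JVM max-heap (-Xmx) setting. All paths here are placeholders."""
    return [
        "java", f"-Xmx{max_heap}", "-jar", picard_jar,
        "FilterSamReads",
        f"I={in_bam}",
        f"O={out_bam}",
        f"READ_LIST_FILE={qname_list}",
        "FILTER=excludeReadList",  # or includeReadList to keep the listed QNAMEs
    ]
```

The returned list can be handed to `subprocess.run`; tuning `max_heap` to the node's available memory is the likely fix when FilterSamReads exhausts the default heap.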
- Pipeline is completed; passed local unit tests with small files; can call it with `driver_allelic-segregation.sh`
- Added instructions for using `driver_allelic-segregation.sh` to the `README`.
- #TODO Test on the GS HPC with large files: do we need to increase max heap memory for the JVM when running `picard`?
- #TODO Clean up messages output by the driver
- #TODO Determine and list all dependencies
- Addressed an error in the preprocessing pipeline in which some duplicate QNAMEs persisted in the processed bam.
- Added instructions for using the correction script, `03-remove-duplicate-qnames.sh`.
- #TODO Add the corrections in `03-remove-duplicate-qnames.sh` to the initial preprocessing script, `03-filter-problematic-qnames-HPC.sh` #DONE
- Cleaned up old example code from [@Kris](https://github.com/Noble-Lab/ALSO/XX).
- Will create a pull request to move `03-remove-duplicate-qnames.sh` from the ALSO repo to the Shendure-Lab sciatac pipeline repo after the allele score comparison module is completed
- Kris will work on the allele score comparison module
- Kris's new version of the preprocess module passed tests from both Kris and Gang
- Gang ran the preprocess module on all samples
- Gang tested on one sample from mm10 and one sample from CAST
- Kris tested on the largest bam that we have
- Bill cleaned up the space on vol6; all future results will be stored in vol6
- Update the workflow according to Kris's newest preprocess module (4 steps)
- Add workflow script `03-filter-qname.sh`, an in-progress shell pipeline to handle the preprocessing
- Write code for handling intermediate files, e.g., deleting, keeping, etc.
- Upload/update test code for debugging the preprocess module
- Update `README` for using the script
- Further updates and cleanup of the `README`
- Upload/update test code for debugging the preprocess module
- Update `README` with information on running the test code
- Replace the workflow chart
- Upload code to debug preprocess module
- For `04`, add additional code to remove singletons from split bam files
- Add additional options and corrections to `04-split-index-repair-bam.sh`
    - "mm10" mode, which does not output POS and MPOS bed files
    - "strain" mode, which outputs POS and MPOS bed files
    - An option to sort and index the bam infile if necessary
- Update associated test script for the new modes
- Update workflow chart with yellow box (preprocess step)
- Update run script for preprocess step
- Add `06-convert-bam-to-df_join-bed_write-rds.R`
- Clean up repo, removing unneeded scripts and data files
- Update dependencies listed in `README`
- Add `05-lift-strain-to-mm10.sh`
- Add script to download and process liftOver chain files: `get-liftOver-chains.sh`
- Add script to downsample bam files: `generate-downsampled-bam.sh`
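The core idea behind downsampling a bam (as in `samtools view -s`) is deterministic per-QNAME sampling, so both mates of a pair are kept or dropped together. An illustrative sketch of that idea — not the contents of `generate-downsampled-bam.sh`:

```python
import zlib

def keep_read(qname, fraction, seed=0):
    """Decide whether to keep a read by hashing its QNAME, so the
    decision is identical for both mates of a pair. `fraction` is the
    approximate proportion of pairs retained. Illustrative only."""
    h = zlib.crc32(f"{seed}:{qname}".encode()) & 0xFFFFFFFF
    return h / 2**32 < fraction
```

Applied to a SAM/bam stream, this retains roughly `fraction` of read pairs while never producing new singletons, since the decision depends only on the QNAME.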
- Minor changes to workflow scripts `01` and `04`
- Update `README`, including sample-call section
- Update workflow image
- Update `README` for the revised preprocessing (filter reads with MAPQ < 30; then remove singletons; subread repair)
- Update code for the same steps (filter reads with MAPQ < 30; then remove singletons; subread repair)
- Add new workflow image
- CX updated `get_unique_fragments.py`; Kris will test it on duplicates
- After the Shendure lab pipeline, we will first filter reads with MAPQ < 30, then remove singletons, and then run subread repair (Kris: no need to sort anymore)
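The MAPQ-filter and singleton-removal steps above can be sketched in Python over SAM text; this is a simplified stand-in for the repo's shell/samtools implementation (function name and threshold handling are illustrative):

```python
from collections import Counter

def remove_singletons(sam_lines, min_mapq=30):
    """Keep header lines and reads with MAPQ >= min_mapq whose QNAME
    occurs more than once after filtering, i.e. both mates survived.
    Reads whose mate was dropped by the MAPQ filter become singletons
    and are removed. Simplified sketch; assumes plain SAM text."""
    headers, reads = [], []
    for line in sam_lines:
        if line.startswith("@"):
            headers.append(line)
            continue
        fields = line.split("\t")
        if int(fields[4]) >= min_mapq:  # column 5 is MAPQ
            reads.append(line)
    counts = Counter(line.split("\t", 1)[0] for line in reads)
    return headers + [l for l in reads if counts[l.split("\t", 1)[0]] > 1]
```

Counting QNAMEs after the MAPQ filter is what distinguishes true pairs from reads orphaned by the filter, which is why singleton removal must follow, not precede, the quality filter.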