A showcase of functions using Biopython (including FastqGeneralIterator), pandas, and matplotlib for teaching Big Data parsing for next-generation sequencing (NGS).
cd DataViz_For_Fastq
gunzip Life_Expectancy_US/US_A_USALEEP.csv.gz
Alternatively, in a multi-user environment, load the Python module:
module load Python/3.9.1
The exact path to bin/python may differ depending on the Python version.
virtualenv -p /apps/x86_64/python/3.9.1/bin/python ./
bin/pip install biopython
bin/pip install matplotlib
bin/pip install pandas
bin/pip install mpld3
bin/pip install plotly
bin/python BioPython_LengthViolinFastq_Reads.py Viral_MiSeq/SC2_X.R1.fastq Viral_MiSeq/SC2_X.R2.fastq Viral_iSeq/SC2_Y.R1.fastq Viral_iSeq/SC2_Y.R2.fastq --outputType P
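For readers who want to see the idea behind the read-length violin plot, here is a minimal sketch. It is not the code of BioPython_LengthViolinFastq_Reads.py; the function names read_lengths and plot_length_violins are hypothetical and chosen only for illustration. It relies on FastqGeneralIterator from Biopython, which yields plain (title, sequence, quality) string tuples, and on matplotlib's violinplot.

```python
from Bio.SeqIO.QualityIO import FastqGeneralIterator


def read_lengths(handle):
    """Return the length of every read in an open FASTQ handle."""
    # FastqGeneralIterator yields (title, sequence, quality) string tuples,
    # so no SeqRecord object is built for each read.
    return [len(seq) for _title, seq, _qual in FastqGeneralIterator(handle)]


def plot_length_violins(lengths_by_label, out_path):
    """Draw one violin per FASTQ file and save the figure (hypothetical helper)."""
    # Imported here so that read_lengths can be used without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    labels = list(lengths_by_label)
    fig, ax = plt.subplots()
    ax.violinplot([lengths_by_label[label] for label in labels], showmedians=True)
    ax.set_xticks(range(1, len(labels) + 1))
    ax.set_xticklabels(labels)
    ax.set_ylabel("Read length (bases)")
    fig.savefig(out_path)
    plt.close(fig)
```

A possible use, assuming the FASTQ files are uncompressed: open each file with open(path), pass the handle to read_lengths, collect the results in a dict keyed by file name, and hand that dict to plot_length_violins.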
bin/python BioPython_PHREDViolinFastq_Reads.py Viral_MiSeq/SC2_X.R1.fastq Viral_MiSeq/SC2_X.R2.fastq --outputType P --titleString "Raw Reads PHRED Scores" --showQ30 Y
Note: For a faster runtime, use --showQ30 N.
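As background on what a Q30 figure means, the sketch below decodes FASTQ quality strings into PHRED scores and computes the fraction of bases at or above Q30. It assumes Phred+33 encoding (offset 33) and is only an illustration; the names phred_scores and fraction_q30 are hypothetical, and this is not necessarily how the script computes the value shown with --showQ30 Y.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer PHRED scores.

    Assumes Phred+33 encoding unless another offset is given.
    """
    return [ord(ch) - offset for ch in quality_string]


def fraction_q30(quality_strings, threshold=30, offset=33):
    """Return the fraction of bases whose PHRED score is >= threshold."""
    total = 0
    passing = 0
    for qual in quality_strings:
        scores = phred_scores(qual, offset)
        total += len(scores)
        passing += sum(score >= threshold for score in scores)
    # Avoid dividing by zero when no bases were seen.
    return passing / total if total else 0.0
```

The quality strings can come from the third element of each tuple yielded by FastqGeneralIterator. A PHRED score Q corresponds to an estimated error probability of 10^(-Q/10), so Q30 means roughly one expected error per 1,000 base calls.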
bin/python Pandas_ParseForViolin_CSV.py Life_Expectancy_US/US_A_USALEEP.csv --outputType P --stateCodes 6 56 --titleString "U.S., California, and Wyoming Life Expectancy,"
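The state codes above are FIPS codes (6 is California, 56 is Wyoming). The following is a minimal sketch of how a CSV could be split into a national group plus one group per requested state with pandas. The column names STATE2KX and e(0) are assumptions about the USALEEP file layout, and life_expectancy_by_state is a hypothetical helper, not a function from Pandas_ParseForViolin_CSV.py; check the CSV header before relying on either.

```python
import pandas as pd


def life_expectancy_by_state(csv_source, state_codes,
                             state_col="STATE2KX", value_col="e(0)"):
    """Group life-expectancy values for the whole file and for selected states.

    state_col and value_col are ASSUMED column names; adjust them to match
    the header of the CSV actually in use.
    """
    df = pd.read_csv(csv_source)
    # The national distribution uses every row that has a value.
    groups = {"U.S.": df[value_col].dropna().tolist()}
    for code in state_codes:
        # Keep only the rows whose state FIPS code matches.
        subset = df.loc[df[state_col] == code, value_col].dropna()
        groups[str(code)] = subset.tolist()
    return groups
```

The returned dict maps a label to a list of values, which is the shape that matplotlib's violinplot expects once the lists are passed in order.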
bin/python Plotly_ParseForViolin_to_HTML.py Life_Expectancy_US/US_A_USALEEP.csv --stateCodes 15 6 30 --titleString "U.S., Hawaii, California, and Montana"
Note: For Plotly_ParseForViolin_to_HTML.py, set the output path to a folder from which the output HTML file can be opened by point-and-click.