Report README

Inputs

Cecret pipeline analysis directory contents for a given sequencing run
Sequencing run directory, including sample sheet

Function

The report process generates human-readable reports from the flat files generated by the base Cecret pipeline. It does not generate any new data. The associated scripts and template files are written in R and Rmarkdown. Rmarkdown documents contain some HTML as well as YAML and LaTeX in the headers.

To view script options: `singularity run-help --app report <singularity_container_name.sif>`

To run the script: `singularity run --bind /mnt,<hostMntPt> --app report <singularity_container_name.sif> <runID> <analysisDirFP> <seqDirFP> <type> <mqc> <sigPage>`
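
The run command takes six positional arguments after the container name, so it can help to assemble and review it before running. The sketch below is only an illustration: `report_cmd` is a hypothetical helper, not part of the pipeline, and it prints the command rather than calling Singularity. It does not validate `<type>`, `<mqc>`, or `<sigPage>`; check the run-help output for their accepted values.

```shell
# Sketch: assemble the report command from its parts and print it for review.
# report_cmd is a hypothetical helper; it does not call Singularity itself.
report_cmd() {
  if [ "$#" -ne 8 ]; then
    echo "usage: report_cmd <container.sif> <hostMntPt> <runID> <analysisDirFP> <seqDirFP> <type> <mqc> <sigPage>" >&2
    return 1
  fi
  container=$1
  host_mnt=$2
  shift 2   # the remaining six arguments are passed through unchanged
  printf '%s\n' "singularity run --bind /mnt,${host_mnt} --app report ${container} $*"
}
```

Calling `report_cmd` with the container, the host mount point, and the six report arguments prints the full command line. Any other argument count prints a usage message and returns a non-zero status.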

Outputs

By default, all HTML outputs are placed in the report subdirectory of the Cecret analysis directory. Standard outputs include:

index.html: an easier-to-read version of summary.txt from the Cecret analysis output.
runInfo.html: an easier-to-read version of SampleSheet.csv from the sequencer output directory.
runQC.html: links to the MultiQC report and displays insert size histograms for each sample.
ampliconCov.html: displays and links to detailed amplicon coverage analysis for each sample.
sGene.html: currently only a stub for future development.
about.html: lists output directories and the versions of the component software used to generate them.

The other output is SC2_Variant_WGS_Run_Summary.pdf, located in the report directory. The CLIA technical supervisor uses this file to record official approval of the analysis output with an electronic signature. It is split into three sections: eLIMS Fields, Other QC Fields, and Other Dry Lab Fields.

Summary Table Details

Note that the Sample.Name column is repeated at regular intervals in the report documents to aid readability. Numbers are appended to the repeated column headers.

The table below provides details about each column found in four different output documents.

summary.txt is the primary summary output document from Cecret, and the name of the field in summary.txt is found under "summary.txt column" below.
index.html is a lightly modified version of summary.txt. Fields may be removed in index.html relative to summary.txt, but there are no additional fields. Fields with pass/fail thresholds are visually coded to reflect their score. The name of the field is found under "index.html column" below.
push_to_elims.txt and SC2_Variant_WGS_Run_Summary.pdf contain identical names for columns that they share, and those column names are listed under "eLIMS report column" below. push_to_elims.txt contains the subset of fields from summary.txt that are uploaded into eLIMS. SC2_Variant_WGS_Run_Summary.pdf is another copy of summary.txt with an electronic signature page for recording official CLIA approvals of the data. There are three tables in SC2_Variant_WGS_Run_Summary.pdf, "eLIMS Fields" for the fields that are entered into eLIMS, "Other QC Fields" for other QC data that is not uploaded to eLIMS, and "Other Dry Lab Fields" for everything else that is in summary.txt but not uploaded to eLIMS or used for CLIA QC.

The "use" column indicates whether a row is reported in eLIMS, not reported in eLIMS, or is an official CLIA QC metric that is reported (QC-R) or not reported (QC-NR).

| use | summary.txt column | index.html column | eLIMS report column | threshold | value description |
| --- | --- | --- | --- | --- | --- |
| not reported | sample_id | Sample.ID | not included | not null | Sample ID fragment starting at beginning and going through first hyphen |
| reported | not included | not included | CSID | not null | |
| reported | not included | not included | CUID | not null | |
| not reported | sample | Sample.Name | not included | not null | Full sample identifier |
| not reported | aligner_version | not included | not included | not null | |
| not reported | ivar_version | not included | not included | not null | |
| reported | pangolin_lineage | Pangolin | Lineage | not null | |
| QC-NR | pangolin_status | Pangolin.QC | not included | passed | |
| not reported | nextclade_clade | NextClade | not included | | |
| QC-R | fastqc_raw_reads_1 | #FastQC.R1 | Total Reads | >=100,000 | |
| not reported | fastqc_raw_reads_2 | #FastQC.R2 | not included | >=100,000 | |
| not reported | seqyclean_pairs_kept_after_cleaning | #Seqyclean.Pairs | not included | | |
| not reported | seqyclean_percent_kept_after_cleaning | %Seqyclean.Pairs | not included | | |
| not reported | fastp_reads_passed | not included | not included | | |
| QC-R | depth_after_trimming | Depth.Post.Trim | Average Depth | >=100x | |
| QC-R | coverage_after_trimming | Coverage.Post.Trim | Percent Genome Coverage | >= 90% with SME discretion to go as low as 60% if mutations reported | |
| not reported | %_human_reads | %Human.Reads | not included | | |
| not reported | %_SARS-COV-2_reads | %SC2.Reads | not included | | |
| not reported | ivar_num_variants_identified | #iVar.Variants | not included | | |
| not reported | bcftools_variants_identified | #BCFTools.Variants | not included | | |
| not reported | bedtools_num_failed_amplicons | #BEDTools.Failed.Amps | not included | | |
| not reported | samtools_num_failed_amplicons | SAMTools.Failed.Amps | not included | | |
| not reported | num_N | #N | not included | | |
| not reported | num_degenerage | #Degenerate | not included | | |
| not reported | num_ACTG | #ACTG | not included | | |
| not reported | num_total | #Total.Bases | not included | | |
| not reported | Total_Reads_Analyzed | Total.Reads.Analyzed | not included | | |
| QC-NR | %_N | %N | not included | <10% | |
| not reported | ave_cov_depth | Mean.Cov.Depth | not included | | |
| QC-R | %_Reads_Matching_SC2_Ref | %Reads.Mapping.SC2 | Percent Mapped Reads | >=65% | |
| not reported | vadr_status | Vadr | not included | | |
| not reported | vdr_sample_orfshift | Vadr.All.ORF.Shift | not included | | |
| QC-NR | vdr_sgene_orftshift | Vadr.S.ORF.Shift | Frameshifts | false | |
| reported | S_aa_indels | AA.Changes.S | Spike Protein Substitutions | not null | list of insertions, deletions, and substitutions found in the amino acids reported for the S gene |
| not reported | len_largest_insertion | Length.Longest.Insert | not included | | |
| not reported | len_largest_deletion | Length.Longest.Del | not included | | |
| reported | pangoLEARN_version | pangoLearn.v | pangoLEARN Version | not null | |
| reported | pangolin_subs | #Lineage.Subs | Number of Lineage-Defined Substitutions | not null | |
| reported | GenBank# | GenBank# | GenBank Accession # | NA until # obtained | |
| QC-R | ORFs.Passing.QC | ORFs.Passing.QC | Open Reading Frames | >=10 | a count of ORFs with >=95% coverage and mean depth of >=100x |
| QC-R | Coverage.S | Coverage.S | S-gene Coverage | >= 95% | percentage of positions in the predicted S gene length that have any (even 1 read) sequencing data |
| not reported | Mean.Depth.S | Mean.Depth.S | not included | | mean depth of coverage of sequencing across predicted S gene |
| not reported | Percent.Pos.Min.Cov.S | Percent.Pos.Min.Cov.S | not included | | percentage of positions in the S gene that meet minimum coverage threshold |
| QC-NR | Percent.Ns.S | Percent.Ns.S | not included | <10% | percentage of Ns in the region of the consensus sequence for the S gene |
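
As a rough illustration of how the numeric thresholds above might be applied outside the report, the sketch below checks three of the QC-R fields (`fastqc_raw_reads_1`, `depth_after_trimming`, `coverage_after_trimming`). It is not the report's own logic. It assumes that summary.txt is tab-delimited with a header row and that the values are plain numbers (depth in x, coverage in percent); neither is confirmed here. `qc_check` is a hypothetical helper name, and the SME discretion allowed for genome coverage is not modeled.

```shell
# Sketch: flag samples that miss three of the QC-R thresholds from the table.
# Assumptions (not confirmed by the README): tab-delimited input with a header
# row, and numeric values (depth in x, coverage in percent).
qc_check() {
  awk -F'\t' '
    NR == 1 {
      # Map column names to positions so column order does not matter.
      for (i = 1; i <= NF; i++) col[$i] = i
      n = split("sample fastqc_raw_reads_1 depth_after_trimming coverage_after_trimming", need, " ")
      for (j = 1; j <= n; j++) {
        if (!(need[j] in col)) {
          print "missing column: " need[j] > "/dev/stderr"
          exit 2
        }
      }
      next
    }
    {
      reasons = ""
      if ($(col["fastqc_raw_reads_1"]) + 0 < 100000)  reasons = reasons " reads<100000"
      if ($(col["depth_after_trimming"]) + 0 < 100)   reasons = reasons " depth<100x"
      if ($(col["coverage_after_trimming"]) + 0 < 90) reasons = reasons " coverage<90%"
      status = (reasons == "") ? "PASS" : "FAIL"
      print $(col["sample"]) "\t" status reasons
    }
  ' "$1"
}
```

Running `qc_check path/to/summary.txt` prints one line per sample with PASS or FAIL and the reasons for any failure. Looking columns up by name rather than by position keeps the sketch usable if the column order differs between Cecret versions.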

Dependencies

The Singularity container described in singularity-r.def. For details on obtaining and using the container, see the README.
Scripts (within the Singularity container unless noted)
- config.R controls the rendering of the HTML and PDF versions of the report. View the help document using `singularity exec --bind /mnt,<hostMntPt> <containerName> Rscript /opt/config.R --help`. Note that the mount point in the host directory tree must be high enough to include all required input files and output locations below it.
- HTML report
  - index.Rmd produces an easier to read version of summary.txt from the Cecret analysis output.
  - runInfo.Rmd produces an easier to read version of SampleSheet.csv in the sequencer output directory.
  - runQC.Rmd produces a page with links to MultiQC report and displays insert size histograms for each sample.
  - ampliconCov.Rmd produces detailed amplicon coverage analysis for each sample. Calls ampliconDetailTemplate.Rmd to generate subpages.
  - sGene.Rmd is currently blank and reserved for future development.
  - about.Rmd produces a list of output directories and the version of component software used to generate them.
- PDF report
  - report_headers_template.tex is a template document customizing the PDF report's page headers and footers. See Note for more information. Found only in repo, not in Singularity container.
  - clia_sig_page.Rmd is manually rendered to create a PDF with signature lines for CLIA documentation. See Note for more information. Found only in repo, not in Singularity container.
  - clia_summary.Rmd produces a PDF containing tables of the information found in the Cecret output summary.txt organized in a way that is easier for the CLIA supervisor to review.
  - render_clia_sig.R renders a signature PDF that is reused each time a report is generated. See Note for more information.
  - render_clia_report_blank.R renders a blank version of a PDF report. Typical use is inclusion in CLIA certification packages. See Note for more information. Found only in repo, not in Singularity container.
Reference documents
- MN908947.3-ORF7b.bed and MN908947.3-ORFs.bed: Open reading frame annotations for SARS-CoV-2.
- artic_V3_nCoV-2019.bed: Artic V3 primer scheme.

Note

Customizing Report Headers and Footers for CLIA or QMS

To customize the PDF that serves as the official record of the run and is signed by the CLIA Technical Supervisor, you must modify report_headers_template.tex.

Open the file in a text editor and scroll to the bottom.

- To modify the report title in the page headers, edit the following line. The existing header reads "SARS-CoV-2 Variant Whole Genome Sequencing Run Summary". `\chead{\textbf{SARS-CoV-2 Variant Whole Genome Sequencing Run Summary}}`
- By default, each page is numbered on the right side of the page header by the line `\rhead{\thepage}`. Because the signature sheet is rendered separately from the report, the signature page is always numbered 1 even though it appears at the end of the final output PDF. To avoid this, remove the line `\rhead{\thepage}` from the .tex file before rendering the signature page (usually done only once, during initial setup of the pipeline), render the signature page without a page number, then restore the line before running the pipeline. The result is a correctly numbered report with an unnumbered signature page at the end.
- To customize the version number of your document in the footer, replace the "#" in the following line with the desired version identifier. `\cfoot{\textbf{Ver. No.} #}`
- To customize the document identifier number in the footer, replace the "#" in the following line with the desired identifier. `\lfoot{\textbf{Doc. No.} #}`
- To customize the effective date in the footer, modify the placeholder date of 01/01/1900 in the following line. `\rfoot{\textbf{Effective Date:} 01/01/1900}`

Once modifications are complete, save the file as `report_headers.tex` in `SC2CLIA/Cecret/configs/internal`.
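
If you prefer to script the footer edits rather than make them by hand, something along these lines may work. This is a sketch only: `fill_headers` is a hypothetical helper, not part of the pipeline. It assumes the three footer lines appear in the template exactly as quoted above and that the replacement values contain no `|`, `&`, or `\` characters. The report title and the `\rhead{\thepage}` line are left untouched.

```shell
# Sketch: write a customized copy of the header template to stdout.
# fill_headers is a hypothetical helper. It assumes the footer lines match the
# ones quoted in this README and that the values contain no "|", "&", or "\".
fill_headers() {
  template=$1   # path to report_headers_template.tex
  ver=$2        # version identifier for \cfoot
  doc=$3        # document identifier for \lfoot
  eff_date=$4   # effective date for \rfoot
  sed \
    -e '/\\cfoot{/ s|#}|'"$ver"'}|' \
    -e '/\\lfoot{/ s|#}|'"$doc"'}|' \
    -e '/\\rfoot{/ s|01/01/1900|'"$eff_date"'|' \
    "$template"
}
```

For example, `fill_headers report_headers_template.tex 1.0 DOC-001 01/31/2022 > report_headers.tex` would produce a filled-in copy. The version, document number, and date shown are made-up values. Writing to stdout leaves the template itself unmodified, so it can be reused when identifiers change.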

Rendering the CLIA Signature Page Template

The CLIA signature page template file clia_sig_page.Rmd, found in SC2CLIA/Cecret/bin/report, must be manually rendered to create a PDF before running the pipeline. All subsequent analysis runs reuse this PDF by appending it to the PDF of results tables produced by each run.

To render this page, first customize the page headers and footers as described in the section above. Once customization is complete, navigate to the directory SINGULARITY-CACHE and check for the container sc2clia-cecret-r:version#.sif. Note that the container is not included in the git repo and must be obtained first (see the README for details). Once you have a copy of the container, run the command `singularity run --bind /mnt,<hostMntPt> --app sig_page <singularity_container_name.sif> <inputRMD> <inputTEX> <outputDir>`. The PDF will automatically render as clia_sig.pdf in `<outputDir>`. To see the help documentation, run `singularity exec --bind /mnt,<hostMntPt> <containerName> Rscript /opt/render_clia_sig.R --help`. Move clia_sig.pdf to SC2CLIA/Cecret/configs/internal for use by the pipeline.

Alternatively, if you have a reasonably modern version of R installed locally, you can render the file using that install instead of the Singularity image by running `Rscript Cecret/bin/report/render_clia_sig.R -r <inputRMD> -t <inputTEX> -d <outputDir>` from the SC2CLIA directory.

If you wish to keep your entire records system digital for CLIA, we recommend you use a program such as Adobe Acrobat Pro to create fillable fields on the clia_sig.pdf file before running the full pipeline.

Rendering an Empty CLIA Report for Workflow Validation

It is common for groups creating a CLIA workflow validation package to include a mocked-up or blank copy of a report in the package. To produce an empty report (tables and page headers/footers without any data) with a sign-off sheet for the CLIA Technical Supervisor, navigate to the directory Cecret/SINGULARITY-CACHE and run the command `singularity run --bind /mnt,<hostMntPt> --app report_blank <containerName> <inputTEX> <inputSig> <outputDir>`. We recommend running this script only with the R Singularity container included in the pipeline, because the script has a non-R dependency. The output file is SC2_Variant_WGS_Run_Summary_blank.pdf in the output directory.

Full Rmd Parameter Documentation

Common params across most report Rmd documents
- analysisDirFP: Cecret output directory, full path
- runID: sequencing run identifier, string
- seqDirFP: sequencing data directory, full path
Document-specific params
- index.Rmd, clia_summary.Rmd
  - sampleIDInsertFreq: number of columns after which the script repeats the sample ID column in wide tables
- clia_summary.Rmd
  - runNormal: when TRUE, document renders with data filled into tables. When FALSE, renders tables with single row of NA instead of data
  - in_header: LaTeX document name that contains header and footer information
- runQC.Rmd
  - multiQC: location of MultiQC report HTML
  - insertDir: directory to search for insert sizes data
  - insertFileSuffix: redundant file suffix to trim off of insert sizes files to obtain sample ID
  - expectedInsertSize: expected insert size in bp
  - extremeInsertSize: threshold for considering an insert size extreme in bp
- ampliconCov.Rmd
  - pacbamDir: directory to search for pacbam files
  - pacbamFileSuffix: redundant file suffix to trim off of pacbam files to obtain sample ID
  - ampBED: amplicon BED file
  - ampMinCov: minimum depth of coverage acceptable for any given nucleotide in the amplicon
  - ampMeanCov: minimum mean depth of coverage acceptable for amplicon
  - ampMaxNFail: maximum number of amplicons that may fail and the sample still pass
  - ampMinNPass: minimum number of amplicons that must pass for the sample to pass
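
The four ampliconCov.Rmd thresholds could combine into a per-sample verdict roughly as sketched below. This is a guess at the logic, not the Rmd's actual implementation. It assumes that an amplicon passes when both its minimum depth meets `ampMinCov` and its mean depth meets `ampMeanCov`, and that a sample passes when both `ampMaxNFail` and `ampMinNPass` are satisfied. The input layout (a tab-delimited file with a header row and columns amplicon, min_depth, mean_depth) and the name `amp_check` are hypothetical.

```shell
# Sketch: count passing/failing amplicons and derive a sample verdict.
# The input layout and the way the four thresholds combine are assumptions.
amp_check() {
  # $1: TSV with header and columns amplicon, min_depth, mean_depth
  # $2..$5: ampMinCov ampMeanCov ampMaxNFail ampMinNPass
  awk -F'\t' -v minCov="$2" -v meanCov="$3" -v maxFail="$4" -v minPass="$5" '
    NR == 1 { next }                                           # skip header row
    ($2 + 0 >= minCov && $3 + 0 >= meanCov) { pass++; next }   # amplicon passes
    { fail++ }                                                 # anything else fails
    END {
      verdict = (fail + 0 <= maxFail && pass + 0 >= minPass) ? "PASS" : "FAIL"
      printf "passed=%d failed=%d sample=%s\n", pass, fail, verdict
    }
  ' "$1"
}
```

A call such as `amp_check sample_amplicons.tsv 10 100 5 90` prints a single summary line for the sample. The file name and threshold values are illustrative only and are not the pipeline's defaults.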
