Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sans-UMI option #128

Merged
merged 64 commits into from
Aug 27, 2021
Merged
Show file tree
Hide file tree
Changes from 55 commits
Commits
Show all changes
64 commits
Select commit Hold shift + click to select a range
c3107f3
Merge pull request #61 from nf-core/dev
ggabernet Jan 14, 2020
378998b
starting work on no-umi option for bcellmagic
dladd Jul 16, 2021
d21924c
further work adding a sans-UMI mode, working up through AssemblePairs
dladd Jul 19, 2021
5a2552f
merge changes from nf-core/dev
dladd Jul 19, 2021
cc29af5
Merge pull request #113 from nf-core/dev
ggabernet Jul 19, 2021
3c18309
Merge branch 'master' into no-umi
dladd Jul 20, 2021
ae048c2
update schema with no-umi options
dladd Jul 20, 2021
f66509b
bugfix: typo on presto_collapseseq parameter specification
dladd Jul 20, 2021
f089285
add post-assembly versions of presto filterseq and maskprimers for sa…
dladd Jul 20, 2021
a9e7ca4
add a log parser for no-umi option
dladd Jul 21, 2021
a597ea3
bugfix: header parsing for no-umi mode
dladd Jul 21, 2021
83df93c
do not require read files to end in _R(1/2)
dladd Jul 21, 2021
43ded56
include light chains (if present) in ParseDB step
dladd Jul 22, 2021
d169249
allow non-human species in ig shazam tigger threshold
dladd Jul 23, 2021
e9dbca1
add light chain refs to shazam; allow single-clone samples
dladd Jul 27, 2021
a2f8344
Merge branch 'dev' into no-umi
ggabernet Jul 28, 2021
d35c949
include changeo makedb_args options parameter
dladd Jul 29, 2021
79d2569
add fastqc and multiqc section to check stats on assembled and QC-fil…
dladd Jul 30, 2021
1fc6f85
Merge branch 'dev' into no-umi_dev
dladd Jul 30, 2021
293c1bf
add source label prefix to shazam threshold output files- publishes m…
dladd Aug 2, 2021
a8c9613
add source label prefixes on output files when loci=tr
dladd Aug 2, 2021
de2ea55
Hamming dist. thresholding: if # nearest distances < 3, default thres…
dladd Aug 2, 2021
1381ddf
add conf file for no-umi test, associated files in nf-core/test-datas…
dladd Aug 3, 2021
d4553df
Merge branch 'no-umi' of github.com:dladd/bcellmagic into no-umi
dladd Aug 3, 2021
b630f88
remove makedb_args in favor of options.args
dladd Aug 4, 2021
5452213
remove makedb_args in favor of options.args
dladd Aug 4, 2021
546e757
add sourceLabel comment in bin/TIgGER-shazam.R
dladd Aug 4, 2021
557f7e7
fix comment in bin/log_parsing_no-umi.py
dladd Aug 4, 2021
b40a1bf
remove makedb_args in favor of options.args
dladd Aug 4, 2021
908da60
remove makedb_args in favor of options.args
dladd Aug 4, 2021
0ce9e16
use options.args to specify option logic based on config
dladd Aug 4, 2021
2b214e4
use options.args to specify option logic based on config
dladd Aug 4, 2021
9dc84fd
single-ended as reads rather than R1
dladd Aug 4, 2021
a82adca
single-ended as reads rather than R1
dladd Aug 4, 2021
e2887d8
remove commented line
dladd Aug 4, 2021
8ac4850
move postassembly maskprimers and filterseq to their own module files…
dladd Aug 4, 2021
a3e356e
resolve conflicts with dev using subworkflow for presto with UMIs
dladd Aug 5, 2021
ea08e17
begin adding sans-umi option as subworkflow: prep work and add fastqc…
dladd Aug 6, 2021
b997c4c
add sans-umi subworkflow
dladd Aug 6, 2021
1211d5c
add no-umi test to CI
dladd Aug 6, 2021
e298b60
simplify/harmonize parameter handling
dladd Aug 6, 2021
c064198
remove unnecessary param
dladd Aug 6, 2021
f98b344
linting whitespace
dladd Aug 9, 2021
d619904
testdata URL typo in config
dladd Aug 9, 2021
0a54c52
increase collapseseq resource allocation; log parsing: allow undersco…
dladd Aug 11, 2021
4c589c9
explicitly define default assemblepairs coord as presto
dladd Aug 12, 2021
18a550e
groovy-ify revpr specification in maskprimers
dladd Aug 12, 2021
4034c48
default params.umi_length to null
dladd Aug 12, 2021
cadab60
default params.umi_length to null
dladd Aug 12, 2021
e27f930
default params.umi_length to null
dladd Aug 12, 2021
99a3c84
add multiqc_config.yaml to files_unchanged for linting
dladd Aug 12, 2021
3879e4c
Merge branch 'no-umi' of github.com:dladd/bcellmagic into no-umi
dladd Aug 12, 2021
a6314ab
bugfix: explicitly check for undefined umi_length as null
dladd Aug 12, 2021
18a80a0
linting bugfix: set default umi_length to -1 instead of null
dladd Aug 12, 2021
f968c9a
Update .nf-core.yml for new linting style
dladd Aug 13, 2021
e88431c
Update .nf-core.yml
ggabernet Aug 13, 2021
d8dc9d0
remove unnecessary sans-UMI parseheader collapse step
dladd Aug 16, 2021
ecc1a9f
revert umi_length to a required param, update docs with no-umi option
dladd Aug 16, 2021
f7f7bbb
remove umi_length requirement from schema again to avoid no-umi autom…
dladd Aug 16, 2021
9d347cc
linting whitespace; add recent contributors to README
dladd Aug 16, 2021
b41b8d1
linting whitespace
dladd Aug 17, 2021
4e49c34
Update contributors in README.md
dladd Aug 27, 2021
b3abe78
asterisks not dashes in README for linting
dladd Aug 27, 2021
72893e5
linting
dladd Aug 27, 2021
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -50,7 +50,7 @@ jobs:
matrix:
# Nextflow versions: check pipeline minimum and current latest
nxf_ver: ["21.04.0", ""]
profile: ["test_tcr"]
profile: ["test_tcr", "test_no_umi"]
steps:
- name: Check out pipeline code
uses: actions/checkout@v2
Expand Down
8 changes: 3 additions & 5 deletions .nf-core.yml
Original file line number Diff line number Diff line change
@@ -1,7 +1,5 @@
files_unchanged:
- lib/NfcoreSchema.groovy
actions_awsfulltest: False
params_used: False
files_exist:
lint:
files_unchanged:
- assets/multiqc_config.yaml
- environment.yml
- Dockerfile
ggabernet marked this conversation as resolved.
Show resolved Hide resolved
16 changes: 16 additions & 0 deletions assets/multiqc_config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,22 @@ report_comment: >
This report has been generated by the <a href="https://github.com/nf-core/bcellmagic" target="_blank">nf-core/bcellmagic</a>
analysis pipeline. For information about how to interpret these results, please see the
<a href="https://github.com/nf-core/bcellmagic" target="_blank">documentation</a>.

module_order:
- fastqc:
name: 'FastQC (raw)'
info: 'This section of the report shows FastQC results from the original reads'
path_filters:
- './*.zip'
path_filters_exclude:
- './*_ASSEMBLED_fastqc.zip'
- fastqc:
name: 'FastQC (post-assembly)'
info: 'This section of the report shows FastQC results after paired reads
are assembled and QC filtered but before collapsing duplicates.'
path_filters:
- './*_ASSEMBLED_fastqc.zip'

report_section_order:
software_versions:
order: -1000
Expand Down
55 changes: 34 additions & 21 deletions bin/TIgGER-shazam.R
Original file line number Diff line number Diff line change
Expand Up @@ -24,21 +24,27 @@ fastas = args[4:length(args)]
output_folder = dirname(inputtable)

db <- read.table(inputtable, header=TRUE, sep="\t")
# Add label for source species
sourceLabel <- gsub(pattern = "\\.tsv$", "", inputtable)
dladd marked this conversation as resolved.
Show resolved Hide resolved

if (loci == "ig"){

db_fasta <- readIgFasta(fastas, strip_down_name = TRUE)
db_fasta <- c()
for (fasta in fastas) {
dbf <- readIgFasta(fasta, strip_down_name = TRUE)
db_fasta <- c(db_fasta, dbf)
}

gt <- inferGenotype(db, v_call = "v_call", find_unmutated = F)

# Filter out Duplicate sequences as not supported by Tigger 1.0.0
gt_filt <- filter(gt, !grepl("D", gene))
gt_filt <- filter(gt, !grepl("D|d", gene))

gtseq <- genotypeFasta(gt_filt, db_fasta)
writeFasta(gtseq, paste(output_folder,"v_genotype.fasta",sep="/"))
writeFasta(gtseq, paste(output_folder,paste0(sourceLabel, "_v_genotype.fasta"),sep="/"))

# Plot genotype
ggsave(paste(output_folder,"genotype.pdf",sep="/"), plotGenotype(gt, silent=T))
ggsave(paste(output_folder,paste0(sourceLabel, "_genotype.pdf"),sep="/"), plotGenotype(gt, silent=T))

# Modify allele calls and output TSV file
db_reassigned <- reassignAlleles(db, gtseq)
Expand All @@ -52,7 +58,7 @@ if (loci == "ig"){
normalize="len",
nproc=1,
first = FALSE)
writeChangeoDb(db_reassigned, paste(output_folder,"v_genotyped.tab",sep="/"))
writeChangeoDb(db_reassigned, paste(output_folder,paste0(sourceLabel, "_v_genotyped.tab"),sep="/"))

} else if (loci == "tr") {

Expand All @@ -64,10 +70,10 @@ if (loci == "ig"){
gt <- inferGenotype(db, v_call = "v_call", find_unmutated = FALSE)

gtseq <- genotypeFasta(gt, c(db_fasta_TRAV,db_fasta_TRBV,db_fasta_TRDV))
writeFasta(gtseq, paste(output_folder,"TRxV_genotype.fasta",sep="/"))
writeFasta(gtseq, paste(output_folder,paste0(sourceLabel, "_TRxV_genotype.fasta"),sep="/"))

# Plot genotype
ggsave(paste(output_folder,"genotype.pdf",sep="/"), plotGenotype(gt, silent=T))
ggsave(paste(output_folder,paste0(sourceLabel, "_genotype.pdf"),sep="/"), plotGenotype(gt, silent=T))

# Modify allele calls and output TSV file
db_reassigned <- reassignAlleles(db, gtseq)
Expand All @@ -82,25 +88,32 @@ if (loci == "ig"){
nproc=1,
first = FALSE)

writeChangeoDb(db, paste(output_folder,"v_tr_genotyped.tab",sep="/"))
writeChangeoDb(db, paste(output_folder,paste0(sourceLabel, "_v_tr_genotyped.tab"),sep="/"))

} else {
stop("Loci specified is not available, please choose from: ig, tr.")
}

# Find threshold using chosen method

if (threshold_method == "density") {
output <- findThreshold(dist_ham$dist_nearest, method="density")
threshold <- output@threshold
} else if (threshold_method == "gmm") {
output <- findThreshold(dist_ham$dist_nearest, method="gmm")
threshold <- output@threshold
num_dist <- length(unique(na.omit(dist_ham$dist_nearest)))
if (num_dist > 3) {
# Find threshold using chosen method
if (threshold_method == "density") {
output <- findThreshold(dist_ham$dist_nearest, method="density")
threshold <- output@threshold
} else if (threshold_method == "gmm") {
output <- findThreshold(dist_ham$dist_nearest, method="gmm")
threshold <- output@threshold
} else {
stop("Threshold method is not available, please choose from: density, gmm")
}
# Plot distance histogram, density estimate and optimum threshold
ggsave(paste(output_folder,paste0(sourceLabel, "_Hamming_distance_threshold.pdf"),sep="/"), plot(output), device="pdf")
} else {
stop("Threshold method is not available, please choose from: density, gmm")
# Workaround for sources with too few nearest distance values to determine an effective threshold.
dladd marked this conversation as resolved.
Show resolved Hide resolved
# Set threshold to 0 and print a warning
threshold <- 0.0
warning(paste("Could not determine an effective Hamming distance threshold for source:", sourceLabel, ", which has", num_dist, "unique nearest distances. Threshold defaulting to 0.", sep=" "))
ggsave(paste(output_folder,paste0(sourceLabel, "_Hamming_distance_threshold.pdf"),sep="/"), plot(dist_ham$dist_nearest, dist_ham$duplicate_count), device="pdf")
}

# Plot distance histogram, density estimate and optimum threshold
ggsave(paste(output_folder,"Hamming_distance_threshold.pdf",sep="/"), plot(output), device="pdf")

write.table(threshold, file= paste(output_folder,"threshold.txt",sep="/"), quote=FALSE, sep="", row.names = FALSE, col.names = FALSE)
write.table(threshold, file= paste(output_folder,paste0(sourceLabel, "_threshold.txt"),sep="/"), quote=FALSE, sep="", row.names = FALSE, col.names = FALSE)
16 changes: 8 additions & 8 deletions bin/log_parsing.py
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@
with open(logfile, "r") as f:
for line in f:
if " START>" in line:
s_code.append(logfile.split("/")[1].split("_")[0])
s_code.append(logfile.split("/")[1].split("_command_log")[0])
process_name.append(process)
elif "PAIRS>" in line:
pairs.append(line.strip().lstrip("PAIRS> "))
Expand Down Expand Up @@ -80,7 +80,7 @@
for line in f:
if " START>" in line:
if c < 1:
s_code.append(logfile.split("/")[1].split("_")[0])
s_code.append(logfile.split("/")[1].split("_command_log")[0])
process_name.append(process)
elif "SEQUENCES>" in line:
if c < 1:
Expand Down Expand Up @@ -127,7 +127,7 @@
# print(f.read())
for line in f:
if " START>" in line:
s_code.append(logfile.split("/")[1].split("_")[0])
s_code.append(logfile.split("/")[1].split("_command_log")[0])
process_name.append(process)
elif "SEQUENCES1>" in line:
seqs1.append(line.strip().lstrip("SEQUENCES1").lstrip("> "))
Expand Down Expand Up @@ -162,7 +162,7 @@
# print(f.read())
for line in f:
if " START>" in line:
s_code.append(logfile.split("/")[1].split("_")[0])
s_code.append(logfile.split("/")[1].split("_command_log")[0])
process_name.append(process)
elif "OUTPUT>" in line:
output_file.append(line.strip().lstrip("OUTPUT> "))
Expand Down Expand Up @@ -202,7 +202,7 @@
# print(f.read())
for line in f:
if " START>" in line:
s_code.append(logfile.split("/")[1].split("_")[0])
s_code.append(logfile.split("/")[1].split("_command_log")[0])
process_name.append(process)
elif "SEQUENCES>" in line:
seqs.append(line.strip().lstrip("SEQUENCES> "))
Expand Down Expand Up @@ -235,7 +235,7 @@
# print(f.read())
for line in f:
if "PASS>" in line:
s_code.append(logfile.split("/")[1].split("_")[0])
s_code.append(logfile.split("/")[1].split("_command_log")[0])
pass_blast.append(line.strip().lstrip("PASS> "))
elif "FAIL>" in line:
fail_blast.append(line.strip().lstrip("FAIL> "))
Expand Down Expand Up @@ -267,7 +267,7 @@
# print(f.read())
for line in f:
if " START>" in line:
s_code.append(logfile.split("/")[1].split("_")[0])
s_code.append(logfile.split("/")[1].split("_command_log")[0])
process_name.append(process)
elif "RECORDS>" in line:
seqs.append(line.strip().lstrip("RECORDS> "))
Expand Down Expand Up @@ -303,7 +303,7 @@
# print(f.read())
for line in f:
if " START>" in line:
s_code.append(logfile.split("/")[1].split("_")[0])
s_code.append(logfile.split("/")[1].split("_command_log")[0])
process_name.append(process)
elif "RECORDS>" in line:
seqs.append(line.strip().lstrip("RECORDS> "))
Expand Down
Loading