Mismatched forward and reverse sequence files #15
Comments
Hello, I would be most grateful if you can provide some more details to try checking how that happened.
|
|
If you still have the temp dir, can you also count the reads from |
Thanks. The table below is from a run with fewer samples that threw the same error. I don't see any red flags, but you may think otherwise. Might the issue stem from this?
(see https://benjjneb.github.io/dada2/faq.html)
|
Sorry, but a somewhat related question: if the above cannot be solved, using R1-data only may be enough for my purposes. However, I could not find a combination of parameters that would allow me to do so. Is this possible? Thanks very much. |
Thanks for your help reporting this; I hope a fix will be out this week. Single-end mode was never implemented because the last time I saw a single-end dataset was a long time ago, so dadaist was born quite opinionated :) It is on the radar, but will probably come later. |
I just pushed an update that should fix your problem, which I suppose may be filesystem related (by default, the order of files should be the same for FOR and REV). |
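For anyone wanting to verify the file pairing by hand, here is a minimal sketch. It assumes files are named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz` (the pattern used by the renaming script in this thread); `check_pair_order` is a hypothetical helper, not part of dadaist2. If every forward file has a reverse partner and vice versa, the two sorted file lists line up sample by sample.

```shell
# Hypothetical helper, not part of dadaist2.
# Assumes the naming pattern <sample>_R1.fastq.gz / <sample>_R2.fastq.gz.
# Returns 1 (and reports on stderr) if any file lacks its mate.
check_pair_order() {
    local dir="$1" rc=0 r1 r2
    # Every forward file needs a reverse file with the same sample prefix
    for r1 in "$dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "MISSING: no reverse file for $r1" >&2
            rc=1
        fi
    done
    # Every reverse file needs a forward file with the same sample prefix
    for r2 in "$dir"/*_R2.fastq.gz; do
        [ -e "$r2" ] || continue
        r1="${r2%_R2.fastq.gz}_R1.fastq.gz"
        if [ ! -e "$r1" ]; then
            echo "MISSING: no forward file for $r2" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

Usage would be something like `check_pair_order ./ && echo "all samples paired"`. This only checks file names, not the reads inside the files.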
Great! Thank you. Quick question: what installation method (other than miniconda) would you recommend for v.1.2.1? The one described in the "developmental snapshot"? |
While it is not available through BioConda, the only possibility would be something like the "dev snapshot":
# Start from a directory where you can download the package
git clone git@github.com:quadram-institute-bioscience/dadaist2.git
# Activate your dadaist environment
source activate "dadaist-env-name"
# Add the current directory/dadaist2/bin to PATH
export PATH="$PWD"/dadaist2/bin/:"$PATH"
# Check if it worked
dadaist2 --version |
Thank you -- the installation worked but I got the same error message(s). I will test if using dada2 outside of dadaist2 results in the same problem and get back to you. |
Ewww, that's frustrating, sorry about this! If you can run in debug mode and send me the log, I might finally understand where the files are flipping:
dadaist2 --debug {your parameters} 2>&1 | tee dadaist-debug.log
Thanks! |
Please see attached -- thanks! |
Hi @padbc, many thanks for sharing. I tried creating a dataset using your sample names (each with a different number of reads), but I have had no luck reproducing the issue. What I can suggest, if you are so kind as to keep helping me here, is:
# This to be run in the place you have your reads:
# in the logs appear as ./ so I used the same here
INPUT=./
# This will produce two output directories, input_1 and input_2
mkdir -p input_{1,2}
C=0
seqfu metadata $INPUT | grep -v sample-id > metadata.tsv
set -euo pipefail
while IFS= read -r LINE; do
  C=$((C+1))
  sample=$(echo "$LINE" | cut -f 1)
  # "fwd" instead of "for", which is a shell keyword
  fwd=$(echo "$LINE" | cut -f 2)
  rev=$(echo "$LINE" | cut -f 3)
  # Sample name with "-" replaced by "x", used for the input_2 copies
  safe=$(echo "$sample" | sed 's/-/x/g')
  echo -n "Copying $sample... "
  cp "$fwd" "input_1/Sample${C}_R1.fastq.gz"
  cp "$rev" "input_1/Sample${C}_R2.fastq.gz"
  cp "$fwd" "input_2/${safe}_R1.fastq.gz"
  cp "$rev" "input_2/${safe}_R2.fastq.gz"
  echo Done
done < metadata.tsv
# Check paired reads match
seqfu count input_1/*.gz >/dev/null && echo "OK: input_1"
seqfu count input_2/*.gz >/dev/null && echo "OK: input_2"
# If the following is not printed some program failed
echo "DONE: OK" |
Thanks very much. I will try both solutions. Before you posted this, as I had suspected, I solved the issue by running dada2 using the |
Yes, I will implement that as well, but I also wanted to improve logging in the meanwhile and I'm really grateful for your patience in this issue! |
Quick update. The last part of the output of (2) is the following:
We can therefore rule out naming convention, but these are pre-filtered files. Should we expect additional mismatch errors to be introduced after QCing? |
Since by default QC does not modify the FASTQ files (it only collects data to feed parameters to DADA2), I'm intrigued. |
Read counts:
|
To add to the above: the difference in read numbers is not what's causing the issue, as removing those samples from the analysis throws the same error message. |
Just to be sure I'm not losing track of info here:
The two files have the very same name: is one of the two actually the R2?
Surely, if the only problem was in |
Thank you. Please see the attached log of dadaist2 1.2.2a run on the input_1 files (minus the Sample115* files): |
Could you confirm whether the |
Hello @padbc, matchIDs is not yet implemented, and from the docs (https://rdrr.io/github/benjjneb/dada2/man/fastqPairedFilter.html) it does not look like what we need in this context. From the log you kindly provided, it looks like the vectors are in the correct order, like in this short example:
so I'm still struggling to find the cause of your issue. If you can create a minimal directory with a short selection of samples that cause the problem and wish to send it via mail or a link, I'm happy to inspect further. Meanwhile, I'll try to add some extra checks to get closer to the problem. |
I pushed version 1.2.3 to the repo; it counts the reads in the input folder and in the filtered folder. Note that this extra information is only available with It would be interesting to see if the problem also persists after removing from the input:
|
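For reference, the same kind of read counting can be done by hand. This is a rough sketch, not the code used inside dadaist2: `count_reads` and `report_counts` are hypothetical helpers, and they assume gzipped FASTQ files with exactly four lines per record (no wrapped sequences).

```shell
# Hypothetical helpers, not part of dadaist2.
# Assumes gzipped FASTQ with exactly 4 lines per record.

# Print the number of reads in one gzipped FASTQ file
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Print "<file name><TAB><read count>" for every *.fastq.gz in a directory
report_counts() {
    local dir="$1" f
    for f in "$dir"/*.fastq.gz; do
        [ -e "$f" ] || continue
        printf '%s\t%s\n' "$(basename "$f")" "$(count_reads "$f")"
    done
}
```

Running `report_counts` on the input folder and again on the folder of filtered files (wherever that is on your system) gives two tables; within each folder, the R1 and R2 files of a sample should report the same number of reads.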
Great -- thank you. I will try to take a look at this later today. |
Sorry for the delay here. The error persists after removing those samples; see attached log. |
Ah, thanks a million.
Now, this does not make sense, as you previously checked the input directory with If not, I cannot figure out why some samples degrade while being copied to the temporary directory. Is it a peculiar filesystem, maybe? |
Thanks so much for the quick reply. So yes, deleting Sample116* appears to have "solved" the problem, i.e., dadaist2 produces the expected output. Like you, I find this perplexing. |
I can only suggest to check with |
This is not a problem of dadaist2 per se, but I cannot figure out which samples are causing the following error, or why (there are several of these):
That said, I wonder if dadaist filters the forward and reverse reads independently, resulting in mismatched filtered fastq files.
My command was
dadaist2 -i ./ -o output_folder --maxee1 2 --maxee2 2 -t 8 -d ~/tools/dadaist2/refs/silva_nr_v138_train_set.fa.gz
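One way to test whether a given pair of files (raw or filtered) is still in sync is to compare the read names record by record. The sketch below is only an illustration: `fastq_ids` and `pairs_in_sync` are hypothetical helpers, and it assumes gzipped four-line FASTQ records whose read name is the first whitespace-separated field of the header, optionally followed by `/1` or `/2`.

```shell
# Hypothetical helpers, not part of dadaist2 or DADA2.
# Assumes gzipped FASTQ with 4 lines per record.

# Print one read name per record: the first field of each header line,
# with a trailing /1 or /2 removed
fastq_ids() {
    gzip -dc "$1" | awk 'NR % 4 == 1 { id = $1; sub(/\/[12]$/, "", id); print id }'
}

# Return 0 if both files list the same read names in the same order
pairs_in_sync() {
    local ids1 ids2 rc=0
    ids1=$(mktemp)
    ids2=$(mktemp)
    fastq_ids "$1" > "$ids1"
    fastq_ids "$2" > "$ids2"
    if cmp -s "$ids1" "$ids2"; then
        echo "OK: $1 and $2 are in sync"
    else
        echo "MISMATCH: $1 and $2 differ in read names or read count" >&2
        rc=1
    fi
    rm -f "$ids1" "$ids2"
    return "$rc"
}
```

For example, `pairs_in_sync Sample1_R1.fastq.gz Sample1_R2.fastq.gz` on each sample would show which pairs, if any, have drifted apart.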