Mismatched forward and reverse sequence files #15

padbc · 2021-11-04T23:11:17Z

This is not a problem with dadaist2 per se, but I cannot figure out which samples are causing the following error, or why (there are several of these):

Error in (function (fn, fout, maxN = c(0, 0), truncQ = c(2, 2), truncLen = c(0,  :
  Mismatched forward and reverse sequence files: 3811, 11495.

That said, I wonder if dadaist filters the forward and reverse reads independently, resulting in mismatched filtered fastq files.

My command was dadaist2 -i ./ -o output_folder --maxee1 2 --maxee2 2 -t 8 -d ~/tools/dadaist2/refs/silva_nr_v138_train_set.fa.gz

telatin · 2021-11-05T09:06:34Z

Hello, I would be most grateful if you could provide some more details so we can check how that happened.

What OS are you using?
What version of Dadaist2?
Can you attach the log?
Can you paste the total read counts from seqfu stats -n reads/*.fastq.gz, to give an overview of the number of samples and their depth?

padbc · 2021-11-05T17:05:04Z

OS: Ubuntu 14.04 LTS
dadaist2 version: [1.1.0]
dadaist.log
There are 1262 samples, so I'm just pasting the first few (the story is the same for all of them):
┌─────────────────────────────────────────────┬────────┬──────────┬───────┬─────┬─────┬─────┬────────┬─────┬─────┐
│ File                                        │ #Seq   │ Total bp │ Avg   │ N50 │ N75 │ N90 │ auN    │ Min │ Max │
├─────────────────────────────────────────────┼────────┼──────────┼───────┼─────┼─────┼─────┼────────┼─────┼─────┤
│ ./0300903-27022018-S37_L001_R1_001.fastq.gz │ 132736 │ 39908375 │ 300.7 │ 301 │ 301 │ 300 │ 0.136  │ 35  │ 301 │
│ ./0300903-27022018-S37_L001_R2_001.fastq.gz │ 132736 │ 39873083 │ 300.4 │ 301 │ 300 │ 300 │ 0.131  │ 35  │ 301 │
│ ./0300904-06042018-S38_L001_R1_001.fastq.gz │ 125201 │ 37500347 │ 299.5 │ 301 │ 301 │ 300 │ 0.237  │ 35  │ 301 │
│ ./0300904-06042018-S38_L001_R2_001.fastq.gz │ 125201 │ 37473176 │ 299.3 │ 301 │ 300 │ 300 │ 0.236  │ 35  │ 301 │
│ ./0404504-2019-S98_L001_R1_001.fastq.gz     │ 146585 │ 43893185 │ 299.4 │ 301 │ 301 │ 299 │ 0.199  │ 35  │ 301 │
│ ./0404504-2019-S98_L001_R2_001.fastq.gz     │ 146585 │ 43884601 │ 299.4 │ 301 │ 300 │ 300 │ 0.199  │ 35  │ 301 │
│ ./100102-S13_L001_R1_001.fastq.gz           │ 139407 │ 41916488 │ 300.7 │ 301 │ 301 │ 300 │ 0.070  │ 35  │ 301 │
│ ./100102-S13_L001_R2_001.fastq.gz           │ 139407 │ 41890087 │ 300.5 │ 301 │ 300 │ 300 │ 0.054  │ 35  │ 301 │
│ ./100102-S48_L001_R1_001.fastq.gz           │ 80043  │ 23353810 │ 291.8 │ 301 │ 300 │ 289 │ 0.387  │ 35  │ 301 │
│ ./100102-S48_L001_R2_001.fastq.gz           │ 80043  │ 23380877 │ 292.1 │ 301 │ 300 │ 291 │ 0.386  │ 35  │ 301 │
│ ./100102-S68_L001_R1_001.fastq.gz           │ 102856 │ 30910462 │ 300.5 │ 301 │ 301 │ 299 │ 0.032  │ 71  │ 301 │
│ ./100102-S68_L001_R2_001.fastq.gz           │ 102856 │ 30912377 │ 300.5 │ 301 │ 300 │ 300 │ 0.032  │ 71  │ 301 │
│ ./100202-S14_L001_R1_001.fastq.gz           │ 146864 │ 44170584 │ 300.8 │ 301 │ 301 │ 300 │ 0.055  │ 35  │ 301 │
│ ./100202-S14_L001_R2_001.fastq.gz           │ 146864 │ 44137972 │ 300.5 │ 301 │ 300 │ 300 │ 0.049  │ 35  │ 301 │
└─────────────────────────────────────────────┴────────┴──────────┴───────┴─────┴─────┴─────┴────────┴─────┴─────┘

telatin · 2021-11-08T12:21:58Z

If you still have the temp dir, can you also count the reads in /tmp/dadaist2_ZLjc61/for/*gz and /tmp/dadaist2_ZLjc61/rev/*gz?
Running the pipeline with --debug forces the temporary directories to be kept in every case.

padbc · 2021-11-08T16:30:07Z

Thanks. The table below is from a run with fewer samples that threw the same error. I don't see any red flags, but you may think otherwise.

Might the issue stem from this?

What if my forward and reverse reads aren’t in matching order?

This situation commonly arises when external filtering methods, like the QIIME demultiplexing script, are used and filter the forward and reverse reads independently. This can be remedied by adding the matchIDs=TRUE flag to the filterAndTrim or fastqPairedFilter functions. For example, if no more filtering is required, the following will retain just those reads that match between the forward and reverse fastq files (assumes Illumina fastq headers):
filterAndTrim(..., matchIDs=TRUE)

(see https://benjjneb.github.io/dada2/faq.html)
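
The FAQ entry quoted above concerns reads whose order differs between the forward and reverse files. Below is a minimal shell sketch of how one pair could be checked for that, assuming gzip-compressed four-line FASTQ records and headers whose first whitespace-delimited token is the read ID (optionally ending in /1 or /2). read_ids and ids_in_sync are hypothetical helper names, not part of dadaist2, DADA2 or seqfu.

```shell
#!/bin/sh
# Hypothetical helper: print one read ID per record.
# Assumes unwrapped four-line FASTQ records; a trailing /1 or /2 is dropped.
read_ids() {
    gzip -dc "$1" | awk 'NR % 4 == 1 { id = $1; sub(/\/[12]$/, "", id); print id }'
}

# Hypothetical helper: succeed only if both files list the same IDs in the same order.
ids_in_sync() {
    tmp1=$(mktemp)
    tmp2=$(mktemp)
    read_ids "$1" > "$tmp1"
    read_ids "$2" > "$tmp2"
    status=0
    # cmp -s is silent and fails on the first differing byte or on a length difference
    cmp -s "$tmp1" "$tmp2" || status=$?
    rm -f "$tmp1" "$tmp2"
    return "$status"
}
```

A call such as ids_in_sync sample_R1.fastq.gz sample_R2.fastq.gz || echo "IDs differ" (file names are placeholders) would flag a pair that is either out of order or of different length; it does not say which of the two is the case.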

┌────────────────────────────────────┬────────┬──────────┬───────┬─────┬─────┬─────┬───────┬─────┬─────┐
│ File                               │ #Seq   │ Total bp │ Avg   │ N50 │ N75 │ N90 │ auN   │ Min │ Max │
├────────────────────────────────────┼────────┼──────────┼───────┼─────┼─────┼─────┼───────┼─────┼─────┤
│ 100102-S13_L001_R1_001.fastq.gz    │ 139407 │ 41916488 │ 300.7 │ 301 │ 301 │ 300 │ 0.070 │ 35  │ 301 │
│ 100202-S14_L001_R1_001.fastq.gz    │ 146864 │ 44170584 │ 300.8 │ 301 │ 301 │ 300 │ 0.055 │ 35  │ 301 │
│ 100203-S15_L001_R1_001.fastq.gz    │ 139739 │ 42018747 │ 300.7 │ 301 │ 301 │ 301 │ 0.164 │ 35  │ 301 │
│ 100204-S16_L001_R1_001.fastq.gz    │ 132438 │ 39803309 │ 300.5 │ 301 │ 301 │ 300 │ 0.122 │ 35  │ 301 │
│ 100205-S17_L001_R1_001.fastq.gz    │ 144511 │ 43460952 │ 300.7 │ 301 │ 301 │ 300 │ 0.107 │ 35  │ 301 │
│ 100206-S18_L001_R1_001.fastq.gz    │ 135101 │ 40573650 │ 300.3 │ 301 │ 301 │ 301 │ 0.210 │ 35  │ 301 │
│ 100502-S19_L001_R1_001.fastq.gz    │ 154180 │ 46279307 │ 300.2 │ 301 │ 300 │ 299 │ 0.169 │ 35  │ 301 │
│ 100503-S20_L001_R1_001.fastq.gz    │ 136254 │ 40876059 │ 300.0 │ 301 │ 300 │ 298 │ 0.196 │ 35  │ 301 │
│ 100504-S21_L001_R1_001.fastq.gz    │ 149045 │ 44791183 │ 300.5 │ 301 │ 301 │ 299 │ 0.114 │ 35  │ 301 │
│ 100505-S22_L001_R1_001.fastq.gz    │ 144673 │ 43404679 │ 300.0 │ 301 │ 300 │ 299 │ 0.188 │ 35  │ 301 │
│ 100506-S23_L001_R1_001.fastq.gz    │ 139843 │ 41949009 │ 300.0 │ 301 │ 300 │ 299 │ 0.203 │ 35  │ 301 │
│ 200202-S94_L001_R1_001.fastq.gz    │ 149611 │ 44664048 │ 298.5 │ 301 │ 301 │ 300 │ 0.202 │ 35  │ 301 │
│ 200203-S95_L001_R1_001.fastq.gz    │ 103600 │ 31036418 │ 299.6 │ 301 │ 301 │ 300 │ 0.283 │ 35  │ 301 │
│ 200502-S7_L001_R1_001.fastq.gz     │ 144901 │ 43509686 │ 300.3 │ 301 │ 300 │ 299 │ 0.157 │ 35  │ 301 │
│ 200503-S8_L001_R1_001.fastq.gz     │ 137665 │ 41387170 │ 300.6 │ 301 │ 301 │ 300 │ 0.074 │ 35  │ 301 │
│ 200504-S9_L001_R1_001.fastq.gz     │ 123302 │ 37034341 │ 300.4 │ 301 │ 301 │ 300 │ 0.222 │ 35  │ 301 │
│ 200505-S10_L001_R1_001.fastq.gz    │ 117559 │ 35345816 │ 300.7 │ 301 │ 301 │ 300 │ 0.104 │ 35  │ 301 │
│ 200506-S11_L001_R1_001.fastq.gz    │ 136077 │ 40842847 │ 300.1 │ 301 │ 300 │ 299 │ 0.185 │ 35  │ 301 │
│ 200599-S12_L001_R1_001.fastq.gz    │ 113326 │ 33910575 │ 299.2 │ 301 │ 301 │ 300 │ 0.264 │ 35  │ 301 │
│ 200602-S134_L001_R1_001.fastq.gz   │ 136183 │ 40852757 │ 300.0 │ 301 │ 300 │ 299 │ 0.206 │ 35  │ 301 │
│ 200603-S135_L001_R1_001.fastq.gz   │ 97157  │ 29174080 │ 300.3 │ 301 │ 301 │ 299 │ 0.238 │ 35  │ 301 │
│ 200702-S2_L001_R1_001.fastq.gz     │ 117177 │ 35234949 │ 300.7 │ 301 │ 301 │ 300 │ 0.093 │ 35  │ 301 │
│ 200703-S3_L001_R1_001.fastq.gz     │ 139156 │ 41850711 │ 300.7 │ 301 │ 301 │ 300 │ 0.081 │ 86  │ 301 │
│ 200704-S4_L001_R1_001.fastq.gz     │ 115559 │ 34510934 │ 298.6 │ 301 │ 301 │ 300 │ 0.260 │ 35  │ 301 │
│ 200705-S5_L001_R1_001.fastq.gz     │ 128350 │ 38462972 │ 299.7 │ 301 │ 301 │ 300 │ 0.228 │ 35  │ 301 │
│ 200706-S6_L001_R1_001.fastq.gz     │ 117212 │ 34748566 │ 296.5 │ 301 │ 301 │ 300 │ 0.260 │ 35  │ 301 │
│ 200802-S103_L001_R1_001.fastq.gz   │ 101572 │ 30372892 │ 299.0 │ 301 │ 301 │ 301 │ 0.291 │ 35  │ 301 │
│ 200803-S104_L001_R1_001.fastq.gz   │ 131863 │ 39583103 │ 300.2 │ 301 │ 301 │ 301 │ 0.214 │ 35  │ 301 │
│ 200899-S105_L001_R1_001.fastq.gz   │ 127531 │ 38191634 │ 299.5 │ 301 │ 301 │ 300 │ 0.234 │ 35  │ 301 │
│ 200902-S136_L001_R1_001.fastq.gz   │ 119614 │ 35851983 │ 299.7 │ 301 │ 301 │ 299 │ 0.244 │ 35  │ 301 │
│ 200904-S137_L001_R1_001.fastq.gz   │ 137159 │ 41195209 │ 300.3 │ 301 │ 300 │ 299 │ 0.105 │ 35  │ 301 │
│ 200905-S138_L001_R1_001.fastq.gz   │ 101718 │ 30472597 │ 299.6 │ 301 │ 301 │ 300 │ 0.289 │ 35  │ 301 │
│ 200906-S139_L001_R1_001.fastq.gz   │ 116950 │ 35140104 │ 300.5 │ 301 │ 301 │ 299 │ 0.119 │ 35  │ 301 │
│ 201002-S1_L001_R1_001.fastq.gz     │ 164983 │ 49509070 │ 300.1 │ 301 │ 299 │ 298 │ 0.123 │ 35  │ 301 │
│ 301102-S72_L001_R1_001.fastq.gz    │ 111648 │ 33522225 │ 300.2 │ 301 │ 301 │ 300 │ 0.248 │ 35  │ 301 │
│ 301103-S73_L001_R1_001.fastq.gz    │ 134229 │ 40310675 │ 300.3 │ 301 │ 301 │ 300 │ 0.201 │ 35  │ 301 │
│ 301104-S74_L001_R1_001.fastq.gz    │ 159954 │ 48021586 │ 300.2 │ 301 │ 300 │ 299 │ 0.162 │ 35  │ 301 │
│ 301105-S75_L001_R1_001.fastq.gz    │ 149240 │ 44760267 │ 299.9 │ 301 │ 300 │ 299 │ 0.194 │ 35  │ 301 │
│ 301202-S76_L001_R1_001.fastq.gz    │ 129360 │ 38852367 │ 300.3 │ 301 │ 301 │ 299 │ 0.187 │ 35  │ 301 │
│ 301203-S77_L001_R1_001.fastq.gz    │ 129759 │ 38979143 │ 300.4 │ 301 │ 301 │ 299 │ 0.144 │ 35  │ 301 │
│ 301204-S78_L001_R1_001.fastq.gz    │ 124516 │ 37406603 │ 300.4 │ 301 │ 301 │ 299 │ 0.166 │ 35  │ 301 │
│ 301205-S79_L001_R1_001.fastq.gz    │ 110261 │ 33139525 │ 300.6 │ 301 │ 301 │ 300 │ 0.163 │ 35  │ 301 │
│ 301206-S80_L001_R1_001.fastq.gz    │ 125896 │ 37834323 │ 300.5 │ 301 │ 301 │ 300 │ 0.162 │ 35  │ 301 │
│ 301299-S81_L001_R1_001.fastq.gz    │ 112832 │ 33872487 │ 300.2 │ 301 │ 300 │ 299 │ 0.205 │ 35  │ 301 │
│ 303102-S38_L001_R1_001.fastq.gz    │ 126745 │ 38103308 │ 300.6 │ 301 │ 301 │ 300 │ 0.160 │ 35  │ 301 │
│ 303202-S39_L001_R1_001.fastq.gz    │ 127925 │ 37997809 │ 297.0 │ 301 │ 301 │ 300 │ 0.238 │ 35  │ 301 │
│ 303203-S40_L001_R1_001.fastq.gz    │ 136279 │ 40725166 │ 298.8 │ 301 │ 301 │ 300 │ 0.220 │ 35  │ 301 │
│ 303204-S41_L001_R1_001.fastq.gz    │ 111187 │ 33332312 │ 299.8 │ 301 │ 301 │ 299 │ 0.264 │ 35  │ 301 │
│ 303205-S42_L001_R1_001.fastq.gz    │ 120717 │ 35669441 │ 295.5 │ 301 │ 301 │ 299 │ 0.255 │ 35  │ 301 │
│ 303206-S43_L001_R1_001.fastq.gz    │ 135175 │ 40419603 │ 299.0 │ 301 │ 301 │ 300 │ 0.222 │ 35  │ 301 │
│ 303299-S44_L001_R1_001.fastq.gz    │ 116796 │ 34984447 │ 299.5 │ 301 │ 301 │ 299 │ 0.253 │ 35  │ 301 │
│ 303402-S45_L001_R1_001.fastq.gz    │ 119797 │ 35858803 │ 299.3 │ 301 │ 299 │ 298 │ 0.240 │ 35  │ 301 │
│ 303403-S46_L001_R1_001.fastq.gz    │ 186419 │ 55979250 │ 300.3 │ 301 │ 301 │ 299 │ 0.148 │ 35  │ 301 │
│ 303404-S47_L001_R1_001.fastq.gz    │ 118567 │ 35652038 │ 300.7 │ 301 │ 301 │ 300 │ 0.078 │ 35  │ 301 │
│ 303405-S48_L001_R1_001.fastq.gz    │ 116310 │ 34969402 │ 300.7 │ 301 │ 301 │ 300 │ 0.088 │ 35  │ 301 │
│ 303406-S49_L001_R1_001.fastq.gz    │ 115016 │ 34398140 │ 299.1 │ 301 │ 300 │ 299 │ 0.258 │ 35  │ 301 │
│ 400102-S84_L001_R1_001.fastq.gz    │ 134777 │ 37410202 │ 277.6 │ 301 │ 301 │ 257 │ 0.244 │ 35  │ 301 │
│ 400103-S85_L001_R1_001.fastq.gz    │ 143919 │ 41743878 │ 290.1 │ 301 │ 301 │ 299 │ 0.218 │ 35  │ 301 │
│ 400104-S86_L001_R1_001.fastq.gz    │ 124926 │ 36977294 │ 296.0 │ 301 │ 300 │ 299 │ 0.245 │ 35  │ 301 │
│ 400105-S87_L001_R1_001.fastq.gz    │ 114701 │ 33863620 │ 295.2 │ 301 │ 301 │ 300 │ 0.268 │ 35  │ 301 │
│ 400106-S88_L001_R1_001.fastq.gz    │ 156059 │ 45877631 │ 294.0 │ 301 │ 301 │ 299 │ 0.198 │ 35  │ 301 │
│ 400302-S89_L001_R1_001.fastq.gz    │ 146847 │ 44060983 │ 300.0 │ 301 │ 301 │ 300 │ 0.194 │ 35  │ 301 │
│ 400303-S90_L001_R1_001.fastq.gz    │ 152645 │ 45637805 │ 299.0 │ 301 │ 299 │ 298 │ 0.198 │ 35  │ 301 │
│ 400304-S91_L001_R1_001.fastq.gz    │ 139780 │ 41925656 │ 299.9 │ 301 │ 301 │ 300 │ 0.210 │ 35  │ 301 │
│ 400306-S92_L001_R1_001.fastq.gz    │ 149711 │ 44906057 │ 300.0 │ 301 │ 300 │ 298 │ 0.181 │ 35  │ 301 │
│ 400399-S93_L001_R1_001.fastq.gz    │ 131503 │ 39519530 │ 300.5 │ 301 │ 301 │ 301 │ 0.172 │ 35  │ 301 │
│ 400402-S97_L001_R1_001.fastq.gz    │ 121356 │ 36378975 │ 299.8 │ 301 │ 301 │ 300 │ 0.244 │ 35  │ 301 │
│ 400403-S98_L001_R1_001.fastq.gz    │ 138126 │ 41429581 │ 299.9 │ 301 │ 300 │ 299 │ 0.206 │ 35  │ 301 │
│ 400404-S99_L001_R1_001.fastq.gz    │ 145812 │ 43816556 │ 300.5 │ 301 │ 301 │ 300 │ 0.176 │ 35  │ 301 │
│ 400405-S100_L001_R1_001.fastq.gz   │ 130846 │ 39087045 │ 298.7 │ 301 │ 301 │ 300 │ 0.231 │ 35  │ 301 │
│ 400406-S96_L001_R1_001.fastq.gz    │ 122805 │ 36758431 │ 299.3 │ 301 │ 301 │ 300 │ 0.243 │ 35  │ 301 │
│ 400498-S102_L001_R1_001.fastq.gz   │ 117750 │ 35196667 │ 298.9 │ 301 │ 301 │ 300 │ 0.253 │ 35  │ 301 │
│ 400499-S101_L001_R1_001.fastq.gz   │ 128036 │ 38396715 │ 299.9 │ 301 │ 301 │ 300 │ 0.229 │ 35  │ 301 │
│ 401502-S125_L001_R1_001.fastq.gz   │ 133535 │ 40072003 │ 300.1 │ 301 │ 301 │ 299 │ 0.197 │ 35  │ 301 │
│ 401503-S126_L001_R1_001.fastq.gz   │ 123937 │ 37223689 │ 300.3 │ 301 │ 301 │ 299 │ 0.167 │ 35  │ 301 │
│ 401504-S127_L001_R1_001.fastq.gz   │ 110292 │ 33108766 │ 300.2 │ 301 │ 300 │ 299 │ 0.202 │ 35  │ 301 │
│ 401505-S128_L001_R1_001.fastq.gz   │ 117017 │ 35141339 │ 300.3 │ 301 │ 300 │ 299 │ 0.131 │ 35  │ 301 │
│ 401506-S129_L001_R1_001.fastq.gz   │ 95721  │ 28731675 │ 300.2 │ 301 │ 301 │ 299 │ 0.225 │ 35  │ 301 │
│ 401902-S119_L001_R1_001.fastq.gz   │ 134605 │ 40408859 │ 300.2 │ 301 │ 301 │ 300 │ 0.212 │ 35  │ 301 │
│ 401903-S120_L001_R1_001.fastq.gz   │ 125445 │ 37659002 │ 300.2 │ 301 │ 301 │ 300 │ 0.209 │ 35  │ 301 │
│ 401904-S121_L001_R1_001.fastq.gz   │ 131128 │ 39239268 │ 299.2 │ 301 │ 301 │ 299 │ 0.228 │ 35  │ 301 │
│ 401905-S122_L001_R1_001.fastq.gz   │ 80284  │ 24079942 │ 299.9 │ 301 │ 301 │ 300 │ 0.356 │ 35  │ 301 │
│ 401906-S123_L001_R1_001.fastq.gz   │ 128938 │ 38727296 │ 300.4 │ 301 │ 301 │ 301 │ 0.212 │ 35  │ 301 │
│ 401999-S124_L001_R1_001.fastq.gz   │ 132643 │ 39690036 │ 299.2 │ 301 │ 301 │ 300 │ 0.227 │ 35  │ 301 │
│ 402202-S62_L001_R1_001.fastq.gz    │ 105353 │ 31660054 │ 300.5 │ 301 │ 301 │ 300 │ 0.245 │ 35  │ 301 │
│ 402203-S63_L001_R1_001.fastq.gz    │ 95134  │ 28606171 │ 300.7 │ 301 │ 301 │ 300 │ 0.167 │ 35  │ 301 │
│ 402204-S64_L001_R1_001.fastq.gz    │ 107403 │ 32295082 │ 300.7 │ 301 │ 301 │ 300 │ 0.116 │ 35  │ 301 │
│ 402205-S65_L001_R1_001.fastq.gz    │ 92203  │ 27729694 │ 300.7 │ 301 │ 301 │ 300 │ 0.081 │ 35  │ 301 │
│ 402206-S66_L001_R1_001.fastq.gz    │ 107835 │ 32431189 │ 300.7 │ 301 │ 301 │ 301 │ 0.145 │ 35  │ 301 │
│ 402602-S67_L001_R1_001.fastq.gz    │ 119751 │ 35964665 │ 300.3 │ 301 │ 301 │ 301 │ 0.232 │ 35  │ 301 │
│ 402603-S68_L001_R1_001.fastq.gz    │ 115001 │ 34480055 │ 299.8 │ 301 │ 301 │ 300 │ 0.255 │ 35  │ 301 │
│ 402604-S69_L001_R1_001.fastq.gz    │ 104178 │ 31215971 │ 299.6 │ 301 │ 301 │ 301 │ 0.280 │ 35  │ 301 │
│ 402605-S70_L001_R1_001.fastq.gz    │ 143556 │ 43097511 │ 300.2 │ 301 │ 301 │ 300 │ 0.200 │ 35  │ 301 │
│ 402606-S71_L001_R1_001.fastq.gz    │ 136853 │ 41150225 │ 300.7 │ 301 │ 301 │ 300 │ 0.055 │ 35  │ 301 │
│ 404302-S130_L001_R1_001.fastq.gz   │ 132182 │ 39637404 │ 299.9 │ 301 │ 301 │ 300 │ 0.219 │ 35  │ 301 │
│ 404303-S131_L001_R1_001.fastq.gz   │ 124872 │ 37514339 │ 300.4 │ 301 │ 301 │ 300 │ 0.185 │ 35  │ 301 │
│ 404304-S132_L001_R1_001.fastq.gz   │ 111924 │ 33600617 │ 300.2 │ 301 │ 301 │ 300 │ 0.247 │ 35  │ 301 │
│ 404305-S133_L001_R1_001.fastq.gz   │ 96481  │ 28591819 │ 296.3 │ 301 │ 300 │ 299 │ 0.316 │ 35  │ 301 │
│ 405202-S107_L001_R1_001.fastq.gz   │ 117201 │ 35011277 │ 298.7 │ 301 │ 300 │ 299 │ 0.256 │ 35  │ 301 │
│ 405203-S108_L001_R1_001.fastq.gz   │ 127049 │ 37630557 │ 296.2 │ 301 │ 301 │ 301 │ 0.241 │ 35  │ 301 │
│ 405204-S109_L001_R1_001.fastq.gz   │ 103407 │ 30171591 │ 291.8 │ 301 │ 300 │ 299 │ 0.302 │ 35  │ 301 │
│ 405205-S110_L001_R1_001.fastq.gz   │ 94724  │ 27638705 │ 291.8 │ 301 │ 300 │ 293 │ 0.329 │ 35  │ 301 │
│ 405206-S111_L001_R1_001.fastq.gz   │ 112195 │ 33399068 │ 297.7 │ 301 │ 300 │ 299 │ 0.266 │ 35  │ 301 │
│ 405302-S112_L001_R1_001.fastq.gz   │ 108441 │ 32607145 │ 300.7 │ 301 │ 301 │ 301 │ 0.179 │ 35  │ 301 │
│ 405303-S113_L001_R1_001.fastq.gz   │ 120154 │ 36060437 │ 300.1 │ 301 │ 300 │ 298 │ 0.207 │ 35  │ 301 │
│ 405304-S114_L001_R1_001.fastq.gz   │ 121694 │ 36538880 │ 300.3 │ 301 │ 300 │ 299 │ 0.180 │ 35  │ 301 │
│ 405305-S115_L001_R1_001.fastq.gz   │ 100204 │ 30032416 │ 299.7 │ 301 │ 300 │ 299 │ 0.282 │ 35  │ 301 │
│ 405306-S116_L001_R1_001.fastq.gz   │ 103412 │ 30959775 │ 299.4 │ 301 │ 300 │ 298 │ 0.250 │ 35  │ 301 │
│ 405602-S50_L001_R1_001.fastq.gz    │ 144151 │ 43324535 │ 300.5 │ 301 │ 301 │ 300 │ 0.117 │ 35  │ 301 │
│ 405603-S51_L001_R1_001.fastq.gz    │ 143343 │ 43079313 │ 300.5 │ 301 │ 301 │ 299 │ 0.064 │ 35  │ 301 │
│ 405604-S52_L001_R1_001.fastq.gz    │ 131212 │ 39413412 │ 300.4 │ 301 │ 301 │ 299 │ 0.158 │ 35  │ 301 │
│ 405605-S53_L001_R1_001.fastq.gz    │ 124667 │ 37467053 │ 300.5 │ 301 │ 301 │ 300 │ 0.122 │ 35  │ 301 │
│ 405606-S54_L001_R1_001.fastq.gz    │ 146036 │ 43867444 │ 300.4 │ 301 │ 300 │ 299 │ 0.072 │ 35  │ 301 │
│ 405699-S55_L001_R1_001.fastq.gz    │ 128499 │ 38596090 │ 300.4 │ 301 │ 301 │ 300 │ 0.206 │ 35  │ 301 │
│ 406002-S56_L001_R1_001.fastq.gz    │ 124471 │ 37393565 │ 300.4 │ 301 │ 301 │ 299 │ 0.105 │ 35  │ 301 │
│ 406003-S57_L001_R1_001.fastq.gz    │ 103812 │ 31184093 │ 300.4 │ 301 │ 301 │ 299 │ 0.141 │ 1   │ 301 │
│ 406004-S58_L001_R1_001.fastq.gz    │ 138285 │ 41542957 │ 300.4 │ 301 │ 301 │ 300 │ 0.183 │ 35  │ 301 │
│ 406005-S59_L001_R1_001.fastq.gz    │ 143149 │ 43041084 │ 300.7 │ 301 │ 301 │ 300 │ 0.081 │ 35  │ 301 │
│ 406006-S60_L001_R1_001.fastq.gz    │ 131859 │ 39561171 │ 300.0 │ 301 │ 300 │ 299 │ 0.213 │ 35  │ 301 │
│ 406502-S25_L001_R1_001.fastq.gz    │ 126658 │ 38064498 │ 300.5 │ 301 │ 301 │ 300 │ 0.076 │ 35  │ 301 │
│ 406503-S26_L001_R1_001.fastq.gz    │ 127714 │ 38200024 │ 299.1 │ 301 │ 301 │ 300 │ 0.236 │ 35  │ 301 │
│ 406504-S27_L001_R1_001.fastq.gz    │ 113363 │ 33949930 │ 299.5 │ 301 │ 301 │ 300 │ 0.262 │ 35  │ 301 │
│ 406505-S28_L001_R1_001.fastq.gz    │ 111854 │ 33626963 │ 300.6 │ 301 │ 301 │ 300 │ 0.074 │ 35  │ 301 │
│ 406506-S29_L001_R1_001.fastq.gz    │ 127794 │ 38409856 │ 300.6 │ 301 │ 301 │ 300 │ 0.090 │ 35  │ 301 │
│ 407002-S30_L001_R1_001.fastq.gz    │ 124120 │ 37203784 │ 299.7 │ 301 │ 301 │ 300 │ 0.236 │ 35  │ 301 │
│ 407003-S31_L001_R1_001.fastq.gz    │ 115886 │ 34664346 │ 299.1 │ 301 │ 301 │ 300 │ 0.255 │ 35  │ 301 │
│ 407004-S32_L001_R1_001.fastq.gz    │ 127180 │ 38136953 │ 299.9 │ 301 │ 301 │ 300 │ 0.229 │ 35  │ 301 │
│ 407005-S33_L001_R1_001.fastq.gz    │ 107374 │ 32201894 │ 299.9 │ 301 │ 301 │ 300 │ 0.265 │ 35  │ 301 │
│ 407006-S34_L001_R1_001.fastq.gz    │ 100642 │ 30124806 │ 299.3 │ 301 │ 300 │ 300 │ 0.288 │ 35  │ 301 │
│ 407099-S35_L001_R1_001.fastq.gz    │ 85509  │ 25620473 │ 299.6 │ 301 │ 301 │ 300 │ 0.337 │ 41  │ 301 │
│ NG23-S24_L001_R1_001.fastq.gz      │ 6534   │ 1923506  │ 294.4 │ 301 │ 301 │ 300 │ 0.347 │ 35  │ 301 │
│ NG24-S36_L001_R1_001.fastq.gz      │ 5067   │ 1485871  │ 293.2 │ 301 │ 301 │ 301 │ 0.293 │ 35  │ 301 │
│ NG27-S82_L001_R1_001.fastq.gz      │ 2522   │ 698269   │ 276.9 │ 301 │ 301 │ 301 │ 0.684 │ 35  │ 301 │
│ NG28-S106_L001_R1_001.fastq.gz     │ 2492   │ 571390   │ 229.3 │ 301 │ 301 │ 300 │ 0.931 │ 35  │ 301 │
│ NG30-S61_L001_R1_001.fastq.gz      │ 4866   │ 1387137  │ 285.1 │ 301 │ 301 │ 301 │ 0.501 │ 35  │ 301 │
│ NG32-S117_L001_R1_001.fastq.gz     │ 3554   │ 733993   │ 206.5 │ 301 │ 301 │ 299 │ 0.818 │ 35  │ 301 │
│ NG38-S140_L001_R1_001.fastq.gz     │ 2425   │ 482015   │ 198.8 │ 301 │ 301 │ 299 │ 1.057 │ 35  │ 301 │
│ PCRNEG-1-S37_L001_R1_001.fastq.gz  │ 6316   │ 1735958  │ 274.9 │ 301 │ 301 │ 301 │ 0.649 │ 35  │ 301 │
│ PCRNEG-2-S83_L001_R1_001.fastq.gz  │ 2122   │ 626119   │ 295.1 │ 301 │ 301 │ 301 │ 0.931 │ 35  │ 301 │
│ PCRNEG-3-S118_L001_R1_001.fastq.gz │ 3309   │ 832711   │ 251.7 │ 301 │ 301 │ 300 │ 0.857 │ 35  │ 301 │
│ PCRNEG-4-S142_L001_R1_001.fastq.gz │ 2639   │ 464063   │ 175.8 │ 301 │ 301 │ 297 │ 1.340 │ 35  │ 301 │
│ POS-1-S141_L001_R1_001.fastq.gz    │ 135504 │ 40585861 │ 299.5 │ 301 │ 301 │ 300 │ 0.163 │ 35  │ 301 │
│ QEB-1-S143_L001_R1_001.fastq.gz    │ 1535   │ 152569   │ 99.4  │ 301 │ 44  │ 35  │ 3.335 │ 35  │ 301 │
└────────────────────────────────────┴────────┴──────────┴───────┴─────┴─────┴─────┴───────┴─────┴─────┘

padbc · 2021-11-08T18:36:44Z

Sorry, but a somewhat related question: if the above cannot be solved, using R1-data only may be enough for my purposes. However, I could not find a combination of parameters that would allow me to do so. Is this possible? Thanks very much.

telatin · 2021-11-08T21:16:57Z

Thanks for your help reporting this; I hope a fix will be out this week.

Single-end mode was never implemented because the last time I saw a single-end dataset was a long time ago, so dadaist was born quite opinionated :) This is on the radar but will probably come later.

telatin · 2021-11-09T14:20:49Z

I just pushed an update that should fix your problem, which I suppose may be filesystem related (by default the order of files should be the same for FOR and REV).
The new version 1.2.1 is now on GitHub and should also be available via BioConda in about 24 hours; I would be most grateful if you could test it, as I could not easily reproduce the problem!
Best
Andrea

padbc · 2021-11-09T16:08:27Z

Great! Thank you. Quick question: what installation method (other than miniconda) would you recommend for v.1.2.1? The one described in the "developmental snapshot"?

telatin · 2021-11-09T16:35:24Z

Until it is available through BioConda, the only possibility would be something like the "dev snapshot".
Since you have a working environment, you can clone the repository somewhere and add it to your PATH temporarily, something like

# Start from a directory you can download the package
git clone git@github.com:quadram-institute-bioscience/dadaist2.git

# Activate your dadaist environment
source activate "dadaist-env-name"

# Add the current directory/dadaist2/bin to PATH
export PATH="$PWD"/dadaist2/bin/:"$PATH"

# Check if it worked
dadaist2 --version

padbc · 2021-11-09T18:36:22Z

Thank you -- the installation worked but I got the same error message(s). I will test if using dada2 outside of dadaist2 results in the same problem and get back to you.

telatin · 2021-11-09T21:11:34Z

Ewww, that's frustrating, sorry about this!
If you have some extra time for me, I pushed an update to the repo with extra checks. Please git pull inside the repository and try again (the version should now print 1.2.2).

If you can run in debug mode and send me the log, I might finally understand where the files are flipping:

dadaist2 --debug {your parameters} 2>&1 | tee dadaist-debug.log

Thanks!

padbc · 2021-11-09T22:04:15Z

Please see attached -- thanks!

dadaist-debug.log

telatin · 2021-11-10T11:12:33Z

Hi @padbc, many thanks for sharing. I tried creating a dataset using your sample names (each with a different number of reads), but I have not managed to reproduce or solve the issue.

What I can suggest, if you are so kind as to keep helping me here, is:

1. Pull the latest update (1.2.2a), which has no bug fixes but more verbose reporting (please always run it with --debug), and see if this gives more detail on the trouble.
2. The second thing is inspired by a DADA2 issue with common suffixes: I would try renaming the reads in two ways, replacing the dashes and using progressive sample names:

# Run this from the directory that holds your reads:
# it appears as ./ in the logs, so the same is used here
set -euo pipefail
INPUT=./

# This will produce two output directories, input_1 and input_2
mkdir -p input_{1,2}
C=0
seqfu metadata "$INPUT" | grep -v sample-id > metadata.tsv

# Each line of metadata.tsv: sample name, forward file, reverse file (tab separated)
while IFS= read -r LINE;
do
   C=$((C+1))
   sample=$(echo "$LINE" | cut -f 1)
   fwd=$(echo "$LINE" | cut -f 2)
   rev=$(echo "$LINE" | cut -f 3)
   # Second naming scheme: original sample name with dashes replaced by "x"
   renamed=$(echo "$sample" | sed 's/-/x/g')
   echo -n "Copying $sample... "
   cp "$fwd" "input_1/Sample${C}_R1.fastq.gz"
   cp "$rev" "input_1/Sample${C}_R2.fastq.gz"
   cp "$fwd" "input_2/${renamed}_R1.fastq.gz"
   cp "$rev" "input_2/${renamed}_R2.fastq.gz"
   echo Done
done < metadata.tsv

# Check that paired reads match ("OK" is printed only if seqfu count exits successfully)
seqfu count input_1/*.gz >/dev/null && echo "OK: input_1"
seqfu count input_2/*.gz >/dev/null && echo "OK: input_2"

# Printed when the script reaches the end; a missing "OK" line above points to a problem
echo "DONE: OK"

padbc · 2021-11-10T16:05:36Z

Thanks very much. I will try both solutions.

Before you posted this, as I had suspected, I solved the issue by running dada2 using the matchIDs=TRUE option of the filterAndTrim command. At first glance, the taxonomic classification results make sense.

telatin · 2021-11-10T16:19:12Z

Yes, I will implement that as well, but I also wanted to improve logging in the meantime, and I'm really grateful for your patience with this issue!

padbc · 2021-11-10T17:53:03Z

Quick update. The last part of the output of (2) is the following:

ERROR: Counts in R1 and R2 files do not match for input_1/Sample115_R1.fastq.gz
ERROR: Counts in R1 and R2 files do not match for input_1/Sample116_R1.fastq.gz
ERROR: Counts in R1 and R2 files do not match for input_2/406003xS57_R1.fastq.gz
ERROR: Counts in R1 and R2 files do not match for input_2/406002xS56_R1.fastq.gz

We can therefore rule out naming convention, but these are pre-filtered files. Should we expect additional mismatch errors to be introduced after QCing?

telatin · 2021-11-10T18:04:06Z

Since by default the QC step does not modify the FASTQ files (it only collects data to set parameters for DADA2), I'm intrigued.
Do the R1 and R2 files for 406003-S57 contain the same number of reads? Maybe a seqfu count ./*.fastq.gz can help spot the problem, but a direct read count of the two files might also be useful.
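
A direct read count of this kind can be sketched in plain shell, without seqfu. This is only an illustration, assuming gzip-compressed FASTQ files with unwrapped four-line records; count_reads and same_read_count are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: print the number of reads in a gzip-compressed FASTQ file.
# Assumes unwrapped records of exactly four lines each.
count_reads() {
    gzip -dc "$1" | awk 'END { print int(NR / 4) }'
}

# Hypothetical helper: succeed only when both files of a pair hold the same number of reads.
same_read_count() {
    [ "$(count_reads "$1")" -eq "$(count_reads "$2")" ]
}

# Usage, following the naming pattern of the files in this thread:
# same_read_count 406003-S57_L001_R1_001.fastq.gz 406003-S57_L001_R2_001.fastq.gz \
#     || echo "read counts differ"
```

Counting lines rather than header characters avoids miscounting quality strings that happen to start with "@".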

padbc · 2021-11-10T19:15:08Z

Read counts:

./406003-S57_L001_R1_001.fastq.gz       103812  
./406003-S57_L001_R1_001.fastq.gz       111495

padbc · 2021-11-10T20:25:50Z

To add to the above: the difference in read numbers is not what's causing the issue, as removing those samples from the analysis throws the same error message.

telatin · 2021-11-11T09:38:56Z

Just to be sure I'm not losing track of info here:

In the read counts reported above:

./406003-S57_L001_R1_001.fastq.gz       103812  
./406003-S57_L001_R1_001.fastq.gz       111495

The two files have the very same name: is one of the two actually the R2?

The ERROR: Counts in R1 and R2 files do not match message reported by the previous script can be explained as either:

a problem in the original files, which have different counts
a problem copying the original files into the new directories

Surely, if the only problem were in 406003-S57, that would not explain why the error raised by DADA2 occurred for multiple samples. In this light, the naming can still be a culprit (because of shared suffixes), and running the latest dadaist2 might help in understanding when the problem arises, using input_1 and input_2 as input directories (but removing Sample115_* and 406003xS57_* respectively from those directories).

padbc · 2021-11-12T17:21:50Z

Thank you. Please see the attached log of dadaist2 1.2.2a run on the input_1 files (minus the Sample115* files):

dadaist-debug-input1.log

padbc · 2021-11-22T22:45:21Z

Could you confirm whether the matchIDs=TRUE option of the filterAndTrim command has been incorporated into the latest version of dadaist2? Thanks very much.

telatin · 2021-11-25T16:02:12Z

Hello @padbc
sorry for the late reply, we were busy with a workshop last week.

matchIDs is not yet implemented and, judging from the docs (https://rdrr.io/github/benjjneb/dada2/man/fastqPairedFilter.html), it does not look like what we need in this context.

From the log you kindly provided it looks like the vectors are in the correct order, like in this short example:

58,/tmp/dadaist2_2a7XbW/for/Sample22_R1.fastq.gz,/tmp/dadaist2_2a7XbW/rev/Sample22_R2.fastq.gz 
59,/tmp/dadaist2_2a7XbW/for/Sample23_R1.fastq.gz,/tmp/dadaist2_2a7XbW/rev/Sample23_R2.fastq.gz 
60,/tmp/dadaist2_2a7XbW/for/Sample24_R1.fastq.gz,/tmp/dadaist2_2a7XbW/rev/Sample24_R2.fastq.gz 
61,/tmp/dadaist2_2a7XbW/for/Sample25_R1.fastq.gz,/tmp/dadaist2_2a7XbW/rev/Sample25_R2.fastq.gz 
62,/tmp/dadaist2_2a7XbW/for/Sample26_R1.fastq.gz,/tmp/dadaist2_2a7XbW/rev/Sample26_R2.fastq.gz 
63,/tmp/dadaist2_2a7XbW/for/Sample27_R1.fastq.gz,/tmp/dadaist2_2a7XbW/rev/Sample27_R2.fastq.gz

so I'm still struggling to find the cause of your issue. If you can create a minimal directory with a short selection of samples that reproduce the problem and wish to send it via mail or link, I'm happy to inspect further. Meanwhile I'll try to add some extra checks to get closer to the problem.

telatin · 2021-11-26T12:51:00Z

I pushed version 1.2.3 to the repo; it counts the reads in the input folder and in the filtered folder. Note that this extra information is only available with --debug and in the output log dadaist.log, not in the STDERR printed to the screen.

It would be interesting to see if the problem persists also after removing from the input:

Sample33
Sample134
Sample136
Sample137
Sample140
Sample141
Sample143

padbc · 2021-11-26T16:22:26Z

Great -- thank you. I will try to take a look at this later today.

padbc · 2021-12-08T19:42:06Z

Sorry for the delay here. The error persists after removing those samples; see attached log.
dadaist.log

telatin · 2021-12-08T19:54:51Z

Ah, thanks a million.
So, before running DADA2 the log now prints the number of sequences in the location they are temporarily copied to, and there is one sample with a discrepancy before it even gets into DADA2.

Sample116_R1.fastq.gz	103812
Sample116_R2.fastq.gz	111495

Now, this does not make sense, as you previously checked the input directory with seqfu count $DIR/*gz and I believe no errors were found, but you might try again now and see whether there is indeed a problem in the input files.

If not, I cannot figure out why some samples degrade while being copied to the temporary directory. Is it a peculiar filesystem, maybe?
If you wish, I can try on a different server; contact me via email at andrea.telatin 🐌 quadram.ac.uk

padbc · 2021-12-08T22:41:18Z

Thanks so much for the quick reply. So yes, deleting Sample116* appears to have "solved" the problem, i.e., dadaist2 produces the expected output. Like you, I find this perplexing.

telatin · 2021-12-09T09:02:41Z

I can only suggest checking the reads with seqfu count inputdir/*gz before starting, but I will implement some extra (optional) checks.
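
For readers without seqfu at hand, a pre-flight check along these lines can be approximated in plain shell. This is a rough sketch, not what dadaist2 or seqfu actually do: it assumes the *_R1_001.fastq.gz / *_R2_001.fastq.gz naming seen in this thread and unwrapped four-line FASTQ records, and check_pairs is a hypothetical function name.

```shell
#!/bin/sh
# Hypothetical pre-flight check: report every R1/R2 pair whose read counts differ.
# Returns 0 when all pairs agree, 1 when at least one pair mismatches or lacks its R2.
check_pairs() {
    dir=$1
    bad=0
    for r1 in "$dir"/*_R1_001.fastq.gz; do
        [ -e "$r1" ] || continue            # the pattern matched no files
        r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"
        n1=$(gzip -dc "$r1" | awk 'END { print int(NR / 4) }')
        if [ -e "$r2" ]; then
            n2=$(gzip -dc "$r2" | awk 'END { print int(NR / 4) }')
        else
            n2=missing
        fi
        if [ "$n1" != "$n2" ]; then
            printf 'MISMATCH\t%s\tR1=%s\tR2=%s\n' "$r1" "$n1" "$n2"
            bad=1
        fi
    done
    return "$bad"
}
```

Running check_pairs ./ before starting the pipeline would list a pair such as the one that turned out to hold 103812 and 111495 reads, so it can be set aside first.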

telatin added the bug label on Nov 5, 2021

telatin added the user case label and removed the bug label on Dec 10, 2021

telatin closed this as completed on May 25, 2022
