DECIPHER: Alignment larger than the maximum allowable size #54

Charmy0619 · 2025-01-13T22:25:07Z

Hi,

I am running PacBio data and am having trouble at the AlignReadsDECIPHER step.
Here is the error message:

Error executing process > 'AlignReadsDECIPHER (AlignReadsDECIPHER:R1)'

Caused by:
  Process `AlignReadsDECIPHER (AlignReadsDECIPHER:R1)` terminated with an error exit status (1)

Command executed [/home/qj5/ped_xtan25_chi_link/qj5/test16S_TADA/src/TADA/templates/AlignReadsDECIPHER.R]:

  #!/usr/bin/env Rscript
  .libPaths(c("/mmfs1/home/qj5/R/x86_64-pc-linux-gnu-library/4.4", .libPaths()))
  suppressPackageStartupMessages(library(dada2))
  suppressPackageStartupMessages(library(DECIPHER))
  
  seqs <- readDNAStringSet("asvs.md5.nochim.R1.fna")
  alignment <- AlignSeqs(seqs,
             anchor=NA,
             processors = 64)
  writeXStringSet(alignment, "aligned_seqs.R1.fasta")

Command exit status:
  1

Command output:
  Determining distance matrix based on shared 9-mers:
  ================================================================================
  
  Time difference of 2.04 secs
  
  Clustering into groups by similarity:
  ================================================================================
  
  Time difference of 0.37 secs
  
  Aligning Sequences:
  ================================================================================
  
  Time difference of 949.63 secs
  
  Iteration 1 of 2:
  
  Determining distance matrix based on alignment:
  ================================================================================
  
  Time difference of 1.3 secs
  
  Reclustering into groups by similarity:
  ================================================================================
  
  Time difference of 0.25 secs
  
  Realigning Sequences:
  ===============================================================================

Command error:
  Error in f(p.profile, s.profile) : 
    Alignment larger (6,205,073,628) than the maximum allowable size (2,147,483,647).
  Calls: AlignSeqs -> .align -> do.call -> do.call ->  -> f
  Execution halted

Work dir:
  /mmfs1/projects/ped_xtan25_chi/qj5/16S_pacbio_2024_dam/src/work/15/04113d39009483d31a03e09bfe1530

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

I checked the "asvs.md5.nochim.R1.fna" file:

> seqs <- readDNAStringSet("asvs.md5.nochim.R1.fna")
> # Number of sequences
> num_sequences <- length(seqs)
> print(num_sequences)
[1] 814

> # Distribution of sequence lengths
> sequence_lengths <- width(seqs)
> summary(sequence_lengths)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1000    1166    1374    1355    1486    1796
> # Calculate total base pairs (bp) in all sequences
> total_bp <- sum(width(seqs))
> print(total_bp)
[1] 1102620

I also tried increasing the memory to 240 GB, but it still fails.

Can anyone help?
Thanks.
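
A note on the memory increase mentioned above: the cap quoted in the error, 2,147,483,647, equals 2^31 - 1, the largest value of R's 32-bit integer type. Assuming the cap is an integer-size limit rather than a RAM limit (an inference from the number itself, not something stated in this thread), raising the memory allocation would not be expected to help. A quick base R check of the arithmetic:

```r
# The cap quoted in the error message
cap <- 2147483647
# The size DECIPHER reported for the attempted alignment
reported <- 6205073628

# R integers are 32-bit signed, so the largest one is 2^31 - 1
print(cap == .Machine$integer.max)
print(cap == 2^31 - 1)

# How far over the cap the attempted alignment was
ratio <- reported / cap
print(round(ratio, 2))
```

Both numbers are copied from the error message; what exactly DECIPHER counts as the alignment "size" is not stated in the thread, so the sketch only compares the two values.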

cjfields · 2025-01-14T04:32:35Z

@Charmy0619 this is an unusual one. Are you running this on UIUC resources or somewhere else?

Charmy0619 · 2025-01-16T19:25:21Z

> @Charmy0619 this is an unusual one. Are you running this on UIUC resources or somewhere else?

Thank you for your comment. It is unusual, and it has not happened to me before. This time I did not run it on the UIUC Biocluster HPC; I set up the pipeline on the UIC Lakeshore HPC.

I consulted Erik, the developer of DECIPHER. He mentioned, "This error occurs when the alignment dramatically expands in width during alignment. This typically indicates there are non-homologous sequences in the input, which should not be aligned." Following his suggestion, I experimented with the iteration and refinement settings. The pipeline runs to completion when iterations are disabled.
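
For reference, the workaround described above could look roughly like the sketch below. It assumes "without iteration" means setting the `iterations` argument of `AlignSeqs` to 0 (its default is 2); the thread does not give the exact values used. The helper name `align_without_iterations` is hypothetical and is not part of TADA or DECIPHER.

```r
# Hypothetical helper (not part of TADA): run the initial alignment only,
# skipping the recluster/realign iterations where the error was raised.
align_without_iterations <- function(seqs, refinements = 1, processors = 1) {
  DECIPHER::AlignSeqs(seqs,
                      anchor = NA,
                      iterations = 0,            # AlignSeqs default is 2
                      refinements = refinements, # AlignSeqs default is 1
                      processors = processors)
}

# Usage in the pipeline script (requires DECIPHER and Biostrings):
# seqs <- Biostrings::readDNAStringSet("asvs.md5.nochim.R1.fna")
# alignment <- align_without_iterations(seqs, processors = 64)
# Biostrings::writeXStringSet(alignment, "aligned_seqs.R1.fasta")
```

This only avoids the step that failed. If non-homologous sequences are present, they remain in the alignment and in any tree built from it.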

I am wondering whether skipping the iterations will affect the results. If I understand correctly, it should only affect the tree. As you know, I previously ran rumen fluid data without any problem. However, the current dataset is from mice and may be contaminated with mitochondrial sequences.

cjfields · 2025-01-22T04:36:05Z

@Charmy0619 one possibility is to skip the alignment + phylogenetic tree step, particularly if you are concerned there are contaminants present. In the main branch this can be done by setting runTree to either false or '' (empty string). In the DSL2 work on dev this will be much simpler, but that code isn't ready for use at this time.

That said, I normally haven't found mitochondrial or chloroplast 16S rRNA to be an issue, but other contaminants (off-target sequences, for example) can certainly be a problem.

cjfields · 2025-01-27T01:06:36Z

@Charmy0619 as a quick follow-up: we don't currently pre-screen sequences prior to DECIPHER, though this step has been proposed as a new feature (see #60 for tracking this). It will take a little time to implement, but you could essentially emulate it by skipping alignment + tree while still allowing taxonomic assignment. Screen out any ASVs that have no assignment, then run either DECIPHER or another MSA tool (e.g., muscle5), then use fasttree to generate an ML-based tree. Happy to walk you through these steps, just email me.
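
The screening step suggested above might be sketched as follows. This is an illustration under assumptions, not the pipeline's implementation: it assumes the taxonomy table is a matrix with one row per ASV, row names matching the sequence names, and `NA` where no assignment was made (the layout TADA actually writes is not shown in this thread). The helper `keep_assigned` and the toy data are made up.

```r
# Hypothetical helper: keep only ASVs that have an assignment at `rank`.
# `seqs` can be anything indexable with names(), e.g. a named character
# vector (as here) or a Biostrings DNAStringSet.
keep_assigned <- function(seqs, taxa, rank = "Phylum") {
  assigned <- rownames(taxa)[!is.na(taxa[, rank])]
  seqs[names(seqs) %in% assigned]
}

# Made-up toy data for illustration only
seqs <- c(asv1 = "ACGTACGT", asv2 = "TTTTGGGG", asv3 = "ACGTACGA")
taxa <- matrix(c("Bacteria", "Firmicutes",
                 NA,         NA,
                 "Bacteria", "Bacteroidota"),
               ncol = 2, byrow = TRUE,
               dimnames = list(c("asv1", "asv2", "asv3"),
                               c("Kingdom", "Phylum")))

# asv2 has no assignment, so it is dropped before alignment
screened <- keep_assigned(seqs, taxa)
print(names(screened))
```

The screened set would then go to the aligner and tree builder. Which rank to require is a judgment call; a stricter rank removes more sequences.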

Charmy0619 changed the title ~~Alignment larger than the maximum allowable size~~ DECIPHER: Alignment larger than the maximum allowable size Jan 13, 2025

cjfields added the DSL2 Prioritize for DSL2 implementation label Feb 4, 2025
