RAPT-release edited this page Oct 3, 2024 · 16 revisions

As of December 2024, NCBI's pilot tool, the Read assembly and Annotation Pipeline Tool (RAPT), will no longer be available. We encourage you to check out NCBI’s suite of assembly and annotation tools, including the genome assembler SKESA, the taxonomic assignment tool ANI, and the Prokaryotic Genome Annotation Pipeline (PGAP).


Can I use reads that are not produced by an Illumina sequencing machine?

At this time, RAPT only supports reads produced on the Illumina sequencing platform. Reads can be provided to RAPT as fasta or fastq files, or as SRA run accessions (starting with the SRR, DRR or ERR prefix).

Can I assemble and annotate a metagenomic sample?

No. RAPT is only designed to work on data sequenced from bacterial or archaeal isolates.

I do not wish to run SKESA. Can I use a different read assembler?

At this time, RAPT only supports SKESA. If you wish to annotate an already assembled genome, please use PGAP.

What environments are supported with RAPT?

At the moment, there are two variations of RAPT: Google Cloud Platform (GCP) RAPT and Standalone RAPT. Please see the respective documentation pages for prerequisites and instructions: GCP RAPT, Standalone RAPT.

What information is reported to NCBI?

For each run of the pipeline, multiple reports will be generated: one at the beginning and one at the end of each phase of RAPT. These reports help us measure our impact on the community, which in turn helps us obtain funding, so please report your usage. For more information, see the NCBI privacy policy. What we collect will look like this:

1 34.86.175.158 8bd35abb-8a04-4984-9d88-59c349824819 2020-07-10T12:24:44 rapt_start 
1 34.86.175.158 8bd35abb-8a04-4984-9d88-59c349824819 2020-07-10T12:25:55 skesa_success
1 34.86.175.158 8bd35abb-8a04-4984-9d88-59c349824819 2020-07-10T12:54:22 ani_start 
1 34.86.175.158 8bd35abb-8a04-4984-9d88-59c349824819 2020-07-10T13:24:88 ani_success
1 34.86.175.158 8bd35abb-8a04-4984-9d88-59c349824819 2020-07-10T13:25:44 pgap_start
1 34.86.175.158 8bd35abb-8a04-4984-9d88-59c349824819 2020-07-10T19:46:11 pgap_success
1 34.86.175.158 8bd35abb-8a04-4984-9d88-59c349824819 2020-07-10T19:46:11 rapt_exit
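
To make the layout of these report lines concrete, here is a minimal Python sketch that splits one line into its five whitespace-separated fields. The field names (`leading_field`, `ip_address`, `run_id`, `timestamp`, `event`) are labels assumed from the sample above, not names documented by NCBI, and the timestamp is deliberately kept as text rather than parsed.

```python
from typing import List, NamedTuple


class UsageEvent(NamedTuple):
    """One usage-report line; field names are assumed labels, not official ones."""
    leading_field: str  # first column; its meaning is not stated in this FAQ
    ip_address: str
    run_id: str         # identical on every line of a single run in the sample
    timestamp: str      # kept as text, not validated
    event: str          # e.g. rapt_start, pgap_success, rapt_exit


def parse_usage_line(line: str) -> UsageEvent:
    """Split a report line on whitespace and check that it has five fields."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    return UsageEvent(*fields)


sample = """\
1 34.86.175.158 8bd35abb-8a04-4984-9d88-59c349824819 2020-07-10T12:24:44 rapt_start
1 34.86.175.158 8bd35abb-8a04-4984-9d88-59c349824819 2020-07-10T19:46:11 rapt_exit
"""

events: List[UsageEvent] = [
    parse_usage_line(line) for line in sample.splitlines() if line.strip()
]
for usage_event in events:
    print(usage_event.event, usage_event.timestamp)
```

As the sample suggests, each line carries an address, a per-run identifier, a timestamp and an event name, and nothing about the reads or the assembly itself.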

How do I turn off the NCBI reporting feature?

We recommend always reporting information back to NCBI, because understanding usage and errors helps us build a better product. If you still wish to disable reporting, add the flag --no-usage-reporting to the run_rapt_gcp.sh or the run_rapt.py job submission command.

The taxonomy check step indicates that the organism for my input data is misassigned. What does it mean, and what should I do?

The taxonomy check done within RAPT with the Average Nucleotide Identity tool compares the set of contigs assembled by RAPT to type strain assemblies available in GenBank. A misassignment indicates that the short read sequences passed to RAPT on input come from a different organism than the one provided. If you agree with the ANI assessment, and wish to use the ANI-chosen scientific name in the downstream steps, re-run RAPT with the flag --auto-correct-tax. This will guarantee the best annotation quality possible.

I am not confident in the taxonomic classification of the organism I sequenced, so the scientific name I can provide is only a guess. Is it acceptable?

Yes! The taxonomy check done within RAPT with ANI can assign a scientific name to your assembly based on its best matching assembly in GenBank that is of well-defined origin. If you run RAPT with the flag --auto-correct-tax, the scientific name determined by ANI will override the scientific name you provide on input, resulting in a more accurate annotation. The scientific name in the final results will be the ANI-chosen name.

Can I make sure RAPT stops if the taxonomy check indicates my sample may be misassigned or contaminated?

Yes. Add the flag --stop-on-errors to the run_rapt_gcp.sh or the run_rapt.py job submission command and RAPT will stop if the taxonomy check indicates the species assigned to the reads is incorrect or if the read set is contaminated.
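
The optional flags mentioned in the answers above (--no-usage-reporting, --auto-correct-tax and --stop-on-errors) are all appended to the job submission command. The following Python sketch shows one way to assemble such a command as an argument list. The helper name `add_rapt_flags` and the `<your-input-arguments>` placeholder are hypothetical; the input arguments themselves are not covered in this FAQ and should be taken from the GCP RAPT or Standalone RAPT documentation.

```python
from typing import List


def add_rapt_flags(
    base_command: List[str],
    *,
    usage_reporting: bool = True,
    auto_correct_tax: bool = False,
    stop_on_errors: bool = False,
) -> List[str]:
    """Return a copy of base_command with the requested FAQ flags appended."""
    command = list(base_command)  # do not modify the caller's list
    if not usage_reporting:
        command.append("--no-usage-reporting")
    if auto_correct_tax:
        command.append("--auto-correct-tax")
    if stop_on_errors:
        command.append("--stop-on-errors")
    return command


# Placeholder command: replace the second element with your real input arguments.
base = ["run_rapt.py", "<your-input-arguments>"]
print(" ".join(add_rapt_flags(base, stop_on_errors=True)))
```

Building the command as a list rather than a single string avoids quoting problems if it is later passed to `subprocess.run`.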

What is the cost of running RAPT?

The cost of running RAPT increases roughly linearly with the size of the genome assembled from the read set provided on input. Below are examples of inputs and their runtimes on GCP n1-highmem-8 (8-CPU) machines. For reference, the dollar cost of renting such machines can be derived from the current Google virtual machine cost structure.

SRA run      Species                  Size of genome produced (Mb)  Runtime (min)
SRR11101319  Campylobacter jejuni     1.9                           58
SRR11147196  Listeria monocytogenes   3                             113
SRR4457405   Clostridium perfringens  3.7                           97
ERR4436589   Acinetobacter baumannii  4                             104
SRR12431019  Salmonella enterica      4.8                           109
ERR2116816   Enterobacter cloacae     4.9                           124
ERR4338267   Salmonella enterica      5.1                           127
SRR6048050   Pseudomonas aeruginosa   6.5                           171
SRR11046561  Klebsiella oxytoca       6.6                           173
ERR1974692   Pseudomonas aeruginosa   7.3                           193
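
As a rough check of the "roughly linear" relationship, the ten runs listed above can be fitted with an ordinary least-squares line. This is only an illustrative sketch based on the table values; the helper names `fit_line` and `predict_minutes` are made up for this example, and a fit on ten runs should not be read as a guarantee for other inputs or machine types.

```python
from typing import List, Tuple

# (SRA run, genome size in Mb, runtime in minutes), copied from the table above.
runs = [
    ("SRR11101319", 1.9, 58),
    ("SRR11147196", 3.0, 113),
    ("SRR4457405", 3.7, 97),
    ("ERR4436589", 4.0, 104),
    ("SRR12431019", 4.8, 109),
    ("ERR2116816", 4.9, 124),
    ("ERR4338267", 5.1, 127),
    ("SRR6048050", 6.5, 171),
    ("SRR11046561", 6.6, 173),
    ("ERR1974692", 7.3, 193),
]


def fit_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line([(size, minutes) for _, size, minutes in runs])


def predict_minutes(genome_size_mb: float) -> float:
    """Estimated n1-highmem-8 runtime for a genome of the given size."""
    return slope * genome_size_mb + intercept


print(f"runtime ~ {slope:.1f} min/Mb * size + {intercept:.1f} min")
print(f"estimate for a 5.5 Mb genome: {predict_minutes(5.5):.0f} min")
```

On these data the fitted slope falls between 20 and 25 minutes per Mb. Multiplying an estimated runtime by the current hourly price of the machine type gives an approximate dollar cost.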

I have further questions about PGAP. Where can I find more information?

See the PGAP FAQs

Who do I contact for help or feedback?

Please open an issue, after checking that your question was not addressed in previously opened issues.

Stand-alone RAPT Specific FAQs

Why does my run occasionally not finish, producing no logs or messages in the terminal, even though the pipeline still seems to be running?

You are most likely running the pipeline on a remote machine over ssh, and the connection has been interrupted. Use the nohup utility, or a terminal multiplexer, such as tmux or screen when working on a remote machine, to allow run_rapt.py to continue in case the ssh connection is interrupted.
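
If you prefer to launch the job from a small Python wrapper, a similar effect can be sketched with the standard library. This is an alternative to nohup, tmux or screen, not how RAPT itself works: `start_new_session=True` asks `subprocess.Popen` to call setsid() in the child (POSIX only), so the child is no longer attached to the terminal of the ssh session. The helper name `launch_detached` and the log path are assumptions for this example, and the demo launches a harmless Python one-liner in place of a real run_rapt.py command.

```python
import os
import subprocess
import sys
import tempfile
from typing import List


def launch_detached(command: List[str], log_path: str) -> subprocess.Popen:
    """Start command in its own session, sending stdout and stderr to log_path."""
    with open(log_path, "ab") as log_file:
        process = subprocess.Popen(
            command,
            stdin=subprocess.DEVNULL,   # no input is read from the terminal
            stdout=log_file,            # output goes to the log file instead
            stderr=subprocess.STDOUT,
            start_new_session=True,     # setsid() in the child, POSIX only
        )
    # The child holds its own copy of the log file descriptor.
    return process


# Demo with a stand-in command; substitute your own job submission command.
log_path = os.path.join(tempfile.mkdtemp(), "rapt_run.log")
process = launch_detached([sys.executable, "-c", "print('still running')"], log_path)
process.wait()  # only for the demo; a real launcher could simply exit here
with open(log_path) as handle:
    print(handle.read().strip())
```

Because the output is redirected to a file, progress can be followed after reconnecting by reading the log.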

GCP RAPT Specific FAQs

The run is marked "Failed", and the results tar.gz file in the storage bucket is empty

One possible reason is failure to connect to SRA. Such failures are reported in the log file run.log, with the line SRA connection check failed with code 1, abort.., and are typically transient. Please retry.

What are the default options for --machine-type TYPE, --boot-disk-size NUM, and --timeout SECONDS, and why would I change them?

--machine-type TYPE
  Default is "n1-highmem-8" (refer to the Google Cloud documentation), which is suitable for most jobs. The larger the machine, the faster the job will be.
  There is a point of diminishing returns which will vary per user and their cost/time preferences.

--boot-disk-size NUM
  Optional. Set the size (in GB) of the boot disk for the virtual machine. Default size is 128. The larger the boot disk, the faster the job will be.
  There is a point of diminishing returns which will vary per user and their cost/time preferences.

--timeout SECONDS
  Optional. Set the timeout (seconds) for the job. Default is 86400s (24 hours). If you have a job that does not complete in this time, 
  you can increase the timeout and/or increase your machine type.

My run is marked 'Failed', and the message in the log is "Execution failed: selecting resources: selecting region and zone: no available zones: us-central1: CPUS quota too low"

This message is caused by insufficient compute quota available to your GCP project in the us-central1 region for RAPT to execute with the default machine type, n1-highmem-8. This commonly occurs when using a "free GCP" account. The first step is to view your quotas. On the line “Compute Engine API – CPUs”, select “All quotas” and find the region(s) where non-zero quota is available. Use the --regions parameter to specify a region where you have an allotted quota. You must also select a machine size that is equal to or lower than your quota limit, using the --machine-type parameter.
Please note: if you use an instance rather than command shell, the instance is counted as a machine.
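
The quota check described above amounts to simple arithmetic: the vCPUs of the chosen machine type, plus any vCPUs already in use in that region (for example by the instance you launch from), must not exceed the regional CPU quota. The Python sketch below picks an n1-highmem type under that constraint. The helper name `choose_machine_type` is hypothetical, and the cap of 8 vCPUs reflects the default machine type given in the --machine-type answer; machines smaller than that are not recommended elsewhere in this FAQ, so a smaller result is best read as a signal to request more quota or try another region.

```python
from typing import Optional

# vCPU counts of the n1-highmem machine types (n1-highmem-2 ... n1-highmem-96).
N1_HIGHMEM_VCPUS = (2, 4, 8, 16, 32, 64, 96)


def choose_machine_type(
    cpu_quota: int, cpus_in_use: int = 0, preferred_vcpus: int = 8
) -> Optional[str]:
    """Largest n1-highmem type that fits the remaining quota, capped at preferred_vcpus."""
    available = cpu_quota - cpus_in_use
    limit = min(available, preferred_vcpus)
    fitting = [vcpus for vcpus in N1_HIGHMEM_VCPUS if vcpus <= limit]
    if not fitting:
        return None  # no n1-highmem type fits; use another region or raise the quota
    return f"n1-highmem-{max(fitting)}"


print(choose_machine_type(cpu_quota=24))                # n1-highmem-8
print(choose_machine_type(cpu_quota=8, cpus_in_use=2))  # n1-highmem-4
print(choose_machine_type(cpu_quota=1))                 # None
```

The returned name is the value you would pass to --machine-type, together with --regions for the region whose quota you checked.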

Can RAPT run faster?

Yes, by using larger machines. By default, RAPT runs on n1-highmem-8 machines. You can run RAPT on a larger machine by adjusting the --machine-type parameter. In our hands, the runtime decreases by 30% on average when switching from n1-highmem-8 machines to n1-highmem-16, but the cost increases by about 40%, based on the current Google virtual machine cost structure.
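
To see what this trade-off means for a given run, the two averages quoted above can be applied to a baseline runtime and cost. The sketch below is plain arithmetic under the assumption that both percentages apply uniformly to the run; the function name `estimate_on_highmem16` is invented for this example, and the baseline cost is expressed in relative units because actual prices change over time.

```python
from typing import Tuple

RUNTIME_FACTOR = 0.70  # runtime decreases by 30% on average
COST_FACTOR = 1.40     # cost increases by about 40%


def estimate_on_highmem16(
    minutes_on_highmem8: float, cost_on_highmem8: float
) -> Tuple[float, float]:
    """Scale an n1-highmem-8 runtime and cost by the quoted average factors."""
    return minutes_on_highmem8 * RUNTIME_FACTOR, cost_on_highmem8 * COST_FACTOR


# Hourly price ratio implied by the two factors (an inference, not a quoted price).
implied_hourly_price_ratio = COST_FACTOR / RUNTIME_FACTOR

# Example: a run that took 193 minutes on n1-highmem-8, with its cost taken as 1.0.
minutes, relative_cost = estimate_on_highmem16(193, 1.0)
print(f"estimated runtime: {minutes:.0f} min, relative cost: {relative_cost:.2f}")
print(f"implied hourly price ratio: {implied_hourly_price_ratio:.1f}")
```

The implied hourly price ratio of about 2 is consistent with a machine that has twice as many vCPUs costing roughly twice as much per hour, so the extra spend buys a shorter wait rather than a cheaper run.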

Can I save money by running RAPT on smaller machines?

By default, RAPT runs on n1-highmem-8 machines. We do not recommend running RAPT on smaller machines.

How do I run on a GCP instance or virtual machine?

Follow the set-up instructions for running in a Cloud Shell, then:

  • On the GCP screen from the last step, click "Compute Engine" or navigate to the "Compute Engine" section by clicking on the navigation menu with the "hamburger icon" (three horizontal lines) on the top left corner.

  • Click on the blue "CREATE INSTANCE" button on the top bar.

  • Create an instance with the default parameters. Give your instance a name for tracking and enable access to all Cloud APIs. Note the estimated cost shown on the page for your records.

  • Click the blue "Create" button. This will create and start the VM.

  • SSH into your instance.
