@tsibley commented: This should not be necessary, as the nextstrain/ncov-ingest image is based on the nextstrain/base image. I believe the only reason the GCS Python bindings aren't available in the latest nextstrain/ncov-ingest image is that it predates (~3 Feb) the addition of the bindings to the nextstrain/base image (~11 Feb). I triggered an image update, which is running now and should bring in the GCS bindings.
Context
To support running ingest on Terra, we need to support downloading existing Nextclade alignments and metadata from remote storage other than AWS's S3.
Description
We need to support the definition of a "source" bucket in the workflow configuration YAML that accepts at least S3 or GS URIs. This means renaming the configuration variable s3_src to something more generic and modifying all of the workflow logic that refers specifically to downloading from S3 (e.g., the "download_from_s3" script).
As part of this work, we will also need to modify the Pipfile used to build the ncov-ingest Docker image by adding the Python bindings for Google Cloud Storage. See the Dockerfile for the docker-base image. However, see @tsibley's comment, which suggests this change should not be necessary.
Examples
The modified ingest should continue to work with our production S3 buckets, but it should also work from GS buckets accessed through Terra.
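One way to keep production S3 configs working while also accepting GS buckets is to read a generic source key, fall back to the legacy s3_src key, and validate the URI scheme in one place. The sketch below assumes the workflow config has already been loaded from YAML into a plain dict; the generic key name "src", the bucket names, and the file name "nextclade.tsv.gz" are hypothetical placeholders, not names taken from ncov-ingest.

```python
from typing import Tuple
from urllib.parse import urlsplit

# Schemes the generic source setting is assumed to accept.
SUPPORTED_SCHEMES = ("s3", "gs")


def split_source(uri: str) -> Tuple[str, str, str]:
    """Split an s3:// or gs:// URI into (scheme, bucket, prefix)."""
    parts = urlsplit(uri)
    if parts.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(
            f"unsupported source scheme {parts.scheme!r} in {uri!r}; "
            "expected s3:// or gs://"
        )
    if not parts.netloc:
        raise ValueError(f"missing bucket name in {uri!r}")
    return parts.scheme, parts.netloc, parts.path.strip("/")


def resolve_source(config: dict) -> str:
    """Return the validated source URI from a parsed workflow config.

    "src" is a hypothetical generic key; the legacy "s3_src" key is
    still honoured so existing production configs keep working.
    """
    for key in ("src", "s3_src"):
        if key in config:
            uri = config[key]
            split_source(uri)  # raises ValueError for unsupported URIs
            return uri
    raise KeyError("config defines neither 'src' nor 's3_src'")


def object_uri(source: str, filename: str) -> str:
    """Build the URI of a file (e.g. a Nextclade output) under the source."""
    scheme, bucket, prefix = split_source(source)
    key = f"{prefix}/{filename}" if prefix else filename
    return f"{scheme}://{bucket}/{key}"


if __name__ == "__main__":
    # Production-style (S3) and Terra-style (GS) configs share one code path.
    prod = {"s3_src": "s3://example-bucket/ncov"}
    terra = {"src": "gs://example-terra-bucket/ncov"}
    print(object_uri(resolve_source(prod), "nextclade.tsv.gz"))
    print(object_uri(resolve_source(terra), "nextclade.tsv.gz"))
```

Keeping the legacy key as a fallback means existing production configs need no changes during the transition, while new Terra configs can use the generic key.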
See code in ncov for handling remote files.
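As a rough illustration of a scheme-agnostic replacement for the S3-specific download step, the sketch below dispatches on the URI scheme. It assumes boto3 for S3 and the google-cloud-storage package (the GCS Python bindings discussed in this issue) for GS, with credentials taken from the environment, for example default application credentials on Terra. The function name download_from_source is hypothetical; this is a standalone sketch, not the code from ncov or an existing ncov-ingest script.

```python
from urllib.parse import urlsplit


def download_from_source(uri: str, dest: str) -> None:
    """Download one object from an s3:// or gs:// URI to a local path.

    Cloud client libraries are imported lazily, so an environment that only
    talks to S3 does not need the GCS bindings installed, and vice versa.
    """
    parts = urlsplit(uri)
    bucket, key = parts.netloc, parts.path.lstrip("/")
    if not bucket or not key:
        raise ValueError(f"expected <scheme>://<bucket>/<key>, got {uri!r}")

    if parts.scheme == "s3":
        import boto3  # AWS SDK for Python; credentials come from the environment

        boto3.client("s3").download_file(bucket, key, dest)
    elif parts.scheme == "gs":
        from google.cloud import storage  # Google Cloud Storage bindings

        storage.Client().bucket(bucket).blob(key).download_to_filename(dest)
    else:
        raise ValueError(f"unsupported scheme {parts.scheme!r} in {uri!r}")
```

A call such as download_from_source("gs://example-bucket/ncov/nextclade.tsv.gz", "data/nextclade.tsv.gz") (hypothetical paths) would then work the same way for an s3:// URI, so workflow rules only need the configured source URI rather than S3-specific logic.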