Merge pull request #2 from bentsherman/patch-1
Update docs.md
pditommaso authored Aug 3, 2023
2 parents ada37fe + 8e4ac80 commit 8c11578

## Pre-requisites

Make sure to configure the XPACK license and the plugin as described
in the [README](README.md#configuration).
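
As a minimal sketch, assuming the plugin is enabled through Nextflow's standard `plugins` block (the XPACK license settings themselves are covered in the README and omitted here):

```groovy
// Enable the plugin via Nextflow's standard plugins mechanism.
// The XPACK license configuration is described in the README and not shown here.
plugins {
    id 'xpack-amzn'
}
```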

## AWS Batch Pro executor

The `xpack-amzn` plugin implements an advanced version of the AWS Batch executor
for Nextflow, which supports the use of a shared file system instead of an S3 bucket
as the pipeline work directory.
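
As a hedged sketch, assuming the plugin is driven by Nextflow's standard AWS Batch executor settings (the queue name and region below are placeholders for your own environment):

```groovy
// Standard Nextflow AWS Batch executor settings; the queue and
// region values are placeholders for your own environment.
process.executor = 'awsbatch'
process.queue    = 'my-batch-queue'
aws.region       = 'eu-west-1'
```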

## Using Amazon EFS

[Amazon EFS](https://aws.amazon.com/efs/) is a shared file-system based on the
NFS protocol provided by AWS.

To learn more about EFS and how to create an EFS instance, check out the [AWS documentation](https://docs.aws.amazon.com/efs/latest/ug/creating-using-create-fs.html).

Once you have created your desired EFS instances, you can make them accessible to your
pipeline with the `efsVolumes` option in your Nextflow configuration:

```groovy
aws.batch.efsVolumes.'efs-1234567890'.mountPath = '/mnt/efs'
```

In the above snippet, replace `efs-1234567890` with the ID of your EFS instance and
the path `/mnt/efs` with one of your choice. Repeat this configuration for each
EFS instance that you want to use in your pipeline.
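
For example, a configuration that makes two EFS instances available (the IDs and paths below are hypothetical) might look like:

```groovy
// Two hypothetical EFS instances mounted at different paths.
aws.batch.efsVolumes.'efs-1111111111'.mountPath = '/mnt/efs-data'
aws.batch.efsVolumes.'efs-2222222222'.mountPath = '/mnt/efs-ref'
```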

You can then use an EFS instance as the work directory based on its mount path:

```groovy
workDir = '/mnt/efs'
```

Available options:

`aws.batch.efsVolumes.'<ID>'.mountPath`
: The host path to which the file system should be made available (default: none)

`aws.batch.efsVolumes.'<ID>'.rootPath`
: The file system directory that should be made available through the mount point (default: `/`)

`aws.batch.efsVolumes.'<ID>'.readOnly`
: When `true` mounts the file system as read-only (default: `false`)

Note: Replace the `<ID>` placeholder in the above table with your EFS file system identifier.
*Note: Replace `<ID>` with your EFS instance ID.*
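
Putting these options together, a read-only mount that exposes only a sub-directory of the file system might be declared as follows (the ID and paths are placeholders):

```groovy
// Hypothetical example: expose the '/datasets' directory of the file
// system at '/mnt/ref' inside each task container, as read-only.
aws.batch.efsVolumes.'efs-1234567890'.mountPath = '/mnt/ref'
aws.batch.efsVolumes.'efs-1234567890'.rootPath  = '/datasets'
aws.batch.efsVolumes.'efs-1234567890'.readOnly  = true
```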

## Using a POSIX-based shared file-system

With the `xpack-amzn` plugin, you can use any POSIX-based shared file-system with
AWS Batch, such as [Amazon FSx](https://aws.amazon.com/fsx/), [Qumulo](https://qumulo.com/), [Weka](https://www.weka.io/), etc.

The provisioning and management of such file systems are your responsibility. Typically,
these file systems must be mounted in the launch template used to execute tasks.

As with EFS, you can then use the shared file system as the work directory based on its mount path:

```groovy
workDir = '/mnt/fsx'
```

You can also mount additional shared file systems into the task containers with the `aws.batch.volumes`
config option. See the [Nextflow documentation](https://nextflow.io/docs/latest/aws.html#volume-mounts)
for more details.
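
Based on the linked Nextflow documentation, such mounts are declared as a list of paths, optionally in `host:container[:ro]` form (the paths below are placeholders):

```groovy
// Mount host paths into every task container; the ':ro' suffix
// makes the second mount read-only.
aws.batch.volumes = ['/mnt/fsx/shared', '/mnt/fsx/ref:/mnt/fsx/ref:ro']
```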
