dorado can make use of a MinKNOW-compatible sample sheet containing data used to identify a particular classification of read. To apply a sample sheet, provide the path to the appropriate CSV file using the --sample-sheet argument:
$ dorado basecaller dna_r10.4.1_e8.2_400bps_hac@v4.2.0 reads/ \
--kit-name SQK-16S114-24 \
--sample-sheet <path_to_sample_sheet_csv> \
> calls.bam
A sample sheet can also be applied to the demux command in the same way:
$ dorado demux calls.bam \
--output-dir classified_reads \
--kit-name SQK-16S114-24 \
--sample-sheet <path_to_sample_sheet_csv>
Note that dorado currently uses the sample sheet only for barcode filtering and aliasing, so a --kit-name argument is required.
In the case of demux, the sample sheet must contain a 1-to-1 mapping of barcode identifiers to flow_cell_id/position_id, i.e. all entries in the barcode column must be unique.
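The uniqueness requirement can be checked before running demux. The following is a minimal sketch, not part of dorado: duplicate_barcodes is a hypothetical helper name, and it assumes a simple CSV with a header row and no quoted fields containing commas.

```shell
# Hypothetical helper (not a dorado command): print each barcode value
# that appears more than once in the sample sheet given as $1.
# Assumes a plain CSV with a header row and no quoted commas.
duplicate_barcodes() {
    awk -F',' '
        { sub(/\r$/, "") }                      # tolerate Windows line endings
        NR == 1 {                               # locate the barcode column by name
            for (i = 1; i <= NF; i++) if ($i == "barcode") col = i
            next
        }
        col && $col != "" && seen[$col]++ == 1 { print $col }   # report on second sighting
    ' "$1"
}
```

Running duplicate_barcodes on a sheet should print nothing if every barcode entry is unique.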
A sample sheet may only contain the column names below:
| Group | Column name | Requirement |
| --- | --- | --- |
| Standard | experiment_id | Required* |
| | kit | Required |
| | flow_cell_id | Optional if position_id is set |
| | position_id | Optional if flow_cell_id is set |
| | protocol_run_id | Optional |
| | sample_id | Optional* |
| | flow_cell_product_code | Optional |
| Barcoding | alias | Optional* |
| | type | Optional |
| | barcode | Optional |
* These fields must be at most 40 characters long and may contain only alphanumeric characters (A-Z, a-z, 0-9), _ or -.
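The character constraint above can be expressed as a regular expression. This is an illustrative sketch only: is_valid_field is a hypothetical helper, and rejecting empty values is an assumption made here rather than something the text states.

```shell
# Hypothetical helper (not a dorado command): succeed only if $1 is
# 1 to 40 characters long and uses only A-Z, a-z, 0-9, _ or -.
# Assumption: empty values are treated as invalid.
is_valid_field() {
    printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9_-]{1,40}$'
}

# Example usage with a made-up value:
if is_valid_field "patient_id_5"; then
    echo "ok"
fi
```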
At a minimum a sample sheet must contain kit, experiment_id and one of position_id or flow_cell_id. All rows in a sample sheet must contain the same experiment_id.
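As a rough illustration of these minimum requirements, the snippet below writes a small sample sheet and counts the distinct experiment_id values in it. Every value in the file (exp_001, FC_PLACEHOLDER_1, and so on) is a placeholder invented for this example, and the column order is an arbitrary choice.

```shell
# Write a minimal illustrative sample sheet; all values are placeholders.
cat > sample_sheet.csv <<'EOF'
experiment_id,kit,flow_cell_id
exp_001,SQK-16S114-24,FC_PLACEHOLDER_1
exp_001,SQK-16S114-24,FC_PLACEHOLDER_2
EOF

# Count distinct experiment_id values in the data rows
# (experiment_id is the first column in this particular file).
distinct_experiments=$(tail -n +2 sample_sheet.csv | cut -d',' -f1 | sort -u | wc -l | tr -d ' ')
echo "distinct experiment_id values: $distinct_experiments"
```

A count other than 1 would indicate rows with differing experiment_id values, which the rule above does not allow.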
For a full description of the format of the sample sheet, see the MinKNOW Sample Sheet documentation.
Note that dorado does not currently support dual barcodes.
If a sample sheet is present and barcoding is requested, dorado will only attempt to find matches to the barcode identifiers listed in the barcode column (if present).
If a sample sheet contains an alias column, this will be used to replace the barcode identifier for reads matching the flow_cell_id/position_id and experiment_id. This will be reflected in the read group ID @RG ID in the file header, and in the BC and RG tags of the classified reads. Values in the alias column must not be valid barcode identifiers (e.g. barcode## or unclassified).
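A sheet can be screened for aliases that collide with barcode identifiers. The sketch below is not dorado's own validation: reserved_aliases is a hypothetical helper, and the pattern (the word barcode followed by digits, or unclassified) is an assumption based on the examples given above. It also assumes a simple CSV with a header row and no quoted commas.

```shell
# Hypothetical helper (not a dorado command): print every alias in the
# sample sheet given as $1 that looks like a barcode identifier.
# Assumption: such identifiers match "barcode<digits>" or "unclassified".
reserved_aliases() {
    awk -F',' '
        { sub(/\r$/, "") }                      # tolerate Windows line endings
        NR == 1 {                               # locate the alias column by name
            for (i = 1; i <= NF; i++) if ($i == "alias") col = i
            next
        }
        col && $col ~ /^(barcode[0-9]+|unclassified)$/ { print $col }
    ' "$1"
}
```

An empty result suggests that no alias in the sheet clashes with this assumed pattern.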
Note that if both flow_cell_id and position_id are present, both must match the read data for an alias to be applied.