[MRG] provide minimum sourmash version information about sketch fromfile #2009

Merged (2 commits) on Apr 28, 2022
1 change: 1 addition & 0 deletions doc/command-line.md
@@ -137,6 +137,7 @@ The `sketch translate` command reads in **DNA sequences**, translates them in al
The `sketch fromfile` command takes in a CSV file containing the
locations of genomes and proteomes, and outputs all of the requested
sketches. It is primarily intended for large-scale database construction.
(`fromfile` is a new command as of sourmash v4.4.0.)

All of the `sourmash sketch` commands take FASTA or FASTQ sequences as
input; input data can be uncompressed, compressed with gzip, or
9 changes: 9 additions & 0 deletions doc/index.md
@@ -182,6 +182,15 @@ support
developer
```


```{toctree}
:hidden:
README.md
legacy-databases.md
plotting-compare.ipynb
sourmash-sketch.md
```

# Indices and tables

* {ref}`genindex`
5 changes: 4 additions & 1 deletion doc/sourmash-sketch.md
@@ -24,6 +24,7 @@ The `sketch translate` command reads in **DNA sequences**, translates them in al
The `sketch fromfile` command takes in a CSV file containing the
locations of genomes and proteomes, and outputs all of the requested
sketches. It is primarily intended for large-scale database construction.
(`fromfile` is a new command as of sourmash v4.4.0.)

All `sourmash sketch` commands take FASTA or FASTQ sequences as input;
input data can be uncompressed, compressed with gzip, or compressed
@@ -88,6 +89,8 @@ use `sketch protein` to build signatures.

### Bulk sketch construction from many files

(This was added as of sourmash v4.4.0.)

The `sourmash sketch fromfile` command is intended for use when
building many signatures as part of a larger workflow. It supports a
variety of options to build new signatures, parallelize
@@ -108,7 +111,7 @@ can be empty for a given row; likewise, if no DNA sketches are requested,
`genome_filename` can be empty for a given row.

Some of the key command-line options supported by `fromfile` are:
-* `-o/--output-signatures` will save generated signatures to any of the [standard supported output formats](command-line.md#saving-signatures-more-generally).
+* `-o/--output-signatures` will save generated signatures to any of the [standard supported output formats](command-line.md#choosing-signature-output-formats).
* `-o/--output-csv-info` will save a CSV file of input filenames and parameter strings for use with the `sourmash sketch` command line; this can be used to construct signatures in parallel.
* `--already-done` will take a list of existing signatures/databases to check against; signatures with matching names and parameter strings will not be rebuilt.
* `--output-manifest-matching` will output a manifest of already-existing signatures, which can then be used with `sourmash sig cat` to collate signatures across databases; see [using manifests](command-line.md#using-manifests-to-explicitly-refer-to-collections-of-files). (This provides [`sourmash sig check` functionality](command-line.md#sourmash-signature-check---compare-picklists-and-manifests) in `sketch fromfile`.)
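
As an illustration of the input that `sketch fromfile` reads, here is a minimal Python sketch that builds such a CSV. It is not taken from this PR. The hunk above only names the `genome_filename` column, so the `name` and `protein_filename` column names, the sample names, and the file paths are all assumptions made for the example; check the full `sourmash-sketch.md` page for the exact format.

```python
import csv
import io

# Assumed column names: only `genome_filename` appears in the text above;
# `name` and `protein_filename` are assumptions for this illustration.
FIELDNAMES = ["name", "genome_filename", "protein_filename"]


def build_fromfile_csv(rows):
    """Return CSV text for (name, genome_path, protein_path) tuples.

    Either path may be an empty string, mirroring the statement that a
    filename column can be empty when that sketch type is not requested.
    A row with neither path is rejected, since there is nothing to sketch.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    for name, genome, protein in rows:
        if not genome and not protein:
            raise ValueError(f"row {name!r} has neither a genome nor a proteome")
        writer.writerow(
            {
                "name": name,
                "genome_filename": genome,
                "protein_filename": protein,
            }
        )
    return buf.getvalue()


# Hypothetical sample names and paths, for illustration only.
rows = [
    ("sample_A", "genomes/sample_A.fna.gz", "proteomes/sample_A.faa.gz"),
    ("sample_B", "genomes/sample_B.fna.gz", ""),  # no proteome available
]
csv_text = build_fromfile_csv(rows)
print(csv_text)
```

Once written to disk, a file like this could be passed to `sourmash sketch fromfile` together with `-o/--output-signatures`; the exact invocation depends on which sketch types are requested.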