diff --git a/.readthedocs.yml b/.readthedocs.yml
index e58229537a..0c7ae9f116 100644
--- a/.readthedocs.yml
+++ b/.readthedocs.yml
@@ -12,7 +12,7 @@ conda:
   environment: doc/environment.yml

 python:
-  version: 3.7
+  version: 3.8
   install:
     - method: pip
       path: .
diff --git a/doc/classifying-signatures.md b/doc/classifying-signatures.md
index c40be86adf..11cc2c1bbd 100644
--- a/doc/classifying-signatures.md
+++ b/doc/classifying-signatures.md
@@ -57,7 +57,7 @@ genomes based on greedy partitioning. Essentially, it takes a query
 metagenome and searches the database for the most highly contained
 genome; it then subtracts that match from the metagenome, and repeats.
 At the end it reports how much of the metagenome remains unknown. The
-[basic sourmash tutorial](tutorial-basic.md#what-s-in-my-metagenome)
+[basic sourmash tutorial](tutorial-basic.md#whats-in-my-metagenome)
 has some sample output from using gather with GenBank.

 See Appendix A at the bottom of this page for more technical details.
diff --git a/doc/conf.py b/doc/conf.py
index e98aa84031..bb3f9e4f19 100644
--- a/doc/conf.py
+++ b/doc/conf.py
@@ -37,7 +37,7 @@
     'sphinx.ext.doctest',
     'sphinx.ext.coverage',
     'sphinx.ext.viewcode',
-    'sphinxcontrib.napoleon',
+    'sphinx.ext.napoleon',
     'nbsphinx',
     'IPython.sphinxext.ipython_console_highlighting',
     'myst_parser'
@@ -302,3 +302,4 @@
 #texinfo_no_detailmenu = False

 autodoc_mock_imports = ["sourmash.minhash"]
+myst_heading_anchors = 3
diff --git a/doc/index.md b/doc/index.md
index 75d83e2f55..9d532088b1 100644
--- a/doc/index.md
+++ b/doc/index.md
@@ -28,7 +28,7 @@ background information on how and why MinHash works.

 **Want to migrate to sourmash v4?** sourmash v4 is now available, and
 has a number of incompatibilites with v2 and v3. Please see
-[our migration guide](support.md#migrating-from-sourmash-v3-x-to-sourmash-v4-x)!
+[our migration guide](support.md#migrating-from-sourmash-v3x-to-sourmash-v4x)!

 ----

diff --git a/doc/release-notes/sourmash-2.0.md b/doc/release-notes/sourmash-2.0.md
index 8c85bda3ba..f3fd868661 100644
--- a/doc/release-notes/sourmash-2.0.md
+++ b/doc/release-notes/sourmash-2.0.md
@@ -19,11 +19,11 @@ in the
 This is a list of substantial new features and functionality in sourmash 2.0.

 * Added Sequence Bloom Tree search to enable similarity and containment queries on very large collections of signatures in low memory; see `sourmash index`, `sourmash search`, and `sourmash gather` in [the command line documentation](../command-line.md).
-* Added "LCA databases" for fast searching of large databases in not-so-low memory; see [`sourmash lca index` in command-line docs](../command-line.md#sourmash-lca-subcommands-for-taxonomic-classification).
+* Added "LCA databases" for fast searching of large databases in not-so-low memory; see [`sourmash lca index` in command-line docs](../command-line.md#sourmash-lca-subcommands-for-in-memory-taxonomy-integration).
 * Created [precomputed databases](../databases.md) for most of GenBank genomes.
-* Added taxonomic reporting functionality in the `sourmash lca` submodule - [see command-line docs](../command-line.md#sourmash-lca-subcommands-for-taxonomic-classification).
+* Added taxonomic reporting functionality in the `sourmash lca` submodule - [see command-line docs](../command-line.md#sourmash-lca-subcommands-for-in-memory-taxonomy-integration).
 * Added signature manipulation utilities in the `sourmash signature` submodule - [see command-line docs](../command-line.md#sourmash-signature-subcommands-for-signature-manipulation)
-* Introduced new modulo hash or "scaled" signatures for containment analysis; see [Using sourmash: a practical guide](../using-sourmash-a-guide.md#what-resolution-should-my-signatures-be-how-should-i-create-them) and [more details in the Python API examples](../api-example.md#advanced-features-of-sourmash-minhash-objects-scaled-and-num).
+* Introduced new modulo hash or "scaled" signatures for containment analysis; see [Using sourmash: a practical guide](../using-sourmash-a-guide.md#what-resolution-should-my-signatures-be--how-should-i-create-them) and [more details in the Python API examples](../api-example.md#advanced-features-of-sourmash-minhash-objects---scaled-and-num).
 * Switched to using JSON instead of YAML for signatures.
 * Many performance optimizations!
 * Many more tests!
diff --git a/doc/release-notes/sourmash-4.0.md b/doc/release-notes/sourmash-4.0.md
index 2b5a780266..681233ad16 100644
--- a/doc/release-notes/sourmash-4.0.md
+++ b/doc/release-notes/sourmash-4.0.md
@@ -9,7 +9,7 @@ contains many feature improvements and new functionality, as well as
 many breaking changes with sourmash 2.x and 3.x.

 Please see
-[our migration guide](../support.md#migrating-from-sourmash-v3-x-to-sourmash-v4-x)
+[our migration guide](../support.md#migrating-from-sourmash-v3x-to-sourmash-v4x)
 for guidance on updating to sourmash v4, and post questions about
 migrating to sourmash 4.0 in the
 [sourmash issue tracker](https://github.com/dib-lab/sourmash/issues/new).
diff --git a/doc/sourmash-sketch.md b/doc/sourmash-sketch.md
index 346f77b54e..37264ec673 100644
--- a/doc/sourmash-sketch.md
+++ b/doc/sourmash-sketch.md
@@ -109,8 +109,8 @@ The `-p` argument to `sourmash sketch` provides parameter strings to sourmash, a
 A parameter string is a space-delimited collection that can contain one or more fields, comma-separated.

 * `k=` - create a sketch at this k-mer size; can provide more than one time in a parameter string. Typically `ksize` is between 4 and 100.
-* `scaled=` - create a scaled MinHash with k-mers sampled deterministically at 1 per `` value. This controls sketch compression rates and resolution; for example, a 5 Mbp genome sketched with a scaled of 1000 would yield approximately 5,000 k-mers. `scaled` is incompatible with `num`. See [our guide to signature resolution](using-sourmash-a-guide.md#what-resolution-should-my-signatures-be-how-should-i-create-them) for more information.
-* `num=` - create a standard MinHash with no more than `` k-mers kept. This will produce sketches identical to [mash sketches](https://mash.readthedocs.io/en/latest/). `num` is incompatible with `scaled`. See [our guide to signature resolution](using-sourmash-a-guide.md#what-resolution-should-my-signatures-be-how-should-i-create-them) for more information.
+* `scaled=` - create a scaled MinHash with k-mers sampled deterministically at 1 per `` value. This controls sketch compression rates and resolution; for example, a 5 Mbp genome sketched with a scaled of 1000 would yield approximately 5,000 k-mers. `scaled` is incompatible with `num`. See [our guide to signature resolution](using-sourmash-a-guide.md#what-resolution-should-my-signatures-be--how-should-i-create-them) for more information.
+* `num=` - create a standard MinHash with no more than `` k-mers kept. This will produce sketches identical to [mash sketches](https://mash.readthedocs.io/en/latest/). `num` is incompatible with `scaled`. See [our guide to signature resolution](using-sourmash-a-guide.md#what-resolution-should-my-signatures-be--how-should-i-create-them) for more information.
 * `abund` / `noabund` - create abundance-weighted (or not) sketches. See [Classify signatures: Abundance Weighting](classifying-signatures.md#abundance-weighting) for details of how this works.
 * `dna`, `protein`, `dayhoff`, `hp` - create this kind of sketch. Note that `sourmash sketch dna -p protein` and `sourmash sketch protein -p dna` are invalid; please use `sourmash sketch translate` for the former.

diff --git a/doc/support.md b/doc/support.md
index d7a388293f..5e702ddf41 100644
--- a/doc/support.md
+++ b/doc/support.md
@@ -29,7 +29,7 @@ that depend on sourmash, e.g. specifying `sourmash >=3,<4` for software
 that is tested with sourmash 3.x. Read on for details!

 Upgrading major versions (to sourmash 4.0, for example) will often involve
-more work; see the [next section](#upgrading-versions) for more
+more work; see the [next section](#upgrading-major-versions) for more
 our suggested process.

 ### Semantic versioning
@@ -148,7 +148,7 @@ If you use sourmash from the command line, there are a few major changes in 4.0

 First, **`sourmash compute` is deprecated in favor of [`sourmash sketch`](sourmash-sketch.md)**, which provides quite a bit more flexibility in creating signatures.

-Second, **`sourmash index` will now save databases in the Zip format (`.sbt.zip`) instead of the old JSON+subdirectory format** (see [updated docs](command-line.md#sourmash-index-build-an-sbt-index-of-signatures)). You can revert to the old behavior by explicitly specifying the `.sbt.json` filename for output when running `sourmash index`.
+Second, **`sourmash index` will now save databases in the Zip format (`.sbt.zip`) instead of the old JSON+subdirectory format** (see [updated docs](command-line.md#sourmash-index---build-an-sbt-index-of-signatures)). You can revert to the old behavior by explicitly specifying the `.sbt.json` filename for output when running `sourmash index`.

 Third, all sourmash commands that operate on signatures should now be able to directly read from lists of signatures in signature files, SBT databases, LCA databases, directories, and files containing lists of filenames (see [updated docs](command-line.md#advanced-command-line-usage)).

diff --git a/doc/tutorials.md b/doc/tutorials.md
index 276d560ad7..a3a0277e38 100644
--- a/doc/tutorials.md
+++ b/doc/tutorials.md
@@ -13,7 +13,7 @@ X and Linux. They require about 5 GB of disk space and 5 GB of RAM.

 These next three tutorials are all notebooks that you can view, run
 yourself, or run interactively online via the
-[binder](http://mybinder.org) service.
+[binder](https://mybinder.org) service.

 * [An introduction to k-mers for genome comparison and analysis](kmers-and-minhash.md)

diff --git a/doc/using-sourmash-a-guide.md b/doc/using-sourmash-a-guide.md
index 2b62be021b..bde3827182 100644
--- a/doc/using-sourmash-a-guide.md
+++ b/doc/using-sourmash-a-guide.md
@@ -189,7 +189,7 @@ built and searched directly from the command line.

 Reverse indexed or LCA databases are *in-memory* databases that, once
 loaded from disk, support fast search and gather across 10s of thousands
-of signatures. They can be created using `sourmash lca index` ([docs](command-line.md#sourmash-lca-index-build-an-lca-database))
+of signatures. They can be created using `sourmash lca index` ([docs](command-line.md#sourmash-lca-index---build-an-lca-database))

 LCA databases are currently stored in JSON files (that can be gzipped).
 As these files get larger, the time required to load them from disk
@@ -198,7 +198,7 @@ can be substantial.
 LCA databases are also currently (sourmash 2.0-4.0) the only databases
 that support the inclusion of taxonomic information in the database,
 and there is an associated collection of commands
-[under `sourmash lca`](command.md#sourmash-lca-subcommands-for-taxonomic-classification).
+[under `sourmash lca`](command-line.md#sourmash-lca-subcommands-for-in-memory-taxonomy-integration).
 However, they can also be used as regular indexed databases for search
 and gather as above.

diff --git a/setup.cfg b/setup.cfg
index 37e5444e52..bd3aeea86d 100644
--- a/setup.cfg
+++ b/setup.cfg
@@ -64,13 +64,14 @@ demo =
     jupyter_client
     ipython
 doc =
-    sphinx
-    myst-parser>=0.13.7,<0.15.0
+    sphinx>=4.4.0,<5
+    myst-parser==0.17.0
+    Jinja2==3.0.3
     alabaster
     sphinxcontrib-napoleon
     nbsphinx
     ipython
-    docutils>=0.17.1
+    docutils>=0.17.1,<0.18.0
 storage =
     ipfshttpclient>=0.4.13
     redis
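
Note on the anchor changes: the link fragments rewritten throughout this diff are consistent with `myst_heading_anchors = 3`, which has MyST generate heading anchors using GitHub-style slugs. The Python sketch below approximates that slug rule so the new fragments can be checked by hand. It is not the myst-parser implementation: the function name `github_style_slug` is hypothetical, and the heading texts are assumptions chosen to reproduce anchors that appear in the diff.

```python
import re


def github_style_slug(heading: str) -> str:
    """Approximate a GitHub-style heading slug (illustrative, not myst-parser's code)."""
    text = heading.strip().lower()
    # Drop everything except word characters, hyphens, and spaces
    # (apostrophes, dots, slashes, backticks, and question marks vanish).
    text = re.sub(r"[^\w\- ]", "", text)
    # Each remaining space becomes a hyphen; runs are NOT collapsed,
    # which is why " - " in a heading turns into "---".
    return text.replace(" ", "-")


# Assumed heading texts mapped to the anchors used on the "+" lines of the diff.
EXAMPLES = {
    "What's in my metagenome?": "whats-in-my-metagenome",
    "Migrating from sourmash v3.x to sourmash v4.x":
        "migrating-from-sourmash-v3x-to-sourmash-v4x",
    "`sourmash index` - build an SBT index of signatures":
        "sourmash-index---build-an-sbt-index-of-signatures",
    "What resolution should my signatures be / how should I create them?":
        "what-resolution-should-my-signatures-be--how-should-i-create-them",
}

for heading, expected in EXAMPLES.items():
    print(github_style_slug(heading) == expected, github_style_slug(heading))
```

Under this rule punctuation is deleted instead of being replaced by a hyphen, which would explain why `what-s-in-my-metagenome` becomes `whats-in-my-metagenome` and `v3-x` becomes `v3x` in the updated links.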