Releases: sourmash-bio/sourmash
v4.1.1
This release fixes a minor bug, provides some refactorings, and dramatically decreases memory consumption for sourmash gather --linear (which is, admittedly, a niche use case :).
No major new features.
Bug fixes and performance improvements:
- Unload data with sourmash gather --linear on SBTs (#1534)
- Fix sourmash gather --no-prefetch when used w/abund signatures (#1528)
- Fix sourmash index to not create directory for .sbt.zip output (#1539)
Major refactoring and new internal functionality:
- Add FrozenMinHash to better support separation of frozen and mutable data actions (#1508)
Refactoring and cleanup:
v4.1.0
4.1.0 release notes
This release provides several convenient features for users, including zipfile collections on input and output and a new prefetch command. sourmash gather has also received a considerable speed/memory upgrade (twice as fast, 80-90% lower memory). You should upgrade! As a reminder, v4.x has several incompatibilities with v3.x, and if you are upgrading from v3.x you should consult our migration guide.
Major new features:
- Support zipped collections of signatures (#1349)
- Refactor gather functionality for speed & modularity (#1370, #1512, #1513)
- Provide new command, prefetch. (#1370)
- Add flexible & iterative support for outputting signatures in variety of collection formats - directories, zipfiles, etc. (#1493)
- Add max_containment to API and --max-containment to command line (#1346)
- Add --from-file option to sourmash sketch commands (#1362)
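As an illustration of what max_containment measures, here is a minimal sketch on plain Python sets of hash values. It assumes max containment is the larger of the two directional containments; the helper names below are hypothetical and are not the sourmash API.

```python
def containment(a: set, b: set) -> float:
    """Fraction of the hashes in `a` that are also present in `b`."""
    if not a:
        return 0.0
    return len(a & b) / len(a)


def max_containment(a: set, b: set) -> float:
    """Larger of the two directional containments (assumed definition)."""
    return max(containment(a, b), containment(b, a))


# Toy hash sets standing in for two scaled sketches.
query = {1, 2, 3, 4}
subject = {3, 4, 5, 6, 7, 8, 9, 10}

print(containment(query, subject))      # 0.5
print(containment(subject, query))      # 0.25
print(max_containment(query, subject))  # 0.5
```

Unlike plain containment, the max value does not depend on which sketch is treated as the query.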
Bug fixes that break backwards compatibility:
- Require scaled signatures for containment (#1381)
- Fix CSV output for sourmash lca classify when .name is empty (#1401)
- Really old SBTs (pre-v2.0) no longer load (v1 and v2 SBTs) (changed in #1392)
Other bug fixes:
- Add proper newline output for csv module (#1319) - important for Windows!
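For context on the newline fix, the general pattern recommended by the Python csv documentation is to open output files with newline="" so the csv module controls line endings itself. The sketch below shows that pattern only; it is not the code from #1319, and the file name and rows are made up.

```python
import csv
import os
import tempfile


def write_rows(path, rows):
    # newline="" stops the text layer from translating the "\r\n" that
    # csv.writer emits, which would otherwise become "\r\r\n" on Windows.
    with open(path, "w", newline="") as fp:
        writer = csv.writer(fp)
        writer.writerows(rows)


with tempfile.TemporaryDirectory() as tmpdir:
    out_path = os.path.join(tmpdir, "example.csv")
    write_rows(out_path, [["name", "similarity"], ["sigA", "0.5"]])
    with open(out_path, "rb") as fp:
        raw = fp.read()

print(raw)
```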
Other new features:
- --best-only searches now work for both similarity AND containment (fixed in #1392)
- sourmash categorize now takes all database types
- add --name to sourmash sig merge (#1480)
- decline to load really large files for LCA databases if they're not valid JSON (#1495)
Major refactoring and new internal functionality:
- Add a MultiIndex class that wraps multiple Index classes (#1374)
- Refactor and dramatically simplify database loading and compatibility checking (#1406, #1420)
- Rework the find functionality for Index classes (#1392, #1477).
- Improved intersection and union calculations (#1475)
Documentation enhancements:
- Update the sourmash __init__.py docstring, provide __all__ for imports (#1364)
- Add '-h/--help' usage instructions to 'sourmash sketch' CLI (#1400)
- Add ORCID to contribution checklist (#1405)
- Add information about updating the developer environment to the developer docs (#1432)
- Docs: Partial fix for doc build issues with notebooks (#1516)
Refactoring and cleanup:
- Refactor the database loading code in sourmash_args (#1373, #1380)
- Pin needletail version to keep MSRV at 1.37 (#1393)
- Rename load_file_list_of_signatures to load_pathlist_from_file (#1423)
- Update call to notify in src/sourmash/search.py with f-strings (#1422)
- Bump MSRV to 1.42 (and other dep fixes) (#1461)
- CI/Rust: update and fix cbindgen config (#1473)
- Refactor MinHash.downsample (#1458)
- Make MinHash.downsample(...) require keyword arguments & fix newly revealed buggy test. (#1448)
- Add a check for LCA database error text in tests/test_lca.py (#1445)
- pin docutils version to last working (#1444)
- add codecov configuration to fix paths (#1422, #1449)
- provide new test fixtures for cleaner testing (#1487)
- Fix small papercuts: SyntaxWarning and coverage reports (#1488)
- Clean up clippy lints from 1.52 (#1505)
- Bump docutils from 0.16 to 0.17.1 (#1499)
- Update myst-parser requirement from ~=0.13.7 to >=0.13.7,<0.15.0 (#1520)
- replace utils.TempDirectory with runtmp in some tests (#1502)
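The keyword-argument requirement for MinHash.downsample(...) noted above can be illustrated with a toy class. This is a sketch of the keyword-only pattern in Python, not the sourmash implementation: ToySketch, HASH_SPACE, and the downsampling rules are simplified assumptions.

```python
HASH_SPACE = 2**64  # assumed size of the hash space for this toy example


class ToySketch:
    """Hypothetical stand-in for a MinHash sketch; holds sorted hash values."""

    def __init__(self, hashes):
        self.hashes = sorted(set(hashes))

    def downsample(self, *, num=None, scaled=None):
        # The bare "*" makes num and scaled keyword-only, so a call such as
        # downsample(2) fails instead of silently guessing which one is meant.
        if (num is None) == (scaled is None):
            raise ValueError("specify exactly one of num= or scaled=")
        if num is not None:
            # Keep the `num` smallest hashes.
            return ToySketch(self.hashes[:num])
        # Keep hashes at or below the cutoff implied by `scaled`.
        max_hash = HASH_SPACE // scaled
        return ToySketch([h for h in self.hashes if h <= max_hash])


sketch = ToySketch([10, 20, 30, 40])
smaller = sketch.downsample(num=2)
print(smaller.hashes)  # [10, 20]

try:
    sketch.downsample(2)  # positional call is rejected
    positional_ok = True
except TypeError:
    positional_ok = False
print(positional_ok)  # False
```

Requiring keywords makes the num versus scaled choice explicit at every call site, which is the kind of ambiguity that can hide a buggy test.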
v4.0.0
Major changes for 4.0
4.0 is a major new version of sourmash, and it contains a number of new and breaking features.
Please see our migration guide for more information on how to migrate from v3.x to version 4.0!
Numerical output and search results are unchanged
There are no changes to numerical output or search results in this release; you should get the same results with v4 as you get with v3, except where command-line parameters need to be adjusted as noted below (see: protein ksize #1277, lca summarize changes #1175, sourmash gather on signatures without abundance #1328). Please file an issue if your results change!
New or changed behavior
- default SBT storage is now .sbt.zip (#1174, #1170)
- add sourmash sketch command for creating signatures (#1159)
- protein ksizes in MinHash are now divided by 3, except in sourmash compute (#1277)
- refactor MinHash API and implementation: add, iadd, merge, hashes, and max_hash (#1282, #1154, #1139, #1301)
- add HyperLogLog implementation (#1223)
- SourmashSignature.name is now a property (not a method): use str(sig) instead of name() (#1179, #1232)
- lca summarize no longer merges all signatures, and uses hash abundance by default (#1175)
- index and lca index (#1186, #1222) now support --from-file and no longer require signature files on command line
- --traverse-directory is now on by default for signature loading behavior (#1178)
- sourmash sketch and sourmash compute no longer create empty signatures from empty files and stdin (#1347);
- sourmash sketch and sourmash compute set sig.filename to empty string when filename is - (#1347);
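The protein ksize change above is simple arithmetic: a ksize that used to be given in nucleotides is now given in amino acids, so it shrinks by a factor of 3. A minimal sketch of that conversion follows; the function name is hypothetical and is not part of sourmash.

```python
def v4_protein_ksize(v3_ksize: int) -> int:
    """Convert a v3-style (nucleotide-based) protein ksize to the v4 value."""
    if v3_ksize <= 0 or v3_ksize % 3 != 0:
        raise ValueError(f"v3 protein ksize must be a positive multiple of 3, got {v3_ksize}")
    return v3_ksize // 3


for old in (21, 30, 57):
    print(old, "->", v4_protein_ksize(old))
# 21 -> 7
# 30 -> 10
# 57 -> 19
```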
Feature removal
- remove Python 2.7 support (& end Python 2 compatibility) (#1145, #1144)
- remove lca gather (#1307)
- remove 10x support from sourmash compute (#1229)
- remove 'dump' command (#1157)
Feature/function deprecations
- deprecate sourmash compute (#1159)
- deprecate load_signatures, sourmash.load_one_signature, create_sbt_index, and load_sbt_index (#1279, #1304)
- deprecate import_csv in favor of new sourmash sig import --csv (#1281)
Refactoring, improvements, and minor bug fixes:
- accept file list in sourmash sig cat (#1236)
- add unique_intersect_bp and gather_result_rank to gather CSV output (#1219)
- remove deprecated minhash functions (#1149)
- fix Rust panic error in signature creation (#1172)
- cache nodes in SBT during search (#1161)
- fix two bugs in gather --output-unassigned (#1156)
- Refactor the gather code so that it uses 'hashes' instead of 'mins' (#1329)
- Update output from gather w/o abundances, so that abund output is empty instead of 0 (#1328)
Documentation updates
- substantial revisions and updates to the documentation (#1283)
- add information about versioning, migrations, etc to the docs (#1153)
Infrastructure and CI changes:
- update finch requirement from 0.3.0 to 0.4.1 (#1290)
- update rand for test, and activate "js" feature for getrandom (#1275)
- dev updates (configs and doc) (#1298)
- move wheel building from Travis to GitHub Actions (#1295)
- fix new clippy warnings from Rust 1.49 (#1267)
- use tox for running tests locally (#696)
- CI: small build fixes (#1252)
- CI: Fix releases in GitHub Actions (#1250)
- update build_wheel action paths
- CI: moving python tests from travis to GH actions (#1249)
- CI: move wheel building to GitHub actions (#1244)
- remove last .rst file from docs (#1185)
- update CI for latest branch name change (#1150)
v3.5.1
Feature deprecations
- add deprecation warning for sourmash compute --input-is-10x (#1326)
- add warnings about new sourmash lca summarize behavior (#1326)
- add warning for new behavior of MinHash.merge(...) (#1326)
- add deprecation warning for TarStorage (#1165)
Infrastructure and CI changes:
- Backport github actions to stable branch (3.5.x) (#1317)
v3.5.0
This is the first of several minor releases (v3.5.x) from the new stable branch. These releases focus on preparing for sourmash v4.0 by introducing deprecations and warnings for features that will be removed in v4.0.
Refactoring and deprecations:
- MinHash class refactoring (#1128, #1129); many deprecations for 4.0 and 5.0
- sourmash dump deprecated, for removal in 4.0 (#1147)
- import sourmash_lib deprecated, for removal in 4.0 (#1143)
Cleanup:
- remove mentions of ijson and khmer (no longer needed dependencies) (#1140)
Documentation:
- Simplify and clean up README (#1124)
- Add sourmash logo to docs and README (#1127)
- update release process and release notes (#1125)
Rust:
- Update typed-builder requirement from 0.6.0 to 0.7.0 (#1121)
v3.4.1
Major new features:
- Document sourmash.fig usage and behavior; enable output of compare clustering with labels (#859)
- Adds --majority option to lca classify using majority vote algorithm (#1113)
Minor improvements:
- MinHash compatibility check to sourmash sig intersect (#1116)
Bugs fixed:
- add ksize selectors back into sourmash sig functions (#1105)
Documentation updates:
v3.4.0
Major new features:
- enable seamless loading of signatures from indexed databases (#1059, #1083, #1090)
- add signature cat and signature split commands to combine/split signature files (#1044, #1074)
- add compute-optimized MinHash (for small scaled or large cardinalities) in Rust (#1045)
- optionally weight lca summarize output by hashval abundance. (#1022)
- enable moltypes other than DNA in LCA databases (#1013)
Minor improvements:
- add --num-results/-n to gather (#1047)
- improve lca index error message when inserting num signature (#1076)
- autodetect FASTA/FASTQ files if given as signatures (#1078)
- add is_lineage_match, pop_to_rank, make_lineage to lca_utils (#1081)
- use stricter niffler versions and add new gz feature to it (#1070)
- added MinHash.clear() and MinHash.add_hash_with_abundance to Python API (#1046)
Bugs fixed:
- investigations and fixes around new gather behavior. (#1001)
Refactoring:
- move tests from test_lca into test_lca_functions (#1035)
- remove unused run_shell_cmd function (#1032)
- refactor some tests in test_sourmash.py to use @utils.in_tempdir decorators (#1020)
- use install scripts from py-ipfs-http-client (#1068)
Documentation:
- Improve documentation around abundance projection (#1073)
- Replace recommonmark with myst (docs) (#1021)
- Fix doctest filename error (#1040)
Thanks to @luizirber @ctb @bluegenes @erikyoung85 for their contributions!
3.3.1
version 3.3.0
Improvements:
- add ZipStorage, support loading SBT databases from storage; .sbt.zip extensions. (#648)
- Replace khmer.Nodegraph with rust nodegraph; ~5x speedup of SBT search & gather. (#799)
Bugs:
- Document and (lightly) fix the LCA_Database API. (#966)
- Fix bug when using Python 3.5 and before; refactor LCA_Database tests (#962)
Documentation:
version 3.2.3
Incompatibilities with previous versions due to bugs:
- sourmash gather on SBT databases was setting --threshold-bp=0 in all cases. This was fixed in #942, and output may change. Specify --threshold-bp=0 to recover old behavior.
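To see why the threshold fix can change output, here is a rough sketch of how a base-pair threshold might filter matches. It assumes overlap is estimated as the number of shared hashes times the scaled value; the function names and numbers are illustrative and are not taken from the sourmash code.

```python
def estimated_overlap_bp(n_shared_hashes: int, scaled: int) -> int:
    """Rough overlap estimate in bp (assumed: shared hashes times scaled)."""
    return n_shared_hashes * scaled


def passes_threshold(n_shared_hashes: int, scaled: int, threshold_bp: float) -> bool:
    """Report a match only if its estimated overlap reaches the threshold."""
    return estimated_overlap_bp(n_shared_hashes, scaled) >= threshold_bp


# A small match: 10 shared hashes at scaled=1000 is roughly 10 kbp of overlap.
print(passes_threshold(10, 1000, threshold_bp=0))      # True
print(passes_threshold(10, 1000, threshold_bp=50000))  # False
```

With the threshold stuck at 0, small matches like this one were always reported; with a nonzero threshold in effect they may drop out.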
Improvements:
- refactor LCA_Database class to support programmatic creation. (#946)
- add --singleton option to lca summarize (#922)
- update gather to calculate fraction of match that was in original query (#938)
- add compare --containment (#937)
- add --outdir argument to sourmash compute (#935)
- improvements to sourmash argparse output for compute. (#931)
Bugs:
- fix lca classify bug with -o (#902)
- set_abundances now works with large signatures (#911)
- test & fix LinearIndex, SBT, and LCA gather thresholding. (#942)
Build, CI and docs: