Releases: jean-pierreBoth/gsearch
New release for paper
New ANI calculator added
In this release we added an additional ANI calculator for more accurate ANI calculation, and bigsig for read search and classification.
GSearch release v0.1.4
With this new release, both optimal densification and faster (or reverse) optimal densification are supported.
GSearch v0.1.3 seq processing and better parallelism
In v0.1.3, memory consumption and parallel efficiency are improved, and both are controlled by the number of threads used for sketching. Metagenomes can now be sketched by controlling the number of pio files to reduce memory use. Genomes can be processed sequence by sequence, or as an entire genome after concatenating its sequences via the --block option. Sequence-by-sequence processing is the default. Block processing is more memory friendly but introduces a small number of artificial k-mers, which do not affect the final results. Binaries are available for major platforms, including Windows. This release also fixes an amino acid bug in the beta version.
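To illustrate where the artificial k-mers in block processing come from, here is a minimal Python sketch. It is not GSearch code: the function names, the toy sequences, and the plain string k-mer extraction (no canonical k-mers, no hashing) are assumptions made only for illustration. It compares the k-mer set obtained sequence by sequence with the set obtained after concatenation.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def seq_mode_kmers(seqs, k):
    """Collect k-mers from each sequence separately (hypothetical 'seq' mode)."""
    out = set()
    for s in seqs:
        out |= kmers(s, k)
    return out


def block_mode_kmers(seqs, k):
    """Collect k-mers after concatenating all sequences (hypothetical 'block' mode)."""
    return kmers("".join(seqs), k)


# Toy genome made of two short sequences (illustrative data only).
contigs = ["ACGTAC", "GGTTCA"]
k = 4

# k-mers that exist only because two sequences were joined end to end.
artificial = block_mode_kmers(contigs, k) - seq_mode_kmers(contigs, k)
print(sorted(artificial))
```

In this toy model each junction between two sequences contributes at most k-1 k-mers that span it, so the number of artificial k-mers stays small relative to the genome size.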
GSearch version 0.1.2
Output genomic distances are now reported with more digits.
Some minor fix with probminhash version
In this release, we add superminhash support and optimize parallel FASTA parsing, which further accelerates the build and request steps. Superminhash can be used for metagenomes, where sketching a large number of files is very memory expensive. We also add HLL, which is more space efficient (memory and disk space) but slower. Further optimization of HLL is expected.
New release with add functionality to existing database
With this release, you can add new genomes to an existing database with the --add feature, for both nt and AA databases. We also provide test data. Before adding genomes, it is recommended that you make sure each genome to be added has less than 95% ANI with every genome already in the database (you can check this after running the request step by calculating the ANI between your query and the best hit found). Pre-compiled binaries for Linux (Intel x86_64), Darwin (Intel x86_64) and Darwin (arm64) are available. Let me know if you have a Linux (aarch64) machine and need to compile from source. OpenBLAS must be installed on Darwin. The gfortran dynamic library (libgfortran.so) and GCC are required as system defaults (install them if you do not have them). All of the above binaries are linked, statically or dynamically, against OpenBLAS, a CPU-architecture-independent BLAS implementation. For Intel processors, we also provide a Linux binary based on the Intel MKL implementation of BLAS, which is slightly faster than OpenBLAS but only works on Intel x86_64. After downloading the .zip files, unzip them using gunzip, make the binaries executable with 'chmod a+x ./tohnsw', and then move the binaries to a directory on your system path (e.g. /usr/local/bin/).
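The 95% ANI recommendation above can be applied as a simple filter before using --add. The following Python sketch only illustrates that check. It does not call GSearch or parse its output: the file names and ANI values are made up, and the function name and the dictionary layout are assumptions.

```python
ANI_THRESHOLD = 95.0  # percent, taken from the recommendation above


def is_novel(best_hit_ani, threshold=ANI_THRESHOLD):
    """True if the best-hit ANI is below the threshold, so the genome may be added."""
    return best_hit_ani < threshold


# Hypothetical best-hit ANI (percent) per candidate genome, as obtained
# by the user after the request step. These values are illustrative only.
best_hit_ani = {
    "genomeA.fna": 99.2,
    "genomeB.fna": 91.7,
    "genomeC.fna": 94.9,
}

# Keep only the genomes that are not already represented in the database.
to_add = sorted(name for name, ani in best_hit_ani.items() if is_novel(ani))
print(to_add)
```

Only the genomes kept in `to_add` would then be passed to the --add step.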
GSearch v0.0.10
This is the initial release of the GSearch software for large-scale microbial genome search. There are two binaries in this release, tohnsw and request. Pre-compiled binaries for Linux (Intel x86_64), Darwin (Intel x86_64) and Darwin (arm64) are available. Let me know if you have a Linux (aarch64) machine and need to compile from source. OpenBLAS must be installed on Darwin. The gfortran dynamic library (libgfortran.so) and GCC are required as system defaults (install them if you do not have them). All of the above binaries are linked, statically or dynamically, against OpenBLAS, a CPU-architecture-independent BLAS implementation. For Intel processors, we also provide a Linux binary based on the Intel MKL implementation of BLAS, which is slightly faster than OpenBLAS but only works on Intel x86_64.