Releases: isi-nlp/nlcodec

0.5 - Add `byte` scheme

24 Dec 01:26
33f043f

v0.4.0 -- add support for class scheme

04 Aug 06:16
  • the class scheme is now supported

v0.3.2 - Shrink vocabulary support

29 Apr 01:40

This version is used in our many-English paper https://arxiv.org/abs/2104.00290

nlcodec CLI bug fix. Add nlcodec-learn CLI for Spark based learn

24 Feb 21:05

Db, Multipartdb, Batch, and more; perf improv with __slots__

04 Aug 01:03
  • add nlcodec-freqs CLI to setup.py
  • log time and memory usage for learn task
  • log BPE merge operations once every 2s instead of all operations
  • using __slots__: ~25% faster, 30% less memory for BPE with 3M word types
  • nlcodec.db.core with Db and MultipartDb
  • nlcodec.db.batch with Batch and BatchIterable
  • CLI nlcodec.learn for learning BPE using pyspark
  • CLI nlcodec.bitextdb to build a database from parallel text
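
The `__slots__` change noted above can be illustrated with a minimal sketch. The class names `TypeWithDict` and `TypeWithSlots` are hypothetical and are not nlcodec classes; the sketch only shows the general Python mechanism, in which declaring `__slots__` drops the per-instance `__dict__`, and that tends to matter when millions of small objects are held in memory.

```python
class TypeWithDict:
    """Hypothetical vocabulary entry stored the default way (per-instance __dict__)."""

    def __init__(self, name, freq):
        self.name = name
        self.freq = freq


class TypeWithSlots:
    """Same hypothetical entry, but with a fixed attribute layout via __slots__."""

    __slots__ = ("name", "freq")

    def __init__(self, name, freq):
        self.name = name
        self.freq = freq


a = TypeWithDict("the", 100)
b = TypeWithSlots("the", 100)

# A regular instance carries its own attribute dictionary; a slotted one does not.
has_dict_a = hasattr(a, "__dict__")
has_dict_b = hasattr(b, "__dict__")

# Slotted instances also reject attributes that were not declared in __slots__.
try:
    b.extra = 1
    rejected_extra = False
except AttributeError:
    rejected_extra = True
```

Actual savings depend on the workload; the figures in the release note are the authors' measurements for BPE with 3M word types.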

fix issue with name property

14 Jul 18:54
  • fix issue with name as class property (#24, #25)

option to run on a spark session given by caller

08 Jul 06:20
  • spark session can be specified by user
  • docs published

PySpark and term-frequencies support for large datasets

14 Jun 22:29
3c732d3
  • Option to accept term frequencies as input
  • PySpark backend to compute word and char frequencies
  • --min-co-ev of BPE is now a CLI argument
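
As a rough illustration of what word and character frequencies look like, here is a single-machine sketch using only the standard library. It is not the PySpark backend and not nlcodec's API: the function name `term_frequencies` and the whitespace tokenization are assumptions made for this example.

```python
from collections import Counter


def term_frequencies(lines):
    """Count whitespace-separated words and their characters (hypothetical helper)."""
    word_freqs = Counter()
    char_freqs = Counter()
    for line in lines:
        words = line.split()
        word_freqs.update(words)
        for word in words:
            # Updating a Counter with a string counts each character once.
            char_freqs.update(word)
    return word_freqs, char_freqs


word_freqs, char_freqs = term_frequencies(["a cat sat", "a cat ran"])
```

Tables like these are small compared with the raw corpus, which is presumably why accepting them as input helps with large datasets.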

Fix find_packages() issue; select all nested packages

30 May 18:49

public release v0.2.0

17 Apr 17:43
  • uploaded to PyPI: pip install nlcodec
  • public repository under the Apache License 2.0