
Hazm 0.9

@sir-kokabi released this on 20 May at 15:48
Commit: 548c4b1

Added

  • Windows compatibility by using python-crfsuite instead of Wapiti. @E-Ghafour.
  • Pretrained Chunker and POSTagger models based on python-crfsuite. @E-Ghafour.
  • New parameters in Normalizer for better text processing. @sir-kokabi.
  • Three regex patterns in Normalizer to fix ZWNJ and spacing issues. @sir-kokabi.
  • Replacement of 400 non-standard Unicode characters in Normalizer. @sir-kokabi.
  • 40,000+ new words to improve the Lemmatizer and Tokenizer. @sir-kokabi.
  • A train function for the Word2vec and Sent2vec modules in Embedding. @E-Ghafour.
  • A keywordExtraction sample, implemented with the EmbedRank approach, to demonstrate Hazm usage. @E-Ghafour.
  • Support for Universal tags in POSTagger. @E-Ghafour.
  • Support for a universal POS mapper in PeykareReader and DadeganReader (#239). @phsfr.
  • PersianPlainTextReader for processing raw text datasets (#120). @mhbashari.
  • Support for the EZ tag in PeykareReader. @E-Ghafour.
  • Slash and backslash (/ and \) support in Tokenizer (#102). @elahimanesh.
  • Conjugation class to handle verb conjugation. @sir-kokabi.
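The Normalizer entries above (replacing non-standard characters and fixing ZWNJ and spacing issues) can be illustrated with a minimal standalone sketch. It is not Hazm's implementation: the two-entry character table, the verbal-prefix rule, and the name normalize_sketch are hypothetical, chosen only to show the kind of rewriting these features perform.

```python
import re

ZWNJ = "\u200c"  # zero-width non-joiner (Persian "half-space")

# Hypothetical subset of a replacement table: Arabic letters -> Persian letters.
CHAR_TABLE = str.maketrans({
    "\u064a": "\u06cc",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06a9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})

# Hypothetical ZWNJ rule: join the verbal prefix "می" / "نمی" to the next word
# by replacing the space(s) after it with a ZWNJ. Pattern spelled out: \b(ن?می) +
VERB_PREFIX = re.compile(r"\b(\u0646?\u0645\u06cc) +(?=\S)")

# Collapse runs of two or more spaces into a single space.
MULTI_SPACE = re.compile(r" {2,}")


def normalize_sketch(text: str) -> str:
    """Apply character mapping, then the prefix rule, then space collapsing."""
    text = text.translate(CHAR_TABLE)
    text = VERB_PREFIX.sub(r"\1" + ZWNJ, text)
    return MULTI_SPACE.sub(" ", text)


print(normalize_sketch("من مي  روم"))  # "می" ends up joined to "روم" by a ZWNJ
```

Running the character mapping first means the prefix rule only has to match the Persian spelling of the prefix.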
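The keywordExtraction sample is described as following the EmbedRank approach, which ranks candidate phrases by the cosine similarity between each phrase embedding and the embedding of the whole document. The sketch below shows only that ranking step, under stated assumptions: the toy two-dimensional vectors stand in for real Sent2vec embeddings, candidate-phrase extraction is omitted, diversity re-ranking (as in EmbedRank++) is left out, and embedrank_sketch is a hypothetical name, not part of Hazm.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embedrank_sketch(doc_vec: Vector, candidates: Dict[str, Vector], top_n: int = 2) -> List[str]:
    """Return the top_n candidate phrases most similar to the document vector."""
    ranked = sorted(
        candidates,
        key=lambda phrase: cosine(doc_vec, candidates[phrase]),
        reverse=True,
    )
    return ranked[:top_n]


# Toy embeddings (illustrative only, not produced by a trained model).
doc = [1.0, 0.0]
phrases = {"phrase_a": [0.9, 0.1], "phrase_b": [0.0, 1.0], "phrase_c": [0.5, 0.5]}
print(embedrank_sketch(doc, phrases))  # ['phrase_a', 'phrase_c']
```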
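To illustrate what verb conjugation involves, here is a minimal sketch of the Persian present indicative: the prefix می (or نمی for the negative), a ZWNJ, the present stem, and a personal ending. It does not reproduce Hazm's Conjugation class or its method names; present_indicative is a hypothetical helper, and it assumes a consonant-final present stem such as خور ("eat"), since vowel-final stems like گو need an extra ی before the ending.

```python
from typing import List

ZWNJ = "\u200c"             # half-space between the prefix and the stem
MI = "\u0645\u06cc"         # "می", the present/continuous prefix
NEG = "\u0646"              # "ن", negative prefix placed before "می"

# Personal endings for 1sg, 2sg, 3sg, 1pl, 2pl, 3pl: م، ی، د، یم، ید، ند
ENDINGS = ("\u0645", "\u06cc", "\u062f", "\u06cc\u0645", "\u06cc\u062f", "\u0646\u062f")


def present_indicative(stem: str, negative: bool = False) -> List[str]:
    """Conjugate a consonant-final present stem in all six persons."""
    prefix = NEG + MI if negative else MI
    return [prefix + ZWNJ + stem + ending for ending in ENDINGS]


print(present_indicative("خور"))  # six forms, from می‌خورم to می‌خورند
```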

Fixed

Changed

  • Drop Python 2 support and migrate all code to Python 3. @sir-kokabi.
  • Use data_maker function instead of patterns in SequenceTagger. @E-Ghafour.
  • Refactor IOBTagger and POSTagger to be compatible with data_maker. @E-Ghafour.
  • Change می روم to می‌روم (space replaced with a ZWNJ) in the example (#203). @SMSadegh19.
  • Overhaul the project structure and GitHub repo. @sir-kokabi.

Download pretrained models

Full Changelog: v0.8.2...v0.9