Hazm 0.9
Added
- Windows compatibility by using Python-crfsuite instead of Wapiti. @E-Ghafour.
- Pretrained Chunker and POSTagger models with Python-crfsuite. @E-Ghafour.
- New parameters in Normalizer for better text processing. @sir-kokabi.
- Three regex patterns in Normalizer to fix ZWNJs and spacing issues. @sir-kokabi.
- 400 non-standard Unicode characters to be replaced in Normalizer. @sir-kokabi.
- 40,000+ new words to improve Lemmatizer and Tokenizer. @sir-kokabi.
- train function for Word2vec and Sent2vec modules in Embedding. @E-Ghafour.
- Implement keywordExtraction with the embedRank approach as a sample of Hazm usage. @E-Ghafour.
- Support Universal tags in POSTagger. @E-Ghafour.
- Support universal POS mapper in PeykareReader & DadeganReader (#239). @phsfr.
- PersianPlainTextReader to process raw text datasets (#120). @mhbashari.
- Support EZ tag in PeykareReader. @E-Ghafour.
- Slash & back-slash (/ \) support in Tokenizer (#102). @elahimanesh.
- Conjugation class to handle verb conjugation. @sir-kokabi.
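
The ZWNJ and spacing fixes listed above can be pictured with a small standalone sketch. The three patterns and the fix_spacing helper below are hypothetical illustrations written for this note, not the patterns that ship in Normalizer; they only show the kind of rule involved (collapsing repeated ZWNJs, dropping a ZWNJ next to whitespace, and joining the می / نمی verbal prefix to the following word).

```python
import re

ZWNJ = "\u200c"  # zero-width non-joiner

# Hypothetical patterns for illustration only; not Hazm's actual rules.
PATTERNS = [
    # 1. Collapse a run of ZWNJs into a single one.
    (re.compile(f"{ZWNJ}{{2,}}"), ZWNJ),
    # 2. Drop a ZWNJ that sits next to whitespace, where it joins nothing.
    (re.compile(f"{ZWNJ}(?=\\s)|(?<=\\s){ZWNJ}"), ""),
    # 3. Replace the space after a word-initial می / نمی prefix with a ZWNJ.
    (re.compile(r"(?<![\u0600-\u06FF])(ن?می) (?=[\u0600-\u06FF])"), r"\1" + ZWNJ),
]


def fix_spacing(text: str) -> str:
    """Apply each pattern in order and return the cleaned text."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    # Prints the verb with a ZWNJ between the prefix and the stem.
    print(fix_spacing("می روم"))
```

A rule as simple as the third one will also join a standalone می to the next word, so a real normalizer needs more context than this sketch uses.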
Fixed
- Improve the accuracy of POSTagger and Chunker. @E-Ghafour.
- Improve InformalNormalizer (#219). @riasati.
- Fix PEP 8 issues (#135). @hadifar.
- Fix some test issues. @sir-kokabi @E-Ghafour.
- Fix Stemmer issues with multiple suffixes. @sir-kokabi.
- Fix various reported issues.
Changed
- Drop Python 2 support and migrate all code to Python 3. @sir-kokabi.
- Use data_maker function instead of patterns in SequenceTagger. @E-Ghafour.
- Refactor IOBTagger and POSTagger to be compatible with data_maker. @E-Ghafour.
- Change می روم to میروم in example (#203). @SMSadegh19.
- Overhaul the project structure and GitHub repo. @sir-kokabi.
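
For readers unfamiliar with the idea behind replacing feature patterns with a data-making function, here is a rough standalone sketch. The names token_features and make_features and the chosen features are assumptions made for this note; they are not Hazm's data_maker or its real feature set. The sketch only shows the general shape: each token becomes a dictionary of features, which is a form that CRF toolkits commonly accept.

```python
def token_features(sentence, index):
    """Build a feature dict for the token at `index` (hypothetical feature set)."""
    word = sentence[index]
    last = len(sentence) - 1
    return {
        "word": word,
        "is_first": index == 0,
        "is_last": index == last,
        "prefix-2": word[:2],
        "suffix-2": word[-2:],
        "prev_word": sentence[index - 1] if index > 0 else "",
        "next_word": sentence[index + 1] if index < last else "",
        "is_digit": word.isdigit(),
    }


def make_features(sentences):
    """Turn tokenized sentences into one list of feature dicts per sentence."""
    return [
        [token_features(sentence, i) for i in range(len(sentence))]
        for sentence in sentences
    ]


if __name__ == "__main__":
    features = make_features([["من", "می‌روم"]])
    print(len(features), len(features[0]))
```

Writing features as plain Python code rather than as a pattern file keeps feature extraction in the same language as the rest of the tagger, which is one plausible motivation for a change of this kind.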
Full Changelog: v0.8.2...v0.9