It would be more efficient if we had a way to automatically generate this changelog.
@heytitle Agreed. Do you have any suggestions?
Publicly announced 2.1 and closed the issue: https://www.blognone.com/node/113587
Corpus

- `PYTHAINLP_DATA_DIR` environment variable to set the location of downloaded data (default is `~/pythainlp-data`) (add option of setting data dir with an environmental variable #238, Added docs on PYTHAINLP_DATA_DIR environ variable #294) - thanks @dhpollack @abhabongse
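As an illustration, the override can be set from Python before the library is imported (the path below is only an example, not a PyThaiNLP default):

```python
import os
from pathlib import Path

# Point PyThaiNLP at a custom data directory (the path here is just an
# example). Set the variable before importing pythainlp so it takes
# effect the first time corpora are downloaded.
os.environ["PYTHAINLP_DATA_DIR"] = str(Path.home() / ".cache" / "pythainlp")

# import pythainlp  # corpora would now be cached under the directory above
```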
Localization

- `pythainlp.util.thai_time` - spell out time to Thai words (Add pythainlp.util.thai_time #303) - thanks @wannaphong @abhabongse @bact
- `bahttext` - fix a bug for a value of one million (bahttext not working for 1,000,000 #350) - thanks @wannaphong
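To see why one million is a tricky value, here is a toy reimplementation of baht-text spelling. The names and structure below are made up for illustration and this is not PyThaiNLP's actual code; it only sketches the place-value logic, where the millions place needs its own recursion step:

```python
# Toy baht-text speller: illustrates the place-value logic only; the real
# pythainlp implementation differs. Whole-baht integer amounts only.
_DIGITS = ["", "หนึ่ง", "สอง", "สาม", "สี่", "ห้า", "หก", "เจ็ด", "แปด", "เก้า"]
_PLACES = ["", "สิบ", "ร้อย", "พัน", "หมื่น", "แสน"]

def _read_number(n: int) -> str:
    # Values of a million and up recurse on the millions part:
    # 1_000_000 -> "หนึ่ง" + "ล้าน" (the value reported in #350).
    if n >= 1_000_000:
        return _read_number(n // 1_000_000) + "ล้าน" + _read_number(n % 1_000_000)
    text = ""
    for pos, digit in enumerate(reversed(str(n))):
        d = int(digit)
        if d == 0:
            continue
        if pos == 0 and d == 1 and n > 9:
            text = "เอ็ด" + text    # trailing one reads "et"
        elif pos == 1 and d == 2:
            text = "ยี่สิบ" + text   # two in the tens place reads "yi sip"
        elif pos == 1 and d == 1:
            text = "สิบ" + text     # ten is just "sip", not "one sip"
        else:
            text = _DIGITS[d] + _PLACES[pos] + text
    return text

def baht_text(amount: int) -> str:
    return _read_number(amount) + "บาทถ้วน"  # "... baht exactly"
```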
Tokenizer

- `pythainlp.tokenize.Tokenizer` is now immediately available on `import pythainlp` (79432c2) - thanks @korakot
- `ssg`, a CRF syllable segmenter (Questions on the implementation of syllable_tokenize #229, Alternative syllable tokenizer #237, Add ssg #242) - thanks @wannaphong @ponrawee @heytitle
- AttaCut, a fast and accurate tokenizer, is now available through `engine="attacut"` in `pythainlp.tokenize.word_tokenize()` (Integrate AttaCut to PyThaiNLP #258, add attacut to pythainlp/tokenize #261) - thanks @heytitle @bkktimber
- `newmm-safe` for `pythainlp.tokenize.word_tokenize()` - a `newmm` engine with an additional mechanism to avoid a possibly exponentially long wait on long text with many ambiguous break points ("newmm-safe" option -- fix newmm issue, take too long time for long text with lots of ambiguity breaking points #302) - thanks @bact
- `newmm` engine: graph size limit added in `_onecut()` to help avoid a possible long wait (Add graph size limit in _onecut() to avoid long wait for ambiguous text #333) (available in 2.1.1, backported from 2.2) - thanks @bact
- `longest` engine: the last character is now consumed (Longest Match segment fails when the entire input text is a full word. #357) (available in 2.1.4) - thanks @bact
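A minimal sketch of the longest-matching idea behind the `longest` engine (toy code with a made-up dictionary, not PyThaiNLP's implementation), including the edge case where the whole input is a single dictionary word:

```python
def longest_match(text: str, words: set) -> list:
    """Toy longest-matching segmenter (not PyThaiNLP's implementation)."""
    tokens, i = [], 0
    while i < len(text):
        # Scan candidates from longest to shortest. Because j runs all the
        # way up to len(text), an input that is itself one dictionary word
        # is matched whole and its final character is consumed (the edge
        # case reported in #357 was of this flavor).
        for j in range(len(text), i, -1):
            if text[i:j] in words:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # out-of-vocabulary character: emit as-is
            i += 1
    return tokens
```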
Spellchecker

Named-Entity Tagger
Dependency cleaning
Removing and updating many dependencies - thanks @c4n @artificiala @cstorm125 @korakot @bact @wannaphong
Remove:

- `keras`, `tensorflow` (Port Thai2Rom from Keras to PyTorch #202, Thai2Rom on PyTorch (seq2seq no attention mechanism) #235, pytorch seq2seq implementation for Thai romanization #246) - Thai romanization is now implemented in PyTorch
- `fastai` (Remove fastai from the dependencies #252) - removed, replacing the `pythainlp.ulmfit` preprocessing-related code with a self-implemented one
- `marisa-trie` (Change from marisa-trie to a Trie implementation written in python #277) - removed, replaced with a native Trie implementation
- `deepcut` (Remove deepcut, keras, tensorflow from dependencies #283) - removed; the word tokenizer still supports `engine="deepcut"`, but users need to install its dependencies (`deepcut`, `keras`, `tensorflow`) themselves
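A native Trie of the kind that replaced `marisa-trie` can be pictured as a dict-of-dicts; the following minimal class is a hypothetical sketch, not PyThaiNLP's actual implementation:

```python
class Trie:
    """Minimal prefix tree, sketched to illustrate replacing marisa-trie
    with a pure-Python structure (not PyThaiNLP's actual class)."""

    def __init__(self, words=()):
        self.root = {}
        for w in words:
            self.add(w)

    def add(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node["_end"] = True  # marks a complete word

    def __contains__(self, word: str) -> bool:
        node = self.root
        for ch in word:
            if ch not in node:
                return False
            node = node[ch]
        return "_end" in node

    def prefixes(self, text: str):
        """All stored words that are prefixes of text (useful for
        maximal-matching tokenizers)."""
        node, out = self.root, []
        for i, ch in enumerate(text):
            if ch not in node:
                break
            node = node[ch]
            if "_end" in node:
                out.append(text[: i + 1])
        return out
```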
Update:

- `artagger` (Use artagger from main repo, use tensorflow < 2 #281) - updated to use the one from the main repo (it previously depended on a fork)
- `setup.py` (Include only direct dependency in setup.py #275)

Documentation
Others