NEWS

This file lists noteworthy changes between releases. For the full list of changes, see git log and then ChangeLog.old.

Significant changes in 20120401

whole new finntreebank tagset for forthcoming finntreebank work
uppercasing is noted at the analysis level
the word boundaries of lexicalised compounds may be available for more cases (depending on the tagset)
whole new lemmatizer tagset is available
some dozens of words added or fixed
corpus analysis script combined with apertium's preprocessors
causative derivation chain added
abbreviations, adpositions, prefixes and suffixes are no longer pos but subcat analyses

completely new morphology built on the traditional lexc-twolc model
easier route to add new lexical data via a simple CSV format
lots of new lexical data from the Joukahainen project, as well as data extended from kotus-sanalista semi-automatically and by hand
titlecasing filter for regular words
š filter for old orthography variants
compounding is a much less haphazard concoction
parts of speech classified and included: pronouns, interjections, numerals, proper nouns
much closer to a real, full-fledged morphology
movement from SFST to HFST toolset with lots of new cool toys (SFST support is retained in HFST)
towards a full-scale automatic test suite