
Releases: Halvani/Constituent-Treelib

v0.0.7

10 Apr 00:02

What's new?

The previous language detector, fasttext, caused various errors, which among other things led to a dedicated Stack Overflow thread. It has therefore been replaced with langid. I chose langid because it works more reliably and ships with its language model built in, so no external model files or additional dependencies are required.
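
As a rough illustration of why the detector can be swapped without touching the rest of the code, the following minimal sketch puts the language check behind a small function that takes the detector as a parameter. The names `LanguageMismatchError`, `ensure_language` and `toy_detect` are hypothetical and are not part of CTL; the toy detector merely stands in for a real one.

```python
from typing import Callable


class LanguageMismatchError(ValueError):
    """Hypothetical exception for illustration; not CTL's actual class."""


def ensure_language(sentence: str, expected: str, detect: Callable[[str], str]) -> None:
    """Raise if the detected language code differs from the expected one."""
    detected = detect(sentence)
    if detected != expected:
        raise LanguageMismatchError(
            f"Expected language '{expected}', but detected '{detected}'."
        )


def toy_detect(sentence: str) -> str:
    """Stand-in detector (assumption): 'de' if the word 'und' occurs, else 'en'."""
    return "de" if " und " in f" {sentence.lower()} " else "en"


# Passes silently because the toy detector reports 'en'.
ensure_language("The cat sat on the mat.", "en", toy_detect)
```

With langid installed, a real detector could be passed instead, for example `lambda s: langid.classify(s)[0]`, since `langid.classify` returns a tuple of language code and score.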

v0.0.6

24 Mar 00:29

What's new?

  • The structure of the constituent tree can now be modified. By default, both the inner postag nodes and the token leaves are present (Structure.Complete). Alternatively, either the postag nodes or the token leaves can be removed. If the token leaves are removed, the extracted phrases consist of postag sequences.
  • Multiple spaces at the end of a sentence are now removed, as they caused a benepar exception when the sentence was parsed.
  • create_pipeline() now downloads the benepar model to the path "share\nltk_data\models", so that no data is left behind in the CTL directory when CTL is uninstalled.
  • create_pipeline() now accepts a 'quiet' parameter that suppresses the pip installation output.
  • Integrated optional expansion of contractions (e.g., I'm --> I am) within sentences. Note that this is only supported for English.
  • Added comprehensive error handling (e.g., detecting a language mismatch between the given sentence and the benepar and spaCy models). Added custom exceptions that simplify debugging.
  • Extensive code refactoring (e.g., reducing duplicated code, converting all string literals from ' to ", etc.)
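
The two sentence-level preprocessing steps mentioned above (removing surplus spaces and expanding English contractions) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names and the tiny `CONTRACTIONS` table are hypothetical and do not reflect how CTL implements these steps.

```python
import re

# Tiny illustrative mapping (assumption); a real table would be far larger.
CONTRACTIONS = {"i'm": "I am", "don't": "do not", "it's": "it is"}

# One case-insensitive pattern that matches any known contraction as a whole word.
_CONTRACTION_RE = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)


def normalize_whitespace(sentence: str) -> str:
    """Collapse runs of whitespace and strip leading/trailing whitespace."""
    return re.sub(r"\s+", " ", sentence).strip()


def expand_contractions(sentence: str) -> str:
    """Replace known English contractions with their expanded forms."""

    def replace(match: re.Match) -> str:
        word = match.group(0)
        expanded = CONTRACTIONS[word.lower()]
        # Preserve a leading capital letter, e.g. "Don't" becomes "Do not".
        if word[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded

    return _CONTRACTION_RE.sub(replace, sentence)


cleaned = expand_contractions(normalize_whitespace("I'm sure  it's fine.   "))
```

Normalizing whitespace first keeps the contraction step independent of spacing, so each function can be tested on its own.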

v0.0.5

28 Jan 12:41
8470620
  • Fixed a problem with type aliases
  • Fixed several dependency problems
  • Changed the behavior of the create_pipeline() method so that the spaCy and benepar models are now downloaded by default if they are not present.