A list of NLP tools with an emphasis on the command line.

  • AlchemyCmd is a command-line tool for performing natural language processing and text analysis on Linux/Unix systems.
  • Awk and Sed for Language Analysis
  • "Combining the Bourne-shell, sed and awk in the UNIX environment for language analysis" by Lothar M. Schmitt and Kiel T. Christianson. Original, mirror.
  • dbacl is a command-line text classifier. It uses bigrams as features and, as far as I can tell (I’ve only skimmed the source), builds a maximum entropy model for classification. I’ve only played with it a little, but my impression so far is that it’s easy to use, fast, and produces high-quality results.
  • Hunpos is an open source reimplementation of TnT, the well-known part-of-speech tagger by Thorsten Brants.
  • Libtextcat is a library with functions that implement the classification technique described in Cavnar & Trenkle, "N-Gram-Based Text Categorization." It was primarily developed for language guessing, a task on which it is known to perform with near-perfect accuracy.
  • Sentences - A command line sentence tokenizer in Go.
  • TextBlob: Simplified Text Processing
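
To give a feel for the Cavnar & Trenkle technique mentioned under Libtextcat, here is a minimal Python sketch of rank-ordered character n-gram profiles compared with an "out-of-place" distance. It is a simplified illustration, not Libtextcat's code or API: the function names, the tiny training samples, and the parameter choices (`max_n=3`, `top_k=300`) are all assumptions made for this example, and a real language guesser would train on far more text.

```python
from collections import Counter

# Toy training samples (hypothetical; real profiles need much more text).
ENGLISH_SAMPLE = "the quick brown fox jumps over the lazy dog and the cat sat on the mat"
GERMAN_SAMPLE = "der schnelle braune fuchs springt in den wald und die katze sitzt auf der matte"


def ngram_profile(text, max_n=3, top_k=300):
    """Map each of the top_k most frequent character n-grams to its rank (0 = most frequent)."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries with underscores
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    ranked = [gram for gram, _ in counts.most_common(top_k)]
    return {gram: rank for rank, gram in enumerate(ranked)}


def out_of_place(doc_profile, category_profile):
    """Sum of rank differences; n-grams missing from the category get a fixed maximum penalty."""
    max_penalty = len(category_profile)
    total = 0
    for gram, rank in doc_profile.items():
        if gram in category_profile:
            total += abs(rank - category_profile[gram])
        else:
            total += max_penalty
    return total


def classify(text, category_profiles):
    """Return the name of the category whose profile is closest to the text's profile."""
    doc_profile = ngram_profile(text)
    return min(
        category_profiles,
        key=lambda name: out_of_place(doc_profile, category_profiles[name]),
    )


PROFILES = {
    "english": ngram_profile(ENGLISH_SAMPLE),
    "german": ngram_profile(GERMAN_SAMPLE),
}

if __name__ == "__main__":
    print(classify("the dog and the cat", PROFILES))
```

The category with the smallest distance wins, so adding a language only requires building one more profile from sample text.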