A list of NLP tools with an emphasis on the command line.
- AlchemyCmd is a command-line tool for performing natural language processing and text analysis on Linux/Unix systems.
- Awk and Sed for Language Analysis
- "Combining the Bourne-shell, sed and awk in the UNIX environment for language analysis" by Lothar M. Schmitt and Kiel T. Christianson.
- dbacl is a command-line text classifier. It uses bigrams as features and, as far as I can tell (I’ve only skimmed the source), builds a maximum entropy model for classification. I’ve only played with it a little, but so far my impression is that it’s easy to use, fast, and produces high-quality results.
- Hunpos is an open-source reimplementation of TnT, the well-known part-of-speech tagger by Thorsten Brants.
- Libtextcat is a library with functions that implement the classification technique described in Cavnar & Trenkle, "N-Gram-Based Text Categorization." It was primarily developed for language guessing, a task on which it is known to perform with near-perfect accuracy.
- Sentences - A command-line sentence tokenizer written in Go.
- TextBlob: Simplified Text Processing
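The Libtextcat entry names the Cavnar & Trenkle technique. Here is a minimal Python sketch of the general idea, not libtextcat's code or API. It builds a ranked profile of character 1- to 5-grams for each category, then assigns a text to the category with the smallest "out-of-place" distance. The profile size of 300, the `_` padding character, the fixed penalty for missing n-grams, and the names `ngram_profile`, `out_of_place` and `classify` are all assumptions made for illustration. The two training sentences are toy data.

```python
import re
from collections import Counter

# Assumed parameters for this sketch; the paper's settings and
# libtextcat's defaults may differ.
MAX_N = 5           # n-gram lengths 1..MAX_N
PROFILE_SIZE = 300  # keep only the most frequent n-grams


def ngram_profile(text: str, max_n: int = MAX_N, size: int = PROFILE_SIZE) -> dict[str, int]:
    """Map each of the `size` most frequent character n-grams to its rank (0 = most frequent)."""
    counts: Counter[str] = Counter()
    # Split into runs of letters and pad each word with "_" so that
    # word-initial and word-final n-grams are distinguishable.
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        padded = f"_{word}_"
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending frequency; break ties alphabetically so the ranking is deterministic.
    ranked = sorted(counts, key=lambda g: (-counts[g], g))[:size]
    return {gram: rank for rank, gram in enumerate(ranked)}


def out_of_place(doc: dict[str, int], category: dict[str, int],
                 max_penalty: int = PROFILE_SIZE) -> int:
    """Sum of rank differences; n-grams missing from the category cost `max_penalty`."""
    return sum(
        abs(rank - category[gram]) if gram in category else max_penalty
        for gram, rank in doc.items()
    )


def classify(text: str, profiles: dict[str, dict[str, int]]) -> str:
    """Return the label whose profile has the smallest out-of-place distance to the text."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda label: out_of_place(doc, profiles[label]))


# Toy usage: one tiny training sentence per language (hypothetical data).
profiles = {
    "en": ngram_profile("the cat sat on the mat and the dog ate the hat"),
    "de": ngram_profile("die katze sitzt auf der matte und der hund frisst den hut"),
}
print(classify("the cat", profiles))  # en
```

Missing n-grams get a fixed penalty instead of one tied to each category's profile length. This keeps distances comparable when categories have profiles of different sizes. Real language guessers train on far more text than the single sentences used here.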