Statistical and neural based methods for extracting features from text

nbarba/NLP-Related

NLP-Classification

This repo contains Python implementations of methods for extracting features from text, which I have used in my research, mostly for user-input classification tasks.
Two approaches are implemented:

  • One based on word embeddings, described as one of the baseline methods in [1].
  • A typical statistical n-gram language modeling approach that estimates the conditional probability of a sentence given a class.
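
As a rough illustration of the first approach, the sketch below turns a tokenized sentence into a fixed-length feature vector by taking a (optionally weighted) mean of its word vectors. It is not this repo's code: the names `TOY_EMBEDDINGS` and `sentence_features` are hypothetical, the two-dimensional vectors are invented, and the exact aggregation used in the repo or in [1] may differ.

```python
from typing import Dict, List, Optional

# Hypothetical toy vectors for illustration only; a real setup would load
# pretrained word embeddings instead.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "free": [1.0, 0.0],
    "prize": [0.8, 0.2],
    "lunch": [0.0, 1.0],
}


def sentence_features(
    tokens: List[str],
    embeddings: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Return the weighted mean of the vectors of in-vocabulary tokens.

    Tokens without a vector are skipped. With weights=None every token
    counts equally, which gives a plain average.
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for token in tokens:
        vector = embeddings.get(token)
        if vector is None:
            continue  # out-of-vocabulary token
        w = 1.0 if weights is None else weights.get(token, 1.0)
        for i, value in enumerate(vector):
            total[i] += w * value
        weight_sum += w
    if weight_sum == 0.0:
        return total  # no known tokens: all-zero feature vector
    return [t / weight_sum for t in total]


features = sentence_features("free prize today".split(), TOY_EMBEDDINGS)
```

The resulting vector has the same length for every sentence, so it can be passed to any standard classifier. Passing a `weights` dictionary (for example, idf-style weights) changes the plain average into a weighted one.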

API Reference

To do.

Toy Example

A toy example is provided to play around with. The dataset used is a randomly selected subset of the "SMS Spam Collection" dataset, available at the UCI Machine Learning Repository.
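
To give a feel for the n-gram approach on this kind of data, here is a minimal sketch of a class-conditional bigram model with add-one smoothing. It is an assumption-laden illustration, not the repo's toy example: the function names (`train`, `log_prob`, `classify`) are hypothetical, and the four messages in `TOY_LINES` are invented. They only mimic the label, tab, text layout of the SMS Spam Collection file.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"

# Invented messages in "label<TAB>text" form; not taken from the real dataset.
TOY_LINES = [
    "spam\twin a free prize now",
    "spam\tfree cash prize call now",
    "ham\tare we meeting for lunch",
    "ham\tcall me after lunch",
]


def to_bigrams(tokens):
    """Pad a token list with sentence boundaries and return its bigrams."""
    padded = [BOS] + list(tokens) + [EOS]
    return list(zip(padded, padded[1:]))


def train(lines):
    """Count bigrams and their contexts separately for each class."""
    bigram_counts = {}   # label -> Counter of (previous word, word)
    context_counts = {}  # label -> Counter of previous word
    vocab = {EOS}        # shared across classes so scores are comparable
    for line in lines:
        label, text = line.split("\t", 1)
        tokens = text.lower().split()
        vocab.update(tokens)
        bc = bigram_counts.setdefault(label, Counter())
        cc = context_counts.setdefault(label, Counter())
        for prev, word in to_bigrams(tokens):
            bc[(prev, word)] += 1
            cc[prev] += 1
    return bigram_counts, context_counts, len(vocab)


def log_prob(text, label, model):
    """Add-one smoothed log P(sentence | class) under a bigram model."""
    bigram_counts, context_counts, vocab_size = model
    bc, cc = bigram_counts[label], context_counts[label]
    total = 0.0
    for prev, word in to_bigrams(text.lower().split()):
        total += math.log((bc[(prev, word)] + 1) / (cc[prev] + vocab_size))
    return total


def classify(text, model):
    """Pick the class whose model gives the sentence the highest score."""
    return max(model[0], key=lambda label: log_prob(text, label, model))


model = train(TOY_LINES)
scores = {label: log_prob("free prize now", label, model) for label in model[0]}
```

The per-class log-probabilities in `scores` can be used directly as features, or compared as in `classify`. This sketch ignores class priors and uses whitespace tokenization, both of which are simplifications.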

References

  1. Cedric De Boom, Steven Van Canneyt, Thomas Demeester, and Bart Dhoedt. 2016. Representation learning for very short texts using weighted word embedding aggregation. Pattern Recogn. Lett. 80, C (September 2016), 150-156. DOI: https://doi.org/10.1016/j.patrec.2016.06.012
