# Text Analysis and Machine Learning

This is a simple study project on text analysis and machine learning. Its initial task is to generate a graphical representation of the related entity words found in a given unstructured text. Along the way I first learned how individual words are extracted, and why removing suffixes (stemming) matters for recovering the meaning from the root word. Then I came across the word2vec algorithm, found it very useful, and decided to learn it first.

The concepts I have learned so far:

Task 1 --

  • Word extraction
  • Stemming
  • Word to Vector (Initial Part)
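
The first two items of Task 1 can be illustrated with a minimal sketch. It is not the project's own code: the function names `extract_words` and `naive_stem` and the short `SUFFIXES` list are hypothetical, and the suffix stripping is deliberately naive compared with a real stemmer such as Porter's.

```python
import re

# Hypothetical suffix list, for illustration only; a real stemmer
# (for example the Porter stemmer) applies many more rules.
SUFFIXES = ("ing", "ed", "es", "s")


def extract_words(text):
    """Lowercase the text and return its alphabetic tokens in order."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least `min_stem` letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


words = extract_words("The cats were jumping; one cat jumped.")
stems = [naive_stem(w) for w in words]
# "cats" and "cat" now share the stem "cat";
# "jumping" and "jumped" share the stem "jump".
```

Mapping inflected forms to one root this way means later steps count "cat" and "cats" as the same word.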

Task 2 --

  • Implement the SVD algorithm
  • Add the important part of the project: the Singular Value Decomposition of the word co-occurrence matrix.
  • I think there might be pseudo-relations because stopwords were not removed, so I considered the stopwords listed in the NLTK toolkit.
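
The Task 2 steps can be sketched as follows, under several assumptions. The tiny `STOPWORDS` set stands in for NLTK's list, the function names are hypothetical, and power iteration is swapped in for a full SVD: it recovers only the leading singular value and vectors of the co-occurrence matrix, which is enough to show the idea using only the standard library.

```python
import math

# Tiny illustrative stopword set; the project uses NLTK's list instead.
STOPWORDS = {"the", "a", "of", "and", "is", "in"}


def cooccurrence(tokens, window=2):
    """Return (vocab, matrix) of symmetric counts within `window` positions."""
    tokens = [t for t in tokens if t not in STOPWORDS]
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    m = [[0.0] * len(vocab) for _ in vocab]
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            a, b = index[w], index[tokens[j]]
            m[a][b] += 1.0
            m[b][a] += 1.0
    return vocab, m


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def top_singular_triplet(m, iters=200):
    """Leading (sigma, u, v) of m, found by power iteration on m^T m."""
    mt = [list(col) for col in zip(*m)]
    n = len(m[0])
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = matvec(mt, matvec(m, v))
        length = norm(w)
        if length == 0.0:
            break
        v = [x / length for x in w]
    av = matvec(m, v)
    sigma = norm(av)
    u = [x / sigma for x in av] if sigma else av
    return sigma, u, v


tokens = "the king and the queen of the castle".split()
vocab, matrix = cooccurrence(tokens, window=1)
sigma, u, v = top_singular_triplet(matrix)
```

Without the stopword filter, "the" would co-occur with every other word and dominate the matrix, which is the kind of pseudo-relation noted above.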

Next tasks:

  • Create the CBOW and Skip-gram algorithms

  • Implement Word to Vector using SVD in the CBOW and Skip-gram algorithms
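
As a starting point for the CBOW and Skip-gram items, here is a minimal sketch of how the two models frame their training examples. It covers only the data preparation, not the training itself, and the function names are hypothetical.

```python
def context_window(tokens, i, window):
    """Tokens within `window` positions of index i, excluding tokens[i]."""
    left = tokens[max(0, i - window): i]
    right = tokens[i + 1: i + 1 + window]
    return left + right


def skipgram_pairs(tokens, window=1):
    """(center, context) pairs: the center word predicts each neighbor."""
    return [
        (tokens[i], c)
        for i in range(len(tokens))
        for c in context_window(tokens, i, window)
    ]


def cbow_pairs(tokens, window=1):
    """(context_list, center) pairs: the neighbors together predict the center."""
    return [
        (context_window(tokens, i, window), tokens[i])
        for i in range(len(tokens))
    ]


tokens = ["king", "rules", "castle"]
sg = skipgram_pairs(tokens)
cb = cbow_pairs(tokens)
```

The two models see the same windows and differ only in direction: Skip-gram goes from the center word to each context word, while CBOW goes from the whole context to the center word.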

Future Vision --

  • The main goal of this study is to generate, from unstructured text, a graphical representation of the relations between real-world entities.

  • For the graphical view, use D3.js with a JSON conversion of my C++ API output.

  • Create a site (a portal) where a user uploads a text file and the graphical representation is generated on the website.
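
The JSON step might look like the sketch below. It is written in Python as a stand-in for the C++ API, and it assumes the `nodes`/`links` layout with `id`, `source`, `target`, and `value` keys that D3.js force-directed graph examples commonly use. The function name `graph_to_d3_json` and the `threshold` parameter are hypothetical.

```python
import json


def graph_to_d3_json(vocab, matrix, threshold=1.0):
    """Serialize a symmetric co-occurrence matrix as a nodes/links JSON graph.

    Only pairs whose count reaches `threshold` become links, and each
    undirected pair is emitted once (i < j).
    """
    nodes = [{"id": w} for w in vocab]
    links = [
        {"source": vocab[i], "target": vocab[j], "value": matrix[i][j]}
        for i in range(len(vocab))
        for j in range(i + 1, len(vocab))
        if matrix[i][j] >= threshold
    ]
    return json.dumps({"nodes": nodes, "links": links}, indent=2)


vocab = ["castle", "king", "queen"]
matrix = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
payload = graph_to_d3_json(vocab, matrix)
```

A page could then load `payload` and pass the parsed `nodes` and `links` arrays to a D3.js force simulation.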