TextGraphCrafts

Python-based summary, keyphrase and relation extractor from text documents using dependency graphs.

HOME: https://github.com/ptarau/TextGraphCrafts

Project Description

**The system uses dependency links to build text graphs from which, with the help of a centrality algorithm like PageRank, it extracts relevant keyphrases, summaries and relations from text documents. Developed with Python 3 on OS X, but portable to Linux.**
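
To make the idea concrete, here is a minimal sketch of ranking words by PageRank over dependency-style links. It is an illustration only and not the project's implementation: the edge list is written by hand instead of coming from a parser, the edge direction (dependent to head) is an assumption, and the function name `pagerank` and its parameters are hypothetical.

```python
from collections import defaultdict

def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)
    # Start from a uniform distribution over all nodes.
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank

# Hypothetical links (dependent -> head) for "A cat chased the small mouse".
# A real run would take such links from a dependency parser.
EDGES = [
    ("A", "cat"), ("cat", "chased"),
    ("the", "mouse"), ("small", "mouse"), ("mouse", "chased"),
]

ranks = pagerank(EDGES)
top_word = max(ranks, key=ranks.get)
print(top_word)
```

In this toy graph the highest-ranked node is the word that most links lead to, which is the intuition behind picking keyphrases and summary sentences by centrality.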

Dependencies:

Python 3.7 or newer, pip3, Java 9.x or newer. Having git installed is also recommended for easy updates.
pip3 install nltk
then run something like the following in python3:

import nltk
nltk.download('wordnet')
nltk.download('words')
nltk.download('stopwords')

or, if that fails on a Mac, run python3 down.py to collect the desired nltk resource files.
pip3 install networkx
pip3 install requests
pip3 install graphviz (also ensure that .gv files can be viewed)
pip3 install stanfordnlp (the parser)
Note that stanfordnlp requires torch binaries, which are easier to install with anaconda.

Tested with the above on a Mac with macOS Mojave and Catalina, and on Ubuntu Linux 18.x.
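
A quick way to see which of the dependencies listed above are already present is a small check script. This is a sketch only: the function name `check_dependencies` is hypothetical, and it only tests whether each Python package can be found and whether the `java` and `git` commands are on the PATH, not whether their versions are recent enough.

```python
import importlib.util
import shutil

# Names taken from the dependency list above.
PYTHON_PACKAGES = ["nltk", "networkx", "requests", "graphviz", "stanfordnlp"]
COMMANDS = ["java", "git"]

def check_dependencies(packages=PYTHON_PACKAGES, commands=COMMANDS):
    """Return a dict mapping each dependency name to True if it was found."""
    report = {}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    for name in commands:
        # shutil.which returns None when the command is not on the PATH.
        report[name] = shutil.which(name) is not None
    return report

for name, found in check_dependencies().items():
    print(f"{name}: {'ok' if found else 'MISSING'}")
```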

Running it:

in a shell window, run

start_server.sh

in another shell window, start with

python3 -i tests.py

and then interactively, at the ">>>" prompt, try

>>> test1()
>>> test2()
>>> ...
>>> test9()
>>> test12()
>>> test0()

see how to activate other outputs in file

deepRank.py

text file inputs (including the US Constitution const.txt) are in the folder

examples/

Handling PDF documents

The easiest way to do this is to install pdftotext, which is part of Poppler tools.

If pdftotext is installed, you can place a file like textrank.pdf in the subdirectory pdfs/ and try something similar to:
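
The conversion step itself can be sketched as follows, assuming pdftotext is on the PATH. The helper names `pdftotext_command` and `pdf_to_text` are hypothetical and are not part of TextGraphCrafts; the sketch only calls the command-line tool in its basic form, `pdftotext input.pdf output.txt`.

```python
import shutil
import subprocess
from pathlib import Path

def pdftotext_command(pdf_path):
    """Build the pdftotext call that writes a .txt file next to the PDF."""
    pdf = Path(pdf_path)
    return ["pdftotext", str(pdf), str(pdf.with_suffix(".txt"))]

def pdf_to_text(pdf_path):
    """Convert a PDF to plain text and return the path of the text file."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found; install the Poppler tools first")
    command = pdftotext_command(pdf_path)
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return command[-1]

# Example call (requires pdftotext and the file to exist):
# text_file = pdf_to_text("pdfs/textrank.pdf")
```

The resulting text file can then be used like any of the text inputs in examples/.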

Change the settings in the file params.py to run the system with other global parameter settings.

Alternative NLP toolkit

Optionally, you can activate the alternative Stanford CoreNLP toolkit as follows:

install Stanford CoreNLP and unzip it in a directory of your choice (e.g., the local directory)
if needed, edit start_parser.sh to point to the location of the parser directory
override the params class and set corenlp=True
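
The override in the last step can be sketched as below. The `params` class shown here is a stand-in written for illustration; only the `corenlp` flag is taken from this README, and the subclass name `corenlp_params` is hypothetical. In practice you would subclass the project's own params class instead of redefining it.

```python
class params:
    """Stand-in for the project's params class (illustration only)."""
    corenlp = False  # assumed default: the alternative toolkit is off

class corenlp_params(params):
    """Same settings as params, but with the CoreNLP toolkit switched on."""
    corenlp = True

print(params.corenlp, corenlp_params.corenlp)
```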

Note, however, that Stanford CoreNLP is GPL-licensed, which can place restrictions on proprietary software that activates this option.

