Python-based summary, keyphrase and relation extractor from text documents using dependency graphs.
HOME: https://github.com/ptarau/TextGraphCrafts
**The system builds Text Graphs from dependency links and, with the help of a centrality algorithm like PageRank, extracts relevant keyphrases, summaries and relations from text documents. Developed with Python 3 on OS X, but portable to Linux.**
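As a rough illustration of the idea, the sketch below ranks the words of a tiny hand-written graph with a plain power-iteration PageRank. It is not the code used in this repository: the `pagerank` function, the `toy_edges` list and the edge direction (from a word to the word it depends on) are assumptions made for the example, and a real run would take its links from the dependency parser.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n
                    for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical toy links, written by hand rather than produced by a parser.
toy_edges = [
    ("centrality", "algorithm"),
    ("algorithm", "graph"),
    ("text", "graph"),
    ("dependency", "graph"),
    ("keyphrases", "graph"),
]

ranks = pagerank(toy_edges)
best_word = max(ranks, key=ranks.get)
```

Words that many other words point to end up with the highest rank, which is the intuition behind picking keyphrases by centrality.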
- Python 3.7 or newer, pip3, Java 9.x or newer; having git installed is also recommended for easy updates
- `pip3 install nltk`
- also, run in python3 something like
import nltk
nltk.download('wordnet')
nltk.download('words')
nltk.download('stopwords')
- or, if that fails on a Mac, run `python3 down.py` to collect the desired nltk resource files
- `pip3 install networkx`
- `pip3 install requests`
- `pip3 install graphviz`, and also ensure that .gv files can be viewed
- `pip3 install stanfordnlp` for the parser
- Note that `stanfordnlp` requires torch binaries, which are easier to install with `anaconda`.
Tested with the above on a Mac, with macOS Mojave and Catalina and on Ubuntu Linux 18.x.
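A quick way to see which of the Python packages listed above are still missing is sketched below. The helper name `missing_packages` is hypothetical and not part of this repository; it only checks that each package can be located for import, not that it works.

```python
import importlib.util

# Package names taken from the pip3 install lines above.
REQUIRED = ["nltk", "networkx", "requests", "graphviz", "stanfordnlp"]


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", ", ".join(missing) if missing else "none")
```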
To run the system, start the server with `start_server.sh`, then start `python3 -i tests.py` and, interactively at the `>>>` prompt, try:
>>> test1()
>>> test2()
>>> ...
>>> test9()
>>> test12()
>>> test0()
deepRank.py
examples/
The easiest way to work with PDF documents is to install pdftotext, which is part of the Poppler tools.
If pdftotext is installed, you can place a file like textrank.pdf already in subdirectory pdfs/ and try something similar to:
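As a minimal sketch of the conversion step alone, assuming pdftotext is on the PATH, the following calls it from Python. The helper names `pdftotext_command` and `pdf_to_txt` are hypothetical and are not functions of this repository; the command line follows the usual `pdftotext PDF-file text-file` form.

```python
import shutil
import subprocess
from pathlib import Path


def pdftotext_command(pdf_path, txt_path=None):
    """Build the pdftotext command line for one PDF file."""
    pdf_path = Path(pdf_path)
    # Default output: same name with a .txt suffix, next to the PDF.
    txt_path = Path(txt_path) if txt_path else pdf_path.with_suffix(".txt")
    return ["pdftotext", str(pdf_path), str(txt_path)]


def pdf_to_txt(pdf_path, txt_path=None):
    """Convert a PDF to plain text.

    Returns the output path, or None when pdftotext is not installed.
    """
    if shutil.which("pdftotext") is None:
        return None
    cmd = pdftotext_command(pdf_path, txt_path)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(cmd[2])
```

With this sketch, `pdf_to_txt("pdfs/textrank.pdf")` would write the extracted text next to the PDF, with a `.txt` suffix.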
Change the settings in file params.py to run the system with other global parameter settings.
Optionally, you can activate the alternative Stanford CoreNLP toolkit as follows:
- install Stanford CoreNLP and unzip it in a directory of your choice (e.g., the local directory)
- edit, if needed, `start_parser.sh` with the location of the parser directory
- override the `params` class and set `corenlp=True`
Note, however, that Stanford CoreNLP is GPL-licensed, which can place restrictions on proprietary software that activates this option.
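The override step might look roughly like the sketch below. The `params` class here is only a stand-in for the one defined in params.py, its `corenlp = False` default is an assumption, and the subclass name `corenlp_params` is hypothetical. How the overriding class is then handed to the system is not shown, since that depends on the code in this repository.

```python
# Stand-in for the class defined in params.py; only the corenlp flag
# is named in this README, and its default value here is assumed.
class params:
    corenlp = False


# Hypothetical subclass that switches to the Stanford CoreNLP toolkit.
class corenlp_params(params):
    corenlp = True
```

Subclassing keeps the defaults in params.py untouched, so the alternative toolkit can be switched on and off without editing the shared settings.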