Python News Scraper

A simple Python scraper that grabs text from news sites for sentiment analysis and creates a word cloud from it.

Running the script

This script relies on the matplotlib, wordcloud, requests, BeautifulSoup (installed as beautifulsoup4), and nltk packages, plus the json module from the Python standard library, which does not need to be installed.

git clone https://github.com/jcbond92/python-news-scraper.git
cd python-news-scraper
pip install matplotlib wordcloud requests beautifulsoup4 nltk
python run.py

Files will be output to the app/results subdirectory.

Editing the pages that are analyzed

To analyze more pages, add entries to the pages list in run.py:

pages = [
    {
        "url": "https://www.washingtonpost.com", # URL of the page to request
        "name": "wash-post-homepage-headlines", # a name used when the output files are created
        "cssSelector": "h2" # the CSS selector used to grab the text for evaluation (this grabs all instances of that element)
    },
    {
        "url": "https://www.washingtonpost.com/us-policy/2021/10/04/biden-schumer-debt-ceiling/",
        "name": "wash-post-debt-ceiling",
        "cssSelector": "section"
    }
]
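
To illustrate what the cssSelector setting does, here is a minimal sketch of selecting text with BeautifulSoup. It is not the code from run.py: the extract_text helper and the sample_html markup are hypothetical, and a real run would presumably pass in HTML downloaded with requests instead of a hard-coded string.

```python
from bs4 import BeautifulSoup


def extract_text(html, css_selector):
    """Return the stripped text of every element matching css_selector.

    Hypothetical helper for illustration; not taken from run.py.
    """
    soup = BeautifulSoup(html, "html.parser")
    # select() returns all matching elements, not just the first one
    return [element.get_text(strip=True) for element in soup.select(css_selector)]


# Hypothetical sample markup standing in for a downloaded page
sample_html = """
<html><body>
  <h2>First headline</h2>
  <p>Body text that an "h2" selector skips.</p>
  <h2>Second headline</h2>
</body></html>
"""

headlines = extract_text(sample_html, "h2")
print(headlines)
```

A broad selector such as "section" or "p" collects more text per page, while a narrow one such as "h2" keeps only headlines, so the choice of selector likely has a large effect on the resulting sentiment and word cloud.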
