Content aggregator that collects metadata about news articles

pi-sigma/nous-aggregator

Nous Aggregator

Python 3.11 Django 5.01 Django CI License: MIT

Overview

A content aggregator that collects metadata about articles from newspapers, journals, blogs, and similar sources. The scraper combines knowledge of each target page's structure with regular expressions to scrape selectively and filter the results. To make it easier to start a new project, the data the scraper needs is exported to files in the fixtures directory.
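To illustrate the regular-expression half of that idea, here is a minimal sketch using only the standard library. It assumes that a source is described by a page URL and a regex that article URLs must match; the page-structure half (restricting the search to particular elements) is left out. The names `LinkCollector`, `select_article_links` and the date-based pattern are hypothetical illustrations, not the project's actual scraper code.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (absolute URL, link text) pairs for every <a href> on a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[tuple[str, str]] = []
        self._href: str | None = None  # URL of the <a> currently open, if any
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the link text
            title = " ".join("".join(self._text).split())
            self.links.append((self._href, title))
            self._href = None


def select_article_links(html: str, base_url: str, url_pattern: str) -> list[tuple[str, str]]:
    """Keep links whose URL matches the (hypothetical) per-source regex, deduplicated by URL."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    regex = re.compile(url_pattern)
    seen: set[str] = set()
    results: list[tuple[str, str]] = []
    for href, title in collector.links:
        if title and regex.search(href) and href not in seen:
            seen.add(href)
            results.append((href, title))
    return results


# Usage: only dated article URLs survive; the "/about" link is filtered out
page = """
<ul>
  <li><a href="/2024/01/05/some-story">Some story</a></li>
  <li><a href="/about">About us</a></li>
  <li><a href="https://example.com/2024/01/06/another">Another  story</a></li>
</ul>
"""
links = select_article_links(page, "https://example.com/", r"/\d{4}/\d{2}/\d{2}/")
print(links)
```

Deduplicating by URL matters because index pages often link to the same article more than once, for example from a headline and a thumbnail.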

Inspired by AllTop.

Installation

Make sure Docker and Docker Compose are installed on your system.

Clone the repository into a directory of your choice:

mkdir MYAPPDIR
git clone https://github.com/pi-sigma/nous-aggregator.git MYAPPDIR
cd MYAPPDIR

Inside the new directory, create a file for the environment variables:

touch .env

Open the file with the editor of your choice and set the environment variables. See env-sample for instructions.

Build the Docker image:

docker-compose build

Start the web container in detached mode, apply the migrations, and load the news sources from the fixtures into the database:

docker-compose up -d web
docker-compose run web python manage.py migrate
docker-compose run web python manage.py loaddata fixtures/sources.json

Create a superuser for the Django app:

docker-compose run web python manage.py createsuperuser

Stop the containers:

docker-compose stop web
docker-compose stop db

Usage

Start the Docker containers:

docker-compose up

You can access the page at one of the following addresses:

http://0.0.0.0:8000
http://127.0.0.1:8000
http://localhost:8000

If all went well, you should see the app's homepage with the news sources arranged in a grid. The grid is empty at first and fills up once the Celery workers have run; how soon that happens depends on the schedule defined in scraper.tasks.

To export the source data from the database (for example, to update the fixtures), run the following command while the web container is running; the commands for the other models are analogous:

docker-compose run web python manage.py dumpdata articles.source --indent 2 > fixtures/sources.json
