A complex web application for reading personalized news. Major components include a React + Express front end and back end, message-queue-based data pipelines, news topic classifiers, and user preference prediction.

xuwenzhe/news-recommend

Real-time News Scraping and Recommendation System

This system is a complex web application that lets users read personalized news content from around the world.

Architecture

The major components are:

(Figure: project architecture diagram)

  • Front-End:
    • renders a React single-page application
  • Node Server:
    • handles client-side authentication requests (SignUp, Login)
    • forwards "get_news(userID, pageID)" calls to the backend server
    • forwards "post_click_log(userID, newsID)" calls to the backend server
  • Backend Server (RPC):
    • handles the "get_news" request from the web server
    • sends each click log to the message queue
  • Recommendation (RPC):
    • updates the user's topic-preference list based on their click behavior
    • recommends news belonging to the topics the user is interested in
  • Real-time News Data Pipeline:
    • monitors and scrapes news from various media sources and saves it into a database
    • removes literal and semantic duplicates
  • News Classifier:
    • classifies news into topic categories given their text

Tech Stack

  • React - A JavaScript library for building user interfaces
  • ExpressJS - a minimal and flexible Node.js web application framework that provides a robust set of features for web and mobile applications
  • MongoDB - a cross-platform document-oriented database program
  • Redis - an in-memory data structure store that can be used as a distributed key-value database with optional durability
  • CloudAMQP - RabbitMQ (an open-source message-broker software that originally implemented the Advanced Message Queuing Protocol) as a service
  • Keras - an open-source neural-network library written in Python
  • TensorFlow - a free and open-source software library for dataflow and differentiable programming across a range of tasks
  • News API - A JSON API for live news and blog articles
  • JSONRPClib - a library that implements the JSON-RPC 2.0 proposed specification in pure Python
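To make the RPC layer more concrete, here is a rough sketch of what a JSON-RPC 2.0 message for the "get_news" call might look like on the wire. It uses only the Python standard library rather than JSONRPClib, and the helper names `build_request` and `parse_response`, along with the example argument values, are hypothetical; only the method name and its (userID, pageID) arguments come from the architecture list.

```python
import itertools
import json
from typing import Any

# Simple incrementing request ids; JSON-RPC 2.0 only requires that the
# response echoes the id of the request it answers.
_request_ids = itertools.count(1)


def build_request(method: str, *params: Any) -> str:
    """Serialize a JSON-RPC 2.0 request with positional parameters."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "method": method,
            "params": list(params),
            "id": next(_request_ids),
        }
    )


def parse_response(raw: str) -> Any:
    """Return the "result" member, or raise if the server sent an "error"."""
    message = json.loads(raw)
    if "error" in message:
        error = message["error"]
        raise RuntimeError(f"RPC error {error.get('code')}: {error.get('message')}")
    return message["result"]


# Example: the request body for get_news(userID, pageID) with made-up values.
request_body = build_request("get_news", "user-1", 2)
```

A "post_click_log(userID, newsID)" call would be built the same way, with only the method name and parameters changing.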

Build Guide

  1. UI-Design
  2. Node Server
  3. Authentication
  4. MongoDB, RabbitMQ
  5. Data Pipeline
  6. Pagination, Click log collector
  7. Recommendation
  8. Classifier

Author: Wenzhe Xu

(Figure: application screenshot)
