This repository is a monolith version of the Keep-Current project.
The goal is to locate new articles (scientific papers) from different sources, e.g. arxiv.org, then parse and process them.
It exposes a REST API for the client to use through the UI.
To run this project locally, you first need to install the dependency packages. To install them, you can use
Install pipenv
sudo easy_install pip # if you haven't installed pip
pip install pipenv # install pipenv
brew install pipenv # with homebrew (on macOS)
Before installing, we configure pipenv to create the virtual environment inside the project folder:
| OS | CLI command |
|---|---|
| Windows | `set PIPENV_VENV_IN_PROJECT=true` |
| Mac / Linux | `export PIPENV_VENV_IN_PROJECT=true` |
| Docker | `ENV PIPENV_VENV_IN_PROJECT true` |
Now we're ready to install the packages and run the server:
pipenv install # install all packages
pipenv run flask run # run the server
If you are on Windows, some packages (specifically feedparser) may fail to install. In case the web server doesn't run, please install these packages manually using
pip install feedparser
If you have anaconda installed, it's recommended to create an environment for the project, and install the dependencies in it.
conda create -q -n web-miner python=3.6 # create the environment
source activate web-miner # activate the environment
pip install pipenv
pipenv install
and test your installation by running the web server:
flask run # start server
Alternatively, you can set up the environment with virtualenv:
sudo easy_install pip # install pip if you haven't
pip3 install --upgrade virtualenv # install virtualenv
virtualenv --python=python3 <targetDirectory> # create the environment
source <targetDirectory>/bin/activate # activate the virtualenv
pip install pipenv
pipenv install
flask run # start server
This project intends to be a shared work of meetup members. Beside the obvious result, its purpose is to serve as a learning platform, while advancing the Natural Language Processing / Machine Learning field by exploring, comparing and hacking different models.
Our project board is located here on GitHub, and we use Slack as our communication channel; please use this link to join. There's also a Facebook group where we discuss and share current topics, also outside of the project.
We welcome anyone who would like to join and contribute.
Please see our contributing guide.
We meet regularly, every month in Vienna, to show progress and discuss the next steps.
After studying a topic, keeping current with the news, published papers, advanced technologies and the like proves to be hard work. One must attend conventions, subscribe to various websites and newsletters, and go over countless emails and alerts, all while filtering the relevant data out of these sources.
In this project, we aspire to create a platform for students, researchers, professionals and enthusiasts to discover news on relevant topics. Users are encouraged to constantly give feedback on the suggestions, so the system can adapt and personalize future results.
The goal is to create an automated system that scans the web through a list of trusted sources, classifies and categorizes the documents it finds, and matches them to the different users according to their interests. It then presents them to the user as a timely summarized digest, whether by email or within a site.
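As a rough illustration of the matching step described above (this is a toy sketch, not the project's actual code, which would use a real classification model rather than keyword overlap):

```python
# Toy sketch of matching documents to a user's interests by keyword
# overlap. A real system would classify documents with an ML model;
# this only illustrates the score-and-rank idea behind the digest.

def match_documents(documents, interests, top_n=3):
    """Rank documents by how many interest keywords appear in their text."""
    interests = {kw.lower() for kw in interests}
    scored = []
    for doc in documents:
        words = set(doc["text"].lower().split())
        score = len(words & interests)
        if score > 0:  # drop documents with no overlap at all
            scored.append((score, doc["title"]))
    scored.sort(reverse=True)  # highest score first
    return [title for _, title in scored[:top_n]]
```

The top-ranked titles would then feed into the summarized digest sent to each user.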
This repository is the web miner. It encourages you to learn about software architecture, mining the web, setting up web spiders, scheduling cron jobs, creating pipelines, etc.
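To give a flavor of what mining a source like arxiv.org involves (a minimal standard-library sketch, not the repository's actual implementation; the query parameters are assumptions):

```python
# Minimal sketch of fetching and parsing papers from the public arXiv
# Atom API using only the standard library. Illustrative only.
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom XML namespace

def parse_entries(atom_xml):
    """Extract title, summary and link from an arXiv Atom feed document."""
    root = ET.fromstring(atom_xml)
    entries = []
    for entry in root.iter(ATOM + "entry"):
        entries.append({
            "title": entry.findtext(ATOM + "title", "").strip(),
            "summary": entry.findtext(ATOM + "summary", "").strip(),
            "link": entry.findtext(ATOM + "id", "").strip(),
        })
    return entries

def fetch_arxiv(query="cat:cs.CL", max_results=5):
    """Fetch the latest entries for a search query from the arXiv API."""
    url = ("http://export.arxiv.org/api/query?"
           f"search_query={query}&max_results={max_results}")
    with urllib.request.urlopen(url) as resp:
        return parse_entries(resp.read().decode("utf-8"))
```

A cron job could call `fetch_arxiv` periodically and hand the parsed entries to the processing pipeline.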
If you wish to assist in different aspects (Data Engineering / Web Development / DevOps), we have divided the project into several additional repositories focusing on these topics:
- The machine-learning engine can be found in our Main repository
- Web Development & UI/UX experiments can be found in our App repository
- Data Engineering tasks are more than welcome in our Data Engineering repository
- DevOps tasks are spread all across the project. The project is developed mostly in a serverless architecture; using Docker and Kubernetes enables the freedom to deploy it on different hosting providers and plans.
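To sketch how the Docker side fits the pipenv setup above (a purely illustrative Dockerfile; the base image version and file names are assumptions, not the project's actual deployment configuration):

```dockerfile
# Illustrative Dockerfile for the Flask service, keeping the virtualenv
# inside the project folder as the table above suggests.
FROM python:3.6-slim

ENV PIPENV_VENV_IN_PROJECT true

WORKDIR /app
COPY Pipfile Pipfile.lock ./
RUN pip install pipenv && pipenv install --deploy

COPY . .
CMD ["pipenv", "run", "flask", "run", "--host=0.0.0.0"]
```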
Feel free to join the discussion and provide your input!