This repository has been archived by the owner on Aug 10, 2020. It is now read-only.
Go orchestrator process used to schedule website parsing

trandoshan-io/scheduler

scheduler

Scheduler is a program written in Go that orchestrates resource parsing

features

  • Uses a scalable messaging system (NATS)

how it works

  • The Scheduler process connects to a NATS server (specified by the environment variable NATS_URI) and sets up a subscriber for messages with the subject doneSubject
  • When a URL is received, the scheduler applies a list of crawling rules to determine whether the resource should be crawled
  • If the resource should be crawled, the scheduler sends the URL to NATS with the subject todoSubject for the crawlers
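
The flow above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the Publisher interface, the Rule type, the HandleDone function and the in-memory publisher are all hypothetical names introduced here, and the subject strings are assumed to match the names used above. In a real deployment a NATS connection would sit behind the Publisher interface.

```go
package main

import (
	"fmt"
	"os"
)

// Subject names as used in the description above; the real values used by
// the project are an assumption.
const (
	doneSubject = "doneSubject" // subject the scheduler subscribes to
	todoSubject = "todoSubject" // subject the crawlers listen on
)

// Publisher is a hypothetical abstraction over the messaging client.
// It is not the NATS client API itself.
type Publisher interface {
	Publish(subject string, data []byte) error
}

// Rule reports whether a URL is allowed to be crawled.
type Rule func(url string) bool

// HandleDone applies every rule to a URL received on doneSubject and, if all
// rules pass, forwards it on todoSubject. It returns true when the URL was
// forwarded.
func HandleDone(p Publisher, rules []Rule, url string) (bool, error) {
	for _, rule := range rules {
		if !rule(url) {
			return false, nil
		}
	}
	if err := p.Publish(todoSubject, []byte(url)); err != nil {
		return false, err
	}
	return true, nil
}

// memoryPublisher records published messages so the sketch runs without a
// messaging server.
type memoryPublisher struct {
	sent []string
}

func (m *memoryPublisher) Publish(subject string, data []byte) error {
	m.sent = append(m.sent, subject+" "+string(data))
	return nil
}

func main() {
	// NATS_URI is the environment variable named above; it is only printed here.
	fmt.Println("would connect to:", os.Getenv("NATS_URI"))

	pub := &memoryPublisher{}
	notEmpty := func(url string) bool { return url != "" } // placeholder rule
	forwarded, err := HandleDone(pub, []Rule{notEmpty}, "http://example.onion/")
	fmt.Println(forwarded, err, pub.sent)
}
```

Keeping the rules as a slice of functions means a new crawling rule can be added without touching the forwarding logic.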

crawling rules

Here are the rules that determine whether a resource is crawled:

  • The URL has not already been crawled
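
As an illustration of this rule, here is a minimal sketch that tracks crawled URLs in memory. The README does not say how the scheduler records which URLs were crawled, so the SeenSet type and its NotAlreadyCrawled method are hypothetical; a real deployment would more likely consult a persistent store.

```go
package main

import (
	"fmt"
	"sync"
)

// SeenSet is a hypothetical in-memory record of URLs that were already
// accepted for crawling. The storage used by the real project is unknown.
type SeenSet struct {
	mu   sync.Mutex
	urls map[string]struct{}
}

// NewSeenSet returns an empty set ready for use.
func NewSeenSet() *SeenSet {
	return &SeenSet{urls: make(map[string]struct{})}
}

// NotAlreadyCrawled returns true the first time a URL is seen and records it;
// it returns false on every later call with the same URL.
func (s *SeenSet) NotAlreadyCrawled(url string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.urls[url]; ok {
		return false
	}
	s.urls[url] = struct{}{}
	return true
}

func main() {
	seen := NewSeenSet()
	fmt.Println(seen.NotAlreadyCrawled("http://example.onion/")) // true: first sighting
	fmt.Println(seen.NotAlreadyCrawled("http://example.onion/")) // false: already recorded
}
```

The mutex is there because message callbacks may run concurrently; the check and the insert happen under one lock so the same URL cannot be accepted twice.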