Skip to content

tomz/nytimes_newswire_api_app

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

A Rails App to collect New York Times news items

This application uses the New York Times Newswire API to collect the latest news items.

A cron job is set up using the ‘whenever’ gem, it runs:

rake newswire:collect

every 20 minutes.

The captured news items are represented in the following Rails models:

  • NewsItem

  • MediaItem

  • MediaMetadataItem

  • RelatedUrl

One potential use of pre-categorized texts is as training data for building models for machine learning/text mining tools like LIBSVM (the most popular C++ implementation of Support Vector Machines).

The Ruby interface to the LIBSVM C++ library is available at:

http://github.com/tomz/libsvm-ruby-swig/tree/master

The power of Convention over Configuration is demonstrated here, there are less than 80 lines of Ruby code in lib/task/newswire.rake including the setup code, comments, and blank lines. To achieve the same in Java would require code that is both mind-boggling and ugly.

Installation

  • apply for an API key from the NYTimes Developer Network site at:

    http://developer.nytimes.com/docs/times_newswire_api/
  • get the code from github:

    git clone git://github.com/tomz/nytimes_newswire_api_app.git
  • install the required gem(s):

    sudo gem install json
    
  • modify database.yml for your db environment

  • create the database and tables:

    rake db:create && rake db:migrate
  • modify lib/task/newswire.rake to put in your NYTimes Newswire API key

  • install the ‘whenever’ gem:

    gem sources -a http://gems.github.com  #only if you haven't run this before  
    sudo gem install javan-whenever
  • run whenever in the rails app directory:

    whenever
    
  • the newswire:collect task will run every 20 minutes to grab the latest NYTimes news, you can change the time interval in config/schedule.rb:

    every 20.minutes do
      rake "newswire:collect", :environment => "development"
    end
    
  • and you can optionally run:

    rake newswire:test
    

    to test without the API key

  • and for integration test, run:

    rake RAILS_ENV=test db:create && rake RAILS_ENV=test db:migrate
    cucumber features -n
  • now relax and watch your tables get filled up with New York Times news items:

    tail -100f log/collect.log
  • to clear up the collected items, run:

    rake newswire:empty_all_tables
    

Author

Tom Zeng

www.tomzconsulting.com

www.linkedin.com/in/tomzeng

tom.z.zeng at gmail dot com

About

News collection using New York Times Newswire API

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published