This application uses the New York Times Newswire API to collect the latest news items.
A cron job, set up with the ‘whenever’ gem, runs:
rake newswire:collect
every 20 minutes.
The captured news items are represented in the following Rails models:
One potential use of these pre-categorized texts is as training data for machine learning and text mining tools such as LIBSVM (a widely used C++ implementation of Support Vector Machines).
The Ruby interface to the LIBSVM C++ library is available at:
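As a rough illustration of that use, here is a minimal Ruby sketch that turns (category, text) pairs into LIBSVM's sparse input format: one "label index:value ..." line per document, with 1-based feature indices in ascending order. The sample documents, the lowercase-word tokenizer, and the use of raw word counts as feature values are assumptions for illustration only. The sketch does not call the Ruby LIBSVM interface; it just writes text in the format that LIBSVM's svm-train tool reads.

```ruby
# Hypothetical sketch: convert categorized texts into LIBSVM's sparse format.
# Sample data, tokenizer and count-based features are illustrative assumptions.

def tokenize(text)
  text.downcase.scan(/[a-z]+/)
end

# Give each distinct word a 1-based feature index (LIBSVM indices start at 1).
def build_vocabulary(docs)
  vocab = {}
  docs.each do |_, text|
    tokenize(text).each { |word| vocab[word] ||= vocab.size + 1 }
  end
  vocab
end

# Map each category name to an integer class label (1, 2, ...).
def build_labels(docs)
  labels = {}
  docs.each { |category, _| labels[category] ||= labels.size + 1 }
  labels
end

# Build one "label index:count index:count ..." line for a document.
def to_libsvm_line(label, text, vocab)
  counts = Hash.new(0)
  tokenize(text).each { |word| counts[vocab[word]] += 1 if vocab[word] }
  # LIBSVM expects feature indices in ascending order.
  features = counts.sort.map { |index, count| "#{index}:#{count}" }
  ([label] + features).join(" ")
end

docs = [
  ["Sports",   "Yankees win the game"],
  ["Business", "Stocks fall as markets close"],
  ["Sports",   "The game went to overtime"]
]

vocab  = build_vocabulary(docs)
labels = build_labels(docs)
lines  = docs.map { |category, text| to_libsvm_line(labels[category], text, vocab) }
puts lines
```

For the three sample documents this prints "1 1:1 2:1 3:1 4:1", "2 5:1 6:1 7:1 8:1 9:1" and "1 3:1 4:1 10:1 11:1 12:1". Real training data would more likely use TF-IDF weights and a fixed vocabulary saved alongside the model, so that new articles map to the same indices.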
This shows the power of Convention over Configuration: lib/task/newswire.rake contains fewer than 80 lines of Ruby code, including the setup code, comments, and blank lines. Achieving the same in Java would take considerably more, and far more verbose, code.
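Part of that brevity comes from Rake's namespaced tasks. The sketch below is a hypothetical outline of how a file like lib/task/newswire.rake can be organized so that "rake newswire:collect" resolves to a task; it is not the file's actual contents. In the Rails app the :environment task comes from Rails, so a stand-in is defined here, and the COLLECTED array is a placeholder for the real ActiveRecord inserts.

```ruby
require "rake"

# Stand-in for the :environment task Rails provides (it loads the app there).
task :environment do
  # Rails would load config/environment.rb at this point.
end

# Hypothetical placeholder for saving news items to the database.
COLLECTED = []

namespace :newswire do
  desc "Collect the latest news items (hypothetical body)"
  task :collect => :environment do
    # The real task would call the Newswire API and save each item;
    # this placeholder only records that the task ran.
    COLLECTED << Time.now
  end
end

# Equivalent to running "rake newswire:collect" from the app directory.
Rake::Task["newswire:collect"].invoke
```

Declaring :environment as a prerequisite is what lets the task body use the app's models directly, with no extra database setup code in the rake file.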
apply for an API key from the NYTimes Developer Network site at:
get the code from GitHub:
git clone git://
install the required gem(s):
sudo gem install json
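The json gem is what turns the API's responses into Ruby data. Here is a minimal sketch of that parsing step. The response shape (a "status" field plus a "results" array whose entries have "section", "title", "url" and "abstract") is an assumption based on the Times Newswire JSON layout and may not match the API version this app targets. The sample body and its example.com URLs are made up.

```ruby
require "json"

# Hypothetical sample response; field names are assumptions about the API.
SAMPLE_BODY = <<~BODY
  {
    "status": "OK",
    "num_results": 2,
    "results": [
      {"section": "World", "title": "First headline",
       "url": "https://example.com/a", "abstract": "A summary."},
      {"section": "Sports", "title": "Second headline",
       "url": "https://example.com/b", "abstract": "Another summary."}
    ]
  }
BODY

# Parse a response body and keep only the attributes a news item might store.
def parse_news_items(body)
  data = JSON.parse(body)
  return [] unless data["status"] == "OK"

  data.fetch("results", []).map do |item|
    {
      section:  item["section"],
      title:    item["title"],
      url:      item["url"],
      abstract: item["abstract"]
    }
  end
end

items = parse_news_items(SAMPLE_BODY)
items.each { |item| puts "#{item[:section]}: #{item[:title]}" }
```

Each resulting hash could then be handed to a model's create method, keyed on the URL so that overlapping 20-minute polls do not insert the same article twice.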
modify config/database.yml for your database environment
create the database and tables:
rake db:create && rake db:migrate
modify lib/task/newswire.rake to add your NYTimes Newswire API key
install the ‘whenever’ gem:
gem sources -a   # only if you haven't run this before
sudo gem install javan-whenever
run whenever in the rails app directory:
the newswire:collect task will then run every 20 minutes to grab the latest NYTimes news; you can change the time interval in config/schedule.rb:
every 20.minutes do
  rake "newswire:collect", :environment => "development"
end
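To see what the interval means for the crontab, here is a small hypothetical helper that expands an every-N-minutes interval into the minute field of a cron entry (20 becomes "0,20,40"). It only mirrors the idea of what whenever writes when it updates the crontab; it is not whenever's code, and whenever's exact output (shell wrapper, paths) may differ. The app path in the example is a placeholder.

```ruby
# Hypothetical helper: minute field for a job that runs every N minutes.
def cron_minutes(interval)
  unless interval.is_a?(Integer) && interval.between?(1, 59)
    raise ArgumentError, "interval must be 1..59 minutes"
  end
  (0..59).step(interval).to_a.join(",")
end

# Full cron line: minute field, then "every hour, day, month, weekday".
def cron_line(interval, command)
  "#{cron_minutes(interval)} * * * * #{command}"
end

puts cron_line(20, "cd /path/to/app && rake newswire:collect RAILS_ENV=development")
```

Intervals that divide 60 evenly (10, 15, 20, 30) give evenly spaced runs. Something like 7 yields "0,7,...,56", so the gap from :56 to the next hour's :00 is only 4 minutes.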
and you can optionally run:
rake newswire:test
to test without the API key
and for integration tests, run:
rake RAILS_ENV=test db:create && rake RAILS_ENV=test db:migrate
cucumber features -n
now relax and watch your tables fill up with New York Times news items:
tail -100f log/collect.log
to delete the collected items, run:
rake newswire:empty_all_tables
Tom Zeng
tom.z.zeng at gmail dot com