Crawl MIT OpenCourseWare courses with Kimurai. Not affiliated.

ZaneH/ocw-crawler

MIT OpenCourseWare Crawler

Crawl Output

Last updated: November 27, 2023

Description

This is a simple crawler that saves the list of courses available on MIT OpenCourseWare. By default, it exports the courses that have video lectures to a CSV file.

You can crawl for courses other than video lectures by changing the @start_urls in crawler.rb.
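As a rough illustration of the CSV export step described above, here is a minimal sketch that uses Ruby's standard `csv` library rather than whatever crawler.rb actually does. The `Course` struct, the `title` and `url` columns, the `courses_to_csv` helper, and the example URLs are all assumptions made for illustration; the real output columns may differ.

```ruby
require "csv"

# Hypothetical record shape; the real crawler's fields may differ.
Course = Struct.new(:title, :url)

# Build CSV text with a header row, skipping courses whose URL was already seen.
def courses_to_csv(courses)
  CSV.generate do |csv|
    csv << %w[title url]
    courses.uniq(&:url).each { |course| csv << [course.title, course.url] }
  end
end

# Made-up sample data, including one duplicate and one title containing a comma.
courses = [
  Course.new("Example Course A", "https://ocw.mit.edu/courses/example-course-a/"),
  Course.new("Example Course, Part B", "https://ocw.mit.edu/courses/example-course-b/"),
  Course.new("Example Course A", "https://ocw.mit.edu/courses/example-course-a/")
]

puts courses_to_csv(courses)
```

To write a file instead of printing, `File.write("results.csv", courses_to_csv(courses))` would work. Going through the `csv` library means titles that contain commas or quotes are escaped for you.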

Docker Run (Recommended)

This is the simplest way to run the crawler. The container runs the crawl and writes the results to results.csv in the current directory through a bind mount.

$ docker build -t ocw-crawl:1.0 .
$ docker run --volume "$(pwd)/results.csv:/app/results.csv" \
             --rm \
             --name ocw-crawl \
             ocw-crawl:1.0

Manually Run

To run the crawler without Docker, you'll need to install an older version of Ruby that's compatible with kimurai. You'll also need geckodriver and Firefox. If you run into trouble, see the kimurai setup documentation.

Setup

Install Ruby 2.5.0 (the commands below use asdf), then run bundle install.

$ asdf install ruby 2.5.0
$ asdf global ruby 2.5.0
$ gem install bundler
$ bundle install # install dependencies

Run

$ ruby crawler.rb
...

Possible Improvements

  • Use OCW Sitemaps to crawl all courses
  • Get more information about each course from the sitemap
    • Course materials often follow these patterns:
      • Syllabus: /pages/syllabus/
      • Course download: /download/
      • Resources: /resources/*/
        • PDFs, slides, lecture notes, etc.
      • Course pages: /pages/*/
        • Readings: /pages/readings/
  • Turn the data into an app or API
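
The URL patterns listed above could be turned into a small classifier that labels each URL found in a sitemap. This is only a sketch: `OCW_PATH_KINDS`, `classify_ocw_url`, and the `example-course` slug are hypothetical names that are not part of this repository, and the regular expressions assume each pattern appears at the end of the URL path.

```ruby
require "uri"

# Patterns from the list above. Order matters: /pages/syllabus/ and
# /pages/readings/ would also match the generic /pages/*/ rule, so the
# specific rules come first.
OCW_PATH_KINDS = [
  [:syllabus, %r{/pages/syllabus/?\z}],
  [:readings, %r{/pages/readings/?\z}],
  [:download, %r{/download/?\z}],
  [:resource, %r{/resources/[^/]+/?\z}],
  [:page,     %r{/pages/[^/]+/?\z}]
].freeze

# Returns a symbol naming the kind of course material, or :other
# when no pattern matches.
def classify_ocw_url(url)
  path = URI.parse(url).path.to_s
  kind, _pattern = OCW_PATH_KINDS.find { |_kind, pattern| pattern.match?(path) }
  kind || :other
end

# Hypothetical course slug, for illustration only.
base = "https://ocw.mit.edu/courses/example-course"
urls = [
  "#{base}/pages/syllabus/",
  "#{base}/pages/readings/",
  "#{base}/pages/calendar/",
  "#{base}/resources/lecture-1/",
  "#{base}/download/",
  "#{base}/"
]

urls.each { |url| puts "#{classify_ocw_url(url)}\t#{url}" }
```

Keeping the patterns in an ordered table makes it easy to add a rule later without touching the lookup logic.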
