Last updated: November 27, 2023
- OCW Video Lectures: results.csv
This is a simple crawler that saves the courses available on MIT OpenCourseWare. It exports the courses that have video lectures as a CSV file.
You can crawl for courses other than video lectures by changing the @start_urls in crawler.rb.
The simplest way to run the crawler is with Docker. The commands below build the image, run the crawler, and save the results to results.csv through a Docker volume.
$ docker build -t ocw-crawl:1.0 .
$ docker run --volume "$(pwd)/results.csv:/app/results.csv" \
    --rm \
    --name ocw-crawl \
    ocw-crawl:1.0
To run the crawler without Docker, you'll need to install an older version of Ruby that's compatible with kimurai. You'll also need geckodriver and Firefox. See the kimurai setup documentation if you run into trouble.
Install Ruby 2.5.0 and run bundle install.
$ asdf install ruby 2.5.0
$ asdf global ruby 2.5.0
$ gem install bundler
$ bundle install # install dependencies
$ ruby crawler.rb
...
- Use OCW Sitemaps to crawl all courses
- Get more information about each course from the sitemap
- Course materials often follow these patterns:
  - Syllabus: /pages/syllabus/
  - Course download: /download/
  - Resources: /resources/*/ (PDFs, slides, lecture notes, etc.)
  - Course pages: /pages/*/
  - Readings: /pages/readings/
- Turn the data into an app or API
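As a starting point for the sitemap idea, here is a minimal sketch of how URLs pulled from a sitemap might be sorted into the path patterns listed above. It is not part of crawler.rb. The names PATTERNS, SAMPLE_SITEMAP, sitemap_urls, and classify are hypothetical, the example-course URLs are made up for illustration, and it assumes the OCW sitemaps use the standard <loc> element.

```ruby
require "uri"

# Hypothetical labels for the path patterns in the list above.
# Order matters: specific patterns come before the generic /pages/*/ one.
PATTERNS = [
  [:syllabus, %r{/pages/syllabus/?\z}],
  [:readings, %r{/pages/readings/?\z}],
  [:download, %r{/download/?\z}],
  [:resource, %r{/resources/[^/]+/?\z}],
  [:page,     %r{/pages/[^/]+/?\z}]
].freeze

# Made-up sitemap fragment; a real sitemap would be fetched over HTTP.
SAMPLE_SITEMAP = <<~XML
  <urlset>
    <url><loc>https://ocw.mit.edu/courses/example-course/pages/syllabus/</loc></url>
    <url><loc>https://ocw.mit.edu/courses/example-course/download/</loc></url>
    <url><loc>https://ocw.mit.edu/courses/example-course/resources/lecture-1/</loc></url>
    <url><loc>https://ocw.mit.edu/courses/example-course/pages/calendar/</loc></url>
  </urlset>
XML

# Pull every <loc> URL out of a sitemap XML string.
def sitemap_urls(xml)
  xml.scan(%r{<loc>\s*(.*?)\s*</loc>}m).flatten
end

# Return the label of the first pattern that matches the URL's path,
# or :other when none of them match.
def classify(url)
  path = URI.parse(url).path
  label, _pattern = PATTERNS.find { |_, pattern| path.match?(pattern) }
  label || :other
end

sitemap_urls(SAMPLE_SITEMAP).each do |url|
  puts "#{classify(url)}\t#{url}"
end
```

A regex scan keeps the sketch free of extra gems. For real sitemaps, a proper XML parser would likely be a safer choice.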