    Repositories list

    • brozzler

      Public
      brozzler - distributed browser-based web crawler
      Python
      Apache License 2.0
      986843217 Updated Feb 14, 2025
    • The Internet Archive BookReader
      JavaScript
      GNU Affero General Public License v3.0
      4281k13694 Updated Feb 14, 2025
    • Efficient hOCR tooling
      Python
      Other
      94221 Updated Feb 14, 2025
    • Zeno

      Public
      State-of-the-art web crawler 🔱
      HTML
      GNU Affero General Public License v3.0
      17109214 Updated Feb 14, 2025
    • heritrix3

      Public
      Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
      Java
      Other
      7622.9k344 Updated Feb 14, 2025
    • One webpage for every book ever published!
      Python
      GNU Affero General Public License v3.0
      1.4k5.4k796145 Updated Feb 14, 2025
    • TypeScript
      GNU Affero General Public License v3.0
      16215 Updated Feb 13, 2025
    • PHP
      GNU Affero General Public License v3.0
      3413202 Updated Feb 13, 2025
    • iare

      Public
      An interactive IARI JSON viewer
      JavaScript
      GNU Affero General Public License v3.0
      55323 Updated Feb 13, 2025
    • displays notifications and automatically clears them
      TypeScript
      GNU Affero General Public License v3.0
      00112 Updated Feb 12, 2025
    • HTML
      2510 Updated Feb 10, 2025
    • A repository of cleanup bots implementing the openlibrary-client
      Python
      Other
      5165279 Updated Feb 10, 2025
    • TypeScript
      GNU Affero General Public License v3.0
      00112 Updated Feb 10, 2025
    • warcprox

      Public
      WARC writing MITM HTTP/S proxy
      Python
      54395206 Updated Feb 10, 2025
    • iiif

      Public
      The official Internet Archive IIIF service
      JavaScript
      GNU General Public License v3.0
      522141 Updated Feb 6, 2025
    • React components to render differences between captures at the Wayback Machine
      JavaScript
      GNU General Public License v3.0
      83210 Updated Feb 6, 2025
    • archive.org software emulation
      JavaScript
      GNU Affero General Public License v3.0
      0300 Updated Feb 5, 2025
    • Data models and scripts to build a database of references (broadly defined) appearing on Wikipedia and other wikis
      Python
      GNU General Public License v3.0
      0330 Updated Feb 3, 2025
    • nomad

      Public
      Shell
      0000 Updated Feb 3, 2025
    • An Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed at Internet Archive.
      Scala
      MIT License
      19700 Updated Feb 3, 2025
    • Sparkling

      Public
      Internet Archive's Sparkling Data Processing Library
      Scala
      MIT License
      21110 Updated Feb 3, 2025
    • cicd

      Public
      build & test using github registry; deploy to nomad clusters
      GNU Affero General Public License v3.0
      01400 Updated Feb 1, 2025
    • iaux

      Public
      Monorepo for Archive.org UX development and prototyping.
      JavaScript
      GNU Affero General Public License v3.0
      876989146 Updated Jan 31, 2025
    • Python Client Library for the Archive.org OpenLibrary API
      Python
      GNU Affero General Public License v3.0
      91397295 Updated Jan 30, 2025
    • gocrawlhq

      Public
      Go client for Crawl HQ v3
      Go
      GNU Affero General Public License v3.0
      0000 Updated Jan 30, 2025
    • caddy-php

      Public
      a simple Caddy static file server with added PHP backend demo
      Dockerfile
      GNU Affero General Public License v3.0
      0000 Updated Jan 30, 2025
    • rclone

      Public
      [vault fork] of "rsync for cloud storage" - Google Drive, S3, Dropbox, Backblaze B2, One Drive, Swift, Hubic, Wasabi, Google Cloud Storage, Yandex Files
      Go
      MIT License
      4.3k200 Updated Jan 28, 2025
    • A web browser extension for Chrome, Firefox, Edge, and Safari 14.
      JavaScript
      GNU Affero General Public License v3.0
      209681686 Updated Jan 28, 2025
    • Voice Apps (Actions on Google, Alexa Skill) of Internet Archive. Just say: "Ok Google, Ask Internet Archive to Play Jazz" or "Alexa, Ask Internet Archive to play Instrumental Music"
      JavaScript
      42499515 Updated Jan 28, 2025
    • Python
      142522 Updated Jan 28, 2025