    Repositories list

    • moodwarc

      Public
      Analysis of how content on the web affects mood over time
      GNU General Public License v3.0
      0000 Updated Jul 30, 2024
    • linked data collection tool for web archive graph visualization (LinkGate)
      Python
      GNU General Public License v3.0
      2061 Updated Feb 22, 2024
    • Machine learning-based content classification for web archives
      GNU General Public License v3.0
      1000 Updated Oct 5, 2023
    • warchtml

      Public
      Extract HTML data from WARC files
      GNU General Public License v3.0
      0000 Updated Aug 25, 2023
    • meshwarc

      Public
      Exploring the semantic-based web archive graph using MeshWARC
      GNU General Public License v3.0
      0000 Updated Aug 25, 2023
    • Crawl the web for text (Rust implementation)
      GNU General Public License v3.0
      1000 Updated Aug 25, 2023
    • waget

      Public
      Incrementally fetch web archive data files and run actions
      Shell
      GNU General Public License v3.0
      1200 Updated Apr 18, 2023
    • link-serv

      Public
      versioned graph data service for web archive graph visualization (LinkGate)
      Java
      GNU General Public License v3.0
      20138 Updated Dec 10, 2022
    • Republishing IIPC collections through alternative interfaces for researcher access
      Shell
      GNU General Public License v3.0
      0000 Updated Sep 8, 2022
    • txtcrawl

      Public
      Crawl the web for text
      Python
      GNU General Public License v3.0
      1000 Updated Aug 25, 2022
    • warc-serv

      Public
      Serve records from WARC files just like you serve files from a web server's root directory
      Python
      GNU General Public License v3.0
      3000 Updated Aug 25, 2022
    • WASAPI data transfer APIs
      Python
      6000 Updated Apr 23, 2022
    • linkgate

      Public
      common material for IIPC Project LinkGate, including research use cases for web archive graph visualization
      Shell
      GNU General Public License v3.0
      1410 Updated Jan 13, 2022
    • link-viz

      Public
      temporal graph rendering and exploration web frontend for web archive visualization (LinkGate)
      JavaScript
      GNU General Public License v3.0
      10190 Updated Sep 29, 2021
    • llx

      Public
      Parallel execution of processes based on a command template and input fields
      Perl
      GNU General Public License v3.0
      0000 Updated Sep 23, 2021
    • Recipe for crawling the web, by the Bibliotheca Alexandrina
      Shell
      GNU General Public License v3.0
      1000 Updated Dec 2, 2020
    • pywb

      Public
      Core Python Web Archiving Toolkit for replay and recording of web archives
      Python
      GNU General Public License v3.0
      217000 Updated Oct 12, 2020
    • Example scripts for identifying content that falls within scope of a web crawl using machine learning
      Python
      1100 Updated May 14, 2019
    • Crawl Log Animator
      0000 Updated Apr 3, 2019
    • warcrefs

      Public
      Web archive deduplication tools
      Java
      16110 Updated Oct 18, 2018
    • racktk

      Public
      Command-line tools for computer clusters
      Perl
      GNU General Public License v3.0
      1100 Updated Aug 17, 2017
    • gzmulti

      Public
      Manipulate multi-member GZIP files
      C
      GNU General Public License v3.0
      0000 Updated May 16, 2017
    • Purge old kernel packages on a Debian system
      Shell
      GNU General Public License v3.0
      0100 Updated Mar 12, 2017
    • warcsum

      Public
      Web archive checksum
      C
      GNU General Public License v3.0
      0400 Updated May 12, 2016