@bottomless-archive-project

Bottomless Archive Project

A project about archiving anything that's available digitally.

Pinned

  1. library-of-alexandria Public

    Library of Alexandria (LoA for short) is a project that aims to collect and archive documents from the internet.

    Java 112 1

  2. url-collector Public

    An application that crawls the Common Crawl corpus for URLs with the specified file extensions.

    Java

  3. file-collector Public

    Java

  4. document-location-database Public

  5. java-warc Public

    Forked from laxika/java-warc

    Read Web ARChive (WARC) files in Java.

    Java 5

  6. common-crawl-client Public

    This library is a very lightweight client for Common Crawl's WARC files.

    Java
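The url-collector description says only that it scans the Common Crawl corpus for URLs with specified file extensions. As a rough illustration of that filtering step, here is a minimal Java sketch. It assumes the URLs have already been extracted from the crawl data and matches on the extension of the URL path, ignoring the query string. The class ExtensionUrlFilter and its methods are hypothetical and are not taken from url-collector's source.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of url-collector.
final class ExtensionUrlFilter {

    private final Set<String> extensions;

    ExtensionUrlFilter(Set<String> extensions) {
        // Normalize ".PDF", "PDF" and "pdf" to the same form: lowercase, no leading dot.
        this.extensions = extensions.stream()
                .map(e -> e.replaceFirst("^\\.", "").toLowerCase(Locale.ROOT))
                .collect(Collectors.toSet());
    }

    // Returns the lowercase extension of the URL's path (query and fragment excluded), if any.
    static Optional<String> extensionOf(String url) {
        try {
            String path = new URI(url).getPath(); // null for opaque URIs such as mailto:
            if (path == null) {
                return Optional.empty();
            }
            int slash = path.lastIndexOf('/');
            int dot = path.lastIndexOf('.');
            // The dot must be inside the last path segment and must not be its final character.
            if (dot <= slash || dot == path.length() - 1) {
                return Optional.empty();
            }
            return Optional.of(path.substring(dot + 1).toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            return Optional.empty(); // malformed URLs are simply skipped
        }
    }

    boolean accepts(String url) {
        return extensionOf(url).map(extensions::contains).orElse(false);
    }

    List<String> filter(List<String> urls) {
        return urls.stream().filter(this::accepts).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ExtensionUrlFilter filter = new ExtensionUrlFilter(Set.of("pdf", ".DOC"));
        List<String> urls = List.of(
                "https://example.com/papers/report.PDF?download=1",
                "https://example.com/index.html",
                "https://example.com/files/notes.doc");
        // Keeps the .PDF and .doc URLs and drops the .html one.
        System.out.println(filter.filter(urls));
    }
}
```

Matching on the parsed path rather than the raw string keeps query strings such as `?download=1` from hiding the extension; URLs whose real file type is only visible in the HTTP response would still be missed by this approach.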

Repositories

Showing 7 of 7 repositories
  • library-of-alexandria Public

    Library of Alexandria (LoA for short) is a project that aims to collect and archive documents from the internet.

    Java 112 MIT 1 14 2 Updated Jul 5, 2024
  • library-of-alexandria.github.io Public

    The official website of the Library of Alexandria project.

    HTML 1 0 0 1 Updated May 16, 2024
  • file-collector Public
    Java 0 MIT 0 4 0 Updated Nov 14, 2021
  • document-location-database Public
    0 MIT 0 0 0 Updated Oct 18, 2021
  • url-collector Public

    An application that crawls the Common Crawl corpus for URLs with the specified file extensions.

    Java 0 MIT 0 2 0 Updated Oct 15, 2021
  • java-warc Public Forked from laxika/java-warc

    Read Web ARChive (WARC) files in Java.

    Java 5 Apache-2.0 6 0 0 Updated Sep 12, 2021
  • common-crawl-client Public

    This library is a very lightweight client for Common Crawl's WARC files.

    Java 0 0 0 0 Updated Jan 16, 2020
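The java-warc and common-crawl-client entries are about reading WARC files. To show what a single record looks like to such a reader, here is a minimal Java sketch. It assumes uncompressed input and the ISO 28500 record layout: a `WARC/` version line, `Name: value` header fields, an empty line, a content block of `Content-Length` bytes, then two CRLFs. The class WarcRecordSketch and its methods are hypothetical and are not the API of either library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch for illustration only; not the java-warc or common-crawl-client API.
final class WarcRecordSketch {
    final String version;
    final Map<String, String> headers;
    final byte[] content;

    private WarcRecordSketch(String version, Map<String, String> headers, byte[] content) {
        this.version = version;
        this.headers = headers;
        this.content = content;
    }

    // Reads bytes up to '\n', drops a trailing '\r', decodes as UTF-8; returns null at end of stream.
    private static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            buf.write(b);
        }
        if (b == -1 && buf.size() == 0) {
            return null;
        }
        byte[] bytes = buf.toByteArray();
        int len = bytes.length;
        if (len > 0 && bytes[len - 1] == '\r') {
            len--;
        }
        return new String(bytes, 0, len, StandardCharsets.UTF_8);
    }

    // Returns the next record, or null if the stream ends cleanly before a new record starts.
    static WarcRecordSketch read(InputStream in) throws IOException {
        String version = readLine(in);
        if (version == null) {
            return null;
        }
        if (!version.startsWith("WARC/")) {
            throw new IOException("Not a WARC record: " + version);
        }
        // Field names are looked up case-insensitively in this sketch.
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        String line;
        while ((line = readLine(in)) != null && !line.isEmpty()) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                throw new IOException("Malformed header line: " + line);
            }
            headers.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
        }
        if (line == null) {
            throw new EOFException("Header block not terminated by an empty line");
        }
        String lengthField = headers.get("Content-Length");
        if (lengthField == null) {
            throw new IOException("Missing Content-Length field");
        }
        int length = Integer.parseInt(lengthField); // sketch limitation: content blocks over 2 GB unsupported
        byte[] content = in.readNBytes(length);     // requires Java 11+
        if (content.length != length) {
            throw new EOFException("Truncated content block");
        }
        readLine(in); // consume the two CRLFs that close the record
        readLine(in);
        return new WarcRecordSketch(version, headers, content);
    }

    public static void main(String[] args) throws IOException {
        String raw = "WARC/1.0\r\n"
                + "WARC-Type: resource\r\n"
                + "WARC-Record-ID: <urn:uuid:00000000-0000-0000-0000-000000000001>\r\n"
                + "WARC-Date: 2021-09-12T00:00:00Z\r\n"
                + "Content-Length: 5\r\n"
                + "\r\n"
                + "hello\r\n\r\n";
        InputStream in = new ByteArrayInputStream(raw.getBytes(StandardCharsets.UTF_8));
        WarcRecordSketch record = read(in);
        System.out.println(record.headers.get("warc-type"));                     // resource
        System.out.println(new String(record.content, StandardCharsets.UTF_8)); // hello
        System.out.println(read(in));                                            // null (end of stream)
    }
}
```

The sketch does not handle header lines continued onto a following line and does not check that mandatory fields such as `WARC-Record-ID` and `WARC-Date` are present. Common Crawl publishes its WARC files gzip-compressed, so real input would need decompressing first, for example by wrapping the file stream in `java.util.zip.GZIPInputStream`.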
