Sources

    Repositories list

    • Internal library to allow querying multiple media platforms with a consistent API.
      Python
      31132 · Updated Feb 21, 2025
    • DevOps tools
      Python
      Apache License 2.0
      1010 · Updated Feb 21, 2025
    • Code that drives the public web-based tools for the Media Cloud Online News Archive and Directory.
      JavaScript
      Apache License 2.0
      1610611 · Updated Feb 20, 2025
    • sc-buffet (Public)
      Sous-chef buffet: self-service data access for sous-chef.
      Python
      1051 · Updated Jan 31, 2025
    • UNDER CONSTRUCTION - A package containing a library of issue validators in a flexibly deployable wrapper.
      Jupyter Notebook
      2090 · Updated Jan 31, 2025
    • es-tools (Public)
      Elasticsearch tools developed by the Media Cloud project
      Python
      Apache License 2.0
      1000 · Updated Jan 25, 2025
    • The core pipeline used to ingest online news stories in the Media Cloud archive.
      Python
      Apache License 2.0
      64436 · Updated Dec 22, 2024
    • How Media Cloud approaches extracting metadata from online news stories
      Python
      Apache License 2.0
      51260 · Updated Dec 22, 2024
    • Public client for consuming content from the Media Cloud Online News Archive & Directory.
      Python
      Apache License 2.0
      297241 · Updated Dec 10, 2024
    • Intelligently fetch lists of URLs from a large collection of RSS feeds as part of the Media Cloud Directory.
      Python
      Apache License 2.0
      66111 · Updated Dec 5, 2024
    • Internal API server that offers search access to the Media Cloud Online News Archive (in Elasticsearch).
      Python
      GNU Affero General Public License v3.0
      41100 · Updated Oct 25, 2024
    • sous-chef (Public)
      Configurable Data Analytics Pipeline
      Python
      0190 · Updated Oct 21, 2024
    • Find RSS, Atom, XML, and RDF feeds on web pages
      Python
      MIT License
      133041 · Updated Oct 10, 2024
    • Simple toolkit for consuming sitemaps
      Python
      Apache License 2.0
      1420 · Updated Oct 9, 2024
    • mc-manage (Public)
      Python
      0000 · Updated Oct 8, 2024
    • Daily performance metrics for the Media Cloud application
      Python
      0010 · Updated Sep 20, 2024
    • A client library to access the Wayback Machine news archive search.
      Python
      Apache License 2.0
      2410 · Updated Dec 15, 2023
    • A set of Jupyter notebooks demonstrating how to use the Media Cloud API.
      Jupyter Notebook
      143600 · Updated Dec 13, 2023
    • Dokku app that serves a static HTML catch-all page, displayed for bad domains
      HTML
      0000 · Updated Oct 25, 2023
    • A simple homepage for the CLIFF project
      HTML
      MIT License
      1100 · Updated May 30, 2023
    • A Python client for the CLIFF geoparsing tool
      Python
      MIT License
      5501 · Updated May 21, 2024
    • Tag news stories based on models trained on the NYT corpus.
      Python
      Apache License 2.0
      134216 · Updated Mar 1, 2023
    • Text-to-sentence splitter using the heuristic algorithm by Philipp Koehn and Josh Schroeder.
      Python
      Other
      2923822 · Updated Nov 7, 2022
    • .github (Public)
      Default community health files
      0000 · Updated Dec 3, 2020
    • Builds and releases CLAVIN GeoNames.org index as a binary
      1100 · Updated Nov 25, 2020
    • A library to extract a publication date from a web page, along with a measure of the accuracy.
      Python
      MIT License
      74100 · Updated Aug 13, 2019
    • Hausa language stemmer (Bimba et al., 2015)
      Python
      Other
      0100 · Updated Sep 7, 2017