Patrick Hochstenbach edited this page Nov 15, 2018 · 67 revisions

This page lists related projects that also provide ETL and data processing tools.

selected formats

general frameworks

  • Akara - Akara is a platform for developing data services available on the Web, using REST architecture. Akara is open source software written in Python and C.
  • App::RecordStream - recs, a system for command-line analysis of data.
  • ATTX - Putting Linked Data to Work (University of Helsinki)
  • bibcat - Engineering toolkit for building semantic web and bibliographic applications
  • Conduit - Haskell framework for dealing with streaming data
  • COMSODE - The project COMSODE is an SME-driven RTD project aimed at progressing the capabilities in the field of Open Data re-use.
  • DNet
  • d:swarm - data management platform for enrichment, normalization and linkage of knowledge data structures.
  • ETL::Yertl - ETL with a Shell
  • Flink - Apache Flink® - Stateful Computations over Data Streams
  • Heiðrún - Heiðrún is the DPLA metadata ingestion and QA system, and is an implementation of the Krikri Rails engine.
  • JAQL - Query Language for JavaScript® Object Notation (JSON)
  • KNIME - Open source Analytics Platform
  • Krikri - DPLA Ruby on Rails engine for metadata aggregation, enhancement and quality control.
  • Luwak - A Lucene extension to search data streams. See also this blog entry.
  • Metadata Interoperability Framework (MIF) - http://elag2014.org/programme/elag-workshops-list-page/11-5/ PPT
  • MINT - Metadata Interoperability Services
  • Meresco - Under the Meresco name Dutch public institutions share quality software components related to metadata management and search.
  • Metacrunch
  • metafacture - used in culturegraph
  • Metadata Services Toolkit - part of the eXtensible Catalog (XC)
  • Metadata & Object Repository (MoRe)
  • MUPD8 - Data stream processing from WalmartLabs.
  • OpenRefine - (formerly Google Refine) a toolkit to work with tabular data.
  • Petl - Python ETL library
  • Pig - Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.
  • Ratchet - A library for performing data pipeline / ETL tasks in Go.
  • REPOX - Data Aggregation and Interoperability Manager
  • Samza - Apache Samza is a distributed stream processing framework.
  • Silk - The Silk framework provides a declarative language for specifying which types of RDF links should be discovered between data sources as well as which conditions data items must fulfil in order to be interlinked.
  • Spark Streaming - Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams.
  • Storm - Apache Storm is a distributed stream processing framework.
  • Strukt - The most interactive way to work with all kinds of tabular data
  • Supplejack - Supplejack was designed to provide assurance to the quality of data management activities when working at scale.
  • TeePee - Command line tool to extract data from structures