BBMRI FP ETL facilitates the integration of a biobank into the BBMRI Federated Platform

crs4/bbmri-fp-etl

BBMRI Federated Platform Converter

This project is a software framework that supports the implementation of tools to read data about biomedical samples and convert them into formats compatible with the BBMRI Federated Platform. In particular, the framework supports converting data into FHIR resources for the BBMRI Sample Locator and into OMOP for the BBMRI Finder.

The framework implements an internal data model compatible with MIABIS.

To implement conversion from a new source, write a concrete class that extends AbstractSource and creates instances of the MIABIS classes. The framework then automatically converts and serializes these instances for the required Federated Platform solution.

The following diagram illustrates the ETL process implemented by the framework:

[Figure: ETL process]

An example of a class implementing a source from a mock dataset can be found in the examples directory.

Dependencies

All other Python dependencies will be installed by Poetry. Run the following command in the project directory to complete the installation:

poetry install

Note: you may want to create and activate a Python virtual environment prior to installing the framework and its dependencies.

Usage

The package can convert data about aggregated objects (i.e., biobanks and collections) or about cases. A source can provide data about one type of entity or both. In some cases, the case data may refer to a collection or biobank whose data come from another source (e.g., collections and biobanks taken from the BBMRI Directory).

Depending on the type of data the source provides, some methods of the AbstractSource class may not require an implementation: if biobank data are provided, the method get_biobanks_data must be implemented; if sample data are provided, the method get_cases_data must be implemented.
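The rule above can be sketched with a source that provides case data only. This is a minimal, self-contained illustration and not the framework's code: the AbstractSource defined here is a stand-in whose shape is assumed, the Case dataclass is a hypothetical placeholder for a MIABIS class, and only the method names get_biobanks_data and get_cases_data come from this README.

```python
from abc import ABC
from dataclasses import dataclass
from typing import Iterator


class AbstractSource(ABC):
    """Stand-in for the framework's AbstractSource (assumed shape, not the real class)."""

    def get_biobanks_data(self):
        # Assumption: a method left unimplemented signals that the data are not provided.
        raise NotImplementedError("this source does not provide biobank data")

    def get_cases_data(self):
        raise NotImplementedError("this source does not provide case data")


@dataclass
class Case:
    """Hypothetical placeholder for a MIABIS-style case record."""
    identifier: str
    collection_id: str


class CasesOnlySource(AbstractSource):
    """Provides case data only; biobank data are expected from another source."""

    def __init__(self, rows):
        self._rows = rows

    def get_cases_data(self) -> Iterator[Case]:
        # Map each raw row of the (mock) dataset to a Case object.
        for row in self._rows:
            yield Case(identifier=row["id"], collection_id=row["collection"])


rows = [
    {"id": "case-1", "collection": "coll-A"},
    {"id": "case-2", "collection": "coll-A"},
]
source = CasesOnlySource(rows)
cases = list(source.get_cases_data())
```

A source that also provides biobank data would override get_biobanks_data in the same way. For the actual base class and MIABIS classes, refer to the class in the examples directory.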

To generate data from a source, instantiate a Converter with a source object and a destination object.

An example is:

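import os

# ExampleSource, FHIRDest, JsonFile and Converter come from the framework and
# its examples; their import statements are omitted here.
source = ExampleSource()
output_dir = os.path.join(os.path.dirname(__file__), 'output')

# Create the output directory if it does not exist yet.
os.makedirs(output_dir, exist_ok=True)

# Write the FHIR output as JSON files into output_dir and convert case data.
destination = FHIRDest(JsonFile(output_dir))
c = Converter(source, destination, Converter.CASE)
c.run()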

License

This project is licensed under the terms of the GNU Affero General Public License v3.0 (GNU AGPLv3). See the COPYING file for details.

Acknowledgments

This work has been partially funded by the following sources:
