Tanagra

Tanagra is a project to build a configurable cohort builder and data explorer. Our goal is to make it easy to set up a new dataset for exploration with little or no custom code, so everything we've built is configuration-driven.

Project overview

The project has three main pieces: the indexer, the service, and the UI. The three are tightly interconnected and are not intended to be used or deployed separately. Everything lives in this single GitHub repository.

The indexer takes the source dataset and produces a logical copy that's better suited to the types of queries the UI needs to run. It denormalizes some data, precomputes some values, and reorganizes tables. The goal is not to meet a particular query benchmark, only to keep the UI's queries from timing out.
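As a rough illustration of what "denormalizes" and "precomputes" can mean here, the sketch below joins a hypothetical person table with a hypothetical condition-occurrence table and stores a per-person count, so that a later query reads a single row instead of joining at query time. The table names, column names, and the `build_index` helper are assumptions made up for this example; this is not the Tanagra indexer's code or its output schema.

```python
from collections import Counter

# Hypothetical source tables (names and columns are illustrative only).
persons = [
    {"person_id": 1, "year_of_birth": 1980},
    {"person_id": 2, "year_of_birth": 1975},
]
condition_occurrences = [
    {"person_id": 1, "condition": "asthma"},
    {"person_id": 1, "condition": "diabetes"},
    {"person_id": 2, "condition": "asthma"},
]


def build_index(persons, occurrences):
    """Return one denormalized row per person with a precomputed count."""
    # Count occurrences per person once, at indexing time.
    counts = Counter(row["person_id"] for row in occurrences)
    # Copy each person row and attach the precomputed count to it.
    return [
        {**person, "num_conditions": counts.get(person["person_id"], 0)}
        for person in persons
    ]


indexed_persons = build_index(persons, condition_occurrences)
```

A query such as "persons with at least two conditions" can then filter `indexed_persons` on `num_conditions` directly, which is the kind of trade (more storage, less work per query) the paragraph above describes.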

The service processes queries for the UI and manages the application database, which stores user-managed artifacts like cohorts and data feature sets.

The UI includes the cohort builder, data feature set builder, export, and cohort review interfaces.

Configure a new dataset

Tanagra supports data patterns rather than specific SQL schemas. Check the list of currently supported patterns to see how they map to your dataset.

Tanagra defines a custom object model on top of the underlying relational data. The dataset configuration language is based on this object model, so it's helpful to be familiar with the main concepts.

A dataset configuration is spread across multiple files to improve readability and make sharing across datasets easier. See the overview of the different files and directory structure, which also points to example files. Check the full dataset configuration schema documentation to look up specific properties. Separate documentation covers the protocol buffers used for visualizations and criteria plugins.

Set up a new deployment

Choose a deployment pattern and configure the GCP project(s).

Once you've defined the configuration files for a dataset, run the indexer. Check the full indexer CLI documentation to look up specific commands.

Tanagra does not provide an API for managing access control for a population of users. Instead, we provide an interface for calling an external access control service. For example, the VUMC admin service serves as the external access control service for the SD deployment. Either reuse an existing access control implementation or add your own.
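The pattern described above (a small interface in the application, with the actual decision delegated to an external service) can be sketched as follows. Everything in this block is hypothetical: the `AccessControl` protocol, the `is_authorized` method, the `ExternalServiceAccessControl` class, and the `fake_lookup` stand-in are invented for illustration and are not Tanagra's actual interface or the VUMC admin service's API.

```python
from typing import Callable, Protocol, Set


class AccessControl(Protocol):
    """Hypothetical interface the application would call."""

    def is_authorized(self, user_email: str, action: str, resource_id: str) -> bool:
        ...


class ExternalServiceAccessControl:
    """Delegates every decision to an external service.

    `lookup` stands in for the remote call; it returns the set of
    actions the user may perform on the resource.
    """

    def __init__(self, lookup: Callable[[str, str], Set[str]]) -> None:
        self._lookup = lookup

    def is_authorized(self, user_email: str, action: str, resource_id: str) -> bool:
        return action in self._lookup(user_email, resource_id)


def fake_lookup(user_email: str, resource_id: str) -> Set[str]:
    # In-memory stand-in for the external service (assumption for this sketch).
    grants = {("alice@example.com", "cohort-1"): {"read", "update"}}
    return grants.get((user_email, resource_id), set())


access: AccessControl = ExternalServiceAccessControl(fake_lookup)
```

Because the application depends only on the interface, swapping in a different access control implementation means supplying a different class with the same method, without touching the calling code.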

We expect deployments to require varied methods of exporting data. Either reuse an existing export implementation or add your own.

Check the full application configuration documentation to look up specific deployment properties.

Once your deployment is up and running, create a regression test suite to detect unexpected changes due to config or underlying data changes and run it re
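One way to think about such a regression check, as a minimal sketch: record the counts returned by a fixed set of queries, then compare later runs against that baseline and report any drift. The `diff_counts` helper, the query names, and the numbers below are all made up for illustration; this is not the format or tooling of Tanagra's regression test suite.

```python
def diff_counts(baseline: dict, current: dict, tolerance: float = 0.0) -> list:
    """Return a message for every query whose count drifted beyond tolerance.

    `tolerance` is a fraction of the baseline count (0.01 allows 1% drift).
    """
    problems = []
    for name, expected in baseline.items():
        if name not in current:
            problems.append(f"{name}: missing from current results")
            continue
        actual = current[name]
        allowed = abs(expected) * tolerance
        if abs(actual - expected) > allowed:
            problems.append(f"{name}: expected {expected}, got {actual}")
    return problems


# Hypothetical recorded baseline and a later run against the same queries.
baseline = {"adults_with_asthma": 1200, "all_persons": 50000}
current = {"adults_with_asthma": 1200, "all_persons": 50350}

report = diff_counts(baseline, current)
```

With the default tolerance of zero, any change in a count is reported; passing a small tolerance lets expected data refreshes through while still flagging large shifts caused by a config change.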

