iris2bq - A utility to move InterSystems IRIS Data to Google Cloud Platform's Big Query

What

Let's say IRIS is contributing to the workload of a hospital system, marshalling DICOM, HL7, FHIR, or CCDA. Natively, IRIS persists these objects at various stages of the pipeline, driven by the business processes and anything else you included along the way.

Let's send that up to Google BigQuery to augment and complement the rest of our Data Warehouse data, and ETL (Extract Transform Load) or ELT (Extract Load Transform) to our hearts' content.

Why

  • We're capturing a wealth of transactional Healthcare data in our IRIS for Health clusters.
  • Our Data Warehouse is BigQuery, on Google Cloud Platform (GCP).
  • Our Data Science team asked us to put in a process to automatically send transactional data to the Data Warehouse.

Who

Technical Actors

How

This iris2bq utility has got your back.

Exactly how again?

  • It exports the data from IRIS into DataFrames
  • It saves them into GCS as .avro to keep the schema along with the data: this avoids having to specify/create the BigQuery table schema beforehand.
  • It starts BigQuery jobs to import those .avro into the respective BigQuery tables you specify.

Note: internally it uses the Spark framework for the sake of simplicity, but no Hadoop cluster is needed. It is configured as a "local" cluster by default, meaning the application runs standalone.
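
For illustration only, here is a rough sketch of those three steps, not the actual iris2bq source. It assumes Spark's JDBC reader, the spark-avro module, the GCS Hadoop connector on the classpath, and the google-cloud-bigquery client; the table, bucket, and dataset names are borrowed from the sample configuration further down.

import org.apache.spark.sql.SparkSession
import com.google.cloud.bigquery.{BigQueryOptions, FormatOptions, JobInfo, LoadJobConfiguration, TableId}

object Iris2BqSketch extends App {
  // Spark in "local" mode: no Hadoop cluster needed
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("iris2bq-sketch")
    .getOrCreate()

  // 1. Export the IRIS table into a DataFrame over JDBC
  val people = spark.read
    .format("jdbc")
    .option("url", "jdbc:IRIS://127.0.0.1:51773/USER")
    .option("user", "_SYSTEM")
    .option("password", "flounder")
    .option("dbtable", "people")
    .load()

  // 2. Save it into GCS as .avro so the schema travels with the data
  //    (assumes the spark-avro module and the GCS Hadoop connector are available)
  val gcsPath = "gs://iris2bqdemobucket/people"
  people.write.format("avro").mode("overwrite").save(gcsPath)

  // 3. Start a BigQuery load job pointing at those .avro files
  //    (WRITE_TRUNCATE is an assumption here, not necessarily what iris2bq does)
  val bigquery = BigQueryOptions.getDefaultInstance.getService
  val load = LoadJobConfiguration
    .newBuilder(TableId.of("iris2bqdemods", "people"), s"$gcsPath/*.avro")
    .setFormatOptions(FormatOptions.avro())
    .setWriteDisposition(JobInfo.WriteDisposition.WRITE_TRUNCATE)
    .build()
  bigquery.create(JobInfo.of(load)).waitFor()

  spark.stop()
}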

How It's Implemented

  1. IRIS for Health is implemented as a Docker container and deployed locally using Docker.
  2. Google Cloud Platform needs a project, with a service account that is able to use BigQuery and Google Cloud Storage.
  3. iris2bq keeps those two places above ^^ in sync.

Let's Go

How to run it

  • ☕ Ensure you have a Java SDK on your system; OpenJDK 1.8 worked for us.
  • ⬇️ Download the latest iris2bq release
  • ☁️ Create a Google Cloud Service Account which has access to GCS and BigQuery, and create a JSON key
  • 📄 Create a configuration file configuration.conf so iris2bq knows where to grab and put the data:
jdbc {
  url = "jdbc:IRIS://127.0.0.1:51773/USER"
  user = "_SYSTEM"
  password = "flounder"
  tables = [ "people" ]
}

gcloud {
  project = "iris2bq-demo"
  service-account-key-path = "service.key.json"
  bq.dataset = "iris2bqdemods"
  gcs.tmp-bucket = "iris2bqdemobucket"
}

The jdbc connection block up there is where the magic happens for us. Just list the applicable IRIS tables for the namespace you are connecting to (USER) and watch them appear in your BigQuery dataset. (A sketch of how these settings might be read appears after the steps below.)

  • Run the application specifying the config file:
GOOGLE_CLOUD_PROJECT=iris2bq-demo GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json ./bin/iris2bq -Dconfig.file=configuration.conf
  • Done!
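
As promised above, here is a minimal sketch of how those configuration keys could be read. The -Dconfig.file flag suggests the Typesafe (Lightbend) Config library; the object and field names below are ours, not iris2bq's.

import com.typesafe.config.ConfigFactory
import scala.collection.JavaConverters._

// Hypothetical settings reader, mirroring the keys in configuration.conf
object SettingsSketch {
  private val conf = ConfigFactory.load() // honours -Dconfig.file=configuration.conf

  val jdbcUrl: String      = conf.getString("jdbc.url")
  val jdbcUser: String     = conf.getString("jdbc.user")
  val jdbcPassword: String = conf.getString("jdbc.password")
  val tables: List[String] = conf.getStringList("jdbc.tables").asScala.toList

  val project: String   = conf.getString("gcloud.project")
  val keyPath: String   = conf.getString("gcloud.service-account-key-path")
  val dataset: String   = conf.getString("gcloud.bq.dataset")
  val tmpBucket: String = conf.getString("gcloud.gcs.tmp-bucket")
}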

Add this to a scheduler (Airflow, crontab, etc.) to run every 10 minutes and enjoy JOINs over your IRIS data in BigQuery.
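
For example, a crontab entry along these lines would run it every 10 minutes (the install path here is hypothetical; adjust it to wherever you unpacked the release and placed the configuration file):

*/10 * * * * GOOGLE_CLOUD_PROJECT=iris2bq-demo GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json /opt/iris2bq/bin/iris2bq -Dconfig.file=/opt/iris2bq/configuration.conf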

Give this a shot yourself

If you want to give this a shot on your own, here is a quick way to get up and running with InterSystems IRIS for Health. You will need Docker installed on your system.

IRIS Deployment Instructions:
  1. Build Container:

    $ cd demo
    $ bash iris-docker.sh
    
  2. Create a GCP project with the BigQuery and Google Cloud Storage APIs enabled:

    $ gcloud projects create iris2bq-demo --enable-cloud-apis
    
  3. -or- try your hand as a Terraformer

    $ cd demo
    $ terraform init
    $ terraform plan
    $ terraform apply
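
Whichever route you take, the project needs the BigQuery and Cloud Storage APIs enabled. If they are not already on, a plain gcloud invocation (not part of the demo scripts) can switch them on:

    $ gcloud services enable bigquery.googleapis.com storage.googleapis.com --project iris2bq-demo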

Development

Got a better idea? See a horrific bug? Grab sbt for your system.

iris2bq Development Instructions:
  1. Build:

    $ git clone https://github.com/basenube/iris2bq.git
    $ cd iris2bq
    // Develop, Develop, Develop
    $ sbt stage

Thank You

Thanks to our valued partners InterSystems and Google Cloud Platform for the feedback and support.

Credits

We don't pretend to be very good at Scala. Most of the original guts of this thing come from the great work of the hackers at Powerspace. We have used a similar version they developed for Postgres to lift transactional data to a data warehouse for a customer, without issue, for over a year now.
