The Tuva Provider project combines and transforms messy public provider datasets into usable data. This project contains the transformations we use to create the clean datasets for users of the Tuva Project. We have made this project public to share our methodology and code.
You can easily load the cleaned provider data into your data warehouse by using the terminology seeds from The Tuva Project package.
This project currently supports the following data warehouse:
- Snowflake
Before you begin, make sure that:
- You have dbt installed and configured (i.e., connected to your data warehouse). If you have not installed dbt, here are instructions for doing so.
- You have created a database for the output of this project to be written in your data warehouse.
- You have downloaded the source data and loaded it into your data warehouse.
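dbt reads warehouse connection details from a profile, typically defined in `~/.dbt/profiles.yml`. The following is a minimal sketch of a Snowflake profile, not a configuration taken from this project. The profile name `tuva_provider`, the target name `dev`, and every connection value are placeholders, and the password is read from a hypothetical `SNOWFLAKE_PASSWORD` environment variable through dbt's `env_var` function.

```yaml
# ~/.dbt/profiles.yml (sketch; all names and values are placeholders)
tuva_provider:                  # hypothetical profile name; must match `profile:` in dbt_project.yml
  target: dev                   # which output below dbt uses by default
  outputs:
    dev:
      type: snowflake
      account: your_account_identifier
      user: your_user
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"  # hypothetical env var, keeps secrets out of the file
      role: your_role
      warehouse: your_warehouse
      database: nppes           # default database for the connection
      schema: public            # default schema for the connection
      threads: 4
```

Running `dbt debug` from the project directory checks whether dbt can connect using this profile.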
Complete the following steps to configure the project to run in your environment.
- Clone this repo to your local machine or environment.
- Update the `dbt_project.yml` file:
  - Add the dbt profile connected to your data warehouse.
  - Update the variable `provider_database` to use the new database you created for this project (default is "nppes").
- Update the `models/_sources.yml` file:
  - Update the database where your source data has been loaded (default is "nppes").
  - Update the schema where your source data has been loaded (default is "raw_data").
  - If your source tables are named differently, add the `identifier` property to each affected table.
- Run `dbt build`.
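As a rough sketch, the edits to the two configuration files might look like the fragments below. They assume `provider_database` is declared at the top level of `vars:`; the profile name `tuva_provider`, the source name, and the table names are hypothetical placeholders. Keep the names already present in the repo and change only the values.

```yaml
# dbt_project.yml (excerpt; only the keys mentioned in the steps are shown)
profile: tuva_provider          # hypothetical; must match a profile in profiles.yml

vars:
  provider_database: nppes      # database created for this project's output (default "nppes")
```

```yaml
# models/_sources.yml (excerpt; source and table names are hypothetical)
version: 2

sources:
  - name: nppes                 # keep the source name the models already reference
    database: nppes             # database where the source data is loaded (default "nppes")
    schema: raw_data            # schema where the source data is loaded (default "raw_data")
    tables:
      - name: npi_raw                   # hypothetical name used by the models
        identifier: NPPES_DATA_2024     # hypothetical actual table name in your warehouse
```

With `identifier` set, the models keep referring to the table by its `name`, while dbt queries the table named in `identifier`.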
The Tuva Project team maintains only the latest version of this project. We highly recommend staying on the latest version.
Have an opinion on the mappings? Notice any bugs when installing and running the project? If so, we highly encourage and welcome feedback! While we work on a formal process in GitHub, you can easily reach us in our Slack community.
Join our growing community of healthcare data practitioners on Slack!