Add data loading documentation to README #899

Merged: 2 commits, Feb 10, 2022
26 changes: 26 additions & 0 deletions README.md
@@ -211,6 +211,32 @@ Example usage:
lines so you'll need to be careful to reconstruct this
- `rails debug:saml['tmp/your_saml_to_debug.txt']`

## Data Loading

There are three types of data that get loaded into this system:

### Database Seeds

These should only be loaded when the application database is initially set up (e.g. for new PR/development deploys or if the staging database needs to be destroyed and recreated). These seeds contain default values for certain tables such as copyrights, licenses, hold sources, and degree types.

This seed data is loaded automatically during PR builds from GitHub. In local development, load it on first setup by running `rails db:seed`.

Additionally, degrees and departments can be seeded manually from a CSV file by running `rails db:seed_degrees <csv_file_url>` and `rails db:seed_departments <csv_file_url>`, respectively. See the Jira project documentation for a link to a Google Doc with the initial list of departments and degrees that were loaded into the production database (this list is not maintained).
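
As a rough illustration of what CSV-based seeding involves, the plain-Ruby sketch below parses department rows from CSV text, skips rows without a name, and de-duplicates by name. The `name` and `code` column headers and the `parse_department_rows` helper are hypothetical and only for illustration; the actual rake tasks may expect a different layout.

```ruby
require 'csv'

# Hypothetical helper: turn CSV text into unique department attribute hashes.
# The `name` and `code` headers are assumptions, not the real file layout.
def parse_department_rows(csv_text)
  CSV.parse(csv_text, headers: true)
     .map { |row| { name: row['name'].to_s.strip, code: row['code'].to_s.strip } }
     .reject { |attrs| attrs[:name].empty? } # skip rows with no department name
     .uniq { |attrs| attrs[:name] }          # keep the first row for each name
end

sample = <<~CSV
  name,code
  Department of Physics,8
  Department of Physics,8
  ,99
  Department of Biology,7
CSV

parse_department_rows(sample).each { |d| puts "#{d[:code]}: #{d[:name]}" }
```

De-duplicating before writing to the database is one way to make a seed task safe to run more than once; whether the real tasks do this is not stated here.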

Seed data is not kept in sync with the production database, whose values admin users can change as needed. The production database *shouldn't* ever need to be reseeded.

### QA/Stakeholder Testing Data

We're working on a process to load test data for stakeholder testing/QA in an automated fashion. Note that this is different from the fixture data used for automated tests. Check back soon for more info!

### Registrar Data

Thesis and author data for each term is loaded from a CSV file downloaded from the Registrar. This process is handled manually in the UI by the thesis processing team, and they have their own documentation on how they obtain the right data to load.

Loading registrar data may also add new degrees and departments, which are then manually updated and maintained by stakeholders.

Note: if registrar data needs to be loaded in a local, PR, or staging deployment, it should be anonymized first to ensure no protected user data is added to a non-secure database. The test fixtures (`test/fixtures/files`) include both full and small sample files containing anonymized registrar data that can be used for this purpose.
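
If a fresh registrar export ever needs anonymizing, the general idea can be sketched in plain Ruby as below. This is a minimal sketch, not the project's tooling: the column names (`Full Name`, `Email Address`, `Degree`) and the `anonymize_registrar_csv` helper are hypothetical, and a real export likely has more identifying columns that would need the same treatment.

```ruby
require 'csv'

# Hypothetical helper: replace identifying values with stable placeholders.
# Column names are assumptions; adjust them to the actual export.
def anonymize_registrar_csv(csv_text)
  table = CSV.parse(csv_text, headers: true)
  ids = {} # original email => sequential placeholder number

  table.each do |row|
    # Reuse the same number when a person appears on more than one row.
    n = (ids[row['Email Address']] ||= ids.size + 1)
    row['Full Name'] = "Student #{n}"
    row['Email Address'] = "student#{n}@example.com"
  end

  table.to_csv
end

sample = <<~CSV
  Full Name,Email Address,Degree
  Ada Example,ada@school.example,SM
  Ada Example,ada@school.example,PhD
  Bo Sample,bo@school.example,SB
CSV

puts anonymize_registrar_csv(sample)
```

Sequential placeholders keep repeated authors consistent across rows without deriving anything from the original values. Free-text columns such as thesis titles can still identify a person, so review the output before loading it.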

## Publishing workflow

- stakeholders process theses until they are valid and accurate