From 175391967b353010d11f67e661f7a5209662d6cc Mon Sep 17 00:00:00 2001
From: Helen Bailey <hbailey@mit.edu>
Date: Wed, 9 Feb 2022 16:49:05 -0500
Subject: [PATCH 1/2] Add data loading documentation to README

Why these changes are being introduced:
To provide documentation for developers on how and when various types
of data get loaded into the system.

How this addresses that need:
* Adds a Data Loading section to the README with documentation on seed
  data and registrar data loads, and a placeholder for test data loads

Side effects of this change:
None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/ETD-217
---
 README.md | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/README.md b/README.md
index 7eb76325..ff0492db 100644
--- a/README.md
+++ b/README.md
@@ -211,6 +211,32 @@ Example usage:
   lines so you'll need to be careful to reconstruct this
 - `rails debug:saml['tmp/your_saml_to_debug.txt']`
 
+## Data Loading
+
+There are three types of data that get loaded into this system:
+
+### Database Seeds
+
+These should only be loaded when the application database is initially set up (e.g. for new PR/development deploys or if the staging database needs to be destroyed and recreated). These seeds contain default values for certain tables such as copyrights, licenses, hold sources, and degree types.
+
+The above seed data is loaded automatically during PR builds from Github. During local development it can be loaded during first deployment by running `rake db:seed`.
+
+Additionally, degrees and departments can be manually seeded from a CSV file if desired by running `rake db:seed_degrees <csv_file_url>` and `rake db:seed_departments <csv_file_url>`, respectively. See Jira project documentation for link to a Google doc with the initial list of departments and degrees that were loaded into the production database (not maintained).
+
+Seed data is not maintained to match the production database values, which can be changed by admin users as needed. The production database *shouldn't* ever need to be reseeded.
+
+### Test Data
+
+We're working on a process to load test data in an automated fashion. Check back soon!
+
+### Registrar Data
+
+Thesis and author data for each term is loaded from a CSV file downloaded from the Registrar. This process is handled manually in the UI by the thesis processing team, and they have their own documentation on how they obtain the right data to load.
+
+Loading registrar data may also add new degrees and departments, which are then manually updated and maintained by stakeholders.
+
+Note: if registrar data needs to be loaded in a local, PR, or staging deployment it should be anonymized first to ensure no protected user data is added to a non-secure database. See Jira project documentation for a link to a Google doc with an initial set of anonymized registrar data, or ask stakeholders for such if needed.
+
 ## Publishing workflow
 
 - stakeholders process theses until they are valid and accurate

From 0d47cc15f8fdfcfa3443bbae7075f1e8d032e2a3 Mon Sep 17 00:00:00 2001
From: Helen Bailey <hbailey@mit.edu>
Date: Thu, 10 Feb 2022 09:47:29 -0500
Subject: [PATCH 2/2] Revise per PR comments

---
 README.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index ff0492db..c754addf 100644
--- a/README.md
+++ b/README.md
@@ -219,15 +219,15 @@ There are three types of data that get loaded into this system:
 
 These should only be loaded when the application database is initially set up (e.g. for new PR/development deploys or if the staging database needs to be destroyed and recreated). These seeds contain default values for certain tables such as copyrights, licenses, hold sources, and degree types.
 
-The above seed data is loaded automatically during PR builds from Github. During local development it can be loaded during first deployment by running `rake db:seed`.
+The above seed data is loaded automatically during PR builds from GitHub. During local development, it can be loaded on first setup by running `rails db:seed`.
 
-Additionally, degrees and departments can be manually seeded from a CSV file if desired by running `rake db:seed_degrees <csv_file_url>` and `rake db:seed_departments <csv_file_url>`, respectively. See Jira project documentation for link to a Google doc with the initial list of departments and degrees that were loaded into the production database (not maintained).
+Additionally, degrees and departments can be manually seeded from a CSV file if desired by running `rails db:seed_degrees <csv_file_url>` and `rails db:seed_departments <csv_file_url>`, respectively. See the Jira project documentation for a link to a Google doc with the initial list of departments and degrees that were loaded into the production database (not maintained).
 
 Seed data is not maintained to match the production database values, which can be changed by admin users as needed. The production database *shouldn't* ever need to be reseeded.
 
-### Test Data
+### QA/Stakeholder Testing Data
 
-We're working on a process to load test data in an automated fashion. Check back soon!
+We're working on a process to load test data for stakeholder testing/QA in an automated fashion. Note that this is different from the fixture data used for automated tests. Check back soon for more info!
 
 ### Registrar Data
 
@@ -235,7 +235,7 @@ Thesis and author data for each term is loaded from a CSV file downloaded from t
 
 Loading registrar data may also add new degrees and departments, which are then manually updated and maintained by stakeholders.
 
-Note: if registrar data needs to be loaded in a local, PR, or staging deployment it should be anonymized first to ensure no protected user data is added to a non-secure database. See Jira project documentation for a link to a Google doc with an initial set of anonymized registrar data, or ask stakeholders for such if needed.
+Note: if registrar data needs to be loaded in a local, PR, or staging deployment, it should be anonymized first to ensure no protected user data is added to a non-secure database. The test fixtures (`test/fixtures/files`) include both full and small sample files of anonymized registrar data that can be used for this purpose.
 
 ## Publishing workflow