This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

[Issue #106] Transform agency data #157

Merged
merged 63 commits into from
Sep 16, 2024


chouinar
Collaborator

Summary

Fixes #106

Time to review: 10 mins

Changes proposed

Add transformations for agency data

Context for reviewers

Agency data is structured oddly in the existing system: instead of being in ordinary tables, it's in a `tgroups` table that stores its values as key-value pairs. We want to normalize that into something more workable, so this transformation needs to work a bit differently than the transformations of other tables.
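
To illustrate the normalization, here is a minimal sketch of pivoting key-value rows into one record per agency. It assumes each `tgroups` row can be reduced to an agency code, a field name, and a value; the `TgroupRow` shape and `pivot_tgroups` function are hypothetical, not the actual schema or code in this PR.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class TgroupRow:
    """One key-value row from tgroups (hypothetical shape, for illustration only)."""
    agency_code: str
    field_name: str
    value: Optional[str]


def pivot_tgroups(rows: Iterable[TgroupRow]) -> Dict[str, Dict[str, Optional[str]]]:
    """Collapse key-value rows into one {field_name: value} dict per agency."""
    agencies: Dict[str, Dict[str, Optional[str]]] = defaultdict(dict)
    for row in rows:
        # Each row contributes a single attribute to its agency's record.
        agencies[row.agency_code][row.field_name] = row.value
    return dict(agencies)


# Hypothetical sample rows; the agency codes and values are made up.
rows = [
    TgroupRow("ABC", "ldapGp", "abc-group"),
    TgroupRow("ABC", "AgencyContactCity", "Washington"),
    TgroupRow("XYZ", "ldapGp", "xyz-group"),
]
agencies = pivot_tgroups(rows)
```

Once every agency is a single dict, mapping it onto the columns of a normalized agency table is an ordinary per-record transform.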

For simplicity, I load all of the data for every agency and only later filter down to what changed, which removes a lot of edge cases we would otherwise have needed to handle. Only modified rows actually get used, but this way we know we always have the full set of data.
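
A sketch of the "load everything, then filter" approach, assuming each row carries some indication of whether it changed since the last run. The `is_modified` flag and `load_changed_agencies` function are hypothetical stand-ins for however the real transform detects changes.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set


@dataclass(frozen=True)
class TgroupRow:
    agency_code: str
    field_name: str
    value: Optional[str]
    is_modified: bool  # hypothetical: True if this row changed since the last transform run


def load_changed_agencies(rows: Iterable[TgroupRow]) -> Dict[str, Dict[str, Optional[str]]]:
    """Build the full record for every agency, then keep only agencies with a modified row."""
    all_agencies: Dict[str, Dict[str, Optional[str]]] = defaultdict(dict)
    changed: Set[str] = set()
    for row in rows:
        # Always collect every field, so a changed agency is never built from partial data.
        all_agencies[row.agency_code][row.field_name] = row.value
        if row.is_modified:
            changed.add(row.agency_code)
    return {code: all_agencies[code] for code in sorted(changed)}


rows = [
    TgroupRow("ABC", "ldapGp", "abc-group", False),
    TgroupRow("ABC", "AgencyContactCity", "Boston", True),
    TgroupRow("XYZ", "ldapGp", "xyz-group", False),
]
changed_agencies = load_changed_agencies(rows)
```

Here only `ABC` is returned, but its record still includes the unchanged `ldapGp` value. Filtering row by row would lose that, which is the kind of edge case loading the full data set avoids.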

Additional information

I took a snapshot of the prod `tgroups` table, loaded it into my local DB, and ran the transform script. In total, it takes ~2 seconds to run and didn't hit any issues.

A set of the relevant metrics:

```
total_records_processed=1152
total_records_deleted=0
total_records_inserted=1152
total_records_updated=0
total_error_count=0
agency.total_records_processed=1152
agency.total_records_inserted=1152
TransformAgency_subtask_duration_sec=2.14
task_duration_sec=2.14
```

As a sanity test, I also loaded the `tgroups` data from dev and ran it through. While it generally worked, 12 agencies failed because they were missing the `ldapGp` and `AgencyContactCity` fields. I'm not sure we need to do anything about that, since judging by their names they all seemed to be test agencies.
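
One way the dev failures could surface is a per-agency check for required fields, so that a bad agency is recorded and skipped without aborting the batch. This is a sketch under that assumption only: the source doesn't describe how failures are handled, and `REQUIRED_FIELDS`, `MissingFieldsError`, and `transform_agencies` are hypothetical names. Only the two field names come from the text above.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical required-field list; these two names are the ones missing in dev.
REQUIRED_FIELDS = ("ldapGp", "AgencyContactCity")


class MissingFieldsError(ValueError):
    def __init__(self, agency_code: str, missing: List[str]) -> None:
        super().__init__(f"agency {agency_code} is missing fields: {', '.join(missing)}")
        self.agency_code = agency_code
        self.missing = missing


def validate_agency(agency_code: str, fields: Dict[str, Optional[str]]) -> None:
    """Raise if any required key is absent from the agency's pivoted record."""
    missing = [name for name in REQUIRED_FIELDS if name not in fields]
    if missing:
        raise MissingFieldsError(agency_code, missing)


def transform_agencies(
    agencies: Dict[str, Dict[str, Optional[str]]],
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return the agencies that passed validation and the missing fields per failed agency."""
    transformed: List[str] = []
    errors: Dict[str, List[str]] = {}
    for code, fields in agencies.items():
        try:
            validate_agency(code, fields)
        except MissingFieldsError as err:
            # Record and skip this agency; the rest of the batch still runs.
            errors[code] = err.missing
            continue
        transformed.append(code)
    return transformed, errors
```

A check like this makes incomplete test agencies easy to spot in the error output, rather than failing silently or halting the whole run.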

acouch previously approved these changes Jul 29, 2024
Base automatically changed from chouinar/125-agency-table to main September 13, 2024 16:59
chouinar dismissed acouch’s stale review September 13, 2024 16:59

The base branch was changed.

chouinar merged commit b5ff8c8 into main Sep 16, 2024
8 checks passed
chouinar deleted the chouinar/106-transform-agency branch September 16, 2024 14:13
acouch pushed a commit that referenced this pull request Sep 18, 2024
Fixes HHS#2051

Add transformations for agency data

Agency data is structured oddly in the existing system: instead of being
in ordinary tables, it's in a `tgroups` table that has values stored as
key-value pairs. We want to normalize that into something more workable,
so the transformation needs to work a bit differently than the
transformations of other tables.

For simplicity, I load all of the data for every agency (and later
filter to just what changed) as this removes a lot of weird edge cases
that we would have otherwise needed to consider. Only modified rows
actually get used, but we know we have the full set of data now.

I took a snapshot of the prod `tgroups` table, loaded it into my local DB,
and ran the transform script. In total, it takes ~2 seconds to run and
didn't hit any issues.

A set of the relevant metrics:
```
total_records_processed=1152
total_records_deleted=0
total_records_inserted=1152
total_records_updated=0
total_error_count=0
agency.total_records_processed=1152
agency.total_records_inserted=1152
TransformAgency_subtask_duration_sec=2.14
task_duration_sec=2.14
```

As a sanity test, I also loaded the `tgroups` data from dev and ran it
through. While it generally worked, 12 agencies failed because they were
missing the `ldapGp` and `AgencyContactCity` fields. I'm not sure we need
to do anything about that, since judging by their names they all seemed
to be test agencies.

---------

Co-authored-by: nava-platform-bot <platform-admins@navapbc.com>
acouch pushed a commit to HHS/simpler-grants-gov that referenced this pull request Sep 18, 2024
Fixes #2051

Successfully merging this pull request may close these issues.

[Task]: Add transformation for agency data
3 participants