[Bug]: Grants.gov XML zip archive downloads are failing #456

Closed
TylerHendrickson opened this issue Nov 2, 2023 · 1 comment
Assignees
Labels
bug Something isn't working go Pull requests that update Go code terraform Pull requests that update Terraform code

Comments

@TylerHendrickson
Member

TylerHendrickson commented Nov 2, 2023

Why is this issue important?

The ingestion pipeline sources most of its data from Grants.gov, which it accomplishes by downloading daily XML database snapshots served through a date-based URL pattern. If these files fail to download, new and modified grant data cannot be ingested from Grants.gov.

Current State

This error is tracked in Datadog.

The DownloadGrantsGovDB Lambda function is consistently failing with the following error:

Error downloading source archive: unexpected http response Content-Type header: text/html;charset=utf-8

This error is being returned by the validateDownloadResponse() function, which expects that download responses have a Content-Type: application/zip header. Since this issue began, responses instead contain a Content-Type: text/html header, which our application (correctly) rejects as invalid.
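The check described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: `validateContentType` and `checkDownloadResponse` are hypothetical names, and the real `validateDownloadResponse()` signature is not shown in this issue.

```go
package main

import (
	"fmt"
	"mime"
	"net/http"
)

// validateContentType is a hypothetical helper: it accepts only responses whose
// media type is application/zip, ignoring parameters such as charset.
func validateContentType(contentType string) error {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return fmt.Errorf("unparseable http response Content-Type header %q: %w", contentType, err)
	}
	if mediaType != "application/zip" {
		return fmt.Errorf("unexpected http response Content-Type header: %s", contentType)
	}
	return nil
}

// checkDownloadResponse is a hypothetical wrapper showing how the helper could
// be applied to a real *http.Response before its body is streamed anywhere.
func checkDownloadResponse(resp *http.Response) error {
	return validateContentType(resp.Header.Get("Content-Type"))
}

func main() {
	// The header value Grants.gov currently returns for the old URL.
	fmt.Println(validateContentType("text/html;charset=utf-8"))
	// The header value the pipeline expects for a zip archive.
	fmt.Println(validateContentType("application/zip"))
}
```

Because the old URL now answers with a 200 status, a check on the status code alone would not catch this failure; the Content-Type check is what prevents an HTML page from being stored as if it were a zip archive.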

Upon investigation, it appears that the underlying issue is actually that the URL for XML extracts has moved. Although our application is configured to expect these files to be served from a URL like https://www.grants.gov/extract/GrantsDBExtractYYYYMMDDv2.zip, this appears to no longer be the case. For example, the following link responds with an HTML page (and a 200 status code, which seems incorrect) providing no content other than the standard Grants.gov header/footer: https://www.grants.gov/extract/GrantsDBExtract20231101v2.zip.

After a bit of digging, it appears that these downloads are now being served from a different base URL (an S3 bucket rather than www.grants.gov) with a slightly modified path structure (plural /extracts rather than /extract). The new pattern is: https://prod-grants-gov-chatbot.s3.amazonaws.com/extracts/GrantsDBExtractYYYYMMDDv2.zip.

A simple test confirms that the 2023-11-01 extract can be successfully downloaded from this URL: https://prod-grants-gov-chatbot.s3.amazonaws.com/extracts/GrantsDBExtract20231101v2.zip.
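For reference, the new date-based pattern can be expressed in Go as in the sketch below. `extractURL` and `mustParseDate` are hypothetical helpers written for this illustration; only the base URL and the path pattern come from the observations above.

```go
package main

import (
	"fmt"
	"time"
)

// Base URL observed to serve the extracts as of 2023-11-02.
const extractBaseURL = "https://prod-grants-gov-chatbot.s3.amazonaws.com"

// extractURL (hypothetical) builds the download URL for a given extract date.
// Go's reference layout "20060102" renders a date as YYYYMMDD.
func extractURL(baseURL string, date time.Time) string {
	return fmt.Sprintf("%s/extracts/GrantsDBExtract%sv2.zip", baseURL, date.Format("20060102"))
}

// mustParseDate (hypothetical) parses a YYYY-MM-DD string and panics on bad input.
func mustParseDate(s string) time.Time {
	t, err := time.Parse("2006-01-02", s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	// Prints the URL for the 2023-11-01 extract.
	fmt.Println(extractURL(extractBaseURL, mustParseDate("2023-11-01")))
}
```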

Expected State

The pipeline's DownloadGrantsGovDB Lambda function is able to download XML database extract zip files successfully, according to the new URL pattern being used by Grants.gov as of today.

Implementation Plan

Technical mitigation:

  1. Update the $GRANTS_GOV_BASE_URL environment variable configured (in Terraform) for the DownloadGrantsGovDB Lambda function.
    • Old value: "https://www.grants.gov"
    • New value: "https://prod-grants-gov-chatbot.s3.amazonaws.com"
  2. Update the DownloadGrantsGovDB Lambda handler source code so that the URL path pattern is updated from the old base path to the new one. Note that the URL-formatting behavior is provided by the grantsURL() method of the ScheduledEvent struct.
    • Old: /extract/
    • New (plural): /extracts/
    • Note: Consider whether this should also be configurable as an environment variable, which could avoid needing to update Go code in the future.
  3. Update any necessary unit tests for the Lambda handler.
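
One possible shape for steps 1 and 2, including the optional configurable path, is sketched below. Everything here is an assumption made for illustration: `extractEvent` stands in for the real ScheduledEvent struct (whose fields are not shown in this issue), its `grantsURL` takes explicit arguments that the real method may not, and `GRANTS_GOV_EXTRACT_PATH` is a hypothetical variable name. Only `GRANTS_GOV_BASE_URL` and the old and new values come from the plan above.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

const (
	defaultBaseURL     = "https://prod-grants-gov-chatbot.s3.amazonaws.com"
	defaultExtractPath = "/extracts/"
)

// envOrDefault returns the value of key, or fallback when key is unset or empty.
func envOrDefault(key, fallback string) string {
	if v, ok := os.LookupEnv(key); ok && v != "" {
		return v
	}
	return fallback
}

// extractEvent is a hypothetical stand-in for ScheduledEvent; date is YYYYMMDD.
type extractEvent struct {
	date string
}

// grantsURL joins the base URL, path prefix, and file name, normalizing slashes
// so that "extract", "/extract", and "/extract/" all behave the same.
func (e extractEvent) grantsURL(baseURL, extractPath string) string {
	base := strings.TrimRight(baseURL, "/")
	path := "/"
	if p := strings.Trim(extractPath, "/"); p != "" {
		path = "/" + p + "/"
	}
	return fmt.Sprintf("%s%sGrantsDBExtract%sv2.zip", base, path, e.date)
}

func main() {
	baseURL := envOrDefault("GRANTS_GOV_BASE_URL", defaultBaseURL)
	// GRANTS_GOV_EXTRACT_PATH is an assumed name, not an existing setting.
	extractPath := envOrDefault("GRANTS_GOV_EXTRACT_PATH", defaultExtractPath)
	fmt.Println(extractEvent{date: "20231101"}.grantsURL(baseURL, extractPath))
}
```

With the path supplied from configuration, a future move by Grants.gov could be handled by a Terraform-only change, and the unit tests in step 3 can cover both the old and new patterns by passing different arguments rather than mutating the process environment.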

Separately, we should consider reaching out to contacts at Grants.gov who may have knowledge of this change and could advise us on the following:

  1. Whether this new URL pattern can be depended upon for the foreseeable future.
  2. Since this was a breaking change (presumably related to the recent Grants.gov redesign), whether there was any communication issued prior to the change, and/or whether resources exist that can give us a heads-up before any similar changes roll out in the future.
@TylerHendrickson TylerHendrickson added bug Something isn't working go Pull requests that update Go code terraform Pull requests that update Terraform code labels Nov 2, 2023
@SDBowen SDBowen self-assigned this Nov 2, 2023
@SDBowen SDBowen moved this from 🆕 New to 🚢 Completed in Grants Team Agile Planning Nov 2, 2023
@github-project-automation github-project-automation bot moved this from 🚢 Completed to ✅ Staging in Grants Team Agile Planning Dec 12, 2023
@ClaireValdivia

@TylerHendrickson just wanted to confirm, was this issue deployed to production?

@ClaireValdivia ClaireValdivia moved this from ✅ Staging to 🚢 Completed in Grants Team Agile Planning Dec 21, 2023
3 participants