Why is this issue important?
The ingestion pipeline sources most of its data from Grants.gov by downloading daily XML database snapshots served through a date-based URL pattern. If these files fail to download, new and modified grant data cannot be ingested from Grants.gov.
Current State
This error is tracked in Datadog.
The DownloadGrantsGovDB Lambda function is consistently failing with an error returned by the validateDownloadResponse() function, which expects download responses to carry a Content-Type: application/zip header. Since this issue began, responses instead carry a Content-Type: text/html header, which our application (correctly) rejects as invalid.
Upon investigation, the underlying issue appears to be that the URL for XML extracts has moved. Our application is configured to expect these files at a URL like https://www.grants.gov/extract/GrantsDBExtractYYYYMMDDv2.zip, but this no longer appears to be the case. For example, the following link responds with an HTML page (and a 200 status code, which seems incorrect) that provides no content other than the standard Grants.gov header/footer: https://www.grants.gov/extract/GrantsDBExtract20231101v2.zip.
After a bit of digging, it appears that these downloads are now served from a different base URL with a slightly modified path structure (plural /extracts rather than /extract). The new pattern is: https://prod-grants-gov-chatbot.s3.amazonaws.com/extracts/GrantsDBExtractYYYYMMDDv2.zip.
A simple test confirms that the 2023-11-01 extract can be successfully downloaded from this URL: https://prod-grants-gov-chatbot.s3.amazonaws.com/extracts/GrantsDBExtract20231101v2.zip.
Expected State
The pipeline's DownloadGrantsGovDB Lambda function is able to download XML database extract zip files successfully, using the new URL pattern that Grants.gov serves as of today.
Implementation Plan
Technical mitigation:
Update the $GRANTS_GOV_BASE_URL environment variable configured (in Terraform) for the DownloadGrantsGovDB Lambda function.
Old value: "https://www.grants.gov"
New value: "https://prod-grants-gov-chatbot.s3.amazonaws.com"
Update the DownloadGrantsGovDB Lambda handler source code so that the URL path pattern changes from the old base path to the new one. Note that the URL-formatting behavior is provided by the grantsURL() method of the ScheduledEvent struct.
Old: /extract/
New (plural): /extracts/
Note: Consider whether this should also be configurable as an environment variable, which could avoid needing to update Go code in the future.
Update any necessary unit tests for the Lambda handler.
Separately, we should consider reaching out to contacts at Grants.gov who may have knowledge of this change and could advise us on the following:
Whether this new URL pattern can be depended upon for the foreseeable future.
Since this was a breaking change (presumably related to the recent Grants.gov redesign), whether there was any communication issued prior to the change, and/or whether resources exist that can give us a heads-up before any similar changes roll out in the future.
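The technical mitigation described in this issue could be sketched roughly as follows. This is a minimal, hypothetical Go sketch, not the pipeline's actual code: extractURL, isZipContentType, envOr, exampleDate, and the GRANTS_GOV_EXTRACT_PATH variable are assumed names introduced for illustration. The real logic lives in grantsURL() and validateDownloadResponse(), whose signatures are not shown in this issue. The sketch reads both the base URL and the path prefix from the environment, so a future move might only need a configuration change.

```go
package main

import (
	"fmt"
	"mime"
	"os"
	"strings"
	"time"
)

// Defaults mirror the new values proposed in this issue.
const (
	defaultBaseURL     = "https://prod-grants-gov-chatbot.s3.amazonaws.com"
	defaultExtractPath = "/extracts/"
)

// exampleDate is the snapshot date used for the manual download test in this issue.
var exampleDate = time.Date(2023, time.November, 1, 0, 0, 0, 0, time.UTC)

// envOr returns the value of the named environment variable,
// or fallback when the variable is unset or empty.
func envOr(name, fallback string) string {
	if v := os.Getenv(name); v != "" {
		return v
	}
	return fallback
}

// extractURL builds the download URL for the extract published on date.
// Hypothetical stand-in for the handler's URL formatting; extractPath is
// assumed to be non-empty. Surplus slashes on either part are normalized.
func extractURL(baseURL, extractPath string, date time.Time) string {
	file := fmt.Sprintf("GrantsDBExtract%sv2.zip", date.Format("20060102"))
	return strings.TrimRight(baseURL, "/") + "/" + strings.Trim(extractPath, "/") + "/" + file
}

// urlFromEnv builds the URL from configuration. GRANTS_GOV_BASE_URL comes from
// this issue; GRANTS_GOV_EXTRACT_PATH is an assumed, not-yet-existing variable.
func urlFromEnv(date time.Time) string {
	return extractURL(
		envOr("GRANTS_GOV_BASE_URL", defaultBaseURL),
		envOr("GRANTS_GOV_EXTRACT_PATH", defaultExtractPath),
		date,
	)
}

// isZipContentType reports whether a Content-Type header value names
// application/zip, ignoring any parameters such as charset.
func isZipContentType(header string) bool {
	mediaType, _, err := mime.ParseMediaType(header)
	return err == nil && mediaType == "application/zip"
}
```

With the defaults above, extractURL(defaultBaseURL, defaultExtractPath, exampleDate) yields the 2023-11-01 URL that was tested manually in this issue, while a text/html response like the one now returned by the old URL would fail the isZipContentType check.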