Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split rows from downloaded FFIS excel spreadsheets into per-grant JSON files #12

Closed
TylerHendrickson opened this issue Feb 2, 2023 · 6 comments
Assignees
Labels
enhancement New feature or request Grant Finder

Comments

@TylerHendrickson
Copy link
Member

TylerHendrickson commented Feb 2, 2023

DOD: Whenever a new FFIS excel spreadsheet is saved to the "Grants Source Data" S3 bucket, the data from each row contained in the spreadsheet is converted to JSON and saved as an individual S3 object.

Blockers:

Implementation details:

  • Create a "Split FFIS Spreadsheet" Lambda function that:
    • Allows invocation from the "Grants Source Data" S3 bucket
    • Expects to receive an S3 Bucket Notification from the "Grants Source Data" bucket for the s3:ObjectCreated event that indicates an FFIS spreadsheet is fully (not partially) saved to S3, with the following object key filters:
      • Prefix: /sources/
      • Suffix: /ffis/download.xlsx
    • When invoked, performs the following actions:
      • Identifies the key of the source S3 object presented in the invocation payload.
      • Downloads the spreadsheet from S3 and parses its data, creating a JSON-serializable object representing the data pertaining to each grant in the spreadsheet.
      • Serializes each grant object to JSON and uploads it to the "Grants Prepared Data" destination bucket.
  • Notes:
    • Uploads saved to the destination bucket should be stored with an object key formatted as /<first 3 digits of the grant ID/<grant id>/ffis.org/v1.json.
    • Grant IDs are not present in the text data of the spreadsheet. However, the cells in the "Opportunity Number" column are hyperlinked to the grant listing on grants.gov; the hyperlink contains the ID in the ?oppId querystring parameter.
@TylerHendrickson TylerHendrickson added the enhancement New feature or request label Feb 2, 2023
@TylerHendrickson TylerHendrickson transferred this issue from usdigitalresponse/usdr-gost Mar 21, 2023
@TylerHendrickson TylerHendrickson moved this from 🔩 Eng planning to 📋 Backlog in Grants Team Agile Planning Mar 27, 2023
@TylerHendrickson TylerHendrickson moved this from 📋 Backlog to 🔖 Ready in Grants Team Agile Planning Apr 18, 2023
@TylerHendrickson
Copy link
Member Author

Example spreadsheet file: CG_23_16.01.xlsx

@pearkes pearkes moved this from 🔖 Ready to 🏗 In progress in Grants Team Agile Planning Apr 25, 2023
@jakekreider
Copy link
Contributor

@pearkes After some discussions last week, it sounds like only the bill/act (FKA "Program") field is going to be persisted in Grants DB for now, so I'm only expecting a bill field and opportunityNumber. I have a serialization class in my WIP branch as well as a test fixture for reference. Will that work?

I'm currently expecting an array, but re-reading above, it sounds like maybe it's only going to be one object per file at a time?

cc @TylerHendrickson

@pearkes
Copy link
Contributor

pearkes commented May 15, 2023

@jakekreider Yeah, I was planning on one object per grant/opportunity per the spec and pattern in the SplitGrantsXML workflow.

Opportunity Number I definitely have, but I'm not sure about bill. This is my current struct:

// Represents a funding opportunity sourced from an FFIS spreadsheet
type FFISFundingOpportunity struct {
	CFDA     string `json:"cfda"`              // eg. 11.525
	OppTitle string `json:"opportunity_title"` // eg. "FY 2020 Community Connect Grant Program"

	// In the FFIS spreadsheet, this is the header row for a group of opportunities
	OppCategory string `json:"opportunity_category"` // eg. Inflation Reduction Act

	Agency           string                 `json:"opportunity_agency"` // eg. Forest Service
	EstimatedFunding string                 `json:"estimated_funding"`  // eg. $25,000,000
	ExpectedAwards   string                 `json:"expected_awards"`    // eg. 10 or N/A
	OppNumber        string                 `json:"opportunity_number"` // eg. USDA-FS-2020-01
	Eligibility      FFISFundingEligibility `json:"eligibility"`
	DueDate          time.Time              `json:"due_date"`

	Match bool `json:"match"`
}

// Eligibility for FFIS funding opportunities as presented in FFIS spreadsheets
type FFISFundingEligibility struct {
	State           bool `json:"state"`            // State Governments
	Local           bool `json:"local"`            // Local Governments
	Tribal          bool `json:"tribal"`           // Tribal Governments
	HigherEducation bool `json:"higher_education"` // Institutions of Higher Education
	NonProfits      bool `json:"non_profits"`      // Non-profits
	Other           bool `json:"other"`            // Other/see announcement
}

I'll reach out to you and see how we can derive bill?

@pearkes pearkes mentioned this issue May 15, 2023
5 tasks
@pearkes pearkes moved this from 🏗 In progress to 👀 In review in Grants Team Agile Planning May 15, 2023
@jakekreider
Copy link
Contributor

Ok cool, from looking at your schema, opportunity_category is what I want. We had a call and Slack thread where naming was discussed and it was called "act" or "bill". I'll follow your lead in #94 until a different name is used.

@pearkes
Copy link
Contributor

pearkes commented May 16, 2023

@jakekreider Oh nice, things are making sense to me now. I'll change it to bill act then actually, as I just guessed for the category name.

@pearkes
Copy link
Contributor

pearkes commented May 17, 2023

This is done in #94.

@pearkes pearkes closed this as completed May 17, 2023
@github-project-automation github-project-automation bot moved this from 👀 In review to ✅ Done in Grants Team Agile Planning May 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Grant Finder
Projects
Archived in project
Development

No branches or pull requests

4 participants