Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enqueue FFIS email downloads #53

Closed
wants to merge 10 commits into from
Closed

Enqueue FFIS email downloads #53

wants to merge 10 commits into from

Conversation

jakekreider
Copy link
Contributor

@jakekreider jakekreider commented Apr 25, 2023

Ticket #4

Closes #4

Description

As described in the ticket,

  • Defines a new Lambda, EnqueueFFISDownload
  • Defines SQS queue

Handler (in cmd/EnqueueFFISDownload) responds to S3 events in the grants bucket source, looks for a download link in it, and enqueues it into SQS for later processing.

Some things to discuss:

  1. I'm not familiar with the CloudPosse modules, but I didn't see where SES could be configured for the SQS queue, so I used the stock TF resource
  2. For the TF structure, I put the queue in the EnqueueFFISDownload module. Should it be top-level?
  3. Am I using the log formatter correctly?

I expect some revs on this, but it was getting lengthy and I wanted to come up for air/iterate. Appreciate feedback as always!

Testing

  1. Launch localstack (docker compose up -d) works here
  2. Apply the infrastructure tflocal apply -var-file=local.tfvars -auto-approve
  3. Push a test file to the bucket awslocal s3 cp ../cmd/EnqueueFFISDownload/fixtures/good.eml s3://grants-ingest-grantssourcedata-000000000000-us-west-2/sources/2023/04/24/ffis/raw.eml
  4. Observe the message in SQS via one of the supported methods, for example curl -H "Accept: application/json" http://localhost:4566/_aws/sqs/messages?QueueUrl=http://localhost:4566/000000000000/ffis_downloads | jq '.ReceiveMessageResponse.ReceiveMessageResult.Message.Body'

Automated and Unit Tests

  • Added Unit tests

Manual tests for Reviewer

  • Added steps to test feature/functionality manually

Checklist

  • Provided ticket and description
  • Provided testing information
  • Provided adequate test coverage for all new code
  • Added PR reviewers

- Terraform skeleton
- Command skeleton
- Simple parser for plaintext email
- Compose file for Localstack
- TF updates, formatting
- More configurability in handler
- Refactoring of AWS APIs
- Tests
@jakekreider jakekreider requested a review from a team as a code owner April 25, 2023 03:38
@jakekreider jakekreider self-assigned this Apr 25, 2023
Copy link
Member

@TylerHendrickson TylerHendrickson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few scattered comments, but overall this is looking good.

To address the points you listed for further discussion:

  1. I'm not familiar with the CloudPosse modules, but I didn't see where SES could be configured for the SQS queue, so I used the stock TF resource

We've used certain CloudPosse modules only where it (presumably) simplifies things, but they're by no means required – your implementation of the aws_sqs_queue resource looks good to me.

For the TF structure, I put the queue in the EnqueueFFISDownload module. Should it be top-level?

Since this resource is common to multiple Lambdas (as it serve as the event source for the Lambda to-be-implemented in #11), I think it would be better to define it at the top-level and pass in the necessary attribute(s) to this module via input variable.

Am I using the log formatter correctly?

A few of the usages' arguments appear to be in the wrong order with regards to the main message – the INFO-level "Starting EnqueueFFISDownload" usage is an example of correct usage. Put plainly, any call to log.Info() or log.Debug() requires some kind of (narrative/explanatory) message, and any additional arguments are intended to be key/value pairs that get added to the logged JSON output.

For example, this call in getEmailFromS3Event():

log.Debug(logger, bucket, "Reading from bucket")

will emit JSON that looks like this (formatted as multiline for readability):

{
  "msg": "name-of-the-bucket", 
  "Reading from bucket": "(MISSING)",
  "level": "debug",
  "caller": "handler.go:145",
  "ts": "2023-04-25T18:51:20.549912"
}

By contrast, a correct call would look like this:

log.Debug(logger, "Reading from bucket", "bucket_name", bucket)

and would produce the following JSON:

{
  "msg": "Reading from bucket", 
  "bucket_name": "name-of-the-bucket",
  "level": "debug",
  "caller": "handler.go:145",
  "ts": "2023-04-25T18:51:20.549912"
}

Any additional pairs of arguments are handled similarly:

log.Debug(logger, "Reading from bucket", "bucket", bucket, "key", s3Event.Records[0].S3.Object.Key)

produces:

{
  "msg": "Reading from bucket", 
  "bucket": "name-of-the-bucket",
  "key": "path/to/the/file.eml",
  "level": "debug",
  "caller": "handler.go:145",
  "ts": "2023-04-25T18:51:20.549912"
}

cmd/EnqueueFFISDownload/handler.go Outdated Show resolved Hide resolved
cmd/EnqueueFFISDownload/main.go Outdated Show resolved Hide resolved
cmd/EnqueueFFISDownload/main.go Outdated Show resolved Hide resolved
docker-compose.yml Outdated Show resolved Hide resolved
localstack/entrypoint/init-aws.sh Outdated Show resolved Hide resolved
terraform/modules/EnqueueFFISDownload/main.tf Outdated Show resolved Hide resolved
terraform/modules/EnqueueFFISDownload/main.tf Outdated Show resolved Hide resolved
terraform/modules/EnqueueFFISDownload/main.tf Outdated Show resolved Hide resolved
cmd/EnqueueFFISDownload/handler.go Outdated Show resolved Hide resolved
cmd/EnqueueFFISDownload/handler.go Outdated Show resolved Hide resolved
- Moved queue out of module
- Fixed log param ordering
- Renamed handleS3Event
- Fixed param required
- Moved global vars to env object
- Cleaned up docker-compose for localstack
- Moved key vars from init-aws startup script
- Changed lambda params for timeout and memory
- Removed unneeded DD vars from lambda definition
- Referenced full queue URL in lambda instead of name
- Use reader for cleaner/more efficient IO in email handling
@jakekreider
Copy link
Contributor Author

Thanks again for the detailed feedback @TylerHendrickson - Not sure why the log param ordering wasn't clicking with me but your explanation made perfect sense.

I think I incorporated everything you requested. A couple of callouts:

  1. The "Validate and plan terraform" action is failing, apparently due to initializing AWS creds. Are there permission settings for those from forks or something like that?
  2. grants-ingest-ffis_ingest fails to apply for me (error below) locally. Is there some local configuration I need for that?
SES terraform error
│ Error: creating SES Receipt Rule (grants-ingest-ffis_ingest): RuleSetDoesNotExist: Invalid Rule Set Name.
│       status code: 400, request id: 81BMAZPFR81EF0N4A2VZGUPOO3ECBF5E7D4DCMA6B8YI6KNZPPAI                                                                                                                                          │
│   with aws_ses_receipt_rule.ffis_ingest,                                                                                                                                                                                          │   on main.tf line 207, in resource "aws_ses_receipt_rule" "ffis_ingest":
│  207: resource "aws_ses_receipt_rule" "ffis_ingest" {                                                                                                                                                                             │

@TylerHendrickson
Copy link
Member

TylerHendrickson commented Apr 27, 2023

@jakekreider Responding to your callouts:

  1. The "Validate and plan terraform" action is failing, apparently due to initializing AWS creds. Are there permission settings for those from forks or something like that?

Yeah, looks like workflows that are initiated by the pull_request trigger on a fork don't have access to repository secrets, which are necessary for running terraform plan. Theoretically, we could switch to a pull_request_target trigger and solve the issue, but there are some security considerations (outlined in this GitHub Security Lab post) that I don't think we've fully resolved yet. One potential solution (which that article briefly touches on, and is exemplified here) would be to use something like an "Approved for CI" label on these types of PRs that allows a privileged user to first inspect the contents of a PR and give manual approval that sensitive workloads are allowed to run for that PR. Something for us to look into.

For the case at-hand, I think the easiest path here might be to change (rebase?) this PR to a branch local to this repository, which you should have permission to do as a team member.

  1. grants-ingest-ffis_ingest fails to apply for me (error below) locally. Is there some local configuration I need for that?

This is also currently happening on the main branch – #61 outlines the reason for this failure and also the implementation steps to resolve. Commenting out (or otherwise disabling) the aws_ses_receipt_rule.ffis_ingest resource (defined here) in terraform/main.tf would be one workaround to the issue. Alternatively, you could wait for the hotfix that @slapula is working on in #64.

Copy link
Member

@TylerHendrickson TylerHendrickson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few very minor nits and suggested tweaks, most of which are related to logging.

We don't have any firm test coverage requirements, but I think the missing branch I mentioned (regarding the multiple.eml case/fixture) in the comments would be good to have.

cmd/EnqueueFFISDownload/handler.go Outdated Show resolved Hide resolved
cmd/EnqueueFFISDownload/handler.go Outdated Show resolved Hide resolved
cmd/EnqueueFFISDownload/handler.go Outdated Show resolved Hide resolved
cmd/EnqueueFFISDownload/handler.go Outdated Show resolved Hide resolved
cmd/EnqueueFFISDownload/handler.go Outdated Show resolved Hide resolved
cmd/EnqueueFFISDownload/handler.go Outdated Show resolved Hide resolved
cmd/EnqueueFFISDownload/handler_test.go Outdated Show resolved Hide resolved
}{
{"good.eml", "https://mcusercontent.com/123456/files/file-01.xlsx"},
{"missing.eml", ""},
{"multiple.eml", ""},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Despite this test cases's fixture looking correct as far as I can tell, I don't see any coverage on the if len(matches) > 1 branch of parseURLFromEmailBody(). Would you be able to look into that?

Hint: if you have Taskfile installed, you can run task test to generate an HTML coverage report.

Co-authored-by: Tyler Hendrickson <hendrickson.tsh@gmail.com>
@jakekreider
Copy link
Contributor Author

For the case at-hand, I think the easiest path here might be to change (rebase?) this PR to a branch local to this repository, which you should have permission to do as a team member.
Agreed; I applied your suggestions and will close this PR & open up a new one with a branch. Will have your other requests in that too!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enqueue download jobs for FFIS digest spreadsheets
2 participants