Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement fetch new submissions for form id #33

Merged
merged 20 commits into from
Sep 6, 2024
Merged

Conversation

wmoussa-gc
Copy link
Contributor

@wmoussa-gc wmoussa-gc commented Aug 22, 2024

Allows a service account to download all the new submissions for one form
e.g. GET http://FORMS_API_DOMAIN/forms/FORM_ID/submission/new

Sample response returned:

[
  {
    "createdAt": 1722610684687,
    "submissionName": "02-08-d732"
  },
{
    "createdAt": 172261033333,
    "submissionName": "06-77-f456"
  }
]

@wmoussa-gc wmoussa-gc marked this pull request as ready for review August 22, 2024 21:15
Copy link
Contributor

@craigzour craigzour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a some changes I suggested. I will let you review them and tell me what you think.

Also I noticed a few other things that are missing in this PR:

  • A maximum number of new form submissions that can be returned for one request. This should be implemented in the DynamoDB query operation using the Limit option.
    The LastEvaluatedKey value in the query operation response should also be checked as DynamoDB will use it to tell us whether we should send follow up request to get all the items we requested. See example here https://github.com/cds-snc/platform-forms-client/blob/develop/lib/vault.ts#L119
    With that we would also need to communicate, through the API response, the fact that there could be more new form submissions to be retrieved.
  • Related to my previous point. The logic we discussed earlier this week where we would keep the LastEvaluatedKey in cache for when new requests are received so that we can respond with the next bunch of new form submissions that are available for retrieval
  • Unit tests on the /new router code

src/lib/vault/dataStructures/formSubmission.ts Outdated Show resolved Hide resolved
src/lib/vault/dataStructures/formSubmission.ts Outdated Show resolved Hide resolved
src/lib/vault/dataStructures/formSubmission.ts Outdated Show resolved Hide resolved
src/lib/vault/getFormNewSubmissions.ts Outdated Show resolved Hide resolved
src/lib/vault/getFormNewSubmissions.ts Outdated Show resolved Hide resolved
src/lib/vault/getFormNewSubmissions.ts Outdated Show resolved Hide resolved
src/lib/vault/getFormNewSubmissions.ts Outdated Show resolved Hide resolved
src/routes/forms/submission/new/router.ts Outdated Show resolved Hide resolved
test/routes/forms/router.test.ts Outdated Show resolved Hide resolved
src/lib/vault/getFormNewSubmissions.ts Outdated Show resolved Hide resolved
@wmoussa-gc
Copy link
Contributor Author

@craigzour Thanks for the review! I think I've covered most of it. For the optimization on how many names we return in one call, I suggest we let the prototyping partner mess around with those endpoints before we tweak the flow. What do you think?

@wmoussa-gc wmoussa-gc requested a review from craigzour August 27, 2024 20:59
@craigzour
Copy link
Contributor

@craigzour Thanks for the review! I think I've covered most of it. For the optimization on how many names we return in one call, I suggest we let the prototyping partner mess around with those endpoints before we tweak the flow. What do you think?

About the limit on the number of new form submissions being returned by this /new path, I think that the main issue is not really the number of items we return. The problem is that if a user decides to make 2 calls in a row to get a certain amount of form submissions before retrieving them through the get submission endpoint then the two responses will approximatively contain the exact same new form submissions.

What do you think @bryan-robitaille ?

@wmoussa-gc
Copy link
Contributor Author

@craigzour Thanks for the review! I think I've covered most of it. For the optimization on how many names we return in one call, I suggest we let the prototyping partner mess around with those endpoints before we tweak the flow. What do you think?

About the limit on the number of new form submissions being returned by this /new path, I think that the main issue is not really the number of items we return. The problem is that if a user decides to make 2 calls in a row to get a certain amount of form submissions before retrieving them through the get submission endpoint then the two responses will approximatively contain the exact same new form submissions.

What do you think @bryan-robitaille ?

Got it. I'm also not a fan of introducing caching or a "stored checkpoint" before we're sure it's necessary 'There are only two hard things in Computer Science: cache invalidation...' However, I have no issue doing it if you both believe it's important for our testing partners.

@craigzour
Copy link
Contributor

@craigzour Thanks for the review! I think I've covered most of it. For the optimization on how many names we return in one call, I suggest we let the prototyping partner mess around with those endpoints before we tweak the flow. What do you think?

About the limit on the number of new form submissions being returned by this /new path, I think that the main issue is not really the number of items we return. The problem is that if a user decides to make 2 calls in a row to get a certain amount of form submissions before retrieving them through the get submission endpoint then the two responses will approximatively contain the exact same new form submissions.
What do you think @bryan-robitaille ?

Got it. I'm also not a fan of introducing caching or a "stored checkpoint" before we're sure it's necessary 'There are only two hard things in Computer Science: cache invalidation...' However, I have no issue doing it if you both believe it's important for our testing partners.

I definitely understand your point! I think it all comes down to how we want to design API v1. If we want to keep it just like that then we should make sure to explain in the documentation what will happen if you hit the endpoint and you are not planning on retrieving and confirming the responses right after. But even if this is something we specify to our users, there is still a chance you could end-up getting some of the same new form submissions because of the DynamoDB Read Consistency. We are using the default Eventually Consistent Reads with our request because they consume less read capacity units.

@bryan-robitaille
Copy link
Contributor

What do you think @bryan-robitaille ?

The more I think about it we should probably limit it to 100 at a time and always return the 100 oldest responses. This way we don't have to keep track of the LastEvaluatedKey in a cache. It might also tie in well to the Token Rate Limiting. Why give a user 10k response names when they are rate limited to a set amount?

I'm not overly concerned about the delay in Eventually Consistent Reads at this point in time. Each confirmation round trip will probably be in the 500ms range so DynamoDB should have enough time to complete the replication of the write to the index within the 1 second time frame.

We could also add in the documentation that there is a small chance that confirmed responses may continue to appear in the NEW endpoint for a very short period of time and Validation should be completed on their end.

@craigzour
Copy link
Contributor

What do you think @bryan-robitaille ?

The more I think about it we should probably limit it to 100 at a time and always return the 100 oldest responses. This way we don't have to keep track of the LastEvaluatedKey in a cache. It might also tie in well to the Token Rate Limiting. Why give a user 10k response names when they are rate limited to a set amount?

I'm not overly concerned about the delay in Eventually Consistent Reads at this point in time. Each confirmation round trip will probably be in the 500ms range so DynamoDB should have enough time to complete the replication of the write to the index within the 1 second time frame.

We could also add in the documentation that there is a small chance that confirmed responses may continue to appear in the NEW endpoint for a very short period of time and Validation should be completed on their end.

Make sense! Though, how would you return the 100 oldest responses? Would we have to query all new responses and then sort them by creation date at the API level?

@bryan-robitaille
Copy link
Contributor

Make sense! Though, how would you return the 100 oldest responses? Would we have to query all new responses and then sort them by creation date at the API level?

I'd have to test it out but I'm assuming the index would keep the same order as the main table since all the sort keys are identical.

@wmoussa-gc
Copy link
Contributor Author

I'll give it a try. Ill create a 1000 items in local DB, and see how it goes

@wmoussa-gc
Copy link
Contributor Author

from my tests and reading the SDK doc: Items with the same partition key value are stored in sorted order by sort key. If the sort key data type is Number, the results are stored in numeric order. For type String, the results are stored in order of UTF-8 bytes.

From my understanding we will not have the same Sort Key i.e. NAME#21-53-46d2 so it will be sorted in order of UTF-8 bytes..

@craigzour
Copy link
Contributor

from my tests and reading the SDK doc: Items with the same partition key value are stored in sorted order by sort key. If the sort key data type is Number, the results are stored in numeric order. For type String, the results are stored in order of UTF-8 bytes.

From my understanding we will not have the same Sort Key i.e. NAME#21-53-46d2 so it will be sorted in order of UTF-8 bytes..

Correct! I guess the interrogation was more about what happened to that logic when you are using a Global Secondary Index (like us) that has a different set of keys (FormID + Status) where we know that the items will be ordered by Status. Is there a third criteria that DynamoDB uses to sort items beyond that? It could have been the creation date for example.

If it does not work then I think we could go with just a request that retrieves 100 new form submissions (we would let DynamoDB decide what are those submissions) for the experimentation phase.
The other solution I mentioned this morning was to get all new form submissions and then sort them at the API level but if you have a client that gets a LOT of submissions then it would put too much pressure on the system (our server but also DynamoDB and their rate limiting logic).

@craigzour craigzour requested review from patheard and bryan-robitaille and removed request for bryan-robitaille September 5, 2024 18:10
TableName: "Vault",
IndexName: "StatusCreatedAt",
ExclusiveStartKey: lastEvaluatedKey ?? undefined,
Limit: limit - newFormSubmissions.length,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain to me again why we would need the evaluated key to ensure the limit works properly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though this may not be useful just now because of the maximum number of new form submission we are requesting and the way DynamoDB handle pagination..

A single Query operation will read up to the maximum number of items set (if using the Limit parameter) or a maximum of 1 MB of data and then apply any filtering to the results using FilterExpression. If LastEvaluatedKey is present in the response, you will need to paginate the result set. For more information, see Paginating the Results
in the Amazon DynamoDB Developer Guide.

.. I wanted to have this logic built in for if something changes on their side or ours (we increase our maximum of returned new form submissions or we have to request more fields for each entry).

Copy link
Contributor

@bryan-robitaille bryan-robitaille left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work!
I tested and everything seems to work as planned.

@wmoussa-gc wmoussa-gc merged commit 7a51903 into main Sep 6, 2024
4 checks passed
@wmoussa-gc wmoussa-gc deleted the submission-new branch September 6, 2024 14:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants