Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update error response codes for v4 metadata and stats endpoints #3644

Merged
merged 14 commits into from
Apr 19, 2023

Conversation

amogh09
Copy link
Contributor

@amogh09 amogh09 commented Apr 17, 2023

Summary

Implements changes for ECS/EC2 proposed in aws/containers-roadmap#2001. The proposal is for fixing misleading error response codes for v4 metadata and stats endpoints and align them with ECS/Fargate's codes.

With these changes

  • 404 Not Found is returned when the task/container was not found instead of 500 Internal Server Error and 400 Bad Request that are returned currently, and
  • 500 Internal Server Error is returned for unexpected failures instead of 400 Bad Request that is returned for some endpoints currently.
  • Fix a bug in task metadata handler that would have caused the handler to crash.

Implementation details

  • Make v4 metadata handlers perform v3EndpointId and container/task lookups, catch lookup failures, and return a 404 Not Found response code on lookup failures.
  • Make v4 stats handlers return 500 Internal Server Error on unexpected failures.
  • Task metadata handler currently ignores failures when looking up task by task ARN. Added error handling for this case that makes the handler return a 500 Internal Server Error.

Testing

New unit tests are added to cover all the changes. Existing unit tests are updated as needed.

Also tested changes for container/task not found cases manually by installing ECS Agent from source on an EC2 machine and sending requests with a bad v3EndpointID to TMDE. Results are shown below.

Before

$ curl http://169.254.170.2/v4/bad -v
*   Trying 169.254.170.2:80...
* Connected to 169.254.170.2 (169.254.170.2) port 80 (#0)
> GET /v4/bad HTTP/1.1
> Host: 169.254.170.2
> User-Agent: curl/7.88.1
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Content-Type: application/json
< X-Rate-Limit-Duration: 1
< X-Rate-Limit-Limit: 5000.00
< X-Rate-Limit-Request-Forwarded-For:
< X-Rate-Limit-Request-Remote-Addr: 172.31.3.52:51848
< Date: Mon, 17 Apr 2023 22:51:17 GMT
< Content-Length: 122
<
* Connection #0 to host 169.254.170.2 left intact
"V4 container metadata handler: unable to get container ID from request: unable to get docker ID from v3 endpoint ID: bad"%

$ curl http://169.254.170.2/v4/bad/task -v
*   Trying 169.254.170.2:80...
* Connected to 169.254.170.2 (169.254.170.2) port 80 (#0)
> GET /v4/bad/task HTTP/1.1
> Host: 169.254.170.2
> User-Agent: curl/7.88.1
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Content-Type: application/json
< X-Rate-Limit-Duration: 1
< X-Rate-Limit-Limit: 5000.00
< X-Rate-Limit-Request-Forwarded-For:
< X-Rate-Limit-Request-Remote-Addr: 172.31.3.52:49324
< Date: Mon, 17 Apr 2023 22:51:26 GMT
< Content-Length: 112
<
* Connection #0 to host 169.254.170.2 left intact
"V4 task metadata handler: unable to get task arn from request: unable to get task Arn from v3 endpoint ID: bad"%

$ curl http://169.254.170.2/v4/bad/stats -v
*   Trying 169.254.170.2:80...
* Connected to 169.254.170.2 (169.254.170.2) port 80 (#0)
> GET /v4/bad/stats HTTP/1.1
> Host: 169.254.170.2
> User-Agent: curl/7.88.1
> Accept: */*
>
< HTTP/1.1 400 Bad Request
< Content-Type: application/json
< X-Rate-Limit-Duration: 1
< X-Rate-Limit-Limit: 5000.00
< X-Rate-Limit-Request-Forwarded-For:
< X-Rate-Limit-Request-Remote-Addr: 172.31.3.52:36116
< Date: Mon, 17 Apr 2023 22:51:38 GMT
< Content-Length: 108
<
* Connection #0 to host 169.254.170.2 left intact
"V4 container handler: unable to get task arn from request: unable to get task Arn from v3 endpoint ID: bad"%

$ curl http://169.254.170.2/v4/bad/task/stats -v
*   Trying 169.254.170.2:80...
* Connected to 169.254.170.2 (169.254.170.2) port 80 (#0)
> GET /v4/bad/task/stats HTTP/1.1
> Host: 169.254.170.2
> User-Agent: curl/7.88.1
> Accept: */*
>
< HTTP/1.1 400 Bad Request
< Content-Type: application/json
< X-Rate-Limit-Duration: 1
< X-Rate-Limit-Limit: 5000.00
< X-Rate-Limit-Request-Forwarded-For:
< X-Rate-Limit-Request-Remote-Addr: 172.31.3.52:47270
< Date: Mon, 17 Apr 2023 22:51:42 GMT
< Content-Length: 109
<
* Connection #0 to host 169.254.170.2 left intact
"V4 task stats handler: unable to get task arn from request: unable to get task Arn from v3 endpoint ID: bad"%

After

$ curl http://169.254.170.2/v4/bad -v
*   Trying 169.254.170.2:80...
* Connected to 169.254.170.2 (169.254.170.2) port 80 (#0)
> GET /v4/bad HTTP/1.1
> Host: 169.254.170.2
> User-Agent: curl/7.88.1
> Accept: */*
>
< HTTP/1.1 404 Not Found
< Content-Type: application/json
< X-Rate-Limit-Duration: 1
< X-Rate-Limit-Limit: 5000.00
< X-Rate-Limit-Request-Forwarded-For:
< X-Rate-Limit-Request-Remote-Addr: 172.31.3.52:34780
< Date: Mon, 17 Apr 2023 22:47:12 GMT
< Content-Length: 122
<
* Connection #0 to host 169.254.170.2 left intact
"V4 container metadata handler: unable to get container ID from request: unable to get docker ID from v3 endpoint ID: bad"%

$ curl http://169.254.170.2/v4/bad/task -v
*   Trying 169.254.170.2:80...
* Connected to 169.254.170.2 (169.254.170.2) port 80 (#0)
> GET /v4/bad/task HTTP/1.1
> Host: 169.254.170.2
> User-Agent: curl/7.88.1
> Accept: */*
>
< HTTP/1.1 404 Not Found
< Content-Type: application/json
< X-Rate-Limit-Duration: 1
< X-Rate-Limit-Limit: 5000.00
< X-Rate-Limit-Request-Forwarded-For:
< X-Rate-Limit-Request-Remote-Addr: 172.31.3.52:59784
< Date: Mon, 17 Apr 2023 22:47:20 GMT
< Content-Length: 112
<
* Connection #0 to host 169.254.170.2 left intact
"V4 task metadata handler: unable to get task arn from request: unable to get task Arn from v3 endpoint ID: bad"%

$ curl http://169.254.170.2/v4/bad/stats -v
*   Trying 169.254.170.2:80...
* Connected to 169.254.170.2 (169.254.170.2) port 80 (#0)
> GET /v4/bad/stats HTTP/1.1
> Host: 169.254.170.2
> User-Agent: curl/7.88.1
> Accept: */*
>
< HTTP/1.1 404 Not Found
< Content-Type: application/json
< X-Rate-Limit-Duration: 1
< X-Rate-Limit-Limit: 5000.00
< X-Rate-Limit-Request-Forwarded-For:
< X-Rate-Limit-Request-Remote-Addr: 172.31.3.52:57020
< Date: Mon, 17 Apr 2023 22:47:31 GMT
< Content-Length: 108
<
* Connection #0 to host 169.254.170.2 left intact
"V4 container handler: unable to get task arn from request: unable to get task Arn from v3 endpoint ID: bad"%

$ curl http://169.254.170.2/v4/bad/task/stats -v
*   Trying 169.254.170.2:80...
* Connected to 169.254.170.2 (169.254.170.2) port 80 (#0)
> GET /v4/bad/task/stats HTTP/1.1
> Host: 169.254.170.2
> User-Agent: curl/7.88.1
> Accept: */*
>
< HTTP/1.1 404 Not Found
< Content-Type: application/json
< X-Rate-Limit-Duration: 1
< X-Rate-Limit-Limit: 5000.00
< X-Rate-Limit-Request-Forwarded-For:
< X-Rate-Limit-Request-Remote-Addr: 172.31.3.52:56042
< Date: Mon, 17 Apr 2023 22:47:41 GMT
< Content-Length: 109
<
* Connection #0 to host 169.254.170.2 left intact
"V4 task stats handler: unable to get task arn from request: unable to get task Arn from v3 endpoint ID: bad"%

New tests cover the changes: yes

Description for the changelog

Fix misleading error response codes for v4 metadata and stats endpoints for TMDE. The endpoints will now return a 404 Not Found error code when task/container is not found and a 500 Internal Server Error on unexpected failures. TMDE clients that branch logic based on error response status codes will behave differently. For example, clients that retry on 5XX errors but not on 4XX errors will no longer perform futile retries.

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@amogh09 amogh09 requested a review from a team as a code owner April 17, 2023 18:40
@amogh09 amogh09 changed the title Update error responses for v4 metadata and stats endpoints Update error response codes for v4 metadata and stats endpoints Apr 17, 2023
xxx0624
xxx0624 previously approved these changes Apr 18, 2023
Copy link
Contributor

@xxx0624 xxx0624 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to have these tests! LGTM.

}

// Tests that v4 stats endpoints return a 500 error on unexpected failures
func TestV4Stats500Error(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if we can cover all other v4 endpoints as well for 500? like the task metadata, container metadata.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's a great idea. I will add new tests for other endpoints.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added new test cases.

@amogh09 amogh09 merged commit 01a7207 into aws:dev Apr 19, 2023
@Yiyuanzzz Yiyuanzzz mentioned this pull request Apr 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants