Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Protocol Tests #3227

Draft
wants to merge 11 commits into
base: develop
Choose a base branch
from
Draft

Conversation

jonathan343
Copy link
Contributor

@jonathan343 jonathan343 commented Jul 26, 2024

Context

This PR replaces the existing protocol tests in botocore/tests/unit/protocols/... with new tests generated from Smithy protocol test models. We use a version of these Smithy tests that are converted to the format currently supported by our existing test runner (test_protocols.py).

Issues

Many of the new protocol tests are failing due to test runner, serialization, and parsing issues. I've highlighted the notable issues below to provide additional context for reviwers:

Test Runner (test_protocols.py)

  • Response Body Normalization - The new protocol tests define expected body values with JSON and XML that includes extra white space and newlines. This prohibits us from continuing to do direct assertions between expected and actual values. Instead, we normalize the expected body value by attempting to parse the content as a JSON or XML object based on the protocol.
  • Handle Special Float Types - In the protocol test suite, certain special float types are represented as strings: "Infinity", "-Infinity", and "NaN". However, we parse these values as actual floats types, so we need to convert them back to their string representation before comparing with the expected values.

Input Serialization (serialize.py)

...

Response Parsing (parsers.py)

  • Infer Root XML Node - We should be trying to infer the following root nodes when parsing responses.
    • ec2 - Serializes XML responses within an XML root node with the name of the operation's output suffixed with "Response" such as <OperationNameResponse>. (see more)
    • query - Serializes a nested element with the name of the operation's output suffixed with "Result" such as <OperationNameResult>. (see more)
  • JSON Error Parsing - Parse errors as described in Operation error serialization for ALL json-based protocols. This behavior currently only exists for the rest-json protocol and doesn't handle : characters.
  • JSON Parse Header Values - This PR includes updates to the rest-json parser to handle boolean, float, and double values when represented as strings in a header.
    ...

Copy link
Contributor

@nateprewitt nateprewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a good start! I left some initial feedback to make sure we're not regressing things with these changes. Ideally we should be able to tie code blocks to individual test(s) changes.

botocore/parsers.py Outdated Show resolved Hide resolved
botocore/parsers.py Outdated Show resolved Hide resolved
botocore/parsers.py Outdated Show resolved Hide resolved
Comment on lines -463 to -476
# This method is needed because we have to special case flattened list
# with a serialization name. If this is the case we use the
# locationName from the list's member shape as the key name for the
# surrounding structure.
if shape.type_name == 'list' and shape.serialization.get('flattened'):
list_member_serialized_name = shape.member.serialization.get(
'name'
)
if list_member_serialized_name is not None:
return list_member_serialized_name
serialized_name = shape.serialization.get('name')
if serialized_name is not None:
return serialized_name
return member_name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we hoist all of this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my understanding, the conditional logic we have defined here goes against the guidance provided in the Flattened list serialization guide.

The xmlName trait applied to the member of a list has no effect when serializing a flattened list into a structure/union. For example, given the following:

union Choice {
    @xmlFlattened
    flat: MyList
}

list MyList {
    @xmlName("Hi")
    member: String
}

The XML serialization of Choice is:

 <Choice>
     <flat>example1</flat>
     <flat>example2</flat>
     <flat>example3</flat>
 </Choice>

Copy link
Contributor Author

@jonathan343 jonathan343 Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I look another look at this and did find some outliers.
I wrote a script to parse all service model files to look for list shapes with "flattened":true and a locationName. The results are posted below:

Detecting flattened list shapes with a locationName:
s3control:
 * StorageLensConfigurationList - StorageLensConfiguration
 * StorageLensGroupList - StorageLensGroup
sdb:
 * AttributeList - Attribute
 * AttributeNameList - AttributeName
 * DeletableItemList - Item
 * DomainNameList - DomainName
 * ItemList - Item
 * ReplaceableAttributeList - Attribute
 * ReplaceableItemList - Item 

These services don't follow the expected behavior indicated by the Smithy guidance. Looking more into this.

Comment on lines 574 to 577
if not response.get('body'):
return {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an error today because it means we got back a malformed response from the service. I think this only arises with S3 in specific edge cases with their deferred 200 responses. What test cases are enforcing this requirement now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a misunderstanding on my part. There is a new query protocol test case (id:QueryEmptyInputAndEmptyOutput) that ensures SDKs properly parse responses for output shapes with no members to {}.

These responses normally look like below:

<ExampleResponse xmlns="...">
  <ExampleResult/>
  <ResponseMetadata>
    <RequestId>778994ee-73cf-4128-a909-55c10282758c</RequestId>
  </ResponseMetadata>
</ExampleResponse>

The smithy test should be updated to use a response similar to above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went back to this and will remove it from the next revision.
I misunderstood a test case, a fix needs to be made upstream to Smithy.

tests/unit/test_protocols.py Outdated Show resolved Hide resolved
Comment on lines 456 to 455
if protocol_type in ['query', 'ec2']:
if expected.get('headers', {}).get('Content-Type'):
expected['headers']['Content-Type'] += '; charset=utf-8'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we only getting charset information with query/ec2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the query and ec2 protocols, we specifically set the Content-Type header to application/x-www-form-urlencoded; charset=utf-8. The smithy protocol tests only expect application/x-www-form-urlencoded. This line adds the missing ; charset=utf-8 to match what we expect for our SDK.

tests/unit/test_protocols.py Outdated Show resolved Hide resolved
Comment on lines 452 to 450
except (json.JSONDecodeError, ET.ParseError):
assert_equal(actual_body, expected_body, 'Body value')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have test cases that are giving us back bad bodies that we expect to work? This seems like an anti-pattern because we're potentially letting breakages through the test suite with this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are many cases in rest-json and rest-xml that expect bodies not in JSON or XML format. If it fails to parse, we still assert the exact string value.

Comment on lines 327 to 329
if value in [float('Infinity'), float('-Infinity')] or math.isnan(
value
):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there something unique about NaN that we don't use float('NaN') for the check like we do inf?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, NaN is unique in that it isn't equal to any value, including itself. Using the in or == operators won't work with float('NaN') since it will always equate to false. Because NaN it's the only float that doesn't equal itself, we can check for NaN with value != value. I can update to use this option if preferred.

Copy link
Contributor Author

@jonathan343 jonathan343 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replied/resolved some of the comments and suggestions. Still looking deeper into a few of them but keeping changes local for now because I think GitHub will hide the unresolved comments and make it harder to keep track of them if I push new changes.

botocore/parsers.py Outdated Show resolved Hide resolved
botocore/parsers.py Outdated Show resolved Hide resolved
Comment on lines 456 to 455
if protocol_type in ['query', 'ec2']:
if expected.get('headers', {}).get('Content-Type'):
expected['headers']['Content-Type'] += '; charset=utf-8'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the query and ec2 protocols, we specifically set the Content-Type header to application/x-www-form-urlencoded; charset=utf-8. The smithy protocol tests only expect application/x-www-form-urlencoded. This line adds the missing ; charset=utf-8 to match what we expect for our SDK.

Comment on lines 327 to 329
if value in [float('Infinity'), float('-Infinity')] or math.isnan(
value
):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, NaN is unique in that it isn't equal to any value, including itself. Using the in or == operators won't work with float('NaN') since it will always equate to false. Because NaN it's the only float that doesn't equal itself, we can check for NaN with value != value. I can update to use this option if preferred.

Comment on lines -463 to -476
# This method is needed because we have to special case flattened list
# with a serialization name. If this is the case we use the
# locationName from the list's member shape as the key name for the
# surrounding structure.
if shape.type_name == 'list' and shape.serialization.get('flattened'):
list_member_serialized_name = shape.member.serialization.get(
'name'
)
if list_member_serialized_name is not None:
return list_member_serialized_name
serialized_name = shape.serialization.get('name')
if serialized_name is not None:
return serialized_name
return member_name
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my understanding, the conditional logic we have defined here goes against the guidance provided in the Flattened list serialization guide.

The xmlName trait applied to the member of a list has no effect when serializing a flattened list into a structure/union. For example, given the following:

union Choice {
    @xmlFlattened
    flat: MyList
}

list MyList {
    @xmlName("Hi")
    member: String
}

The XML serialization of Choice is:

 <Choice>
     <flat>example1</flat>
     <flat>example2</flat>
     <flat>example3</flat>
 </Choice>

@@ -586,14 +585,24 @@ def _parse_body_as_xml(self, response, shape, inject_metadata=True):
start = self._find_result_wrapped_shape(
shape.serialization['resultWrapper'], root
)
else:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic was added in effort to align with the following serialization guidance:

Query:

The awsQuery protocol serializes XML responses within an XML root node with the name of the operation's output suffixed with "Response". A nested element, with the name of the operation's output suffixed with "Result", contains the contents of the successful response.
ref: https://smithy.io/2.0/aws/protocols/aws-query-protocol.html#response-serialization

EC2:

The ec2Query protocol serializes XML responses within an XML root node with the name of the operation's output suffixed with "Response", which contains the contents of the successful response.
ref: https://smithy.io/2.0/aws/protocols/aws-ec2-query-protocol.html#response-serialization

Note: The next revision doesn't hardcode "Result" and instead uses a ROOT_NODE_SUFFIX constant that is defined as ROOT_NODE_SUFFIX = 'Result' for QueryParser and ROOT_NODE_SUFFIX = 'Response' for EC2QueryParser.

Comment on lines 714 to +724
code = body.get('__type', response_code and str(response_code))
if code is not None:
# The "Code" value can come from either a response
# header or a value in the JSON body.
if 'x-amzn-errortype' in response['headers']:
code = response['headers']['x-amzn-errortype']
elif 'code' in body or 'Code' in body:
code = body.get('code', body.get('Code', ''))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The protocol tests require us to support following requirement for operation error parsing:

The component MUST be one of the following: an additional header with the name X-Amzn-Errortype, a body field with the name code, or a body field named __type.

One update I will have to make is the preference of __type for JSON-1.1/1.0 and x-amzn-errortype for REST-JSON

Comment on lines 982 to 986
if location is None or location not in self.KNOWN_LOCATIONS:
continue
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, you're right. I'll revert this change. It would short circuit earlier, but not necessary.

@@ -994,14 +1020,28 @@ def _handle_string(self, shape, value):
parsed = value
if is_json_value_header(shape):
decoded = base64.b64decode(value).decode(self.DEFAULT_ENCODING)
parsed = json.loads(decoded)
parsed = json.dumps(json.loads(decoded))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I misunderstood this, but I assumed if this method is handling a string value, we should be returning the string representation of the parsed json object.

Comment on lines 1056 to 1060
if isinstance(value, str):
if value == 'true':
return True
else:
return False
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updating this to use the ensure_boolean. Was the concern with only accepting true and not True? Or with the change as a whole?

@codecov-commenter
Copy link

codecov-commenter commented Aug 21, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 99.24242% with 1 line in your changes missing coverage. Please review.

Project coverage is 93.10%. Comparing base (c68aa1a) to head (2ec3a18).
Report is 113 commits behind head on develop.

Files Patch % Lines
botocore/serialize.py 98.93% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3227      +/-   ##
===========================================
- Coverage    93.12%   93.10%   -0.03%     
===========================================
  Files           66       66              
  Lines        14252    14354     +102     
===========================================
+ Hits         13272    13364      +92     
- Misses         980      990      +10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants