Segment and subsegment names support Unicode characters, per the schema provided in https://docs.aws.amazon.com/xray/latest/devguide/xray-api-segmentdocuments.html. Here is the content of the schema regarding the name property:
However, the SDK is dropping characters that are not ASCII: https://github.com/aws/aws-xray-sdk-python/blob/master/aws_xray_sdk/core/models/entity.py#L18.
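To make the effect concrete, here is a minimal sketch of allow-list sanitization restricted to ASCII. The character set and the function name `ascii_only_sanitize` are assumptions for illustration only; the exact set used at the linked line in the SDK may differ.

```python
import string

# Hypothetical ASCII-only allow-list (an assumption, not copied from the SDK).
ASCII_ALLOWED = set(string.ascii_letters + string.digits + "_.:/%&#=+-@ ")


def ascii_only_sanitize(name):
    """Keep only characters present in the ASCII allow-list."""
    return "".join(c for c in name if c in ASCII_ALLOWED)


# The non-ASCII letters are silently dropped, although the schema allows them.
print(ascii_only_sanitize("订单-service"))  # prints "-service"
```

With a filter of this shape, a name written entirely in a non-Latin script would be reduced to an empty string.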
If a segment or subsegment name contains invalid characters, the X-Ray service back end does not accept that segment or subsegment. Because the X-Ray daemon sends segments in batches, the rejected segment or subsegment is listed under "Unprocessed" in the response body of the PutTraceSegments API call, but the call itself still succeeds. This is why the SDK sanitizes names on its side: to make sure valuable data is not dropped because of invalid characters in names.
The SDK's restrictions should be equal to, or at least looser than, those of the back end. Doing a full regex match with ([\\p{L}\\p{Z}\\p{N}_.:/%&#=+\\-@]*)$ adds performance overhead, since the match would run on every single segment/subsegment capture.
The proposal is to have the SDK switch to blacklist-based sanitization that drops common invalid characters such as ? * $ ; ( ) [ ] { }. This keeps the design lightweight and lets Unicode letters from non-English natural languages pass through.
Any feedback is welcome.