Bug Description
The opensearchproject/data-prepper container image incorrectly handles UTF-8 characters when streaming data from DynamoDB to S3 buckets in NDJSON format. Non-ASCII characters are replaced with question marks (?) in the output files.
Steps to Reproduce
Set up data-prepper using the opensearchproject/data-prepper container image
Create a DynamoDB table with items containing strings with non-ASCII characters (e.g., Mandarin, Tamil)
Configure data-prepper to stream changes from the DynamoDB table to an S3 bucket using NDJSON format
Observe the resulting S3 objects
Actual Behavior
All non-ASCII characters in the original DynamoDB data are replaced with question marks (?) in the S3 output files.
Expected Behavior
All UTF-8 characters, including non-ASCII characters, should be preserved in the output NDJSON files exactly as they appear in the source DynamoDB table.
Workaround
Adding the environment variable LC_ALL=C.UTF-8 to the container configuration resolves the issue. This environment variable should be set by default in the container image to ensure proper UTF-8 handling.
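One plausible explanation for why the workaround helps, not confirmed in this report, is that some code path encodes text with a charset derived from the container's locale rather than an explicit UTF-8. The following minimal Java sketch shows only the symptom: encoding non-ASCII text with US-ASCII replaces every unmappable character with a question mark, while UTF-8 preserves it. The class name CharsetDemo and the helper roundTrip are hypothetical and are not part of data-prepper.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not taken from the data-prepper code base.
public class CharsetDemo {

    // Encode the text to bytes with the given charset, then decode the bytes
    // back with the same charset. Characters the charset cannot represent are
    // replaced by the charset's replacement byte, which is '?' for US-ASCII.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "\u4f60\u597d" is the Mandarin greeting "ni hao", written with
        // Unicode escapes so the source file itself stays pure ASCII.
        String sample = "\u4f60\u597d";

        // The charset the JVM picked up from its environment.
        System.out.println("default charset: " + Charset.defaultCharset());

        // Lossy: each non-ASCII character becomes '?'.
        System.out.println("US-ASCII round trip: " + roundTrip(sample, StandardCharsets.US_ASCII));

        // Lossless: the original characters survive.
        System.out.println("UTF-8 round trip:    " + roundTrip(sample, StandardCharsets.UTF_8));
    }
}
```

Running this inside the container with and without LC_ALL=C.UTF-8 and comparing the reported default charset may help confirm whether the locale is involved. Depending on the JDK version, the default charset may already be UTF-8 regardless of locale, so a matching result does not rule out other locale-dependent settings.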