Fix JsonException: Malformed UTF-8 characters, possibly incorrectly encoded
#43
Hi, @VincentLanglet - I'm still investigating this but it might be an issue with PHP/fixable in the library.
$ curl -X POST 'https://api.deepl.com/v2/translate' \
--header 'Authorization: DeepL-Auth-Key MYKEY' \
--header 'Content-Type: application/json' \
--data '{"text": ["Portal<span></span>"], "target_lang": "FR", "source_lang":"EN", "tag_handling":"xml", "ignore_tags":["notranslate"]}'
{"translations":[{"detected_source_language":"EN","text":"Portail<span>/span>"}]}
I tried logging the request/response that gets sent over the wire via PHP yesterday, but the logger I used modified the data, so I'm checking with a new one now.
When I tried to log the
notice that without the tag_handling, the request with
works. When looking at
Maybe the
Hi @VincentLanglet, I was sick until yesterday, looking into this now too. I don't think this is likely a problem with PHP, as I can reproduce it with our Python library too. Nor is it likely a problem with sending a JSON-encoded or URL-encoded request; I could reproduce the response in both cases. It seems that this input, combined with XML tag handling, triggers some unusual case, and our API response includes an invalid UTF-8 sequence:
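For anyone wanting to confirm this on their own responses, here is a minimal Python sketch that locates the first invalid UTF-8 sequence in a raw response body. The helper name `find_invalid_utf8` and the sample bytes are assumptions made up for illustration; the actual bytes returned by the API are not shown in this thread.

```python
def find_invalid_utf8(raw: bytes):
    """Return (offset, offending_bytes) for the first invalid UTF-8 sequence, or None."""
    try:
        raw.decode("utf-8")  # strict decoding raises on the first bad sequence
    except UnicodeDecodeError as err:
        return err.start, raw[err.start:err.end]
    return None


# Hypothetical response body: 0xFF can never appear in valid UTF-8.
# The real invalid sequence from the API is NOT known from this thread.
body = b'{"translations":[{"text":"Portail<span>\xff/span>"}]}'

print(find_invalid_utf8(body))
```

Working on the raw bytes, before any logger or JSON decoder touches them, avoids the problem mentioned earlier where the logging layer altered the data.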
Thanks @daniel-jones-dev, any idea when it will be solved?
Hi @VincentLanglet, the backend team has looked into the cause of this issue; unfortunately it will not be easily fixed. In the meantime, I wonder if a workaround in this library could help you: we could suppress these invalid UTF-8 sequences by replacing them with the replacement character “�” (U+FFFD). This would at least allow you to use the other requests. Do you think this would help you?
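The idea can be sketched in Python as follows. This is only an illustration of the substitution approach, not the change that was made in this library; `decode_lenient` and the sample response bytes are hypothetical.

```python
import json


def decode_lenient(raw: bytes):
    """Decode a JSON body, replacing invalid UTF-8 sequences with U+FFFD."""
    # errors="replace" substitutes U+FFFD for each undecodable sequence
    # instead of raising UnicodeDecodeError.
    text = raw.decode("utf-8", errors="replace")
    return json.loads(text)


# Hypothetical response body containing one invalid byte (0xFF).
body = (
    b'{"translations":[{"detected_source_language":"EN",'
    b'"text":"Portail<span>\xff/span>"}]}'
)

result = decode_lenient(body)
print(result["translations"][0]["text"])
```

Note that this only helps when the invalid bytes sit inside a string value. If they were to corrupt the JSON structure itself (a quote or brace, for instance), parsing would still fail.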
Sure, I talked with our team and it would help a lot (as a first step until the issue is fixed on the API side). What implementation did you have in mind? I think that
could do the job.
JsonException: Malformed UTF-8 characters, possibly incorrectly encoded
Thanks @VincentLanglet, your change for the workaround looks good. We need to check some internal tests and then we should be able to merge this tomorrow. The backend team will still work on fixing the issue in the API.
The workaround is published in v1.7.2.
Hi @JanEbbing @daniel-jones-deepl,
We recently encountered an issue with a text input whose DeepL translation cannot be JSON-decoded by the library.
I created a reproducer of the issue. Notice that this only occurs with the option
This is how the input is rendered in my IDE
The added test fails with the error:
This is especially annoying because when translating a payload with 1000 texts, if one of them has such a character, the whole payload fails and no text is translated (when 999 could have been).
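To illustrate the batch concern, the following Python sketch decodes a multi-text response leniently and then separates the translations that contain the replacement character from the intact ones, so that a single damaged text does not discard the rest. The function name `split_batch` and the three-item response are assumptions for illustration only.

```python
import json

REPLACEMENT = "\ufffd"  # U+FFFD, inserted wherever bytes could not be decoded


def split_batch(raw: bytes):
    """Decode leniently, then separate clean translations from damaged ones."""
    payload = json.loads(raw.decode("utf-8", errors="replace"))
    clean, damaged = [], []
    for index, item in enumerate(payload["translations"]):
        target = damaged if REPLACEMENT in item["text"] else clean
        target.append((index, item["text"]))
    return clean, damaged


# Hypothetical batch response: only the second text carries an invalid byte.
body = (
    b'{"translations":[{"text":"Bonjour"},'
    b'{"text":"Portail<span>\xff/span>"},{"text":"Merci"}]}'
)

clean, damaged = split_batch(body)
print(len(clean), "usable,", len(damaged), "to review")
```

One caveat of this check: a text that legitimately contains U+FFFD would also be flagged as damaged, so the flagged items are best treated as candidates for review or retry.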
Can something be done:
Thanks