
mochiutf8 codepoint_to_bytes failing for values between 55296 and 57343 #164

Closed
jockee opened this issue Feb 8, 2016 · 3 comments


@jockee

jockee commented Feb 8, 2016

:mochiweb_html.parse("&#55357;&#56842;")
** (FunctionClauseError) no function clause matching in :mochiutf8.codepoint_to_bytes/1
  src/mochiutf8.erl:38: :mochiutf8.codepoint_to_bytes(55357)
  src/mochiweb_html.erl:665: :mochiweb_html.tokenize_charref/3
  src/mochiweb_html.erl:642: :mochiweb_html.tokenize_charref/2
  src/mochiweb_html.erl:302: :mochiweb_html.tokens/3
  src/mochiweb_html.erl:82: :mochiweb_html.parse/1

Bumped into this when trying to parse HTML containing &#55357;&#56842;, a UTF-16 surrogate pair that represents the "😊" smiley (U+1F60A).

The pair is listed under "surrogates" at http://apps.timwhitlock.info/unicode/inspect?s=%F0%9F%98%8A.
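For reference, here is a small Python sketch of the arithmetic that turns U+1F60A into the two charrefs above. Python is used only for illustration (mochiweb itself is Erlang), and the helper names are hypothetical. It also shows that both halves fall inside 0xD800..0xDFFF, i.e. 55296..57343, the range in the issue title, which is why an encoder that only accepts Unicode scalar values rejects each half on its own.

```python
from typing import Tuple

def to_surrogate_pair(codepoint: int) -> Tuple[int, int]:
    """Split a supplementary-plane code point into UTF-16 high/low surrogates."""
    if not 0x10000 <= codepoint <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need a surrogate pair")
    offset = codepoint - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits -> 0xD800..0xDBFF
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits -> 0xDC00..0xDFFF
    return high, low

def is_surrogate(codepoint: int) -> bool:
    """True for 0xD800..0xDFFF, i.e. 55296..57343."""
    return 0xD800 <= codepoint <= 0xDFFF

high, low = to_surrogate_pair(ord("😊"))      # U+1F60A
print(high, low)                              # 55357 56842
print(f"&#{high};&#{low};")                   # &#55357;&#56842;
print(is_surrogate(high), is_surrogate(low))  # True True

# A lone surrogate is not a Unicode scalar value, so a strict UTF-8
# encoder refuses it, much like codepoint_to_bytes(55357) in the trace.
try:
    chr(high).encode("utf-8")
except UnicodeEncodeError:
    print("lone surrogate rejected")          # lone surrogate rejected
```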

@etrepum
Member

etrepum commented Feb 8, 2016

While this is a bug (mochiweb_html can't handle surrogates in charrefs), fixing it wouldn't make your example work, because mochiweb_html:parse/1 can't parse input that isn't wrapped in a surrounding tag.

etrepum added a commit that referenced this issue Feb 9, 2016
Support parsing UTF-16 surrogate pairs in mochiweb_html #164
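A minimal sketch of what supporting surrogate pairs in charrefs could look like: when a high-surrogate charref is immediately followed by a low-surrogate charref, combine them into one code point before UTF-8 encoding. This is not the code from the commit. The function names are hypothetical, only decimal charrefs are handled, values above 0x10FFFF are not handled, and substituting U+FFFD for an unpaired surrogate is an assumption (it is what the WHATWG HTML tokenizer does for surrogate numeric references), not confirmed mochiweb behavior.

```python
import re
from typing import List

def parse_decimal_charrefs(text: str) -> List[int]:
    """Hypothetical helper: extract code points from decimal charrefs like &#55357;."""
    return [int(n) for n in re.findall(r"&#(\d+);", text)]

def combine_surrogates(high: int, low: int) -> int:
    """Inverse of the UTF-16 split: (high, low) -> supplementary code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

def codepoints_to_utf8(codepoints: List[int]) -> bytes:
    """Encode code points (<= 0x10FFFF) as UTF-8, joining high+low surrogate pairs."""
    out = bytearray()
    i = 0
    while i < len(codepoints):
        cp = codepoints[i]
        nxt = codepoints[i + 1] if i + 1 < len(codepoints) else None
        if 0xD800 <= cp <= 0xDBFF and nxt is not None and 0xDC00 <= nxt <= 0xDFFF:
            cp = combine_surrogates(cp, nxt)   # valid pair -> one code point
            i += 2
        else:
            if 0xD800 <= cp <= 0xDFFF:
                cp = 0xFFFD                    # assumed fallback for an unpaired surrogate
            i += 1
        out += chr(cp).encode("utf-8")
    return bytes(out)

print(codepoints_to_utf8(parse_decimal_charrefs("&#55357;&#56842;")))
# b'\xf0\x9f\x98\x8a'  (U+1F60A, the same bytes as %F0%9F%98%8A in the link above)
```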
@etrepum
Member

etrepum commented Feb 9, 2016

Fixed in 2.13.0

@etrepum etrepum closed this as completed Feb 9, 2016
@jockee
Author

jockee commented Feb 9, 2016

Great stuff! Thanks!
