Need some way to parse &[u8] as UTF-8 with replacement chars #9516

Closed
lilyball opened this issue Sep 26, 2013 · 12 comments · Fixed by #12062

Comments

@lilyball
Contributor

We need some way to interpret a &[u8] as UTF-8, using the replacement character for invalid sequences instead of conditions. This would ideally be provided in the form of an Iterator<char>.
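For illustration, here is a minimal sketch of what such an iterator could look like in present-day Rust. It is not the API that existed in 2013, nor necessarily what was eventually merged; `LossyChars` is a hypothetical name. It relies only on `std::str::from_utf8` and the `valid_up_to` / `error_len` methods of `Utf8Error`, and yields U+FFFD once per invalid sequence.

```rust
use std::str;

/// Hypothetical helper (name invented for this sketch): lazily decodes a
/// byte slice as UTF-8, yielding U+FFFD for each invalid sequence.
struct LossyChars<'a> {
    /// Bytes that have not been validated yet.
    rest: &'a [u8],
    /// Characters from the most recently validated run.
    valid: str::Chars<'a>,
    /// True when a replacement char must follow the current valid run.
    pending_replacement: bool,
}

impl<'a> LossyChars<'a> {
    fn new(bytes: &'a [u8]) -> Self {
        LossyChars { rest: bytes, valid: "".chars(), pending_replacement: false }
    }
}

impl<'a> Iterator for LossyChars<'a> {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        loop {
            // Drain the already-validated run first.
            if let Some(c) = self.valid.next() {
                return Some(c);
            }
            // Then emit the replacement for the invalid sequence after it.
            if self.pending_replacement {
                self.pending_replacement = false;
                return Some('\u{FFFD}');
            }
            let rest = self.rest;
            if rest.is_empty() {
                return None;
            }
            match str::from_utf8(rest) {
                Ok(s) => {
                    // Everything left is valid.
                    self.valid = s.chars();
                    self.rest = &rest[rest.len()..];
                }
                Err(e) => {
                    let (good, bad) = rest.split_at(e.valid_up_to());
                    // Re-validating the prefix keeps the sketch free of
                    // `unsafe`, at the cost of a second pass over it.
                    self.valid = str::from_utf8(good)
                        .expect("prefix was reported valid")
                        .chars();
                    // `error_len() == None` means the input ended in the
                    // middle of a sequence; skip whatever is left.
                    let skip = e.error_len().unwrap_or(bad.len());
                    self.rest = &bad[skip..];
                    self.pending_replacement = true;
                }
            }
        }
    }
}
```

With this sketch, `LossyChars::new(bytes).collect::<String>()` should produce the same text as the standard library's `String::from_utf8_lossy(bytes)`, but one `char` at a time.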

@thestinger
Contributor

Could we remove the current condition API in favour of this? It doesn't really seem to offer anything but failing, and we already have an API returning an Option.

@lilyball
Contributor Author

@thestinger I would be quite pleased if we could do that. Using the condition is rather awkward, and I would rather just have replacement chars myself.

@pnkfelix
Member

This is a duplicate of #8968, no?

@pnkfelix
Member

And also the e-mail thread here I think is relevant: https://mail.mozilla.org/pipermail/rust-dev/2013-September/005503.html

@bluss
Member

bluss commented Sep 26, 2013

This is an implementation of a mostly complete Iterator<u8> to Iterator<char> UTF-8 decoder:

https://gist.github.com/anonymous/0363d92055cf4552dd1f

@lilyball
Contributor Author

@pnkfelix It's only a duplicate if the intended resolution of #8968 is to use the replacement char instead of conditions, but it's unclear from the discussion if that's the plan.

@pnkfelix
Member

@kballard I guess my thinking was that a maximally expressive condition-based solution would let one express a replacement char approach using conditions.

But I don't actually favor doing that at this point. Or at least, I don't favor making that the only way to accomplish this, since I suspect a more specialized approach will be much nicer to use (both in terms of programmer convenience and in terms of efficiency).

So okay, this is not a duplicate of #8968.

@bluss
Member

bluss commented Sep 26, 2013

Rust needs to support both fatal and replacement error-handling modes. As simonsapin has said, it should follow the WHATWG spec.

(edit by @pnkfelix: here is WHATWG spec.)
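To make the two modes concrete, here is a small sketch in present-day Rust. The names `ErrorMode` and `decode_utf8` are invented for illustration and are not part of any proposed or existing API. The sketch only dispatches to two standard-library functions, `std::str::from_utf8` (fatal) and `String::from_utf8_lossy` (replacement), and makes no claim of conformance to the WHATWG spec.

```rust
use std::borrow::Cow;
use std::str::Utf8Error;

/// Hypothetical enum naming the two error-handling modes discussed above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ErrorMode {
    /// Stop at the first invalid sequence and report an error.
    Fatal,
    /// Substitute U+FFFD for each invalid sequence and keep going.
    Replacement,
}

/// Hypothetical entry point offering both modes through one function.
fn decode_utf8(bytes: &[u8], mode: ErrorMode) -> Result<Cow<'_, str>, Utf8Error> {
    match mode {
        // Borrows the input when it is valid; otherwise returns the error.
        ErrorMode::Fatal => std::str::from_utf8(bytes).map(Cow::Borrowed),
        // Never fails; allocates only if a replacement was needed.
        ErrorMode::Replacement => Ok(String::from_utf8_lossy(bytes)),
    }
}
```

Calling `decode_utf8(b"a\xFFb", ErrorMode::Fatal)` returns an `Err`, while the same bytes with `ErrorMode::Replacement` decode to `"a\u{FFFD}b"`.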

@thestinger
Contributor

@blake2-ppc: the fatal mode is already handled by the Option API

@bluss
Member

bluss commented Sep 26, 2013

well, the current functions in str don't handle decoding buffers in chunks like the proposed encodings API. But yes, it does handle one-off decoding with 'fatal' error handling.
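The difference between one-off and chunked decoding can be sketched as follows, again in present-day Rust. `ChunkDecoder` is a hypothetical type, not the proposed encodings API. The assumption is that a multi-byte sequence may be split across chunk boundaries, so incomplete trailing bytes are held back until the next chunk arrives instead of being replaced immediately.

```rust
/// Hypothetical streaming decoder (invented for this sketch): accepts
/// bytes in chunks and appends decoded text to a caller-supplied String,
/// using U+FFFD for invalid sequences.
struct ChunkDecoder {
    /// Bytes carried over from earlier chunks (an incomplete sequence).
    pending: Vec<u8>,
}

impl ChunkDecoder {
    fn new() -> Self {
        ChunkDecoder { pending: Vec::new() }
    }

    /// Decode one chunk, appending the result to `out`.
    fn feed(&mut self, chunk: &[u8], out: &mut String) {
        self.pending.extend_from_slice(chunk);
        let mut start = 0;
        loop {
            match std::str::from_utf8(&self.pending[start..]) {
                Ok(s) => {
                    out.push_str(s);
                    start = self.pending.len();
                    break;
                }
                Err(e) => {
                    let good_end = start + e.valid_up_to();
                    // Re-validates the prefix to avoid `unsafe`.
                    let good = std::str::from_utf8(&self.pending[start..good_end])
                        .expect("prefix was reported valid");
                    out.push_str(good);
                    match e.error_len() {
                        // Definitely invalid: replace it and continue.
                        Some(n) => {
                            out.push('\u{FFFD}');
                            start = good_end + n;
                        }
                        // Input ended mid-sequence: keep the tail for
                        // the next chunk.
                        None => {
                            start = good_end;
                            break;
                        }
                    }
                }
            }
        }
        self.pending.drain(..start);
    }

    /// Signal end of input; a leftover incomplete sequence is replaced.
    fn finish(self, out: &mut String) {
        if !self.pending.is_empty() {
            out.push('\u{FFFD}');
        }
    }
}
```

A one-off function cannot tell a sequence that is truncated at a chunk boundary from one that is truly invalid, which is why the sketch keeps state between calls to `feed` and only gives up on leftover bytes in `finish`.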

@pnkfelix
Member

pnkfelix commented Feb 7, 2014

I think we need to improve/extend our API along the lines suggested here for 1.0.

Nominating for P-backcompat-lang.

(Arguably we could avoid the backwards compatibility hazard by offering a fail-only method for 1.0 and then adding an alternative entry point that provides the more flexible API supporting replacement characters in post 1.0. But I think this case is important enough that we should try to get the primary API method to provide both choices up front. That, or put dynamic fluid support in for representing the state of that choice.)

@lilyball
Contributor Author

lilyball commented Feb 7, 2014

@pnkfelix I submitted a PR for this last night, although I forgot there was an open issue too so I didn't link them. The PR is #12062

@bors closed this as completed in 36f1b38 on Feb 7, 2014