-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
str::not_utf8 condition is a sign that the from_bytes API needs massaging #8968
Comments
|
I'm responsible for this code. I'm fine with changing the condition to carry a byte vector instead of a message. |
Here is some test code to check the performance difference of conditions vs. statically specifying the recovery strategy (inserting replacement char). Conditions have a small cost when setting up to catch them, and a comparatively big cost if they fire, but otherwise it's fine. https://gist.github.com/anonymous/904d9e4c90a1a199f452 by the way pnkfelix, the |
Adding to that last comment, even |
The general For an iterator interface, arbitrary-length replacements sound much more complicated. |
I don't think it's too hard, i.e. the following would allow splicing an arbitrary number of bytes into a stream (clearly one can generalise to streams of any type, and one could easily restrict to struct FooDecoder {
splice: ~[u8]
// ... other fields ...
}
impl Iterator<u8> for FooDecoder {
fn next(&mut self) -> Option<u8> {
if !self.splice.is_empty() {
// left-over splices from the previous `next` call.
Some(self.splice.pop())
} else {
// ... normal activities ...
if need_to_splice {
// splicing [x,y,z], need to go on in reverse order to match the `pop`
self.splice.push(z);
self.splice.push(y);
return Some(x);
}
// ... more normal activities ...
}
}
} |
Hmm. I think this is actually a duplicate of #4837; closing in favor of that. |
Reopening because #4837 is two-bugs-in-one and this is more specific. |
see also #6164 for a more open-ended discussion of problems here. |
Conditions are gone, but I think this is still relevant. |
Yeah, closing. |
This has been a long time coming. Conditions in rust were initially envisioned as being a good alternative to error code return pattern. The idea is that all errors are fatal-by-default, and you can opt-in to handling the error by registering an error handler. While sounding nice, conditions ended up having some unforseen shortcomings: * Actually handling an error has some very awkward syntax: let mut result = None; let mut answer = None; io::io_error::cond.trap(|e| { result = Some(e) }).inside(|| { answer = Some(some_io_operation()); }); match result { Some(err) => { /* hit an I/O error */ } None => { let answer = answer.unwrap(); /* deal with the result of I/O */ } } This pattern can certainly use functions like io::result, but at its core actually handling conditions is fairly difficult * The "zero value" of a function is often confusing. One of the main ideas behind using conditions was to change the signature of I/O functions. Instead of read_be_u32() returning a result, it returned a u32. Errors were notified via a condition, and if you caught the condition you understood that the "zero value" returned is actually a garbage value. These zero values are often difficult to understand, however. One case of this is the read_bytes() function. The function takes an integer length of the amount of bytes to read, and returns an array of that size. The array may actually be shorter, however, if an error occurred. Another case is fs::stat(). The theoretical "zero value" is a blank stat struct, but it's a little awkward to create and return a zero'd out stat struct on a call to stat(). In general, the return value of functions that can raise error are much more natural when using a Result as opposed to an always-usable zero-value. * Conditions impose a necessary runtime requirement on *all* I/O. In theory I/O is as simple as calling read() and write(), but using conditions imposed the restriction that a rust local task was required if you wanted to catch errors with I/O. While certainly an surmountable difficulty, this was always a bit of a thorn in the side of conditions. * Functions raising conditions are not always clear that they are raising conditions. This suffers a similar problem to exceptions where you don't actually know whether a function raises a condition or not. The documentation likely explains, but if someone retroactively adds a condition to a function there's nothing forcing upstream users to acknowledge a new point of task failure. * Libaries using I/O are not guaranteed to correctly raise on conditions when an error occurs. In developing various I/O libraries, it's much easier to just return `None` from a read rather than raising an error. The silent contract of "don't raise on EOF" was a little difficult to understand and threw a wrench into the answer of the question "when do I raise a condition?" Many of these difficulties can be overcome through documentation, examples, and general practice. In the end, all of these difficulties added together ended up being too overwhelming and improving various aspects didn't end up helping that much. A result-based I/O error handling strategy also has shortcomings, but the cognitive burden is much smaller. The tooling necessary to make this strategy as usable as conditions were is much smaller than the tooling necessary for conditions. Perhaps conditions may manifest themselves as a future entity, but for now we're going to remove them from the standard library. Closes rust-lang#9795 Closes rust-lang#8968
This has been a long time coming. Conditions in rust were initially envisioned as being a good alternative to error code return pattern. The idea is that all errors are fatal-by-default, and you can opt-in to handling the error by registering an error handler. While sounding nice, conditions ended up having some unforseen shortcomings: * Actually handling an error has some very awkward syntax: let mut result = None; let mut answer = None; io::io_error::cond.trap(|e| { result = Some(e) }).inside(|| { answer = Some(some_io_operation()); }); match result { Some(err) => { /* hit an I/O error */ } None => { let answer = answer.unwrap(); /* deal with the result of I/O */ } } This pattern can certainly use functions like io::result, but at its core actually handling conditions is fairly difficult * The "zero value" of a function is often confusing. One of the main ideas behind using conditions was to change the signature of I/O functions. Instead of read_be_u32() returning a result, it returned a u32. Errors were notified via a condition, and if you caught the condition you understood that the "zero value" returned is actually a garbage value. These zero values are often difficult to understand, however. One case of this is the read_bytes() function. The function takes an integer length of the amount of bytes to read, and returns an array of that size. The array may actually be shorter, however, if an error occurred. Another case is fs::stat(). The theoretical "zero value" is a blank stat struct, but it's a little awkward to create and return a zero'd out stat struct on a call to stat(). In general, the return value of functions that can raise error are much more natural when using a Result as opposed to an always-usable zero-value. * Conditions impose a necessary runtime requirement on *all* I/O. In theory I/O is as simple as calling read() and write(), but using conditions imposed the restriction that a rust local task was required if you wanted to catch errors with I/O. While certainly an surmountable difficulty, this was always a bit of a thorn in the side of conditions. * Functions raising conditions are not always clear that they are raising conditions. This suffers a similar problem to exceptions where you don't actually know whether a function raises a condition or not. The documentation likely explains, but if someone retroactively adds a condition to a function there's nothing forcing upstream users to acknowledge a new point of task failure. * Libaries using I/O are not guaranteed to correctly raise on conditions when an error occurs. In developing various I/O libraries, it's much easier to just return `None` from a read rather than raising an error. The silent contract of "don't raise on EOF" was a little difficult to understand and threw a wrench into the answer of the question "when do I raise a condition?" Many of these difficulties can be overcome through documentation, examples, and general practice. In the end, all of these difficulties added together ended up being too overwhelming and improving various aspects didn't end up helping that much. A result-based I/O error handling strategy also has shortcomings, but the cognitive burden is much smaller. The tooling necessary to make this strategy as usable as conditions were is much smaller than the tooling necessary for conditions. Perhaps conditions may manifest themselves as a future entity, but for now we're going to remove them from the standard library. Closes #9795 Closes #8968
The current code for
str::not_utf8
condition does this:It might be more reasonable for condition handlers to recover in this situation if they had a reference to the original byte vector, rather than just a string describing what was wrong.
(In one case, the byte vector being passed is a
~[u8]
, but in another it is&[u8]
; in the latter, case, attempting to do this would run into #5370, unless we were willing to allocate a copy when raising the condition, which is probably not a good idea.)(Update: blake2-ppc makes a reasonable suggestion in his comment below, though it involves a more substantial revision to the code. I've generalized the title of this ticket accordingly.)
The text was updated successfully, but these errors were encountered: