Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should we have a fallible Writeable? #4741

Closed
sffc opened this issue Mar 26, 2024 · 19 comments · Fixed by #4787
Closed

Should we have a fallible Writeable? #4741

sffc opened this issue Mar 26, 2024 · 19 comments · Fixed by #4787
Labels
C-datetime Component: datetime, calendars, time zones

Comments

@sffc
Copy link
Member

sffc commented Mar 26, 2024

Our .format functions are generally low-cost, pushing all business logic to the stage when we have a sink available. Unfortunately, this means that these code paths cannot have any errors. Currently we convert the DateTimeError to a core::fmt::Error, but this is wrong because it makes ToString panic.

We could implement a trait such as

pub trait WriteableWithError {
    type Error;
    pub fn write_to_with_error(&self, sink) -> Result<(), Self::Error>
}

(or an enum between core::fmt::Error and Self::Error)

@robertbastian
Copy link
Member

This will be a pain for components like list formatter that take Writeables as inputs. I like the current model because you can pass around Writeables and don't have to deal with errors from other domains, figure out what a good replacement value would be, etc. Once you create the Writeable it will work.

@sffc
Copy link
Member Author

sffc commented Mar 26, 2024

Letting clients choose how to handle their errors seems beneficial. The WriteableWithError can have functions like

  • as_panicky() returning a Writeable that either panics or returns core::fmt::Errors. This can go on the trait and be default-impl'd.
  • validated() returning a Result<Writeable> that checks the error condition. This would be specific to the concrete type implementing WriteableWithError.

@sffc
Copy link
Member Author

sffc commented Mar 26, 2024

I guess we could also do this only in the datetime crate without making a trait, and we can make it into a trait later.

@sffc sffc added discuss Discuss at a future ICU4X-SC meeting C-datetime Component: datetime, calendars, time zones discuss-priority Discuss at the next ICU4X meeting labels Mar 26, 2024
@sffc sffc added this to the ICU4X 2.0 milestone Mar 26, 2024
@robertbastian
Copy link
Member

I don't think we should ever panic implicitly through core::fmt:Error, that is violating the trait interface. So you might as well pass around a Result<Writeable, Error> if you want to unwrap later.

@sffc
Copy link
Member Author

sffc commented Apr 2, 2024

This comes up again in the named placeholder pattern. The map of placeholder keys to values may or may not have all of the required placeholders. Checking it in the constructor takes time and we still need unwraps or GIGO in the interpolate function.

@sffc
Copy link
Member Author

sffc commented Apr 2, 2024

Writing down some more thoughts:

  • FallibleWriteable can have two default-implemented functions .lossy() and .panicky() that return wrappers referencing &self and perform one of the two behaviors
  • The error associated type should implement Writeable and that impl is used in lossy mode
  • The type implementing FallibleWriteable could also implement Writeable and use a debug-assert-or-lossy implementation. Not sure if this should be the default implementation or not.
  • To support both lossy and panicky modes, I think we need a .write_to_with_error_handler function which causes two different behaviors to be used, one that returns early and the other that recovers and continues writing.

@sffc
Copy link
Member Author

sffc commented Apr 3, 2024

See #4772 for an implementation of this.

@robertbastian
Copy link
Member

I'd like to reconsider this statement:

Our .format functions are generally low-cost, pushing all business logic to the stage when we have a sink available.

We could very well do this the other way around. It might even be more natural, as once constructed a FormattedDateTime can be cheaply written multiple times. I don't really see a reason for deferring the logic until write-time.

@sffc
Copy link
Member Author

sffc commented Apr 9, 2024

Current design:

pub trait FallibleWriteable {
    type Error;
    fn fallible_write_to<
        W: fmt::Write + ?Sized,
        L: Writeable,
        E,
        F: FnMut(Self::Error) -> Result<L, E>,
    >(
        &self,
        sink: &mut W,
        handler: F,
    ) -> Result<Result<(), E>, fmt::Error>;
}

pub trait TryWriteable: FallibleWriteable {
    #[inline]
    fn checked(&self) -> CheckedWriteable<&Self> {
        CheckedWriteable(self)
    }
    #[inline]
    fn lossy(&self) -> LossyWriteable<&Self> {
        LossyWriteable(self)
    }
}

impl<T> TryWriteable for T where T: FallibleWriteable {}

Simpler design:

pub trait TryWriteable {
    type Error;
    /// Implementers MUST finish writing the writeable in the error case
    /// with appropriate replacements, and then return the error. The
    /// error can be dropped by the caller in lossy mode.
    fn try_write_to<
        W: fmt::Write + ?Sized,
    >(
        &self,
        sink: &mut W,
    ) -> Result<Result<(), Self::Error>, fmt::Error>;

    // Other functions:
    fn try_write_to_parts(...) -> Result<Result<(), Self::Error>, fmt::Error>
    fn try_write_to_string(...) -> Result<String, Self::Error>;
    
    /// Invariant: returns length of lossy string
    fn try_writeable_length(...) -> LengthHint;
    /// Invariant: compares to lossy string
    fn try_write_cmp_bytes(...) -> cmp::Ordering;
}

// No need for `.lossy()` etc functions. At call sites:
let _ = dtf.format(...).try_write_to(...)?; // lossy
dtf.format(...).try_write_to(...)?.unwrap(); // panicky
let _ = dtf.format(...).try_write_to(...)?.map_err(|e| {debug_assert!(false, "{e:?}"); e}); // gigo
  • Pattern: "Today is {weekday} at {hour} {period}"
  • Example Output: "Today is (weekday 2) at 10 am"
  • Buggy but "fine" output: "Today is " (if impl uses ?)

Discussion:

  • @robertbastian - Can we check the invariants in the format function?
  • @sffc - We'd need to put the writable artifacts in an intermediate data store. We could use smallvec or something for that.
  • @robertbastian - Do people want to write the writeable multiple times? I guess they could just clone the writeable
  • @sffc - Generally I think the main purpose for the Writeable trait is that it is the most convenient API to choose the sink or format_to_parts.
  • @robertbastian - I think the current FallibleWriteable is too complex
  • @Manishearth - Yeah, people will try implementing it, not figure it out, and give up
  • @sffc - The new "simpler design" solves that problem, but the complex solution has the nice property that if you do manage to implement it, you have a correct implementation.
  • @robertbastian - Can't the error recovery happen at the call site?
  • @sffc - In order for lossy mode to work, the whole string needs to be written.

Approved above design.

LGTM: @robertbastian @Manishearth @sffc

@sffc
Copy link
Member Author

sffc commented Apr 9, 2024

The GIGO case can be written as follows since Rust 1.76 (a few characters shorter):

let _ = dtf.format(...).try_write_to(...)?
    .inspect_err(|e| debug_assert!(false, "{e:?}"));

https://doc.rust-lang.org/std/result/enum.Result.html#method.inspect_err

@sffc
Copy link
Member Author

sffc commented Apr 9, 2024

An issue I encountered while implementing this regarding try_writeable_length_hint and try_write_cmp_bytes: we agreed on the signatures above. However, I'm not sure we considered all of the implications. Should we

  1. Keep the signatures approved above
  2. Change the names to be without try_ since they don't return a Result (this would be the same name as the Writeable functions, which is probably fine since these aren't often called in user code)
  3. Make the functions return Result which must return the same type of Result as try_write_to (both Ok or both Err for the same TryWriteable).
    • 3a: keep writeable_length_hint with a default impl
    • 3b: remove the default impl of writeable_length_hint
  4. Make the functions return Result which can be less strict (okay for it to return Ok, but if it returns Err, then write_to must also return Err)
  5. Change the signature to return something different (note: still need to discuss the strictness of the error matching the try_write_to fns)
    • 5a: -> Option<Result<LengthHint, Self::Error>>
    • 5b: -> (LengthHint, Option<Self::Error>)
  6. Introduce a new type TryLengthHint and maybe TryOrdering with the semantics we want and return it from these functions

Note on point 4: the default impl of writeable_length_hint is LengthHint::undefined(). If we keep the default impl in TryWriteable to be Ok(LengthHint::undefined()), then the invariant in point 3 is broken.

Another note: it might be useful to detect early when a Writeable is in a failure state. It seems consistent with the design of Writeable for us to bail early if writeable_length_hint returns an error, rather than having that function conceal the error and only find out later when we call write_to.

EDIT: Detecting the error case might be expensive; we don't implement LengthHint in FormattedDateTime in part because of this. So I think we still want a way to make the function return an undefined hint without regard to the error state.

@robertbastian @Manishearth

@robertbastian
Copy link
Member

In favour of 2. I think it's an advantage that they would be the same names as on Writeable, types that implement both Writeable and TryWriteable seem weird.

@robertbastian
Copy link
Member

Wait, why does write_cmp_bytes have a write_ prefix? The other methods that start with write_ do writing, it should have a writeable_ prefix like or no prefix at all.

@sffc
Copy link
Member Author

sffc commented Apr 10, 2024

Wait, why does write_cmp_bytes have a write_ prefix? The other methods that start with write_ do writing, it should have a writeable_ prefix like or no prefix at all.

Because it's what I proposed in #4402 and it didn't get any pushback. I think it's more a bug that writeable_length_hint is not write_length_hint, and we may have a chance to rethink that function in #4709.

@sffc
Copy link
Member Author

sffc commented Apr 10, 2024

We haven't released write_cmp_bytes yet, so there could still be time to bikeshed this.

@robertbastian
Copy link
Member

Why would it be write_length_hint? I can see length_hint, and writeable_length_hint to avoid conflicts with other length_hints from different traits, but the trait is Writeable, not Write.

@sffc
Copy link
Member Author

sffc commented Apr 10, 2024

Because all the other functions are called write_. I don't see a strong reason that write_ needs to be only used when it is the active verb. It can also just be Writeable's method prefix.

@robertbastian
Copy link
Member

@sffc
Copy link
Member Author

sffc commented Apr 10, 2024

aha, we did already vote... ok

sffc added a commit that referenced this issue Apr 10, 2024
Fixes #4741

---------

Co-authored-by: Robert Bastian <4706271+robertbastian@users.noreply.github.com>
@sffc sffc removed discuss Discuss at a future ICU4X-SC meeting discuss-priority Discuss at the next ICU4X meeting labels May 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-datetime Component: datetime, calendars, time zones
Projects
Status: Done
2 participants