Improve documentation for best practices around specializing fmt::formatter #2086

Closed
yeswalrus opened this issue Jan 6, 2021 · 29 comments

Comments

@yeswalrus
Contributor

This is a request/discussion issue:
I feel the documentation around how to do this well is somewhat minimal. If there are known best practices, it'd be really nice to document them, perhaps by adding a few more complex examples. One good case might be how to provide formatting for a custom 'point' type. Such a type might have an underlying floating point storage, so a user might want to be able to format using the full range of floating point specs and have them be used. Are there known pitfalls to doing this? Certainly the width spec is a little weird since it's hard to say if it's the user's intent for that to apply to each coordinate or the overall point output. Currently this would require re-implementing (or just copy-pasting) the internals of the existing floating point spec parser since those fields are all understandably private to, I'm guessing, limit API surface. Is this even an expected use-case?

@yeswalrus
Contributor Author

Edit: This is a bad example since this just works with ranges.h. However, imagining we wanted to do it for a struct with x and y coordinates or some other thing where we might want to handle a similar range of formatting specifiers would be interesting.

@vitaut
Contributor

vitaut commented Jan 12, 2021

Thanks for the suggestion. I agree that the extension API documentation is somewhat lacking at the moment.

Are there known pitfalls to doing this?

Not sure about pitfalls, the format spec API just needs some cleanup and documentation.

Is this even an expected use-case?

Yes.

@AndrewJD79

It's not clear from the documentation whether I should use FMT_STRING or a plain string literal when I need the best performance.

@vitaut
Contributor

vitaut commented Feb 18, 2021

should I use FMT_STRING or a plain string literal when I need the best performance?

There is no difference in perf between the two.

@LukeMauldin

Is there a link somewhere to the updated documentation on overloading fmt::formatter? I am able to override it for a base class but it is not being invoked on a derived class.

@vitaut
Contributor

vitaut commented Mar 8, 2021

The documentation for the latest release is available here: https://fmt.dev/latest/api.html#formatting-user-defined-types

@vitaut vitaut changed the title Improve documentation for best practices around overloading fmt::formatter Improve documentation for best practices around specializing fmt::formatter Jul 9, 2021
@mmolch

mmolch commented Jan 10, 2022

Hi, I am new to fmt and since my small issue deals with the same section of the documentation, I decided to put it here.
I tried to print an std::array and got the following error message:
error: static assertion failed: Cannot format an argument. To make type T formattable provide a formatter specialization: https://fmt.dev/latest/api.html#udt

It's great that there's a link, but unfortunately there's no immediate mention of ranges.h so I unknowingly reinvented the wheel and only just now found out about it. It would be great IMHO if the API reference would have a short notice right at the beginning of the "Formatting User-defined Types" section saying something like:
Note: The fmt library already contains formatters for a lot of C++ standard data types. See ranges.h (link to #ranges-api) for containers and tuples and chrono.h (link to #chrono-api) for date and time formatting.

Maybe the error message itself could be a little more verbose in this regard as well?

Thank you very much for this great library and kind regards,
Moritz

@vitaut
Contributor

vitaut commented Jan 13, 2022

@mmolch, good idea, thanks! Added (slightly tweaked) to the current dev docs in https://fmt.dev/dev/api.html#udt.

@provtemp

@vitaut There you have »The formatter should parse specifiers until '}' or the end of the range.« - wouldn't you have to check at least for }} to follow the convention that doubling escapes the delimiter?

@vitaut
Contributor

vitaut commented Apr 15, 2022

wouldn't you have to check at least for }} to follow the convention that doubling escapes the delimiter?

No, } is not allowed in format specs, at least not in the standard ones.

@provtemp

@vitaut So implementations of the parse method are free to pick an end point as they see fit, so long as it's a } between ctx.begin() and ctx.end()? For example, all 3 of these points would be fine:

{0:}}}{1}
   1 2  3

I would have expected that I can look at a format string and tell what is literal text and what isn't.

@vitaut
Contributor

vitaut commented Apr 17, 2022

In theory yes but I wouldn't recommend it. None of the standard formatters do this.

@cfyzium

cfyzium commented Apr 21, 2022

Correct me if I'm wrong, but it seems the only options for a custom formatter are to either parse the format spec manually or delegate the whole process to some other existing formatter; there are no utilities to parse the standard format spec string.

If I want a custom formatter that supports standard format spec parameters like fill/alignment, the proposed solution is to derive from fmt::formatter<fmt::string_view> to reuse its behavior.

Well, the base formatter may handle fill and alignment, but what about other parameters: floating-point format and precision, sign, alternative form, locale flag? There is no access to these in a derived formatter.

Furthermore, treatment of some parameters becomes somewhat inconsistent. The default alignment of string and numeric arguments is different:

fmt::format("{:6}", 10);                // "    10"
fmt::format("{:6}", Point<int>{1, 2});  // "(1,2) "

And even if you somehow do a scan of the spec string to extract the precision option, the base string formatter has a very different purpose for precision:

fmt::format("{:.2}", Point<float>{1.5f, 2.0f});  // "(1" instead of "(1.50,2.00)"

Ah, and things like {:e} are impossible.

Seems like reusing existing formatters for anything beyond basic string alignment is not feasible.

But then it means in the general case you have to reimplement the entire formatting spec parsing and some of the standard formatting behavior from scratch.

Which is a fairly complex task with a few obscure and subtle parts. Things like access to nested width and precision arguments are complicated, conformance to standard behavior is tedious, and how many people would notice right away that the fill character may not be a single char but a multi-byte UTF-8 codepoint?

It is a complex, error-prone, and not very productive task, but what are the alternatives? While the library parses the standard format spec string in its base formatter implementation, the result seems to be hidden inside the detail namespace with no official way to access it.

I think it would be useful to have a public type for parsing results of standard format spec, with either a field accessible for inspection and modification in derived formatters or simply a utility function to explicitly parse the standard format spec.

@vitaut
Contributor

vitaut commented Apr 22, 2022

Format string parsing is indeed not exposed in the public API at the moment.

@gix
Contributor

gix commented Nov 18, 2022

The point example could be realized using a nested fmt::formatter<double> to parse the spec and then format each value. This also highlights that formatting for composites is tricky. Should fill apply to each item, or the whole?

The current printf-style formatting syntax doesn't lend itself well to this, IMO. It would be easier if padding/alignment were specified separately, like with .NET format strings, where {0,-10:D02} gives you a left-aligned space-padded 0-filled number.

@vitaut
Contributor

vitaut commented Nov 20, 2022

Should fill apply to each item, or the whole?

Fill should normally apply to the whole item.

There are no plans to switch to a different format string syntax.

@BenFrantzDale

BenFrantzDale commented Dec 20, 2022

I came here to open a ticket like this. I'd be happy to propose some additional wording. Here's what I'm looking for:

  1. A dead-simple example: A type that only allows {} as its format specification. (In my mind, this is the analog of overloading std::ostream& operator<<(std::ostream& os, const MyType&).)
  2. Clarity about exactly what parse gets and returns. I think I've doped it out here: https://godbolt.org/z/cEE18er7z

For 1, I propose:

#include <fmt/format.h>

//! Simplest possible struct: one with only one state:
struct my_monostate {};

template <> struct fmt::formatter<my_monostate> {
  // Parses format specification, which must be empty:
  constexpr auto parse(format_parse_context& ctx) -> decltype(ctx.begin()) {
    // [ctx.begin(), ctx.end()) is a character range. It can be:
    // 1. Empty in the case of "{}" or "{1}" (i.e., no ":" in the braces)
    // 2. Everything after the ":" if there is a ":" in the braces. E.g.:
    //    * "foo {:} bar" => "} bar".
    //    * "foo {0:} bar" => "} bar".
    //    * "foo {:xyz} bar" => "xyz} bar".
    //    * "foo {0:xyz} bar" => "xyz} bar".
    // In case 1, parse should return ctx.begin() which of course equals ctx.end().
    // In case 2, parse should return the iterator to the closing brace.
    // In case of parse error, format_error should be thrown.
    
    // Here we only want to allow empty format strings:
    // We could actually just return ctx.begin(), and our caller would
    // throw an exception if the format specification weren't empty.
    auto it = ctx.begin();
    auto end = ctx.end();
    if (it == end || *it == '}') return it;
    throw format_error("invalid format"); 
  }

  // Formats the object using the parsed format specification (presentation)
  // stored in this formatter.
  template <typename FormatContext>
  auto format(const my_monostate& x, FormatContext& ctx) const /* Optionally: -> decltype(ctx.out())*/ {
    // ctx.out() is an output iterator to write to.
    return fmt::format_to(ctx.out(), "my_monostate@{}", 
                          static_cast<const void*>(&x));
  }
};

With this in mind, I think this is the simplest formatter that I'm looking for:

template <> struct fmt::formatter<my_type> {
  constexpr auto parse(format_parse_context& ctx) { return ctx.begin(); }

  auto format(const auto& x, auto& ctx) const {
    return fmt::format_to(ctx.out(), "{}", static_cast<const void*>(&x));
  }
};

Or for C++17:

template <> struct fmt::formatter<my_type> {
  constexpr auto parse(format_parse_context& ctx) { return ctx.begin(); }

  template <typename FormatContext>
  auto format(const my_type& x, FormatContext& ctx) const {
    return fmt::format_to(ctx.out(), "{}", static_cast<const void*>(&x));
  }
};

as a replacement for

std::ostream& operator<<(std::ostream& os, const my_type& x) {
    return os << static_cast<const void*>(&x);
}

It would be nice (maybe it exists?) if there were a base class to get the dummy parse behavior, so the simplest boilerplate would be

template <> struct fmt::formatter<my_type> : fmt::parse_must_be_empty {
  auto format(const my_type& x, auto& ctx) const {
    return fmt::format_to(ctx.out(), "{}", static_cast<const void*>(&x));
  }
};

@BenFrantzDale

@vitaut if you give me a bit of feedback on my proposed wording, I'm happy to PR it. Or if you'd rather I could PR a change in wording and you can comment on it there. I'm not confident in my wording, which, of course, is why I want clearer documentation :-D

@vitaut
Contributor

vitaut commented Jan 24, 2023

@BenFrantzDale, thanks for the suggestion but I'm not sure if my_monostate example adds much compared to the existing examples.

@VictorEijkhout

Could someone also add an example of std::formatter on a templated type?

template<typename I,int d>
struct fmt::formatter<coordinate<I,d>> { .... }

but I get a screenful of error messages ending with:

/opt/local/include/gcc12/c++/type_traits:980:52: error: static assertion failed: template argument must be a complete class or an unbounded array
  980 |       static_assert(std::__is_complete_or_unbounded(__type_identity<_Tp>{}),

@liushapku

The documentation for the latest release is available here: https://fmt.dev/latest/api.html#formatting-user-defined-types

Hi. From the parse function definition in the above example, it seems that if a user provides an unclosed {, the function will just return end. Is this a valid format string? In practice, I saw that an exception is thrown. It looks like some other mechanism verifies that the format string is valid before parse is called. Is this true?

If the above is true, an opening { will always be matched with a }, so how can it == end happen? It looks like the comment "// Check if reached the end of the range:" is unnecessary.

@vitaut
Contributor

vitaut commented Feb 27, 2023

Is this a valid format string?

No because format specs should either be followed by } or the end of string.

@liushapku

Is this a valid format string?

No because format specs should either be followed by } or the end of string.

Do you mean fmt::print("{", 42) has a valid format spec? That sounds weird.

@vitaut
Contributor

vitaut commented Feb 28, 2023

It is not valid, which you can easily check, but a formatter can be used on its own to parse just specs, which is sometimes useful.

@liushapku

It is not valid, which you can easily check, but a formatter can be used on its own to parse just specs, which is sometimes useful.

Yeah, it is not valid, as I checked. Then how can this be true: "format specs should either be followed by } or the end of string"?
If I put { as the spec, then there is an empty spec at the end of the string.

Did I miss something here?

@vitaut
Contributor

vitaut commented Feb 28, 2023

As I wrote before:

a formatter can be used on its own to parse just specs

Also you need the end check to correctly handle invalid format strings. Such strings still need to be parsed up to the error and diagnosed which means that you cannot always rely on existence of terminating }.

emmanuelthome added a commit to cado-nfs/cado-nfs that referenced this issue Mar 27, 2023
I would agree with fmtlib/fmt#2086 on that.
@TobiSchluter
Contributor

TobiSchluter commented Aug 16, 2023

I came here to open a ticket like this. I'd be happy to propose some additional wording. Here's what I'm looking for:

  1. A dead-simple example: A type that only allows {} as its format specification. (In my mind, this is the analog of overloading std::ostream& operator<<(std::ostream& os, const MyType&).)
    ...
    For 1, I propose:
    ...
template <> struct fmt::formatter<my_monostate> {
  // Parses format specification, which must be empty:
  constexpr auto parse(format_parse_context& ctx) -> decltype(ctx.begin()) {
    // [ctx.begin(), ctx.end()) is a character range. It can be:
    // 1. Empty in the case of "{}" or "{1}" (i.e., no ":" in the braces)
    // 2. Everything after the ":" if there is a ":" in the braces. E.g.:
    //    * "foo {:} bar" => "} bar".
    //    * "foo {0:} bar" => "} bar".
    //    * "foo {:xyz} bar" => "xyz} bar".
    //    * "foo {0:xyz} bar" => "xyz} bar".
    // In case 1, parse should return ctx.begin() which of course equals ctx.end().
    // In case 2, parse should return the iterator to the closing brace.
    // In case of parse error, format_error should be thrown.
    
    // Here we only want to allow empty format strings:
    // We could actually just return ctx.begin(), and our caller would
    // throw an exception if the format specification weren't empty.
    auto it = ctx.begin();
    auto end = ctx.end();
    if (it == end || *it == '}') return it;

Shouldn't that return it++ if *it == '}' ? I.e.
if (it == end || *it++ == '}') return it;

Edit: Ignore the following, I just spotted #3526 in the release notes. So a change of behavior is expected.
It also seems to be a change with respect to previous versions that checking for begin == end isn't enough any longer. That, or the behavior at compile time is different, because only in 10.1.0 could I get compile-time checks running reliably, and after updating to 10.1.0 I'm getting compile-time failures about begin != end in my code.

@vadz
Contributor

vadz commented Sep 7, 2023

I'm not sure if this belongs here, but it would be nice if the documentation mentioned the possibility to set format_str_ in the ctor of a formatter inheriting from formatter<std::tm> as this seems the simplest way to specify the default representation of some custom date-like class -- and I wouldn't be able to implement formatting for it without duplicating all the existing code if I didn't stumble upon this (so I also hope that this remains supported in the future fmt versions).

@vitaut
Contributor

vitaut commented Sep 18, 2023

The documentation has been completely revamped. In particular, it now contains some recommended practices such as how to handle fill, align and width. It also documents the new APIs such as format_as and nested_formatter (experimental) that make supporting new types much easier. In particular, the point formatter that follows all the recommended practices and supports all standard specifiers can be implemented as follows:

template <>
struct fmt::formatter<point> : nested_formatter<double> {
  auto format(point p, format_context& ctx) const {
    return write_padded(ctx, [=](auto out) {
      return format_to(out, "({}, {})", nested(p.x), nested(p.y));
    });
  }
};
