String formatting #638

Closed
3 tasks done
DangerOnTheRanger opened this issue Jan 27, 2023 · 0 comments · Fixed by #775

Feature request checklist

  • There are no issues that match the desired change
  • The change is large enough it can't be addressed with a simple Pull Request
  • If this is a bug, please file a Bug Report.

(This is a design proposal that is meant to be largely implemented by #617 - as per offline discussion with @TristonianJones, the escaping functionality that was originally part of the formatting proposal will be implemented as a separate function, so a tracking issue seems like a good fit to cover both.)

Overview

String formatting is handled with string.format, which has the following signature:

string.format(formatStr, args)

Where formatStr is the formatting string, and args is a list containing arguments relevant to the formatting string. There is a second overloaded signature for locale identifiers (see Internationalization for more):

string.format(formatStr, args, locale)

Implementation-wise, the formatting string is walked in a linear fashion to build up the output string, either copying from the formatting string or inserting members of args where appropriate. A potential optimization would be to do this once and cache/memoize the resulting built/formatted string, but benchmarking should probably be done to judge whether or not that would be valuable. Another optimization would be to build up an optimized, opcode/machine-readable version of any format calls on string constants (which would also fit in nicely with compile-time checking of formatting directives).
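
The linear walk described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the cel-go implementation: format_string is a hypothetical name, only %s and %% are handled, and Python's str() stands in for CEL's type conversion.

```python
def format_string(format_str, args):
    """Walk format_str once, copying literals and substituting args in order."""
    out = []
    next_arg = 0  # index of the next unused argument
    i = 0
    while i < len(format_str):
        ch = format_str[i]
        if ch != "%":
            out.append(ch)  # ordinary character: copied unchanged
            i += 1
            continue
        if i + 1 >= len(format_str):
            raise ValueError("dangling % at end of format string")
        verb = format_str[i + 1]
        if verb == "%":
            out.append("%")  # %% is an escaped percent sign
        elif verb == "s":
            if next_arg >= len(args):
                raise ValueError("not enough arguments for format string")
            # Assumption: str() stands in for CEL's conversion to string.
            out.append(str(args[next_arg]))
            next_arg += 1
        else:
            raise ValueError("unsupported clause: %" + verb)
        i += 2  # skip the % and the verb character
    return "".join(out)
```

Because the walk visits each character once, the cost is linear in the length of the formatting string, which is what makes caching the result or precompiling constant formatting strings a matter for benchmarking rather than a necessity.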

Formatting syntax

The formatting string uses a syntax similar to the printf/sprintf implementations found in languages like C and Go. Formatting clauses that start with % get replaced with an item from args at the index implied by how many clauses there were prior. For instance:

"%s, %s, %s".format(["A", "B", "C"])

This would evaluate to A, B, C.

Conversion from non-string-types is handled by the equivalent of wrapping them with a call to CEL’s ConvertToType, except where datatype-specific formatting clauses are used (see Formatting clauses). The full list of supported CEL primitive types is:

  • string
  • bool
  • bytes
  • double
  • duration
  • int
  • list
  • timestamp

For arbitrary objects, their fields are printed as a list of comma-separated key-value pairs in ascending alphabetical order, all wrapped by a pair of curly brackets. This is similar to the behavior of Go’s %#v verb. For instance, given an object someObject, with a field someStr with a value of “a string” and a field someInt with a value of 42, string.format(“%s”, [someObject]) would yield:

"{someInt: 42, someStr: \"a string\"}"

As shown in the example above, string values printed as part of this process are quoted and escaped as if they had first been escaped and then passed to string.format (see String escaping).
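
A rough Python sketch of the object formatting rule, assuming the object is modeled as a plain dict of field names to values. format_object is a hypothetical helper, and the quoting here is simplified: it wraps string values in double quotes but does not escape their contents.

```python
def format_object(fields):
    """Render a mapping of field names to values as {key: value, ...}."""
    parts = []
    for name in sorted(fields):  # ascending alphabetical order by field name
        value = fields[name]
        if isinstance(value, str):
            # Simplification: quote the string without escaping its contents.
            rendered = '"' + value + '"'
        else:
            rendered = str(value)
        parts.append(name + ": " + rendered)
    return "{" + ", ".join(parts) + "}"
```

For the someObject example, format_object({"someStr": "a string", "someInt": 42}) lists someInt before someStr because the keys are sorted.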

To include a literal % (the character that introduces formatting clauses), write %%, which evaluates to a single %. For instance, string.format("%%", []) evaluates to "%". Note that embedding the equivalent Unicode/ASCII escape sequence (code point 0x0025) is supported as well.

Some formatting clauses support a precision. For instance, to format a double with 2 decimals of precision:

"%.2f".format([0.269])

This would give:

0.27

The Formatting clauses section goes into more detail about formatting clauses.

Outside of the instances mentioned above for argument substitution, all other characters are used unmodified in the output string.

Formatting clauses

As mentioned above, formatting clauses are specified as characters following the % character. For the initial release, f (for fixed-point), e (for scientific notation), b (for binary formatting), h (for hexadecimal formatting), and o (for octal formatting) will be supported, but others could be added in the future.

Naturally, an argument that doesn’t support the formatting directive results in an error. For instance, the following is not allowed:

"%.4f".format(["not a number"])

For fixed-point formatting clauses, the character f must have a precision specified. In other words, the following is not allowed:

"%f".format([0.12345])

For cases where the number of requested digits after the decimal is greater than the precision of the actual number given, the resulting string is padded with zeros. For example:

"%.3f".format([0.3])

This gives:

"0.300"

For cases where the given number has higher precision than the number of requested digits, the resulting string uses a rounded version of the given number. For example:

"%.2f".format([0.347])

This gives:

"0.35"

This mirrors the behavior found in Go and Python.
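
The padding and rounding rules can be checked against Python's own fixed-point formatting. The helper below is a hypothetical sketch (format_fixed is not part of CEL) that makes the precision mandatory and rejects non-numeric arguments, as the proposal requires.

```python
def format_fixed(value, precision):
    """Format a number with exactly `precision` digits after the decimal point."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError("fixed-point formatting requires a numeric argument")
    # "%.*f" takes the precision from the argument tuple.
    return "%.*f" % (precision, value)


padded = format_fixed(0.3, 3)     # fewer digits than requested: zero-padded
rounded = format_fixed(0.347, 2)  # more digits than requested: rounded
```

Here padded is "0.300" and rounded is "0.35", matching the two examples above.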

For scientific notation, there is the e formatting clause. E (in upper-case) is also valid, and changes the case of the e character used in the resulting string. In both cases, a precision must be specified, so the following is not allowed:

string.format("%e", [0.3])

For instance, using the upper-case clause with a precision of 6:

string.format("%.6E", [1052.032911275])

This gives:

"1.052033E+03"
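
For comparison, Python's printf-style formatting has the same e/E distinction. The sketch below uses a hypothetical helper, format_scientific, and assumes the proposed clause behaves like Python's %e and %E.

```python
def format_scientific(value, precision, upper=False):
    """Format a number in scientific notation with a mandatory precision."""
    verb = "E" if upper else "e"  # the case of the verb sets the case of the exponent marker
    return ("%." + str(precision) + verb) % value


upper_case = format_scientific(1052.032911275, 6, upper=True)
lower_case = format_scientific(1052.032911275, 6)
```

upper_case is "1.052033E+03", and lower_case differs only in the case of the exponent marker.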

Formatting integers as binary can be done with the b formatting clause. For instance, given:

string.format("%b", [5])

This outputs:

"101"

Likewise, h (and H for uppercase hexadecimal digits) is supported for formatting integers in base 16/hexadecimal. Again, given:

string.format("%h", [15])

This gives:

"f"

For formatting integers in base 8/octal there is the o formatting clause. As an example, if given:

string.format("%o", [11])

This will output:

"13"
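
The three integer clauses can be sketched with Python's built-in format(). The mapping below is an assumption made for illustration: the proposed h and H clauses are treated as equivalent to Python's x and X presentation types, so lower-case h yields lower-case digits. format_int is a hypothetical name.

```python
# Proposed clause -> Python presentation type (assumed equivalence).
INT_CLAUSES = {"b": "b", "h": "x", "H": "X", "o": "o"}


def format_int(value, clause):
    """Format an integer in base 2, 16, or 8 according to the clause."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("integer formatting clauses require an int argument")
    if clause not in INT_CLAUSES:
        raise ValueError("unsupported clause: %" + clause)
    return format(value, INT_CLAUSES[clause])
```

With this mapping, format_int(5, "b") is "101", format_int(15, "H") is "F", and format_int(11, "o") is "13".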

Error handling

When the formatting string is a constant, the CEL compiler will check at compile/type-checking time whether the arguments passed to format match the formatting directives. Any mismatch results in a compile-time error. Unused arguments also cause a compile-time error. For instance, the following is rejected because the third argument is never used:

"%d %d".format([0, 1, 2])

In other cases (if the string is a variable or otherwise its own expression), errors that occur during formatting are indicated by adding additional output to the end of the formatted string.
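
A minimal sketch of the argument-count part of the compile-time check, in Python. The names (CLAUSE, count_clauses, check_format_call) and the error messages are hypothetical, and the pattern simply treats any letter after % as a clause. Type checking of each argument against its clause is left out.

```python
import re

# Matches either an escaped %% or a clause: %, an optional .precision, one letter.
CLAUSE = re.compile(r"%(?:%|(?:\.\d+)?[A-Za-z])")


def count_clauses(format_str):
    """Count the clauses that consume an argument (%% does not)."""
    return sum(1 for m in CLAUSE.finditer(format_str) if m.group() != "%%")


def check_format_call(format_str, args):
    """Return a list of error messages for a constant formatting string."""
    errors = []
    expected = count_clauses(format_str)
    if len(args) < expected:
        errors.append("too few arguments: expected %d, got %d" % (expected, len(args)))
    elif len(args) > expected:
        errors.append("unused arguments: expected %d, got %d" % (expected, len(args)))
    return errors
```

For the call above, check_format_call("%d %d", [0, 1, 2]) reports a single error about the unused argument, while a call with exactly two arguments produces no errors.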

Internationalization

Since formatting things like numbers is locale-specific, string.format needs a way to deal with different locales. CEL allows function overloading, and so string.format can take a third argument that represents a locale identifier. If any locale-specific formatting is used, the locale identifier will modify the output as is appropriate. For example, given a formatting string that includes a fixed-point formatting clause and a French locale identifier:

string.format("%.2f", [3.14], "fr")

This gives:

"3,14"

The full list of formatting clauses that are affected by the locale identifier is:

  • f (locale affects whether . or , is used as the decimal point)
  • E and e (locale affects whether . or , is used as the decimal point)

An unrecognized locale identifier is considered a compile-time error if the identifier is a string constant. If the identifier is not a constant (the user passed in a string variable instead), then a warning will be appended to the formatted string, and the default locale (en_US) will be used instead.

If the locale argument is not passed, or if an invalid non-constant locale identifier is passed, then the locale defaults to en_US. This is the case regardless of the locale that the CEL implementation is operating in or built with.
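
The fallback behavior for non-constant locale identifiers could look roughly like the Python sketch below. Everything here is an assumption made for illustration: the two-entry separator table, the name format_fixed_locale, and the wording of the appended warning. A real implementation would draw on proper locale data.

```python
DEFAULT_LOCALE = "en_US"

# Hypothetical table of decimal separators; real locale data is far larger.
DECIMAL_SEPARATORS = {"en_US": ".", "fr": ","}


def format_fixed_locale(value, precision, locale=DEFAULT_LOCALE):
    """Fixed-point formatting with a locale-dependent decimal separator."""
    warning = ""
    if locale not in DECIMAL_SEPARATORS:
        # Unrecognized identifier: append a warning and fall back to en_US.
        warning = " (warning: unrecognized locale %r, using %s)" % (locale, DEFAULT_LOCALE)
        locale = DEFAULT_LOCALE
    text = "%.*f" % (precision, value)  # always formatted with "." first
    return text.replace(".", DECIMAL_SEPARATORS[locale]) + warning
```

Note that the default does not consult the process environment, which matches the rule that the result is the same regardless of the locale the implementation runs in.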

String escaping

There will also be a separate function implemented as part of this proposal - string.escape, which escapes non-printable characters.

For example, given:

"this is a string \n with no newline".escape()

This would result in:

"\"this is a string \\n with no newline\""

If printed, this would result in the user seeing a literal “\n” instead of a newline. This function is useful for cases where a string needs to be printed (as a part of an error, for instance) while safely ensuring surrounding formatting is preserved. The complete list of escape sequences that string.escape covers is:

  • \a (bell)
  • \b (backspace)
  • \f (form feed)
  • \n (newline)
  • \r (carriage return)
  • \t (horizontal tab)
  • \v (vertical tab)
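
A short Python sketch of the escaping rule for the seven sequences listed above. The function name escape and the table name ESCAPES are hypothetical. Following the example output, the result is wrapped in double quotes; how embedded quotes or backslashes should be treated is not specified here, so the sketch leaves them untouched.

```python
# Control character -> two-character escape sequence.
ESCAPES = {
    "\a": "\\a",  # bell
    "\b": "\\b",  # backspace
    "\f": "\\f",  # form feed
    "\n": "\\n",  # newline
    "\r": "\\r",  # carriage return
    "\t": "\\t",  # horizontal tab
    "\v": "\\v",  # vertical tab
}


def escape(s):
    """Replace the listed control characters and wrap the result in quotes."""
    body = "".join(ESCAPES.get(ch, ch) for ch in s)
    return '"' + body + '"'
```

Printing escape("this is a string \n with no newline") shows a backslash followed by the letter n where the newline was, with the whole string inside double quotes.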

Alternatives considered

Automatically deriving locale

As an alternative to the manually-specified locale identifier that can be passed as the third argument to string.format, the locale could be derived from the OS/environment variables.

However, Go does not provide a built-in means of detecting locale, and the way locale is stored varies from OS to OS. As far as I'm aware, CEL doesn't limit support to any particular OS (cel-go should work everywhere Go does), so it is not clear which OSes would need implementations.

Indexed argument substitution

With Python's str.format method, it is possible to do something like:

"{2} {1} {0}".format("C", "B", "A")

Which outputs:

"A B C"

However, this can be accomplished with implicit incrementing by reordering the arguments. For indexed arguments, there is also the problem of adding/removing an argument to the string; any addition not done at the end of the string (or at least after all other substitutions) will result in needing to re-number all other indices.

Another option is having both (Python does this), but for the sake of keeping CEL and its spec simple, I believe it is best to pick one.
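
For reference, the two Python styles side by side. The snippet only uses the standard str.format method; the variable names are illustrative.

```python
# Indexed substitution: each field names the position of its argument.
indexed = "{2} {1} {0}".format("C", "B", "A")

# Implicit incrementing: the same output, obtained by reordering the arguments.
implicit = "{} {} {}".format("A", "B", "C")
```

Both expressions produce the same string. Python accepts either style, but not a mix of the two within a single formatting string.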
