Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC - format implementation #32

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

reubent
Copy link
Contributor

@reubent reubent commented Jan 30, 2021

The implementation of format is cool but probably overcomplex for what it actually does. There's also a TODO to add positional parameters.

I've written a (much) simpler implementation that supports positional parameters. You'll find it in format_test.go as quickformatBuiltin at the moment. I've also written a series of tests and benchmarks for both implementations.

As it stands, there is code there to support unmatched brackets, but the more I think about it, the less desirable I think that actually is. Mistakes are obvious because the number of brackets won't match the number of parameters and this gives a way to use a literal curly bracket. Considering the performance impact of it as well, this seems reasonable.

With unmatched checking turned off, performance for this version is around 20% better for simple strings:

BenchmarkQuickFormatString
BenchmarkQuickFormatString-8                      	  727240	      1533 ns/op

versus

BenchmarkFormatString
BenchmarkFormatString-8                           	  631561	      1887 ns/op

With a lot of tokens, the gap is a lot wider:

BenchmarkQuickFormatStringWithATonOfTokens
BenchmarkQuickFormatStringWithATonOfTokens-8      	  116503	     10249 ns/op

versus

BenchmarkFormatStringWithATonOfTokens
BenchmarkFormatStringWithATonOfTokens-8           	   69565	     16725 ns/op

Performance is marginally slower with positional tokens, but not a lot:

BenchmarkQuickFormatStringWithPositionalToken
BenchmarkQuickFormatStringWithPositionalToken-8   	  727245	      1666 ns/op

Turning on the simple bad token check, performance drops to:

BenchmarkQuickFormatString
BenchmarkQuickFormatString-8                      	  599984	      1723 ns/op
BenchmarkQuickFormatStringWithATonOfTokens
BenchmarkQuickFormatStringWithATonOfTokens-8      	   88561	     13319 ns/op

and if I use the "comprehensive" check:

BenchmarkQuickFormatString
BenchmarkQuickFormatString-8                      	  380971	      2916 ns/op
BenchmarkQuickFormatStringWithATonOfTokens
BenchmarkQuickFormatStringWithATonOfTokens-8      	   18986	     63204 ns/op

Copy link
Contributor

@bmatsuo-at-luthersystems bmatsuo-at-luthersystems left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting benchmarks. The existing format-string does seem a bit excessive given that it hasn't really needed any extension since it was written. So I'd be open to a simplification.

I think there are some cut corners with your replacement functions. I would be interested to know what the benchmark numbers would look like if those issues were to be addressed.

if format.Type != LString {
return env.Errorf("first argument is not a string")
}
pattern := formatTokenRegexp.ReplaceAllString(strings.ReplaceAll(strings.ReplaceAll(format.Str, "%", "%%"), "{}", "%s"), "%[$1]s")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My brain hurts a bit trying to parse this. This also doesn't catch syntax errors at all (e.g. unclosed curly brace).

func quickformatBuiltin(env *LEnv, args *LVal) *LVal {
format := args.Cells[0]
//if formatBadTokenPattern.MatchString(format.Str) {
//return env.Errorf("unmatched token")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there should be a test/error for unmatched tokens. The existing format-string does this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. Maybe this was trying to catch unclosed curly braces? What I meant was that format-string will report an error if you don't supply enough values.

elps> (format-string "hello {}, it's {} to meet you" "world")
stdin:1: lisp:format-string: too many formatting direcives for supplied values
Stack Trace [1 frames -- entrypoint last]:
  height 0: stdin:1: lisp:format-string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should do that - it's just fmt.Sprintf inside, so it'll handle it the same way that Go does with a warning in the output.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. fmt just outputs junk when there's a mismatch between argument and substitutions. But I don't think that is very good behavior for a language like lisp. I think a runtime error is preferable.

The annoying thing about the fmt package is that format argument mismatches can literally go unnoticed for onths or years. I think they chose to do that instead of forcing programmers to check errors for every formatting call. I've never really seen another argument for why formatting should work that way (though I guess I haven't looked hard if you had one).

In dynamic languages like python and ruby format argument mismatches cause runtime exceptions. Even if it's not completely preferable, I think it's more expected of a language like elps.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants