strconv: add QuotedPrefix #45033

dsnet · 2021-03-15T18:49:33Z

The existing Unquote function unescapes a Go string assuming that the entirety of input string is the escaped the string. However, in many parsing applications, we have a quoted string followed by some amount of unconsumed input, the presence of arbitrary characters after the quoted string currently breaks the Unquote function.

I propose adding:

// UnquoteLeading interprets the start of s as a single-quoted, double-quoted, or backquoted Go string literal.
// Any subsequent characters after the terminating quote are returned as rem.
func UnquoteLeading(s string) (out, rem string, error)

This functionality may simplify a number of standard packages that implement their own logic to determine the end of a quoted string so that they can pass the correctly sized string to strconv.Unquote:

The text was updated successfully, but these errors were encountered:

mvdan · 2021-03-17T13:57:52Z

I think this proposal makes sense; having to find the end of the quote manually means reimplementing a basic version of Unquote.

I also wonder if the input should be a []byte; I realise the package is called strconv, but the use case is precisely about reading and buffering, where one almost always has a byte slice. The fmt and reflect examples you show have to explicitly convert to string before calling strconv, for example. I think the result is less important, as it's possible to construct the result with a strings.Builder.

rsc · 2021-03-24T17:00:12Z

It may be that the operation to expose is "find me the end of this string" and not "also Unquote it".
I was discussing recently with Austin about adding a similar operation to regexp/parse, but I was thinking it would be separate from actually parsing. You don't always want to parse it, but you always need to identify a boundary.

mvdan · 2021-03-24T17:18:09Z

That sounds more flexible, with the downside that one would end up looping over the input bytes twice. That already happens today, so perhaps the cost isn't significant enough to matter.

dsnet · 2021-03-24T18:36:24Z

Given that many strings do not have any escape characters, the operation could report whether the string had any escape characters. If not, the user could trivially truncate off the leading and trailing quote characters and avoid a second pass.

rsc · 2021-04-07T18:02:38Z

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

rsc · 2021-04-14T17:12:38Z

So if we do the string-end-finder, I guess the API would be:

package strconv

// QuotedPrefix returns the quoted string (as defined by Quote and Unquote) at the prefix of s.
// If s does not start with a valid quoted string, QuotedPrefix returns an error.
func QuotedPrefix(s string) (string, error)

?

Reporting the presence of absence of escape characters seems like premature optimization.

dsnet · 2021-04-14T17:15:36Z

Wouldn't you need to know where the prefixed quoted string ended so that the caller can start parsing what comes afterwards?

Reporting the presence of absence of escape characters seems like premature optimization.

I agree. In my all my use-cases, performance was not paramount. Also, this loop does not seem hard to write in the extremely performance sensitive cases.

ianlancetaylor · 2021-04-14T19:14:29Z

Wouldn't you need to know where the prefixed quoted string ended so that the caller can start parsing what comes afterwards?

You can call the len function on the result.

dsnet · 2021-04-14T19:16:51Z

Ah yes, I apologize. I missed the part where it "returns the quoted string".

dsnet · 2021-04-14T19:41:21Z

I feel like an API that simply returns the length of a valid quoted string (and the entire length otherwise) is simpler to use.

Suppose we had:

// QuotedPrefixLen returns the length of the quoted string (as defined by Unquote) at the prefix of s.
// If s does not start with a valid quoted string, QuotedPrefix returns len(s).
func QuotedPrefixLen(s string) int

For example, if I wanted to parse the following text:

"string1" "string2" "string3"

the code would look like:

in = strings.TrimSpace(in)
for len(in) > 0 {
    n := strconv.QuotedPrefixLen(in)
    s, err := strconv.Unquote(in[:n])
    if err != nil {
        ...
    }
    ... // make use of s
    in = strings.TrimSpace(in[n:])
}

Alternatively, with the QuotedPrefix API, you have to do an additional error check:

in = strings.TrimLeft(in, " ")
for len(in) > 0 {
    s1, err := strconv.QuotedPrefix(in)
    if err != nil {
        ...
    }
    s2, err := strconv.Unquote(s1)
    if err != nil {
        ... // technically this will never happen, but it's not obvious to reviewers
    }
    ... // make use of s2
    in = strings.TrimLeft(in[len(s1):], " ")
}

That said, I'm still okay with QuotedPrefix as its better than nothing.

rogpeppe · 2021-04-15T08:08:24Z

I feel like an API that simply returns the length of a valid quoted string (and the entire length otherwise) is simpler to use.

Suppose we had:
// QuotedPrefixLen returns the length of the quoted string (as defined by Unquote) at the prefix of s.
// If s does not start with a valid quoted string, QuotedPrefix returns len(s).
func QuotedPrefixLen(s string) int
For example, if I wanted to parse the following text:
"string1" "string2" "string3"
the code would look like:
in = strings.TrimSpace(in)
for len(in) > 0 {
    n := strconv.QuotedPrefixLen(in)

Would QuotedPrefixLen return 0 if there's an escaping error inside the quoted prefix?

I guess if that's the case, you could still call Unquote on the full string and it'll produce the expected error, so it'll work out OK.

dsnet · 2021-04-15T08:10:51Z

I struggled between having it return 0 or len(s) on error situations. The advantage of returning len(s) is that naive slicing of the input would pass an invalid string to strconv.Unquote, which would then produce a good error message. The downside is that you can't distinguish between a valid string that happens to be the full length or an invalid one.

mvdan · 2021-04-15T08:29:17Z

Have it return (int, bool)? You could just ignore the boolean result if you don't need it.

rsc · 2021-04-21T17:10:37Z

You can always wrap QuotedPrefix to get the single-result.
Returning the error lets you get specific information about what went wrong,
which the sentinel integer does not.

rsc · 2021-04-21T18:02:32Z

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

rsc · 2021-04-28T18:08:37Z

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

dsnet · 2021-04-28T18:14:15Z

@rsc, To be clear, we're going with the API you proposed in #45033 (comment), right?

dsnet · 2021-04-28T18:44:58Z

Returning the error lets you get specific information about what went wrong,

I just took a look at the implementation of strconv.Unquote and it always returns strconv.ErrSyntax no matter what. I'm not sure how much the rationale of "getting specific information" applies. I would expect strconv.Unquote and strconv.UnquotePrefix to return the exact same error for some invalid input. It's not clear to me that we can (due to Hyrum's law) nor should (due to performance implications) make strconv.Unquote give more informative errors.

gopherbot · 2021-04-29T00:48:44Z

Change https://golang.org/cl/314775 mentions this issue: strconv: add QuotedPrefix

gopherbot added this to the Proposal milestone Mar 15, 2021

gopherbot added the Proposal label Mar 15, 2021

rsc mentioned this issue Apr 7, 2021

proposal: review meeting minutes #33502

Open

rsc changed the title ~~proposal: strconv: add UnquoteLeading~~ proposal: strconv: add QuotedPrefix Apr 21, 2021

rsc added the Proposal-FinalCommentPeriod label Apr 21, 2021

rsc changed the title ~~proposal: strconv: add QuotedPrefix~~ strconv: add QuotedPrefix Apr 28, 2021

rsc added the Proposal-Accepted label Apr 28, 2021

rsc modified the milestones: Proposal, Backlog Apr 28, 2021

gopherbot closed this as completed in e376929 May 4, 2021

dmitshur modified the milestones: Backlog, Go1.17 May 4, 2021

rsc mentioned this issue Jun 10, 2021

api: audit for Go 1.17 #46688

Closed

83 tasks

ianlancetaylor mentioned this issue Jun 19, 2021

strconv: QuotedPrefix documentation ambiguous #46829

Closed

oakad mentioned this issue Mar 23, 2022

proposal: strconv: export unquote and appendQuotedWith #51887

Closed

golang locked and limited conversation to collaborators May 4, 2022

gopherbot added the FrozenDueToAge label May 4, 2022

rsc moved this to Accepted in Proposals Aug 10, 2022

rsc added this to Proposals Aug 10, 2022

rsc removed this from Proposals Oct 19, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

strconv: add QuotedPrefix #45033

strconv: add QuotedPrefix #45033

dsnet commented Mar 15, 2021

mvdan commented Mar 17, 2021

rsc commented Mar 24, 2021

mvdan commented Mar 24, 2021

dsnet commented Mar 24, 2021

rsc commented Apr 7, 2021

rsc commented Apr 14, 2021

dsnet commented Apr 14, 2021 •

edited

Loading

ianlancetaylor commented Apr 14, 2021

dsnet commented Apr 14, 2021

dsnet commented Apr 14, 2021

rogpeppe commented Apr 15, 2021

dsnet commented Apr 15, 2021

mvdan commented Apr 15, 2021

rsc commented Apr 21, 2021

rsc commented Apr 21, 2021

rsc commented Apr 28, 2021

dsnet commented Apr 28, 2021

dsnet commented Apr 28, 2021

gopherbot commented Apr 29, 2021

strconv: add QuotedPrefix #45033

strconv: add QuotedPrefix #45033

Comments

dsnet commented Mar 15, 2021

mvdan commented Mar 17, 2021

rsc commented Mar 24, 2021

mvdan commented Mar 24, 2021

dsnet commented Mar 24, 2021

rsc commented Apr 7, 2021

rsc commented Apr 14, 2021

dsnet commented Apr 14, 2021 • edited Loading

ianlancetaylor commented Apr 14, 2021

dsnet commented Apr 14, 2021

dsnet commented Apr 14, 2021

rogpeppe commented Apr 15, 2021

dsnet commented Apr 15, 2021

mvdan commented Apr 15, 2021

rsc commented Apr 21, 2021

rsc commented Apr 21, 2021

rsc commented Apr 28, 2021

dsnet commented Apr 28, 2021

dsnet commented Apr 28, 2021

gopherbot commented Apr 29, 2021

dsnet commented Apr 14, 2021 •

edited

Loading