Long single line literal string #827

teucer · 2021-06-10T22:26:25Z

I have a long single line literal string. Is there a way to input it multi-line and trim it, as with multi-line basic strings?

eksortso · 2021-06-12T13:06:42Z

Sadly, as long as you keep it a literal string, there isn't. There are no escapes in MLL strings, so the line-ending backslash syntax available in multiline basic strings is not available.

You could convert your literal string to an MLB string by doubling up your backslashes and changing the quotes. Then you can split it into multiple lines with line-ending backslashes, making sure to keep significant whitespace before each backslash.

Simple example, inspired by dozens of my former coworkers over the years:

mypath = 'C:\src\RepoCity\bighunkymonorepo\Regional\JPJK\rpts\Custom Reporting Projects\eee\eee\Class1.cs'

mypath_readable = """
C:\\src\\RepoCity\\bighunkymonorepo\
\\Regional\\JPJK\\rpts\\Custom \
Reporting Projects\\eee\\eee\
\\Class1.cs"""

teucer · 2021-06-13T20:05:40Z

The whole value add of literal strings is to avoid escaping. I believe it would be useful to handle this case. Maybe a special character after the triple quotes, e.g.

key = '''
blah 
bluh
'''-

eksortso · 2021-06-14T02:14:31Z

I believe it would be useful to handle this case.

@teucer By that, you mean you'd want it so that instead of preserving newlines, a MLL string with a special modifier would just concatenate all the lines into a single line? That sort of thing?

It's a compelling idea. I admit that I would find that very useful for regular expressions and Windows paths. Perhaps a solid case for it could be made here?

It would be better if a special character came before the triple quotes, so the parser (and other users!) could catch it right away. A symbol that suggests concatenation, like a + plus sign, would work better. The plus is commonly used by many languages to "add" strings to the ends of other strings.

Let's take my previous example and try this on. Would this look better to folks? Would it be more practical? Or could it be more confusing or just redundant?

mypath_perhaps = +'''
C:\src\RepoCity\bighunkymonorepo
\Regional\JPJK
\rpts\Custom Reporting Projects
\eee\eee\Class1.cs'''

teucer · 2021-06-14T08:05:58Z

I was inspired by jinja2, where "-" is used to suppress whitespace. Your proposal works as well; I would really like to see something like this.

One issue is how to signal intentional whitespace, e.g. consider:

key = +'''
blah 
bluh
'''

If my goal was to have 'blah bluh', how can I achieve that?

eksortso · 2021-06-14T13:37:18Z

One issue is how to signal intentional whitespace, e.g. consider:

[...]

If my goal was to have 'blah bluh', how can I achieve that?

All whitespace other than newlines ought to be preserved. So your example would yield a value of blahbluh. We shouldn't add spaces, because these strings must remain as literal as possible, without doing any more violence to that word than we're already doing.

So an explicit space must be included in order to get blah bluh. This would work, and it's obvious what we're doing:

key = +'''
blah
 bluh
'''

Two things to note here. The first thing is that triple quote at the end, on a line of its own. In a normal MLL string, that would put a newline on the very end of the string. I bring this up because I see it too often in examples. In the case of a concatenated MLL, it would make no difference.

The second thing is subtle, so read carefully. I am tempted to change the proposal so that instead of just newlines, all the trailing whitespace up to and including the newlines would be removed. That would discourage text that looks like this:

# NOTE: single space after blah.
key = +'''
blah 
bluh
'''

But that resembles, yet is not consistent with, the line-ending backslash syntax in MLB strings, which keeps whitespace before the backslash. And it's harder to explain. So let's just remove newlines, and remember that people who use significant trailing whitespace are only causing themselves trouble anyway.
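To make the difference between the two candidate rules concrete, here is a minimal Python sketch. Both function names are hypothetical; neither rule exists in TOML or in any parser, and the inputs stand for the text between the delimiters:

```python
def join_newlines_only(body: str) -> str:
    # Variant settled on above: remove line breaks, keep everything else.
    return body.replace("\r\n", "").replace("\n", "")


def join_trim_trailing(body: str) -> str:
    # Tempting variant: also drop spaces and tabs that precede each line break.
    return "".join(line.rstrip(" \t\r") for line in body.split("\n"))


trailing_space = "\nblah \nbluh\n"  # single space after "blah"
leading_space = "\nblah\n bluh\n"   # single space before "bluh"

for body in (trailing_space, leading_space):
    print(repr(join_newlines_only(body)), repr(join_trim_trailing(body)))
```

The two rules only disagree on the input with trailing whitespace, which is exactly the case that is invisible in most editors.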

teucer · 2021-06-14T15:23:17Z

Regarding the last bit, I agree that it is important to ignore all the trailing whitespaces:

The spec should not become highly sensitive to whitespaces
I would not have to change my settings in VS Code to disable the trimming of whitespaces 😄

I think we have a good proposal, what are the next steps?

eksortso · 2021-06-14T21:17:57Z

Personally, I would give the concept a brand new name, because it is different than anything seen so far. It's not single-line, based on form alone. It's not really multiline, because all trailing whitespace and newlines would be ignored when parsed. Lacking a better name, I'd call it a concatenated string, or concat string for short.

Then we'd need community feedback to determine if this is a viable concept or not. We already have four ways to write strings. Do we need this fifth way? We have to convince others that it's worth the additional overhead. @teucer and I are just two people, and although I've contributed time and code in the past, I still must vet every proposed change. Hoping more people will react, one way or another.

We must prepare to answer any and all questions that come our way, and adjust accordingly. Concat strings as we've so far defined them begin with +''', end with ''', and permit no escape sequences. Why shouldn't +""" and """, i.e. the same thing but with double quotes, allow for escape sequences or line ending backslashes? Why not "basic concat strings" and "literal concat strings"? Users would expect these things to be allowed, given the precedents.

Once there's some consensus, we'd compose a PR to match. This involves updates to spec text, examples, and the ABNF code. This, presumably, would be the easiest part, from my perspective at least. Perhaps just adding an optional concat-symbol token in the appropriate places would do the trick for the syntax. That can lead to further questions.

teucer · 2021-06-15T09:57:06Z

Is there an official process (like PEPs) to propose changes and ask the community for feedback? I don't know where to start.

PS: I like the name concat string

ChristianSi · 2021-06-15T11:14:21Z

One quick note: I would consider -''' a better start marker than +'''. + suggests that something is added, - that something is subtracted. We subtract, or remove, something here, namely linebreaks and possibly trailing whitespace, so let's not confuse people by using the wrong kind of sign.

eksortso · 2021-06-15T11:30:22Z

We're doing the process right now! We don't often talk about a proposal with a million little facets, so a formal PEP-like process doesn't exist. But I will admit that when we do (#499, #553), I wish it did!

We arguably could use a PR's first comment as a makeshift TOML Enhancement Proposal document. For a great idea with some momentum behind it, it gets the ball rolling. Or if there's no great support for the idea, it just kills the momentum.

But feature requests must start here, in the issues. It's bad practice in the open-source world to just dump code updates in folks' laps and expect a favorable reception. We've got to talk it out first.

We could appeal for action from the project leads, but it's way too early for that now.

That's enough meta-process. Hope this was a useful tangent.

arp242 · 2021-06-15T12:53:54Z

While I can certainly see how this is useful, the O ("obvious") part of TOML is becoming less and less with every addition like this. Especially when it comes to strings things are already somewhat non-obvious now (unfortunately, it's too late to change the semantics on that now, because compatibility).

key = +""" ... """ (or -""") is nice and concise, but at the very least a more obvious signal wouldn't be a bad idea IMHO. I can't really think of something that I really like right now, but think in the direction of something like key = oneline|""" ... """. This can also be extended with e.g. key = trim|""" ... """.

Is this proposal a good trade-off of obviousness vs. usefulness? I don't know; I'm not so sure. Certainly for me one of the advantages of TOML is that you can send it to pretty much anyone vaguely computer-literate and they don't have too many problems with it (unlike, say, YAML).

teucer · 2021-06-16T06:41:57Z

I do not have any doubt about the usefulness. This is IMHO a limitation of the current spec and a discrepancy between """ and '''.

While I like -'''...''' (or +'''...'''), key = concat|''' ... ''' is more explicit and might be extended for other purposes.

ChristianSi · 2021-06-16T10:46:53Z

My judgment is: nice idea, but too rare a use case to warrant further complication of the spec. This would be a rare beast indeed: a string that's multiline in the file, but represents a single line. If it's a single line, why not write it as such? Editors these days are quite hardy: they won't crash when encountering a line with 100 or 150 characters.

Also, when you want to add linebreaks, why not use a multi-line basic string, which has a facility explicitly designed for this purpose ("line ending backslash")? Backslashes will need to be escaped in such cases, but few strings include so many backslashes that this will be really inconvenient. One possible exception, already given as example, are long filenames – but filenames, even under Windows, can be written with slashes instead of backslashes, so that case doesn't count.

All in all my impression is that TOML's four string types should be flexible enough to cover all usual use cases. There may be exceptions where a fifth type may come in handy, but these exceptions are too rare to justify yet another addition to the spec.

eksortso · 2021-06-16T19:47:06Z

Glad to see this activity. I had a very long response written up yesterday that I had to abandon. So please forgive me if I've missed an important conversation point.

@ChristianSi Complex regular expressions get long and hairy with an unwieldy number of backslashes. Since this directly affects me in non-TOML applications, I am inclined to keep pressing for something like this.

Also, string folding is a different but related thing which strips leading and trailing whitespace and replaces newlines with spaces. This is more like the examples that @teucer is using, and is addressed (very coarsely) by YAML.

So like it or not, there's a recognized user demand for concatenated string syntax that we can't easily dismiss.
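For comparison with plain concatenation, string folding as described above could be sketched like this in Python. The function name is hypothetical, dropping blank lines is an assumption made here so that leading and trailing line breaks do not turn into stray spaces, and this is not a reimplementation of YAML's actual folding rules:

```python
def fold(body: str) -> str:
    """Strip each line, drop empty lines, and join the rest with single spaces."""
    stripped = (line.strip() for line in body.split("\n"))
    return " ".join(line for line in stripped if line)


folded = fold("\n  blah   \n  bluh\n")
print(repr(folded))
```

Folding inserts a space at each line break, whereas the concat idea inserts nothing, so the two would give different results for the same input.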

arp242 · 2021-06-16T21:05:14Z

So like it or not, there's a recognized user demand for concatenated string syntax that we can't easily dismiss.

I'd argue that almost every feature that has ever been added to any language, config file, etc. is useful and driven by user demand. I think "do people want this?" is kind of the wrong question to be asking, because it can almost always be answered with "yes".

Better questions are:

What problems does it solve that are hard to solve otherwise?
How will it impact the learning curve?
How will it impact readability? Especially for people not familiar with TOML?
How much potential for confusion is there (i.e. TOML being parsed to strings people weren't expecting)?
How much harder will it be to implement?

And probably a few more. Designing this sort of thing is an exercise in trade-offs where it's impossible to satisfy everyone and every use case.

A classic example is \-escapes. Useful? Clearly. But it also comes with downsides, such as people writing stuff like path = "C:\Users\martin" and then having trouble figuring out why it doesn't work. You can use path = 'C:\..', but overall it increases the learning curve and confusion. A good trade-off? Well, you can decide that for yourself; but it certainly is a trade-off. What's beneficial to one user can be detrimental to another.

I don't want to "dismiss" user demand; but I'm not so sure the advantages of +""" .. """ are greater than the disadvantages.

teucer · 2021-06-17T11:17:56Z

Below is my attempt to answer the questions:

What problems does it solve that are hard to solve otherwise?

Currently it is not possible to concatenate multiline literal strings, whereas it is possible with basic strings. This feature would be really useful if one wants to concatenate and yet avoid escaping everything. Irrespective of the usefulness (which I don't doubt), I think the lack of consistency is the bigger issue here.

How will it impact the learning curve?

This is mostly a documentation issue, and the feature we would add does not come with a big overhead IMHO.

How will it impact readability? Especially for people not familiar with TOML?

The syntax choice is crucial; key = concat|''' ... ''' might be preferable here.

How much potential for confusion is there (i.e. TOML being parsed to strings people weren't expecting)?

The syntax is clear enough that there would be no surprises IMHO.

How much harder will it be to implement?

Knowing a little about the internals of a Python library (tomlkit), I think it would be easy to implement.

eksortso · 2021-06-17T12:04:35Z

@arp242 Well put. I don't entirely agree, but you're right that there are more important questions to pose.

My time is limited, so I'll respond to your initial questions in a series of posts, and the trade-offs and their consequences will be front and center in each response.

What problems does it solve that are hard to solve otherwise?

The previous examples clearly indicate the increase in readability that line breaks offer without doubling up every backslash. With Windows paths, the advantage is certainly aesthetic. With regular expressions, this new syntax aids comprehension while enhancing efficiency. Imagine that users would never again have to write "\\\\" to express a single backslash, and this mechanism would allow for that clarity of expression. This would be a big win for everyone.

A new, obvious, expression now suggests itself: using a literal backslash as a string modifier! We could have \''' indicate automatic line-ending-backslash behavior. That single character would make a big difference to users dealing with complicated strings. Users would need to be aware of that character, but as already mentioned, those users already deal with these more technical string expressions. (I'll expand upon the "string modifier" concept in a later post, when we revisit the 'blah bluh' string.)

Which leads me to modify the proposal: let us trim both leading and trailing whitespace before concatenation. This allows for internal consistency, which would make learning the concept easier. As for implementation, we can reuse what we already have. Current parsers can handle line ending backslashes within triple-double quotes. Future parsers could use the same line-ending-backslash logic here, after identifying each line's trailing whitespace. Picking up subsequent leading whitespace then would already be accounted for.

I will need to write some examples to illustrate these points. More to come.
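One possible reading of the modified proposal (trim both leading and trailing whitespace on every line, then concatenate) can be sketched in Python. The function name `concat_trimmed` is hypothetical and the rule itself is only the idea under discussion, not anything in the TOML spec:

```python
def concat_trimmed(body: str) -> str:
    """Strip spaces and tabs from both ends of each line, then join the lines."""
    return "".join(line.strip(" \t\r") for line in body.split("\n"))


# Indented source lines, one with trailing spaces, as they might appear in a file.
indented = "\n".join([
    "",
    r"    C:\src\RepoCity\bighunkymonorepo  ",
    r"    \Regional\JPJK",
    r"    \rpts\Custom Reporting Projects",
    r"    \eee\eee\Class1.cs",
])

path = concat_trimmed(indented)
word_pair = concat_trimmed("\nblah\n bluh\n")
print(path)
print(repr(word_pair))
```

Whitespace inside a line survives, so `Custom Reporting Projects` is unaffected, but a leading space can no longer carry meaning: the `blah` / ` bluh` input collapses to `blahbluh` under this rule.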

arp242 · 2021-06-17T15:48:30Z

Oh yes, I absolutely agree that this feature as such has a lot of value. If I would design anything like TOML I would make sure it worked right from the first version as I hate dealing with \ escapes and general muckery with whitespace. That's not something I need to be convinced of, because we can quickly agree on that particular point.

The question is whether adding on this new behaviour/feature on top of the existing behaviour is a good trade-off. My main concern is that a whole bunch of subtly different behaviours is just confusing; TOML already has too many IMHO, and I find the way TOML deals with strings in general is somewhat unfortunate. But it is what it is.

Overall, I feel that "suboptimal but simple" is a better trade-off, but a reasonable case can be made for the other side as well.

Also: I'm not super against this or anything; if this would be added tomorrow then I would have no strong issues with that. Even though I'm skeptical it would really make TOML better, I don't think it's a super important issue either. Just to clarify 😅

Imagine that users would never again have to write "\\" to express a single backslash

You can already use ' and ''' for that? I'm not sure if I follow how that relates to the +''' .. ''' proposal?

ChristianSi · 2021-06-18T11:46:19Z

@eksortso:

With regular expressions, this new syntax aids comprehension while enhancing efficiency.

I agree that regexes are a good use case for literal strings – but a regex that spans multiple lines? Anyone who frequently uses such a kind of thing may exaggerate the use of regex magic, I feel. More to the point, modern regex syntaxes usually support an "extended mode" (/x) where linebreaks and other whitespace are ignored. So for cases where regexes get really long, a multi-line literal string parsed in extended mode may be the way to go.
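As an illustration of extended mode, here is how it looks in Python, where the `/x` flag is spelled `re.VERBOSE`. The date pattern is an invented example; in practice the pattern text could come from a multi-line literal TOML string:

```python
import re

# Whitespace and #-comments inside the pattern are ignored in verbose mode.
PATTERN = r"""
    ^(\d{4})    # year
    -(\d{2})    # month
    -(\d{2})$   # day
"""

date_re = re.compile(PATTERN, re.VERBOSE)
match = date_re.match("2021-06-18")
parts = match.groups() if match else ()
print(parts)

# Caveat: an unescaped space between words is ignored as well.
spaced = re.compile(r"blah bluh", re.VERBOSE)
matches_without_space = spaced.fullmatch("blahbluh") is not None
matches_with_space = spaced.fullmatch("blah bluh") is not None
print(matches_without_space, matches_with_space)
```

The second half shows the cost of this approach: a space that should be matched has to be written as `\ ` or `[ ]`.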

So no, I'm still not convinced that we have a convincing use case for yet another string type. TOML has long ago stopped being minimal, but I feel that we must nevertheless resist the urge to re-invent the M as "Maximal".

eksortso · 2021-08-17T04:04:04Z

Although I said a few months ago that I would respond to the questions that @arp242 put forward, I'm afraid that I've run out of steam. And since nobody's said anything recently, the appropriate approach may be just to go back to the original question.

@teucer Upon reflection, I think that @ChristianSi has the only sensible approach here:

If it's a single line, why not write it as such? Editors these days are quite hardy: they won't crash when encountering a line with 100 or 150 characters.

If we saw a lot more use cases, or examples in the wild of unwieldy long literal strings, then maybe attitudes towards a special syntax for long single-line literal strings will change, and somebody will suggest a more obvious, and less clever, means of expressing long lines of text across multiple lines than I've suggested.

@ChristianSi The "extended mode" (/x) isn't an option in the use cases that I had in mind. Most of the time, this regex mode ignores all whitespace, including the space between words. Explaining this to my relatively non-technical users would be more of a headache than it would be worth. They're already dealing with a regex engine imposed upon them. On top of all this, they'd said that they'd like to see new features that traditional regex substitution does not provide. But that was a long time ago, and the work issue is not in my hands any longer. So the one use case that I've got to apply to this issue is no longer any of my concern.

pradyunsg · 2022-03-05T12:44:02Z

Answering OP's question: No, there is not.

Regarding adding some sort of string prefixes that allow you to parse a string with different characteristics... I don't think this is common enough to justify making such a change to strings. I'm going to close this eagerly, but if we see a lot more of a similar concern being raised, we can revisit this then. :)

pradyunsg added new-syntax question labels Mar 5, 2022

pradyunsg closed this as completed Mar 5, 2022

arp242 mentioned this issue Apr 22, 2023

Proposal: trim one new line immediately before the closing delimiter in multiline string #969

Closed
