Long single line literal string #827
Comments
Sadly, as long as you keep it a literal string, there isn't. There are no escapes in multi-line literal (MLL) strings, so the line-ending backslash syntax of multi-line basic (MLB) strings is not available. You could convert your literal string to an MLB string by doubling up your backslashes and changing the quotes. Then you can split it into multiple lines with line-ending backslashes, making sure to keep significant whitespace before each backslash. Simple example, inspired by dozens of my former coworkers over the years:
mypath = 'C:\src\RepoCity\bighunkymonorepo\Regional\JPJK\rpts\Custom Reporting Projects\eee\eee\Class1.cs'
mypath_readable = """
C:\\src\\RepoCity\\bighunkymonorepo\
\\Regional\\JPJK\\rpts\\Custom \
Reporting Projects\\eee\\eee\
\\Class1.cs"""
The whole value add of literal strings is to avoid escaping. I believe it would be useful to handle this case. Maybe a special character after the triple quotes, e.g.
key = '''
blah
bluh
'''-
@teucer By that, you mean you'd want it so that instead of preserving newlines, a MLL string with a special modifier would just concatenate all the lines into a single line? That sort of thing? It's a compelling idea. I admit that I would find that very useful for regular expressions and Windows paths. Perhaps a solid case for it could be made here? It would be better if a special character came before the triple quotes, so the parser (and other users!) could catch it right away. A symbol that suggests concatenation, like a +. Let's take my previous example and try this on. Would this look better to folks? Would it be more practical? Or could it be more confusing or just redundant?
I was inspired by jinja2, where "-" is used to suppress whitespace. Your proposal works as well; I would really like to see something like this. One issue is how to signal intentional whitespace. Consider, e.g.:
key = +'''
blah
bluh
'''
If my goal was to have 'blah bluh', how can I achieve that?
[...]
All whitespace other than newlines ought to be preserved. So your example would yield a value of 'blahbluh'. So an explicit space must be included in order to get 'blah bluh':
key = +'''
blah
bluh
'''
Two things to note here. The first thing is that triple quote at the end, on a line of its own. In a normal MLL string, that would put a newline on the very end of the string. I bring this up because I see it too often in examples. In the case of a concatenated MLL, it would make no difference. The second thing is subtle, so read carefully. I am tempted to change the proposal so that instead of just newlines, all the trailing whitespace up to and including the newlines would be removed. That would discourage text that looks like this:
# NOTE: single space after blah.
key = +'''
blah
bluh
'''
But that resembles, and is not consistent with, the line-ending backslash syntax in MLB strings. And it's harder to explain. So let's just remove newlines, and remember that people who use significant trailing whitespace are only causing themselves trouble anyway.
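The two candidate rules discussed here can be compared side by side. The sketch below is not part of TOML and not any parser's API; the function names are hypothetical, and `body` stands for the raw text between the opening and closing triple quotes of a proposed concatenated string.

```python
def concat_newlines_only(body: str) -> str:
    """Rule 1: remove newlines only; every other character is preserved."""
    return body.replace("\r\n", "").replace("\n", "")


def concat_trim_trailing(body: str) -> str:
    """Rule 2: remove each newline together with the whitespace before it."""
    *head, tail = body.split("\n")
    # Only lines that end in a newline are trimmed; text after the last
    # newline (right before the closing quotes) is left untouched.
    return "".join(line.rstrip(" \t\r") for line in head) + tail


plain = "\nblah\nbluh\n"          # no trailing whitespace anywhere
with_space = "\nblah \nbluh\n"    # single space after "blah"

print(repr(concat_newlines_only(plain)))        # 'blahbluh'
print(repr(concat_newlines_only(with_space)))   # 'blah bluh'
print(repr(concat_trim_trailing(with_space)))   # 'blahbluh'
```

Under rule 1 the invisible trailing space is significant; under rule 2 it silently disappears, which is the trade-off weighed in this comment.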
Regarding the last bit, I agree that it is important to ignore all trailing whitespace:
I think we have a good proposal. What are the next steps?
Personally, I would give the concept a brand new name, because it is different than anything seen so far. It's not single-line, based on form alone. It's not really multiline, because all trailing whitespace and newlines would be ignored when parsed. Lacking a better name, I'd call it a concatenated string, or concat string for short. Then we'd need community feedback to determine if this is a viable concept or not. We already have four ways to write strings. Do we need this fifth way? We have to convince others that it's worth the additional overhead. @teucer and I are just two people, and although I've contributed time and code in the past, I still must vet every proposed change. Hoping more people will react, one way or another. We must prepare to answer any and all questions that come our way, and adjust accordingly. Concat strings as we've so far defined them begin with Once there's some consensus, we'd compose a PR to match. This involves updates to spec text, examples, and the ABNF code. This, presumably, would be the easiest part, from my perspective at least. Perhaps just adding an optional
Is there an official process (like PEPs) to propose changes and ask the community for feedback? I don't know where to start. PS: I like the name concat string.
One quick note: I would consider
We're doing the process right now! We don't often talk about a proposal with a million little facets, so a formal PEP-like process doesn't exist. But I will admit that when we do (#499, #553), I wish it did! We arguably could use a PR's first comment as a makeshift TOML Enhancement Proposal document. For a great idea with some momentum behind it, it gets the ball rolling. Or if there's no great support for the idea, it just kills the momentum. But feature requests must start here, in the issues. It's bad practice in the open-source world to just dump code updates in folks' laps and expect a favorable reception. We've got to talk it out first. We could appeal for action from the project leads, but it's way too early for that now. That's enough meta-process. Hope this was a useful tangent.
While I can certainly see how this is useful, the O ("obvious") part of TOML is becoming less and less with every addition like this. Especially when it comes to strings things are already somewhat non-obvious now (unfortunately, it's too late to change the semantics on that now, because compatibility).
Is this proposal a good trade-off of obviousness vs. usefulness? I don't know; I'm not so sure. Certainly for me one of the advantages of TOML is that you can send it to pretty much anyone vaguely computer-literate and they don't have too many problems with it (unlike, say, YAML).
I do not have any doubt about the usefulness. This is IMHO a limitation of the current spec and a discrepancy between """ and '''. While I like
My judgment is: nice idea, but too rare a use case to warrant further complication of the spec. This would be a rare beast indeed: a string that's multiline in the file, but represents a single line. If it's a single line, why not write it as such? Editors these days are quite hardy: they won't crash when encountering a line with 100 or 150 characters. Also, when you want to add linebreaks, why not use a multi-line basic string, which has a facility explicitly designed for this purpose ("line ending backslash")? Backslashes will need to be escaped in such cases, but few strings include so many backslashes that this will be really inconvenient. One possible exception, already given as an example, is long filenames, but filenames, even under Windows, can be written with slashes instead of backslashes, so that case doesn't count. All in all my impression is that TOML's four string types should be flexible enough to cover all usual use cases. There may be exceptions where a fifth type may come in handy, but these exceptions are too rare to justify yet another addition to the spec.
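The point about forward slashes can be illustrated with Python's standard `pathlib`, whose `PureWindowsPath` applies Windows path rules on any platform. This is only a sketch of how one consumer treats the two spellings; whether a given Windows program accepts forward slashes depends on that program. The path is a shortened form of the example earlier in the thread.

```python
from pathlib import PureWindowsPath

# The same Windows path written with backslashes and with forward slashes.
backslashes = PureWindowsPath(r"C:\src\RepoCity\bighunkymonorepo\Regional\JPJK\rpts")
slashes = PureWindowsPath("C:/src/RepoCity/bighunkymonorepo/Regional/JPJK/rpts")

# pathlib normalizes the separator, so the two compare equal.
assert backslashes == slashes

# Converting back to text yields the backslash form.
print(slashes)
```

In a TOML file, the forward-slash form needs no escaping in either a basic or a literal string.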
Glad to see this activity. I had a very long response written up yesterday that I had to abandon. So please forgive me if I've missed an important conversation point. @ChristianSi Complex regular expressions get long and hairy with an unwieldy number of backslashes. Since this directly affects me in non-TOML applications, I am inclined to keep pressing for something like this. Also, string folding is a different but related thing which strips leading and trailing whitespace and replaces newlines with spaces. This is more like the examples that @teucer is using, and is addressed (very coarsely) by YAML. So like it or not, there's a recognized user demand for concatenated string syntax that we can't easily dismiss.
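The difference between folding and concatenation, as described in this comment, can be sketched in a few lines. Both function names are hypothetical, and the folding rule here is a simplification of the description above (blank lines are simply dropped); it is not an implementation of YAML's folded scalars.

```python
def fold(body: str) -> str:
    """Folding: strip each line, then join the lines with single spaces."""
    lines = [line.strip() for line in body.split("\n")]
    return " ".join(line for line in lines if line)


def concatenate(body: str) -> str:
    """Concatenation: strip each line, then join with nothing in between."""
    return "".join(line.strip() for line in body.split("\n"))


body = "\n    blah\n    bluh\n"

print(repr(fold(body)))         # 'blah bluh'
print(repr(concatenate(body)))  # 'blahbluh'
```

Folding suits prose, where a line break stands for a space; concatenation suits paths and regular expressions, where an inserted space would change the value.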
I'd argue that almost every feature that has ever been added to any language, config file, etc. is useful and driven by user demand. I think "do people want this?" is kind of the wrong question to be asking, because it can almost always be answered with "yes". Better questions are:
And probably a few more. Designing this sort of thing is an exercise in trade-offs where it's impossible to satisfy everyone and every use case. A classic example are I don't want to "dismiss" user demand; but I'm not so sure the advantages of
Below is my attempt to answer the questions:
Currently it is not possible to concatenate multiline literal strings, whereas it is possible with basic strings. This feature would be really useful if one wants to concatenate and yet avoid escaping everything. Irrespective of the usefulness (which I don't doubt), I think the lack of consistency is the bigger issue here.
This is mostly a documentation issue, and the feature we would add does not come with a big overhead IMHO.
The syntax choice is crucial,
The syntax is clear enough that there would be no surprises IMHO.
Knowing a little about the internals of a Python library (tomlkit), I think it would be easy to implement.
@arp242 Well put. I don't entirely agree, but you're right that there are more important questions to pose. My time is limited, so I'll respond to your initial questions in a series of posts, and the trade-offs and their consequences will be front and center in each response.
The previous examples clearly indicate the increase in readability that line breaks offer without doubling up every backslash. With Windows paths, the advantage is certainly aesthetic. With regular expressions, this new syntax aids comprehension while enhancing efficiency. Imagine that users would never again have to write A new, obvious, expression now suggests itself: using a literal backslash as a string modifier! We could have Which leads me to modify the proposal: let us trim both leading and trailing whitespace before concatenation. This allows for internal consistency, which would make learning the concept easier. As for implementation, we can reuse what we already have. Current parsers can handle line ending backslashes within triple-double quotes. Future parsers could use the same line-ending-backslash logic here, after identifying each line's trailing whitespace. Picking up subsequent leading whitespace then would already be accounted for. I will need to write some examples to illustrate these points. More to come.
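The modified rule (trim leading and trailing whitespace on every line, then concatenate) can be sketched as follows. This illustrates the proposed semantics only and is not TOML syntax or any parser's API; `concat_trim_both` is a hypothetical name, and the date pattern is an invented example of a regular expression that benefits from unescaped backslashes.

```python
import re


def concat_trim_both(body: str) -> str:
    """Trim whitespace on both sides of every line, then join the lines."""
    return "".join(line.strip(" \t\r") for line in body.split("\n"))


# Raw text as it might appear between the quotes of a concatenated literal
# string: indented for readability, with single (unescaped) backslashes.
BODY = r"""
    ^(?P<year>\d{4})
    -(?P<month>\d{2})
    -(?P<day>\d{2})$
"""

pattern = concat_trim_both(BODY)
print(pattern)  # ^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})$

match = re.match(pattern, "2021-06-15")
print(match.group("month"))  # 06
```

Trimming leading whitespace is what allows the continuation lines to be indented, mirroring how a line-ending backslash in an MLB string skips whitespace up to the next non-whitespace character.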
Oh yes, I absolutely agree that this feature as such has a lot of value. If I would design anything like TOML I would make sure it worked right from the first version as I hate dealing with The question is whether adding on this new behaviour/feature on top of the existing behaviour is a good trade-off. My main concern is that a whole bunch of subtly different behaviours is just confusing; TOML already has too many IMHO, and I find the way TOML deals with strings in general is somewhat unfortunate. But it is what it is. Overall, I feel that "suboptimal but simple" is a better trade-off, but a reasonable case can be made for the other side as well. Also: I'm not super against this or anything; if this would be added tomorrow then I would have no strong issues with that, even though I'm skeptical it would really make TOML better I don't think it's a super important issue either. Just to clarify 😅
You can already use
I agree that regexes are a good use case for literal strings – but a regex that spans multiple lines? Anyone who frequently uses such a kind of thing may exaggerate the use of regex magic, I feel. More to the point, modern regex syntaxes usually support an "extended mode" (/x) where linebreaks and other whitespace are ignored. So for cases where regexes get really long, a multi-line literal string parsed in extended mode may be the way to go. So no, I'm still not convinced that we have a convincing use case for yet another string type. TOML has long ago stopped being minimal, but I feel that we must nevertheless resist the urge to re-invent the M as "Maximal".
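The extended-mode approach can be tried with Python's `re.VERBOSE` flag, which is Python's counterpart of /x. In this sketch the `PATTERN` text stands in for the value of an ordinary multi-line literal string read from a TOML file; the date pattern itself is an invented example.

```python
import re

# Stand-in for the value of a TOML multi-line literal string ('''...''').
PATTERN = r"""
\d{4}   # year
-
\d{2}   # month
-
\d{2}   # day
"""

# In verbose mode, line breaks, indentation and "#" comments are ignored.
date_re = re.compile(PATTERN, re.VERBOSE)
assert date_re.fullmatch("2021-06-15")

# Caveat: unescaped spaces inside the pattern are ignored as well.
words_re = re.compile("hello world", re.VERBOSE)
assert words_re.fullmatch("helloworld")
assert words_re.fullmatch("hello world") is None
```

The caveat matters for patterns that contain literal spaces: in verbose mode those have to be escaped or placed in a character class.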
Although I said a few months ago that I would respond to the questions that @arp242 put forward, I'm afraid that I've run out of steam. And since nobody's said anything recently, the appropriate approach may be just to go back to the original question. @teucer Upon reflection, I think that @ChristianSi has the only sensible approach here:
If we saw a lot more use cases, or examples in the wild of unwieldy long literal strings, then maybe attitudes towards a special syntax for long single-line literal strings will change, and somebody will suggest a more obvious, and less clever, means of expressing long lines of text across a single line than I've suggested. @ChristianSi The "extended mode" (/x) isn't an option in the use cases that I had in mind. Most of the time, this regex mode ignores all whitespace, including the space between words. Explaining this to my relatively non-technical users would be more of a headache than it would be worth. They're already dealing with a regex engine imposed upon them. On top of all this, they'd said that they'd like to see new features that traditional regex substitution does not provide. But that was a long time ago, and the work issue is not in my hands any longer. So the one use case that I've got to apply to this issue is no longer any of my concern.
Answering OP's question: No, there is not. Regarding adding some sort of string prefixes that allow you to parse a string with different characteristics... I don't think this is common enough to justify making such a change to strings. I'm going to close this eagerly, but if we see a lot more of a similar concern being raised, we can revisit this then. :)
I have a long single line literal string. Is there a way to input it multi-line and trim it, as with multi-line basic strings?