-
-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Paragraph indentation is lost after commands #105
Comments
Urgh. Working out which spaces should and shouldn't be ignored after commands seems like it is beyond me. :-(
should render as
should render as
and possibly even
But
should render as
(I think.) If this is correct, and someone could put the rule into words, we can write tests for it. At the moment I'm just too confused. :-( |
I think the logic should be absolutely simple: I think it should handle nicely all the test cases you have in your example, but I tried to change |
I think I got it; this is the diff against the current master: diff --git a/core/inputs-texlike.lua b/core/inputs-texlike.lua
index 4805208..fe7eb11 100644
--- a/core/inputs-texlike.lua
+++ b/core/inputs-texlike.lua
@@ -14,7 +14,7 @@ SILE.inputs.TeXlike.parser = function (_ENV)
local list = lpeg.Cf(lpeg.Ct("") * pair^0, rawset)
local parameters = (P("[") * list * P("]")) ^-1 / function (a) return type(a)=="table" and a or {} end
local anything = C( (1-lpeg.S("\\{}%\r\n")) ^1)
- local lineEndLineStartSpace = (lpeg.S(" ")^0 * lpeg.S("\r\n")^1 * lpeg.S(" ")^0)^-1
+ local lineEndLineStartSpace = (lpeg.S(" ")^0 * lpeg.S("\r\n"))^-1
START "document";
document = V("stuff") * (-1 + E("Unexpected character at end of input"))
@@ -22,10 +22,10 @@ SILE.inputs.TeXlike.parser = function (_ENV)
stuff = Cg(V"environment" +
((P("%") * (1-lpeg.S("\r\n"))^0 * lpeg.S("\r\n")^-1) /function () return "" end) -- Don't bother telling me about comments
+ V("text") + V"bracketed_stuff" + V"command")^0
- bracketed_stuff = P"{" * V"stuff" * (P"}" + E("} expected"))
- command =((P("\\")-P("\\begin")) * Cg(myID, "tag") * Cg(parameters,"attr") * V"bracketed_stuff"^0 * lineEndLineStartSpace)-P("\\end{")
+ bracketed_stuff = P"{" * V"stuff" * (P"}" + E("} expected")) * lineEndLineStartSpace
+ command =((P("\\")-P("\\begin")) * Cg(myID, "tag") * Cg(parameters,"attr") * V"bracketed_stuff"^0)-P("\\end{")
environment =
- P("\\begin") * Cg(parameters, "attr") * P("{") * Cg(myID, "tag") * P("}") * lineEndLineStartSpace
+ P("\\begin") * Cg(parameters, "attr") * P("{") * Cg(myID, "tag") * P("}")
* V("stuff")
* (P("\\end{") * (
Cmt(myID * Cb("tag"), function(s,i,a,b) return a==b end) + |
It isn't just paragraph breaks that are a problem here. There is a problem with trailing commands on all lines:
…typesets as "two words" whether or not you include a trailing space on the first line. On the other hand:
…typesets crammed together as "_two_words", again whether or not you manually include any spaces after the command. See one of the Greek reader page samples from @jtauber for an example of this being a problem. |
Good point. @simoncozens, is it possible to identify those commands that produce any output in the document? Maybe, as a post-processing step, drop empty lines that had commands on them? The logic would have only 3 steps:
It can be simplified further by removing not just empty lines, but also whitespace only lines that include commands, but I'm not sure about all implications. One good aspect is that it would make |
@alerque, I'm still thinking about this and it may be the case that There are several easy workarounds: (1) add a space before I'd prefer to handle this case "automatically", but it may not be possible without deeper analysis. @simoncozens, thoughts? |
Right now my thoughts are I would like to rip out the changes I made and return to the old, you-need-to-add-percent-signs-everywhere parser. It was less user-friendly but at least it was consistent and unsurprising... |
😢 Having more |
I wouldn't call it unsurprising as I was quickly bitten by
I'm with Caleb on this one. This is one of the things I'd prefer to see less in my text/code. @simoncozens, I'm proposing a very simple rule that should (1) handle the examples you listed, (2) handle those examples that Caleb mentioned, and (3) be simple to implement: When a line starts with a command without any leading spaces, treat the previous new line as a space unless it's an empty line (a line without non-whitespace characters). That's it. This means that
|
I've spent half a day today trying to figure out what exactly the problem is and why the current solution is not working. After trying the various examples and fixes, I came full circle. I agree with @simoncozens that the earlier functionality needs to be restored (revert 269963a and 069205f). If the functionality is restored, there are several "features" that (ideally) needs to be fixed:
Examples:
I've tried various fixes, but I don't think any of them are going to work simply because there are commands that need to ignore spaces and other commands that don't need to and the parser doesn't have that information. For example, in the example above Even the rendering of that If it's not possible to come up with a rule that is going to work automatically, I think we may need to introduce a new command that will mark a set of commands and "eat" all whitespaces. This is still better than I'm proposing I've tried to implement this command based on
Any help? |
I've been playing with scenarios myself and arrived at much the same conclusion you did. The parser needs to be fed more information. It's a crazy hard problem, but however it is solved I want a syntax I can actually teach to non-programmers and have them understand what the deal is. Trailing comment operators to solve a white space parsing issue seems like it will be a never ending debacle to explain. The
(Still trying on command names for size there.) Another possibility that occurred to be is simply an alternate syntax for all commands. If the idea is that the parser doesn't have enough information, some structural part of the syntax could act as a flag for whether or not any given command should or should not (and perhaps how) eat whitespace. |
Right; I also played with I didn't like
I prefer a specific tag, simply because it limits the number of commands and because it eats whitespaces between commands, which would still be difficult to explain by having alternative syntax. I got it working as a container |
@simoncozens, @alerque, I've implemented Simon, I simply hacked It achieves all the effects I wanted to; the following example works as I expect it:
The first If you want to see the "original/current" behavior, simply comment out line 97 in I played with making it enabled by default, but don't think it's a good idea. |
Note that whitespace handling in the SILE language is still not quite nailed down. This extra hack should work on both the last released version and the current git HEAD version, but there is no promise that it will be the preferred syntax down the road. However it works for now and as the backend is still experimental it seems like the smartest thing to do is use the syntax most likely to work. See upstream issue 105 for details: sile-typesetter/sile#105
Note that whitespace handling in the SILE language is still not quite nailed down. This extra hack should work on both the last released version and the current git HEAD version, but there is no promise that it will be the preferred syntax down the road. However it works for now and as the backend is still experimental it seems like the smartest thing to do is use the syntax most likely to work. See upstream issue 105 for details: sile-typesetter/sile#105
Simon, what do you think about the |
@simoncozens, @alerque, any further thoughts on this? The current state is sort of the middle of not-quite-tex-like and not-quite-working-in-some-cases. The |
Hi Paul, sorry about the delay; I've been prioritizing a lot of the internationalisation work and stuff with discretionaries - now that's sorted and the tests pass again I'd like to come back to this. I think the |
This sounds good. You can check my attempt to do this (https://github.com/pkulchenko/sile/tree/command-friendly-whitespace) as it includes the revert of the changes (one commit), plus the new command (another commit); I'm sure you can find a proper way to implement the needed logic ;). I've tried to make it work with both |
I've come up with another way to do this which I think works OK. Basically we revert the two erroneous changes and introduce a new rule: text which consists entirely of newlines is dropped. Please have a look at the latest commit, especially the new test |
Hmmmm, that's very interesting. So far all the random tests I through at it did the thing I expected. |
Seeing as how you've jinxed me twice today I'm asking: @simoncozens are you working on the tests or would you like me to fix them up? I have a previous test related commit to send and can send it with or without updated regression tests for this issue. |
I wasn't going to fix up the tests for this yet - was looking for a third opinion from @pkulchenko before declaring this done. |
Alright well I have a fixed up set (all but one that I'm not sure about) that I'll post in a minute but you can wait to merge it until we decide on this. If it turn out we do something different I'll split out the non-whitespace related changes from the set. |
One case where this handling is broken (relative to previous behavior) is in the
…leaves blank space at the top of a document where it didn't used to while
…works as expected. |
I toyed with the idea of never typesetting anything which is purely whitespace, but then you have the (very useful) I think the way around this is to give |
I noticed these two don't produce the same output:
text appears in two lines
text appears in one line In any case, I expected two lines. Would the approach of "discarding whitespace around non-output commands" yield the correct result here? Is |
@raphCode Yep... Weird parsing logic that we all face sooner or later (my very first issue here was quite similar, #1193)... Not to mention internal spaces, &c. Any change would be quite breaking, though, for lots of packages likely, and I am a bit frightened this is now marked for "0.15" (with the usual milestone shifting though...). "Non-output commands" is a concept I find highly dubious to define strictly... |
My milestone handling for "some future major version before 1.x" (as I see this issue fitting) vs. "scheduled for the next major version" has been inconsistent with how I've been marking and shifting minor version bumps to mean either "some future minor version" vs. "scheduled for the next minor version". I'll try to rectify that for consistency so that "scheduled for exact version N" means what it says and "some future version level X" can get shifted around. |
@raphCode The case of |
It occurred to me today while walking to the train that this issue is basically defining whether our input grammar will be context-free or not. It's not precisely the same stakes but in the ballpark. If the command itself determines whether whitespace will be eaten or not we loose a large element of parsability / predictability even if in some cases it would be more expected at first blush. We have four syntax variants to consider ( |
I absolutely agree with this goal. |
Quick question based on a variant of your comment. Would you expect these two blurbs to work the same or different?
Vs.
|
Assuming that an empty line translates into a new paragraph, I would expect them to work the same, or am I missing something? But really, as long as there is any system that is consistent and easy to use for the majority of use cases, I think I am fine. |
Another thing I noticed is that the following behaves differently in TeX and Sile:
Sile creates multiple spaces between the words, TeX doesn't. While this is a artificial example, in a real document something more subtle can happen:
Before printing, the Edit: |
I found these links interesting, they discuss whitespace handling after commands in TeX: I think we should pick some space-swallowing defaults for the 4 different forms of commands (as outlined by alerque) and additionally offer a way in the code to make this behavior overrideable. With careful consideration which built-in commands to override, this should offer a tradeoff between predictability and comfort of use. Ideally, it just works the way one expects. For example:
I realize that this proposal is probably not straightforward to implement since the question how to deal with a certain whitespace can not be answered purely based on parser grammar. |
Out of curiosity, what are you using |
I don't - this was just an example I picked up by reading this thread. I already looked the I figured that the details of Edit: I replaced my first bullet point with a better example, and kept the original one strike-trough to not disturb your quote. |
Kind of related (from #1691)
leads to an extra white space at the start of the quoted text. (It may perhaps hint, or not, towards ignoring trailing spaces at the end of an environment start) EDIT "or not": depends on how we consider the parity with XML.
Maybe not being the same thing as
|
Even though my comments there start heading the other way, I'm not 100% convinced the issue in #1691 is the same. In this case the difference between <foo>bar</foo> and <foo>
bar
</foo> should never equate to <foo> bar</foo> But only whether the outcome is SILE.process({ "bar" }) or possibly SILE.typesetter:leaveHmode()
SILE.process({ "bar" }) Does that make sense? |
Indeed, it's not the same as "paragraph" issues initially discussed here. To clarify, my mentioning here as "kind of related" is that whatever we adopt in the future for the paragraphing issue (incl. blank lines / end of lines), it might be good also to consider how it possibly affects spacing. But at the core, it's possibly a bit distinct, true. (I am not sure, by the way, that the outcome should depend on whether horizontal mode is left or not. It's a tough question, though.) |
This one has really started to bother me, even sitting up at 4am with a 10mo who can't breathe I'm stewing on how to fix it. I started work on a grammar. That's helping to clarify what the issues is, but the input grammar isn't the full story. |
This can be seen on this document:
More text here
continues on the same line as the horizontal rule, but should start on a new line.I've bisected it to this commit 269963a, more specifically to
lineEndLineStartSpace
incommand =((P("\\")-P("\\begin")) * Cg(myID, "tag") * Cg(parameters,"attr") * V"bracketed_stuff"^0 * lineEndLineStartSpace)-P("\\end{")
.The text was updated successfully, but these errors were encountered: