Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace '~' to ' ' as a non-breaking space in bibliography.lua #1860

Closed
wants to merge 1 commit into from

Conversation

leorosa
Copy link
Contributor

@leorosa leorosa commented Sep 8, 2023

The bibtex package, specifically the bibliography.lua file, has functions "borrowed from nbibtex". Along it, the '~' character (which is used is used to insert a non-breaking space in latex) is used to separate author's names. However, in sile '~' is an ordinary character and, therefore, the \reference command may produce undesired '~' symbols in the generated document.
This PR changes one occurrence of '~' to ' ' (0x00A0) character, which (hack or not) works just fine in sile.
I was tempted to also replace '~' in lines 241, 243 and 244, but since I could not trigger them in my documents, I left these lines unchanged.

@Omikhleia
Copy link
Member

Omikhleia commented Sep 8, 2023

(0x00A0) character, which (hack or not) works just fine in sile

Not quite. It works at shaping level, but is then rendered as a fixed-sized non-breaking space.
However these non-breaking spaces should be variable, for the purpose of text justification.

Unicode Line Breaking Algorithm

... then expanding or compressing interword space according to common typographical practice, only the spaces marked by U+0020 SPACE and U+00A0 NO-BREAK SPACE are subject to compression, and only spaces marked by U+0020 SPACE, U+00A0 NO-BREAK SPACE, and occasionally spaces marked by U+2009 THIN SPACE are subject to expansion. All other space characters normally have fixed width.

In (La)TeX, ~ while non-breaking, is stretchable too, for that reason.

@Omikhleia
Copy link
Member

Omikhleia commented Sep 8, 2023

Additional note on how to get a stretchable non-breaking space:

In SILE, we can compute the size of an inter-word justifying space (which depends on several settings) as follows

local function interwordSpace()
  return SILE.shaper:measureSpace(SILE.font.loadDefaults({}))
end

(EDIT: of course the above is only true in a context where the current font is applied)

And then we must insert a kern of that size.
(I used that in various places in markdown.sile and resilient.sile)

@Omikhleia Omikhleia marked this pull request as draft September 8, 2023 07:40
@Omikhleia Omikhleia added the enhancement Software improvement or feature request label Sep 8, 2023
@alerque
Copy link
Member

alerque commented Sep 11, 2023

@Omikhleia Won't that hack give you the default size of a space but not the actual size in a given line (because it doesn't cause the line breaker to re-consider possible break points and adjust the stretch/shrink)?

It seems to me we should probably supply some \nbsp command that actually generates a special node that sizes the same as a regular space but isn't breakable, then fix the bibliography to use it. What you you think?

@alerque alerque added this to the v0.14.12 milestone Sep 11, 2023
@Omikhleia
Copy link
Member

Omikhleia commented Sep 11, 2023

Won't that hack give you the default size of a space but not the actual size in a given line (because it doesn't cause the line breaker to re-consider possible break points and adjust the stretch/shrink)?
It seems to me we should probably supply some \nbsp command that actually generates a special node that sizes the same as a regular space but isn't breakable, then fix the bibliography to use it. What you you think?

It has to be done via a command, yes, to get evaluated in the right font and break context.

Then I don't think we need a special node -- we just need that command to insert a simple kern for the line-breaking not to consider it as a breakpoint.

I've been running with this in various places:

And it does the right thing, it seems to me.

If we want to support ~ in bibliography entries, then we probably need to use an inputfilter to get it replaced by such a command (EDIT: that could be provided in the base or plain class)

@Omikhleia
Copy link
Member

Omikhleia commented Sep 11, 2023

Lol, and even used here as pseudo-unit iwsp in my style files: https://github.com/Omikhleia/resilient.sile/blob/c0035b9ad947ff5357a5738cfcb2c642e4093112/resilient/utils.lua#L7-L26

Apparently I like re-implementing those stretchable inter-word spaces in many different ways and places ;-)

@Omikhleia
Copy link
Member

Omikhleia commented Sep 12, 2023

(Bad example typed with a huge emergencyStretch)

image

First paragraph with abbr:nbsp (kern of width = interwordSpace() = stretchable and shrinkable)
Second paragraph with regular spaces (breakable),
Last paragraph with U+00A0 (unbreakable but fixed)

BTW, I wouldn't call the formula a "hack" -- it's, to my best understanding, more or less how is built the inter-word glue used for regular spaces (though the logic of shaper.makeSpaceNode()... is a bit convoluted and hard to read with all the possible settings)

@leorosa
Copy link
Contributor Author

leorosa commented Sep 12, 2023

(Bad example typed with a huge emergencyStretch)

By your example, I take back the "work just fine" from my first message.

In SILE, we can compute the size of an inter-word justifying space (...) And then we must insert a kern of that size.

So, we can not just replace one char for another, since the space size needs to be calculated every time. Therefore, the needed change is not trivial, since there are many lines in bibliography.lua that expect "space characters", e.g.:

local sep_chars = sep_and_not_tie .. '%~'

local parse_name
do
  local white_sep         = '[' .. sep_chars .. '%s]+'
  local white_comma_sep   = '[' .. sep_chars .. '%s%,]+'
  local trailing_commas   = '(,[' .. sep_chars .. '%s%,]*)$'
  local sep_char          = '[' .. sep_chars .. ']'
  local leading_white_sep = '^' .. white_sep

@Omikhleia
Copy link
Member

Omikhleia commented Sep 12, 2023

So, we can not just replace one char for another, since the space size needs to be calculated every time. Therefore, the needed change is not trivial, since there are many lines in bibliography.lua that expect "space characters", e.g.:

Hmm, quickly looking at the bibliography.lua code (I only dabbled there once, I don't know it very well), it assumes the content is a string in XML and just processes it:

SILE.processString(("<sile>%s</sile>"):format(cite), "xml")

(That's for the "cite" command, same goes for the "reference" command a bit further, etc.)

So cleverly replacing the ~ from the input with (say) <nbsp/> and implementing a "nbsp" command1, based on my above links, might just work. This is the path I would try, in any cases, as a first attempt ;)2

Footnotes

  1. We could have this command in the bibtex package itself, but it's interesting for other purposes, so could be part of the base or plain class. I'd tend to think the base class is more appropriate, maybe.

  2. It would avoid invoking an inputfilter... I am not fully convince this is good, but as ~ is probably unlikely to occur in XML commands or attributes (where it shouldn't be replaced...), that might be acceptable.

@leorosa
Copy link
Contributor Author

leorosa commented Sep 12, 2023

Replacing the ~ in bibliography.lua to <nbsp/>, and implementing a "nbsp" command in init.lua work, but not the way I expected: I had to change lines 241-244, because changing line 72, as I did earlier, messed the text. I think that I need to understand better this code before proposing changes to it. In the meantime, being a non-breaking space useful outside of bibtex, it makes sense to made it available in the base class.

In addition, it just comes into my mind that we may have some ~ from the bibfile itself...

@Omikhleia
Copy link
Member

Omikhleia commented Sep 16, 2023

For the record, another potential solution could be to use 0x00A0 indeed, but to have it processed (replaced by a kern) in SILE.nodeMakers.unicode:dealWith
EDIT and/or SILE.nodeMakers.unicode:handleWordBreak etc., this would have to be checked too. Anyhow, the idea there would be to always replace the non-breaking space by a kern, in a similar way as the regular space(s) are replaced by a glue.
A few languages might not use the unicode node breaker ("ja", IIRC) and would possibly have to be addressed separately, if needed.

@alerque
Copy link
Member

alerque commented Oct 11, 2023

I'm going to go ahead and merge this as a stop-gab because having a Unicode non-breaking space is an improvement on leaking a LaTeX symbol that we don't actually process that way. That being said none of the discussion here is actually addressed and there are good ideas and even links to implementations that should be considered. I opened #1889 to track that.

@alerque
Copy link
Member

alerque commented Oct 11, 2023

On second thought I'm bumping this past today's point release because it isn't clean to me what this is doing. This code is used to match X and either replace with nothing or split into tokens. In other words removing ~ here should never result in cleaning up a ~ in input, quite the opposite.

I'm happy to see a bug fixed, but I think I need to see an MWE that we can turn into a test before applying this.

@alerque alerque modified the milestones: v0.14.12, v0.x.y Oct 11, 2023
@alerque alerque modified the milestones: v0.x.y, v0.14.14 Dec 13, 2023
@alerque
Copy link
Member

alerque commented Dec 13, 2023

So I think this will make perfect sense and function as expected after #1918 lands.

@alerque alerque marked this pull request as ready for review December 16, 2023 15:41
@alerque
Copy link
Member

alerque commented Dec 16, 2023

I think we need to add a test case that actually shows this is working now. These changes didn't break anything in our tests.

Copy link
Member

@alerque alerque left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a simple test case that shows this in action (either working or not working doesn't matter) that we can build out a test from with expectations so we know when the code does what we expect?

@alerque
Copy link
Member

alerque commented Dec 22, 2023

I played around with editing the bibtex test to hit this change and couldn't come up with anything. I'm happy for it to go through but I'm concerned we may not actually have fixed anything. We need a test to prove it if nothing else so that I feel comfortable with a change log going out saying we fixed something that something has at least changed.

If we don't get a working test case soon this might have to wait for the next release cycle as v0.14.14 is basically ready to go out other than waiting for this.

@alerque alerque modified the milestones: v0.14.14, v0.14.15 Dec 23, 2023
@alerque
Copy link
Member

alerque commented Jan 22, 2024

I'm still quite happy to get this enhancement landed, but I'm a bit confused about what is affecting what here and we still don't have a test case that shows "this thing that didn't work as expected/desired before does now". As soon as we can come up with such a test case this can land is the next patch release, but I can't hold up the release of other verified bug fixed for something we can't even show in action—hence this getting bumped again.

@alerque alerque modified the milestones: v0.14.15, v0.x.y Jan 22, 2024
@Omikhleia Omikhleia added the needs MWE Minimum working example needed for investigation label Jan 23, 2024
Omikhleia added a commit to Omikhleia/resilient-types that referenced this pull request Jun 15, 2024
Accept `\&` for compatibility with legacy BibTeX, but do not
mandate it to be escaped for compatibility with other engines.
Support unescaped `~` as a non-breaking space for compability with
TeX, this is often found in existing bibliography files.
Support `\~` to render a tilde.
XML-escape the input so it can safely be wrapped in a `<sile>`
construct.

Closes sile-typesetter#2050

Closes sile-typesetter#1860 (replaced by this implementation)
Omikhleia added a commit to Omikhleia/resilient-types that referenced this pull request Jun 15, 2024
Accept `\&` for compatibility with legacy BibTeX, but do not
mandate it to be escaped for compatibility with other engines.
Support unescaped `~` as a non-breaking space for compability with
TeX, this is often found in existing bibliography files.
Support `\~` to render a tilde.
XML-escape the input so it can safely be wrapped in a `<sile>`
construct.

Closes sile-typesetter#2050

Closes sile-typesetter#1860 (replaced by this implementation)
@Omikhleia
Copy link
Member

Closing this PR:
This will be implemented differently in #2048

@Omikhleia Omikhleia closed this Jun 15, 2024
Omikhleia added a commit to Omikhleia/resilient-types that referenced this pull request Jun 15, 2024
Accept `\&` for compatibility with legacy BibTeX, but do not
mandate it to be escaped for compatibility with other engines.
Support unescaped `~` as a non-breaking space for compability with
TeX, this is often found in existing bibliography files.
Support `\~` to render a tilde.
XML-escape the input so it can safely be wrapped in a `<sile>`
construct.

Closes sile-typesetter#2050

Closes sile-typesetter#1860 (replaced by this implementation)
@alerque alerque modified the milestones: v0.x.y, v0.15.4 Jun 15, 2024
alerque pushed a commit to Omikhleia/resilient-types that referenced this pull request Jun 23, 2024
Accept `\&` for compatibility with legacy BibTeX, but do not
mandate it to be escaped for compatibility with other engines.
Support unescaped `~` as a non-breaking space for compability with
TeX, this is often found in existing bibliography files.
Support `\~` to render a tilde.
XML-escape the input so it can safely be wrapped in a `<sile>`
construct.

Closes sile-typesetter#2050

Closes sile-typesetter#1860 (replaced by this implementation)
alerque pushed a commit to Omikhleia/resilient-types that referenced this pull request Jun 23, 2024
Accept `\&` for compatibility with legacy BibTeX, but do not
mandate it to be escaped for compatibility with other engines.
Support unescaped `~` as a non-breaking space for compability with
TeX, this is often found in existing bibliography files.
Support `\~` to render a tilde.
XML-escape the input so it can safely be wrapped in a `<sile>`
construct.

Closes sile-typesetter#2050

Closes sile-typesetter#1860 (replaced by this implementation)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Software improvement or feature request needs MWE Minimum working example needed for investigation packages:bibliography
Projects
Development

Successfully merging this pull request may close these issues.

3 participants