[fix] text.wrap now returns non-empty list for one character strings #400

Sygmei · 2021-12-28T23:42:00Z

text.wrap returns empty list for one-character strings, this PR fixes this by using <= instead of < for the iterator

alerque

Thanks for taking the time to contribute!

This looks reasonable, but can you clarify ... if it was returning the wrong result for one character strings before doesn't that imply it was returning the wrong result all along for all lengths because it was failing to evaluate the last position? Or does the nxt shenanigans mean that isn't true? If the latter, isn't this now looping one too many times for other cases? Perhaps a do ... until loop be more appropriate here.

Also how can we test this so that it is demonstrated to work as expected and we don't break it again?

Sygmei · 2021-12-29T21:23:13Z

@alerque sorry for yesterday's incomplete PR

I added more content :

Unit tests that checks that it works with old behaviour
New unit tests that tests for mono-letters and accented characters
Fixed old unit tests that did not respect the width limit (for wrap / fill)
Changed while <= to repeat until as recommended
wrap now uses %s as a "next word search" since old %W did not work with accented characters

Sygmei · 2021-12-29T21:24:30Z

tests/test-text.lua

@@ -46,18 +46,19 @@ batteries. That is because
 the goal of Lua is to
 produce a lean expressive
 language that will be
-used on all sorts of machines,


length of used on all sorts of = 20
should have jumped instead of putting machines, on same line

Tieske · 2021-12-30T08:31:26Z

This thing has more problems, and I think the proposed solution is just shifting them (didn't test).

Small test;

local ww = require("pl.text").wrap

local t = [[
hello "world" 'this' -is- a bb ccc dddd test... but wouldn't it pass??? final. word-that-can-be-broken
]]

for _, v in ipairs(ww(t,3)) do print(v) end

Output (without the changes in this PR):

hello
"world"
'this'
-is-
a bb              --> 'bb' on the next line?
ccc dddd          --> 'dddd' on the next line?
test.             --> three dots on this line
.. but
wouldn'           --> the issue @Sygmei is fixing
t it
pass?             --> three ? on this line
?? final.
word-
that-
can-be-           --> broken into 'can-' and 'be-'
broken

Tieske · 2021-12-30T08:34:02Z

this does remind me of this: #141

Sygmei · 2021-12-30T08:36:54Z

We can change the separator but %w and %W clearly are not ideal solutions
User might want to keep hyphened-words together as well ?

alerque · 2021-12-30T10:52:37Z

I agree with @Tieske that there are lots of problems and this is just shifting them, but I'm inclined to think that shifting them has some benefit. As long as we have a naive implementation the closer it works to what it says on the tin the better.

On the other hand I have lots of experience with typesetting and wrapping text context is such a hopelessly complex task I do think this naive implementation needs a disclaimer. In fact it should have been called naive_wrap() or naive_chunk() or something like that.

If we wanted to actually fix this we would have to hook into Unicode data –perhaps by using ICU– to actually determine what were word characters and where breaks are allowed in a locale dependent way. There really isn't any way to fake this and get it right more than a small percentage of the time. Given that we don't want to introduce external dependencies, I'm not sure there is anything for it but to just keep faking it or deprecate this entirely. Which brings us back to "we might as well fake it as best we can".

Sygmei · 2021-12-31T08:09:39Z

One advantage on splitting on space characters only is that the result is usually closer to what the user might expect.
Splitting on %w breaks unicode characters whereas breaking on space might make the line a little longer than expected (better downside imho)

Tieske · 2022-01-03T09:07:27Z

I rebased and force-pushed your branch @Sygmei

Added the test from a previous comment as well.

Tieske · 2022-01-03T09:10:05Z

The only problem left imo is that the function takes the width parameter, and treats it as the minimum. Whereas user expectations probably are that it would be the maximum width.

Sygmei · 2022-01-03T09:13:17Z

I did not change this behaviour since it was clearly documented :
"It will not break long words, so lines can be over the length to that extent."
Should that behaviour change ?

Tieske · 2022-01-03T15:13:38Z

I read that in a way that words longer than the limit would extend beyond it (they would not be forcefully broken). But that's different from the current behaviour of a minimum length.

Though that might as well be a different pr.

Opinions @alerque ?

Tieske · 2022-01-04T08:53:33Z

Added a commit that will honour the width given, and adds an option to forcefully break words longer than width.
This is what (I think) a user would expect from these functions.

It was harder than anticipated actually, it has soo many edge cases... added tests for all of them

## 1.12.0 (2022-Jan-10) - deprecate: module `pl.text` the contents have moved to `pl.stringx` (removal later) [#407](lunarmodules/Penlight#407) - deprecate: module `pl.xml`, please switch to a more specialized library (removal later) [#409](lunarmodules/Penlight#409) - feat: `utils.npairs` added. An iterator with a range that honours the `n` field [#387](lunarmodules/Penlight#387) - fix: `xml.maptags` would hang if it encountered text-nodes [#396](lunarmodules/Penlight#396) - fix: `text.dedent` didn't handle declining indents nor empty lines [#402](lunarmodules/Penlight#402) - fix: `dir.getfiles`, `dir.getdirectories`, and `dir.getallfiles` now have the directory optional, as was already documented [#405](lunarmodules/Penlight#405) - feat: `array2d.default_range` now also takes a spreadsheet range, which means also other functions now take a range. [#404](lunarmodules/Penlight#404) - fix: `lapp` enums allow [patterns magic characters](https://www.lua.org/pil/20.2.html) [#393](lunarmodules/Penlight#393) - fix: `text.wrap` and `text.fill` numerous fixes for handling whitespace, accented characters, honouring width, etc. [#400](lunarmodules/Penlight#400) - feat: `text.wrap` and `text.fill` have a new parameter to forcefully break words longer than the width given. [#400](lunarmodules/Penlight#400) - fix: `stringx.expandtabs` could error out on Lua 5.3+ [#406](lunarmodules/Penlight#406) - fix: `pl` the module would not properly forward the `newindex` metamethod on the global table. [#395](lunarmodules/Penlight#395) - feat: `utils.enum` added to create enums and prevent magic strings [#408](lunarmodules/Penlight#408) - change: `xml.new` added some sanity checks on input [#397](lunarmodules/Penlight#397) - added: `xml.xml_escape` and `xml.xml_unescape` functions (previously private) [#397](lunarmodules/Penlight#397) - feat: `xml.tostring` now also takes numeric indents (previously only strings) [#397](lunarmodules/Penlight#397) - fix: `xml.walk` now detects recursion (errors out) [#397](lunarmodules/Penlight#397) - fix: `xml.clone` now detects recursion (errors out) [#397](lunarmodules/Penlight#397) - fix: `xml.compare` now detects recursion (errors out) [#397](lunarmodules/Penlight#397) - fix: `xml.compare` text compares now work [#397](lunarmodules/Penlight#397) - fix: `xml.compare` attribute order compares now only compare if both inputs provide an order [#397](lunarmodules/Penlight#397) - fix: `xml.compare` child comparisons failing now report proper error [#397](lunarmodules/Penlight#397)

alerque reviewed Dec 29, 2021

View reviewed changes

Sygmei commented Dec 29, 2021

View reviewed changes

Tieske force-pushed the patch-1 branch from 57d13c2 to da6474b Compare January 3, 2022 08:41

Tieske force-pushed the patch-1 branch from b52a54b to 7b3efef Compare January 4, 2022 08:50

alerque mentioned this pull request Jan 5, 2022

deprecating the pl.text module #403

Closed

Sygmei and others added 2 commits January 5, 2022 14:15

fix(text) fixed text.wrap (mono-letter, accents, tests)

0d94139

feat(text) wrap/fill can break long words, honours width

b60cf6a

Tieske force-pushed the patch-1 branch from 7b3efef to b60cf6a Compare January 5, 2022 13:15

Tieske merged commit 70eb8f6 into lunarmodules:master Jan 5, 2022

Tieske mentioned this pull request Jan 10, 2022

deps(penlight) Bump to 1.12 Kong/kong#8278

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[fix] text.wrap now returns non-empty list for one character strings #400

[fix] text.wrap now returns non-empty list for one character strings #400

Sygmei commented Dec 28, 2021 •

edited

Loading

alerque left a comment •

edited

Loading

Sygmei commented Dec 29, 2021

Sygmei Dec 29, 2021

Tieske commented Dec 30, 2021

Tieske commented Dec 30, 2021

Sygmei commented Dec 30, 2021

alerque commented Dec 30, 2021

Sygmei commented Dec 31, 2021

Tieske commented Jan 3, 2022

Tieske commented Jan 3, 2022

Sygmei commented Jan 3, 2022 •

edited

Loading

Tieske commented Jan 3, 2022

Tieske commented Jan 4, 2022

[fix] text.wrap now returns non-empty list for one character strings #400

[fix] text.wrap now returns non-empty list for one character strings #400

Conversation

Sygmei commented Dec 28, 2021 • edited Loading

alerque left a comment • edited Loading

Choose a reason for hiding this comment

Sygmei commented Dec 29, 2021

Sygmei Dec 29, 2021

Choose a reason for hiding this comment

Tieske commented Dec 30, 2021

Tieske commented Dec 30, 2021

Sygmei commented Dec 30, 2021

alerque commented Dec 30, 2021

Sygmei commented Dec 31, 2021

Tieske commented Jan 3, 2022

Tieske commented Jan 3, 2022

Sygmei commented Jan 3, 2022 • edited Loading

Tieske commented Jan 3, 2022

Tieske commented Jan 4, 2022

Sygmei commented Dec 28, 2021 •

edited

Loading

alerque left a comment •

edited

Loading

Sygmei commented Jan 3, 2022 •

edited

Loading