typeanswer: [type:nc] – ignores combining characters #3422

twwn · 2024-09-16T15:53:34Z

Building on my previous PR's cleanup, adds a comparison variant to [type] which ignores when combining characters of the expected field are missing from the typed input. It still shows these characters in the "expected" line for reference.

It's useful for languages with e.g. diacritics that are required for reference (such as in dictionaries), but rarely actually learned or used in everyday writing. Among these languages: Arabic, Hebrew, Persian, Urdu.

Previously:

Now:

[This also removes the need to have both variants of words/sentences present as separate fields, to show them redundantly, etc.]

The bool combining controls it as new final parameter of both relevant compare_answer functions. In Python, it's set to true by default.

Use on the note templates: [type:nc:field]

dae · 2024-09-20T11:25:43Z

Thanks twwn, I'll try to review this soon. (For my own reference, previous PR: #680)

twwn · 2024-09-22T14:10:57Z

The minor commits are suggestions, each with a link to relevant Stack Overflow threads in their messages.

"simplify by using nfkd throughout" does so by removing the use of NFC in the original case as well. Though that required adjusting two testcases, this only reflects internal processing: I compared both before/after and the rendered end results are the same.
I'm partial to leaving in normalize_provided as member function as it means normalize (as normalize_expected) won't have to carry around the bool.

I'll close this PR & reopen a new one after review, the incremental rebases are rather tedious.

dae · 2024-09-25T10:18:40Z

If you could bring the branch up to date with main (either here or in a new PR if that's easier), that'll make it easier for me to review.

twwn · 2024-09-27T15:40:39Z

Done.

The small commits are just suggestions. I've split the NFKD commit further: The second one is only there if you want the extra function gone. But since that requires adding a bool parameter, I myself would rather not actually merge it.

dae · 2024-09-28T01:17:54Z

Thanks, I'll get back to this again soon. In case you didn't see it, there's been a report about issues with the recent changes: https://forums.ankiweb.net/t/anki-24-10-beta/49989/22

twwn · 2024-09-28T02:10:11Z

That report, it's since prepare_expected (which strips the html tags) is currently no longer run for the two "new" cases: when nothing is typed, or the typed=expected one.

I can…

ensure that HTML tags actually get evaluated rather than just dumped as text (that's definitively my omission)
add prepare_expected back in

… each option for both cases, or mix them. The original behavior is restored with prepare_expected for both cases.

It's a moot argument if changes are not to break anything, but IMHO it's really bad card design to clutter the field that you type with additional tags or even media.

That's why I feel like the most natural way (if that word makes sense here) is to add prepare_expected back in for the typed=expected case but leave in & instead evaluate the HTML for the nothing-typed one. As in: type:: during review becomes an offer—if the user skips it, Anki would now treat the field as if the template included it directly.

dae · 2024-09-28T03:14:34Z

instead evaluate the HTML for the nothing-typed one.

So HTML tags like bold would appear when nothing is typed, and not appear otherwise? I fear that inconsistency might lead to some additional "bug" reports.

it's really bad card design to clutter the field that you type with additional tags or even media.

If you're saying that you think users should put those things in a different field that's not being typed in, I fully agree, though sadly it seems to be quite common to put things like sound tags in the same field.

twwn · 2024-09-28T03:23:23Z

So HTML tags like bold would appear when nothing is typed, and not appear otherwise?

Yes, it would break with the old behavior. But it's just a suggestion, I entirely understand if it's not deemed good/worthwhile.

I guess I'll just restore prepare_expected for both cases for now, it's easier to do, anyway. It'll take a while since I really need to sleep 😃.

dae · 2024-09-28T03:33:39Z

Sounds good, and no worries; it'll likely be a few days before I can build another beta anyway.

Fix: Add prepare_expected back in for the 'nothing typed' & 'correctly typed' cases. This also makes expected_original redundant again. Style: %s/provided/typed/g Style: rename one ch → c Testcase: whitespace_is_trimmed: added a check for the "correctly typed" path and renamed it to tags_removed (there's no whitespace?) Testcase: empty_input_shows_as_code: changed to also check that tags get trimmed

Adds a comparison variant to [type] which ignores when combining characters of the expected field are missing from the provided input. It still shows these characters in the 'expected' line for reference. It's useful for languages with e.g. diacritics that are required for reference (such as in dictionaries), but rarely actually learned or used in everyday writing. Among these languages: Arabic, Hebrew, Persian, Urdu. The bool 'combining' controls it as new final parameter of both relevant compare_answer functions. On the Python side, it's set to true by default. Use on the note templates: [type:nc:field] (only the front needs to include :nc) This also removes the need to have both variants of words/sentences present as separate fields, to show them redundantly, etc.

Requires adjusting two testcases, but both render exactly the same in Anki itself. On NFC vs. NKFD: https://stackoverflow.com/a/77432079

…parameter) I'd prefer to keep this extra method.

Should get rid of most relocations, at the expense of over-allocating. On Vec's (String's) behavior: https://stackoverflow.com/a/72787776

typeCorrect is not marked as private either, but we can at least do the right thing for newly-added code.

This reverts commit 9fbacbf.

… a bool parameter)" This reverts commit df2dd33.

dae · 2024-09-30T13:08:07Z

Thank you for the atomic commits and multiple PRs, that has made my life a lot easier. :-) Regarding the NFKD change, I'm a bit worried this might have unintended side effects on other inputs, but hopefully if that's the case it'll become apparent during beta testing.

The micro optimisation was probably unnecessary here, but good practice for hotter code paths. I've only reverted it because it caused a conflict for the commit above it.

dae · 2024-10-08T10:15:52Z

Am I correct in assuming that https://forums.ankiweb.net/t/anki-24-10-beta/49989/143 is caused by the NFKD simplification? If that change has caused us to output a mix of NFC (note text) and NFD, that seems suboptimal. Thoughts?

twwn · 2024-10-08T14:39:34Z

I would have said: Although this deck's issue could be from unsupported Javascript, we can easily revert to NFC for the default comparison.

But that deck has another issue: It doesn't at all use type::Reading for the typing field, but type:sort_id. So it's comparing not to "ようがん" here but "7330", and then seems to "reimplement" the comparison in Javascript (minus tokinization). That's doubly unsupported. If the deck author used Anki's {{type::Reading}}, it would have just kept working fine.

Seems to me to be a serious issue of spaghetti code that banked on previous broken use not blowing up. Even if this unsupported special case is accounted for again, it's possible to just break again in other unrelated ways in the future.

Also, I don't think we're using a mix of NFC & NFKD to begin with? typeanswer.rs uses only form throughout & before the comparison starts. Before the simplification it was just a split of NFC for the default and NFKD for the new one.

edit: Rewritten.

twwn · 2024-10-08T19:33:49Z

Alright, I dunno why returning to NFC didn't work out the first time when I wrote the prior comment's versions. I've now recompiled a fresh checkout and it actually does seem to fix it.

So I'd for now propose to use NFC instead throughout:

Doos Anki run NFC when field content is changed/entered? Is that's what you meant by "output a mix"? In that case, it should indeed be NFC.
I don't actually recall if there was an algorithmic reason I picked NFKD way back when I originally wrote the noncombining comparison. Generally it should be a must (only NFKD breaks a string down—Decomposed vs Composed, after all), but from testing a few words right now, NFC seems to work fine as well.

So I'll revert (into the open PR) and just keep testing. Best case: It just works for both comparisons. Worst case: NFKD returns only for the NC comparison.

Unrelated: I asked the protobuf maintainers about an Anki startup crash. They say the bug is within Anki.

twwn · 2024-10-11T10:11:26Z

(PR #3482)

dae · 2024-10-11T10:39:37Z

I agree we don't want to be spending much time looking into issue with third-party JS, but I was concerned in this case that it might be indicative of a larger problem. Anki does normalize fields into NFC by default, so outputting the typing comparison in NFC is a good idea.

I'm not sure what the crash is about. Did you try Anki's binary build? You could try figuring out which exact Python protobuf call is triggering the crash, and then see if feeding the same data into the call manually (outside of Anki) recreates it.

twwn · 2024-10-13T15:49:03Z

I haven't observed the crash with the binary builds (rc2/rc3), they may yet be unaffected. But even on my distro build, the crash occurs frequently but not consistently, as if caused by some race condition.

I wish the error message gave an actual pointer as to where to look, but it's left me clueless.

Anyway, Arch and Anki seem to be using the same protobuf version? If I understand correctly, the only difference would be that the former builds from the general tarball, and the latter only pulls the python bindings from PyPi.

dae · 2024-10-15T16:47:30Z

IIRC, Arch's anki-bin uses our Python wheels, and anki builds them on the local machine. Do both exhibit the same problem?

twwn · 2024-10-15T19:26:21Z

Yes. I just tried with anki (my PKGBUILD is identical except I'm using the git repo @ c1a2b03 + -ffat-lto-objects), then replaced it with anki-bin (altered to install 24.10b3), and repeated that switch once more. Exact same crash.

The former invokes python directly, the latter python3, that's the only meaningless difference in the output.

edit: My only guess would be that it's related to the startup sync. Since when I kill my connection, Anki starts fine, only complaining…

sync status check failed: A network error occurred.

Error details: ⁨error sending request for url ()⁩
update check failed: A network error occurred.

Error details: ⁨error sending request for url ()⁩

… and when I then reestablish the connection, Anki (no matter if anki's or anki-bin's, works even interchangeably) for a certain period also will start fine (after exiting it entirely). Am I right in thinking that Anki skips the startup sync if it detects that its last exit was within #min?

twwn force-pushed the typeanswer-ignore-combining branch 2 times, most recently from 0cd34f0 to 1922d25 Compare September 16, 2024 16:00

twwn mentioned this pull request Sep 20, 2024

typeanswer: cleanups #3415

Merged

twwn force-pushed the typeanswer-ignore-combining branch 2 times, most recently from 331f027 to 817e719 Compare September 22, 2024 14:03

dae added this to the 24.10 milestone Sep 27, 2024

twwn force-pushed the typeanswer-ignore-combining branch 2 times, most recently from 32132e7 to 14df674 Compare September 27, 2024 15:36

twwn force-pushed the typeanswer-ignore-combining branch from 14df674 to 0734dff Compare September 29, 2024 23:54

twwn added 4 commits September 30, 2024 02:01

typeanswer: simplify by using nfkd throughout

2106c43

Requires adjusting two testcases, but both render exactly the same in Anki itself. On NFC vs. NKFD: https://stackoverflow.com/a/77432079

typeanswer: 'simplify' by removing normalize_typed (requiring a bool …

df2dd33

…parameter) I'd prefer to keep this extra method.

typeanswer: micro-optimize vectors

9fbacbf

Should get rid of most relocations, at the expense of over-allocating. On Vec's (String's) behavior: https://stackoverflow.com/a/72787776

twwn force-pushed the typeanswer-ignore-combining branch from 0734dff to 9fbacbf Compare September 30, 2024 00:05

dae added 3 commits September 30, 2024 22:50

Mark combining as private

cb4f25c

typeCorrect is not marked as private either, but we can at least do the right thing for newly-added code.

Revert "typeanswer: micro-optimize vectors"

4d9c5c6

This reverts commit 9fbacbf.

Revert "typeanswer: 'simplify' by removing normalize_typed (requiring…

f3e8f88

… a bool parameter)" This reverts commit df2dd33.

dae merged commit e2124cd into ankitects:main Sep 30, 2024
1 check passed

twwn deleted the typeanswer-ignore-combining branch October 1, 2024 19:35

voczi mentioned this pull request Oct 8, 2024

Anki 24.10rc2 support ankidroid/Anki-Android#17177

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

typeanswer: [type:nc] – ignores combining characters #3422

typeanswer: [type:nc] – ignores combining characters #3422

twwn commented Sep 16, 2024

dae commented Sep 20, 2024

twwn commented Sep 22, 2024 •

edited

Loading

dae commented Sep 25, 2024

twwn commented Sep 27, 2024 •

edited

Loading

dae commented Sep 28, 2024

twwn commented Sep 28, 2024

dae commented Sep 28, 2024

twwn commented Sep 28, 2024

dae commented Sep 28, 2024

dae commented Sep 30, 2024 •

edited

Loading

dae commented Oct 8, 2024

twwn commented Oct 8, 2024 •

edited

Loading

twwn commented Oct 8, 2024 •

edited

Loading

twwn commented Oct 11, 2024

dae commented Oct 11, 2024

twwn commented Oct 13, 2024

dae commented Oct 15, 2024

twwn commented Oct 15, 2024 •

edited

Loading

typeanswer: [type:nc] – ignores combining characters #3422

typeanswer: [type:nc] – ignores combining characters #3422

Conversation

twwn commented Sep 16, 2024

dae commented Sep 20, 2024

twwn commented Sep 22, 2024 • edited Loading

dae commented Sep 25, 2024

twwn commented Sep 27, 2024 • edited Loading

dae commented Sep 28, 2024

twwn commented Sep 28, 2024

dae commented Sep 28, 2024

twwn commented Sep 28, 2024

dae commented Sep 28, 2024

dae commented Sep 30, 2024 • edited Loading

dae commented Oct 8, 2024

twwn commented Oct 8, 2024 • edited Loading

twwn commented Oct 8, 2024 • edited Loading

twwn commented Oct 11, 2024

dae commented Oct 11, 2024

twwn commented Oct 13, 2024

dae commented Oct 15, 2024

twwn commented Oct 15, 2024 • edited Loading

twwn commented Sep 22, 2024 •

edited

Loading

twwn commented Sep 27, 2024 •

edited

Loading

dae commented Sep 30, 2024 •

edited

Loading

twwn commented Oct 8, 2024 •

edited

Loading

twwn commented Oct 8, 2024 •

edited

Loading

twwn commented Oct 15, 2024 •

edited

Loading