Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

typeanswer: [type:nc] – ignores combining characters #3422

Merged
merged 8 commits into from
Sep 30, 2024

Conversation

twwn
Copy link
Contributor

@twwn twwn commented Sep 16, 2024

Building on my previous PR's cleanup, adds a comparison variant to [type] which ignores when combining characters of the expected field are missing from the typed input. It still shows these characters in the "expected" line for reference.

It's useful for languages with e.g. diacritics that are required for reference (such as in dictionaries), but rarely actually learned or used in everyday writing. Among these languages: Arabic, Hebrew, Persian, Urdu.

Previously:
pre

Now:
after
[This also removes the need to have both variants of words/sentences present as separate fields, to show them redundantly, etc.]

The bool combining controls it as new final parameter of both relevant compare_answer functions. In Python, it's set to true by default.

Use on the note templates: [type:nc:field]

@twwn twwn force-pushed the typeanswer-ignore-combining branch 2 times, most recently from 0cd34f0 to 1922d25 Compare September 16, 2024 16:00
@dae
Copy link
Member

dae commented Sep 20, 2024

Thanks twwn, I'll try to review this soon. (For my own reference, previous PR: #680)

@twwn twwn mentioned this pull request Sep 20, 2024
@twwn twwn force-pushed the typeanswer-ignore-combining branch 2 times, most recently from 331f027 to 817e719 Compare September 22, 2024 14:03
@twwn
Copy link
Contributor Author

twwn commented Sep 22, 2024

The minor commits are suggestions, each with a link to relevant Stack Overflow threads in their messages.

  • "simplify by using nfkd throughout" does so by removing the use of NFC in the original case as well. Though that required adjusting two testcases, this only reflects internal processing: I compared both before/after and the rendered end results are the same.
  • I'm partial to leaving in normalize_provided as member function as it means normalize (as normalize_expected) won't have to carry around the bool.

I'll close this PR & reopen a new one after review, the incremental rebases are rather tedious.

@dae
Copy link
Member

dae commented Sep 25, 2024

If you could bring the branch up to date with main (either here or in a new PR if that's easier), that'll make it easier for me to review.

@dae dae added this to the 24.10 milestone Sep 27, 2024
@twwn twwn force-pushed the typeanswer-ignore-combining branch 2 times, most recently from 32132e7 to 14df674 Compare September 27, 2024 15:36
@twwn
Copy link
Contributor Author

twwn commented Sep 27, 2024

Done.

The small commits are just suggestions. I've split the NFKD commit further: The second one is only there if you want the extra function gone. But since that requires adding a bool parameter, I myself would rather not actually merge it.

@dae
Copy link
Member

dae commented Sep 28, 2024

Thanks, I'll get back to this again soon. In case you didn't see it, there's been a report about issues with the recent changes: https://forums.ankiweb.net/t/anki-24-10-beta/49989/22

@twwn
Copy link
Contributor Author

twwn commented Sep 28, 2024

That report, it's since prepare_expected (which strips the html tags) is currently no longer run for the two "new" cases: when nothing is typed, or the typed=expected one.

I can…

  • ensure that HTML tags actually get evaluated rather than just dumped as text (that's definitively my omission)
  • add prepare_expected back in

… each option for both cases, or mix them. The original behavior is restored with prepare_expected for both cases.

It's a moot argument if changes are not to break anything, but IMHO it's really bad card design to clutter the field that you type with additional tags or even media.

That's why I feel like the most natural way (if that word makes sense here) is to add prepare_expected back in for the typed=expected case but leave in & instead evaluate the HTML for the nothing-typed one. As in: type:: during review becomes an offer—if the user skips it, Anki would now treat the field as if the template included it directly.

@dae
Copy link
Member

dae commented Sep 28, 2024

instead evaluate the HTML for the nothing-typed one.

So HTML tags like bold would appear when nothing is typed, and not appear otherwise? I fear that inconsistency might lead to some additional "bug" reports.

it's really bad card design to clutter the field that you type with additional tags or even media.

If you're saying that you think users should put those things in a different field that's not being typed in, I fully agree, though sadly it seems to be quite common to put things like sound tags in the same field.

@twwn
Copy link
Contributor Author

twwn commented Sep 28, 2024

So HTML tags like bold would appear when nothing is typed, and not appear otherwise?

Yes, it would break with the old behavior. But it's just a suggestion, I entirely understand if it's not deemed good/worthwhile.

I guess I'll just restore prepare_expected for both cases for now, it's easier to do, anyway. It'll take a while since I really need to sleep 😃.

@dae
Copy link
Member

dae commented Sep 28, 2024

Sounds good, and no worries; it'll likely be a few days before I can build another beta anyway.

Fix: Add prepare_expected back in for the 'nothing typed' & 'correctly typed' cases. This also makes expected_original redundant again.

Style: %s/provided/typed/g

Style: rename one ch → c

Testcase: whitespace_is_trimmed: added a check for the "correctly typed" path and renamed it to tags_removed (there's no whitespace?)

Testcase: empty_input_shows_as_code: changed to also check that tags get trimmed
@twwn twwn force-pushed the typeanswer-ignore-combining branch from 14df674 to 0734dff Compare September 29, 2024 23:54
Adds a comparison variant to [type] which ignores when combining characters of the expected field are missing from the provided input. It still shows these characters in the 'expected' line for reference.

It's useful for languages with e.g. diacritics that are required for reference (such as in dictionaries), but rarely actually learned or used in everyday writing. Among these languages: Arabic, Hebrew, Persian, Urdu.

The bool 'combining' controls it as new final parameter of both relevant compare_answer functions. On the Python side, it's set to true by default.

Use on the note templates: [type:nc:field] (only the front needs to include :nc)

This also removes the need to have both variants of words/sentences present as separate fields, to show them redundantly, etc.
Requires adjusting two testcases, but both render exactly the same in Anki itself.

On NFC vs. NKFD: https://stackoverflow.com/a/77432079
…parameter)

I'd prefer to keep this extra method.
Should get rid of most relocations, at the expense of over-allocating.

On Vec's (String's) behavior: https://stackoverflow.com/a/72787776
@twwn twwn force-pushed the typeanswer-ignore-combining branch from 0734dff to 9fbacbf Compare September 30, 2024 00:05
typeCorrect is not marked as private either, but we can at least do
the right thing for newly-added code.
@dae
Copy link
Member

dae commented Sep 30, 2024

Thank you for the atomic commits and multiple PRs, that has made my life a lot easier. :-) Regarding the NFKD change, I'm a bit worried this might have unintended side effects on other inputs, but hopefully if that's the case it'll become apparent during beta testing.

The micro optimisation was probably unnecessary here, but good practice for hotter code paths. I've only reverted it because it caused a conflict for the commit above it.

@dae dae merged commit e2124cd into ankitects:main Sep 30, 2024
1 check passed
@twwn twwn deleted the typeanswer-ignore-combining branch October 1, 2024 19:35
@dae
Copy link
Member

dae commented Oct 8, 2024

Am I correct in assuming that https://forums.ankiweb.net/t/anki-24-10-beta/49989/143 is caused by the NFKD simplification? If that change has caused us to output a mix of NFC (note text) and NFD, that seems suboptimal. Thoughts?

@twwn
Copy link
Contributor Author

twwn commented Oct 8, 2024

I would have said: Although this deck's issue could be from unsupported Javascript, we can easily revert to NFC for the default comparison.

But that deck has another issue: It doesn't at all use type::Reading for the typing field, but type:sort_id. So it's comparing not to "ようがん" here but "7330", and then seems to "reimplement" the comparison in Javascript (minus tokinization). That's doubly unsupported. If the deck author used Anki's {{type::Reading}}, it would have just kept working fine.

Seems to me to be a serious issue of spaghetti code that banked on previous broken use not blowing up. Even if this unsupported special case is accounted for again, it's possible to just break again in other unrelated ways in the future.

Also, I don't think we're using a mix of NFC & NFKD to begin with? typeanswer.rs uses only form throughout & before the comparison starts. Before the simplification it was just a split of NFC for the default and NFKD for the new one.

edit: Rewritten.

@twwn
Copy link
Contributor Author

twwn commented Oct 8, 2024

Alright, I dunno why returning to NFC didn't work out the first time when I wrote the prior comment's versions. I've now recompiled a fresh checkout and it actually does seem to fix it.

So I'd for now propose to use NFC instead throughout:

  • Doos Anki run NFC when field content is changed/entered? Is that's what you meant by "output a mix"? In that case, it should indeed be NFC.
  • I don't actually recall if there was an algorithmic reason I picked NFKD way back when I originally wrote the noncombining comparison. Generally it should be a must (only NFKD breaks a string down—Decomposed vs Composed, after all), but from testing a few words right now, NFC seems to work fine as well.

So I'll revert (into the open PR) and just keep testing. Best case: It just works for both comparisons. Worst case: NFKD returns only for the NC comparison.

Unrelated: I asked the protobuf maintainers about an Anki startup crash. They say the bug is within Anki.

@twwn
Copy link
Contributor Author

twwn commented Oct 11, 2024

(PR #3482)

@dae
Copy link
Member

dae commented Oct 11, 2024

I agree we don't want to be spending much time looking into issue with third-party JS, but I was concerned in this case that it might be indicative of a larger problem. Anki does normalize fields into NFC by default, so outputting the typing comparison in NFC is a good idea.

I'm not sure what the crash is about. Did you try Anki's binary build? You could try figuring out which exact Python protobuf call is triggering the crash, and then see if feeding the same data into the call manually (outside of Anki) recreates it.

@twwn
Copy link
Contributor Author

twwn commented Oct 13, 2024

I haven't observed the crash with the binary builds (rc2/rc3), they may yet be unaffected. But even on my distro build, the crash occurs frequently but not consistently, as if caused by some race condition.

I wish the error message gave an actual pointer as to where to look, but it's left me clueless.

Anyway, Arch and Anki seem to be using the same protobuf version? If I understand correctly, the only difference would be that the former builds from the general tarball, and the latter only pulls the python bindings from PyPi.

@dae
Copy link
Member

dae commented Oct 15, 2024

IIRC, Arch's anki-bin uses our Python wheels, and anki builds them on the local machine. Do both exhibit the same problem?

@twwn
Copy link
Contributor Author

twwn commented Oct 15, 2024

Yes. I just tried with anki (my PKGBUILD is identical except I'm using the git repo @ c1a2b03 + -ffat-lto-objects), then replaced it with anki-bin (altered to install 24.10b3), and repeated that switch once more. Exact same crash.

The former invokes python directly, the latter python3, that's the only meaningless difference in the output.

edit: My only guess would be that it's related to the startup sync. Since when I kill my connection, Anki starts fine, only complaining…

sync status check failed: A network error occurred.

Error details: ⁨error sending request for url ()⁩
update check failed: A network error occurred.

Error details: ⁨error sending request for url ()⁩

… and when I then reestablish the connection, Anki (no matter if anki's or anki-bin's, works even interchangeably) for a certain period also will start fine (after exiting it entirely). Am I right in thinking that Anki skips the startup sync if it detects that its last exit was within #min?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants