
Analysis of failing character tests (after #85) #90

Closed
Manishearth opened this issue Dec 19, 2022 · 9 comments · Fixed by #92

@Manishearth
Member

Manishearth commented Dec 19, 2022

This is where I'm tracking all of the failures left in the character tests after #85 and #91. This doesn't cover the 314 failures (250 after #87) in the basic tests.

I'm categorizing them by their section in BidiCharacterTest.txt and filling in issue numbers as necessary. Investigations into the ??s would be appreciated!

Explicit directional overrides applied to paired brackets (#89)

8 tests
202A 05D0 0028 05D1 202C 202D 0029;2;1;x 3 3 3 x x 2;3 2 1 6
202A 05D0 0028 05D1 202C 202D 0029 202C;2;1;x 3 3 3 x x 2 x;3 2 1 6
202B 0061 0028 0062 202C 202E 0029;2;0;x 2 2 2 x x 1;6 1 2 3
202B 0061 0028 0062 202C 202E 0029 202C;2;0;x 2 2 2 x x 1 x;6 1 2 3
202D 0028 202C 202A 05D0 0029 05D1;2;1;x 2 x x 3 3 3;1 6 5 4
202D 0028 202C 202A 05D0 0029 05D1 202C;2;1;x 2 x x 3 3 3 x;1 6 5 4
202E 0028 202C 202B 0061 0029 0062;2;0;x 1 x x 2 2 2;4 5 6 1
202E 0028 202C 202B 0061 0029 0062 202C;2;0;x 1 x x 2 2 2 x;4 5 6 1
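For anyone reproducing these: each line uses the five-field format from BidiCharacterTest.txt (code points; paragraph direction, where 0 = LTR, 1 = RTL, 2 = auto; resolved paragraph level; per-character resolved levels, with x marking characters removed by rule X9; visual order as indices, skipping the x characters). Here is a minimal Rust sketch of a parser for that format. `CharacterTest` and `parse_line` are hypothetical names, not the crate's actual test harness.

```rust
/// One parsed line of BidiCharacterTest.txt (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct CharacterTest {
    /// Field 0: the input code points.
    pub text: Vec<char>,
    /// Field 1: paragraph direction (0 = LTR, 1 = RTL, 2 = auto).
    pub para_direction: u8,
    /// Field 2: the resolved paragraph embedding level.
    pub para_level: u8,
    /// Field 3: resolved levels; `None` stands for an `x` (removed by X9).
    pub levels: Vec<Option<u8>>,
    /// Field 4: visual order as indices into `text`, X9-removed chars omitted.
    pub visual_order: Vec<usize>,
}

/// Parses one test line; returns `None` if any field is malformed.
pub fn parse_line(line: &str) -> Option<CharacterTest> {
    let fields: Vec<&str> = line.trim().split(';').collect();
    if fields.len() != 5 {
        return None;
    }
    // Field 0: space-separated hex code points.
    let text = fields[0]
        .split_whitespace()
        .map(|cp| u32::from_str_radix(cp, 16).ok().and_then(char::from_u32))
        .collect::<Option<Vec<char>>>()?;
    // Field 3: one level per code point, "x" for characters removed by X9.
    let levels = fields[3]
        .split_whitespace()
        .map(|l| if l == "x" { Some(None) } else { l.parse::<u8>().ok().map(Some) })
        .collect::<Option<Vec<Option<u8>>>>()?;
    // Field 4: left-to-right visual order, as indices into field 0.
    let visual_order = fields[4]
        .split_whitespace()
        .map(|i| i.parse::<usize>().ok())
        .collect::<Option<Vec<usize>>>()?;
    if levels.len() != text.len() {
        return None;
    }
    Some(CharacterTest {
        text,
        para_direction: fields[1].trim().parse::<u8>().ok()?,
        para_level: fields[2].trim().parse::<u8>().ok()?,
        levels,
        visual_order,
    })
}
```

Parsing the first line above gives levels `[None, Some(3), Some(3), Some(3), None, None, Some(2)]` and a visual order of `[3, 2, 1, 6]`.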

Nonspacing marks applied to paired brackets. These cases exercise the ignoring of bc=BN characters (#89, probably)

4 tests
0041 200F 005B 05D0 005D 200D 20D6;0;0;0 1 1 1 1 x 1;0 6 4 3 2 1
0041 200F 005B 200D 20D6 05D0 005D;0;0;0 1 1 x 1 1 1;0 6 5 4 2 1
0041 200F 005B 200D 20D6 05D0 005D 200D 20D6;0;0;0 1 1 x 1 1 1 x 1;0 8 6 5 4 2 1
0041 200F 005B 200D 200B 20D6 05D0 005D 200B 200D 20D6;0;0;0 1 1 x x 1 1 1 x x 1;0 10 7 6 5 2 1
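The NSM cases come down to the note under N0 in UAX #9. NSMs that originally followed a paired bracket whose type changed under N0 take the bracket's new type, and bc=BN characters (U+200D ZWJ and U+200B ZWSP here) should be skipped rather than ending that run. Below is a hedged sketch that assumes N0 has already resolved the bracket to L or R. `Class` is a local stand-in enum, and `propagate_to_following_nsms` is a hypothetical helper.

```rust
/// Local stand-in for the few Bidi_Class values this sketch needs.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Class {
    L,
    R,
    ON,
    BN,
    NSM,
}

/// Hypothetical helper: once N0 has resolved the paired bracket at `bracket`
/// to L or R, give that type to the characters right after it whose
/// *original* class was NSM. BN characters are skipped instead of ending the run.
pub fn propagate_to_following_nsms(original: &[Class], resolved: &mut [Class], bracket: usize) {
    let strong = resolved[bracket];
    debug_assert!(matches!(strong, Class::L | Class::R));
    for i in bracket + 1..original.len() {
        match original[i] {
            Class::BN => continue,              // ignored, not a run terminator
            Class::NSM => resolved[i] = strong, // originally an NSM: follow the bracket
            _ => break,                         // anything else ends the run
        }
    }
}
```

With the first line's classes (L R ON R ON BN NSM) and both brackets resolved to R, the trailing U+20D6 ends up R. That matches its expected level of 1 in that LTR paragraph.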

Sequences containing directional formatting characters (#89)

6 tests
0061 202D 202C 0020 0031 0020 0032 002D 0033;1;1;2 x x 2 2 2 2 2 2;0 3 4 5 6 7 8
0061 202E 202C 0020 0031 0020 0032 002D 0033;1;1;2 x x 2 2 2 2 2 2;0 3 4 5 6 7 8
0627 202A 202C 0020 0031 002D 0032;0;0;1 x x 1 2 1 2;6 5 4 3 0
0627 202B 202C 0020 0031 002D 0032;0;0;1 x x 1 2 1 2;6 5 4 3 0
05D0 202A 202A 202C 202C 0020 0031 0020 0032;0;0;1 x x x x 1 2 1 2;8 7 6 5 0
0061 202B 202B 202C 202C 0020 0031 0020 0032;1;1;2 x x x x 2 2 2 2;0 5 6 7 8
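In these sequences, the embedding/override initiators and their PDFs enclose nothing. Rule X9 removes them (along with BN characters), which is why their levels are x. The weak and neutral rules should then behave as if the removed characters were never there. For example, in `0627 202A 202C 0020 0031 002D 0032`, the EN digits still see the AL and become AN. Here is a minimal sketch of the X9 filter, using a local `Class` enum and hypothetical helper names.

```rust
/// Local stand-in for the Bidi_Class values used here.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Class {
    L,
    R,
    AL,
    EN,
    ES,
    WS,
    LRE,
    RLE,
    LRO,
    RLO,
    PDF,
    BN,
}

/// Rule X9: embedding/override initiators, PDF and BN are removed.
/// Isolate initiators and PDI are *not* removed.
pub fn removed_by_x9(class: Class) -> bool {
    matches!(
        class,
        Class::LRE | Class::RLE | Class::LRO | Class::RLO | Class::PDF | Class::BN
    )
}

/// Text positions that survive X9. The later rules run over this list, so an
/// empty LRE..PDF pair leaves its neighbors adjacent.
pub fn retained_indices(classes: &[Class]) -> Vec<usize> {
    (0..classes.len()).filter(|&i| !removed_by_x9(classes[i])).collect()
}
```

For that line, the retained indices are 0, 3, 4, 5 and 6, which is exactly the set that appears in the expected order `6 5 4 3 0`.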

Combinations of paired brackets, numbers, and directional formatting characters (probably involves some of #89)

11 tests
2066 0029 0029 0661 0028 0627 0029;1;1;1 2 2 4 3 3 3;1 2 6 5 4 3 0
2066 0029 0029 0661 0028 0662 0029;1;1;1 2 2 4 3 4 3;1 2 6 5 4 3 0
2066 0029 2066 0661 0028 05D0 0029;1;1;1 2 2 6 5 5 5;1 2 6 5 4 3 0
0061 0028 0062 005B 0063 2068 05D0 2069 0064 005D 0065 0029 0066;1;1;2 2 2 2 2 2 3 2 2 2 2 2 2;0 1 2 3 4 5 6 7 8 9 10 11 12
05D0 0028 05D1 005B 05D2 2068 0061 2069 05D3 005D 05D4 0029 05D5;0;0;1 1 1 1 1 1 2 1 1 1 1 1 1;12 11 10 9 8 7 6 5 4 3 2 1 0
0061 0028 0062 2067 05D0 005B 05D1 2066 0063 05D3 2069 0065 005D 0066 2069 05D4 0029 05D5;0;0;0 0 0 0 1 1 1 1 2 3 1 2 1 2 0 1 0 1;0 1 2 3 13 12 11 10 8 9 7 6 5 4 14 15 16 17
0061 0028 0062 2067 05D0 005B 05D1 2066 0063 007B 0064 202B 007D 0020 007B 202C 05D2 007D 05D3 2069 0065 005D 0066 2069 05D4 0029 05D5;0;0;0 0 0 0 1 1 1 1 2 2 2 x 3 3 3 x 3 3 3 1 2 1 2 0 1 0 1;0 1 2 3 22 21 20 19 8 9 10 18 17 16 14 13 12 7 6 5 4 23 24 25 26
0061 0028 0062 2067 05D0 005B 05D1 2066 0063 007B 0064 202B 007D 0020 007B 202C 05D2 007D 05D3 2069 0065 005D 0066 2069 05D4 0029 05D5;1;1;2 1 2 1 3 3 3 3 4 4 4 x 5 5 5 x 5 5 5 3 4 3 4 1 1 1 1;26 25 24 23 22 21 20 19 8 9 10 18 17 16 14 13 12 7 6 5 4 3 2 1 0
05D0 0028 05D1 2066 0061 005B 0062 2067 05D2 0064 2069 05D4 005D 05D5 2069 0065 0029 0066;1;1;1 1 1 1 2 2 2 2 3 4 2 3 2 3 1 2 1 2;17 16 15 14 4 5 6 7 9 8 10 11 12 13 3 2 1 0
05D0 0028 05D1 2066 0061 005B 0062 2067 05D2 007B 05D3 202A 007D 0020 007B 202C 0063 007D 0064 2069 05D4 005D 05D5 2069 0065 0029 0066;0;0;1 0 1 0 2 2 2 2 3 3 3 x 4 4 4 x 4 4 4 2 3 2 3 0 0 0 0;0 1 2 3 4 5 6 7 12 13 14 16 17 18 10 9 8 19 20 21 22 23 24 25 26
05D0 0028 05D1 2066 0061 005B 0062 2067 05D2 007B 05D3 202A 007D 0020 007B 202C 0063 007D 0064 2069 05D4 005D 05D5 2069 0065 0029 0066;1;1;1 1 1 1 2 2 2 2 3 3 3 x 4 4 4 x 4 4 4 2 3 2 3 1 2 1 2;26 25 24 23 4 5 6 7 12 13 14 16 17 18 10 9 8 19 20 21 22 3 2 1 0
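Most of these lines mix bracket pairs with isolates, so pairing (BD16) has to run over each isolating run sequence rather than over the raw text. Here is a sketch of BD16 in Rust, under a few assumptions. Only `()`, `[]` and `{}` are recognized; a real implementation reads Bidi_Paired_Bracket and Bidi_Paired_Bracket_Type from BidiBrackets.txt and treats canonical equivalents such as U+2329/U+3008 as matching. The sketch also omits the requirement that a bracket's current class be ON, which is what the override tests at the top exercise. On stack overflow, it stops scanning and keeps the pairs found so far. `bracket_pairs` is a hypothetical name.

```rust
/// BD16 limits the bracket stack to 63 entries.
const MAX_DEPTH: usize = 63;

/// Hypothetical stand-in for the Bidi_Paired_Bracket lookup: returns the
/// closing partner of an opening bracket. Only ASCII pairs are covered here.
fn closing_partner(c: char) -> Option<char> {
    match c {
        '(' => Some(')'),
        '[' => Some(']'),
        '{' => Some('}'),
        _ => None,
    }
}

fn is_closing_bracket(c: char) -> bool {
    matches!(c, ')' | ']' | '}')
}

/// Finds bracket pairs in one isolating run sequence, given as
/// (text position, char) in sequence order, so positions may jump over gaps.
/// Returns (opening position, closing position), sorted by opening position.
pub fn bracket_pairs(sequence: &[(usize, char)]) -> Vec<(usize, usize)> {
    // Each entry: (closing bracket we are waiting for, position of the opener).
    let mut stack: Vec<(char, usize)> = Vec::new();
    let mut pairs = Vec::new();
    for &(pos, c) in sequence {
        if let Some(partner) = closing_partner(c) {
            if stack.len() == MAX_DEPTH {
                break; // overflow: stop looking for pairs in this sequence
            }
            stack.push((partner, pos));
        } else if is_closing_bracket(c) {
            // Search from the top of the stack down; on a match, record the pair
            // and discard that opener plus every opener above it.
            if let Some(depth) = stack.iter().rposition(|&(partner, _)| partner == c) {
                pairs.push((stack[depth].1, pos));
                stack.truncate(depth);
            }
            // No match: this closing bracket is simply left unpaired.
        }
    }
    pairs.sort_unstable();
    pairs
}
```

For the fourth line in the list (`a(b[c` FSI `א` PDI `d]e)f`), the outer sequence skips position 6, and the sketch returns the pairs (1, 11) and (3, 9).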
@Manishearth
Member Author

Also, for anyone debugging these, https://util.unicode.org/UnicodeJsps/bidic.jsp?s=%D7%90%281%29&b=0&u=140&d=2 is amazing.

@Manishearth
Member Author

Manishearth commented Dec 20, 2022

Between #85 and #91, I think I've knocked out all of the failures that aren't due to #89 (or that would be hard to debug without #89). I might wait for #85 to merge before doing #89, since #89 touches everything.

@Manishearth
Member Author

Manishearth commented Dec 20, 2022

After #92, so far we have these failures:

Explicit directional overrides applied to paired brackets

8 tests
202A 05D0 0028 05D1 202C 202D 0029;2;1;x 3 3 3 x x 2;3 2 1 6
202A 05D0 0028 05D1 202C 202D 0029 202C;2;1;x 3 3 3 x x 2 x;3 2 1 6
202B 0061 0028 0062 202C 202E 0029;2;0;x 2 2 2 x x 1;6 1 2 3
202B 0061 0028 0062 202C 202E 0029 202C;2;0;x 2 2 2 x x 1 x;6 1 2 3
202D 0028 202C 202A 05D0 0029 05D1;2;1;x 2 x x 3 3 3;1 6 5 4
202D 0028 202C 202A 05D0 0029 05D1 202C;2;1;x 2 x x 3 3 3 x;1 6 5 4
202E 0028 202C 202B 0061 0029 0062;2;0;x 1 x x 2 2 2;4 5 6 1
202E 0028 202C 202B 0061 0029 0062 202C;2;0;x 1 x x 2 2 2 x;4 5 6 1

Combinations of paired brackets, numbers, and directional formatting characters

11 tests
2066 0029 0029 0661 0028 0627 0029;1;1;1 2 2 4 3 3 3;1 2 6 5 4 3 0
2066 0029 0029 0661 0028 0662 0029;1;1;1 2 2 4 3 4 3;1 2 6 5 4 3 0
2066 0029 2066 0661 0028 05D0 0029;1;1;1 2 2 6 5 5 5;1 2 6 5 4 3 0
0061 0028 0062 005B 0063 2068 05D0 2069 0064 005D 0065 0029 0066;1;1;2 2 2 2 2 2 3 2 2 2 2 2 2;0 1 2 3 4 5 6 7 8 9 10 11 12
05D0 0028 05D1 005B 05D2 2068 0061 2069 05D3 005D 05D4 0029 05D5;0;0;1 1 1 1 1 1 2 1 1 1 1 1 1;12 11 10 9 8 7 6 5 4 3 2 1 0
0061 0028 0062 2067 05D0 005B 05D1 2066 0063 05D3 2069 0065 005D 0066 2069 05D4 0029 05D5;0;0;0 0 0 0 1 1 1 1 2 3 1 2 1 2 0 1 0 1;0 1 2 3 13 12 11 10 8 9 7 6 5 4 14 15 16 17
0061 0028 0062 2067 05D0 005B 05D1 2066 0063 007B 0064 202B 007D 0020 007B 202C 05D2 007D 05D3 2069 0065 005D 0066 2069 05D4 0029 05D5;0;0;0 0 0 0 1 1 1 1 2 2 2 x 3 3 3 x 3 3 3 1 2 1 2 0 1 0 1;0 1 2 3 22 21 20 19 8 9 10 18 17 16 14 13 12 7 6 5 4 23 24 25 26
0061 0028 0062 2067 05D0 005B 05D1 2066 0063 007B 0064 202B 007D 0020 007B 202C 05D2 007D 05D3 2069 0065 005D 0066 2069 05D4 0029 05D5;1;1;2 1 2 1 3 3 3 3 4 4 4 x 5 5 5 x 5 5 5 3 4 3 4 1 1 1 1;26 25 24 23 22 21 20 19 8 9 10 18 17 16 14 13 12 7 6 5 4 3 2 1 0
05D0 0028 05D1 2066 0061 005B 0062 2067 05D2 0064 2069 05D4 005D 05D5 2069 0065 0029 0066;1;1;1 1 1 1 2 2 2 2 3 4 2 3 2 3 1 2 1 2;17 16 15 14 4 5 6 7 9 8 10 11 12 13 3 2 1 0
05D0 0028 05D1 2066 0061 005B 0062 2067 05D2 007B 05D3 202A 007D 0020 007B 202C 0063 007D 0064 2069 05D4 005D 05D5 2069 0065 0029 0066;0;0;1 0 1 0 2 2 2 2 3 3 3 x 4 4 4 x 4 4 4 2 3 2 3 0 0 0 0;0 1 2 3 4 5 6 7 12 13 14 16 17 18 10 9 8 19 20 21 22 23 24 25 26
05D0 0028 05D1 2066 0061 005B 0062 2067 05D2 007B 05D3 202A 007D 0020 007B 202C 0063 007D 0064 2069 05D4 005D 05D5 2069 0065 0029 0066;1;1;1 1 1 1 2 2 2 2 3 3 3 x 4 4 4 x 4 4 4 2 3 2 3 1 2 1 2;26 25 24 23 4 5 6 7 12 13 14 16 17 18 10 9 8 19 20 21 22 3 2 1 0

@Manishearth
Member Author

Unfortunately #92 causes a massive pile of failures in the basic tests.

@Manishearth
Member Author

Manishearth commented Dec 21, 2022

Down to two failures in #92! I also fixed the basic test failures it was causing. There are still ~100 failing basic tests, though.

@Manishearth
Member Author

Ah, the problem is that isolating run sequences can have gaps in them. I'm going to need to rearchitect some of the N0 work...
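Some context on the gaps: by BD13, a level run that ends in an isolate initiator with a matching PDI continues with the level run that starts at that PDI. This means one isolating run sequence can skip over the isolate's content. Below is a rough sketch of that construction. It assumes X9-removed characters have already been filtered out and that `matching_pdi` was computed during the explicit-level pass. Both the function and that parameter are hypothetical.

```rust
use std::ops::Range;

/// Local stand-in for the classes this sketch inspects.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Class {
    L,
    R,
    LRI,
    RLI,
    FSI,
    PDI,
}

/// Maximal runs of characters sharing one embedding level (BD7).
fn level_runs(levels: &[u8]) -> Vec<Range<usize>> {
    let mut runs = Vec::new();
    let mut start = 0;
    for i in 1..=levels.len() {
        if i == levels.len() || levels[i] != levels[start] {
            runs.push(start..i);
            start = i;
        }
    }
    runs
}

/// BD13 sketch. `matching_pdi[i]` is the index of the PDI matching the isolate
/// initiator at `i` (None if unmatched or not an initiator).
pub fn isolating_run_sequences(
    levels: &[u8],
    classes: &[Class],
    matching_pdi: &[Option<usize>],
) -> Vec<Vec<Range<usize>>> {
    let runs = level_runs(levels);
    // A run that starts with a matched PDI continues its initiator's sequence.
    let mut is_matched_pdi = vec![false; levels.len()];
    for pdi in matching_pdi.iter().flatten() {
        is_matched_pdi[*pdi] = true;
    }
    let mut sequences = Vec::new();
    for run in &runs {
        if is_matched_pdi[run.start] {
            continue; // already appended to the sequence of its initiator
        }
        let mut sequence = vec![run.clone()];
        loop {
            // Chain on while the sequence ends in an initiator with a matching PDI.
            let last = sequence.last().unwrap().end - 1;
            let is_initiator = matches!(classes[last], Class::LRI | Class::RLI | Class::FSI);
            let next = match matching_pdi[last] {
                Some(pdi) if is_initiator => runs.iter().find(|r| r.start == pdi),
                _ => None,
            };
            match next {
                Some(r) => sequence.push(r.clone()),
                None => break,
            }
        }
        sequences.push(sequence);
    }
    sequences
}
```

For `a RLI א PDI b` in an LTR paragraph (levels 0 0 1 0 0), this yields the sequences `[0..2, 3..5]` and `[2..3]`. The first one has a gap at position 2.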

@Manishearth
Member Author

The last two failures are https://github.com/unicode-org/properties/issues/70, and it's a whopper.

@Manishearth
Member Author

Ah, it's actually not that much of a whopper: it only affects things I've done on this repo recently 😁. The existing code handled this pretty well.

The basic issue is that the weak and neutral rules must apply only within an isolating run sequence, even when that sequence has gaps. Most of our iterations already handle this, except for a couple of lookahead cases that I got wrong and every case of lookbehind. This is fixable.
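To illustrate the lookbehind problem with a hypothetical sketch: W2 turns EN into AN when the nearest preceding strong type is AL. In a sequence with gaps, that search has to follow the sequence's own positions rather than `i - 1`. Tracking the last strong type while walking the sequence forward is equivalent to W2's backward search, because W2 never changes a strong type. `Class` and `apply_w2` are local stand-ins, not the crate's API.

```rust
/// Local stand-in for the classes this sketch inspects.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Class {
    L,
    R,
    AL,
    EN,
    AN,
    RLI,
    PDI,
}

/// W2 over one isolating run sequence. `indices` lists the sequence's text
/// positions in order, so consecutive entries may jump over an isolate's
/// content. `sos` is the start-of-sequence type (L or R).
pub fn apply_w2(classes: &mut [Class], indices: &[usize], sos: Class) {
    // Last strong type seen *within this sequence*, starting from sos.
    let mut last_strong = sos;
    for &i in indices {
        match classes[i] {
            c @ (Class::L | Class::R | Class::AL) => last_strong = c,
            Class::EN if last_strong == Class::AL => classes[i] = Class::AN,
            _ => {} // neutrals, isolate controls, etc. don't affect W2
        }
    }
}
```

With `AL RLI L PDI EN` and the outer sequence [0, 1, 3, 4], the EN becomes AN. A naive lookbehind from position 4 would hit the L inside the isolate first and leave it as EN.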

@Manishearth
Member Author

...and the character tests pass! Still got a ways to go on the basic tests.
