Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip non-ASCII tests if system does not support UTF-8 #749

Merged
merged 2 commits into from
Nov 8, 2022

Conversation

rossellhayes
Copy link
Contributor

@rossellhayes rossellhayes commented Nov 2, 2022

Previously, tests that involved non-ASCII text were skipped on Windows, with the assumption that all Unix builds of R support UTF-8. Tests now explicitly check if the system supports UTF-8 to avoid a CRAN failure on non-Unicode Unix builds. This fix is confirmed working on rhub's debian-clang-devel platform, which forces a Latin-1 (non-UTF-8) locale.

Closes #748.

@rossellhayes rossellhayes added the type: bug Maintainers have validated that it is a real bug in the project code label Nov 2, 2022
@rossellhayes rossellhayes self-assigned this Nov 2, 2022
@rossellhayes rossellhayes changed the title Skip non-ASCII tests is system does not support UTF-8 Skip non-ASCII tests if system does not support UTF-8 Nov 4, 2022
…h a non-ASCII character string and one test with a non-ASCII variable name

* Non-ASCII character strings should work on all platforms
* Non-ASCII variable names should only work on platforms with UTF-8 support
@rossellhayes
Copy link
Contributor Author

rossellhayes commented Nov 7, 2022

Okay, I've done some investigating on a non-UTF-8 Windows build, and here's what's going on:

The failing test involves the exercise_check_unparseable_unicode() function. That functions is called after a student's code fails to parse, and uses regex to find non-ASCII strings that may be responsible for the parse failure. I've tested it and it works correctly even on platforms that don't support Unicode, so the function doesn't require any edits.

This particular failing test occurred in a case where the code contains non-ASCII character but shouldn't return an error: it assigned a non-ASCII character vector to a non-ASCII (Greek) variable name, which is a valid operation. However, non-ASCII variable names can only be used on platforms with UTF-8 support.

To fix this, I've split the test into two tests. The non-ASCII character vector is tested on all platforms: it shouldn't produce a parse error on any system, since you can use Unicode characters in strings even on non-UTF-8 platforms. The non-ASCII variable name is now only tested on non-Windows platforms with UTF-8 support, since that will cause a parse error on non-UTF-8 systems.

I've tested this locally on Windows R 3.6.3 (no UTF-8 support) and non-UTF-8 Debian on R-hub with no errors.

Copy link
Member

@gadenbuie gadenbuie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed investigation and the fix!

@gadenbuie gadenbuie changed the base branch from main to rc-v0.11.2 November 8, 2022 02:37
@gadenbuie gadenbuie merged commit 92e6b11 into rc-v0.11.2 Nov 8, 2022
@gadenbuie gadenbuie deleted the fix/support-non-utf-unix branch November 8, 2022 02:37
gadenbuie pushed a commit that referenced this pull request Nov 8, 2022
* fix: skip non-ASCII tests is system does not support UTF-8

* test(exercise_check_unparsable_unicode): split test into one test with a non-ASCII character string and one test with a non-ASCII variable name

* Non-ASCII character strings should work on all platforms
* Non-ASCII variable names should only work on platforms with UTF-8 support
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug Maintainers have validated that it is a real bug in the project code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Test failure in strict Latin-1* locale
2 participants