Add messages for unparsable code caused by non-ASCII characters #642

rossellhayes · 2022-01-20T20:47:41Z

This PR adds new messages for cases where students submit unparsable code with non-ASCII characters. This should help students who have pasted from a source that applies automatic Unicode formatting. This code would otherwise be very difficult to debug, as the Unicode characters may be difficult to distinguish from ASCII counterparts.

In cases where students have copied code with Unicode quotation marks or dashes, the prompt will automatically suggest a replacement with straight quotation marks or the ASCII hyphen-minus. In other cases, the message will simply suggest that students retype the offending character manually.
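As a rough illustration of the kind of automatic suggestion described here, a minimal base R sketch might map a handful of typographic characters to ASCII equivalents. The function name `suggest_ascii_replacement()` and the character list are assumptions made for this example, not the code added in this PR.

```r
# Hypothetical helper (not from the PR): replace common "smart" characters
# with their ASCII counterparts. Escapes are used so the source stays ASCII.
suggest_ascii_replacement <- function(code) {
  from <- c(
    "\u201c", "\u201d",  # left/right double quotation marks
    "\u2018", "\u2019",  # left/right single quotation marks
    "\u2013", "\u2014",  # en dash, em dash
    "\u2212"             # minus sign
  )
  to <- c("\"", "\"", "'", "'", "-", "-", "-")

  # Apply each literal (non-regex) replacement in turn
  for (i in seq_along(from)) {
    code <- gsub(from[i], to[i], code, fixed = TRUE)
  }
  code
}

# Usage: curly quotes become straight quotes
suggest_ascii_replacement("x <- \u201ctest\u201d")
```

Characters outside this small table are left untouched, which matches the idea of falling back to asking students to retype them.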

Note that the message will not display if students write parsable code that includes non-ASCII characters, such as in character strings or (otherwise valid) variable names.
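The rule above (only complain about non-ASCII characters when the code fails to parse) can be sketched as a two-step check. The helper names `is_parsable()`, `has_non_ascii()`, and `should_flag_unicode()` are hypothetical and chosen for this example only.

```r
# Hypothetical sketch (not from the PR): flag non-ASCII characters only
# when the submitted code cannot be parsed.

# TRUE if R can parse the code, FALSE if parse() signals an error
is_parsable <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

# TRUE if any code point in the code lies outside the ASCII range
has_non_ascii <- function(code) {
  code <- enc2utf8(paste(code, collapse = "\n"))
  isTRUE(any(utf8ToInt(code) > 127L))
}

should_flag_unicode <- function(code) {
  !is_parsable(code) && has_non_ascii(code)
}

# Usage: curly quotes break parsing, an accented letter in a string does not
should_flag_unicode("x <- \u201ctest\u201d")
should_flag_unicode("x <- \"caf\u00e9\"")
```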

Examples

Unicode ("curly") quotation marks

Unicode dash

Other special characters

Parsable non-ASCII characters

PR task list:

Update NEWS
Add tests (if possible)
Update documentation with devtools::document()

Closes #639.

gadenbuie

This is a great start! I just have a few suggestions for a final round of refactoring and polish.

gadenbuie · 2022-01-24T15:34:36Z

R/exercise.R

+  )
+}
+
+exercise_highlight_unparsable_unicode <- function(code, pattern) {


How does exercise_highlight_unparsable_unicode() handle matches on multiple lines? I think we should only present one line at a time.

rossellhayes · 2022-01-24

Right now it shows all lines. I considered both, but leaned towards including all lines because:

Exercise input is generally only a few lines maximum.

If we show only one line, we might highlight a false positive, and students won't get to see the later line that actually caused the error, e.g.

    media <- mean(x)
    deviación <- sd(x)
    media ± 1.96 * deviación

would highlight the ó in deviación on line 2, missing the error that was actually caused by the ± on line 3.

It's more helpful to include all lines in the suggested replacement, since we'd expect e.g. all the quotes in the response to be curly instead of straight. It feels more symmetrical to include all lines in both the highlighted code and the suggested replacement.

That was my thinking, but I'm happy to change if you think it would be better to include just one line.

gadenbuie · 2022-01-24

My thinking was that exercises are frequently long enough that we should try not to put the whole exercise in the lint and suggestion. I've seen many examples where an exercise is a dozen or more lines long. If we end up repeating the input code twice, that starts to get out of hand.

I think we should see if we can use the parse error message to isolate the problematic line and produce more targeted feedback.
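One way the parse error message could be used to isolate a line is sketched below. It relies on base R's `parse(text = ...)` reporting errors in the form `<text>:LINE:COLUMN: ...`; the helper name `unparsable_line()` is an assumption for this example and is not the function from the PR.

```r
# Hypothetical helper (not from the PR): return the line number that
# parse() reports in its error message, or NA if the code parses.
unparsable_line <- function(code) {
  msg <- tryCatch({
    parse(text = code)
    NA_character_
  }, error = function(e) conditionMessage(e))

  if (is.na(msg)) return(NA_integer_)

  # Error messages from parse(text = ) start with "<text>:line:column:"
  m <- regmatches(msg, regexec("^<text>:([0-9]+):[0-9]+:", msg))[[1]]
  if (length(m) < 2) return(NA_integer_)
  as.integer(m[2])
}

# Usage: each element of the character vector is treated as one line
code <- c("x <- 1", "y <- \u201ctest\u201d")
line <- unparsable_line(code)
code[line]
```

Because the parser stops at the first problem it meets, this approach points at the line where parsing actually failed rather than at the first line containing any non-ASCII character. How non-ASCII identifiers such as `deviación` are treated can depend on the platform and locale.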

rossellhayes · 2022-01-25

Updated in f11247a. The message now looks like this:

gadenbuie · 2022-01-26

This looks great! Two thoughts, not necessarily blockers. We could maybe add in line numbers for the first snippet.

    2: ”test”,

We might also want to provide a fix only for the line we show in the first snippet. Even in the screenshot this seems kind of long. We could change

    - You can try replacing your input with this code:
    + You can try fixing the code on that line with the following. There may be other places that need to be fixed, too.

This will help ensure that we don't incorrectly highlight valid non-ASCII characters. If there are extra problematic characters, students either get additional practice fixing the problem or they will get another targeted piece of feedback if they miss a character on another line.

rossellhayes · 2022-01-26

Should we include line numbers even if the student's code is only one line, or only for multiline input?

rossellhayes · 2022-01-26

I added line numbers in bf28dce (currently lines are numbered even if there is only a single line of code).
Updated the suggestions to only give a replacement for one line in 104ab32.

gadenbuie · 2022-01-26

Yeah, if only to keep the text message consistent. If we can keep the text message the same for 1-line and n-line submissions, then it'd be fine to drop the line number.
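For reference, the numbered one-line snippet discussed in this thread could be produced along these lines. This is a minimal sketch; `number_line()` is a hypothetical name, and the `"N: code"` layout simply mirrors the example quoted in the review.

```r
# Hypothetical helper (not from the PR): prefix a single line of the
# submitted code with its line number, e.g. "2: b".
number_line <- function(code, line) {
  # Accept either one string with embedded newlines or a vector of lines
  lines <- unlist(strsplit(code, "\n", fixed = TRUE))
  sprintf("%d: %s", line, lines[line])
}

# Usage: the same format is used for one-line and multi-line input
number_line("a <- 1\nb <- 2", 2)
number_line("a <- 1", 1)
```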

gadenbuie

Excellent! This will be a very helpful feature for anyone who gets caught off guard by copy-paste shenanigans!

rossellhayes added 2 commits January 20, 2022 12:34

Add str_extract() to utils.R

af97fb6

Add messages for unparsable code caused by non-ASCII characters

deb934a

rossellhayes added the "type: enhancement" label (Adds a new, backwards-compatible feature) Jan 20, 2022

rossellhayes requested review from gadenbuie and dcossyleon January 20, 2022 20:47

rossellhayes self-assigned this Jan 20, 2022

rossellhayes added 6 commits January 20, 2022 14:19

Fix typo in i18n keys

aeab0f2

Escape non-ASCII characters in code

c8d223d

Add tests

2bd1378

Update NEWS

8e88e07

Skip test with parsable non-ASCII text on Windows

fd2e251

Remove expection for non-ASCII letter characters (because of poor support on Windows)

f3920ab

dcossyleon reviewed Jan 20, 2022

data-raw/i18n_translations.yml

Update messages to mention curly quotes

018cb10

rossellhayes marked this pull request as ready for review January 20, 2022 22:55

gadenbuie reviewed Jan 21, 2022

R/exercise.R

rossellhayes and others added 10 commits January 21, 2022 10:32

Update NEWS.md

cfb4fc4

Co-authored-by: Garrick Aden-Buie <garrick@adenbuie.com>

Pass exercise to exercise_check_unparsable_unicode()

4eaa880

Refactor exercise_check_code_is_parsable()

2815de4

Use variables for regex patterns in `exercise_check_unparsable_unicode()`

a1e62da

Add helper function build_unparsable_i18n_message()

4993584

Add helper function html_code_block()

29c797f

Escape non-ASCII characters in tests

d2cefd4

Remove very high codepoints from quotes regex pattern to avoid error

0e8333d

Fix typo in tests

e97825e

Add (*UTF8) flag to regex to fix error caused by high code points

aeb5fa1

rossellhayes requested a review from gadenbuie January 21, 2022 22:05

gadenbuie reviewed Jan 21, 2022

R/exercise.R

Use code points directly in regex to avoid need for perl = TRUE

667ce95

gadenbuie reviewed Jan 21, 2022

R/exercise.R

gadenbuie reviewed Jan 21, 2022

R/exercise.R

rossellhayes and others added 2 commits January 21, 2022 16:01

Update R/exercise.R

c86c10e

Co-authored-by: Garrick Aden-Buie <garrick@adenbuie.com>

Make character, lint, and suggestion arguments to `build_unparsable_i18n_span()`

d105baf

gadenbuie reviewed Jan 24, 2022

data-raw/i18n_translations.yml

gadenbuie reviewed Jan 24, 2022

rossellhayes and others added 6 commits January 24, 2022 12:43

Apply suggestions from code review

fffdad0

Co-authored-by: Garrick Aden-Buie <garrick@adenbuie.com>

Add str_replace_all() with named vector pattern support to utils.R

39b03ed

Rename build_unparsable_i18n_span() to `unparsable_unicode_message()` and move much of the logic of `exercise_check_unparsable_unicode()` into it

a4fbc33

Add i18n_div()

4b36723

Use a <div> with <p> tags for multi-paragraph i18n messages

e4c15b7

Only highlight one line of code in error message

f11247a

rossellhayes requested a review from gadenbuie January 26, 2022 18:06

rossellhayes added 2 commits January 26, 2022 10:56

Add line number to highlighted unparsable code

bf28dce

Only suggest a replacement for one line of unparsable code

104ab32

gadenbuie reviewed Jan 26, 2022

data-raw/i18n_translations.yml

"fixing" -> "replacing"

be3813e

rossellhayes requested a review from gadenbuie January 28, 2022 17:01

gadenbuie added 2 commits January 28, 2022 15:47

Small additional abstraction of i18n_span() into i18n_tag()

41785a4

Rename key -> i18n_key

8258fc9

Clarify from the function signature the purpose of this argument

gadenbuie approved these changes Jan 28, 2022

gadenbuie merged commit 73a5f56 into main Jan 28, 2022

gadenbuie deleted the curly-quotes branch January 28, 2022 21:00
