Add messages for unparsable code caused by non-ASCII characters #642

Merged 33 commits on Jan 28, 2022
Changes from 9 commits
Commits
af97fb6
Add `str_extract()` to `utils.R`
rossellhayes Jan 20, 2022
deb934a
Add messages for unparsable code caused by non-ASCII characters
rossellhayes Jan 20, 2022
aeab0f2
Fix typo in i18n keys
rossellhayes Jan 20, 2022
c8d223d
Escape non-ASCII characters in code
rossellhayes Jan 20, 2022
2bd1378
Add tests
rossellhayes Jan 20, 2022
8e88e07
Update NEWS
rossellhayes Jan 20, 2022
fd2e251
Skip test with parsable non-ASCII text on Windows
rossellhayes Jan 20, 2022
f3920ab
Remove expection for non-ASCII letter characters (because of poor sup…
rossellhayes Jan 20, 2022
018cb10
Update messages to mention curly quotes
rossellhayes Jan 20, 2022
cfb4fc4
Update NEWS.md
rossellhayes Jan 21, 2022
4eaa880
Pass `exercise` to `exercise_check_unparsable_unicode()`
rossellhayes Jan 21, 2022
2815de4
Refactor `exercise_check_code_is_parsable()`
rossellhayes Jan 21, 2022
a1e62da
Use variables for regex patterns in `exercise_check_unparsable_unicod…
rossellhayes Jan 21, 2022
4993584
Add helper function `build_unparsable_i18n_message()`
rossellhayes Jan 21, 2022
29c797f
Add helper function `html_code_block()`
rossellhayes Jan 21, 2022
d2cefd4
Escape non-ASCII characters in tests
rossellhayes Jan 21, 2022
0e8333d
Remove very high codepoints from quotes regex pattern to avoid error
rossellhayes Jan 21, 2022
e97825e
Fix typo in tests
rossellhayes Jan 21, 2022
aeb5fa1
Add `(*UTF8)` flag to regex to fix error caused by high code points
rossellhayes Jan 21, 2022
667ce95
Use code points directly in regex to avoid need for `perl = TRUE`
rossellhayes Jan 21, 2022
c86c10e
Update R/exercise.R
rossellhayes Jan 22, 2022
d105baf
Make `character`, `lint`, and `suggestion` arguments to `build_unpars…
rossellhayes Jan 22, 2022
fffdad0
Apply suggestions from code review
rossellhayes Jan 24, 2022
39b03ed
Add `str_replace_all()` with named vector pattern support to `utils.R`
rossellhayes Jan 25, 2022
a4fbc33
Rename `build_unparsable_i18n_span()` to `unparsable_unicode_message(…
rossellhayes Jan 25, 2022
4b36723
Add `i18n_div()`
rossellhayes Jan 25, 2022
e4c15b7
Use a `<div>` with `<p>` tags for multi-paragraph i18n messages
rossellhayes Jan 25, 2022
f11247a
Only highlight one line of code in error message
rossellhayes Jan 25, 2022
bf28dce
Add line number to highlighted unparsable code
rossellhayes Jan 26, 2022
104ab32
Only suggest a replacement for one line of unparsable code
rossellhayes Jan 26, 2022
be3813e
"fixing" -> "replacing"
rossellhayes Jan 26, 2022
41785a4
Small additional abstraction of `i18n_span()` into `i18n_tag()`
gadenbuie Jan 28, 2022
8258fc9
Rename key -> i18n_key
gadenbuie Jan 28, 2022
2 changes: 2 additions & 0 deletions NEWS.md
@@ -50,6 +50,8 @@

- Users are now warned if their submission contains blanks they are expected to fill in. The default blank pattern is three or more underscores, e.g. `____`. The pattern for blanks can be set with the `exercise.blanks` chunk or tutorial option (@rossellhayes #547).

- Users are now warned if their submission contains unparsable code caused by non-ASCII characters. This can commonly occur when students copy-and-paste code from a source that applies automatic Unicode formatting to text. If the submission contains Unicode formatted quotation marks (e.g. curly quotes) or dashes, the student is given a suggested replacement with ASCII characters. In other cases, the student is simply prompted to delete the non-ASCII characters and retype them manually. No message is displayed if non-ASCII characters are included in parsable code (e.g. in character strings or valid variable names) (@rossellhayes #642).

- Authors can choose to reveal (default) or hide the solution to an exercise. Set `exercise.reveal_solution` in the chunk options of a `*-solution` chunk to choose whether or not the solution is revealed to the user. The option can also be set globally with `tutorial_options()`. In a future version of learnr, the default will likely be changed to hide solutions (#402).

- Feedback messages can now be an `htmltools::tag()`, `htmltools::tagList()`, or a character message (#458) (#458)
100 changes: 100 additions & 0 deletions R/exercise.R
@@ -918,6 +918,22 @@ exercise_check_code_is_parsable <- function(exercise) {
}
}

  unicode_feedback <- exercise_check_unparsable_unicode(exercise$code)
  if (!is.null(unicode_feedback)) {
    return(
      exercise_result(
        list(
          message = HTML(unicode_feedback),
          correct = FALSE,
          location = "append",
          type = "error"
        ),
        html_output = error_message_html(error$message),
        error_message = error$message
      )
    )
  }

  exercise_result(
    list(
      message = HTML(
@@ -935,6 +951,90 @@ exercise_check_code_is_parsable <- function(exercise) {
  )
}

exercise_check_unparsable_unicode <- function(code) {
  # Early exit if code is made up of all ASCII characters
  if (!grepl("[^\\x00-\\x7F]", code, perl = TRUE)) {
    return(NULL)
  }

  # Check if code contains Unicode quotation marks
  if (grepl("\\p{Pi}|\\p{Pf}", code, perl = TRUE)) {
    character <- str_extract(code, "\\p{Pi}|\\p{Pf}", perl = TRUE)
    lint <- exercise_highlight_unparsable_unicode(code, "\\p{Pi}|\\p{Pf}")

    # Replace curly single quotes with straight single quotes
    suggestion <- gsub("[\\x{2018}\\x{2019}]", "'", code, perl = TRUE)
    # Replace all other Unicode quotes with straight double quotes
    suggestion <- gsub("\\p{Pi}|\\p{Pf}", '"', suggestion, perl = TRUE)
    suggestion <- as.character(htmltools::pre(htmltools::code(suggestion)))

    return(
      i18n_span(
        "text.unparsablequotes",
        HTML(i18n_translations()$en$translation$text$unparsablequotes),
        opts = list(
          character = character,
          code = lint,
          suggestion = suggestion,
          interpolation = list(escapeValue = FALSE)
        )
      )
    )
  }

  # Check if code contains Unicode dashes
  if(grepl("[^\\P{Pd}-]", code, perl = TRUE)) {
    character <- str_extract(code, "[^\\P{Pd}-]", perl = TRUE)
    lint <- exercise_highlight_unparsable_unicode(code, "[^\\P{Pd}-]")

    # Replace Unicode dashes with ASCII hyphen-minus
    suggestion <- gsub("[^\\P{Pd}-]", "-", code, perl = TRUE)
    suggestion <- as.character(htmltools::pre(htmltools::code(suggestion)))

    return(
      i18n_span(
        "text.unparsableunicodesuggestion",
        HTML(i18n_translations()$en$translation$text$unparsablequotes),
        opts = list(
          character = character,
          code = lint,
          suggestion = suggestion,
          interpolation = list(escapeValue = FALSE)
        )
      )
    )
  }

  # Return simpler message for any other non-ASCII characters
  character <- str_extract(code, "[^\\x00-\\x7F]", perl = TRUE)
  lint <- exercise_highlight_unparsable_unicode(code, "[^\\x00-\\x7F]")

  return(
    i18n_span(
      "text.unparsableunicode",
      HTML(i18n_translations()$en$translation$text$unparsablequotes),
      opts = list(
        character = character,
        code = lint,
        interpolation = list(escapeValue = FALSE)
      )
    )
  )
}

exercise_highlight_unparsable_unicode <- function(code, pattern) {

Member

How does exercise_highlight_unparsable_unicode() handle matches on multiple lines? I think we should only present one line at a time.

Contributor Author
@rossellhayes rossellhayes Jan 24, 2022

Right now it shows all lines. I considered both, but leaned towards including all lines because:

  1. Exercise input is generally only a few lines maximum.
  2. If we show only one line, we might highlight a false positive, and the students won't get to see the later line that caused an error, e.g.

         media <- mean(x)
         deviación <- sd(x)
         media ± 1.96 * deviación

     would highlight the ó in deviación on line 2, missing the error that was actually caused by the ± on line 3.

  3. It's more helpful to include all lines in the suggested replacement, since we'd expect e.g. all the quotes in the response to be curly instead of straight. It feels more symmetrical to include all lines in both the highlighted code and the suggested replacement.

That was my thinking, but I'm happy to change if you think it would be better to include just one line
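
As a rough illustration of the false-positive concern in this comment, the sketch below lists every line that contains a non-ASCII character. `non_ascii_lines()` is a hypothetical helper written for this example and is not part of learnr; the non-ASCII characters are written as `\u` escapes so the snippet itself stays ASCII.

```r
# Illustrative only: `non_ascii_lines()` is a hypothetical helper, not learnr code.
# It returns the line numbers that contain at least one non-ASCII character.
non_ascii_lines <- function(code) {
  lines <- strsplit(code, "\n", fixed = TRUE)[[1]]
  has_non_ascii <- vapply(
    lines,
    # Any code point above 127 is outside the ASCII range
    function(line) any(utf8ToInt(line) > 127L),
    logical(1),
    USE.NAMES = FALSE
  )
  which(has_non_ascii)
}

# The three-line example from the comment above
code <- paste(
  "media <- mean(x)",
  "deviaci\u00f3n <- sd(x)",
  "media \u00b1 1.96 * deviaci\u00f3n",
  sep = "\n"
)

non_ascii_lines(code)
```

Here both line 2 and line 3 are reported, even though only the `±` on line 3 makes the code unparsable. A check that stops at the first match would point at the valid identifier on line 2.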

Member
@gadenbuie gadenbuie Jan 24, 2022

My thinking was that exercises are frequently long enough that we should try not to put the whole exercise in the lint and suggestion. I've seen many examples where an exercise is a dozen or more lines long. If we end up repeating the input code twice, that starts to get out of hand.

I think we should see if we can use the parse error message to isolate the problematic line and produce more targeted feedback.
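
One way this idea could look, sketched under assumptions: R's parser reports errors for `parse(text = ...)` input with a `<text>:LINE:COLUMN:` prefix, so the line number can be read from the condition message. `parse_error_line()` is a hypothetical name used only for this example and may not match how the PR ultimately implemented it.

```r
# Illustrative only: `parse_error_line()` is a hypothetical helper, not learnr code.
# It returns the line number reported by R's parser, or NA if the code parses.
parse_error_line <- function(code) {
  msg <- tryCatch(
    {
      parse(text = code)
      NA_character_
    },
    error = function(e) conditionMessage(e)
  )
  if (is.na(msg)) {
    return(NA_integer_)
  }

  # Parse errors for text input start with "<text>:LINE:COLUMN:"
  m <- regmatches(msg, regexec("^<text>:([0-9]+):([0-9]+):", msg))[[1]]
  if (length(m) < 2) {
    return(NA_integer_)
  }
  as.integer(m[2])
}

parse_error_line("x <- 1\ny <- 2 +* 3")
parse_error_line("x <- 1")
```

The first call reports line 2 and the second returns `NA`. With the line number in hand, the feedback could highlight only that line rather than the whole submission.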

Contributor Author

Updated in f11247a, the message now looks like this:
[Screenshot: Screen Shot 2022-01-25 at 3 35 33 PM]

Member
@gadenbuie gadenbuie Jan 26, 2022

This looks great! Two thoughts, not necessarily blockers. We could maybe add in line numbers for the first snippet.

2:   ”test”,

We might also want to provide a fix only for the line we show in the first snippet. Even in the screen shot this seems kind of long. We could change

- You can try replacing your input with this code:
+ You can try fixing the code on that line with the following. There may be other places that need to be fixed, too.

This will help ensure that we don't incorrectly highlight valid non-ASCII characters. If there are extra problematic characters, students either get additional practice fixing the problem or they will get another targeted piece of feedback if they miss a character on another line.
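
A minimal sketch of the two suggestions in this comment, assuming the problematic line number is already known: it fixes curly quotes on that one line only and prefixes the result with the line number. `suggest_line_fix()` is a hypothetical helper for illustration, not the function added in the PR.

```r
# Illustrative only: `suggest_line_fix()` is a hypothetical helper, not learnr code.
# It replaces curly quotes on a single line and prefixes the line number.
suggest_line_fix <- function(code, line) {
  lines <- strsplit(code, "\n", fixed = TRUE)[[1]]
  stopifnot(line >= 1, line <= length(lines))

  # Curly single quotes (U+2018, U+2019) become straight single quotes
  fixed <- gsub("[\u2018\u2019]", "'", lines[line])
  # Curly double quotes (U+201C, U+201D) become straight double quotes
  fixed <- gsub("[\u201c\u201d]", "\"", fixed)

  paste0(line, ": ", fixed)
}

code <- "c(\n  \u201dtest\u201d,\n  1\n)"
cat(suggest_line_fix(code, 2L), "\n")
```

Other lines are left untouched, so valid non-ASCII characters elsewhere in the submission are never rewritten.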

Contributor Author

Should we include line numbers even if the student's code is only one line, or only for multiline input?

Contributor Author
@rossellhayes rossellhayes Jan 26, 2022

I added line numbers in bf28dce (currently lines are numbered even if there is only a single line of code).
Updated the suggestions to only give a replacement for one line in 104ab32.

Member

Yeah, if only to keep the text message consistent. If we can keep the text message the same for 1-line and n-line submissions, then it'd be fine to drop the line number.

  highlighted_code <- gsub(
    pattern = paste0("(", pattern, ")"),
    replacement = "<mark>\\1</mark>",
    x = code,
    perl = TRUE
  )

  as.character(
    htmltools::pre(htmltools::code(htmltools::HTML(highlighted_code)))
  )
}

exercise_result_timeout <- function() {
  exercise_result_error(
    "Error: Your code ran longer than the permitted timelimit for this exercise.",
3 changes: 3 additions & 0 deletions R/utils.R
@@ -95,6 +95,9 @@ str_replace <- function(x, pattern, replacement) {
str_remove <- function(x, pattern) {
  str_replace(x, pattern, "")
}
str_extract <- function(x, pattern, ...) {
  unlist(regmatches(x, regexpr(pattern, x, ...)))
}

is_tags <- function(x) {
  inherits(x, "shiny.tag") ||
56 changes: 56 additions & 0 deletions data-raw/i18n_translations.yml
@@ -408,6 +408,62 @@ text:
      또는 <code>&quot;</code>, <code>'</code>, <code>(</code>
      , <code>{</code>로 시작하는 구문을 닫는 <code>&quot;</code>, <code>'</code>,
      <code>)</code>, <code>}</code>을 잊었을 수도 있습니다.
  unparsablequotes:
    en: >
      It looks like your R code contains specially formatted quotation marks
      or &quot;curly&quot; quotes (<code>{{character}}</code>)
      around character strings, making your code invalid.
      R requires character values to be contained in straight quotation
      marks (<code>&quot;</code> or <code>'</code>).
      {{code}}
      Don't worry, this is a common source of errors when you copy code from
      another app that applies its own formatting to text.
      You can try replacing your input with this code:
      {{suggestion}}
    fr: ~
    es: ~
    pt: ~
    tr: ~
    emo: ~
    eu: ~
    de: ~
    ko: ~
  unparsableunicode:
    en: >
      It looks like your R code contains an unexpected special character
      (<code>{{character}}</code>) that makes your code invalid.
      {{code}}
      Sometimes your code may contain a special character that looks like a
      regular character, especially if you copy and paste the code from
      another app.
      Try deleting the special character from your code and retyping
      it manually.
    fr: ~
    es: ~
    pt: ~
    tr: ~
    emo: ~
    eu: ~
    de: ~
    ko: ~
  unparsableunicodesuggestion:
    en: >
      It looks like your R code contains an unexpected special character
      (<code>{{character}}</code>) that makes your code invalid.
      {{code}}
      Sometimes your code may contain a special character that looks like a
      regular character, especially if you copy and paste the code from
      another app.
      You can try replacing your input with this code:
      {{suggestion}}
    fr: ~
    es: ~
    pt: ~
    tr: ~
    emo: ~
    eu: ~
    de: ~
    ko: ~
  and:
    en: "and"
    fr: "et"
Binary file modified inst/internals/i18n_random_phrases.rds
Binary file not shown.
Binary file modified inst/internals/i18n_translations.rds
Binary file not shown.
50 changes: 50 additions & 0 deletions tests/testthat/test-exercise.R
@@ -1037,6 +1037,56 @@ test_that("Errors with global setup code result in an internal error", {
  expect_match(conditionMessage(res$feedback$error), "boom")
})

# Unparsable Unicode ------------------------------------------------------

test_that("evaluate_exercise() returns message for unparsable non-ASCII code", {
  ex <- mock_exercise(user_code = 'str_detect(“test”, “t.+t”)')
  result <- evaluate_exercise(ex, new.env())
  expect_equal(result$feedback, exercise_check_code_is_parsable(ex)$feedback)
  expect_match(result$feedback$message, "text.unparsablequotes")
  expect_match(
    result$feedback$message,
    i18n_translations()$en$translation$text$unparsablequotes,
    fixed = TRUE
  )

  ex <- mock_exercise(user_code = 'str_detect(‘test’, ‘t.+t’)')
  result <- evaluate_exercise(ex, new.env())
  expect_equal(result$feedback, exercise_check_code_is_parsable(ex)$feedback)
  expect_match(result$feedback$message, "text.unparsablequotes")
  expect_match(
    result$feedback$message,
    i18n_translations()$en$translation$text$unparsablequotes,
    fixed = TRUE
  )

  ex <- mock_exercise(user_code = '63 – 21')
  result <- evaluate_exercise(ex, new.env())
  expect_equal(result$feedback, exercise_check_code_is_parsable(ex)$feedback)
  expect_match(result$feedback$message, "text.unparsableunicodesuggestion")
  expect_match(
    result$feedback$message,
    i18n_translations()$en$translation$text$unparsablequotes,
    fixed = TRUE
  )

  ex <- mock_exercise(user_code = '63 ± 21')
  result <- evaluate_exercise(ex, new.env())
  expect_equal(result$feedback, exercise_check_code_is_parsable(ex)$feedback)
  expect_match(result$feedback$message, "text.unparsableunicode")
  expect_match(
    result$feedback$message,
    i18n_translations()$en$translation$text$unparsablequotes,
    fixed = TRUE
  )
})

test_that("evaluate_exercise() does not return a message for parsable non-ASCII code", {
  skip_on_os("windows")
  ex <- mock_exercise(user_code = 'μεταβλητή <- "What‽"')
  result <- evaluate_exercise(ex, new.env())
  expect_null(result$feedback)
})

# Timelimit ---------------------------------------------------------------
