Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add messages for unparsable code caused by non-ASCII characters #642

Merged
merged 33 commits into from
Jan 28, 2022

Conversation

rossellhayes
Copy link
Contributor

@rossellhayes rossellhayes commented Jan 20, 2022

This PR adds new messages for cases where students submit unparsable code with non-ASCII characters. This should help students who have pasted from a source that applies automatic Unicode formatting. This code would otherwise be very difficult to debug, as the Unicode characters may be difficult to distinguish from ASCII counterparts.

In cases where students have copied code with Unicode quotation marks or dashes, the prompt will automatically suggest a replacement with straight quotations marks or the ASCII hyphen-minus. In other cases, the message will simply suggest students retype the offending character manually.

Note that the message will not display if students write parsable code that includes non-ASCII characters, such as in character strings or (otherwise valid) variable names.

Examples

Unicode ("curly") quotation marks

Screen Shot 2022-01-20 at 2 52 46 PM

Unicode dash

Screen Shot 2022-01-20 at 12 33 00 PM

Other special characters

Screen Shot 2022-01-20 at 12 33 37 PM

Parsable non-ASCII characters

Screen Shot 2022-01-20 at 12 46 46 PM

PR task list:

  • Update NEWS
  • Add tests (if possible)
  • Update documentation with devtools::document()

Closes #639.

@rossellhayes rossellhayes added the type: enhancement Adds a new, backwards-compatible feature label Jan 20, 2022
@rossellhayes rossellhayes self-assigned this Jan 20, 2022
@rossellhayes rossellhayes marked this pull request as ready for review January 20, 2022 22:55
Copy link
Member

@gadenbuie gadenbuie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great start! I just have a few suggestions for a final round of refactoring and polish.

R/exercise.R Outdated Show resolved Hide resolved
R/exercise.R Outdated Show resolved Hide resolved
R/exercise.R Outdated Show resolved Hide resolved
R/exercise.R Outdated Show resolved Hide resolved
tests/testthat/test-exercise.R Outdated Show resolved Hide resolved
NEWS.md Outdated Show resolved Hide resolved
R/exercise.R Outdated Show resolved Hide resolved
R/exercise.R Outdated Show resolved Hide resolved
R/exercise.R Outdated Show resolved Hide resolved
R/exercise.R Outdated Show resolved Hide resolved
rossellhayes and others added 2 commits January 21, 2022 16:01
Co-authored-by: Garrick Aden-Buie <garrick@adenbuie.com>
R/exercise.R Outdated Show resolved Hide resolved
R/exercise.R Outdated Show resolved Hide resolved
R/exercise.R Outdated Show resolved Hide resolved
R/exercise.R Outdated Show resolved Hide resolved
R/exercise.R Outdated
)
}

exercise_highlight_unparsable_unicode <- function(code, pattern) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does exercise_highlight_unparsable_unicode() handle matches on multiple lines? I think we should only present one line at a time.

Copy link
Contributor Author

@rossellhayes rossellhayes Jan 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now it shows all lines. I considered both, but leaned towards including all lines because:

  1. Exercise input is generally only a few lines maximum.
  2. If we show only one line, we might highlight a false positive, and the students won't get to see the later line that caused an error, e.g.
media <- mean(x)
deviación <- sd(x)
media ± 1.96 * deviación

would highlight the ó in deviación on line 2, missing the error that was actually caused by the ± on line 3.

  1. It's more helpful to include all lines in the suggested replacement, since we'd expect e.g. all the quotes in the response to be curly instead of straight. It feels more symmetrical to include all lines in both the highlighted code and the suggested replacement.

That was my thinking, but I'm happy to change if you think it would be better to include just one line

Copy link
Member

@gadenbuie gadenbuie Jan 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking was that exercises are frequently long enough that we should try not to put the whole exercise in the lint and suggestion. I've seen many examples where an exercise is a dozen or more lines long. If we end up repeating the input code twice, that starts to get out of hand.

I think we should see if we can use the parse error message to isolate the problematic line and produce more targeted feedback.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in f11247a, the message now looks like this:
Screen Shot 2022-01-25 at 3 35 33 PM

Copy link
Member

@gadenbuie gadenbuie Jan 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Two thoughts, not necessarily blockers. We could maybe add in line numbers for the first snippet.

2:   ”test”,

We might also want to provide a fix only for the line we show in the first snippet. Even in the screen shot this seems kind of long. We could change

- You can try replacing your input with this code:
+ You can try fixing the code on that line with the following. There may be other places that need to be fixed, too.

This will help ensure that we don't incorrectly highlight valid non-ASCII characters. If there are extra problematic characters, students either get additional practice fixing the problem or they will get another targeted piece of feedback if they miss a character on another line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we include line numbers even if the student's code is only one line, or only for mutliline input?

Copy link
Contributor Author

@rossellhayes rossellhayes Jan 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added line numbers in bf28dce (currently lines are numbered even if there is only a single line of code).
Updated the suggestions to only give a replacement for one line in bf28dce.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, if only to keep the text message consistent. If we can keep the text message the same for 1-line and n-line submissions, then it'd be fine to drop the line number.

Clarify from the function signature the purpose of this argument
Copy link
Member

@gadenbuie gadenbuie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent! This will be a very helpful feature for anyone who gets caught off guard by copy-paste shenanigans!

@gadenbuie gadenbuie merged commit 73a5f56 into main Jan 28, 2022
@gadenbuie gadenbuie deleted the curly-quotes branch January 28, 2022 21:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement Adds a new, backwards-compatible feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add special unparseable message for curly quotes (and other Unicode characters?)
3 participants