reverse-string: multi-byte character strings #1175

ferhatelmas · 2018-02-08T00:59:06Z

In the Go track, we were asked for them (go#1068). We were adding some tests on top of the canonical data and then thought of suggesting them upstream, since they might be useful to other tracks.

It's very easy to implement in Go, but it can create problems in other tracks such as C. However, those tracks can assign the exercise a much higher difficulty.

So, do you think multi-byte character strings belong in the canonical data?

coriolinus · 2018-02-08T02:34:57Z

This has been discussed: #428. The consensus at the time was that in general, multi-byte character strings shouldn't be canonical: there are too many languages in which they're hard, and too many exercises for which they're irrelevant.

On the other hand, in exercism/rust#429, we're just adding our own multi-byte test to the reverse-string exercise, because they're not particularly tedious in Rust and there does seem to be direct applicability to the exercise.

Part of the issue is that Unicode is fantastically complicated; combining characters make iteration over grapheme clusters a seriously non-trivial task. We just didn't use any combining characters in the Rust iteration, only wide ones, though we add a hint that interested students might try to add their own test reversing a combining-character string: uüu (with the ü written as u followed by the combining diaeresis U+0308) has 5 bytes, 4 chars, 3 graphemes; if you reverse over anything but grapheme clusters, you get mangled output.

Making a note that such a test string is available to interested students, while keeping the included tests simple in that reversing over unicode chars is sufficient, felt good to the Rust track maintainers, so that's what is getting merged. However, different languages will have different outlooks on the relative difficulty of these operations. I have no idea how to begin this exercise in unicode in, say, bash.

I believe that we might add some text notes with suggestions for strings for interested track maintainers, but unicode strings should not become part of the exercism-wide canonical data.

petertseng · 2018-02-08T20:20:27Z

So I wonder if there is any desire to make reverse-string satisfy the request of #455. Neither the name reverse-string nor https://github.com/exercism/problem-specifications/blob/master/exercises/reverse-string/description.md implies that that was the original intent. Our intention may change. However, it probably wouldn't be the only such exercise, since there are other operations to pay attention to, such as upper/lowercase.

coriolinus · 2018-02-08T20:58:20Z

It's not clear to me what the actionable item in #455 is, or what precisely it would mean to make reverse-string satisfy the request. Perhaps I'm misreading something.

I would hazard a guess that for many tracks, but not all, reverse-string is an exercise for which wide character tests are a natural fit. If that guess is correct, then it can't hurt to have a few examples of wide character tests which have already been implemented; this thread already links to two. It also implies that we shouldn't add tests to the canonical data.

If all this is correct, then no further action is required; this issue simply becomes the canonical resource for questions of the form "wouldn't wide characters be a natural fit for reverse-string in track X?" If the track maintainers think so, then yes; we've already linked to some examples of how it's been done before.

Let me pose a question: does anyone feel that further action is warranted here? If not, I'd suggest simply closing the issue and retaining it for reference when someone asks the question in the future.

rpottsoh · 2018-02-08T21:21:08Z

#428 appears to be flagged as policy.... So I think this issue should be closed.

petertseng · 2018-02-09T14:37:02Z

All decisions made up to this point (and the implication by name and description that reverse-string isn't one of the exercises of #455) mean that adding such strings can't be justified.

This would lead me to posit that the binary nature of a test's presence in the canonical data is not suiting us well, and something similar to feature flags would better allow sharing of tests that only a subset of tracks want. I remind myself to write the issue about that. No need to wait for that issue to be written before closing this one; the reminder system is independent of this issue's state.
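For illustration only, one way the feature-flag idea could look from a track's side is sketched below in Go. Everything here is hypothetical: the `testCase` type, its `Requires` field, the `"multi-byte"` flag name and `selectCases` are assumptions made for this sketch and are not part of canonical-data.json or of any proposal.

```go
package main

import "fmt"

// testCase is a hypothetical shape for a shared test case.
// The Requires field is an assumption, not part of canonical-data.json.
type testCase struct {
	Description string
	Input       string
	Expected    string
	Requires    []string // hypothetical feature flags, e.g. "multi-byte"
}

// selectCases keeps the cases whose required flags are all enabled by a track.
func selectCases(all []testCase, enabled map[string]bool) []testCase {
	var out []testCase
	for _, c := range all {
		ok := true
		for _, f := range c.Requires {
			if !enabled[f] {
				ok = false
				break
			}
		}
		if ok {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	cases := []testCase{
		{Description: "a word", Input: "robot", Expected: "tobor"},
		{Description: "wide characters", Input: "子猫", Expected: "猫子",
			Requires: []string{"multi-byte"}},
	}

	// A track that enables no flags only sees the unflagged case.
	fmt.Println(len(selectCases(cases, nil))) // 1

	// A track that opts in to "multi-byte" sees both cases.
	fmt.Println(len(selectCases(cases, map[string]bool{"multi-byte": true}))) // 2
}
```

With a scheme like this, a test's presence stops being binary: a case can live in the shared data while each track decides which flags it opts in to.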

petertseng · 2018-02-09T15:28:30Z

> something similar to feature flags would better allow sharing of tests that only a subset of tracks want. I remind myself to write the issue about that.

I started to write it and decided to abort on it.

In this specific case, I consider the act of adding the multi-byte cases to individual tracks' reverse-string test suites to be only a stopgap measure until the exercise mentioned in #455 is made; after it is made, I would no longer see the point in adding these cases, so I no longer have any motivation to suggest they be present in canonical-data.json in any form, hidden behind a feature flag or not.

Lacking any other use case for feature flags immediately at hand, there is now no point for me to expend effort to write any proposal on it. Feel free to take the idea if there is any use case that pops up in the future, of course.

rpottsoh · 2018-11-11T21:23:45Z

Closing because of #428. This issue may be reopened, or a new one created, if it is decided later on to incorporate multi-byte character cases into this exercise.

ferhatelmas mentioned this issue Feb 8, 2018

reverse-string: add UTF-8 multibyte test case exercism/go#1069

Merged

rpottsoh closed this as completed Nov 11, 2018
