Allow reverse to apply to a string #412

miracle2k · 2014-06-14T11:39:18Z

No description provided.

wtlangford · 2014-06-14T23:46:24Z

I like this, though implementation will be difficult because of Unicode's combining character sequences. Something like "\u0061\u0304" (Latin small letter a + combining macron) is technically two codepoints, and to reverse the string properly those two need to stay in that order in the reversal, while "\u0101" (Latin small letter a with macron) looks identical and would be simple to reverse. Does anyone have any thoughts on this?
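To make the problem concrete, here is a minimal sketch in Python (not jq, and not a proposed implementation) of what a naive codepoint-by-codepoint reversal does to the two spellings. The trailing "b" and the variable names are assumptions added for illustration.

```python
import unicodedata

decomposed = "\u0061\u0304b"   # 'a' + COMBINING MACRON, then 'b'
precomposed = "\u0101b"        # LATIN SMALL LETTER A WITH MACRON, then 'b'

# Before reversal the two spellings are canonically equivalent.
equal_before = (unicodedata.normalize("NFC", decomposed)
                == unicodedata.normalize("NFC", precomposed))

# Naive reversal: reverse the sequence of codepoints.
naive_decomposed = decomposed[::-1]    # 'b', COMBINING MACRON, 'a'
naive_precomposed = precomposed[::-1]  # 'b', 'a with macron'

# After reversal the macron follows 'b', so it attaches to the wrong letter
# and the two results are no longer equivalent.
equal_after = (unicodedata.normalize("NFC", naive_decomposed)
               == unicodedata.normalize("NFC", naive_precomposed))

print(equal_before)  # True
print(equal_after)   # False
print([f"U+{ord(c):04X}" for c in naive_decomposed])  # ['U+0062', 'U+0304', 'U+0061']
```

Two strings that look identical before reversal end up looking different afterwards, which is why the combining mark has to travel with its base letter.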

nicowilliams · 2014-06-15T01:11:33Z

My thoughts exactly. This really requires more of a Unicode library. Or
at least the ranges of combining codepoints (though that isn't quite
sufficient).

miracle2k · 2014-06-15T13:38:05Z

Is length handling unicode correctly?

wtlangford · 2014-06-15T16:52:20Z

That depends on your definition of correctly. Most libraries I've used count the number of codepoints, not the number of graphemes, which is what jq's length builtin does. I would say this is correct behavior.

wtlangford · 2014-06-15T18:55:26Z

My previous comment is misleading. I meant to say that jq, like the other libraries I've used (including the Objective-C and Java standard libraries), counts codepoints. So "\u0061\u0304" has a length of 2, while "\u0101" has a length of 1, even though both render as a single grapheme (which looks like this: ā).
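The same counts can be reproduced outside jq. The sketch below uses Python, whose built-in len on a str also counts codepoints; it is only an illustration of the distinction and says nothing about how jq implements length. The variable names are made up for the example.

```python
import unicodedata

decomposed = "\u0061\u0304"   # 'a' followed by COMBINING MACRON
precomposed = "\u0101"        # LATIN SMALL LETTER A WITH MACRON

# Codepoint counts: what a codepoint-based length reports.
codepoint_counts = (len(decomposed), len(precomposed))

# UTF-8 byte counts, for comparison with the cheaper "count bytes" option.
byte_counts = (len(decomposed.encode("utf-8")),
               len(precomposed.encode("utf-8")))

# NFC normalization composes the pair into the single precomposed codepoint.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed

print(codepoint_counts)  # (2, 1)
print(byte_counts)       # (3, 2)
print(same_after_nfc)    # True
```

Both strings render as one grapheme, yet neither the codepoint count nor the byte count agrees between them.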

miracle2k · 2014-06-15T19:55:46Z

Sure. I was trying to make the point that if length doesn't handle graphemes, like most programming languages, it might be ok if reverse doesn't either (also like most programming languages).

nicowilliams · 2014-06-15T21:12:00Z

@wtlangford @miracle2k Most of the time counting codepoints is what you want, and anyways, it's the next cheapest operation after counting bytes. Counting characters is hard enough, and counting graphemes (if you include support for grapheme clusters) is more expensive still.

For string reversal you really want to distinguish characters, not codepoints. IMO anyways.

In the interim you can always do this:

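# Keep a handle on the builtin reverse before shadowing it.
def reverse_orig: reverse;
# Strings are reversed codepoint by codepoint; any other input goes to the builtin.
def reverse: if type == "string" then explode | reverse | implode else reverse_orig end;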

and now you can reverse either strings or arrays without further ado. (And since we try to preserve object key order, we could even "reverse" objects, but let's not :)

This approach lets us off the hook for now.

In the longer term we might have a function that knows the combining codepoint ranges and deals with characters.

In the longer longer term we might want a bit of a Unicode library: for normalization, normalization-insensitive string comparison, grapheme cluster detection, grapheme counting, and so on. I'd rather not think about it for now :)

stedolan · 2014-07-31T13:43:36Z

Why does anyone ever want to reverse a string? Reversing a list, sure. Programming assignment to implement string reversing, sure. But in an actual program? It's not even a well-defined operation on a general (unicode) string.

If for some reason someone does want to reverse a string codepoint by codepoint, then converting to a list of codepoints, reversing that, and converting back doesn't seem like too much work.
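As a rough picture of that three-step round trip, here is a sketch in Python. The function names explode and implode are borrowed from the jq builtins mentioned in this thread, but these are hypothetical Python stand-ins written for illustration, not jq's implementation; reverse_codepoints is likewise an assumed name.

```python
def explode(s):
    """Turn a string into a list of codepoint numbers."""
    return [ord(ch) for ch in s]

def implode(codepoints):
    """Turn a list of codepoint numbers back into a string."""
    return "".join(chr(cp) for cp in codepoints)

def reverse_codepoints(s):
    """Reverse a string codepoint by codepoint, ignoring combining marks."""
    return implode(list(reversed(explode(s))))

print(explode("abc"))             # [97, 98, 99]
print(reverse_codepoints("abc"))  # cba
```

Because the reversal happens on an explicit list of codepoints, the caller can see exactly which unit is being reversed, and that it is not the grapheme.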

wtlangford · 2014-07-31T14:03:57Z

Interestingly, I believe this is how most standard libraries do it anyways. Some of them have ways to make sure you're reversing composed character sequences properly (Objective-C's Foundation gives you substrings that represent each composed character sequence). But most just assume you know what you're getting into when you start reversing strings.

stedolan · 2014-07-31T14:14:53Z

The more I think about this, the more I like just providing conversion to and from a list of codepoints and list reversal. Programmers who reverse strings should acknowledge that they're doing something horrible by converting to a list of codepoints, rather than calling a library function that hides their sins :)

wtlangford · 2014-07-31T14:17:21Z

Yeah. Here there be demons.

nicowilliams · 2014-07-31T15:01:54Z

@stedolan We already have those converters: explode and implode.

nicowilliams · 2014-07-31T15:45:43Z

I'm thinking that it should be possible to write jq-coded functions to do this correctly by grouping codepoints that make up characters. It would require checking for all combining codepoint ranges, but that's not so bad. A filter on explode. I think a lot of advanced Unicode support, if we want it, could be jq-coded. (For the new import module facility, I've been thinking it'd be nice to have an import library data option, so that large Unicode tables could be stored as JSON instead of in .jq files.) I'd be much happier with that than with a dependency on some C Unicode library...
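The grouping idea can be sketched as follows, in Python rather than jq so that the standard unicodedata tables can stand in for the hand-maintained combining ranges. This is an approximation under stated assumptions: any codepoint whose general category starts with "M" (Mn, Mc, Me) is attached to the preceding codepoint, which is not full grapheme cluster segmentation (it ignores, for example, zero-width joiner sequences and Hangul jamo). The names group_characters and reverse_characters are hypothetical.

```python
import unicodedata

def group_characters(s):
    """Split s into groups of one base codepoint plus any following marks."""
    groups = []
    for ch in s:
        # Assumption: general categories Mn/Mc/Me approximate "combining".
        if groups and unicodedata.category(ch).startswith("M"):
            groups[-1] += ch   # keep the mark with its base codepoint
        else:
            groups.append(ch)  # start a new group
    return groups

def reverse_characters(s):
    """Reverse the groups while keeping each group's internal order."""
    return "".join(reversed(group_characters(s)))

print(group_characters("a\u0304b"))  # ['ā', 'b'] (first item is two codepoints)
print(reverse_characters("a\u0304b") == "ba\u0304")  # True
```

With this grouping, the decomposed and precomposed spellings of "āb" both reverse to something that renders as "bā", although the results are still different codepoint sequences unless they are normalized.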

nicowilliams added this to the 2.0 release milestone Jun 15, 2014

nicowilliams added the feature request label Jun 16, 2014

itchyny mentioned this issue Jun 25, 2023

Make reverse more DWIM-y #1748

Open
