RFC: Generalized String Iteration #760

bmcq-0 · 2022-11-25T02:35:40Z

memcorrupt · 2022-11-25T04:14:51Z

This is the ~~worst~~ best PR I've ever seen! LGTM

+1

andyfriesen · 2022-11-28T23:16:03Z

Is this about walking the bytes of a bytestring or the characters of a string? One of the best ways to write bugs that only manifest in non-Latin-1 writing systems is to treat each byte as a character.

I'd be more on board with an explicit string method that has you specify what you want, e.g. `for a, b in some_string:bytes()` or `for a, b in some_string:chars()`.

memcorrupt · 2022-11-29T15:43:08Z

> Is this about walking the bytes of a bytestring or the characters of a string? One of the best ways to write bugs that only manifest in non-latin-1 writing systems is to treat each byte as a character.
>
> I'd be more on-board with an explicit string method that has you specify what you want. eg for a, b in some_string:bytes() or for a, b in some_string:chars().

Luau strings aren't explicitly UTF-8 encoded, and Luau provides a `utf8` library for working with UTF-8 strings. Therefore, I think the `some_string:chars()` behavior is already provided by `utf8.codes`. Given that by default Luau doesn't interpret UTF-8 characters and treats strings as arbitrary bytes, I don't think it would be harmful to have this implementation walk through each byte of the bytestring.

Note: I am speaking from my deep knowledge of Lua 5.1 and from what I've read in the Luau documentation.
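Both modes discussed here can already be written with existing functions. The sketch below contrasts a numeric `for` loop over `string.byte` (one step per byte) with `utf8.codes` (one step per codepoint). The sample string is arbitrary, and the sketch assumes the source file is UTF-8 encoded.

```lua
local s = "héllo" -- "é" (U+00E9) takes two bytes in UTF-8

-- Byte-wise walk: one step per byte, yielding numeric byte values.
local byteValues = {}
for i = 1, #s do
	byteValues[#byteValues + 1] = string.byte(s, i)
end

-- Codepoint walk: utf8.codes yields the byte position where each
-- codepoint starts, plus the codepoint itself.
local codepoints = {}
for _, code in utf8.codes(s) do
	codepoints[#codepoints + 1] = code
end

print(#byteValues, #codepoints) -- 6 bytes, 5 codepoints
```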

ccuser44 · 2022-12-11T22:53:04Z

I'm really not sure this makes much sense. Also, there is no clear-cut, perfect way to handle strings in Lua, so it's preferable to let the developer decide which mode is best for them.

Also, string iteration is nowhere near as common as table iteration, and a feature like this might just confuse new users and prevent them from learning about the `string` and `utf8` libraries.

Also, the string namecall methods generally should not be relied upon; direct use of `string.` and `utf8.` functions is preferred in Luau.

There are too many ways to interpret a string:

- Interpret each UTF-8 codepoint number
- Interpret each UTF-8 string character
- Interpret each UTF-8 grapheme
- Interpret each string byte
- Interpret each string character
- Interpret a number of string bytes at a time
- Interpret a number of string characters at a time
- Interpret each UTF-8 codepoint number backwards
- Interpret each UTF-8 string character backwards
- Interpret each UTF-8 grapheme backwards
- Interpret each string byte backwards
- Interpret each string character backwards
- Interpret a number of string bytes at a time backwards
- Interpret a number of string characters at a time backwards
- Iterate a string with a pattern match
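Several entries in the list above map onto a different loop built from existing `string` functions, which is part of why no single default stands out. The sketch below covers three of them: bytes backwards, fixed-size byte chunks, and pattern matching. It uses only `string.sub`, `string.gmatch`, and numeric `for` loops. The chunk size `n = 3` and the sample strings are arbitrary choices for illustration.

```lua
local s = "abcdefg"

-- Bytes backwards: a numeric for loop with a negative step.
local backwards = {}
for i = #s, 1, -1 do
	backwards[#backwards + 1] = string.sub(s, i, i)
end

-- Fixed-size chunks of n bytes; string.sub clamps the end index,
-- so the last chunk may be shorter.
local n = 3
local chunks = {}
for i = 1, #s, n do
	chunks[#chunks + 1] = string.sub(s, i, i + n - 1)
end

-- Pattern-driven iteration with string.gmatch (here: runs of letters).
local words = {}
for word in string.gmatch("one two  three", "%a+") do
	words[#words + 1] = word
end

print(table.concat(backwards)) -- gfedcba
print(table.concat(chunks, ",")) -- abc,def,g
print(table.concat(words, ",")) -- one,two,three
```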

Hence, this shouldn't be added. Also, new syntax changes that affect runtime behavior should be made with care.

zeux · 2023-10-30T16:55:16Z

This PR is closed as part of RFC migration; please see #1074 (comment) for context.

Additional notes: strings currently do not support indexing or iteration natively; when a string represents Unicode contents, it can be iterated via `utf8.codes` or `utf8.graphemes` (in Roblox). It is not clear to me that we need a "default" way to iterate strings that walks bytes, since codes or graphemes seem likely to be more useful more often. If there is a desire for idiomatic byte iteration, I could see something like `string.bytes()` that mirrors the behavior of the others, although that would be less performant than current alternatives. One additional complication with byte-wise iteration (or indexing!) of strings is that there are two ways to represent a character: by its ASCII numeric code, or as a single-character string (see `string.sub`). Overall, it feels like there's no single correct way to iterate (or index) a string, and so we should not add generalized iteration or indexing.
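To make the two representations concrete, here is a sketch of what a `string.bytes()`-style iterator might look like. The `bytes` function below is hypothetical (it is not part of Luau's `string` library) and only mirrors the shape of `utf8.codes`, yielding `(byteIndex, byteValue)` pairs. Each step goes through a closure call, which is likely part of why it would be slower than a plain `for i = 1, #s do` loop with `string.byte`.

```lua
-- Hypothetical iterator, not part of Luau: yields (byteIndex, byteValue)
-- for every byte of s, mirroring the (position, value) shape of utf8.codes.
local function bytes(s)
	local i = 0
	return function()
		i = i + 1
		if i <= #s then
			return i, string.byte(s, i)
		end
		return nil -- ends the generic for loop
	end
end

local s = "Hi!"
local asCodes, asStrings = {}, {}
for i, code in bytes(s) do
	asCodes[#asCodes + 1] = code -- numeric ASCII code, e.g. 72 for "H"
	asStrings[#asStrings + 1] = string.sub(s, i, i) -- single-character string, e.g. "H"
end

print(table.concat(asCodes, " ")) -- 72 105 33
print(table.concat(asStrings, " ")) -- H i !
```

A built-in version would have to pick one of these two representations for the loop value, which is the ambiguity noted above.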

Commit 544aeef: RFC: Generalized String Iteration

alexmccord added the `rfc` (Language change proposal) label on Feb 10, 2023

zeux closed this on Oct 30, 2023
