RFC: Generalized String Iteration #760

Closed
wants to merge 1 commit into from

bmcq-0
Contributor

bmcq-0 commented Nov 25, 2022

@memcorrupt

This is the worst best PR I've ever seen! LGTM

+1

@andyfriesen
Collaborator

Is this about walking the bytes of a bytestring or the characters of a string? One of the best ways to write bugs that only manifest in non-Latin-1 writing systems is to treat each byte as a character.

I'd be more on board with an explicit string method that makes you specify what you want, e.g. for a, b in some_string:bytes() or for a, b in some_string:chars().
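To make the byte/character distinction concrete, here is a minimal sketch of what those two loops would yield, written with functions that already exist (string.byte and utf8.codes). The bytes()/chars() methods above are only proposals and are not used here; the sketch assumes Luau or Lua 5.3+, where the utf8 library is available.

```lua
-- "café": the final "é" is U+00E9, encoded in UTF-8 as the two bytes 0xC3 0xA9
local s = "caf\195\169"

-- Byte-wise walk: one step per byte, yielding numeric byte values
local bytes = {}
for i = 1, #s do
    bytes[#bytes + 1] = string.byte(s, i)
end

-- Code-point walk: utf8.codes yields (byte position, code point) per character
local codepoints = {}
for _, cp in utf8.codes(s) do
    codepoints[#codepoints + 1] = cp
end

print(#bytes)       -- 5: 'c', 'a', 'f', 0xC3, 0xA9
print(#codepoints)  -- 4: 'c', 'a', 'f', U+00E9
```

Treating each byte as a character here would split "é" into two meaningless values, which is exactly the class of bug described above.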

@memcorrupt

Luau doesn't mandate UTF-8 encoding for strings, and it provides a utf8 library for interacting with UTF-8 strings; therefore, I think the some_string:chars() case is already covered by utf8.codes. Given that by default Luau doesn't handle UTF-8 characters specially but treats strings as arbitrary bytes, I don't think it would be harmful to have this feature walk through each byte of the string.

Note: I'm speaking based on my deep knowledge of Lua 5.1 and what I've read in the Luau documentation.

@ccuser44
Contributor

ccuser44 commented Dec 11, 2022

I'm really not sure this makes much sense. There is no clear-cut, perfect way to handle strings in Lua, so it's preferable to let the developer decide which mode is best for them.

Also, string iteration is nowhere near as common as table iteration, and a feature like this might just confuse new users and keep them from learning about the string and utf8 libraries.

Also, the string namecall methods (s:method(...)) generally should not be relied upon; calling the string.* and utf8.* functions directly is preferred in Luau.

There are too many ways to interpret a string:

  • Interpret each UTF-8 code point as a number
  • Interpret each UTF-8 character as a string
  • Interpret each UTF-8 grapheme
  • Interpret each string byte
  • Interpret each string character
  • Interpret a fixed number of string bytes at a time
  • Interpret a fixed number of string characters at a time
  • Interpret each UTF-8 code point as a number, backwards
  • Interpret each UTF-8 character as a string, backwards
  • Interpret each UTF-8 grapheme, backwards
  • Interpret each string byte, backwards
  • Interpret each string character, backwards
  • Interpret a fixed number of string bytes at a time, backwards
  • Interpret a fixed number of string characters at a time, backwards
  • Iterate over a string with a pattern match

Hence this shouldn't be added. Also, syntax changes that introduce new runtime behavior should be made with care.
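To illustrate how differently the same string can be walked, here is a sketch of four of the modes listed above using only the existing string library (it runs on Lua 5.1+ and Luau). The chunk size of 3 and the %S+ pattern are arbitrary choices for illustration.

```lua
local s = "hello world"

-- Each character as a one-character string, via string.sub
local chars = {}
for i = 1, #s do
    chars[#chars + 1] = string.sub(s, i, i)
end

-- A fixed number of bytes at a time (here 3); string.sub clamps the last chunk
local chunks = {}
for i = 1, #s, 3 do
    chunks[#chunks + 1] = string.sub(s, i, i + 2)
end

-- Each byte backwards, via a descending numeric loop and string.byte
local reversed = {}
for i = #s, 1, -1 do
    reversed[#reversed + 1] = string.byte(s, i)
end

-- Pattern-match iteration, via string.gmatch (runs of non-space characters)
local words = {}
for w in string.gmatch(s, "%S+") do
    words[#words + 1] = w
end

print(table.concat(chunks, "|"))  -- hel|lo |wor|ld
print(table.concat(words, ","))   -- hello,world
```

Each mode is a one-line change to the loop header, and none of them is obviously the "default" one.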

alexmccord added the rfc (Language change proposal) label Feb 10, 2023
@zeux
Collaborator

zeux commented Oct 30, 2023

This PR is closed as part of RFC migration; please see #1074 (comment) for context.

Additional notes: Strings currently do not support indexing or iteration natively; when a string represents Unicode contents, it can be iterated via utf8.codes or utf8.graphemes (in Roblox). It is not clear to me that we need a "default" way to iterate strings that iterates over bytes, as it feels like codes/graphemes would more often be useful. If there is a desire for idiomatic byte iteration, I could see something like string.bytes() that mirrors the behavior of the others, although that would be less performant than current alternatives. One additional complication with byte-wise iteration (or indexing!) in strings is that there are two ways to represent a character: using its ASCII numeric code (see string.byte), or using a single-character string (see string.sub). Overall it feels like there's just no single correct way to iterate (or index...) a string, and as such we actually should not add generalized iteration or indexing.
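For illustration, here is a minimal sketch of what a byte iterator mirroring the (position, value) shape of utf8.codes could look like, written as an ordinary Lua function. The name bytes is hypothetical; nothing like it exists in Luau's string library. The loop body also shows the two character representations mentioned above: the numeric code from string.byte and the one-character string from string.sub.

```lua
-- Hypothetical helper, NOT part of Luau: a stateless iterator that yields
-- (index, byte) for each byte, in the same shape utf8.codes uses for code points.
local function bytes(s)
    return function(str, i)
        i = i + 1
        if i <= #str then
            return i, string.byte(str, i)
        end
        -- returning nothing ends the generic for loop
    end, s, 0
end

local s = "Lua"
local codes, chars = {}, {}
for i, b in bytes(s) do
    codes[#codes + 1] = b                    -- numeric code
    chars[#chars + 1] = string.sub(s, i, i)  -- one-character string
end

print(table.concat(codes, " "))  -- 76 117 97
print(table.concat(chars, " "))  -- L u a
```

A plain numeric for loop over string.byte does the same work without an iterator function call per step, which is one plausible reading of the performance remark above.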

zeux closed this Oct 30, 2023
Labels
rfc Language change proposal
6 participants