Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create wrapper object for ScriptExtensions array return value #1536

Closed
echeran opened this issue Jan 24, 2022 · 6 comments · Fixed by #1990
Closed

Create wrapper object for ScriptExtensions array return value #1536

echeran opened this issue Jan 24, 2022 · 6 comments · Fixed by #1990
Assignees
Labels
C-unicode Component: Props, sets, tries good first issue Good for newcomers help wanted Issue needs an assignee S-small Size: One afternoon (small bug fix or enhancement) T-techdebt Type: ICU4X code health and tech debt

Comments

@echeran
Copy link
Contributor

echeran commented Jan 24, 2022

This issue is spun off from a previous discussion.

The question is whether, when returning the Script_Extensions property value for a code point, which is an array of Script values represented as a ZeroSlice<Script>, it would be better to wrap the ZeroSlice<Script> return value in an object that can allow specific extra functionality (ex: helper methods) added to it.

This seems somewhat reminiscent of #1239 which regards returning Unicode property data in terms of interfaces rather than hard-coding concrete data structures.

Pros & cons identified so far:

Pros:

Cons:

  • Would lose the ZeroSlice functionality that we get for free (iterators, getters) unless we add methods to delegate accordingly

@iainireland @sffc

@markusicu
Copy link
Member

Could this simplify the implementation? For example, for a character that has no Script_Extensions, it could copy the Script value into the return value, rather than having to treat the trie data array element as a Script array slice. This then also frees the trie encoding from having to store a non-scx value as a Script with no extra bits. (This is currently the case, but it could prevent future extensions of what is stored in the trie.)

@sffc
Copy link
Member

sffc commented Jan 25, 2022

Yeah, what Markus describes is an example of the second bullet under "pros", which is that we have more control over the implementation if we change it in the future. We're not locking ourselves into having a ZeroSlice for all cases.

@sffc
Copy link
Member

sffc commented Jan 25, 2022

The thing that gives us the ultimate amount of flexibility actually is to return an iterator, like you're working on doing for the other case (script-to-cp-range).

@sffc
Copy link
Member

sffc commented Jan 25, 2022

There are two operations you may want from the return value:

  1. Take the list as a whole and do something with it
  2. Check to see if the list contains a particular value

(1) can be done with an iterator. (2) is more efficient with the ZeroSlice since you can binary search in it.

These two operations could perhaps be separate methods on ScriptExtensions. But, they could also be a single method, returning a struct that implements IntoIterator and also has a .contains() function. I think I lean toward separate methods. We'd keep the current method, make it private, and call it in each of these two convenience functions.

@sffc
Copy link
Member

sffc commented Feb 5, 2022

The more I think about this the more I think we should probably go ahead and wrap the ZeroSlice.

This issue has no triage labels. @echeran Are you happy with this conclusion, or do you want more input?

@sapriyag sapriyag added the C-unicode Component: Props, sets, tries label Apr 1, 2022
@sapriyag sapriyag added the S-medium Size: Less than a week (larger bug fix or enhancement) label Apr 14, 2022
@echeran echeran added the T-techdebt Type: ICU4X code health and tech debt label Apr 14, 2022
@sffc sffc added S-small Size: One afternoon (small bug fix or enhancement) good first issue Good for newcomers help wanted Issue needs an assignee and removed S-medium Size: Less than a week (larger bug fix or enhancement) labels Apr 28, 2022
@sffc sffc added this to the ICU4X 1.0 milestone Apr 28, 2022
@sffc
Copy link
Member

sffc commented May 3, 2022

@snktd may be interested in this.

@snktd snktd self-assigned this Jun 5, 2022
snktd added a commit to snktd/icu4x that referenced this issue Jun 5, 2022
Manishearth pushed a commit that referenced this issue Jun 7, 2022
…1990)

* Create wrapper object for ScriptExtensions array return value #1536

* #1536 Create wrapper object for ScriptExtensions array return value

* Resolving a minor issue with code comments.

* Update components/properties/src/script.rs

Co-authored-by: Shane F. Carr <shane@unicode.org>

* Addressing review comments.

* Making ScriptExtensionsSet.values private

* Addressing a minor review comment.

Co-authored-by: Shane F. Carr <shane@unicode.org>
yzhang1994 pushed a commit to yzhang1994/icu4x that referenced this issue Jun 8, 2022
…-org#1536 (unicode-org#1990)

* Create wrapper object for ScriptExtensions array return value unicode-org#1536

* unicode-org#1536 Create wrapper object for ScriptExtensions array return value

* Resolving a minor issue with code comments.

* Update components/properties/src/script.rs

Co-authored-by: Shane F. Carr <shane@unicode.org>

* Addressing review comments.

* Making ScriptExtensionsSet.values private

* Addressing a minor review comment.

Co-authored-by: Shane F. Carr <shane@unicode.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-unicode Component: Props, sets, tries good first issue Good for newcomers help wanted Issue needs an assignee S-small Size: One afternoon (small bug fix or enhancement) T-techdebt Type: ICU4X code health and tech debt
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants