LibJS: Implement the Set Methods proposal #16279

IdanHo · 2022-12-01T21:05:44Z

https://github.com/tc39/proposal-set-methods

davidot

Looks great! Also very nice to immediately have some tests.
Just a couple of questions, with really only the [[Size]] thing being an issue for me

davidot · 2022-12-02T00:51:20Z

Userland/Libraries/LibJS/Runtime/SetPrototype.cpp

+// 8 Set Records, https://tc39.es/proposal-set-methods/#sec-set-records
+struct SetRecord {
+    NonnullGCPtr<Object> set;          // [[Set]]
+    double size { 0 };                 // [[Size]]


Is a double the appropriate type here?

Oh I see it might be infinity, hmm that complicates things.

Yeah, I think just using the raw double result of toIntegerOrInfinity is cleaner.

davidot · 2022-12-02T00:53:05Z

Userland/Libraries/LibJS/Runtime/SetPrototype.cpp

+        return vm.throw_completion<TypeError>(ErrorType::IntlNumberIsNaN, "size"sv);
+
+    // 6. Let intSize be ! ToIntegerOrInfinity(numSize).
+    auto integer_size = MUST(number_size.to_integer_or_infinity(vm));


Maybe at least add a VERIFY here to ensure we're in safe integer range (below 2^53) and not negative? That will ensure that [[Size]] does have a non-negative integer or +∞

I don't think we can know that we're in the safe integer range? to_integer_or_infinity doesn't check for that, and the spec text doesn't talk about it either AFAICT

You're right, I guess it would work outside of safe integer range since we only compare sizes.
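To illustrate why a raw double is enough for [[Size]], here is a minimal standalone sketch of ToIntegerOrInfinity applied to a plain double. It is not the LibJS `to_integer_or_infinity` implementation; the names `to_integer_or_infinity_sketch` and `size_is_at_most` are hypothetical and exist only for this example. The point is that values above 2^53 and +∞ still compare correctly, which is all the size checks need.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Standalone sketch of ToIntegerOrInfinity on a plain double.
// Hypothetical helper, not LibJS's Value::to_integer_or_infinity.
double to_integer_or_infinity_sketch(double number)
{
    // NaN, +0 and -0 all map to 0.
    if (std::isnan(number) || number == 0)
        return 0;
    // +Infinity and -Infinity are returned unchanged.
    if (std::isinf(number))
        return number;
    // Every other value is truncated toward zero.
    return std::trunc(number);
}

// Hypothetical helper: compares two already-converted sizes.
// NaN must have been rejected earlier, as in steps 4-5 of the diff above.
bool size_is_at_most(double this_size, double other_size)
{
    assert(!std::isnan(this_size) && !std::isnan(other_size));
    return this_size <= other_size;
}
```

Because the comparison is done on doubles, no safe-integer check is needed for it to behave sensibly, including when one side is +∞.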

Userland/Libraries/LibJS/Runtime/SetPrototype.cpp

linusg

awesome :)

linusg · 2022-12-02T01:33:03Z

Userland/Libraries/LibJS/Runtime/SetPrototype.cpp

+    // 4. NOTE: If rawSize is undefined, then numSize will be NaN.
+    // 5. If numSize is NaN, throw a TypeError exception.
+    if (number_size.is_nan())
+        return vm.throw_completion<TypeError>(ErrorType::IntlNumberIsNaN, "size"sv);


Guess we should drop the Intl prefix (later)

Userland/Libraries/LibJS/Runtime/Set.cpp

Userland/Libraries/LibJS/Runtime/CommonPropertyNames.h

Userland/Libraries/LibJS/Runtime/SetPrototype.cpp

linusg

CI failure is sus, but looks unrelated. tyvm for implementing this so quickly!

bakkot · 2022-12-02T16:42:38Z

Nice!

I'm interested to see the comment about

// FIXME: This is not possible with the current underlying m_values implementation

since the question of whether that step is actually implementable is one for which I definitely need feedback from implementations.

I'm trying to get a sense of whether LibJS has an otherwise-performant implementation of Set which is unable to support that operation, or if it's already known to be less than optimally big-O performant and this is a consequence of that. In other words: the spec assumes that this step would be possible because I had a particular implementation in mind for Set; is it impossible in LibJS because LibJS's implementation is different but equivalently good to the one I had in mind, or does it have less than optimal performance in other ways?

Poking around a little, it looks like Map.prototype.delete (which backs Set.prototype.delete as well) is already linear in the size of the Map, instead of ~constant, so it seems like the existing implementation is already less than optimal, and so the fact that this step cannot be done efficiently falls out of that? But I'd appreciate commentary from someone more familiar with the internals here.

IdanHo · 2022-12-02T20:11:38Z

@bakkot LibJS takes an approach of correctness first, performance later, so it's very possible our implementation is not fully up-to-par w/regards to that yet.
Specifically regarding Map.prototype.delete, you are correct that it is currently O(N), but I asked around and found out that is not intended, so we're definitely looking into that 😄

Regarding the implementation you linked, it's not obvious to me how it handles iterator invalidation, or rather, the lack thereof. Specifically, an iterator over that implementation would not allow continuing the iteration if the deletion or addition of an entry caused a rehash of the map. Handling that specific issue is the reason why the initial LibJS implementation was eventually replaced with a balanced-binary-tree based one.

To be honest I did not think too hard about how easy it would be to fix said FIXME, but it doesn't sound completely impossible to do in O(smaller N). The main issue I'm thinking of at the moment, is the question of how the original ordering of the elements in the larger set could be extracted without iterating over the larger set in the first place, is that readily available somehow in that implementation?

alimpfard · 2022-12-02T20:29:28Z

To further comment on the iterator matter, I don't see any way to achieve O(~const) under the existing constraints:

- Live iterators should be capable of continuing once they reach the end
- Deleting the current (or next, or both) entry in the map/set while iterating it should be well-behaved
- Deleted entries should no longer show up in the iteration sequence

More concretely, I don't think O(const) can be achieved by using a linked list to keep track of the order (while there are live iterators), as deletion of the current or next element will either explode or force a re-seek from the first known-available entry, neither of which is O(const).

The best solution I can think of is to add a second hashmap to our existing rbtree/hashmap impl to make deletion O(lgn).

I'd love to know your thoughts on how live iterators should be handled (and whether any of those constraints are actually a misunderstanding on my part - or if I'm missing some obvious fact here).

bakkot · 2022-12-02T21:12:09Z

@IdanHo:

> LibJS takes an approach of correctness first, performance later, so it's very possible our implementation is not fully up-to-par w/regards to that yet.

Towards that end, you can get the right ordering semantics (with worse performance) for .intersection by, when the time comes to do the sort, iterating over this and creating a temporary map from key->index, then doing a stable sort of the contents of the result according to lookups in the temporary map (with things missing from the temporary map mapping to infinity). That'll pass tests; the only downside is that it's O(size of receiver) instead of O(size of result).
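The fallback described above can be sketched roughly as follows. This is a standalone illustration, not LibJS code: a `std::vector<std::string>` in insertion order stands in for a Set, and the function name `sort_by_receiver_order` is hypothetical. The maximum `std::size_t` value plays the role of infinity for keys missing from the temporary map.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <string>
#include <unordered_map>
#include <vector>

// Reorders `result` so that it follows the insertion order of `receiver`.
// Hypothetical helper; both containers are simplified stand-ins for Sets.
std::vector<std::string> sort_by_receiver_order(
    std::vector<std::string> const& receiver,
    std::vector<std::string> result)
{
    // Temporary key -> index map, built by iterating the receiver once.
    std::unordered_map<std::string, std::size_t> index_of;
    for (std::size_t i = 0; i < receiver.size(); ++i)
        index_of.emplace(receiver[i], i);

    // Keys that are not in the receiver sort last ("infinity").
    std::size_t const missing = std::numeric_limits<std::size_t>::max();
    auto rank = [&](std::string const& key) -> std::size_t {
        auto it = index_of.find(key);
        return it == index_of.end() ? missing : it->second;
    };

    // A stable sort keeps the relative order of keys with equal rank.
    std::stable_sort(result.begin(), result.end(),
        [&](std::string const& a, std::string const& b) {
            return rank(a) < rank(b);
        });
    return result;
}
```

Building the map is what makes this linear in the size of the receiver rather than in the size of the result.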

> Regarding the implementation you linked, it's not obvious to me how it handles iterator invalidation, or rather, the lack thereof.

Yeah, V8's actual implementation, and SpiderMonkey's, both have a lot more details to get full spec compliance, some of which is for handling iterator invalidation.

I don't actually know the exact approaches they take. At a quick glance, V8's involves updating iterators to point to the new table, whereas SpiderMonkey's iterators keep track both of an index within the data table and a count of the actual elements which have been emitted so far and have the underlying map call appropriate methods to keep those values in sync when the map is rehashed or an element is removed.

> The main issue I'm thinking of at the moment, is the question of how the original ordering of the elements in the larger set could be extracted without iterating over the larger set in the first place, is that readily available somehow in that implementation?

The thing you actually need is the ability to efficiently determine the relative order of any two items (so you can do a sort). And that's easy to get with the "Deterministic hash tables" implementation I linked (I believe): because the Entry objects are allocated in a linear, insertion-order array, you can do a normal lookup to get the Entry objects corresponding to your items and then compare the locations of those objects in memory.
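A minimal sketch of that idea, under simplifying assumptions: entries live in an insertion-order array, and a lookup yields the entry's position, so the relative order of two keys is a position comparison. The class `InsertionOrderedSet` and its members are hypothetical, a `std::unordered_map` stands in for the hash chains, positions are compared instead of memory addresses, and deletion and rehashing are left out entirely.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-in for an insertion-ordered hash set.
class InsertionOrderedSet {
public:
    // Appends `key` to the entry array if it is not already present.
    bool add(std::string const& key)
    {
        if (m_index.count(key) != 0)
            return false;
        m_index.emplace(key, m_entries.size());
        m_entries.push_back(key);
        return true;
    }

    // True if both keys are present and `a` was inserted before `b`.
    bool inserted_before(std::string const& a, std::string const& b) const
    {
        auto ia = m_index.find(a);
        auto ib = m_index.find(b);
        if (ia == m_index.end() || ib == m_index.end())
            return false;
        return ia->second < ib->second;
    }

    // Entries in insertion order.
    std::vector<std::string> const& entries() const { return m_entries; }

private:
    std::vector<std::string> m_entries;                   // insertion-order array
    std::unordered_map<std::string, std::size_t> m_index; // key -> position
};
```

With `inserted_before` as the comparator, sorting a result costs two lookups per comparison and never requires walking the larger set.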

@alimpfard:

> more concretely, I don't think O(const) can be achieved by using a linked-list to keep track of the order

I don't think anyone actually uses a linked list to keep track of order, LibJS included, so I'm not sure of the relevance of this comment.

> The best solution I can think of is to add a second hashmap to our existing rbtree/hashmap impl to make deletion O(lgn).

Yeah, I'm afraid I don't see a way short of an additional table given the current implementation. That said my data structure design skills are a bit rusty, so I might well be missing something.

(If you do add an extra table which lets you do key->index lookups so that you can make remove fast, you get the ability to do the .intersection sort step for free.)

alimpfard · 2022-12-03T04:43:28Z

> I don't think anyone actually uses a linked list to keep track of order, LibJS included

We used to, with Idan's original implementation that used OrderedHashMap (which gives you O(const) but has an iterator invalidation problem).

bakkot · 2023-03-25T16:46:15Z

Re: the sort step, that's now been dropped, so you can just remove the FIXME.

IdanHo added the 👀 pr-needs-review PR needs review from a maintainer or community member label Dec 1, 2022

davidot suggested changes Dec 2, 2022

davidot added ⏳ pr-waiting-for-author PR is blocked by feedback / code changes from the author and removed 👀 pr-needs-review PR needs review from a maintainer or community member labels Dec 2, 2022

linusg reviewed Dec 2, 2022

IdanHo added 8 commits December 2, 2022 12:21

LibJS: Implement the Set Methods proposal abstract operations

34a9ea6

LibJS: Implement Set.prototype.union

2943eaa

LibJS: Implement Set.prototype.intersection

0b4ca5a

LibJS: Implement Set.prototype.difference

6feb0da

LibJS: Implement Set.prototype.symmetricDifference

6ab2e3b

LibJS: Implement Set.prototype.isSubsetOf

7d14c9e

LibJS: Implement Set.prototype.isSupersetOf

1e42c2f

LibJS: Implement Set.prototype.isDisjointFrom

8e08ef5

IdanHo force-pushed the proposal-set-methods branch from 34b4b13 to 8e08ef5 Compare December 2, 2022 10:22

IdanHo added 👀 pr-needs-review PR needs review from a maintainer or community member and removed ⏳ pr-waiting-for-author PR is blocked by feedback / code changes from the author labels Dec 2, 2022

davidot approved these changes Dec 2, 2022

linusg approved these changes Dec 2, 2022

linusg merged commit 2e806da into SerenityOS:master Dec 2, 2022

davidot removed the 👀 pr-needs-review PR needs review from a maintainer or community member label Dec 2, 2022

linusg mentioned this pull request Dec 2, 2022

Stage 4 tracking tc39/proposal-set-methods#78

Closed

11 tasks

IdanHo deleted the proposal-set-methods branch December 2, 2022 13:38

davidot mentioned this pull request Dec 2, 2022

LibJS: Reduce flakes?? #16287

Merged
