Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using std maps in datagen #2152

Merged
merged 4 commits into from
Jul 11, 2022
Merged

Conversation

robertbastian
Copy link
Member

@robertbastian robertbastian commented Jul 6, 2022

Currently we're using LiteMap with no clear benefit but several drawbacks:

  • insertion is less performant than HashMap or BTreeMap
  • The API is less powerful
  • Requires Ord where Hash would do

HelloWorldDataProvider can just use a static slice and doesn't have to allocate at all.

@robertbastian robertbastian force-pushed the hashmap branch 2 times, most recently from 455736e to c276f4e Compare July 6, 2022 15:13
Manishearth
Manishearth previously approved these changes Jul 6, 2022
Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I like this. I migrated these from HashMap to LiteMap in #1341, and the reasons for doing it then haven't really changed.

#[allow(clippy::derive_hash_xor_eq)]

// Custom implementation to avoid hashing the path.
impl core::hash::Hash for ResourceKey {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: I feel that ResourceKeyHash should be used instead of ResourceKey when a hash is needed, which is why there isn't already an impl Hash for ResourceKey.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But ResourceKeyHash doesn't have the path, which I need for databake. This is like saying you should store f(x) is a hash set instead of x.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot approve this PR given that my PR #2115 conflicts with the Hash impl. We can't merge both.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the conflict exactly?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to decide on what to do with the ResourceKeyMetadata.

  1. Does it affect PartialEq? (then it needs to go into Hash)
  2. Is it ignored for PartialEq? (then match statements get less efficient)
  3. Does it go into the path string itself? (then it gets into Hash and PartialEq automatically)

I don't want to lock ourselves into having a Hash impl if the impl would be inefficient.

tl;dr, let's reach consensus on #2115 and then we can move on.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is independent of #2115. Hashing on hash (aka path) only is valid, even if PartialEq compares more fields. Hash is free to produce collisions as long as it produces equal hashes for PartialEq values. Hash collisions don't affect correctness, they're a performance problem, but that doesn't matter here because a) it's only used in datagen and b) we don't have keys that differ only by metadata anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.

@Manishearth
Copy link
Member

I think the reason was in part to get rid of Ord requirements

@robertbastian
Copy link
Member Author

One of the reasons you mention in #1341 was that we can add methods to LiteMap, but this PR shows that there's nothing that HashMap can't do. In fact HashMap (and BTreeMap) have more powerful APIs, like the entry methods that allow update-or-insert semantics.

The only valid point in #1341 imho is keeping LiteMap from bitrotting, but I don't think it's strong enough to justify jumping through hoops to make it work, because Ord is going away.

@robertbastian robertbastian requested a review from sffc July 8, 2022 06:38
Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we decide on how to do metadata in ResourceKey, I'm okay merging this.

@robertbastian robertbastian requested a review from sffc July 10, 2022 17:24
Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

@robertbastian robertbastian merged commit 4f3c3f2 into unicode-org:main Jul 11, 2022
@robertbastian robertbastian deleted the hashmap branch July 11, 2022 06:13
samchen61661 pushed a commit to samchen61661/icu4x that referenced this pull request Jul 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants