Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

don't perform unnecessary work when diffing maps #45

Merged
merged 2 commits into from
Jun 9, 2023

Conversation

latacora-paul
Copy link
Contributor

I was seeing some pretty bad performance when performing a large diff and tracked down the primary cause. I'm certain there are other improvements that can be made but I'll keep the change small for now :). This change runs ~15x faster for my use case.

Math identity illustrating why this change is okay:

(A ∩ B) ∪ (B - A) = B

Before:
Screenshot 2023-06-05 at 11 10 20 AM

After:
Screenshot 2023-06-05 at 11 23 14 AM

@@ -108,20 +108,10 @@
(map ->Insertion)
(remove #(contains? exp %) act)))

(let [exp {false 0, 0 0}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a stray top level form that probably should've been in a comment block. I just deleted it because it was a test of the impl below which is changing here and so it's no longer applicable.

@alysbrooks
Copy link
Member

Thanks for digging in and profiling!

I didn't intuitively understand the identity (perhaps due to unfamiliarity with the notation) until I thought about it more.

It's basically saying the intersection (i.e. what's in A and B):

Venn diagram with only the overlap of two circles highlighted

Combined with the difference (i.e., what's only in B):

Venn diagram showing only the right circle and not the overlap highlighted

is just B:

Venn diagram showing the entire right circle highlighted, including the part that overlaps

The code looks good and it passes our tests (including property-based tests), so this seems like a solid change. I have a nagging worry there's some implementation detail that would make these two not equivalent.

@latacora-paul
Copy link
Contributor Author

latacora-paul commented Jun 5, 2023

I also had some uneasiness (why was it like that in the first place if it could be so simple?).

However, as far as I can tell the order of act-ks doesn't matter since it's diffing maps that make no guarantees about entry order anyways. The passing tests gave me increased confidence in the safety of the change. @alysbrooks if there's something additional you think needs to be done just let me know :)

@plexus
Copy link
Member

plexus commented Jun 8, 2023

I think enough due diligence has been done here. @latacora-paul could you still add a CHANGELOG entry? then this is ready to go out.

@latacora-paul
Copy link
Contributor Author

could you still add a CHANGELOG entry?

done!

@alysbrooks
Copy link
Member

I will note that I didn't notice tests running any faster after this change, but there's a lot of noise on our tests and I don't think they involve many large maps. I did a quick test using criterium on my computer on a 100-element hash-map, and it was several times faster and somewhat faster for a small hash-map.

@alysbrooks alysbrooks merged commit 61c1a76 into lambdaisland:main Jun 9, 2023
@latacora-paul latacora-paul deleted the perf-improvements branch June 12, 2023 17:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants