Performance improvements #91

cfvescovo · 2022-10-31T18:33:42Z

I have tried to improve parsing performance by replacing the standard hash function with fnv and by replacing LocalNames with Strings (in node.rs), as suggested in #45.
@teymour-aldridge are there any other changes we could make to improve parsing speed? Could someone who needs to parse many or very large HTML documents run some benchmarks and report back?

cfvescovo · 2022-10-31T18:38:09Z

I had to amend my commit because cargo fmt was complaining that the imports were not sorted.

adamreichold · 2023-01-10T18:16:05Z

I don't think using an unkeyed hash like FNV is a reasonable choice here, as it opens programs using this library up to denial-of-service attacks. If better hashing speed is wanted, a keyed hash like ahash (the default hasher used by hashbrown) would seem more appropriate IMHO.

cfvescovo · 2023-01-10T23:00:44Z

You are right. Good catch! I did not think of this as an attack surface, but it could definitely be exploited. Will switch to ahash ASAP.
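
To illustrate the point about keyed versus unkeyed hashing, here is a minimal sketch that uses only the standard library. The function names `fnv1a_64` and `keyed_hash` are made up for this example and are not part of this crate, fnv, or ahash. The FNV-1a function is written out by hand, and std's `RandomState` stands in for a keyed hasher; ahash fills the same role with a faster hash function.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

/// Unkeyed 64-bit FNV-1a (hypothetical helper for illustration).
/// The output depends only on the input bytes, so anyone can
/// precompute inputs that collide.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut hash = OFFSET_BASIS;
    for &byte in bytes {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(PRIME);
    }
    hash
}

/// Keyed hashing (hypothetical helper for illustration).
/// The result depends on the keys held by `state`, which are
/// seeded randomly and not known to whoever supplies the input.
fn keyed_hash(state: &RandomState, value: &str) -> u64 {
    let mut hasher = state.build_hasher();
    value.hash(&mut hasher);
    hasher.finish()
}
```

With `fnv1a_64`, the same class or attribute name hashes to the same value in every process, which is what allows an attacker to craft a document whose names all fall into one bucket. With `keyed_hash`, the value is stable for a given `RandomState` but is expected to differ between independently created states.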

adamreichold · 2023-01-11T13:29:35Z

src/node.rs


    /// The element classes.
-    pub classes: HashSet<LocalName>,
+    pub classes: HashSet<String>,


If we include #101, it might be preferable to keep this as LocalName, since class names should be highly redundant within one document or even across multiple documents. Lazy initialization would then potentially avoid some of the cost of deduplicating these strings when they are never matched for a given element.

cfvescovo · 2023-02-25

You are right. Maybe we should consider merging #91 and #101 together once #101 is ready. However, this would require a major version bump for the crate, since #101 makes breaking changes to the API.
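
For readers following the lazy-initialization idea in this thread, here is a rough sketch of the general pattern using only the standard library. `LazyElement` and all of its fields and methods are hypothetical. This is neither the crate's `Element` type nor the design proposed in #101, and it uses plain `String`s rather than interned `LocalName`s.

```rust
use std::cell::OnceCell;
use std::collections::HashSet;

/// Hypothetical element type used only for this illustration.
struct LazyElement {
    /// Raw value of the `class` attribute, if present.
    class_attr: Option<String>,
    /// Class set, built on first access rather than at parse time.
    classes: OnceCell<HashSet<String>>,
}

impl LazyElement {
    fn new(class_attr: Option<&str>) -> Self {
        Self {
            class_attr: class_attr.map(str::to_owned),
            classes: OnceCell::new(),
        }
    }

    /// Splits the attribute on whitespace the first time it is
    /// called and caches the resulting set for later calls.
    fn classes(&self) -> &HashSet<String> {
        self.classes.get_or_init(|| {
            self.class_attr
                .as_deref()
                .unwrap_or("")
                .split_whitespace()
                .map(str::to_owned)
                .collect()
        })
    }

    /// Matching a class is what triggers the lazy initialization.
    fn has_class(&self, name: &str) -> bool {
        self.classes().contains(name)
    }

    /// Reports whether the class set has been built yet.
    fn classes_initialized(&self) -> bool {
        self.classes.get().is_some()
    }
}
```

An element whose classes are never matched never pays for splitting and hashing its `class` attribute. The sketch does not show interning, which is the separate benefit of keeping `LocalName` mentioned above.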

teymour-aldridge · 2023-02-15T13:16:56Z

Hi, sorry I haven't been responsive of late. I don't really have time to look at this, but if both of you think the changes are worthwhile then I'm very happy to approve so that we can merge this.

cfvescovo · 2023-03-03T18:17:49Z

See #101

cfvescovo requested a review from teymour-aldridge October 31, 2022 18:33

Replace LocalNames with Strings, change hash function (see #45)

355d560

cfvescovo force-pushed the improve-perf branch from 100cf6f to 355d560 Compare October 31, 2022 18:36

cfvescovo requested a review from causal-agent November 5, 2022 17:18

Replace fnv with ahash

ae2c97c

adamreichold approved these changes Jan 11, 2023

cfvescovo added 2 commits January 11, 2023 10:16

Merge branch 'master' into improve-perf

4c71795

Apply clippy suggestion for bool to i32 type conversion

cbe9a6d

cfvescovo marked this pull request as ready for review January 11, 2023 09:17

cfvescovo removed request for causal-agent and teymour-aldridge January 11, 2023 09:19

cfvescovo self-assigned this Jan 11, 2023

adamreichold mentioned this pull request Jan 11, 2023

RFC: Lazily fetch id and classes #101

Merged

adamreichold reviewed Jan 11, 2023

adamreichold mentioned this pull request Mar 1, 2023

Use AHash instead of SipHash for storing attributes and classes. #117

Merged

cfvescovo closed this Mar 3, 2023
