-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance improvements #91
Conversation
100cf6f
to
355d560
Compare
I had to amend my commit because |
I don't think using a unkeyed hash like FNV is a reasonable choice here as it opens programs using this library up to denial of service attacks. If better hashing speed is wanted, a keyed hash like |
You are right. Good catch! I did not think of this as an attack surface but it could definitely be exploited. Will switch to ahash ASAP |
|
||
/// The element classes. | ||
pub classes: HashSet<LocalName>, | ||
pub classes: HashSet<String>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we include #101, it might be preferable to keep this as LocalName
as class names should be highly redundant seen over one or even multiple documents. Lazy initialization would then potentially allow avoiding some of the cost of deduplicating these strings when they are never matched for a given element.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, sorry I haven't been responsive as of late - don't really have time to look at this, but if both of you think that the changes are worthwhile then very happy to approve so that we can merge this. |
See #101 |
I have tried to improve parsing performance by replacing the standard hash function with fnv and by replacing LocalNames with strings (in node.rs) as suggested in #45.
@teymour-aldridge are there any other changes we could make to improve parsing speed? Could someone that needs to parse multiple/huge HTMLs perform some benchmarks and report back?