-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HashSet<LocalName> for classes is slow #45
Comments
It would probably also be faster to use a different hasher for the HashSet. Or getting Or using a prefix tree (also called "trie", lookup in |
These (pretty hacky) changes shave about 20% off my program's runtime: diff --git a/Cargo.toml b/Cargo.toml
index 56112ea..221e445 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -23,6 +23,7 @@ matches = "0.1.6"
selectors = "0.22"
smallvec = "1"
tendril = "0.4"
+fnv = "1.0.7"
[dependencies.getopts]
version = "0.2.18"
diff --git a/src/node.rs b/src/node.rs
index 103fba8..bd255a9 100644
--- a/src/node.rs
+++ b/src/node.rs
@@ -1,7 +1,6 @@
//! HTML nodes.
use std::collections::{hash_map, hash_set};
-use std::collections::{HashMap, HashSet};
use std::fmt;
use std::ops::Deref;
@@ -10,6 +9,8 @@ use html5ever::{Attribute, LocalName, QualName};
use selectors::attr::CaseSensitivity;
+use fnv::{FnvHashSet, FnvHashMap};
+
/// An HTML node.
#[derive(Clone, PartialEq, Eq)]
pub enum Node {
@@ -231,13 +232,13 @@ pub struct Element {
pub name: QualName,
/// The element ID.
- pub id: Option<LocalName>,
+ pub id: Option<String>,
/// The element classes.
- pub classes: HashSet<LocalName>,
+ pub classes: FnvHashSet<String>,
/// The element attributes.
- pub attrs: HashMap<QualName, StrTendril>,
+ pub attrs: FnvHashMap<QualName, StrTendril>,
}
impl Element {
@@ -246,16 +247,17 @@ impl Element {
let id = attrs
.iter()
.find(|a| a.name.local.deref() == "id")
- .map(|a| LocalName::from(a.value.deref()));
+ .map(|a| String::from(a.value.deref()));
- let classes: HashSet<LocalName> = attrs
+ let classes: FnvHashSet<String> = attrs
.iter()
.find(|a| a.name.local.deref() == "class")
- .map_or(HashSet::new(), |a| {
+ .map_or(FnvHashSet::default(), |a| {
a.value
.deref()
- .split_whitespace()
- .map(LocalName::from)
+ .split(' ')
+ .filter(|x| !x.is_empty())
+ .map(String::from)
.collect()
});
@@ -308,7 +310,7 @@ impl Element {
#[allow(missing_debug_implementations)]
#[derive(Clone)]
pub struct Classes<'a> {
- inner: hash_set::Iter<'a, LocalName>,
+ inner: hash_set::Iter<'a, String>,
}
impl<'a> Iterator for Classes<'a> { Most of the remaining time is taken by |
Now that I think about it, most programs bottlenecked by parsing will likely be parsing many HTML documents. I wonder if having a thread-local |
Note: You can just write |
I implemented this locally and it was only a very small improvement. Making a non-insignificant allocation for every element is very slow... |
Since this is now |
LocalName::new
called for something that isn't in https://github.com/servo/html5ever/blob/master/markup5ever/local_names.txt locks a global Mutex. AsLocalName::new
is called for every class inElement::new
, and most classes are unlikely to be in that list, this means that multithreading is much less of a win for HTML parsing than it should be. While it's a breaking change and probably not the best approach, I've switched to using Strings locally, which gives me about a 10% performance improvement. My program's performance is still dominated byHashSet
construction, though. It might be faster to intern the entireHashSet<String>
s per-document.The text was updated successfully, but these errors were encountered: