Commit

Merge xmlparser into this crate.
Closes #111
RazrFalcon committed Nov 15, 2023
1 parent 30b8c81 commit 97737a1
Showing 14 changed files with 2,453 additions and 295 deletions.
9 changes: 9 additions & 0 deletions CHANGELOG.md
@@ -6,8 +6,17 @@ and this project adheres to [Semantic Versioning](http://semver.org/).

## [Unreleased]
### Changed
- `xmlparser` is no longer a dependency and its fork is used internally.
- ~5% faster parsing.
- Fallback to `Rc` when `Arc` isn't available.
- Bump MSRV to 1.60
- `Error` variants have changed quite a lot.

### Fixed
- `ParsingOptions::allow_dtd = false` would not trigger an error when an empty DTD was present.

### Removed
- The `xmlparser` dependency.

## [0.18.1] - 2023-09-30
### Added
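
Note on the "Fallback to `Rc` when `Arc` isn't available" entry: the changelog does not show how the fallback is selected. A minimal sketch of one common way to do it, assuming the choice is made with `cfg(target_has_atomic = "ptr")` (stable since Rust 1.60, the MSRV listed above). `SharedStr` and `share_text` are hypothetical names used only for illustration, not items from this crate.

```rust
// Hypothetical sketch: choose a shared-pointer type from target capabilities.
// `SharedStr` and `share_text` are illustrative names, not crate items.

// Targets with pointer-sized atomics can use the thread-safe `Arc`.
#[cfg(target_has_atomic = "ptr")]
pub type SharedStr = std::sync::Arc<str>;

// Targets without them fall back to the single-threaded `Rc`.
#[cfg(not(target_has_atomic = "ptr"))]
pub type SharedStr = std::rc::Rc<str>;

/// Copies `text` into a reference-counted string.
pub fn share_text(text: &str) -> SharedStr {
    SharedStr::from(text)
}
```

Because both `Arc<str>` and `Rc<str>` implement `From<&str>`, `Clone` and `Deref<Target = str>`, code written against the alias compiles unchanged on either kind of target.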
7 changes: 2 additions & 5 deletions Cargo.toml
@@ -14,14 +14,11 @@ rust-version = "1.60"

[workspace]
members = ["benches"]
exclude = ["test-apps", "testing-tools"]

[dependencies]
xmlparser = { version = "0.13.6", default-features = false }
exclude = ["testing-tools"]

[features]
default = ["std", "positions"]
std = ["xmlparser/std"]
std = []
# Enables Nodes and Attributes position in the original document preserving.
# Increases memory usage by `usize` for each Node and Attribute.
positions = []
41 changes: 15 additions & 26 deletions README.md
@@ -18,10 +18,6 @@ assert!(elem.has_tag_name("rect"));
Because in some cases all you need is to retrieve some data from an XML document.
And for such cases, we can make a lot of optimizations.

As for *roxmltree*, it's fast not only because it's read-only, but also because
it uses [xmlparser], which is many times faster than [xml-rs].
See the [Performance](#performance) section for details.

## Parsing behavior

Sadly, XML can be parsed in many different ways. *roxmltree* tries to mimic the
@@ -48,7 +44,7 @@ For more details see [docs/parsing.md](https://github.com/RazrFalcon/roxmltree/b
| Writing | ||||
| No **unsafe** || || |
| Language | Rust | C | Rust | Rust |
| Dependencies | **1** | - | 2 | 2 |
| Dependencies | **0** | - | 2 | 2 |
| Tested version | 0.18.0 | Apple-provided | 0.10.3 | 0.3.2 |
| License | MIT / Apache-2.0 | MIT | MIT | MIT |

@@ -78,50 +74,50 @@ There is also `elementtree` and `treexml` crates, but they are abandoned for a l

## Performance

### Parsing
Here are some benchmarks comparing `roxmltree` to other XML tree libraries.

```text
test huge_roxmltree ... bench: 3,147,424 ns/iter (+/- 49,153)
test huge_roxmltree ... bench: 2,922,042 ns/iter (+/- 111,661)
test huge_libxml2 ... bench: 6,850,666 ns/iter (+/- 306,180)
test huge_sdx_document ... bench: 9,440,412 ns/iter (+/- 117,106)
test huge_xmltree ... bench: 41,662,316 ns/iter (+/- 850,360)
test large_roxmltree ... bench: 1,594,201 ns/iter (+/- 27,425)
test large_roxmltree ... bench: 1,449,773 ns/iter (+/- 98,596)
test large_libxml2 ... bench: 3,250,606 ns/iter (+/- 140,201)
test large_sdx_document ... bench: 4,242,162 ns/iter (+/- 99,740)
test large_xmltree ... bench: 13,980,228 ns/iter (+/- 229,363)
test medium_roxmltree ... bench: 418,929 ns/iter (+/- 4,843)
test medium_roxmltree ... bench: 401,220 ns/iter (+/- 6,064)
test medium_libxml2 ... bench: 950,984 ns/iter (+/- 34,099)
test medium_sdx_document ... bench: 1,618,270 ns/iter (+/- 23,466)
test medium_xmltree ... bench: 4,315,974 ns/iter (+/- 31,849)
test tiny_roxmltree ... bench: 2,654 ns/iter (+/- 103)
test tiny_roxmltree ... bench: 2,482 ns/iter (+/- 128)
test tiny_libxml2 ... bench: 8,931 ns/iter (+/- 235)
test tiny_sdx_document ... bench: 11,658 ns/iter (+/- 82)
test tiny_xmltree ... bench: 20,215 ns/iter (+/- 303)
```

*roxmltree* uses [xmlparser] internally,
while *sdx-document* uses its own implementation,
*xmltree* uses the [xml-rs].
Here is a comparison between *xmlparser*, *xml-rs* and *quick-xml*:
When comparing to streaming XML parsers `roxmltree` is slightly slower than `quick-xml`,
but still way faster than `xmlrs`.
Note that streaming parsers usually do not provide a proper string unescaping,
DTD resolving and namespaces support.

```text
test huge_xmlparser ... bench: 1,672,879 ns/iter (+/- 20,140)
test huge_quick_xml ... bench: 2,396,037 ns/iter (+/- 39,752)
test huge_quick_xml ... bench: 2,922,042 ns/iter (+/- 111,661)
test huge_roxmltree ... bench: 3,147,424 ns/iter (+/- 49,153)
test huge_xmlrs ... bench: 36,258,312 ns/iter (+/- 180,438)
test large_xmlparser ... bench: 730,787 ns/iter (+/- 22,924)
test large_quick_xml ... bench: 1,250,053 ns/iter (+/- 21,943)
test large_roxmltree ... bench: 1,449,773 ns/iter (+/- 98,596)
test large_xmlrs ... bench: 11,239,516 ns/iter (+/- 76,937)
test medium_quick_xml ... bench: 206,232 ns/iter (+/- 2,157)
test medium_xmlparser ... bench: 240,737 ns/iter (+/- 4,531)
test medium_roxmltree ... bench: 401,220 ns/iter (+/- 6,064)
test medium_xmlrs ... bench: 3,975,916 ns/iter (+/- 44,967)
test tiny_xmlparser ... bench: 1,078 ns/iter (+/- 17)
test tiny_quick_xml ... bench: 2,233 ns/iter (+/- 70)
test tiny_roxmltree ... bench: 2,482 ns/iter (+/- 128)
test tiny_xmlrs ... bench: 17,155 ns/iter (+/- 429)
```

@@ -135,7 +131,6 @@ You can try running the benchmarks yourself by running `cargo bench` in the `ben

[xml-rs]: https://crates.io/crates/xml-rs
[quick-xml]: https://crates.io/crates/quick-xml
[xmlparser]: https://crates.io/crates/xmlparser
[rust-libxml]: https://github.com/KWARC/rust-libxml

## Memory overhead
@@ -162,12 +157,6 @@ and at 6.8GB RAM when `positions` is disabled.
- This library must not panic. Any panic should be considered a critical bug and reported.
- This library forbids `unsafe` code.

## Non-goals

- Complete XML support
- Tree modification and writing
- XPath/XQuery

## API

This library uses Rust's idiomatic API based on iterators.
1 change: 0 additions & 1 deletion benches/Cargo.toml
@@ -10,7 +10,6 @@ quick-xml = "0.30"
roxmltree = { path = "../" }
sxd-document = "0.3"
xml-rs = "0.8"
xmlparser = "0.13"
xmltree = "0.10"

# Enables support for libxml for benchmarks. libxml requires native dependencies
49 changes: 3 additions & 46 deletions benches/xml.rs
@@ -1,42 +1,6 @@
use bencher::Bencher;
use bencher::{benchmark_group, benchmark_main};

fn tiny_xmlparser(bencher: &mut Bencher) {
    let text = std::fs::read_to_string("fonts.conf").unwrap();
    bencher.iter(|| {
        for t in xmlparser::Tokenizer::from(text.as_str()) {
            let _ = t.unwrap();
        }
    })
}

fn medium_xmlparser(bencher: &mut Bencher) {
    let text = std::fs::read_to_string("medium.svg").unwrap();
    bencher.iter(|| {
        for t in xmlparser::Tokenizer::from(text.as_str()) {
            let _ = t.unwrap();
        }
    })
}

fn large_xmlparser(bencher: &mut Bencher) {
    let text = std::fs::read_to_string("large.plist").unwrap();
    bencher.iter(|| {
        for t in xmlparser::Tokenizer::from(text.as_str()) {
            let _ = t.unwrap();
        }
    })
}

fn huge_xmlparser(bencher: &mut Bencher) {
    let text = std::fs::read_to_string("huge.xml").unwrap();
    bencher.iter(|| {
        for t in xmlparser::Tokenizer::from(text.as_str()) {
            let _ = t.unwrap();
        }
    })
}

fn tiny_xmlrs(bencher: &mut Bencher) {
    let text = std::fs::read_to_string("fonts.conf").unwrap();
    bencher.iter(|| {
@@ -114,7 +78,9 @@ fn huge_quick_xml(bencher: &mut Bencher) {

fn tiny_roxmltree(bencher: &mut Bencher) {
    let text = std::fs::read_to_string("fonts.conf").unwrap();
    bencher.iter(|| roxmltree::Document::parse(&text).unwrap())
    let mut opt = roxmltree::ParsingOptions::default();
    opt.allow_dtd = true;
    bencher.iter(|| roxmltree::Document::parse_with_options(&text, opt).unwrap())
}

fn medium_roxmltree(bencher: &mut Bencher) {
@@ -343,13 +309,6 @@ benchmark_group!(
    large_sdx_document,
    huge_sdx_document,
);
benchmark_group!(
    xmlparser,
    tiny_xmlparser,
    medium_xmlparser,
    large_xmlparser,
    huge_xmlparser
);
benchmark_group!(xmlrs, tiny_xmlrs, medium_xmlrs, large_xmlrs, huge_xmlrs);
benchmark_group!(
quick_xml,
@@ -372,7 +331,6 @@ benchmark_main!(
roxmltree,
xmltree,
sdx,
xmlparser,
xmlrs,
quick_xml,
roxmltree_iter,
@@ -384,7 +342,6 @@ benchmark_main!(
roxmltree,
xmltree,
sdx,
xmlparser,
xmlrs,
quick_xml,
libxml2,
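
Note on the `tiny_roxmltree` change and the `allow_dtd` changelog entry: the benchmark now builds default options and switches `allow_dtd` on before parsing, presumably because its input contains a DOCTYPE declaration. The sketch below illustrates that opt-in pattern with toy stand-ins. `Options`, `CheckError` and `check_dtd` are hypothetical and deliberately naive; they are not roxmltree's types or its parsing logic.

```rust
// Toy stand-ins for illustration only; NOT roxmltree's real types or parser.

/// Hypothetical options struct with a single opt-in flag.
#[derive(Clone, Copy, Debug, Default)]
pub struct Options {
    /// Whether a `<!DOCTYPE ...>` declaration is accepted. Defaults to `false`.
    pub allow_dtd: bool,
}

/// Hypothetical error type for the check below.
#[derive(Debug, PartialEq, Eq)]
pub enum CheckError {
    DtdDetected,
}

/// Naive substring check: rejects any input that contains a DOCTYPE
/// declaration, including an empty one, unless `opt.allow_dtd` is set.
pub fn check_dtd(text: &str, opt: Options) -> Result<(), CheckError> {
    if !opt.allow_dtd && text.contains("<!DOCTYPE") {
        return Err(CheckError::DtdDetected);
    }
    Ok(())
}
```

With `Options::default()` an input such as `<!DOCTYPE svg><svg/>` is rejected, while setting `allow_dtd = true` on the default value (as the benchmark does with `ParsingOptions`) lets it through.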
32 changes: 29 additions & 3 deletions src/lib.rs
@@ -32,9 +32,12 @@ use core::ops::Range;

use alloc::vec::Vec;

pub use xmlparser::TextPos;

mod parse;
mod tokenizer;

#[cfg(test)]
mod tokenizer_tests;

pub use crate::parse::*;

/// The <http://www.w3.org/XML/1998/namespace> URI.
@@ -47,6 +50,29 @@ pub const NS_XMLNS_URI: &str = "http://www.w3.org/2000/xmlns/";
/// The string 'xmlns', which is used to declare new namespaces
const XMLNS: &str = "xmlns";

/// Position in text.
///
/// Position indicates a row/line and a column in the original text. Starting from 1:1.
#[allow(missing_docs)]
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct TextPos {
    pub row: u32,
    pub col: u32,
}

impl TextPos {
    /// Constructs a new `TextPos`.
    pub fn new(row: u32, col: u32) -> TextPos {
        TextPos { row, col }
    }
}

impl fmt::Display for TextPos {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{}:{}", self.row, self.col)
    }
}

/// An XML tree container.
///
/// A tree consists of [`Nodes`].
@@ -173,7 +199,7 @@ impl<'input> Document<'input> {
    /// ```
    #[inline]
    pub fn text_pos_at(&self, pos: usize) -> TextPos {
        xmlparser::Stream::from(self.text).gen_text_pos_from(pos)
        tokenizer::Stream::new(self.text).gen_text_pos_from(pos)
    }

    /// Returns the input text of the original document.
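
Note on `TextPos` and `text_pos_at`: the diff shows that a byte offset is turned into a 1-based row/column pair, but not how `gen_text_pos_from` computes it. A minimal standalone sketch, assuming rows are counted by `\n` and columns by characters since the last line break. `text_pos_from_offset` is a hypothetical helper, and the clamping of out-of-range offsets is a choice made for this sketch, not the crate's documented behavior.

```rust
use std::fmt;

/// Mirrors the `TextPos` struct from the diff: a 1-based row and column.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct TextPos {
    pub row: u32,
    pub col: u32,
}

impl fmt::Display for TextPos {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{}:{}", self.row, self.col)
    }
}

/// Hypothetical helper: maps a byte offset in `text` to a 1-based row/column.
/// Offsets past the end are clamped to the end of the text, and an offset
/// inside a multi-byte character is moved back to the character's start.
pub fn text_pos_from_offset(text: &str, pos: usize) -> TextPos {
    let mut end = pos.min(text.len());
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    let head = &text[..end];

    // Row: number of line breaks before the offset, plus one.
    let row = head.bytes().filter(|&b| b == b'\n').count() as u32 + 1;

    // Column: characters since the last line break, plus one.
    let line_start = head.rfind('\n').map_or(0, |i| i + 1);
    let col = head[line_start..].chars().count() as u32 + 1;

    TextPos { row, col }
}
```

For the text `"<a>\n  <b/>\n</a>"`, offset 6 points at the `<` of `<b/>` and formats as `2:3` through the `Display` impl.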