Use a better structure instead of `Map` in validators #91

Stranger6667 · 2020-05-24T11:45:12Z

E.g. in AdditionalPropertiesNotEmptyValidator. Because during the validation we use self.properties.contains_key(*property) which will be O(logN) (because of BTreeMap). We can use HashMap to have O(1).

As perf showed, significant time is spent on it

The text was updated successfully, but these errors were encountered:

Stranger6667 · 2020-05-24T12:19:02Z

The first tests showed a significant improvement in compilation but a slowdown in validation. But with FNV hasher it seems faster. Maybe we can avoid contains call at all, e.g. if we will validate over properties

Stranger6667 · 2020-05-24T12:49:35Z

Hm, for a small number of properties a simple Vec will be much faster. Actually I think that it will work for most of the cases, I doubt that there are usually many properties

macisamuele · 2020-05-24T13:09:55Z

@Stranger6667 I would be curious to get to know more how you're actually deriving these conclusions (out of genuine interest).
Could you please post here (or in a gist) the raw data such that I can eventually look at them?
As you've noticed I'm trying to support as much as I can on this project and being able to run similar checks on my side I can avoid to publish PRs for which I know that there is an hit.

macisamuele · 2020-05-24T13:46:48Z

Side note: checking the code specifically I see that we do only access to self.properties keys and never the values.
I think that an "easy" win (at least from memory prospective) might already be to save only the keys and eventually having having a more optimised set representation for small number of entries, maybe like

enum StringSet {
    None,
    Single(String),
    Small(Vec<String>),  // up to 8 values (assuming that micro benching confirms this threshold)
    Large(HashSet<String>),
}

Stranger6667 · 2020-05-24T14:05:34Z

Sure!

Some time ago I tried flame graphs to get a breakdown of what is being executed in the binary. My usual workflow:

I create main.rs with something I'd like to test:

use jsonschema::JSONSchema;
use serde_json::json;

fn main() {
    let schema = json!(...);
    let instance = json!(...);
    let compiled = JSONSchema::compile(&schema, None).unwrap();
    for _ in 0..10000000 {
        compiled.is_valid(&instance);
    }
}

Add debug to the release profile in Cargo.toml

[profile.release]
debug = true

Run perf record to record samples (path to binary is relative to the crate's root)

perf record --call-graph dwarf,16384 -F 999 -g  target/release/jsonschema

Then there will be perf.data file in the current folder

Convert it to a different format for better visualization

perf script | stackcollapse-perf.pl | ./rust-unmangle > output

And rust-unmangle is:

#!/usr/bin/sed -rf
# Unmangle Rust symbols
# See https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=cae15db74999edb96dd9f5bbd4d55849391dd92b
# Example, with [FlameGraph](https://github.com/brendangregg/FlameGraph):
#     perf record -g target/debug/bin
#     perf script | stackcollapse-perf | rust-unmangle | flamegraph > perf.svg

# Remove hash and address offset
s/::h[0-9a-f]{16}//g
s/\+0x[0-9a-f]+//g

# Convert special characters
s/\$C\$/,/g
s/\$SP\$/@/g
s/\$BP\$/*/g
s/\$RF\$/\&/g
s/\$LT\$/</g
s/\$GT\$/>/g
s/\$LP\$/(/g
s/\$RP\$/)/g
s/\$u20\$/ /g
s/\$u27\$/'/g
s/\$u5b\$/[/g
s/\$u5d\$/]/g
s/\$u7b\$/{/g
s/\$u7d\$/}/g
s/\$u7e\$/~/g

# Fix . and _
s/\.\./::/g
s/[^\.]\.[^\.]/./g
s/([;:])_/\1/g

I open this file in https://www.speedscope.app/

And there is a nice representation which is easy to navigate. Basically from that, I found that significant time is spent in contains_key, then it goes to the BTreeMap complexity for this action. Then there are some better options than O(logN) for data access, etc

Could you please post here (or in a gist) the raw data such that I can eventually look at them?

I will do it

As you've noticed I'm trying to support as much as I can on this project and being able to run similar checks on my side I can avoid to publish PRs for which I know that there is an hit.

I want to say that I appreciate your work very much!

a more optimised set representation for small number of entries, maybe like

Yep, it might work! But wouldn't it be more efficient to use a generic here to avoid one indirection level? Not sure how to do it, but if our validator can accept anything that does contains ...

macisamuele · 2020-05-24T21:07:58Z

@Stranger6667 FYI There is a cargo extension that helps you having a more direct access to the flamegraph.
You'll just need to run

cargo flamegraph [--output flamegraph.svg]

I'm considering to create a PR where I add a new directory (perf-helpers) which gives you a simple access to a pre-defined rust-executable which loads at compile time schema and instance from files.
Doing so would easily allow to always run the same command letting you obtaining the results that you're looking for

Stranger6667 · 2020-05-24T21:12:35Z

Thank you for bringing this! Unfortunately, those SVGs were not working properly on my machine (they were not interactive), so I found an alternative. I'll take a deeper look, most probably it is some local misconfiguration / missing package issue

macisamuele · 2020-05-24T21:31:59Z

Unfortunately, those SVGs were not working properly

That's sad ... I hope that you identify a simple tool ;)
The reason for using cargo flamegraph is mostly to have some sort of platform independence (I'm usually working on a MacOS and dtrace is the tool to use there).
But anything that might make streamlined processes would be worth to consider (as we might have PR templates that recommend a set of steps before publishing)

Stranger6667 added Priority: Low Type: Enhancement Topic: Performance labels May 24, 2020

Stranger6667 changed the title ~~Use a simpler structure instead of Map in validators~~ Use a better structure instead of Map in validators May 24, 2020

macisamuele mentioned this issue May 24, 2020

Create perf-helpers subproject to help in running code profiling #93

Merged

Stranger6667 closed this as completed Jul 14, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use a better structure instead of `Map` in validators #91

Use a better structure instead of `Map` in validators #91

Stranger6667 commented May 24, 2020

Stranger6667 commented May 24, 2020 •

edited

Loading

Stranger6667 commented May 24, 2020

macisamuele commented May 24, 2020

macisamuele commented May 24, 2020 •

edited

Loading

Stranger6667 commented May 24, 2020 •

edited

Loading

macisamuele commented May 24, 2020

Stranger6667 commented May 24, 2020

macisamuele commented May 24, 2020

Use a better structure instead of Map in validators #91

Use a better structure instead of Map in validators #91

Comments

Stranger6667 commented May 24, 2020

Stranger6667 commented May 24, 2020 • edited Loading

Stranger6667 commented May 24, 2020

macisamuele commented May 24, 2020

macisamuele commented May 24, 2020 • edited Loading

Stranger6667 commented May 24, 2020 • edited Loading

macisamuele commented May 24, 2020

Stranger6667 commented May 24, 2020

macisamuele commented May 24, 2020

Use a better structure instead of `Map` in validators #91

Use a better structure instead of `Map` in validators #91

Stranger6667 commented May 24, 2020 •

edited

Loading

macisamuele commented May 24, 2020 •

edited

Loading

Stranger6667 commented May 24, 2020 •

edited

Loading