Make various perf improvements #805

mtdowling · 2021-05-17T06:57:15Z

This commit makes various performance improvements that particularly help
when validating models with 100k+ shapes.

* Fixed JMH benchmarks so they work with the updated Gradle version.
* Model now uses a synchronized IdentityHashMap rather than a
  ConcurrentHashMap to back the blackboard cache. This actually
  prevents duplicate work when creating KnowledgeIndexes. An
  IdentityHashMap was used because it works well for the classes
  used to cache knowledge indexes.
* HttpBindingIndex now uses a WeakReference to Model. This was
  previously an unnecessary cyclical reference.
* HttpBindingIndex no longer throws when attempting to access the
  HTTP bindings of shapes that don't exist or aren't operations.
  This prevents it from having to store a List entry for every
  single operation.
* Model#getShapesWithTrait is now used everywhere possible rather
  than streaming over all shapes and looking for traits.
* Optimized NullableIndex so that it no longer needs to
  traverse every shape.
* Removed unnecessary streaming from OperationIndex.
* Removed unnecessary streaming from PaginatedIndex.
* Removed unnecessary streaming from NeighborVisitor.
* Updated Node expect methods to also accept a Supplier that creates
  the error message if the expectation fails. This avoids evaluating
  String.format for node assertions that pass.
* AttributeSelector no longer uses a BiFunction key supplier; instead,
  the attribute path is passed in directly. This also lets the selector
  optimize the check for whether a shape has a trait
  by leveraging Model#getShapesWithTrait.
* InternalSelectors used to implement Selectors now support more general
  optimizations. This was previously hardcoded to only support an optimization
  for selecting shapes by type, but now selecting shapes by trait is
  optimized too.
* Minor optimization to structure and union loading so that when validating
  that members have a shape ID compatible with the container, an intermediate
  shape ID is no longer constructed.
* The ShapeId cache was increased from 1024 to 8192 entries. This helps
  significantly with large models. The ShapeId cache was also updated to
  implement the LRA cache inside of the computeIfAbsent method.
* NodeValidationVisitor now has a Context object that supports caching
  selectors evaluated against a model. This helps significantly with IdRef
  validation. To make this caching reusable, the visitor is now mutable
  after it is constructed.
* NodeValidationVisitor idRef now special cases "*" and uses the context
  cache.
* TraitTargetValidator has been simplified, special cases "*", and now
  uses a cache to speed up evaluating traits that use the same selectors.
* TraitValueValidator now reuses the same NodeValidationVisitor in order
  to reuse the same selector cache.
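
To illustrate the Supplier-based `expect` change described above, here is a minimal, self-contained sketch. The names `LazyExpect`, `expectString`, `describe`, and `MESSAGES_BUILT` are hypothetical and exist only for this example; they are not Smithy's `Node` API. The sketch only shows the general idea: passing a `Supplier<String>` defers `String.format` until a check actually fails.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class LazyExpect {
    // Counts how often an error message is actually built (demonstration only).
    static final AtomicInteger MESSAGES_BUILT = new AtomicInteger();

    // Hypothetical stand-in for an "expect" method: the supplier is invoked
    // only when the value is not a String.
    static String expectString(Object value, Supplier<String> errorMessage) {
        if (value instanceof String) {
            return (String) value;
        }
        throw new IllegalStateException(errorMessage.get());
    }

    // Builds the (relatively expensive) formatted message and records that it ran.
    static String describe(String name, Object value) {
        MESSAGES_BUILT.incrementAndGet();
        return String.format("Expected `%s` to be a string, found %s", name, value);
    }

    public static void main(String[] args) {
        // Passing assertion: the supplier is never called, so String.format never runs.
        expectString("hello", () -> describe("greeting", "hello"));
        System.out.println("messages built after valid check: " + MESSAGES_BUILT.get());

        // Failing assertion: the message is built exactly once, on demand.
        try {
            expectString(42, () -> describe("greeting", 42));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With an eager `String` parameter, the formatted message would be computed on every call, including the common case where the assertion passes.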

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

kstich

Lots of bits that look like they add up, nice!

smithy-model/src/main/java/software/amazon/smithy/model/selector/InternalSelector.java

The `Model#toSet(Class)` method is now public, so getting a set of shapes of a specific type no longer has to always go through a `Stream`.

This commit deprecates Trait#flatMapStream since it results in harder-to-read code that relies on unnamed tuples. Instead, Model#getShapesWithTrait, Shape#hasTrait, and Shape#expectTrait should be used.
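
The suggested pattern can be sketched without any dependency on Smithy. Everything below (`TraitLookupSketch`, its nested `Shape` and `Model` classes, string-keyed traits) is a hypothetical stand-in, not the real Smithy types; the method names merely mirror the ones mentioned above. It assumes the model keeps a trait-to-shapes index, so callers iterate only the matching shapes and read the trait from a named variable instead of unpacking a tuple.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TraitLookupSketch {
    // Hypothetical stand-in for a shape: a name plus trait name -> trait value.
    static final class Shape {
        final String name;
        final Map<String, String> traits = new HashMap<>();

        Shape(String name) {
            this.name = name;
        }

        Shape withTrait(String traitName, String value) {
            traits.put(traitName, value);
            return this;
        }

        boolean hasTrait(String traitName) {
            return traits.containsKey(traitName);
        }

        String expectTrait(String traitName) {
            String value = traits.get(traitName);
            if (value == null) {
                throw new IllegalStateException(name + " is missing trait " + traitName);
            }
            return value;
        }
    }

    // Hypothetical stand-in for a model that indexes shapes by trait once, up front.
    static final class Model {
        private final Map<String, Set<Shape>> shapesByTrait = new HashMap<>();

        Model(List<Shape> shapes) {
            for (Shape shape : shapes) {
                for (String traitName : shape.traits.keySet()) {
                    shapesByTrait.computeIfAbsent(traitName, k -> new LinkedHashSet<>()).add(shape);
                }
            }
        }

        Set<Shape> getShapesWithTrait(String traitName) {
            return shapesByTrait.getOrDefault(traitName, Collections.emptySet());
        }
    }

    // Visits only the indexed shapes; no scan over every shape and no shape/trait tuple.
    static List<String> describeDocumented(Model model) {
        List<String> result = new ArrayList<>();
        for (Shape shape : model.getShapesWithTrait("documentation")) {
            result.add(shape.name + ": " + shape.expectTrait("documentation"));
        }
        return result;
    }

    public static void main(String[] args) {
        Model model = new Model(Arrays.asList(
                new Shape("Foo").withTrait("documentation", "A foo"),
                new Shape("Bar"),
                new Shape("Baz").withTrait("documentation", "A baz")));
        System.out.println(describeDocumented(model));
    }
}
```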

Model validation is disabled when loading the prelude because the prelude is validated during unit tests and the prelude is immutable. However, if the prelude is broken for whatever reason, ERROR events encountered when performing model validation that uses the prelude will still cause an error, meaning the prelude is still validated when actually loading and using other models.

This commit updates the Selector implementation to run each shape through the selector in parallel when working with larger models. This yields roughly a 6-10% performance improvement when loading models with 100k+ shapes. I unscientifically chose the threshold to be 10,000 shapes.
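
A size-gated parallel pass of this kind might look like the following sketch. Only the 10,000 figure comes from the commit message; the class name `ParallelThresholdSketch`, the `select` method, the use of `parallelStream`, and the inclusive `>=` comparison are assumptions made for illustration and do not describe how the Selector implementation is actually written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ParallelThresholdSketch {
    // Threshold taken from the commit message; whether the real comparison is
    // inclusive is an assumption.
    static final int PARALLEL_THRESHOLD = 10_000;

    // Runs each element through the matcher, switching to a parallel stream
    // only when the input is large enough to amortize the coordination cost.
    static <T> List<T> select(List<T> shapes, Predicate<T> matcher) {
        Stream<T> stream = shapes.size() >= PARALLEL_THRESHOLD
                ? shapes.parallelStream()
                : shapes.stream();
        // Collectors.toList() preserves encounter order for ordered sources,
        // so both branches return the same result.
        return stream.filter(matcher).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> small = new ArrayList<>();
        List<Integer> large = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            small.add(i);
        }
        for (int i = 0; i < 20_000; i++) {
            large.add(i);
        }
        System.out.println(select(small, n -> n % 2 == 0).size());
        System.out.println(select(large, n -> n % 2 == 0).size());
    }
}
```

The matcher must be safe to call from multiple threads for the parallel branch to be correct, which is one reason to gate it behind a threshold rather than enable it unconditionally.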

Model now contains methods for every shape type to get a Set of shapes of that type, or a Set of shapes of that type that have a specific trait. These methods are simpler to use than `shapes(Class)` and `toSet(Class)`, and they hopefully encourage the use of the caches that Model uses. These methods also eliminate a lot of boilerplate that getShapesWithTrait previously required when only shapes of a certain type were needed.

mtdowling requested a review from kstich May 17, 2021 06:57

mtdowling requested review from JordonPhillips and srchase as code owners May 17, 2021 06:57

mtdowling force-pushed the perf branch from d32a9a2 to 4f531ab Compare May 17, 2021 07:03

kstich approved these changes May 17, 2021

mtdowling force-pushed the perf branch from 4f531ab to d73eeea Compare May 17, 2021 18:56

mtdowling changed the base branch from cli-output-improvements to main May 17, 2021 18:59

mtdowling force-pushed the perf branch from d73eeea to e56a891 Compare May 17, 2021 18:59

mtdowling added 5 commits May 17, 2021 17:25

Deprecate Trait#flatMapStream

ee58d67

Synchronize NeighborProviderIndex caches

1ad6556

Synchronize TraitCache in Model to prevent recreating it

ffb6c0a

kstich approved these changes May 18, 2021

mtdowling merged commit be0cc88 into main May 18, 2021

mtdowling deleted the perf branch May 21, 2021 20:42
