Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make various perf improvements #805

Merged
merged 7 commits into from
May 18, 2021
Merged

Make various perf improvements #805

merged 7 commits into from
May 18, 2021

Conversation

mtdowling
Copy link
Member

This commit makes various performance improvements that particularly help
when validating models with 100k+ shapes.

  • Fixed JMH benchmarks so they work with the updated Gradle version.
  • Model now uses a synchronized IdentityHashMap now to back the
    blackboard cache rather than a ConcurrentHashMap. This will actually
    now prevent duplicate work in creating KnowledgeIndexes. An
    IdentityHashMap was used because it works well for the classes
    used to cache knowledge indexes.
  • HttpBindingIndex now uses a WeakReference to Model. This was
    previously an unnecessary cyclical reference.
  • HttpBindingIndex no longer throws when attempting to access the
    HTTP bindings of shapes that don't exist or aren't operations.
    This prevents it from having to store a List entry for every
    single operation.
  • Model#getShapesWithTraits was used everywhere possible rather
    than streaming over all shapes and looking for traits.
  • Made optimizations to NullableIndex to no longer need to
    traverse every shape.
  • Removed unnecessary streaming from OperationIndex.
  • Removed unnecessary streaming from PaginatedIndex.
  • Removed unnecessary streaming from NeighborVisitor.
  • Updated Node expect methods to also accept a Supplier to create
    error messages if the expectation fails. This prevents needing to
    evaluate String.format even for valid node assertions.
  • AttributeSelector no longer uses a BiFunction key supplier, and instead
    the attribute path is just passed in. This allows for the selector
    to also perform optimizations when determining if a shape has a trait
    by leveraging Model#getShapesWithTraits.
  • InternalSelectors used to implement Selectors now support more general
    optimizations. This was previously hardcoded to only support an optimization
    for selecting shapes by type, but now selecting shapes by trait is
    optimized too.
  • Minor optimization to structure and union loading so that when validating
    that members have a shape ID compatible with the container, an intermediate
    shape ID no longer is constructed.
  • The ShapeId cache was increased from 1024 to 8192. This helps significantly
    with large models. The ShapeId cache was also updated to implement the
    LRA cache inside of the computeIfAbsent method.
  • NodeValidationVisitor now has a Context object that supports caching
    selectors evaluated against a model. This helps significantly with IdRef
    validation. To make this caching reusable, the visitor is now mutable
    after it is constructed.
  • NodeValidationVisitor idRef now special cases "*" and uses the context
    cache.
  • TraitTargetValidator has been simplified, special cases "*", and now
    uses a cache to speed up evaluating traits that use the same selectors.
  • TraitValueValidator now reuses the same NodeValidationVisitor in order
    to reuse the same selector cache.

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Copy link
Contributor

@kstich kstich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of bits that look like they add up, nice!

@mtdowling mtdowling changed the base branch from cli-output-improvements to main May 17, 2021 18:59
This commit makes various performance improvements that particularly help
when validating models with 100k+ shapes.

* Fixed JMH benchmarks so they work with the updated Gradle version.
* Model now uses a synchronized IdentityHashMap now to back the
  blackboard cache rather than a ConcurrentHashMap. This will actually
  now prevent duplicate work in creating KnowledgeIndexes. An
  IdentityHashMap was used because it works well for the classes
  used to cache knowledge indexes.
* HttpBindingIndex now uses a WeakReference to Model. This was
  previously an unnecessary cyclical reference.
* HttpBindingIndex no longer throws when attempting to access the
  HTTP bindings of shapes that don't exist or aren't operations.
  This prevents it from having to store a List entry for every
  single operation.
* Model#getShapesWithTraits was used everywhere possible rather
  than streaming over all shapes and looking for traits.
* Made optimizations to NullableIndex to no longer need to
  traverse every shape.
* Removed unnecessary streaming from OperationIndex.
* Removed unnecessary streaming from PaginatedIndex.
* Removed unnecessary streaming from NeighborVisitor.
* Updated Node expect methods to also accept a Supplier to create
  error messages if the expectation fails. This prevents needing to
  evaluate String.format even for valid node assertions.
* AttributeSelector no longer uses a BiFunction key supplier, and instead
  the attribute path is just passed in. This allows for the selector
  to also perform optimizations when determining if a shape has a trait
  by leveraging Model#getShapesWithTraits.
* InternalSelectors used to implement Selectors now support more general
  optimizations. This was previously hardcoded to only support an optimization
  for selecting shapes by type, but now selecting shapes by trait is
  optimized too.
* Minor optimization to structure and union loading so that when validating
  that members have a shape ID compatible with the container, an intermediate
  shape ID no longer is constructed.
* The ShapeId cache was increased from 1024 to 8192. This helps significantly
  with large models. The ShapeId cache was also updated to implement the
  LRA cache inside of the computeIfAbsent method.
* NodeValidationVisitor now has a Context object that supports caching
  selectors evaluated against a model. This helps significantly with IdRef
  validation.  To make this caching reusable, the visitor is now mutable
  after it is constructed.
* NodeValidationVisitor idRef now special cases "*" and uses the context
  cache.
* TraitTargetValidator has been simplified, special cases "*", and now
  uses a cache to speed up evaluating traits that use the same selectors.
* TraitValueValidator now reuses the same NodeValidationVisitor in order
  to reuse the same selector cache.
* The `Model#toSet(Class)` method is now public, so getting a set of
  shapes of a specific type no longer has to always go through a
  `Stream`.
This commit deprecates Trait#flatMapStream since it results in harder to
read code that relies on unnamed tuples. Instead,
Model#getShapesWithTraits, Shape#hasTrait, and Shape#expectTrait should
be used.
Model validation is disabled when loading the prelude because the prelude is
validated during unit tests and the prelude is immutable. However, if the
prelude is broken for whatever reason, ERROR events encountered when
performing model validation that uses the prelude will still cause an error,
meaning the prelude is still validated when actually loading and using other
models.
This commit updates the Selector implementation to run each shape
through the selector in parallel when working with larger models. This
adds around a ~6-10% performance improvement when loading models with 100k+
shapes. I unscientifically chose the threshold to be 10,000 shapes.
Model now contains methods for every shape type to get a Set of shapes
or that type or a set of shapes of a type that have a specific trait.
These methods are simpler to use than `shapes(Class)` and
`toSet(Class)`, and they hopefully encourage the use of the caches that
Model uses. These methods also eliminated a lot of boilerplat that
getShapesWithTrait previously required when only shapes of a certain
type were needed.
@mtdowling mtdowling merged commit be0cc88 into main May 18, 2021
@mtdowling mtdowling deleted the perf branch May 21, 2021 20:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants