-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding optional Configuration to Machine that can be used to enable a… #125
Conversation
…dditional NameState re-use where each key subsequence has a single NameState
public class Configuration { | ||
|
||
/** | ||
* Normally, NameStates are re-used for a given key subsequence and pattern if this key subsequence and pattern have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to note that this would need quite a bit of work before it went in the README. I found it really hard to understand. I suggest that the easiest way to make it comprehensible to mere mortals is to include a few examples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure it's "quite a bit of work", but I added to the readme with an example.
*/ | ||
private final boolean additionalNameStateReuse; | ||
|
||
private Configuration(boolean additionalNameStateReuse) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you see this being used for configuration beyond Machine? If not, we can be explicit by calling this class MachingConfiguration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible that some configuration will not be related directly to Machine, but interface-wise, it will always be plumbed in via the Machine. So the name change is reasonable. Although to be fully correct, it should be GenericMachineConfiguration. I will perform this rename, although personally, I don't think the length and clunkiness is worth the explicitness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, it hadn't cross my mind that we'd implement anything other than MachineConfiguration
or Configuration
} | ||
|
||
@Test | ||
public void testApproximateObjectCountEachKeyHasThreePatternsAddedOneAtATimeWithAdditionalNameStateReuse() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be a new benchmark test to check for memory usage https://github.com/aws/event-ruler/blob/main/src/test/software/amazon/event/ruler/Benchmarks.java#L452
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok added. Got these results over three runs:
High NameState Reuse Memory Benchmark
Before: 66.6 (1)
After: 274.4 (223380)
Per rule: 270502 (290)
Low NameState Reuse Memory Benchmark
Before: 66.6 (1)
After: 608.6 (2625460)
Per rule: 705709 (3418)
High NameState Reuse Memory Benchmark
Before: 45.7 (1)
After: 209.3 (223380)
Per rule: 213056 (290)
Low NameState Reuse Memory Benchmark
Before: 66.6 (1)
After: 1438.9 (2625460)
Per rule: 1786864 (3418)
High NameState Reuse Memory Benchmark
Before: 58.2 (1)
After: 221.9 (223380)
Per rule: 213154 (290)
Low NameState Reuse Memory Benchmark
Before: 58.2 (1)
After: 1718.1 (2625460)
Per rule: 2161282 (3418)
which can cause a modest runtime performance regression. This defaults to false for backwards compatibility, but likely, | ||
all but the most latency sensitive of applications would benefit from setting this to true. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it worth setting this true
by default since that's the applicable state for most users?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I know of at least one case where a team will be impacted by setting this to true, I erred on making it false by default to make version upgrades safer/easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 . In that case, it makes sense to be vocal bout the feature in the release notes & any upgrade docs we write; otherwise there's a chance that folks may not use this behaviour as often as we'd like them to.
@@ -56,7 +61,13 @@ public class GenericMachine<T> { | |||
*/ | |||
private final SubRuleContext.Generator subRuleContextGenerator = new SubRuleContext.Generator(); | |||
|
|||
public GenericMachine() {} | |||
public GenericMachine() { | |||
this(new GenericMachineConfiguration.Builder().build()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you see value in accepting the pattern similar to AWS SDK where rather than exposing GenericMachineConfiguation
, we make the publicly avilable builder()
method within this class and let folks use it to build their object. It reduces the need for folks to learn about one more class and let us change our minds for class name between GenericMachineConfiguration
, Configuration
, or something else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that suggestion in that it hides a class. That said, I'm not too worried about reserving the ability to change the name later. One thing that seems weird to me about the suggested approach is that we would need to still expose the default constructor for backwards compatibility and then also have the builder. Feels cleaner to me to have two constructors - one default and one that accepts configuration. So I'd err on leaving it as-is, but I'm happy to change to your approach if you think it's worthwhile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can mark the GenericMachine()
as deprecated to ensure folks are gradually moving to the golden-path. I don't have a good data around it, but this line from the benchmarks tests made me wonder if we'd regret the class name in future and want to change it.
Machine machine = new Machine(
new GenericMachineConfiguration.Builder()
.withAdditionalNameStateReuse(true).build());
If that feels overkill, then let's keep things as is. Otherwise, we can use a builder() method for time-being.
#125) * Adding optional Configuration to Machine that can be used to enable additional NameState re-use where each key subsequence has a single NameState * Updating readme, performing rename, adding benchmark test * Moving Configuration builder into GenericMachine/Machine * Bumping version
…dditional NameState re-use where each key subsequence has a single NameState
Description of changes:
Introducing Configuration for Machines, which can be extended in the future with additional functionality. Right now Configuration offers a single flag. Copying Javadoc for flag as it summarizes the change:
Normally, NameStates are re-used for a given key subsequence and pattern if this key subsequence and pattern have been previously added, or if the current rule has already added a pattern for the given key subsequence. Hence, by default, NameState re-use is opportunistic. But by setting this flag to true, NameState re-use will be forced for a key subsequence. This means that the first pattern being added for a key subsequence for a rule will re-use a NameState if that key subsequence has been added before. Meaning each key subsequence has a single NameState. This improves memory utilization exponentially in some cases but does lead to more sub-rules being stored in individual NameStates, which Ruler sometimes iterates over, which can cause a modest runtime performance regression.
Long previously pushed me in this direction when making memory improvements (#88 (comment)). I made a similar change to this one before (#88 (comment)) but quickly reverted after discovering that some consumers are highly sensitive to the modest runtime increase (#88 (comment)). This is why I am now guarding this improvement behind Configuration. Although I expect the average consumer to benefit from enabling this flag, it will start off as disabled, so as not to break any current consumers who are highly latency sensitive.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.