expect regex extracted tokens in database bloom filters #103
LGTM
    "Null field while creating bloom filter expected <{}>, fpp <{}>, pattern <{}>, search term <{}>",
    expected, fpp, pattern, searchTerm
);
throw new RuntimeException("Object field was null");
this exception message could be a bit clearer
Clarified the exception messages, added tests for exceptions, and removed the use of the .longValue() method in the constructor, which would lead to an NPE.
@@ -78,7 +78,7 @@ public IndexStatementCondition(String value, ConditionConfig config, Condition c

 public Condition condition() {
     if (!config.bloomEnabled()) {
-        LOGGER.debug("Indexstatement reached with bloom disabled");
+        LOGGER.warn("Indexstatement reached with bloom disabled");
would this be more suitable to be a debug log?
Yes, that is better. I lowered it to debug.
public TokenizedValue(String value) {
    this(
        value,
        new HashSet<>(new Tokenizer(32).tokenize(new ByteArrayInputStream(value.getBytes(StandardCharsets.UTF_8))))
not 100% sure if calling tokenizer in constructor is optimal, perhaps it should only be done when tokens are required?
Refactored so that the tokenizer is called only when tokens are needed.
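For illustration, deferring the work could look roughly like the sketch below. It is only a sketch under assumptions: the class name LazyTokenizedValue is hypothetical, and plain whitespace splitting stands in for the project's Tokenizer, whose behavior is not shown in this thread.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: whitespace splitting stands in for the real Tokenizer.
final class LazyTokenizedValue {

    private final String value;

    LazyTokenizedValue(final String value) {
        // The constructor only stores the input, no tokenization happens here.
        this.value = value;
    }

    // Tokens are computed on demand, each time they are requested.
    Set<String> tokens() {
        final Set<String> tokens = new HashSet<>(Arrays.asList(value.trim().split("\\s+")));
        return Collections.unmodifiableSet(tokens);
    }
}
```

Constructing the object stays cheap this way, and the fields remain final, so the object is still immutable.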
Table<?> target = DSL.table(DSL.name("target"));
String searchTerm = "Pattern";
BloomFilterFromRecord filter = new BloomFilterFromRecord(dynamicRecord, target, searchTerm);
Assertions.assertDoesNotThrow(filter::bytes);
no assertions for bytes() result?
Removed, since the test was redundant: the bytes() method is covered by the other tests.
RegexExtractedValue regexValue = new RegexExtractedValue(value, regex);
Set<String> tokens = regexValue.tokens();
Assertions.assertEquals(2, tokens.size());
Assertions.assertTrue(tokens.contains("(important)") && tokens.contains("(very important)"));
separate into two assertions for clarity
Separated into two assertions.
…rify exception messages. Add tests for exceptions.
public byte[] bytes() {
    final BloomFilter filter = create();
    final ByteArrayOutputStream filterBAOS = new ByteArrayOutputStream();
    try {
why not try-with-resources?
refactored to use try-with-resources
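A minimal sketch of what the try-with-resources form might look like. The WritableFilter interface and the FilterBytes class are hypothetical stand-ins, not the code in this PR; the only assumption is that the filter can write itself to an OutputStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a filter that can serialize itself to a stream.
interface WritableFilter {
    void writeTo(OutputStream out) throws IOException;
}

// Hypothetical sketch of a bytes() method using try-with-resources.
final class FilterBytes {

    private final WritableFilter filter;

    FilterBytes(final WritableFilter filter) {
        this.filter = filter;
    }

    byte[] bytes() {
        // The stream is closed automatically, also when writeTo throws.
        try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
            filter.writeTo(baos);
            return baos.toByteArray();
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The stream variable is scoped to the try block, so no explicit close() call or finally block is needed.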
src/main/java/com/teragrep/pth_06/planner/bloomfilter/TableFilterTypesFromMetadata.java
Outdated
 * Filter types of a table that can be inserted into the tables category table
 */
public final class TableFilters {
public final class FilterFromRecordToCategoryTableConsumer implements Consumer<Record> {
Please use the object-oriented approach instead of the functional one here, meaning iterate with a for loop instead of a Consumer.
refactored to use for loop
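To make the difference concrete, here is a small sketch of the two styles side by side. The class and method names are hypothetical, and strings stand in for database records; this is not the refactored code itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: strings stand in for the records being processed.
final class LoopVersusConsumer {

    // Functional style: the per-row work lives in a Consumer passed to forEach.
    static List<String> withConsumer(final List<String> rows) {
        final List<String> statements = new ArrayList<>();
        final Consumer<String> consumer = row -> statements.add("insert " + row);
        rows.forEach(consumer);
        return statements;
    }

    // Loop style: the same work written as an explicit for loop.
    static List<String> withLoop(final List<String> rows) {
        final List<String> statements = new ArrayList<>();
        for (final String row : rows) {
            statements.add("insert " + row);
        }
        return statements;
    }
}
```

Both methods produce the same list. The loop version does not need a separate Consumer class or lambda to carry the per-row logic.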
public final class RegexExtractedValue {

    private final Matcher matcher;
Matcher is stateful (https://docs.oracle.com/javase/8/docs/api/java/util/regex/Matcher.html#find--), so this object is mutable, and mutability is a no-go.
Replaced Matcher with Pattern, which is stateless and immutable.
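A rough sketch of the idea, assuming a hypothetical class named RegexTokens (the actual RegexExtractedValue is not shown in this thread): the compiled Pattern is kept as a field, and a fresh Matcher is created inside the method on every call, so no matching state outlives the call.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of an immutable value that extracts regex matches as tokens.
final class RegexTokens {

    private final String value;
    private final Pattern pattern; // immutable, safe to keep as a field

    RegexTokens(final String value, final String regex) {
        this.value = value;
        this.pattern = Pattern.compile(regex);
    }

    Set<String> tokens() {
        // The stateful Matcher is method-local, so repeated calls give the same result.
        final Matcher matcher = pattern.matcher(value);
        final Set<String> tokens = new HashSet<>();
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return Collections.unmodifiableSet(tokens);
    }
}
```

With a Matcher field, a second call to tokens() would continue from where the previous find() loop stopped and return an empty set unless the matcher was reset first.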
}
final BloomFilter filter = BloomFilter.create(expected.longValue(), fpp);
// if no pattern use tokenized value (currently BLOOMDB.FILTERTYPE.PATTERN is NOT NULL)
if (pattern == null) {
object is configurable
Refactored the object so that it is no longer configurable.
src/main/java/com/teragrep/pth_06/planner/bloomfilter/TokenizedValue.java
@@ -64,18 +65,6 @@ void testSingleToken() {
        Assertions.assertEquals(e, condition.toString());
    }

    @Test
removing tests is not a good idea. please comment why tests were removed
Previously, the result of condition() for PatternMatchCondition(input) varied depending on the number of tokens generated from the input. After the change to support regex-extracted tokens, the input is no longer tokenized, which makes the removed test a duplicate of the first test.
src/test/java/com/teragrep/pth_06/planner/bloomfilter/TableFiltersTest.java
…d and make it unconfigurable, make matcher immutable
        "Trying to insert empty filter, pattern match joined table should always have tokens"
    );
}
final BloomFilter filter = BloomFilter.create(1000, 0.01);
Why 1000 and 0.01 instead of expected and fpp?
Requested some changes. Also, some classes are missing equals or hashCode methods, at least CategoryTableImpl and CategoryTableWithFilters.
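As a reminder of the usual shape, a minimal sketch follows. The NamedTable class and its two fields are hypothetical and chosen only for illustration; the fields of CategoryTableImpl and CategoryTableWithFilters are not shown here.

```java
import java.util.Objects;

// Hypothetical value class showing a matching equals/hashCode pair.
final class NamedTable {

    private final String name;
    private final long expected;

    NamedTable(final String name, final long expected) {
        this.name = name;
        this.expected = expected;
    }

    @Override
    public boolean equals(final Object o) {
        if (this == o) {
            return true;
        }
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        final NamedTable other = (NamedTable) o;
        return expected == other.expected && Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        // Uses the same fields as equals, so equal objects share a hash code.
        return Objects.hash(name, expected);
    }
}
```

With both methods in place, tests can compare objects directly with assertEquals instead of inspecting individual fields.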
    return filterBAOS.toByteArray();
}
catch (IOException e) {
    throw new UncheckedIOException(new IOException("Error writing filter bytes: " + e.getMessage()));
Instead of creating a new IOException here, the existing one could be passed to the UncheckedIOException using the other constructor: UncheckedIOException(String message, IOException cause)
Refactored to use that constructor.
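The difference between the two ways of wrapping can be sketched as follows. WrapExample and its method names are hypothetical; only the two UncheckedIOException constructors come from the JDK.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch comparing two ways to wrap an IOException.
final class WrapExample {

    // Re-wrapping: a new IOException is created, so only the message text of
    // the original survives and its stack trace is lost.
    static UncheckedIOException rewrapped(final IOException e) {
        return new UncheckedIOException(new IOException("Error writing filter bytes: " + e.getMessage()));
    }

    // Chaining: the original exception is kept as the cause, stack trace included.
    static UncheckedIOException chained(final IOException e) {
        return new UncheckedIOException("Error writing filter bytes", e);
    }
}
```

With the chained form, getCause() returns the original exception instance, which makes the real failure point visible in logs.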
    DSL.val(record.getValue(BLOOMDB.FILTERTYPE.ID), ULong.class),
    DSL.val(filter.bytes(), byte[].class)
};
ctx.insertInto(categoryTable).columns(insertFields).values(valueFields).execute();
I looked into the docs, and I wonder if it's possible to return a Query from this function (probably what the values() function returns before execute() is called). Currently it feels wrong that the filters are executed here, which makes this kind of a utility class for the CategoryTable. If it returned the Query (in my mind this is the table filter itself), then it could be executed in the CategoryTableWithFilters object. Does this make any sense?
TableFilters now returns a SafeBatch class that wraps a jooq.Batch class, and CategoryTableWithFilters executes the SafeBatch.
...a/com/teragrep/pth_06/planner/walker/conditions/RegexLikeFiltertypePatternConditionTest.java
src/main/java/com/teragrep/pth_06/planner/bloomfilter/ConditionMatchBloomDBTables.java
...est/java/com/teragrep/pth_06/planner/bloomfilter/TableFilterTypesFromMetadataResultTest.java
Outdated
src/main/java/com/teragrep/pth_06/planner/bloomfilter/SafeBatch.java
Outdated
…d, remove logcaptor dependency
src/test/java/com/teragrep/pth_06/planner/bloomfilter/TokensAsStringsTest.java
src/test/java/com/teragrep/pth_06/planner/bloomfilter/TableFiltersTest.java
src/test/java/com/teragrep/pth_06/planner/bloomfilter/TableFiltersTest.java
Outdated
src/test/java/com/teragrep/pth_06/planner/bloomfilter/TableFiltersTest.java
Outdated
changes lgtm
LGTM
DatabaseTables interface
Tokenizable interface and decorator TokensAsStrings
CategoryTable interface refactored to only have create() method