Pattern acceleration support #65

Closed
wants to merge 13 commits into from

Conversation

elliVM
Contributor

@elliVM elliVM commented Aug 12, 2024

Pattern acceleration

  • Replace the fixed number of bloom filter tables with support for a dynamic number of tables
  • Get bloom filter tables from SQL metadata
  • Check whether the regex patterns stored in each bloom filter table match the tokenized values of the archive query search term
  • Left join only those bloom filter tables that had a pattern match
  • Filter the query using the bloommatch function for each logfile, or check if the filter is null on all joined tables
  • Some refactoring of com.teragrep.pth_06.planner.walker and tests
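
The table selection step in the list above can be sketched in plain Java. This is only an illustration under stated assumptions: the class name `PatternTableSelector`, the map-based shape of the metadata, and the choice of full-string matching (`Matcher.matches()`) are all hypothetical and are not taken from the PR, which builds jOOQ conditions instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper: names and data shapes are illustrative, not the PR's classes.
final class PatternTableSelector {

    // Table name -> regex patterns stored for that table (assumed shape of the metadata).
    private final Map<String, List<Pattern>> tablePatterns;

    PatternTableSelector(Map<String, List<Pattern>> tablePatterns) {
        this.tablePatterns = tablePatterns;
    }

    // Returns the tables where at least one pattern fully matches at least one token.
    // Only these tables would then be left joined into the archive query.
    List<String> matchingTables(Set<String> tokens) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, List<Pattern>> entry : tablePatterns.entrySet()) {
            boolean matched = entry.getValue().stream()
                    .anyMatch(p -> tokens.stream().anyMatch(t -> p.matcher(t).matches()));
            if (matched) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }
}
```

With an IPv4-like pattern on one table and a UUID-like pattern on another, a search term tokenized into an IP address and a plain word would select only the first table.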

@elliVM elliVM requested a review from eemhu August 12, 2024 10:28
database/bloomdb.sql Outdated
Contributor

@eemhu eemhu left a comment

Some of the objects are not immutable. Is that fixable, or perhaps out of scope for this PR?

@elliVM elliVM requested a review from eemhu August 15, 2024 07:40
@elliVM
Contributor Author

elliVM commented Aug 16, 2024

All new classes now have only final values, and their public methods always return the same value after instantiation (weakly immutable). I included some refactoring of old classes, but I think the project would require a larger refactoring to be more object-oriented. I opened project issues regarding that refactoring.
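
As a rough illustration of that property, here is a minimal sketch of a class with only final fields whose public methods return the same value on every call. The class `ImmutableSearchTerm` and its methods are hypothetical and do not appear in the PR; only the field name `bloomTermId` is borrowed from the diff.

```java
import java.util.Locale;

// Hypothetical example class, not part of the PR.
final class ImmutableSearchTerm {

    // All state is final and assigned exactly once in the constructor.
    private final String value;
    private final long bloomTermId;

    ImmutableSearchTerm(String value, long bloomTermId) {
        this.value = value;
        this.bloomTermId = bloomTermId;
    }

    String value() {
        return value;
    }

    long bloomTermId() {
        return bloomTermId;
    }

    // Derived from final state only, so repeated calls give equal results.
    String lowerCased() {
        return value.toLowerCase(Locale.ROOT);
    }
}
```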

eemhu
eemhu previously approved these changes Aug 16, 2024
Contributor

@eemhu eemhu left a comment

Changes look good to me!

@ronja-ui ronja-ui added the review label (Issues or pull requests waiting for a review) Aug 21, 2024
import static com.teragrep.pth_06.jooq.generated.bloomdb.Bloomdb.BLOOMDB;
import static org.jooq.impl.SQLDataType.*;

/**
Member

Please write detailed javadoc about the logic behind the table structure and the query pattern. It is not very obvious to decode the schema and the query from the code.

Contributor Author

39adb13 Added better javadoc for BloomFilterTempTable in an effort to clear up the logic.

.leftJoin(BLOOMDB.FILTERTYPE)
.on(BLOOMDB.FILTERTYPE.ID.eq((Field<ULong>) t.field("filter_type_id")))
.where(finalPatternCondition)
.limit(1)
Member

Why is limit 1 present? Doesn't this restrict the search token to only one of the possible bloom filters as a candidate, instead of having all of them as possible candidates?

Member

Please comment the code to indicate the purpose.

Contributor Author

All tables in bloomdb are checked in turn by the filterTables() function. The limit applies only to the single table being checked and avoids loading all of its rows into memory, since a table qualifies as a filter if it has even just one matching row.

I will add a comment to clarify the purpose.
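
The reasoning behind the limit can be sketched in plain Java as a short-circuiting existence check. This is an analogy only: the PR does this inside a jOOQ query with `.limit(1)`, while the hypothetical `FirstMatchCheck` below iterates over in-memory rows and counts how many were inspected.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

// Hypothetical helper illustrating "one matching row is enough"; not the PR's code.
final class FirstMatchCheck {

    // Returns true as soon as one row satisfies the predicate.
    // Rows after the first match are never inspected, similar in spirit to LIMIT 1.
    static <T> boolean hasAnyMatch(Iterable<T> rows, Predicate<T> matches, AtomicInteger inspected) {
        for (T row : rows) {
            inspected.incrementAndGet();
            if (matches.test(row)) {
                return true;
            }
        }
        return false;
    }
}
```

Each table is still checked on its own, so limiting one table's result to a single row does not prevent other tables from also becoming candidates.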

@eemhu eemhu dismissed their stale review August 23, 2024 12:06

Not knowledgeable enough about this functionality to give explicit approval

.leftJoin(BLOOMDB.FILTERTYPE)
.on(BLOOMDB.FILTERTYPE.ID.eq((Field<ULong>) t.field("filter_type_id")))
.where(finalPatternCondition)
.limit(1)
Member

Please comment the code to indicate the purpose.

private final ConditionConfig config;
private final Set<Table<?>> tableSet;
private final long bloomTermId;
private final List<Condition> conditionCache;
Member

Why is a cache being used?

private final Table<Record> tableName;
private final long bloomTermId;
private final Set<Token> tokenSet;
private final List<Condition> cache;
Member

Why is a cache being used?

Contributor Author

To limit the SQL operations when the Condition has already been generated. Should I refactor them to work without caches?
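
For readers unfamiliar with the trade-off under discussion, a minimal sketch of the caching idea follows, assuming the goal is to run an expensive generation step at most once. The generic `CachedCondition` holder is hypothetical; the PR itself keeps a `List<Condition>` field, and its exact logic is not shown in this conversation.

```java
import java.util.function.Supplier;

// Hypothetical memoizing holder; not the PR's implementation.
final class CachedCondition<T> implements Supplier<T> {

    private final Supplier<T> generator;
    // Mutable internal state: this is what makes the class only "weakly" immutable.
    private T value;
    private boolean generated;

    CachedCondition(Supplier<T> generator) {
        this.generator = generator;
    }

    // Runs the generator on the first call only and returns the stored result afterwards.
    @Override
    public synchronized T get() {
        if (!generated) {
            value = generator.get();
            generated = true;
        }
        return value;
    }
}
```

The cost of this design is hidden mutable state inside an otherwise final class, which is presumably why the reviewer asks whether the cache is needed.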

}

@Test
void fromStringIntendTest() throws Exception {
Member

Changing an existing test will not do; please revert the changes.

@ronja-ui ronja-ui removed the review label (Issues or pull requests waiting for a review) Aug 26, 2024
@elliVM
Contributor Author

elliVM commented Sep 2, 2024

Refactoring of old code split into another PR

@elliVM
Contributor Author

elliVM commented Sep 9, 2024

Moving PR to another branch

Labels
enhancement New feature or request