
Use Kafka Streams state stores to cache scanner results #382

Closed
nscuro opened this issue Mar 11, 2023 · 1 comment
Labels
domain/vuln-analysis enhancement New feature or request p3 Nice-to-have features size/M Medium effort

Comments


nscuro commented Mar 11, 2023

We currently cache scanner results in-memory using quarkus-cache. It is simple and "just works™️".

The downside is that cached results are lost when the vulnerability-analyzer instance restarts, or when a rebalance occurs and partitions are re-assigned among consumer group members.

As an alternative, we can use Kafka Streams' state stores to cache results. Implementation-wise, it will be similar to the stores we already have for retries and batching.

The benefit of state stores is that cache entries will be "backed up" by Kafka changelog topics. So when app instances restart, or partitions get re-assigned, cache entries are not lost.
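The changelog mechanism described above can be illustrated with a small, self-contained sketch. The class names and the list-based "changelog" are purely illustrative stand-ins for a Kafka changelog topic, not code from this repository or from Kafka Streams itself:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: a local key-value store that write-throughs every
// update to an append-only changelog, mirroring how Kafka Streams backs
// state stores with changelog topics.
public class ChangelogBackedStore {

    private final Map<String, String> local = new HashMap<>();
    private final List<String[]> changelog; // stand-in for the changelog topic

    public ChangelogBackedStore(List<String[]> changelog) {
        this.changelog = changelog;
    }

    // Every write goes to both the local store and the changelog.
    public void put(String key, String value) {
        local.put(key, value);
        changelog.add(new String[] {key, value});
    }

    public String get(String key) {
        return local.get(key);
    }

    // After a restart or partition re-assignment, a fresh instance replays
    // the changelog to restore its local state; no cache entries are lost.
    public static ChangelogBackedStore restoreFrom(List<String[]> changelog) {
        ChangelogBackedStore store = new ChangelogBackedStore(changelog);
        for (String[] record : changelog) {
            store.local.put(record[0], record[1]);
        }
        return store;
    }

    public static void main(String[] args) {
        List<String[]> changelog = new ArrayList<>();
        ChangelogBackedStore store = new ChangelogBackedStore(changelog);
        store.put("pkg:maven/acme/lib@1.0.0", "scan-result");

        // Simulated restart: state is rebuilt entirely from the changelog.
        ChangelogBackedStore rebuilt = ChangelogBackedStore.restoreFrom(changelog);
        System.out.println(rebuilt.get("pkg:maven/acme/lib@1.0.0"));
    }
}
```

In Kafka Streams the replay happens automatically during state restoration; the sketch only shows why restarts and re-assignments stop being lossy.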

  • The maximum number of entries must be configurable
  • The maximum validity time of entries must be configurable (TTL semantics)
  • (Optional) It should be possible to choose between in-memory and persistent (RocksDB-backed) state stores
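The first two requirements can be sketched without Kafka Streams at all. Note that Kafka Streams state stores have no built-in TTL, so a real implementation would need a punctuator to expire entries; the minimal sketch below uses a plain `LinkedHashMap` with an injected clock (all names are hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical sketch of the requested cache semantics: a size-bounded
// store with TTL expiry. A LinkedHashMap in access order stands in for
// the state store.
public class BoundedTtlCache<K, V> {

    private record Entry<V>(V value, long writtenAtMillis) {}

    private final int maxEntries;
    private final long ttlMillis;
    private final LongSupplier clock; // injected so expiry is testable
    private final LinkedHashMap<K, Entry<V>> entries;

    public BoundedTtlCache(int maxEntries, long ttlMillis, LongSupplier clock) {
        this.maxEntries = maxEntries;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        // Access-order map: once the configured maximum size is exceeded,
        // the least recently used entry is evicted.
        this.entries = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Entry<V>> eldest) {
                return size() > BoundedTtlCache.this.maxEntries;
            }
        };
    }

    public void put(K key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong()));
    }

    public V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() - entry.writtenAtMillis() > ttlMillis) {
            entries.remove(key); // lazy TTL expiry on read
            return null;
        }
        return entry.value();
    }

    public int size() {
        return entries.size();
    }
}
```

With a state store, the same bookkeeping would store the write timestamp alongside the value and purge expired entries from a scheduled punctuation.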
@nscuro nscuro added enhancement New feature or request p3 Nice-to-have features domain/vuln-analysis size/M Medium effort labels Mar 11, 2023

nscuro commented Aug 4, 2023

Reflecting on this, I no longer believe this is a good idea.

Local stores backed by changelog topics are nice in theory, but in practice they add a resource overhead we would much rather avoid.

Since Quarkus 3.x, quarkus-cache supports Redis as an external backend. The concern about cache entries being lost upon restart thus no longer applies.
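Switching the backend is a configuration change, roughly along these lines. This is a sketch assuming the `quarkus-redis-cache` extension; the exact property names and values should be verified against the Quarkus documentation:

```properties
# Assumed config for a Redis-backed quarkus-cache (verify against Quarkus docs).
# Entries survive application restarts because they live in Redis, not on-heap.
quarkus.redis.hosts=redis://localhost:6379
quarkus.cache.redis.expire-after-write=6H
```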

In order to resolve #215, I think we should instead refactor the scanners to use Confluent's Parallel Consumer. The logic of coordinating and aggregating scanner results will still be performed by Kafka Streams, but the heavy lifting of retries, batching, and parallelism for scans should be delegated to the Parallel Consumer. See also #346.
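The core processing model the Parallel Consumer offers (key-ordered concurrency: records sharing a key run sequentially, different keys run in parallel) can be illustrated with a plain-executor sketch. This is not the library's API; the real `confluent-parallel-consumer` additionally handles offset tracking, retries, and batching:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative sketch of key-ordered parallel processing: records with the
// same key are routed to the same single-threaded worker, preserving per-key
// order while different keys proceed concurrently.
public class KeyOrderedProcessor implements AutoCloseable {

    private final ExecutorService[] workers;

    public KeyOrderedProcessor(int concurrency) {
        workers = new ExecutorService[concurrency];
        for (int i = 0; i < concurrency; i++) {
            workers[i] = Executors.newSingleThreadExecutor();
        }
    }

    // Same key -> same worker -> sequential execution for that key.
    public void submit(String key, Runnable task) {
        int worker = Math.floorMod(key.hashCode(), workers.length);
        workers[worker].execute(task);
    }

    @Override
    public void close() throws InterruptedException {
        for (ExecutorService worker : workers) {
            worker.shutdown();
            worker.awaitTermination(5, TimeUnit.SECONDS);
        }
    }
}
```

Delegating this to the Parallel Consumer avoids building the retry and batching machinery ourselves on top of Kafka Streams state stores.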

@nscuro nscuro closed this as not planned on Aug 4, 2023