
Use Kafka Streams state stores to cache scanner results #382

Closed
nscuro opened this issue Mar 11, 2023 · 1 comment
Labels
domain/vuln-analysis enhancement New feature or request p3 Nice-to-have features size/M Medium effort

Comments


nscuro commented Mar 11, 2023

We currently cache scanner results in-memory using quarkus-cache. It is simple and "just works™️".

The downside is that cached results are lost when the vulnerability-analyzer instance restarts, or when a rebalance occurs and partitions are re-assigned among consumer group members.

As an alternative, we can use Kafka Streams' state stores to cache results. Implementation-wise, it will be similar to the stores we already have for retries and batching.

The benefit of state stores is that cache entries will be "backed up" by Kafka changelog topics. So when app instances restart, or partitions get re-assigned, cache entries are not lost.
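The changelog mechanism described above can be illustrated with a small, self-contained sketch. The class names and the list-based "changelog" are purely illustrative stand-ins for a Kafka changelog topic, not code from this repository or from Kafka Streams itself:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: a local key-value store that write-throughs every
// update to an append-only changelog, mirroring how Kafka Streams backs
// state stores with changelog topics.
public class ChangelogBackedStore {

    private final Map<String, String> local = new HashMap<>();
    private final List<String[]> changelog; // stand-in for the changelog topic

    public ChangelogBackedStore(List<String[]> changelog) {
        this.changelog = changelog;
    }

    // Every write goes to both the local store and the changelog.
    public void put(String key, String value) {
        local.put(key, value);
        changelog.add(new String[] {key, value});
    }

    public String get(String key) {
        return local.get(key);
    }

    // After a restart or partition re-assignment, a fresh instance replays
    // the changelog to restore its local state; no cache entries are lost.
    public static ChangelogBackedStore restoreFrom(List<String[]> changelog) {
        ChangelogBackedStore store = new ChangelogBackedStore(changelog);
        for (String[] record : changelog) {
            store.local.put(record[0], record[1]);
        }
        return store;
    }

    public static void main(String[] args) {
        List<String[]> changelog = new ArrayList<>();
        ChangelogBackedStore store = new ChangelogBackedStore(changelog);
        store.put("pkg:maven/acme/lib@1.0.0", "scan-result");

        // Simulated restart: state is rebuilt entirely from the changelog.
        ChangelogBackedStore rebuilt = ChangelogBackedStore.restoreFrom(changelog);
        System.out.println(rebuilt.get("pkg:maven/acme/lib@1.0.0"));
    }
}
```

In Kafka Streams the replay happens automatically during state restoration; the sketch only shows why restarts and re-assignments stop being lossy.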

  • The maximum number of entries must be configurable
  • The maximum validity time of entries must be configurable (TTL semantics)
  • (Optional) It should be possible to choose between in-memory and persistent (RocksDB-backed) state stores
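The first two requirements can be sketched without Kafka Streams at all. Note that Kafka Streams state stores have no built-in TTL, so a real implementation would need a punctuator to expire entries; the minimal sketch below uses a plain `LinkedHashMap` with an injected clock (all names are hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical sketch of the requested cache semantics: a size-bounded
// store with TTL expiry. A LinkedHashMap in access order stands in for
// the state store.
public class BoundedTtlCache<K, V> {

    private record Entry<V>(V value, long writtenAtMillis) {}

    private final int maxEntries;
    private final long ttlMillis;
    private final LongSupplier clock; // injected so expiry is testable
    private final LinkedHashMap<K, Entry<V>> entries;

    public BoundedTtlCache(int maxEntries, long ttlMillis, LongSupplier clock) {
        this.maxEntries = maxEntries;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        // Access-order map: once the configured maximum size is exceeded,
        // the least recently used entry is evicted.
        this.entries = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Entry<V>> eldest) {
                return size() > BoundedTtlCache.this.maxEntries;
            }
        };
    }

    public void put(K key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong()));
    }

    public V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() - entry.writtenAtMillis() > ttlMillis) {
            entries.remove(key); // lazy TTL expiry on read
            return null;
        }
        return entry.value();
    }

    public int size() {
        return entries.size();
    }
}
```

With a state store, the same bookkeeping would store the write timestamp alongside the value and purge expired entries from a scheduled punctuation.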
@nscuro nscuro added enhancement New feature or request p3 Nice-to-have features domain/vuln-analysis size/M Medium effort labels Mar 11, 2023

nscuro commented Aug 4, 2023

Reflecting on this, I no longer believe this is a good idea.

Local stores backed by changelog topics are nice in theory, but in practice they add a resource overhead we would much rather avoid.

Since Quarkus 3.x, quarkus-cache supports Redis as an external backend. The concern about cache entries being lost upon restart thus no longer applies.
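Switching the backend is a configuration change, roughly along these lines. This is a sketch assuming the `quarkus-redis-cache` extension; the exact property names and values should be verified against the Quarkus documentation:

```properties
# Assumed config for a Redis-backed quarkus-cache (verify against Quarkus docs).
# Entries survive application restarts because they live in Redis, not on-heap.
quarkus.redis.hosts=redis://localhost:6379
quarkus.cache.redis.expire-after-write=6H
```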

In order to resolve #215, I think we should instead refactor the scanners to use Confluent's Parallel Consumer. The logic of coordinating and aggregating scanner results will still be performed by Kafka Streams, but the heavy lifting of retries, batching, and parallelism for scans should be delegated to the Parallel Consumer. See also #346.
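The core processing model the Parallel Consumer offers (key-ordered concurrency: records sharing a key run sequentially, different keys run in parallel) can be illustrated with a plain-executor sketch. This is not the library's API; the real `confluent-parallel-consumer` additionally handles offset tracking, retries, and batching:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative sketch of key-ordered parallel processing: records with the
// same key are routed to the same single-threaded worker, preserving per-key
// order while different keys proceed concurrently.
public class KeyOrderedProcessor implements AutoCloseable {

    private final ExecutorService[] workers;

    public KeyOrderedProcessor(int concurrency) {
        workers = new ExecutorService[concurrency];
        for (int i = 0; i < concurrency; i++) {
            workers[i] = Executors.newSingleThreadExecutor();
        }
    }

    // Same key -> same worker -> sequential execution for that key.
    public void submit(String key, Runnable task) {
        int worker = Math.floorMod(key.hashCode(), workers.length);
        workers[worker].execute(task);
    }

    @Override
    public void close() throws InterruptedException {
        for (ExecutorService worker : workers) {
            worker.shutdown();
            worker.awaitTermination(5, TimeUnit.SECONDS);
        }
    }
}
```

Delegating this to the Parallel Consumer avoids building the retry and batching machinery ourselves on top of Kafka Streams state stores.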

@nscuro nscuro closed this as not planned on Aug 4, 2023