NormalizeCaching cache grows without bound #2484

easel · 2022-05-19T19:35:14Z

Version: 3.16.5
Module: quill-jdbc-zio
Database: postgres

Expected behavior

JVM does not run OOM after running a lot of dynamic queries

Actual behavior

JVM runs OOM

Steps to reproduce the behavior

I've somehow managed to construct a dynamic query. My assumption is that because the query is dynamic, it's being compiled and run through NormalizedCaching and sticking some changing element (probably a date range from the query) into the cache. This is all fine except the cache never evicts anything and the jvm runs out of heap.

Workaround

Don't use dynamic queries?

Notes

I assume the thing to do here is put a bound on the cache size, possibly even evicting the entire thing in case the bound is met. This seems like a sufficient edge case that it's not worth introducing a proper LRU cache. The ConcurrentHashMap at

zio-quill/quill-engine/src/main/scala/io/getquill/norm/NormalizeCaching.scala

Line 7 in 52011e6

private val cache = new ConcurrentHashMap[Ast, Ast]

is the element that's growing without bound for me.

An alternative solution would be to somehow prevent high entropy nodes from going into the cache. Not sure what this would look like as of yet, but I'm going to try (again) to make this query compile statically and may learn something in the process.

@getquill/maintainers

easel · 2022-05-19T19:47:24Z

As an aside, I was able to get the dynamic query to compile statically by removing a type ascription to Quoted[Query[T]] from a sub-query. In general, I have found that IntelliJ fails to fully infer the correct types for quoted queries to be composed correctly, which can cause some confusion. Seems like an issue that might be worth calling out in the docs, although perhaps it's there and I simply haven't noticed it?

I will report back if removing the dynamic query resolves the memory leak. I'm optimistic it will.

easel · 2022-05-20T01:40:31Z

For reference "don't ascribe types to quotations" best practice is reference in the docs, although not that loudly. https://github.com/zio/zio-quill#compile-time-quotations

easel · 2022-05-20T01:41:47Z

And in other news, the root cause was indeed the dynamic query. Now that it's static the memory usage is completely stable. I'd argue this issue should still be fixed, however, it's quite unexpected for a library such as quill to have a fixed and growing multi-gb memory footprint.

deusaquilus · 2022-05-20T02:11:19Z

I think this is a perfect use-case for ZIO cache since it has a fixed size eviction ordering. Will mention this to @adamgfraser who will hopefully have some helpful advice.

jilen · 2022-05-20T04:47:33Z

I've used a lot of dynamic query and haven't see the OOM yet.
Do you mind show me what kind of dynamic query do you have ?

easel · 2022-05-20T08:55:39Z

@deusaquilus The query looks like the following. Note that removing the errant type ascription on selectNextQuery allows it to compile statically, which seems to resolve the cache explosion. The only input value to the query that changes regularly is completedBefore, but this query gets run to poll for new work so any memory leak compounds quite quickly.

  implicit class Ext[T](q: Query[T]) {
    def forUpdateSkipLocked() = quote(
      infix"$q FOR UPDATE SKIP LOCKED".pure.as[Query[T]]
    )
  }

  implicit class InstantOps(left: Instant) {
    /* The types for these operators are some sort of complex existential. Don't
     * try to use IntelliJ's 'add explicit type' refactoring on then, it won't get
     * the full type and will break quoted expressions.
     */
    def >(right: Instant) = quote(infix"$left > $right".pure.as[Boolean])
    def >=(right: Instant) = quote(infix"$left >= $right".pure.as[Boolean])
    def <(right: Instant) = quote(infix"$left < $right".pure.as[Boolean])
    def <=(right: Instant) = quote(infix"$left <= $right".pure.as[Boolean])
  }

  /** For a given connector id, deterministically compute it's bucket out of
    * `buckets` using a given `salt`. This allows N nodes where N < buckets to
    * pull the 'next' record biased towards each node pulling the oldest
    * connector in it's top bucket instead of always getting the oldest across
    * all nodes.
    * @param s1
    */
  implicit private class ConnectorIdOps(s1: ConnectorId) {
    def bucket(buckets: Int, salt: String) = quote(
      infix"substr('x' || md5(cast($s1 as text) || $salt), 1, 8)::bit(32)::int % $buckets".pure
        .as[String]
    )
  }

  case class State(
    accountId: AccountId,
    connectorId: ConnectorId,
    eventClassStates: Option[Json] = None,
    lastStart: Option[Instant] = None,
    lastFinish: Option[Instant] = None,
    lastForwardComplete: Option[Instant] = None,
    lastBackwardComplete: Option[Instant] = None,
    stored: Long = 0
  )

  private def selectNextQuery(
    id: String,
    completedBefore: Instant
  ): Quoted[Query[ConnectorId]] =
    quote(
      query[state]
        .filter { x =>
          (x.lastStart.isEmpty
            || x.lastStart.exists(s => s < lift(completedBefore))
            || x.lastStart.exists(s => x.lastFinish.exists(f => f >= s))) &&
          (x.lastBackwardComplete.isEmpty ||
            x.lastForwardComplete.isEmpty ||
            x.lastForwardComplete.exists(x => x < lift(completedBefore)))
        }
        .sortBy(x =>
          (x.connectorId.bucket(lift(buckets), lift(id)), x.lastStart)
        )
        .map(_.connectorId)
        .forUpdateSkipLocked()
        .take(1)
    )

  def getNext(
    id: String,
    completedBefore: Instant,
    asOf: Instant
  ): ZIO[Has[DataSource], SQLException, Option[State]] =
    run(quote {
      query[State]
        .filter(x =>
          selectNextQuery(id, completedBefore).contains(x.connectorId)
        )
        .update(x => x.lastStart -> lift(Option(asOf)))
        .returning(r => r)
    })
      .catchSomeDefect {
        case e: IllegalStateException
            if e.getMessage.contains("Expected a single result") =>
          ZIO.none
      }

jilen · 2022-05-20T09:33:13Z

I'll take a look at it.
In theory it should generate very few Ast elements.

jilen · 2022-05-27T02:06:52Z

Can you provide a minimal self-contained reproduce example? I try you example but it lack some context and won't compile

Hiroki6 · 2023-02-13T11:05:32Z

I also saw a similar issue with quill-cassandra version 3.12.0 and 4.6.0 even though I didn't see this issue when I used quill-cassandra version 3.4.0.
After I changed all dynamic queries into static ones. The issue was gone.
I saw scala.collection.mutable.LinkedEntry used a lot of memory

matwojcik · 2023-06-07T09:19:47Z

We've been hit by this bug. Complex dynamic query with 3 filterOpt parameters. Constant 24/7 traffic of 1k rpm per instance results in the cache growth to 350 MBs in couple of hours.

After rewrite to compiled query the issue was resolved.

wb14123 · 2023-09-27T00:26:38Z

I'm having the same issue. My dynamic query is quite simple: just update the row based on whether the passed in parameters are None or not:

  override def update(id: ID, updater: SourceUpdater): IO[Boolean] = {
    Clock[IO].realTimeInstant.flatMap { nowInstant =>
      val now = ZonedDateTime.ofInstant(nowInstant, ZoneId.systemDefault())
      val q = dynamicQuery[Source]
        .filter(_.id == lift(id))
        .update(
          setOpt(_.description, updater.description),
          setOpt(_.fetchCompletedAt, updater.fetchCompletedAt),
          setOpt(_.fetchStatus, updater.fetchStatus),
          setOpt(_.fetchDelayMillis, updater.fetchDelayMillis),
          setOpt(_.fetchFailedMsg, updater.fetchFailedMsg),
          setOpt(_.fetchScheduledAt, updater.fetchScheduledAt),
          setOpt(_.htmlUrl, updater.htmlUrl),
          setOpt(_.title, updater.title),
          setOpt(_.fetchStartedAt, updater.fetchStartedAt),
          setOpt(_.articleOrder, updater.articleOrder),
          setOpt(_.iconUrl, updater.iconUrl),
          setOpt(_.updatedAt, Some(now)),
          setOpt(_.fetchErrorCount, updater.fetchErrorCount),
          setOpt(_.showTitle, updater.showTitle),
          setOpt(_.showMedia, updater.showMedia),
          setOpt(_.showFullArticle, updater.showFullArticle),
        )
      run(q).transact(xa).map(_ > 0)
    }
  }

This fixes zio#2484. Added a capacity to normalize cache. Default to 40960 and can be override by `quill.query.cacheDynamicMaxSize`. Added a new dependency Google Guava to use its cache implementation. I know it's better to avoid new dependency but Google Guava is a pretty widely adopted library. It's cache implementation is widely tested. For such a critical use case in quill, I think it's better to have a widely tested implementation than reinventing the wheel.

This fixes zio#2484. Added a capacity to normalize cache. Default to 1024 and can be override by `quill.query.cacheDynamicMaxSize`. Added a new dependency Google Guava to use its cache implementation. I know it's better to avoid new dependency but Google Guava is a pretty widely adopted library. It's cache implementation is widely tested. For such a critical use case in quill, I think it's better to have a widely tested implementation than reinventing the wheel.

This fixes zio#2484. Added a capacity to normalize cache. Default to 1024 and can be override by `quill.query.cacheDynamicMaxSize`. Added a new dependency Caffeine to use its cache implementation, which is recommended by Guava. For a critical use case in quill, I think it's better to have a widely tested implementation than reinventing the wheel.

This fixes #2484. Added a capacity to normalize cache. Default to 1024 and can be override by `quill.query.cacheDynamicMaxSize`. Added a new dependency Caffeine to use its cache implementation, which is recommended by Guava. For a critical use case in quill, I think it's better to have a widely tested implementation than reinventing the wheel. Co-authored-by: Jules Ivanic <guizmaii@users.noreply.github.com>

easel mentioned this issue May 20, 2022

WIP Clear out NoramlizeCache if size exceeds 100 entries #2485

Closed

5 tasks

deusaquilus mentioned this issue May 24, 2022

To consider making back the quill-core free of ZIO dependency #2493

Closed

Hiroki6 mentioned this issue Feb 9, 2023

use static query as much as possible instead of dynamic query ubirch/ubirch-event-log#91

Merged

guizmaii added easy good first issue labels Jun 13, 2023

wb14123 mentioned this issue Sep 27, 2023

Add a max size for normalize cache #2878

Merged

5 tasks

guizmaii closed this as completed in #2878 Sep 29, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NormalizeCaching cache grows without bound #2484

NormalizeCaching cache grows without bound #2484

easel commented May 19, 2022 •

edited

Loading

easel commented May 19, 2022

easel commented May 20, 2022

easel commented May 20, 2022

deusaquilus commented May 20, 2022 •

edited

Loading

jilen commented May 20, 2022 •

edited

Loading

easel commented May 20, 2022

jilen commented May 20, 2022 •

edited

Loading

jilen commented May 27, 2022

Hiroki6 commented Feb 13, 2023 •

edited

Loading

matwojcik commented Jun 7, 2023

wb14123 commented Sep 27, 2023

NormalizeCaching cache grows without bound #2484

NormalizeCaching cache grows without bound #2484

Comments

easel commented May 19, 2022 • edited Loading

Expected behavior

Actual behavior

Steps to reproduce the behavior

Workaround

Notes

easel commented May 19, 2022

easel commented May 20, 2022

easel commented May 20, 2022

deusaquilus commented May 20, 2022 • edited Loading

jilen commented May 20, 2022 • edited Loading

easel commented May 20, 2022

jilen commented May 20, 2022 • edited Loading

jilen commented May 27, 2022

Hiroki6 commented Feb 13, 2023 • edited Loading

matwojcik commented Jun 7, 2023

wb14123 commented Sep 27, 2023

easel commented May 19, 2022 •

edited

Loading

deusaquilus commented May 20, 2022 •

edited

Loading

jilen commented May 20, 2022 •

edited

Loading

jilen commented May 20, 2022 •

edited

Loading

Hiroki6 commented Feb 13, 2023 •

edited

Loading