Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NormalizeCaching cache grows without bound #2484

Closed
easel opened this issue May 19, 2022 · 11 comments · Fixed by #2878
Closed

NormalizeCaching cache grows without bound #2484

easel opened this issue May 19, 2022 · 11 comments · Fixed by #2878

Comments

@easel
Copy link

easel commented May 19, 2022

Version: 3.16.5
Module: quill-jdbc-zio
Database: postgres

Expected behavior

JVM does not run OOM after running a lot of dynamic queries

Actual behavior

JVM runs OOM

Steps to reproduce the behavior

I've somehow managed to construct a dynamic query. My assumption is that because the query is dynamic, it's being compiled and run through NormalizedCaching and sticking some changing element (probably a date range from the query) into the cache. This is all fine except the cache never evicts anything and the jvm runs out of heap.

Workaround

Don't use dynamic queries?

Notes

I assume the thing to do here is put a bound on the cache size, possibly even evicting the entire thing in case the bound is met. This seems like a sufficient edge case that it's not worth introducing a proper LRU cache. The ConcurrentHashMap at

private val cache = new ConcurrentHashMap[Ast, Ast]
is the element that's growing without bound for me.

An alternative solution would be to somehow prevent high entropy nodes from going into the cache. Not sure what this would look like as of yet, but I'm going to try (again) to make this query compile statically and may learn something in the process.

@getquill/maintainers

@easel
Copy link
Author

easel commented May 19, 2022

As an aside, I was able to get the dynamic query to compile statically by removing a type ascription to Quoted[Query[T]] from a sub-query. In general, I have found that IntelliJ fails to fully infer the correct types for quoted queries to be composed correctly, which can cause some confusion. Seems like an issue that might be worth calling out in the docs, although perhaps it's there and I simply haven't noticed it?

I will report back if removing the dynamic query resolves the memory leak. I'm optimistic it will.

@easel
Copy link
Author

easel commented May 20, 2022

For reference "don't ascribe types to quotations" best practice is reference in the docs, although not that loudly. https://github.com/zio/zio-quill#compile-time-quotations

@easel
Copy link
Author

easel commented May 20, 2022

And in other news, the root cause was indeed the dynamic query. Now that it's static the memory usage is completely stable. I'd argue this issue should still be fixed, however, it's quite unexpected for a library such as quill to have a fixed and growing multi-gb memory footprint.

@deusaquilus
Copy link
Collaborator

deusaquilus commented May 20, 2022

I think this is a perfect use-case for ZIO cache since it has a fixed size eviction ordering. Will mention this to @adamgfraser who will hopefully have some helpful advice.

@jilen
Copy link
Collaborator

jilen commented May 20, 2022

I've used a lot of dynamic query and haven't see the OOM yet.
Do you mind show me what kind of dynamic query do you have ?

@easel
Copy link
Author

easel commented May 20, 2022

@deusaquilus The query looks like the following. Note that removing the errant type ascription on selectNextQuery allows it to compile statically, which seems to resolve the cache explosion. The only input value to the query that changes regularly is completedBefore, but this query gets run to poll for new work so any memory leak compounds quite quickly.

  implicit class Ext[T](q: Query[T]) {
    def forUpdateSkipLocked() = quote(
      infix"$q FOR UPDATE SKIP LOCKED".pure.as[Query[T]]
    )
  }

  implicit class InstantOps(left: Instant) {
    /* The types for these operators are some sort of complex existential. Don't
     * try to use IntelliJ's 'add explicit type' refactoring on then, it won't get
     * the full type and will break quoted expressions.
     */
    def >(right: Instant) = quote(infix"$left > $right".pure.as[Boolean])
    def >=(right: Instant) = quote(infix"$left >= $right".pure.as[Boolean])
    def <(right: Instant) = quote(infix"$left < $right".pure.as[Boolean])
    def <=(right: Instant) = quote(infix"$left <= $right".pure.as[Boolean])
  }

  /** For a given connector id, deterministically compute it's bucket out of
    * `buckets` using a given `salt`. This allows N nodes where N < buckets to
    * pull the 'next' record biased towards each node pulling the oldest
    * connector in it's top bucket instead of always getting the oldest across
    * all nodes.
    * @param s1
    */
  implicit private class ConnectorIdOps(s1: ConnectorId) {
    def bucket(buckets: Int, salt: String) = quote(
      infix"substr('x' || md5(cast($s1 as text) || $salt), 1, 8)::bit(32)::int % $buckets".pure
        .as[String]
    )
  }

  case class State(
    accountId: AccountId,
    connectorId: ConnectorId,
    eventClassStates: Option[Json] = None,
    lastStart: Option[Instant] = None,
    lastFinish: Option[Instant] = None,
    lastForwardComplete: Option[Instant] = None,
    lastBackwardComplete: Option[Instant] = None,
    stored: Long = 0
  )

  private def selectNextQuery(
    id: String,
    completedBefore: Instant
  ): Quoted[Query[ConnectorId]] =
    quote(
      query[state]
        .filter { x =>
          (x.lastStart.isEmpty
            || x.lastStart.exists(s => s < lift(completedBefore))
            || x.lastStart.exists(s => x.lastFinish.exists(f => f >= s))) &&
          (x.lastBackwardComplete.isEmpty ||
            x.lastForwardComplete.isEmpty ||
            x.lastForwardComplete.exists(x => x < lift(completedBefore)))
        }
        .sortBy(x =>
          (x.connectorId.bucket(lift(buckets), lift(id)), x.lastStart)
        )
        .map(_.connectorId)
        .forUpdateSkipLocked()
        .take(1)
    )

  def getNext(
    id: String,
    completedBefore: Instant,
    asOf: Instant
  ): ZIO[Has[DataSource], SQLException, Option[State]] =
    run(quote {
      query[State]
        .filter(x =>
          selectNextQuery(id, completedBefore).contains(x.connectorId)
        )
        .update(x => x.lastStart -> lift(Option(asOf)))
        .returning(r => r)
    })
      .catchSomeDefect {
        case e: IllegalStateException
            if e.getMessage.contains("Expected a single result") =>
          ZIO.none
      }

@jilen
Copy link
Collaborator

jilen commented May 20, 2022

I'll take a look at it.
In theory it should generate very few Ast elements.

@jilen
Copy link
Collaborator

jilen commented May 27, 2022

Can you provide a minimal self-contained reproduce example? I try you example but it lack some context and won't compile

@Hiroki6
Copy link

Hiroki6 commented Feb 13, 2023

I also saw a similar issue with quill-cassandra version 3.12.0 and 4.6.0 even though I didn't see this issue when I used quill-cassandra version 3.4.0.
After I changed all dynamic queries into static ones. The issue was gone.
I saw scala.collection.mutable.LinkedEntry used a lot of memory

@matwojcik
Copy link

We've been hit by this bug. Complex dynamic query with 3 filterOpt parameters. Constant 24/7 traffic of 1k rpm per instance results in the cache growth to 350 MBs in couple of hours.

After rewrite to compiled query the issue was resolved.

@wb14123
Copy link
Contributor

wb14123 commented Sep 27, 2023

I'm having the same issue. My dynamic query is quite simple: just update the row based on whether the passed in parameters are None or not:

  override def update(id: ID, updater: SourceUpdater): IO[Boolean] = {
    Clock[IO].realTimeInstant.flatMap { nowInstant =>
      val now = ZonedDateTime.ofInstant(nowInstant, ZoneId.systemDefault())
      val q = dynamicQuery[Source]
        .filter(_.id == lift(id))
        .update(
          setOpt(_.description, updater.description),
          setOpt(_.fetchCompletedAt, updater.fetchCompletedAt),
          setOpt(_.fetchStatus, updater.fetchStatus),
          setOpt(_.fetchDelayMillis, updater.fetchDelayMillis),
          setOpt(_.fetchFailedMsg, updater.fetchFailedMsg),
          setOpt(_.fetchScheduledAt, updater.fetchScheduledAt),
          setOpt(_.htmlUrl, updater.htmlUrl),
          setOpt(_.title, updater.title),
          setOpt(_.fetchStartedAt, updater.fetchStartedAt),
          setOpt(_.articleOrder, updater.articleOrder),
          setOpt(_.iconUrl, updater.iconUrl),
          setOpt(_.updatedAt, Some(now)),
          setOpt(_.fetchErrorCount, updater.fetchErrorCount),
          setOpt(_.showTitle, updater.showTitle),
          setOpt(_.showMedia, updater.showMedia),
          setOpt(_.showFullArticle, updater.showFullArticle),
        )
      run(q).transact(xa).map(_ > 0)
    }
  }

wb14123 added a commit to wb14123/zio-quill that referenced this issue Sep 27, 2023
This fixes zio#2484. Added a capacity to normalize cache. Default to 40960
and can be override by `quill.query.cacheDynamicMaxSize`.

Added a new dependency Google Guava to use its cache implementation. I
know it's better to avoid new dependency but Google Guava is a pretty
widely adopted library. It's cache implementation is widely tested. For
such a critical use case in quill, I think it's better to have a widely
tested implementation than reinventing the wheel.
wb14123 added a commit to wb14123/zio-quill that referenced this issue Sep 27, 2023
This fixes zio#2484. Added a capacity to normalize cache. Default to 40960
and can be override by `quill.query.cacheDynamicMaxSize`.

Added a new dependency Google Guava to use its cache implementation. I
know it's better to avoid new dependency but Google Guava is a pretty
widely adopted library. It's cache implementation is widely tested. For
such a critical use case in quill, I think it's better to have a widely
tested implementation than reinventing the wheel.
wb14123 added a commit to wb14123/zio-quill that referenced this issue Sep 27, 2023
This fixes zio#2484. Added a capacity to normalize cache. Default to 1024
and can be override by `quill.query.cacheDynamicMaxSize`.

Added a new dependency Google Guava to use its cache implementation. I
know it's better to avoid new dependency but Google Guava is a pretty
widely adopted library. It's cache implementation is widely tested. For
such a critical use case in quill, I think it's better to have a widely
tested implementation than reinventing the wheel.
wb14123 added a commit to wb14123/zio-quill that referenced this issue Sep 29, 2023
This fixes zio#2484. Added a capacity to normalize cache. Default to 1024
and can be override by `quill.query.cacheDynamicMaxSize`.

Added a new dependency Caffeine to use its cache implementation, which
is recommended by Guava. For a critical use case in quill, I think
it's better to have a widely tested implementation than reinventing the wheel.
wb14123 added a commit to wb14123/zio-quill that referenced this issue Sep 29, 2023
This fixes zio#2484. Added a capacity to normalize cache. Default to 1024
and can be override by `quill.query.cacheDynamicMaxSize`.

Added a new dependency Caffeine to use its cache implementation, which
is recommended by Guava. For a critical use case in quill, I think
it's better to have a widely tested implementation than reinventing the wheel.
wb14123 added a commit to wb14123/zio-quill that referenced this issue Sep 29, 2023
This fixes zio#2484. Added a capacity to normalize cache. Default to 1024
and can be override by `quill.query.cacheDynamicMaxSize`.

Added a new dependency Caffeine to use its cache implementation, which
is recommended by Guava. For a critical use case in quill, I think
it's better to have a widely tested implementation than reinventing the wheel.
wb14123 added a commit to wb14123/zio-quill that referenced this issue Sep 29, 2023
This fixes zio#2484. Added a capacity to normalize cache. Default to 1024
and can be override by `quill.query.cacheDynamicMaxSize`.

Added a new dependency Caffeine to use its cache implementation, which
is recommended by Guava. For a critical use case in quill, I think
it's better to have a widely tested implementation than reinventing the wheel.
guizmaii added a commit that referenced this issue Sep 29, 2023
This fixes #2484. Added a capacity to normalize cache. Default to 1024
and can be override by `quill.query.cacheDynamicMaxSize`.

Added a new dependency Caffeine to use its cache implementation, which
is recommended by Guava. For a critical use case in quill, I think
it's better to have a widely tested implementation than reinventing the wheel.

Co-authored-by: Jules Ivanic <guizmaii@users.noreply.github.com>
jilen pushed a commit that referenced this issue Jun 11, 2024
This fixes #2484. Added a capacity to normalize cache. Default to 1024
and can be override by `quill.query.cacheDynamicMaxSize`.

Added a new dependency Caffeine to use its cache implementation, which
is recommended by Guava. For a critical use case in quill, I think
it's better to have a widely tested implementation than reinventing the wheel.

Co-authored-by: Jules Ivanic <guizmaii@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants