[native + sqlite WAL] Read-your-writes consistency violation under stress, due to SqlCursor escaping mutual exclusion of its parent connection #2123
Comments
Thinking through this. My understanding of the underlying Android SQLite driver would imply this may also be an issue in that context. I'd have to digest the detail to understand better, though. When working on the driver initially, the issue was as you've pointed out. The cursor needs to hold onto the statement because the user may be scrolling through, but the driver needs to move on with life and do other things. Passing in a lambda would also simplify the driver implementation. The statement cache currently removes any statement in use and re-adds it when done. We don't need to do that for write statements, because This also means
It seems JDBC and Android have a higher level As for whether the JDBC/Android drivers are prone to the same issue, it depends on whether their
In any case, this is really an opinionated question: whether there is value in exposing this JDBC/Android magic cursor behavior. The android-paging extension seems to indicate a NO, since it does require users to write such a query explicitly in a
Thought about this some more. Commented on the lambda proposal, which I like conceptually. As an alternative, what if the connection remained aligned to the thread until the cursor is returned? That would prevent new queries on that connection until the previous one was closed. Still need to rethink the connection management somewhat, but that's been in progress anyway (at least conceptually).
Yeah, I'm familiar with the Android driver's magic window. I have no experience with the JDBC driver.
Was thinking about that. Don't know how that and auto-commit would play out. I'd have to refresh how that driver works. Not implementing something like that would obviously be ideal.
That would help, but a reader pool with a default max > 1 across all drivers might become essential. Either that, or the reader pool needs to impose a soft cap* on reader concurrency to guarantee forward progress, i.e., so that readers are not blocked by threads holding long-running cursors. * e.g., serve one-off throwaway reader connections when the pool has no ready-to-use connections left.
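The soft cap idea above could be sketched roughly as follows. This is only an illustration under assumptions: `SketchConnection` and `SoftCapReaderPool` are hypothetical names, not SQLDelight types, and `@Synchronized` is a JVM stand-in for whatever locking the native driver would use.

```kotlin
// Hypothetical stand-in for a reader connection; not a SQLDelight type.
class SketchConnection(val id: Int, val pooled: Boolean)

// Sketch of a reader pool with a soft cap: callers never block on acquire().
class SoftCapReaderPool(private val capacity: Int) {
    private val idle = ArrayDeque<SketchConnection>()
    private var pooledCreated = 0
    private var nextId = 0

    @Synchronized
    fun acquire(): SketchConnection {
        // Prefer a ready-to-use pooled connection.
        idle.removeFirstOrNull()?.let { return it }
        // Grow the pool while it is below the cap.
        if (pooledCreated < capacity) {
            pooledCreated++
            return SketchConnection(nextId++, pooled = true)
        }
        // Pool exhausted: hand out a one-off throwaway connection instead of blocking.
        return SketchConnection(nextId++, pooled = false)
    }

    @Synchronized
    fun release(connection: SketchConnection) {
        // Only pooled connections are kept; a real driver would close a throwaway one here.
        if (connection.pooled) idle.addLast(connection)
    }
}
```

With a cap of 1, a second concurrent `acquire()` returns a throwaway connection rather than waiting on the thread that still holds the pooled one.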
I would agree, but assuming the dev isn't holding onto the cursor on purpose and just looping through, wouldn't the lambda have the same issue with a reader pool?
I mean, the reader pool also needs some thinking, but holding onto the connection or passing the lambda would have the same problem (I think, unless I'm missing something).
Indeed yes, #2132 can still block progress if the stepping takes long; it is just that the cursor is guaranteed to close before the connection is returned to the pool for reuse. So "guaranteeing forward progress" is more in the context of users that will indeed hold onto a cursor. Having thought again about aligning threads to a connection until cursor closure, I now think that won't be able to address the read-your-writes consistency violation. It seems in Android, cursors can be taken across threads, or even onto the main thread (?). Most importantly, when one works in a thread pool or non-blocking environment (e.g., coroutines), work that holds onto a cursor might be interleaved with other work reusing the same thread. So we'd end up where we are today.
Yeah, that's an issue. Again, I like the lambda idea, but thinking through alternatives. We'd have to rely on the user not dragging the cursor across threads, and always closing it appropriately, which is problematic.
This is definitely a pending issue, and it will need to be addressed, but native coroutines aren't thread pooled (today), so we've kicked that a bit into the future to resolve. Detailed post from an Android engineer discussing the problem and resolution in the driver. The Android driver also used to align transactions to threads, but that doesn't work with coroutines thread pooling.
A possible approach to keep long-running cursor support on top of #2132 could be that:
In other words, keep the long-running cursors away from the connection pools.
It would be good to understand the use cases for the cursor. If it's not to pass around, but to avoid creating a big list of data objects, the lambda route makes more sense. In any case, see if anybody else weighs in.
Runtime Environment
SQLDelight version: 1.4.3
Application OS: iOS 14.x
Proposed change: #2132
TL;DR
SQLite Cursors/Statements are allowed to escape the critical section for SQLite connections. This causes intermittent read-your-writes consistency violations, at least with the native driver in WAL mode.
Query listeners and after commit callbacks fail to see the written data, when the system is stressing SQLDelight.
Identified issue
In executeQuery of NativeSqlDatabase, the SqlCursor escapes the critical section provided by accessConnection().
sqldelight/drivers/native-driver/src/nativeMain/kotlin/com/squareup/sqldelight/drivers/native/NativeSqlDatabase.kt
Lines 53 to 63 in 51e0621
This means other threads can potentially acquire and start using the query connection while the earlier owner is still stepping through the result set via the escaped SqlCursor. While concurrently accessing the SQLite connection itself isn't problematic, the escaped SqlCursor has implications in WAL mode.
A WAL reader cannot move its end mark while it still has active statements. In other words, if the read/query connection is prematurely yielded from a 1st thread to a 2nd thread, the 2nd thread may not see new data that it has itself written through the "transaction pool" write connection.
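To make the end mark behaviour concrete, here is a toy, single-threaded model. It is a sketch under assumptions: it does not call SQLite, all `Toy*` names are hypothetical, and it only imitates the rule that a reader connection keeps its snapshot while any of its statements is still active.

```kotlin
// Toy model only: hypothetical classes that imitate WAL snapshot pinning, no SQLite involved.
class ToyWalDatabase {
    private val log = mutableListOf<String>()
    // The end mark is the number of committed rows a fresh reader would see.
    val endMark: Int get() = log.size
    fun commit(row: String) { log.add(row) }
    fun rowsUpTo(mark: Int): List<String> = log.subList(0, mark).toList()
}

class ToyReaderConnection(private val db: ToyWalDatabase) {
    private var activeStatements = 0
    private var pinnedMark = 0

    fun openCursor(): ToyCursor {
        // The end mark only advances when no other statement is active on this connection.
        if (activeStatements == 0) pinnedMark = db.endMark
        activeStatements++
        return ToyCursor(this, db.rowsUpTo(pinnedMark))
    }

    fun onCursorClosed() { activeStatements-- }
}

class ToyCursor(private val owner: ToyReaderConnection, val rows: List<String>) {
    private var closed = false
    fun close() {
        if (!closed) { closed = true; owner.onCursorClosed() }
    }
}

// Returns (rows seen while the escaped cursor is open, rows seen after it is closed).
fun staleReadDemo(): Pair<List<String>, List<String>> {
    val db = ToyWalDatabase()
    val reader = ToyReaderConnection(db)
    db.commit("a")
    val escaped = reader.openCursor()   // "1st thread" is still stepping this cursor
    db.commit("b")                      // "2nd thread" writes through the write connection
    val stale = reader.openCursor()     // "2nd thread" reads on the shared reader connection
    val staleRows = stale.rows
    stale.close()
    escaped.close()                     // only now can the reader's end mark move
    val fresh = reader.openCursor()
    val freshRows = fresh.rows
    fresh.close()
    return staleRows to freshRows
}
```

In this model the second read misses row "b" for as long as the first cursor stays open, which is the read-your-writes violation described above.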
It is a race condition that violates read-your-writes consistency, in that the data are visible if and only if the 1st thread happens to close the SqlCursor before the 2nd thread starts any read. Such a race condition can be unintuitive to isolate and fix: imagine an application with lots of read and write queries issued from 2+ threads, whose unit tests pass with small test data sets, and whose hiccups only start to emerge as it processes growing user-generated data.
Attached below is a code snippet to illustrate the issue. Note that the cursor lifetime is exaggerated by using execute(): SqlCursor directly. In practice, we usually use the convenient executeAs*() methods, whose cursor lifetime beyond the critical section scales with the result set size.
This issue has to be solved at framework level, since:
Note that this affects not only the afterCommit hook, but also any query listeners, as both are invoked at the same place. Moreover, this issue is applicable to SQLite usage on any platform, since it is rooted in SQLite's API design choice.
Proposed solution
SqlCursor and any other resource should be bound to the critical section. They should be closed before the current thread releases its lock on the connection.
The signatures of SqlDriver.executeQuery and Query.execute() should be changed to accept a (SqlCursor) -> R block, so that the block can be evaluated and immediately followed by cleanup actions, right inside the accessConnection() critical section. The result of the block is passed back to the caller.
This is, however, an API-breaking change.
Our proposed patch for the issue: #2132