
*: internal executor paradigm is bad for global latency (parallel txn) #60968

Closed
ajwerner opened this issue Feb 23, 2021 · 6 comments
Labels: C-enhancement (Solution expected to add code/behavior + preserve backward-compat), T-kv (KV Team)

@ajwerner
Contributor

ajwerner commented Feb 23, 2021

Is your feature request related to a problem? Please describe.

This ends up being a big issue arguing that the lack of support for parallelism in our *kv.Txn makes it very hard for us to write global features with reasonable latency. At one point the *kv.Txn was believed to be safe for concurrent use, but that turned out to be buggy (#17197).


Effectively all new stateful features added to cockroach use sql tables and, increasingly, use SQL to interact with that state. The problem is that we tend to build these abstractions without thinking about latency implications, and latency implications are really freaking hard to think about in this programming model.

One thing we do have to help with this, though it has thus far not been super reliable, is the ddl_analysis benchmark suite (#50953). It isn't general and isn't really useful for internal things. It also doesn't account for round-trips due to replication, which system tables almost always experience.

This use of SQL is pretty great for many reasons; we get to use sql abstractions and we get to dogfood (to an extent; the internal executor isn't exactly the same thing the client experiences, but close). Then, if you look at the subsystems listed below, we end up building the wrappers around these sql tables as reasonably clean and well-defined abstractions. The problem with these abstractions is that they almost always entail executing at least one sql statement synchronously. That means, in the normal course of operation, when we need to do something in a loop involving one of these subsystems, we incur at least N global round-trips.
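
To make the shape of the problem concrete, here's a minimal sketch; the InternalExecutor interface and the jobs-flavored helper are made up for illustration and are not the real APIs:

```go
// Hypothetical sketch of the pattern described above: each loop iteration
// calls into a subsystem wrapper that synchronously executes a SQL statement
// via an internal-executor-like interface, so a loop over N items costs at
// least N sequential global round-trips.
package example

import "context"

// InternalExecutor stands in for the internal executor: it runs one SQL
// statement synchronously and does not return until the (possibly
// cross-region) KV round-trips complete.
type InternalExecutor interface {
	Exec(ctx context.Context, stmt string, args ...interface{}) error
}

// MarkJobRunning is a stand-in for a subsystem wrapper (jobs, protected
// timestamps, sqlliveness, ...) built on top of a system table.
func MarkJobRunning(ctx context.Context, ie InternalExecutor, id int64) error {
	return ie.Exec(ctx, "UPDATE system.jobs SET status = 'running' WHERE id = $1", id)
}

// ResumeAll shows the latency problem: the statements are independent, but
// the synchronous programming model forces them to run one after another.
func ResumeAll(ctx context.Context, ie InternalExecutor, ids []int64) error {
	for _, id := range ids { // N iterations => at least N global round-trips
		if err := MarkJobRunning(ctx, ie, id); err != nil {
			return err
		}
	}
	return nil
}
```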

There are a few different and important points to this:

  1. The *kv.Txn API does not allow parallel writes (and only allows parallel reads if you jump through some hoops)
  2. We tend to scatter the placement of leases and replicas over all regions for system tables.
    • This may be getting better with multi-region.

Describe the solution you'd like

I think there are a few different approaches we could take; they range in how radical they are.

  1. Exploit parallelism within a transaction (see the sketch after this list).
    • This could do quite a bit. It doesn't give an opportunity for 1PC, but that's probably okay.
    • The problem with this approach is that not all code is easily structured to be parallel.
  2. Ship code close to the leaseholders.
    • This doesn't fundamentally solve the problem.
  3. Change the programming model to enable asynchronous batching.
    • This might be loosely related to sql,kv,storage: Deferred writes #31055.
    • There's a lot of rethinking to do to make this sort of thing work, where you'd enqueue some work to be done and then have some callback or other sort of cooperative continuation to put it together.
    • There isn't really much evidence that something callback-oriented is any better than just going parallel and then coordinating between the parallel processors (I think there's some theorem about equivalence here 🤔).
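
To sketch what option 1 could look like from the caller's side: this assumes a hypothetical ParallelTxn handle that tolerates concurrent use (today's *kv.Txn does not) and uses a plain errgroup fan-out; none of these names are real CockroachDB APIs.

```go
// Hypothetical sketch of option 1: fan independent statements out across
// goroutines inside one transaction. ParallelTxn is invented for
// illustration; today's *kv.Txn does not support concurrent writers.
package example

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// ParallelTxn is the assumed capability: a transaction handle whose
// operations may be issued concurrently from multiple goroutines.
type ParallelTxn interface {
	Exec(ctx context.Context, stmt string, args ...interface{}) error
}

// UpdateSubsystemsInParallel issues one write per subsystem concurrently,
// so the transaction pays roughly one global round-trip instead of N.
func UpdateSubsystemsInParallel(ctx context.Context, txn ParallelTxn, stmts []string) error {
	g, ctx := errgroup.WithContext(ctx)
	for _, stmt := range stmts {
		stmt := stmt // capture loop variable for the closure
		g.Go(func() error {
			return txn.Exec(ctx, stmt)
		})
	}
	// Commit only after every parallel write has been acknowledged.
	return g.Wait()
}
```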

So, parallelism is probably our best way out; it certainly is the best fit for the programming model we're used to.

Additional context

Another important point is that these long-running transactions have a tendency to interfere with user queries that introspect this state. Use of global transactions may be the answer here!

There are some approaches to mitigate that like #35712 which might enable some amount of user introspection. That's what we're ultimately doing in a more hacky way in #60953.

Subsystems I have in mind:

  • jobs
  • protected timestamps
  • sql liveness
  • table stats
  • replication reports
  • statement bundles
  • ... the list goes on

Jira issue: CRDB-3094

@ajwerner ajwerner added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Feb 23, 2021
@jlinder jlinder added the T-kv KV Team label Jun 16, 2021
@cucaroach
Contributor

Adding fuel to the fire here. I'm trying to optimize COPY and I don't think I can do a decent job without parallel writes. For a well-dressed table like tpch.lineitem (8 indexes), I think we'll only achieve meaningful speedups with parallel writes. Parsing strings into datums is ~5% of the time, massaging datums into KV requests is ~10%, and ~85% is KV (5% kvclient / 20% kvserver / 50% storage / 10% "other"). So to get meaningful speedups I need to push parallel KV requests.

IMPORT gets the job done by slicing and dicing and building SSTs, which works at big scales, but we want small and medium-sized COPYs to be fast and able to overlap with existing keys. We also don't want COPY to go crazy with concurrency and resource consumption for chunks of work that don't justify all the coordination. So the current thinking is that a nice middle ground would be to use separate goroutines to write to the primary table and to each index. This should minimize work spent splitting up KV batches into replica traffic and maximize speedups from overlapping writes to separate replicas. No reason not to exploit the same approach for all batched inserts, I think.

The only alternative is to go back to the ugly days of non-atomic COPY and just parallelize writing chunks of the COPY rows, but that's a step backwards and would require an opt-in with scary THIS ISN'T TRANSACTIONAL warnings that will probably scare off most users.

Note that parallel writes achieved through a nested transaction or root/leaf model would be just fine as long as the COPY is atomic.
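
To illustrate the per-index fan-out idea above: the types here are stand-ins, not CockroachDB APIs, and this assumes the underlying transaction can accept concurrent batches (which today's *kv.Txn can't).

```go
// Rough sketch of the COPY idea: buffer KVs per index while parsing rows,
// then flush the primary-key buffer and each index buffer from its own
// goroutine so the replica-level round-trips overlap instead of running
// back to back.
package example

import (
	"context"
	"fmt"
	"sync"
)

// KV is a single encoded key/value pair destined for one index.
type KV struct{ Key, Value []byte }

// BatchSender stands in for whatever sends one batch of KVs for a single
// index (or the primary key) within the COPY's transaction.
type BatchSender func(ctx context.Context, kvs []KV) error

// FlushPerIndex flushes each per-index buffer concurrently.
func FlushPerIndex(ctx context.Context, buffers map[string][]KV, send BatchSender) error {
	var wg sync.WaitGroup
	errs := make(chan error, len(buffers))
	for name, kvs := range buffers {
		wg.Add(1)
		go func(name string, kvs []KV) {
			defer wg.Done()
			if err := send(ctx, kvs); err != nil {
				errs <- fmt.Errorf("flushing %s: %w", name, err)
			}
		}(name, kvs)
	}
	wg.Wait()
	close(errs)
	// Report the first failure, if any; remaining errors are dropped.
	for err := range errs {
		return err
	}
	return nil
}
```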

@rafiss
Collaborator

rafiss commented Oct 25, 2022

> The only alternative is to go back to the ugly days of non-atomic COPY and just parallelize writing chunks of the COPY rows, but that's a step backwards and would require an opt-in with scary THIS ISN'T TRANSACTIONAL warnings that will probably scare off most users.

Can we make the non-transactional behavior opt-in? For example, with #85573 or some other CRDB-specific option for COPY.

@ajwerner
Contributor Author

I think there's a bright future in deferring the writes. I don't quite know how to do it, but it seems possible in the case of COPY to, on some level, coax the execution into just buffering the writes and running them later. I think it gets complex for fancier executions like FK checks.
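
Very roughly, the buffering idea might look something like this; the types are illustrative, not real APIs, and FK checks or reads of buffered-but-unwritten rows are exactly the complications that aren't handled here.

```go
// Hypothetical sketch of the deferred-write idea: instead of sending each
// write as it is produced, record it in a buffer and flush everything as
// one batch just before commit.
package example

import "context"

type Write struct{ Key, Value []byte }

// Flusher stands in for whatever ultimately sends the accumulated batch
// (ideally split and parallelized by range or index at that point).
type Flusher func(ctx context.Context, writes []Write) error

// DeferredWriter buffers writes in memory until Flush is called.
type DeferredWriter struct {
	buf   []Write
	flush Flusher
}

func NewDeferredWriter(flush Flusher) *DeferredWriter {
	return &DeferredWriter{flush: flush}
}

// Put records an intended write without issuing any KV traffic.
func (w *DeferredWriter) Put(key, value []byte) {
	w.buf = append(w.buf, Write{Key: key, Value: value})
}

// Flush sends the whole buffer at once, right before the transaction commits.
func (w *DeferredWriter) Flush(ctx context.Context) error {
	if len(w.buf) == 0 {
		return nil
	}
	err := w.flush(ctx, w.buf)
	w.buf = nil
	return err
}
```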

@cucaroach
Contributor

> Can we make the non-transactional behavior opt-in? For example, with #85573 or some other CRDB-specific option for COPY.

We entertained that notion for 22.2 but dismissed it. I really think it's a step in the wrong direction, but if parallel local-only writes are too big a lift for the near future we can reconsider.

Need to bone up on this deferred write concept...

@ajwerner
Contributor Author

I think we need to get the KV team involved in the discussions.

@ajwerner
Contributor Author

This discussion is interesting enough as a historical artifact, but this issue isn't helping anybody at this point.
