
workload: allow columnar data generation (alloc -73%, MB/s +60%) #35349

Merged
merged 1 commit into from
May 7, 2019

Conversation


@danhhz danhhz commented Mar 4, 2019

A workload.Generator previously could specify its initial table data
using one of two methods: one outputs a single row as []interface{}
and the other a batch of [][]interface{}. This commit switches the
latter to be a columnar batch, allowing selected performance-critical
workload generators (tpcc, bank) to generate initial table data with
dramatically reduced allocations.

The single row option still exists and will continue to be used by most
workload.Generators, as it's much simpler. An adaptor from the columnar
batch to the old row-oriented [][]interface{} batch is included for
ease of use in non-performance-sensitive consumers of initial table
data. IMPORT, on the other hand, is switched to using the columnar
batches directly.
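For illustration, a minimal sketch of what such a columnar-to-row adaptor looks like. ColBatch and its fields here are hypothetical stand-ins, not the actual coldata.Batch API:

```go
package main

import "fmt"

// ColBatch is a toy stand-in for a columnar batch: one typed slice per
// column instead of one []interface{} per row. (Hypothetical; the real
// coldata.Batch API differs.)
type ColBatch struct {
	Ints  []int64
	Names []string
}

// ToRows is the row-oriented adaptor: it materializes the columnar data
// into the old [][]interface{} format for non-performance-sensitive
// consumers, boxing every value along the way.
func (b ColBatch) ToRows() [][]interface{} {
	rows := make([][]interface{}, len(b.Ints))
	for i := range rows {
		rows[i] = []interface{}{b.Ints[i], b.Names[i]}
	}
	return rows
}

func main() {
	batch := ColBatch{Ints: []int64{1, 2}, Names: []string{"a", "b"}}
	fmt.Println(batch.ToRows()) // [[1 a] [2 b]]
}
```

Performance-critical consumers read the typed columns directly; only the simple consumers pay the boxing cost of the adaptor.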

Some of the savings here comes from reusing batches, but the majority
comes from []interface{} requiring that everything assigned to it live
on the heap, while something like []int doesn't. We also get some
speedup in initial table data consumers from fewer type switches.
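The boxing cost is easy to demonstrate in isolation with the standard library's testing.AllocsPerRun. The helper name below is hypothetical, just for measurement:

```go
package main

import (
	"fmt"
	"testing"
)

// allocsFilling compares allocations for filling a row-style
// []interface{} against a column-style []int of the same length.
func allocsFilling(n int) (rowAllocs, colAllocs float64) {
	rowAllocs = testing.AllocsPerRun(100, func() {
		row := make([]interface{}, n)
		for i := range row {
			row[i] = i // boxing: each value >= 256 escapes to the heap
		}
	})
	colAllocs = testing.AllocsPerRun(100, func() {
		col := make([]int, n)
		for i := range col {
			col[i] = i // stored inline, no boxing
		}
	})
	return rowAllocs, colAllocs
}

func main() {
	row, col := allocsFilling(1024)
	fmt.Printf("[]interface{}: %.0f allocs/op, []int: %.0f allocs/op\n", row, col)
}
```

Filling the []interface{} allocates per element (the Go runtime only interns boxed integers below 256), while the []int fill allocates at most once for the slice itself.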

This could also be accomplished without being columnar, but doing the
minimum non-columnar thing to get the same results would require
reinventing most of the work the columnar format already does. Plus, it
will almost certainly be convenient for columnar exec benchmarking to
have a common format.

Benchmark results:

name                               old time/op    new time/op    delta
InitialData/tpcc/warehouses=1-8       552ms ± 1%     454ms ± 0%  -17.81%  (p=0.008 n=5+5)
InitialData/bank/rows=1000-8          461µs ± 0%     332µs ± 1%  -28.10%  (p=0.008 n=5+5)
WriteCSVRows-8                       15.0µs ± 1%    16.8µs ± 1%  +12.44%  (p=0.008 n=5+5)
CSVRowsReader-8                      17.8µs ± 1%    19.3µs ± 1%   +8.41%  (p=0.008 n=5+5)

name                               old speed      new speed      delta
InitialData/tpcc/warehouses=1-8     128MB/s ± 1%   209MB/s ± 0%  +63.22%  (p=0.008 n=5+5)
InitialData/bank/rows=1000-8        223MB/s ± 0%   350MB/s ± 1%  +56.83%  (p=0.008 n=5+5)
WriteCSVRows-8                      118MB/s ± 2%   107MB/s ± 3%   -9.43%  (p=0.008 n=5+5)
CSVRowsReader-8                     101MB/s ± 4%    92MB/s ± 5%   -8.31%  (p=0.016 n=5+5)

name                               old alloc/op   new alloc/op   delta
InitialData/tpcc/warehouses=1-8       246MB ± 0%      75MB ± 0%  -69.61%  (p=0.008 n=5+5)
InitialData/bank/rows=1000-8          232kB ± 0%      19kB ± 0%  -91.75%  (p=0.008 n=5+5)
WriteCSVRows-8                       5.60kB ± 0%    5.70kB ± 0%   +1.71%  (p=0.016 n=4+5)
CSVRowsReader-8                      7.35kB ± 1%    7.39kB ± 0%     ~     (p=0.667 n=5+4)

name                               old allocs/op  new allocs/op  delta
InitialData/tpcc/warehouses=1-8       5.60M ± 0%     1.48M ± 0%  -73.56%  (p=0.008 n=5+5)
InitialData/bank/rows=1000-8          6.00k ± 0%     1.02k ± 0%  -83.01%  (p=0.008 n=5+5)
WriteCSVRows-8                         48.0 ± 0%      50.0 ± 0%   +4.17%  (p=0.008 n=5+5)
CSVRowsReader-8                        54.0 ± 0%      55.0 ± 0%   +1.85%  (p=0.008 n=5+5)

We could probably speed the CSV stuff back up with some followup work,
but (a) the overall fixtures import benchmark is still faster and (b)
we're about to switch to dt's import magic that skips the CSV roundtrip.

Current fixtures import

name                               old time/op    new time/op    delta
ImportFixture/tpcc/warehouses=1-8     3.35s ± 2%     3.29s ± 3%     ~     (p=0.310 n=5+5)

name                               old speed      new speed      delta
ImportFixture/tpcc/warehouses=1-8  26.4MB/s ± 2%  26.9MB/s ± 3%     ~     (p=0.310 n=5+5)

name                               old alloc/op   new alloc/op   delta
ImportFixture/tpcc/warehouses=1-8    3.09GB ± 0%    2.92GB ± 0%   -5.60%  (p=0.008 n=5+5)

name                               old allocs/op  new allocs/op  delta
ImportFixture/tpcc/warehouses=1-8     32.1M ± 0%     28.0M ± 0%  -12.86%  (p=0.008 n=5+5)

Magic fixtures import that skips CSV roundtrip:

name                               old time/op    new time/op    delta
ImportFixture/tpcc/warehouses=1-8     2.88s ± 3%     2.86s ± 4%     ~     (p=0.548 n=5+5)

name                               old speed      new speed      delta
ImportFixture/tpcc/warehouses=1-8  30.7MB/s ± 3%  30.9MB/s ± 4%     ~     (p=0.548 n=5+5)

name                               old alloc/op   new alloc/op   delta
ImportFixture/tpcc/warehouses=1-8    2.87GB ± 0%    2.73GB ± 0%   -4.97%  (p=0.008 n=5+5)

name                               old allocs/op  new allocs/op  delta
ImportFixture/tpcc/warehouses=1-8     26.1M ± 0%     21.5M ± 0%  -17.75%  (p=0.008 n=5+5)

Touches #34809

Release note: None

@danhhz danhhz added the do-not-merge bors won't merge a PR with this label. label Mar 4, 2019
@danhhz danhhz requested review from jordanlewis and a team March 4, 2019 16:32
@danhhz danhhz requested a review from a team as a code owner March 4, 2019 16:32
@danhhz danhhz requested review from a team March 4, 2019 16:32
@cockroach-teamcity

This change is Reviewable


danhhz commented Mar 25, 2019

Okay @jordanlewis this is as far as I got on Friday. There's a bit of a performance regression, so I haven't bothered updating the commit message or PR description yet. This is a bummer and there are definitely a bunch of places I can optimize, but I'm torn between how huge and scary this already is vs having a window between landing this and the followup perf work to really get all the benefits of this. Note that even though the CSV writing (which we still use for a little while longer) is slower, fixtures import tpcc already uses drastically fewer allocs and is overall a bit faster.

name                               old time/op    new time/op    delta
pkg:github.com/cockroachdb/cockroach/pkg/workload goos:darwin goarch:amd64
InitialData/tpcc/warehouses=1-8       628ms ± 2%     539ms ± 1%  -14.14%  (p=0.008 n=5+5)
InitialData/bank/rows=1000-8          572µs ±10%     752µs ± 2%  +31.40%  (p=0.008 n=5+5)
WriteCSVRows-8                       15.9µs ± 2%    18.9µs ± 2%  +18.93%  (p=0.008 n=5+5)
CSVRowsReader-8                      18.1µs ± 0%    20.7µs ± 1%  +14.05%  (p=0.016 n=4+5)
pkg:github.com/cockroachdb/cockroach/pkg/ccl/workloadccl goos:darwin goarch:amd64
ImportFixture/tpcc/warehouses=1-8     3.25s ± 1%     3.31s ± 0%   +1.83%  (p=0.016 n=5+4)

name                               old speed      new speed      delta
pkg:github.com/cockroachdb/cockroach/pkg/workload goos:darwin goarch:amd64
InitialData/tpcc/warehouses=1-8     113MB/s ± 2%   176MB/s ± 1%  +56.23%  (p=0.008 n=5+5)
InitialData/bank/rows=1000-8        180MB/s ± 9%   154MB/s ± 2%  -14.42%  (p=0.008 n=5+5)
WriteCSVRows-8                      114MB/s ± 3%    96MB/s ± 4%  -15.28%  (p=0.008 n=5+5)
CSVRowsReader-8                     101MB/s ± 4%    87MB/s ± 5%  -14.47%  (p=0.008 n=5+5)
pkg:github.com/cockroachdb/cockroach/pkg/ccl/workloadccl goos:darwin goarch:amd64
ImportFixture/tpcc/warehouses=1-8  27.2MB/s ± 1%  26.7MB/s ± 0%   -1.80%  (p=0.016 n=5+4)

name                               old alloc/op   new alloc/op   delta
pkg:github.com/cockroachdb/cockroach/pkg/workload goos:darwin goarch:amd64
InitialData/tpcc/warehouses=1-8       246MB ± 0%      75MB ± 0%  -69.51%  (p=0.008 n=5+5)
InitialData/bank/rows=1000-8          232kB ± 0%     347kB ± 0%  +49.41%  (p=0.016 n=5+4)
WriteCSVRows-8                       5.59kB ± 0%    7.74kB ± 0%  +38.44%  (p=0.016 n=5+4)
CSVRowsReader-8                      7.38kB ± 1%    7.38kB ± 0%     ~     (p=0.206 n=5+4)
pkg:github.com/cockroachdb/cockroach/pkg/ccl/workloadccl goos:darwin goarch:amd64
ImportFixture/tpcc/warehouses=1-8    2.56GB ± 0%    2.39GB ± 0%   -6.72%  (p=0.008 n=5+5)

name                               old allocs/op  new allocs/op  delta
pkg:github.com/cockroachdb/cockroach/pkg/workload goos:darwin goarch:amd64
InitialData/tpcc/warehouses=1-8       5.60M ± 0%     1.48M ± 0%  -73.56%  (p=0.008 n=5+5)
InitialData/bank/rows=1000-8          6.00k ± 0%     7.02k ± 0%  +16.97%  (p=0.008 n=5+5)
WriteCSVRows-8                         48.0 ± 0%      51.0 ± 0%   +6.25%  (p=0.008 n=5+5)
CSVRowsReader-8                        53.0 ± 0%      54.0 ± 0%   +1.89%  (p=0.008 n=5+5)
pkg:github.com/cockroachdb/cockroach/pkg/ccl/workloadccl goos:darwin goarch:amd64
ImportFixture/tpcc/warehouses=1-8     31.7M ± 0%     27.6M ± 0%  -13.02%  (p=0.016 n=5+4)

@jordanlewis

Any initial guesses why it might be slower? I'll take a look soon.


danhhz commented Mar 28, 2019

The fixtures import is almost certainly slower because CSV generation is slower. After 19.1.0 goes out the door, David is going to merge #36250, which switches fixtures import to bypass the CSV by default, and that path is definitely faster with this PR. So even if we can't get the CSV performance back, this whole problem goes away anyway.

As for why it's slower, I've tried digging into before/after profiles. My best guess so far is it's because most workload.Generators have a batch size of 1, which is the worst case for seeing columnar overhead.

@jordanlewis jordanlewis left a comment


This is awesome work! I have a bunch of comments, but in general I'm definitely eager to see this happen.

@@ -56,7 +66,10 @@ func NewMemBatch(types []types.T) Batch {
// NewMemBatchWithSize allocates a new in-memory Batch with the given column
// size. Use for operators that have a precisely-sized output batch.
func NewMemBatchWithSize(types []types.T, size int) Batch {
	b := &memBatch{}
	if max := math.MaxUint16; size > max {
		panic(fmt.Sprintf(`batches cannot have length larger than %d; requested %d`, max, size))

👍

@@ -29,6 +29,9 @@ type column interface{}
type Vec interface {
	Nulls

	// Type returns the type of datums stored in this Vec.

nit: we haven't used datums yet in this code - could you just say data, to avoid eventual terminology clash with tree.Datum?

	// Allocate all the []interface{} row slices in one go.
	datums := make([]interface{}, numRows*numCols)
	for colIdx, col := range cb.ColVecs() {
		switch col.Type() {

Like, here, you should really have the type information passed in, rather than having to check the ColVec, in my opinion. ColBatchToRows should also take a []types.T.

}
table := workload.Table{
	InitialRows: workload.BatchedTuples{
		Batch: func(rowIdx int) [][]interface{} { return rows[rowIdx] },
		FillBatch: func(batchIdx int, cb coldata.Batch, _ *bufalloc.ByteAllocator) {
			*cb.(*coldata.MemBatch) = *batches[batchIdx].(*coldata.MemBatch)

Is this the reason you had to export MemBatch? This seems kind of unpleasant - could you maybe change the FillBatch interface to return a coldata.Batch instead of modifying one in place? Or does that have other problems?

// FillBatch is a function to deterministically compute a columnar-batch of
// tuples given its index.
//
// To save allocations, the `Vec`s in the passed `Batch` are reused when

nit: is it just me or do we not normally enclose types in backticks in comments? obviously take it or leave it

for i, datum := range row {
	if datum == nil {
		// WIP what do we do here
		colTypes[i] = types.Bytes

Shouldn't we know the types up front somehow? I think the Tuples interface should return its column types as well.


That would also solve the allocations issue right?

}

// ColBatchToRows materializes the columnar data in a coldata.Batch into rows.
func ColBatchToRows(cb coldata.Batch) [][]interface{} {

Could you use the materializer operator instead of this function? It does the same thing.

@danhhz danhhz force-pushed the workload_colbatch branch 2 times, most recently from 654d73e to 3a1a58b Compare March 30, 2019 00:16
@danhhz danhhz changed the title [WIP] workload: allow columnar data generation (alloc -94%, MB/s +54%) workload: allow columnar data generation (alloc -73%, MB/s +60%) Mar 30, 2019

@danhhz danhhz left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @danhhz and @jordanlewis)


pkg/sql/exec/coldata/vec.go, line 32 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

nit: we haven't used datums yet in this code - could you just say data, to avoid eventual terminology clash with tree.Datum?

Done


pkg/workload/csv_test.go, line 93 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Is this the reason you had to export MemBatch? This seems kind of unpleasant - could you maybe change the FillBatch interface to return a coldata.Batch instead of modifying one in place? Or does that have other problems?

It has to take it in to get the memory reuse. Also returning it seems awkward to me, especially since (I think) we're both sort of convinced at this point that MemBatch is the new Batch
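To illustrate the reuse argument, here is a toy version of the fill-into-a-passed-batch pattern; batch and fillBatch are hypothetical stand-ins for the workload/coldata types, not the real API:

```go
package main

import "fmt"

// batch is a toy columnar batch whose backing slices can be refilled in
// place. (Hypothetical stand-in for coldata.Batch.)
type batch struct {
	ints []int64
}

// fillBatch writes batchIdx's tuples into b, reusing b's backing array
// when it's large enough. Taking the batch as an argument (rather than
// returning a fresh one) is what makes the reuse possible.
func fillBatch(batchIdx int, b *batch) {
	const batchSize = 4
	if cap(b.ints) < batchSize {
		b.ints = make([]int64, batchSize)
	}
	b.ints = b.ints[:batchSize]
	for i := range b.ints {
		b.ints[i] = int64(batchIdx*batchSize + i)
	}
}

func main() {
	var b batch
	fillBatch(0, &b)
	first := &b.ints[0]
	fillBatch(1, &b)
	// The second fill reused the same backing array: no new allocation.
	fmt.Println(b.ints, first == &b.ints[0]) // [4 5 6 7] true
}
```

If fillBatch instead returned a fresh batch, every call would allocate new column slices and the reuse savings would be lost.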


pkg/workload/workload.go, line 173 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

nit: is it just me or do we not normally enclose types in backticks in comments? obviously take it or leave it

I certainly do it, though admittedly inconsistently, so there are definitely areas of the codebase that have them. There aren't any other examples in this file, so happy to revert


pkg/workload/workload.go, line 197 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

That would also solve the allocations issue right?

FWIW, this Tuples helper is specifically for making it easier to write a Generator when you don't need to eke out every last bit of performance, so the allocation really isn't a big deal. Added a comment since that info wasn't in the code.

As for the types, I agree that we want to plumb some of this information. It'll be super helpful, for example in workloadReader. But I like how easy it is currently to write a workload.Generator if you don't need the super performant version. Basically, there are some design decisions here that I'd like to consider carefully before committing to something new. How do you feel about separating this question out and focusing a followup on how workload type info gets plumbed?


pkg/workload/workload.go, line 245 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Could you use the materializer operator instead of this function? It does the same thing.

Definitely don't want that dep, plus this way we get to control the behavior for what workload needs without worrying about breaking distsql if we change something. Also this is pretty simple, so I'm comfortable with the duplication.


pkg/workload/workload.go, line 250 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Like, here, you should really have the type information passed in, rather than having to check the ColVec, in my opinion. ColBatchToRows should also take a []types.T.

This actually seems cleaner to me as is than if it got some types passed in that may not even match. I don't understand the objection, to be honest, can you expand on it?

@danhhz danhhz removed the do-not-merge bors won't merge a PR with this label. label Mar 30, 2019

@danhhz danhhz left a comment


Okay, we're past the 19.1.0 push, so I'd like to get this merged. Pulled and resolved conflicts. Also resolved the one outstanding CR issue (assuming you're happy with it). Finally, added a test that runs fixtures import on all registered workloads, which caught a couple missing switch cases. PTAL

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @danhhz and @jordanlewis)


pkg/workload/workload.go, line 197 at r2 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

FWIW, this Tuples helper is specifically for making it easier to write a Generator when you don't need to eke out every last bit of performance, so the allocation really isn't a big deal. Added a comment since that info wasn't in the code.

As for the types, I agree that we want to plumb some of this information. It'll be super helpful, for example in workloadReader. But I like how easy it is currently to write a workload.Generator if you don't need the super performant version. Basically, there are some design decisions here that I'd like to consider carefully before committing to something new. How do you feel about separating this question out and focusing a followup on how workload type info gets plumbed?

I think I came up with a good solution here to both the nil and allocation problems. You can now declare the types if you're going to return a nil, but you don't have to (only two tables needed this, both in ledger). It was also pretty easy to use a sync.Once to do the col type sniffing, so now we don't pay the allocations.
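A sketch of the sync.Once approach described here; the type and method names are hypothetical, not the actual workload package API:

```go
package main

import (
	"fmt"
	"sync"
)

// colType is a toy column-type enum.
type colType int

const (
	typeBytes colType = iota // fallback for nils and unrecognized values
	typeInt
)

// tuples sniffs its column types from the first generated row, guarded
// by a sync.Once so the sniffing (and its allocations) happens at most
// once, no matter how many consumers ask.
type tuples struct {
	genRow   func(rowIdx int) []interface{}
	sniff    sync.Once
	colTypes []colType
}

func (t *tuples) ColTypes() []colType {
	t.sniff.Do(func() {
		row := t.genRow(0)
		t.colTypes = make([]colType, len(row))
		for i, datum := range row {
			switch datum.(type) {
			case int, int64:
				t.colTypes[i] = typeInt
			default:
				t.colTypes[i] = typeBytes
			}
		}
	})
	return t.colTypes
}

func main() {
	tt := &tuples{genRow: func(int) []interface{} { return []interface{}{int64(7), "x"} }}
	fmt.Println(tt.ColTypes()) // sniffed once; repeat calls return the cached slice
}
```

Declared types (for tables that can return nil) would simply pre-populate colTypes so the Once body never needs to guess.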

@danhhz danhhz force-pushed the workload_colbatch branch 2 times, most recently from 4dafabf to ed4d390 Compare May 7, 2019 19:39
@jordanlewis jordanlewis left a comment


:lgtm_strong:

Super pumped about this! You don't get a 90% allocation reduction every day. Thanks for pushing to get it in.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis)


danhhz commented May 7, 2019

Thanks for all the reviews!

bors r=jordanlewis


craig bot commented May 7, 2019

Build failed


danhhz commented May 7, 2019

TestStoreRangeMergeRaftSnapshot was just modified by #37350 so I'm inclined to guess flake. Dunno what the story is with that TestEncodeBitArray panic but I doubt it's this PR.

bors r=jordanlewis

craig bot pushed a commit that referenced this pull request May 7, 2019
35349: workload: allow columnar data generation (alloc -73%, MB/s +60%) r=jordanlewis a=danhhz


Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>

craig bot commented May 7, 2019

Build succeeded

@craig craig bot merged commit b963c0d into cockroachdb:master May 7, 2019
@danhhz danhhz deleted the workload_colbatch branch May 7, 2019 23:56
@danhhz danhhz restored the workload_colbatch branch May 8, 2019 00:52
danhhz added a commit to danhhz/cockroach that referenced this pull request May 8, 2019
workload's Table schemas are SQL schemas, but cockroachdb#35349 switched the
initial data to be returned as a coldata.Batch, which has a more limited
set of types. (Or, in the case of simple workloads that return a
[]interface{}, it's roundtripped through coldata.Batch by the `Tuples`
helper.) Notably, this means a SQL STRING column is represented the same
as a BYTES column (ditto UUID, etc).

This caused a regression in splits, which received some []byte data for
a column and tried to hand it to SPLIT as a SQL BYTES datum. This didn't
work for the UUID column in tpcc's history table nor the VARCHAR in
ycsb's usertable. Happily, a STRING works for both of these. It also
seems to work for BYTES columns, so the ambiguity appears to be fine
in this case. When/if someone wants to add a workload that splits a
BYTES primary key column containing non-utf8 data, we may need to
revisit.

A more principled fix would be to get the fidelity back by parsing the
SQL schema, which in fact we do in `importccl.makeDatumFromColOffset`.
However, at the moment, this hack works and avoids the complexity and
the undesirable pkg/sql/parser dep.

Closes cockroachdb#37383
Closes cockroachdb#37382
Closes cockroachdb#37381
Closes cockroachdb#37380
Closes cockroachdb#37379
Closes cockroachdb#37378
Closes cockroachdb#37377
Closes cockroachdb#37393

Release note: None
craig bot pushed a commit that referenced this pull request May 8, 2019
37401: workload: fix --splits regression introduced in #35349 r=tbg a=danhhz


Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>
@danhhz danhhz deleted the workload_colbatch branch April 3, 2020 17:05
rytaft added a commit to rytaft/cockroach that referenced this pull request Sep 23, 2020
…rkload

This commit adds back the injection of statistics for some tables at the
start of the TPC-C workload that had been removed by mistake in cockroachdb#35349.
The removal caused a regression in TPC-C since it caused some plans to change
due to the lack of statistics at the beginning of the benchmark. Adding the
stats back should fix the regression.

Fixes cockroachdb#54702

Release note: None
craig bot pushed a commit that referenced this pull request Sep 23, 2020
54713: workload: add back injection of stats for TPC-C tables at start of workload r=rytaft a=rytaft

This commit adds back the injection of statistics for some tables at the
start of the TPC-C workload that had been removed by mistake in #35349.
The removal caused a regression in TPC-C since it caused some plans to change
due to the lack of statistics at the beginning of the benchmark. Adding the
stats back should fix the regression.

Fixes #54702

Release note: None

Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>