Add benchmarks for "insert many" operations #591

brandur · 2024-09-15T19:10:49Z

A couple quick benchmarks for the two "insert many" variants to help vet
whether or not we should keep the "fast" route.

There is definitely a speed difference, with the fast variant being
about 25% faster as it can skip returning rows:

go test -bench=BenchmarkClient
goos: darwin
goarch: arm64
pkg: github.com/riverqueue/river
cpu: Apple M1
BenchmarkClient/JobInsertMany-8                      312           4156647 ns/op
BenchmarkClient/JobInsertManyFast-8                  417           3054985 ns/op
PASS
ok      github.com/riverqueue/river     15.709s

Can't 100% decide where that leaves us. There's definitely a speed
advantage, but it definitely congests the API to a degree, and having
more functions make implementing features like middleware trickier.

A couple quick benchmarks for the two "insert many" variants to help vet whether or not we should keep the "fast" route. There is definitely a speed difference, with the fast variant being about 25% faster as it can skip returning rows: go test -bench=BenchmarkClient goos: darwin goarch: arm64 pkg: github.com/riverqueue/river cpu: Apple M1 BenchmarkClient/JobInsertMany-8 312 4156647 ns/op BenchmarkClient/JobInsertManyFast-8 417 3054985 ns/op PASS ok github.com/riverqueue/river 15.709s Can't 100% decide where that leaves us. There's definitely a speed advantage, but it definitely congests the API to a degree, and having more functions make implementing features like middleware trickier.

bgentry · 2024-09-18T03:04:21Z

Hmm, the copy variant will never be able to do things like handle conflicts on unique inserts. I don’t think it’s appropriate for a primary API for that reason. But also the cost of keeping it around for specific high volume use cases that need fewer features isn’t that high IMO, so I’d say we try to do that.

brandur · 2024-09-18T14:59:58Z

Yeah, having an optimized path can't hurt, but I have to say the API still feels wrong to me in that we're continuing to add more dimensions of modifiers to insertion. So we have N (insert + insert many) * M (non-tx + tx) * O (non-fast + fast) functions. If we end up adding one more hypothetical P dimension down the line, we're going to leave ourselves with a real mess. I wonder if there could be a "no return" flag passed as a function option somewhere that'd just have the normal "insert many" that'd use the fast path, and just have the function return an empty set of insert results?

bgentry · 2024-09-18T16:00:42Z

I wonder if there could be a "no return" flag passed as a function option somewhere that'd just have the normal "insert many" that'd use the fast path, and just have the function return an empty set of insert results?

The thing is it's not just about the return values. The nature of COPY FROM means it's fairly limited in terms of being able to integrate with some other features, such as relying on ON CONFLICT for uniqueness constraint violations. I'm not sure what other cases we will run into, but the general lesson seems to be that COPY FROM is probably not going to be a viable long-term option for most usage if we want it to have access to the full featureset.

I do agree we have to be super careful about the road we're going down with maintainability and different code paths, which is why I'm so keen on unifying the main insert code paths and even using a single bulk insert query for individual inserts. That means Insert/InsertTx/InsertMany/InsertManyTx, all behave the same, all have the same features, and all use the same underlying code. So it's really just the "fast" / COPY FROM variant that ends up being the oddball.

Maybe it's not even worth carrying the maintenance burden of that, it's definitely hard to say. But for now I feel ok with keeping it around as the odd man out while we work on unifying the code paths and feature set, and maybe once that's all in a good place we can revisit? LMK if that plan sounds ok.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add benchmarks for "insert many" operations #591

Add benchmarks for "insert many" operations #591

brandur commented Sep 15, 2024

bgentry commented Sep 18, 2024

brandur commented Sep 18, 2024

bgentry commented Sep 18, 2024

Add benchmarks for "insert many" operations #591

Are you sure you want to change the base?

Add benchmarks for "insert many" operations #591

Conversation

brandur commented Sep 15, 2024

bgentry commented Sep 18, 2024

brandur commented Sep 18, 2024

bgentry commented Sep 18, 2024