Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add benchmarks for "insert many" operations #591

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

brandur
Copy link
Contributor

@brandur brandur commented Sep 15, 2024

A couple quick benchmarks for the two "insert many" variants to help vet
whether or not we should keep the "fast" route.

There is definitely a speed difference, with the fast variant being
about 25% faster as it can skip returning rows:

go test -bench=BenchmarkClient
goos: darwin
goarch: arm64
pkg: github.com/riverqueue/river
cpu: Apple M1
BenchmarkClient/JobInsertMany-8                      312           4156647 ns/op
BenchmarkClient/JobInsertManyFast-8                  417           3054985 ns/op
PASS
ok      github.com/riverqueue/river     15.709s

Can't 100% decide where that leaves us. There's definitely a speed
advantage, but it definitely congests the API to a degree, and having
more functions make implementing features like middleware trickier.

A couple quick benchmarks for the two "insert many" variants to help vet
whether or not we should keep the "fast" route.

There is definitely a speed difference, with the fast variant being
about 25% faster as it can skip returning rows:

    go test -bench=BenchmarkClient
    goos: darwin
    goarch: arm64
    pkg: github.com/riverqueue/river
    cpu: Apple M1
    BenchmarkClient/JobInsertMany-8                      312           4156647 ns/op
    BenchmarkClient/JobInsertManyFast-8                  417           3054985 ns/op
    PASS
    ok      github.com/riverqueue/river     15.709s

Can't 100% decide where that leaves us. There's definitely a speed
advantage, but it definitely congests the API to a degree, and having
more functions make implementing features like middleware trickier.
@bgentry
Copy link
Contributor

bgentry commented Sep 18, 2024

Hmm, the copy variant will never be able to do things like handle conflicts on unique inserts. I don’t think it’s appropriate for a primary API for that reason. But also the cost of keeping it around for specific high volume use cases that need fewer features isn’t that high IMO, so I’d say we try to do that.

@brandur
Copy link
Contributor Author

brandur commented Sep 18, 2024

Yeah, having an optimized path can't hurt, but I have to say the API still feels wrong to me in that we're continuing to add more dimensions of modifiers to insertion. So we have N (insert + insert many) * M (non-tx + tx) * O (non-fast + fast) functions. If we end up adding one more hypothetical P dimension down the line, we're going to leave ourselves with a real mess. I wonder if there could be a "no return" flag passed as a function option somewhere that'd just have the normal "insert many" that'd use the fast path, and just have the function return an empty set of insert results?

@bgentry
Copy link
Contributor

bgentry commented Sep 18, 2024

I wonder if there could be a "no return" flag passed as a function option somewhere that'd just have the normal "insert many" that'd use the fast path, and just have the function return an empty set of insert results?

The thing is it's not just about the return values. The nature of COPY FROM means it's fairly limited in terms of being able to integrate with some other features, such as relying on ON CONFLICT for uniqueness constraint violations. I'm not sure what other cases we will run into, but the general lesson seems to be that COPY FROM is probably not going to be a viable long-term option for most usage if we want it to have access to the full featureset.

I do agree we have to be super careful about the road we're going down with maintainability and different code paths, which is why I'm so keen on unifying the main insert code paths and even using a single bulk insert query for individual inserts. That means Insert/InsertTx/InsertMany/InsertManyTx, all behave the same, all have the same features, and all use the same underlying code. So it's really just the "fast" / COPY FROM variant that ends up being the oddball.

Maybe it's not even worth carrying the maintenance burden of that, it's definitely hard to say. But for now I feel ok with keeping it around as the odd man out while we work on unifying the code paths and feature set, and maybe once that's all in a good place we can revisit? LMK if that plan sounds ok.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants