enhance: remove redundant indices from submission_defs table #805
Really great to see some throughput numbers and ideas for increasing it. A couple of high-level thoughts:
This was on the dev server, right? 2 vCPUs and 2GB RAM.
I believe the only read operation with performance requirements is data export. There are different variations:
I believe we know that these are typically RAM-bound.
As far as I can tell, there will have to be some amount of negative impact on the queries for all of these, right? In particular, they all include joins on How did you pick these specific indexes to target? Was it only based on the impact on submission processing time or also on some estimate of impact on exports (e.g. something like the high-level reasoning I went through above).
It would be helpful to get a quick narrative on how you think about these as well.
From @sadiqkhoja: there are still multicolumn indexes that cover the same cases as the removed ones. From postgres docs:
I did not know this. I can confirm the following indexes remain on
I believe that leaves submission attachments without index support because they use I verified with For submissions, with the proposed changes there would remain:
All of the removed ones are subsets of those. We only use equality conditions against any of those fields. According to the experimentation I did above and my understanding of the queries on
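The prefix rule discussed in this comment can be tried out with a quick experiment. This is only a sketch under loud assumptions: it uses SQLite from the Python standard library as a stand-in for Postgres, and the table `demo_defs`, its columns, and the index name are hypothetical, not the real central-backend schema. Real Postgres plans should still be checked with `EXPLAIN` on representative data, since the Postgres planner also weighs table statistics.

```python
import sqlite3

# ASSUMPTION: SQLite stands in for Postgres; table, columns and index are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_defs ("
    "id INTEGER PRIMARY KEY, parent_id INTEGER, instance_id TEXT, payload TEXT)"
)
# Only a composite index exists; there is no single-column index on parent_id.
conn.execute(
    "CREATE INDEX demo_defs_parent_instance_idx ON demo_defs (parent_id, instance_id)"
)
conn.executemany(
    "INSERT INTO demo_defs (parent_id, instance_id, payload) VALUES (?, ?, ?)",
    [(n % 100, f"uuid:{n}", "x") for n in range(1000)],
)


def query_plan(sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


# Equality on the leading column: the composite index can serve this lookup.
leading_plan = query_plan("SELECT payload FROM demo_defs WHERE parent_id = ?", (42,))
# Equality on the trailing column only: the index prefix does not apply,
# so the planner is expected to fall back to scanning.
trailing_plan = query_plan(
    "SELECT payload FROM demo_defs WHERE instance_id = ?", ("uuid:42",)
)

print(leading_plan)
print(trailing_plan)
```

The first plan should name the composite index even though no single-column index on `parent_id` exists, which is the reason a narrower index on the same leading column can be redundant.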
I don't see
I'm sorry, I meant client audits: https://github.com/getodk/central-backend/blob/master/lib/model/query/client-audits.js#L41 My understanding is that the
ah! there will be significant impact definitely. Q: Do we really need to order by
-- in test.getodk
SELECT count(1) FROM
(
SELECT id, "createdAt", ROW_NUMBER() OVER (ORDER BY id) rownum FROM submission_defs sd
) orderbyid
JOIN (
SELECT id, "createdAt", ROW_NUMBER() OVER (ORDER BY "createdAt") rownum FROM submission_defs sd
) orderbycreatedat ON orderbyid.rownum = orderbycreatedat.rownum
WHERE orderbyid.id != orderbycreatedat.id
-- output
-- 19692
I am going to create composite index on (
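The query above pairs each row's position when ordered by id with its position when ordered by "createdAt", then counts the positions where the two orderings land on different rows. A minimal sketch of the same counting logic, using made-up rows rather than real submission_defs data (the function name is hypothetical):

```python
def count_order_mismatches(rows):
    """Count positions where ordering by id and by created_at give different rows.

    rows: iterable of (id, created_at) tuples.
    """
    by_id = sorted(rows, key=lambda r: r[0])
    by_created = sorted(rows, key=lambda r: r[1])
    # Compare the two orderings position by position, like the join on rownum.
    return sum(1 for a, b in zip(by_id, by_created) if a[0] != b[0])


# Made-up data: id 3 was created before id 2, so two positions disagree.
rows = [(1, 10), (2, 12), (3, 11), (4, 13)]
mismatches = count_order_mismatches(rows)
print(mismatches)  # prints 2
```

Note that rows sharing the same created_at value keep their input order here, whereas ROW_NUMBER() in SQL may number such ties in any order.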
improves submission throughput by 20% without any impact on select queries
Benchmark: 250_questions form with 20K existing submissions
JMeter parameters:
300 thread
60 test duration
1500 per 10 sec target throughput
release 40 threads in batch
Result:
95/sec throughput with 4% error without this change
115/sec throughput without any error with this change
add (createdAt, id) index on submission_defs
add (formId, createdAt, id) index on submissions
596a114 to 1a2eba8
That's interesting and intersects with discussions we've been having about entity updates through submissions, I think. Does this have to do with postgres transaction rules or something? Like is the id assigned at the beginning of the transaction and
That said, maybe sorting by id still is ok. I don't know that the specific order matters, only that it's relatively stable. My understanding is that order is meaningful only because sometimes humans manually look at the audit export and it's helpful in that case to be able to relate that order to the submission table. The order of the individual log entries within a submission is the most important thing for that. The absolute ideal would be that it be sorted by date/time that the submissions were created on the client but we know that's literally impossible (because clients could be offline and who knows how their clocks are set).
Values for
Fascinating. What did you think of my reasoning for Do users need to run
For client audit logs, sorting by just id is fine. But we are sorting submissions by We haven't put
Couple questions. I'm not confident enough about changing the client audit sort so let's leave that as-is.
I'm satisfied by the answers to my latest questions and the analysis I wrote up above. I think it can be merged without another pair of eyes but also happy for you to loop in another reviewer if you still find this risky, @sadiqkhoja.
Improves submission throughput by 20% without any impact on select queries.
What has been done to verify that this works as intended?
Performed load testing with jmeter.
Used the 250_questions form. For every run, I kept only 20K submissions in the database (having 0 existing submissions doesn't truly reflect the impact of indices).
Parameters:
Threads: 300
Total test duration: 60s
Precise throughput target: 1500 per 10 sec. Release 40 threads in a batch
(I ran tests with various parameters to find the maximum throughput that can be achieved with no index on the submission_defs table.)
Result:
95/sec throughput with 4% failed submissions without this change
115/sec throughput without any failure with this change
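For reference, the arithmetic behind the headline figure can be worked through using only the numbers reported above. Variable names are illustrative, and treating "1500 per 10 sec" as a 150/sec target is my reading of the JMeter setting:

```python
# Figures taken from the benchmark results above.
baseline_per_sec = 95    # throughput without this change
changed_per_sec = 115    # throughput with this change
target_per_sec = 1500 / 10  # ASSUMPTION: target of 1500 requests per 10 s window

# Relative throughput gain over the baseline.
gain_pct = (changed_per_sec - baseline_per_sec) / baseline_per_sec * 100

print(f"relative gain: {gain_pct:.1f}%")
print(
    "share of target reached: "
    f"{baseline_per_sec / target_per_sec:.0%} -> {changed_per_sec / target_per_sec:.0%}"
)
```

The gain works out to roughly 21%, which matches the "20%" quoted in the summary once rounded.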
Why is this the best possible solution? Were any other approaches considered?
I want this change to be reviewed under the microscope.
Thinking of deleting submissions_draft_index, submissions_formid_index and submissions_formid_instanceid_index from the submissions table as well.
How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?
We need to really make sure that there is no negative impact of this change on submission retrieval (odata, csv, zip)
Does this change require updates to the API documentation? If so, please update docs/api.md as part of this PR.
NA
Before submitting this PR, please make sure you have:
run make test-full and confirmed all checks still pass OR confirm CircleCI build passes