[db] Garbage collection of analysis_info timeout #3775
Conversation
Force-pushed from f466e6e to dcd324a
@@ -197,24 +197,37 @@ def upgrade_severity_levels(session_maker, checker_labels):

def remove_unused_analysis_info(session_maker):
    """ Remove unused analysis information from the database. """
    # Analysis info deletion is a relatively slow operation due to database
    # cascades. Removing files in big chunks prevents reaching a potential
I get what you mean, but didn't you mean to say "smaller chunks"? (as opposed to removing them all in one go)
LGTM!
    # cascades. Removing files in big chunks prevents reaching a potential
    # database statement timeout. This hard-coded value is a safe choice
    # according to some measurements.
    CHUNK_SIZE = 500
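For context, here is a minimal sketch of how such a CHUNK_SIZE could be consumed by a chunked delete loop. This is not the PR's actual code: `delete_in_chunks`, `model` and `ids` are illustrative names, and it assumes SQLAlchemy 1.4+ where a session can be used as a context manager.

```python
from sqlalchemy.orm import sessionmaker

CHUNK_SIZE = 500  # same hard-coded limit as in the diff above


def delete_in_chunks(session_factory: sessionmaker, model, ids):
    """ Delete the given rows in CHUNK_SIZE batches so that no single
    DELETE statement (and the cascades it triggers) runs long enough
    to hit the database statement timeout. """
    ids = list(ids)
    for start in range(0, len(ids), CHUNK_SIZE):
        chunk = ids[start:start + CHUNK_SIZE]
        with session_factory() as session:
            session.query(model) \
                .filter(model.id.in_(chunk)) \
                .delete(synchronize_session=False)
            session.commit()
```

Each chunk is committed on its own, so the timeout applies per batch rather than to one huge cascading delete.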
Will this batch size of 500 completely resolve all database timeouts during cleanup? If not, maybe we should fall back to a smaller size?
Yes, it seems to fix the timeout issue. Of course it depends on the database manager settings, but we are aiming for a 2-minute timeout threshold and this value looks like a proper choice for that. I agree that chunking is not an elegant solution, but we use such hard-coded values in other parts of the code too. In the long term we should review our database schema and look for optimization opportunities.
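For reference, the 2-minute threshold mentioned above corresponds to a PostgreSQL statement_timeout of 120000 ms. A rough sketch of setting such a limit per connection with SQLAlchemy (the connection URL and credentials are placeholders, not the project's configuration):

```python
from sqlalchemy import create_engine, text

# Placeholder URL; statement_timeout is given in milliseconds and applies
# to every statement issued on connections created by this engine.
engine = create_engine(
    "postgresql+psycopg2://codechecker@localhost/codechecker",
    connect_args={"options": "-c statement_timeout=120000"})

with engine.connect() as conn:
    # Prints "2min" (or "120s", depending on how the server formats it).
    print(conn.execute(text("SHOW statement_timeout")).scalar())
```

With a limit like this in place, each per-chunk DELETE only has to finish within the timeout on its own, which is the point of the chunking above.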
The garbage collection of the analysis_info table has been restructured because the original query exceeded a 2-minute timeout.
Force-pushed from dcd324a to 12535bf
LGTM