
[db] Garbage collection of analysis_info timeout #3775

Merged · 1 commit · Nov 8, 2022

Conversation

bruntib (Contributor) commented Oct 21, 2022

The garbage collection of the analysis_info table has been restructured because the original query exceeded a 2-minute timeout.

@bruntib bruntib added enhancement 🌟 database 🗄️ Issues related to the database schema. bugfix 🔨 labels Oct 21, 2022
@bruntib bruntib added this to the release 6.21.0 milestone Oct 21, 2022
@bruntib bruntib requested a review from Szelethus October 21, 2022 15:47
@bruntib bruntib requested a review from vodorok as a code owner October 21, 2022 15:47
@bruntib bruntib force-pushed the garbage_timeout_analysis_info branch from f466e6e to dcd324a Compare October 25, 2022 14:37
@@ -197,24 +197,37 @@ def upgrade_severity_levels(session_maker, checker_labels):

def remove_unused_analysis_info(session_maker):
""" Remove unused analysis information from the database. """
# Analysis info deletion is a relatively slow operation due to database
# cascades. Removing files in big chunks prevents reaching a potential
Contributor (review comment):

I get what you mean, but didn't you mean to say "smaller chunks"? (as opposed to all in one go)

vodorok (Contributor) left a comment:

LGTM!

# database statement timeout. This hard-coded value is a safe choice
# according to some measurements.
CHUNK_SIZE = 500
Contributor (review comment):

Will this batch size of 500 completely resolve all database timeouts during cleanup? If not, maybe we should fall back to a smaller size?

bruntib (Contributor, Author):

Yes, it seems to fix the timeout issue. Of course it depends on the database manager settings, but we are aiming at a 2-minute timeout threshold, and this looks like a proper choice for that. I agree that chunking is not an elegant solution, but we have used such hard-coded values in other parts of the code too. In the long term we should review our database schema and look for optimization opportunities.
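The chunked-deletion approach discussed above can be illustrated with a minimal, self-contained sketch. This is not the CodeChecker implementation; it uses a hypothetical `analysis_info` table in an in-memory SQLite database, and the table layout and `remove_in_chunks` helper are assumptions made for the example. The idea is the same: split one large DELETE into fixed-size batches so no single statement runs long enough to hit a database statement timeout.

```python
import sqlite3

# Hard-coded chunk size, mirroring the value discussed in the review.
CHUNK_SIZE = 500

def remove_in_chunks(conn, ids):
    """Delete rows from a hypothetical analysis_info table in fixed-size
    chunks so each DELETE statement stays short."""
    for i in range(0, len(ids), CHUNK_SIZE):
        chunk = ids[i:i + CHUNK_SIZE]
        placeholders = ", ".join("?" for _ in chunk)
        conn.execute(
            f"DELETE FROM analysis_info WHERE id IN ({placeholders})", chunk)
        # Committing per chunk bounds the work done by any one statement.
        conn.commit()

# Demo with an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE analysis_info (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO analysis_info (id) VALUES (?)",
                 [(i,) for i in range(1200)])
unused = list(range(1000))  # pretend these ids were found to be unused
remove_in_chunks(conn, unused)
remaining = conn.execute("SELECT COUNT(*) FROM analysis_info").fetchone()[0]
print(remaining)  # 200 rows left
```

The trade-off is exactly the one raised in the review: a smaller chunk gives more headroom against the timeout but issues more statements, so the "safe" value ultimately depends on the database manager's settings and on measurement.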

The garbage collection of analysis_info table has been restructured
because the original query exceeded a 2min timeout.
@bruntib bruntib force-pushed the garbage_timeout_analysis_info branch from dcd324a to 12535bf Compare November 8, 2022 14:55
@bruntib bruntib requested a review from Szelethus November 8, 2022 14:58
vodorok (Contributor) left a comment:

LGTM

@dkrupp dkrupp merged commit d2ae811 into Ericsson:master Nov 8, 2022
@bruntib bruntib deleted the garbage_timeout_analysis_info branch November 9, 2022 10:41