Use Bulk Inserts in Facade Collection Instead of Single Inserts #2546

Merged
merged 17 commits into from
Oct 20, 2023

Conversation

IsaacMilarky
Contributor

@IsaacMilarky IsaacMilarky commented Oct 10, 2023

Description

  • Change the trim_commits phase of facade to delete commits in bulk. Previously, commits were deleted one at a time.
  • Change the analyze_commits_in_parallel phase of facade to insert commits in bulk instead of one at a time. Previously, this used more database transactions than necessary.
  • Implement a staggered insert that writes a block of facade records whenever the number of objects held in memory exceeds 10000.
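
The staggered insert described in the last bullet can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: it uses the standard-library sqlite3 module in place of Augur's database session, and the `commits` table, its columns, and the function names are hypothetical. Only the 10000 threshold comes from the description above.

```python
import sqlite3

# Threshold taken from the PR description; everything else is illustrative.
FLUSH_THRESHOLD = 10000


def flush(conn, pending):
    """Write every buffered record in one transaction, then empty the buffer."""
    conn.executemany(
        "INSERT INTO commits (hash, filename) VALUES (?, ?)", pending
    )
    conn.commit()
    pending.clear()


def analyze_and_insert(conn, records, threshold=FLUSH_THRESHOLD):
    """Buffer records in memory and flush once the buffer exceeds `threshold`.

    Returns the number of bulk inserts that were issued.
    """
    pending = []
    flushes = 0
    for record in records:
        pending.append(record)
        if len(pending) > threshold:
            flush(conn, pending)
            flushes += 1
    # Flush whatever is left once the input is exhausted.
    if pending:
        flush(conn, pending)
        flushes += 1
    return flushes
```

Buffering this way bounds memory use while still issuing far fewer transactions than one insert per record.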

Signed commits

  • Yes, I signed my commits.

…t a time

Signed-off-by: Isaac Milarsky <krabs@tilde.team>
Signed-off-by: Isaac Milarsky <krabs@tilde.team>
Signed-off-by: Isaac Milarsky <krabs@tilde.team>
Signed-off-by: Isaac Milarsky <krabs@tilde.team>
Signed-off-by: Isaac Milarsky <krabs@tilde.team>
Signed-off-by: Isaac Milarsky <krabs@tilde.team>
Member

@sgoggins sgoggins left a comment

@IsaacMilarky : Have we tested this with the 7 repo starting set to ensure that this branch generates commit data identical to the current method, and also clears out the working_commits table?

@@ -125,16 +125,9 @@ def update_analysis_log(repos_id,status):

# If there's a commit still there, the previous run was interrupted and
# the commit data may be incomplete. It should be trimmed, just in case.
for commit in working_commits:
Member

This appears to be simply a refactoring of how working commits are trimmed, basically moving it into a method. Is that right, @IsaacMilarky?

Contributor Author

Yes, exactly. This also uses a bulk operation instead of sending a query for every commit to trim.
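
As a rough illustration of the difference, a bulk trim can remove every leftover commit for a repo with one statement per table rather than one query per commit. The sketch below assumes hypothetical `commits` and `working_commits` tables keyed by `repo_id` and `hash`, and uses sqlite3 rather than Augur's session, so it should be read as an outline of the idea and not as the PR's implementation.

```python
import sqlite3


def trim_commits_bulk(conn, repo_id, commit_hashes):
    """Delete all listed commits for one repo using one statement per table."""
    hashes = list(commit_hashes)
    if not hashes:
        return
    # One "?" per hash; only placeholders are interpolated, never values.
    # Note: databases cap the number of bound parameters per statement,
    # so very large lists would need to be chunked.
    placeholders = ",".join("?" for _ in hashes)
    params = [repo_id, *hashes]
    conn.execute(
        f"DELETE FROM commits WHERE repo_id = ? AND hash IN ({placeholders})",
        params,
    )
    conn.execute(
        f"DELETE FROM working_commits WHERE repo_id = ? AND hash IN ({placeholders})",
        params,
    )
    conn.commit()
```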

augur/tasks/git/facade_tasks.py Show resolved Hide resolved
@IsaacMilarky
Contributor Author

@IsaacMilarky : Have we tested this with the 7 repo starting set to ensure that this branch generates commit data identical to the current method, and also clears out the working_commits table?

I tested it with a single repo, but I did not test it with all 7 starting repos. I will check the 7 starting repos after merging dev.

Contributor

@ABrain7710 ABrain7710 left a comment

I found a few issues. In addition, I think we should have a fallback for when a bulk insert fails that simply inserts all 10000 commits one at a time, so that one bad row doesn't cause us to miss 10000 commits. Alternatively, any time an error occurs we could try to insert half of the data at a time, recursively halving the failing portion until the bad row is the only thing left (this would be more efficient but more complex).
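
The first, simpler option might look something like the sketch below: try the whole batch, and if it fails, retry row by row so that only the bad rows are lost. It is written against sqlite3 with a hypothetical `commits` table, and the function name is illustrative, so it shows the shape of the fallback without claiming to match Augur's session API.

```python
import sqlite3

INSERT_SQL = "INSERT INTO commits (hash, added) VALUES (?, ?)"


def insert_with_row_fallback(conn, records):
    """Try one bulk insert; on failure, insert rows individually.

    Returns the list of records that could not be inserted.
    """
    try:
        # `with conn` commits on success and rolls back on an exception,
        # so a failed batch leaves no partial rows behind.
        with conn:
            conn.executemany(INSERT_SQL, records)
        return []
    except sqlite3.Error:
        pass  # fall through to the slow path

    rejected = []
    for record in records:
        try:
            with conn:
                conn.execute(INSERT_SQL, record)
        except sqlite3.Error:
            rejected.append(record)
    return rejected
```

The cost of this option is that a single bad row turns one transaction into as many transactions as there are records, which is the inefficiency the recursive alternative avoids.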

augur/tasks/git/facade_tasks.py (4 outdated review threads, resolved)
IsaacMilarky and others added 7 commits October 17, 2023 14:38
Signed-off-by: Isaac Milarsky <imilarsky@gmail.com>
Signed-off-by: Isaac Milarsky <imilarsky@gmail.com>
Signed-off-by: Isaac Milarsky <krabs@tilde.team>
Signed-off-by: Isaac Milarsky <krabs@tilde.team>
Signed-off-by: Isaac Milarsky <krabs@tilde.team>
Signed-off-by: Isaac Milarsky <imilarsky@gmail.com>
Signed-off-by: Isaac Milarsky <imilarsky@gmail.com>
Contributor

@ABrain7710 ABrain7710 left a comment

Overall looks good. There is one small indentation issue, and I have some thoughts on a way to make a method potentially clearer.

augur/tasks/git/facade_tasks.py Outdated Show resolved Hide resolved
@@ -205,3 +215,55 @@ def update_facade_scheduling_fields(session, repo_git, weight, commit_count):

session.execute(update_query)
session.commit()

def facade_bulk_insert_commits(session,records):
Contributor

I think this method might be cleaner if it took in the list of records and then tried to insert them. If an exception occurs and the insert covered more than one row, we log the multiple-record error message, split the list in half, and try again. But if an exception occurs and the number of records is 1, we check whether it is a DataError exception and do all the logic and error logging for a single-record insert. Put simply, it moves the if len(records) == 1 check into the exception block to determine what to log or how to react.
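
A minimal sketch of that structure, under stated assumptions: it uses sqlite3 and a hypothetical `commits` table instead of the session passed to facade_bulk_insert_commits, catches sqlite3.Error broadly where the real method would check for DataError, and collects rejected rows in a list where the real method would log them.

```python
import sqlite3

INSERT_SQL = "INSERT INTO commits (hash, added) VALUES (?, ?)"


def bulk_insert_commits(conn, records, rejected=None):
    """Insert `records` in bulk, halving the batch on failure.

    The len(records) == 1 check lives inside the exception handler, so the
    happy path is a single bulk insert with no branching.
    Returns a list of (record, exception name) pairs for rows that failed.
    """
    if rejected is None:
        rejected = []
    if not records:
        return rejected
    try:
        with conn:  # commit on success, roll back on error
            conn.executemany(INSERT_SQL, records)
    except sqlite3.Error as exc:
        if len(records) == 1:
            # Single-record handling: this row itself is the bad data.
            rejected.append((records[0], type(exc).__name__))
        else:
            # Multi-record handling: split in half and retry each half.
            mid = len(records) // 2
            bulk_insert_commits(conn, records[:mid], rejected)
            bulk_insert_commits(conn, records[mid:], rejected)
    return rejected
```

With one bad row in a batch of n records, this isolates it in roughly 2·log2(n) extra insert attempts instead of n single-row inserts.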

Contributor Author

I implemented this change. I think it makes the method slightly more readable.

Signed-off-by: Isaac Milarsky <imilarsky@gmail.com>
Signed-off-by: Isaac Milarsky <imilarsky@gmail.com>
@IsaacMilarky IsaacMilarky merged commit 1962a81 into dev Oct 20, 2023
@IsaacMilarky IsaacMilarky deleted the bulk-facade branch October 20, 2023 19:25
3 participants