-
Notifications
You must be signed in to change notification settings - Fork 850
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Bulk Inserts in Facade Collection Instead of Single Inserts #2546
Conversation
…t a time Signed-off-by: Isaac Milarsky <krabs@tilde.team>
Signed-off-by: Isaac Milarsky <krabs@tilde.team>
Signed-off-by: Isaac Milarsky <krabs@tilde.team>
Signed-off-by: Isaac Milarsky <krabs@tilde.team>
Signed-off-by: Isaac Milarsky <krabs@tilde.team>
Signed-off-by: Isaac Milarsky <krabs@tilde.team>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@IsaacMilarky : Have we tested this with the 7 repo starting set to ensure that this branch generates commit data identical to the current method, and also clears out the working_commits table?
@@ -125,16 +125,9 @@ def update_analysis_log(repos_id,status): | |||
|
|||
# If there's a commit still there, the previous run was interrupted and | |||
# the commit data may be incomplete. It should be trimmed, just in case. | |||
for commit in working_commits: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is happening here appears to be simply a refactoring of how working commits are trimmed. Basically putting it into a method. Is that right, @IsaacMilarky ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes exactly this also uses a bulk operation instead of sending a query for every commit to trim.
I tested it with a single repo, I did not test it with all 7 starting repos. I will check the 7 starting repos after merging dev |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found few issues. In addition, I think we should have a fall back method when a bulk insert fails that simply inserts all 10000 commits one at a time so that one bad row doesn't cause us to miss 10000 commits. Alternatively anytime an error occurs we could have it try to insert half the data each time and then and then recursively insert half until the bad data row is the only thing left (this would be more efficient but more complex)
augur/tasks/git/util/facade_worker/facade_worker/analyzecommit.py
Outdated
Show resolved
Hide resolved
Signed-off-by: Isaac Milarsky <imilarsky@gmail.com>
Signed-off-by: Isaac Milarsky <imilarsky@gmail.com>
Signed-off-by: Isaac Milarsky <krabs@tilde.team>
Signed-off-by: Isaac Milarsky <krabs@tilde.team>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good. There is one small indentation issue and some thoughts on a way to make a method potentially more clear
@@ -205,3 +215,55 @@ def update_facade_scheduling_fields(session, repo_git, weight, commit_count): | |||
|
|||
session.execute(update_query) | |||
session.commit() | |||
|
|||
def facade_bulk_insert_commits(session,records): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this method might be cleaner if it took in the list of records and then tried to insert them. And if an exception occurs and the insert was on more than one row then we log the multiple record error message, and split in half then try again. But if an exception occurs and the number or records is 1 then we check if it a DataError exception and do all the logic and error logging for a single record insert. Put simply it moves the if len(records) == 1 into the exception block to determine what to log or how to react
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented this change. I think it makes it slightly more readable
Signed-off-by: Isaac Milarsky <imilarsky@gmail.com>
Signed-off-by: Isaac Milarsky <imilarsky@gmail.com>
Description
Signed commits