Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Celery Agent Merging #3838

Merged
merged 30 commits into from
Jul 26, 2023
Merged

Celery Agent Merging #3838

merged 30 commits into from
Jul 26, 2023

Conversation

acwhite211
Copy link
Member

Allow for the agent merging process to run in the background as a celery task in order to handle long running record merges.

@acwhite211 acwhite211 self-assigned this Jul 25, 2023
@acwhite211 acwhite211 changed the base branch from production to v7.9-dev July 25, 2023 05:22
@acwhite211
Copy link
Member Author

Still making some fixes, but this is the general structure for creating agent merging as a celery task. I kept the same API call, but with the option of bg to run as a background celery task, which will have a default value of true. I also added a check to make sure only one celery merge task can run at a time.

@acwhite211
Copy link
Member Author

There is a new status API call to check on the state of the merge process, this is modeled after how the workbench upload status works. I have also incorporated the messaging and notifications system in the record merge celery task, so a notification will be sent to the user when a new task starts and finishes.

@CarolineDenis CarolineDenis linked an issue Jul 25, 2023 that may be closed by this pull request
@acwhite211
Copy link
Member Author

In order to avoid django migration errors, I had to move the Spmerging model into a seperate file, and the migration file somewhere besides the specify folder. This will be a problem for any new model added to the specify folder until we get rid of the runtime model creation generated from specify6.

Copy link
Member

@grantfitzsimmons grantfitzsimmons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

This is what the notification looks like on the front-end now:

{
  "messageId": 312,
  "read": false,
  "timestamp": "2023-07-26T02:04:07.169078",
  "type": "record-merge-pending",
  "payload": {
    "name": "Merge_agent_19108",
    "task_id": "fecd123b-6126-4581-8c39-0ad79c1709c6"
  }
}

We need to implement some better visualization in the UI

Just the beginning of my review– looking closer at the process now. The code looks good to me and my AI.

Copy link
Member

@grantfitzsimmons grantfitzsimmons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One major issue:

See the notifications_message table:

id timestampcreated content user_id read
313 2023-07-26 02:06:16.454 {"type": "record-merge-pending", "name": "Merge_agent_18003", "task_id": "267fdc45-3d52-4346-a869-1e69b20de5c2"} 16 0
312 2023-07-26 02:04:07.169 {"type": "record-merge-pending", "name": "Merge_agent_19108", "task_id": "fecd123b-6126-4581-8c39-0ad79c1709c6"} 16 1

and see the spmerging table:

id name taskid mergingstatus timestampcreated timestampmodified createdbyagent_id modifiedbyagent_id
1 Merge_agent_19108 fecd123b-6126-4581-8c39-0ad79c1709c6 FAILED 2023-07-26 02:04:07.152 2023-07-26 02:06:16.437 26142 26142
2 Merge_agent_18003 267fdc45-3d52-4346-a869-1e69b20de5c2 MERGING 2023-07-26 02:06:16.440 2023-07-26 02:06:16.450 26142 26142

Even if the merge process fails, the status is not reported to the user via a notification. The notifications only indicate that the merge is pending, but never on whether or not it has completed.

It would be very nice to have the notification upon failure have a link to download or view the exception formatted in a user-friendly way.

@grantfitzsimmons grantfitzsimmons added this to the 7.9 milestone Jul 26, 2023
specifyweb/specify/views.py Outdated Show resolved Hide resolved
specifyweb/specify/views.py Outdated Show resolved Hide resolved
specifyweb/specify/views.py Outdated Show resolved Hide resolved
@acwhite211
Copy link
Member Author

Hey @grantfitzsimmons , I added a try catch block around the record merging function, so any otherwise uncaught errors will trigger a notification being sent.

@acwhite211
Copy link
Member Author

Anyone know how to run django migration commands on GitHub's instance of the project. I think the reason the test isn't succeeding is because there was a migration change while developing the code, so GitHub is saying that it can't run the new migration because the 'Spmerging' table already exists from the earlier migration that no longer exists. The tests run fine for me locally.

@acwhite211 acwhite211 marked this pull request as ready for review July 26, 2023 10:56
@realVinayak
Copy link
Collaborator

I think we host it on digital ocean.

Regardless, can you make another migration to delete that able, and then another one to add the changed table? We can just keep the latest migration for the actual code. Will that work?

@grantfitzsimmons
Copy link
Member

@acwhite211

I believe we need to add an additional column to this table with the ID of the newly merged record so that we can create a link in the notification to go and visit the merged record:

id name taskid mergingstatus timestampcreated timestampmodified createdbyagent_id modifiedbyagent_id
1 Merge_agent_19108 fecd123b-6126-4581-8c39-0ad79c1709c6 FAILED 2023-07-26 02:04:07.152 2023-07-26 02:06:16.437 26142 26142
2 Merge_agent_18003 267fdc45-3d52-4346-a869-1e69b20de5c2 MERGING 2023-07-26 02:06:16.440 2023-07-26 02:06:16.450 26142 26142

I think we could use both the table name (or ID) and the record ID to construct a link to the new record in the notification.

@acwhite211 acwhite211 added python Pull requests that update Python code 2 - API Issues that are related to the APIs labels Jul 26, 2023
@acwhite211 acwhite211 merged commit 2eed710 into v7.9-dev Jul 26, 2023
9 checks passed
@acwhite211 acwhite211 deleted the celery_agent_merging branch July 26, 2023 21:16
@maxpatiiuk
Copy link
Member

maxpatiiuk commented Jul 27, 2023

Anyone know how to run django migration commands on GitHub's instance of the project. I think the reason the test isn't succeeding is because there was a migration change while developing the code, so GitHub is saying that it can't run the new migration because the 'Spmerging' table already exists from the earlier migration that no longer exists. The tests run fine for me locally.

@acwhite211
I think it creates a temporary database while the tests are running. pretty sure database is not reused between test runs (the tests do cache some files, but database is not one of them)

I don't remember if it creates the db from a .SQL file somewhere or if django creates it from the data model

I think we host it on digital ocean.

@realVinayak
github action tests have no relation to digital ocean - github provides it's own VMs free of charge for us to run tests on

@realVinayak
Copy link
Collaborator

realVinayak commented Jul 27, 2023

@maxpatiiuk Ah my bad. Thanks for clearing it up!

@acwhite211
Copy link
Member Author

acwhite211 commented Jul 27, 2023

Thanks @maxpatiiuk, I figured out that the problem came from django trying to put the 'spmerging' model in the synchronization process of all the models generated in the Specify django app. So when the django migration occurred for 'spmerging', the table already existed. For now I just moved spmerging into the notifications models.py file. Once I get all the specify6 generated models hard-coded into a models.py file, this will not be an issue when we add new models in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2 - API Issues that are related to the APIs python Pull requests that update Python code
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Move the merging process to the worker
4 participants