Add a data migration to copy all course index data into MySQL (Take 2) #29293

bradenmacdonald
Contributor

@bradenmacdonald bradenmacdonald commented Nov 10, 2021

Description

This PR is a repeat of https://github.com/edx/edx-platform/pull/29144 which had to be reverted.

This version has similar code but when encountering unexpected data, it will just log it and proceed rather than throwing an error. See this discussion.

If the error occurs again, the log output will look like this:

2021-11-10 03:19:06,653 ERROR 449 [common.djangoapps.split_modulestore_django.migrations.0002_data_migration] [user None] [ip None] 0002_data_migration.py:36 - Possible data issue found during data migration of course indexes from MongoDB to MySQL: 
Course course-v1:edX+DemoX+Demo_Course already exists in MySQL but the MongoDB version is newer. That's unexpected because since the course index table was added to MySQL, there has never been a time when we would write course_indexes updates only to MongoDB without also writing to MySQL. 
Mongo data: edited_on: 2021-11-03 22:23:48.989000+00:00, last_update: 2021-11-03 22:24:14.439000+00:00, published_version: 61830c0dca80abc00d1b233f
MySQL data: edited_on: 2021-10-01 11:38:42.219000+00:00, last_update: 2021-10-01 11:38:42.219000+00:00, published_version: bc00d1b233f61830c0dca80a
The MySQL version will be overwritten and the MongoDB version used.
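
The "log it and proceed" behaviour described above can be sketched roughly as follows. This is not the PR's code: the helper name `choose_index`, the flat-dict record shape, and the choice of `last_update` as the comparison field are all assumptions made for illustration, and the log text is paraphrased from the output shown above.

```python
import logging
from datetime import datetime, timezone

log = logging.getLogger(__name__)


def choose_index(course_id, mongo_index, mysql_index):
    """Hypothetical helper: pick which course index record to keep.

    Both records are assumed to be plain dicts with a "last_update" datetime
    and an optional "published_version" string; mysql_index is None when the
    course has not been copied into MySQL yet.
    """
    if mysql_index is None:
        # Not in MySQL yet: copy the MongoDB record as-is.
        return mongo_index
    if mongo_index["last_update"] > mysql_index["last_update"]:
        # Unexpected state: log it and keep going instead of raising,
        # so one odd course cannot abort the whole migration.
        log.error(
            "Possible data issue found during data migration of course indexes "
            "from MongoDB to MySQL: course %s already exists in MySQL but the "
            "MongoDB version is newer (Mongo published_version: %s, MySQL "
            "published_version: %s). The MongoDB version will be used.",
            course_id,
            mongo_index.get("published_version"),
            mysql_index.get("published_version"),
        )
        return mongo_index
    # MySQL is already current: leave it unchanged.
    return mysql_index


# Example using the timestamps from the log output above.
mongo = {
    "last_update": datetime(2021, 11, 3, 22, 24, 14, tzinfo=timezone.utc),
    "published_version": "61830c0dca80abc00d1b233f",
}
mysql = {
    "last_update": datetime(2021, 10, 1, 11, 38, 42, tzinfo=timezone.utc),
    "published_version": "bc00d1b233f61830c0dca80a",
}
chosen = choose_index("course-v1:edX+DemoX+Demo_Course", mongo, mysql)  # logs an error, returns mongo
```

Reading `published_version` with `.get()` in the log call means a record without that field is still logged rather than crashing the migration.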

Supporting information

See previous PRs.

Testing instructions

  1. Check out a master devstack. Make some changes to a course or two (maybe a library too).
  2. At http://localhost:18000/admin/split_modulestore_django/splitmodulestorecourseindex/ verify that any courses that you've modified are already copied into MySQL. Keep this tab open.
  3. Check out this PR and run the LMS migrations.
  4. Open http://localhost:18000/admin/split_modulestore_django/splitmodulestorecourseindex/ in a new tab, and compare to the prior version. Courses that were already present in MySQL should be unchanged, and all remaining courses from your devstack should now be listed there.
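
If you copy the admin listing from steps 2 and 4 into two dicts, the comparison in step 4 could be scripted along these lines. The function `check_migration` and the snapshot format (course ID mapped to a tuple of index fields) are hypothetical; nothing like this ships with the PR.

```python
def check_migration(before, after, all_course_ids):
    """Hypothetical check for step 4 of the testing instructions.

    before/after: dicts mapping course ID -> tuple of index fields, taken
    from the admin page before and after running the migration.
    all_course_ids: every course/library ID on the devstack.
    Returns a list of problems; an empty list means the check passed.
    """
    problems = []
    # Courses already in MySQL must be untouched by the migration.
    for course_id, fields in before.items():
        if after.get(course_id) != fields:
            problems.append(f"{course_id}: changed or missing after migration")
    # Every remaining course must now have been copied in.
    for course_id in all_course_ids:
        if course_id not in after:
            problems.append(f"{course_id}: not copied into MySQL")
    return problems
```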

Deadline

None

@openedx-webhooks

Thanks for the pull request, @bradenmacdonald! I've created OSPR-6215 to keep track of it in JIRA.

As a core committer in this repo, you can merge this once the pull request is approved per the core committer reviewer requirements and according to the agreement with your edX Champion.

@openedx-webhooks added the core committer, open-source-contribution, and waiting on author labels Nov 10, 2021
@edx-status-bot

Your PR has finished running tests. There were no failures.

@natabene
Contributor

@bradenmacdonald Thank you for your contribution. Is this ready for our review?

@bradenmacdonald
Contributor Author

@natabene Yes, thanks.

Contributor

@ormsbee ormsbee left a comment

@jristau1984, @connorhaugh, @kenclary: After this goes through the pipeline, please check the log for instances of the error message this migration puts out if it finds weird data inconsistencies. It's possible that the issues we ran into before were some artifact of how we deploy and rollback database changes in the stage environment. But it could also point to some deeper logic bug or weird data race condition.
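
One way to check a captured log for the migration's warning is to count the "Course ... already exists in MySQL" detail lines per course ID. The sketch below assumes the message wording shown in the PR description; `find_conflicts` is a hypothetical helper, not part of edx-platform.

```python
import re
from collections import Counter

# Matches the detail line the migration logs for each conflicting course.
CONFLICT_RE = re.compile(
    r"Course (\S+) already exists in MySQL but the MongoDB version is newer"
)


def find_conflicts(log_lines):
    """Count how often each course ID appears in the migration's warning."""
    counts = Counter()
    for line in log_lines:
        match = CONFLICT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open log file works directly, since iterating a file yields its lines, e.g. `find_conflicts(open(path))` with `path` pointing at wherever the deploy log was saved.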

@natabene removed the waiting on author label Nov 18, 2021
@ormsbee
Contributor

ormsbee commented Nov 19, 2021

@natabene, @bradenmacdonald: FYI that the TNL team is aiming to merge/deploy this on Monday.

@connorhaugh
Contributor

@ormsbee @bradenmacdonald merging this now, will report back on any errors and have a revert PR at the ready.

@connorhaugh connorhaugh merged commit b529967 into openedx:master Nov 22, 2021
@openedx-webhooks

@jristau1984: thought you might like to know that bradenmacdonald merged this pull request.

@openedx-webhooks

@bradenmacdonald 🎉 Your pull request was merged! Please take a moment to answer a two-question survey so we can improve your experience in the future.

@connorhaugh
Contributor

connorhaugh commented Nov 22, 2021

@ormsbee @bradenmacdonald Reverted. Copying the full stage logs here; they're from stage, so there's no PII concern.

Nov 22 16:50:23 ip-10-3-11-178 [service_variant=cms][common.djangoapps.split_modulestore_django.migrations.0002_data_migration][env:stage-edx-edxapp] ERROR [ip-10-3-11-178 1642] [user None] [ip None] [0002_data_migration.py:36] - Possible data issue found during data migration of course indexes from MongoDB to MySQL:
Course course-v1:MITProfessionalX+Zero+Zero already exists in MySQL but the MongoDB version is newer. That's unexpected because since the course index table was added to MySQL, there has never been a time when we would write course_indexes updates only to MongoDB without also writing to MySQL.
Mongo data: edited_on: 2015-09-03 03:24:08.071000+00:00, last_update: 2015-09-03 03:24:08.100000+00:00, published_version: 55e7bd5802c8870b7e174773
MySQL data: edited_on: 2015-09-03 03:23:54.826000+00:00, last_update: 2015-09-03 03:23:54.852000+00:00, published_version: 55e7bd4a02c8870b89174767
The MySQL version will be overwritten and the MongoDB version used.

Course course-v1:MITProfessionalX+Zero+Zero already exists in MySQL but the MongoDB version is newer. That's unexpected because since the course index table was added to MySQL, there has never been a time when we would write course_indexes updates only to MongoDB without also writing to MySQL.
Mongo data: edited_on: 2015-09-03 03:24:08.071000+00:00, last_update: 2015-09-03 03:24:08.100000+00:00, published_version: 55e7bd5802c8870b7e174773
MySQL data: edited_on: 2015-09-03 03:23:54.826000+00:00, last_update: 2015-09-03 03:23:54.852000+00:00, published_version: 55e7bd4a02c8870b89174767
The MySQL version will be overwritten and the MongoDB version used.

Running migrations:

&1|16:50:25.667       Applying split_modulestore_django.0002_data_migration...
&1|16:50:25.667   succeeded: false
&1|16:50:25.667   succeeded_migrations: []
&1|16:50:25.667   traceback: |
&1|16:50:25.667     Traceback (most recent call last):
&1|16:50:25.668       File "/edx/app/edxapp/venvs/edxapp/lib/python3.8/site-packages/release_util/management/commands/__init__.py", line 280, in __apply
&1|16:50:25.668         call_command("migrate", **migrate_kwargs)
&1|16:50:25.668       File "/edx/app/edxapp/venvs/edxapp/lib/python3.8/site-packages/django/core/management/__init__.py", line 181, in call_command
&1|16:50:25.668         return command.execute(*args, **defaults)
&1|16:50:25.668       File "/edx/app/edxapp/venvs/edxapp/lib/python3.8/site-packages/django/core/management/base.py", line 398, in execute
&1|16:50:25.668         output = self.handle(*args, **options)
&1|16:50:25.668       File "/edx/app/edxapp/venvs/edxapp/lib/python3.8/site-packages/django/core/management/base.py", line 89, in wrapped
&1|16:50:25.668         res = handle_func(*args, **kwargs)
&1|16:50:25.668       File "/edx/app/edxapp/venvs/edxapp/lib/python3.8/site-packages/django/core/management/commands/migrate.py", line 244, in handle
&1|16:50:25.668         post_migrate_state = executor.migrate(
&1|16:50:25.668       File "/edx/app/edxapp/venvs/edxapp/lib/python3.8/site-packages/django/db/migrations/executor.py", line 117, in migrate
&1|16:50:25.668         state = self._migrate_all_forwards(state, plan, full_plan, fake=fake, fake_initial=fake_initial)
&1|16:50:25.668       File "/edx/app/edxapp/venvs/edxapp/lib/python3.8/site-packages/django/db/migrations/executor.py", line 147, in _migrate_all_forwards
&1|16:50:25.668         state = self.apply_migration(state, migration, fake=fake, fake_initial=fake_initial)
&1|16:50:25.668       File "/edx/app/edxapp/venvs/edxapp/lib/python3.8/site-packages/django/db/migrations/executor.py", line 227, in apply_migration
&1|16:50:25.668         state = migration.apply(state, schema_editor)
&1|16:50:25.668       File "/edx/app/edxapp/venvs/edxapp/lib/python3.8/site-packages/django/db/migrations/migration.py", line 123, in apply
&1|16:50:25.668         operation.database_forwards(self.app_label, schema_editor, old_state, project_state)
&1|16:50:25.668       File "/edx/app/edxapp/venvs/edxapp/lib/python3.8/site-packages/django/db/migrations/operations/special.py", line 190, in database_forwards
&1|16:50:25.668         self.code(from_state.apps, schema_editor)
&1|16:50:25.668       File "/edx/app/edxapp/edx-platform/common/djangoapps/split_modulestore_django/migrations/0002_data_migration.py", line 37, in forwards_func
&1|16:50:25.668         "Possible data issue found during data migration of course indexes from MongoDB to MySQL: \n"
&1|16:50:25.668     KeyError: 'published_version'
&1|16:50:25.668 

@edx-pipeline-bot
Contributor

EdX Release Notice: This PR has been deployed to the staging environment in preparation for a release to production.

@bradenmacdonald
Contributor Author

Thanks for handling this @connorhaugh. I will review that log and see if I can figure out what's causing it. Perhaps I didn't account for courses that were never published at all, or something like that?

@edx-pipeline-bot
Contributor

EdX Release Notice: This PR has been deployed to the production environment.

@edx-pipeline-bot
Contributor

EdX Release Notice: This PR has been deployed to the staging environment in preparation for a release to production.

@edx-pipeline-bot
Contributor

EdX Release Notice: This PR has been deployed to the production environment.

@bradenmacdonald
Contributor Author

@connorhaugh @ormsbee OK I see now that the course in the error log is one of those courses where there are actually two courses on stage, whose course IDs differ only by case. I suspect that's the root cause of the slightly inconsistent data.
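
To spot such pairs ahead of a migration, course IDs can be grouped by a case-folded key. This is a sketch under an assumption: that the MySQL course ID column compares case-insensitively (as `_ci` collations do) while MongoDB keeps the two IDs distinct. `str.casefold()` only approximates a real collation, and the lowercase variant ID in the usage example is made up.

```python
from collections import defaultdict


def find_case_collisions(course_ids):
    """Group course IDs that differ only by letter case.

    str.casefold() stands in for a case-insensitive MySQL collation; real
    collations have further rules (accents, trailing spaces), so this is
    an approximation.
    """
    groups = defaultdict(list)
    for course_id in course_ids:
        groups[course_id.casefold()].append(course_id)
    # Only keys shared by more than one distinct ID are collisions.
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Hypothetical input: the second ID is an invented case variant.
ids = [
    "course-v1:MITProfessionalX+Zero+Zero",
    "course-v1:MITProfessionalx+zero+zero",
    "course-v1:edX+DemoX+Demo_Course",
]
collisions = find_case_collisions(ids)
```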

I will submit a new PR that also doesn't fail when a course has no published_version. Hopefully third time's the charm.
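
The traceback shows the `KeyError: 'published_version'` being raised while the warning message itself was being built. A minimal sketch of a tolerant formatter follows, assuming the index fields sit in a flat dict (the real record shape may differ); `describe_index` is a hypothetical name.

```python
def describe_index(index):
    """Format the index fields shown in the migration's warning.

    A record with no published version may lack the "published_version"
    key entirely, so .get() supplies a placeholder instead of raising
    KeyError while the log message is being built.
    """
    return (
        f"edited_on: {index.get('edited_on')}, "
        f"last_update: {index.get('last_update')}, "
        f"published_version: {index.get('published_version', '(none)')}"
    )
```

For example, `describe_index({"edited_on": 1, "last_update": 2})` returns `"edited_on: 1, last_update: 2, published_version: (none)"` instead of raising.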

@bradenmacdonald bradenmacdonald deleted the braden/course-indexes-mysql-4-data-migration-take2 branch November 23, 2021 21:10
@ormsbee
Contributor

ormsbee commented Nov 24, 2021

😞 I wonder how many collective Open edX developer weeks have been lost to case sensitivity issues around course keys. Thank you for keeping at this @bradenmacdonald.
