Refactor sync #3312

Merged
68 commits merged on Dec 19, 2024

Member

@eemeli eemeli commented Sep 2, 2024

Fixes #2057
Fixes #2078
Closes #2083 -- source changes are now sync'd to targets eagerly
Fixes #2087 -- git file moves are caught, but not copies
Closes #2129 -- refactoring the sync changes the performance characteristics completely
Fixes #2169
Fixes #2175
Fixes #2189
Fixes #2211
Fixes #2242
Closes #2285 -- not relevant after the refactor
Fixes #2641
Fixes #3302
Fixes #3449
Fixes #2154
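
To illustrate the note on #2087 about moves being caught but not copies, here is a minimal sketch of how moves can be told apart from other changes. It assumes the change list comes from `git diff --name-status -M`, whose lines are tab-separated and mark renames with an `R` status followed by a similarity score. The function name `parse_name_status` is hypothetical, and this is not necessarily how the sync code in this PR does it.

```python
from typing import List, Tuple


def parse_name_status(output: str) -> Tuple[List[str], List[str], List[Tuple[str, str]]]:
    """Split `git diff --name-status -M` output into changed, removed and moved paths.

    Hypothetical helper for illustration only. Copies are not reported by git
    unless -C is also given, so they show up here as plain additions.
    """
    changed: List[str] = []
    removed: List[str] = []
    moved: List[Tuple[str, str]] = []
    for line in output.splitlines():
        if not line:
            continue
        status, *paths = line.split("\t")
        if status.startswith("R") and len(paths) == 2:
            # Rename lines look like "R100<TAB>old/path<TAB>new/path".
            moved.append((paths[0], paths[1]))
        elif status == "D":
            removed.append(paths[0])
        else:
            # Added (A), modified (M) and any other status: keep the last path.
            changed.append(paths[-1])
    return changed, removed, moved
```

Without `-C`, git does not look for copies at all, which matches the "but not copies" caveat: a copied file is simply reported as added.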

This is effectively a rewrite of the existing sync_project() function, which ends up calling most of the code under pontoon/sync/.

The end results should be the same as with the current code, but the implementation is completely new and does things in a different order.

Per-locale repositories are dropped here, as per #3303.

Explicitly left out of this PR but liable to change later:

  • What sync logging information is persisted in the database.
  • How files are read and written.
  • How localizable messages are represented in the database.
  • How aggregated stats are gathered.

Collaborator

@mathjazz mathjazz left a comment

Very high-level look, but this already looks very promising!

Currently, Pretranslation is tightly integrated with sync in order to minimize time between exposing new strings for localization and pretranslating them. What's your plan with this?

pontoon/base/models/changed_entity_locale.py (resolved)
pontoon/base/models/project.py (resolved)
pontoon/sync/sync_project.py (outdated, resolved)
pontoon/sync/sync_project.py (outdated, resolved)
pontoon/sync/checkouts.py (outdated, resolved)
pontoon/sync/sync_entities_from_repo.py (outdated, resolved)
pontoon/sync/sync_entities_from_repo.py (outdated, resolved)
pontoon/sync/sync_translations_to_repo.py (outdated, resolved)
@eemeli
Member Author

eemeli commented Sep 4, 2024

> Currently, Pretranslation is tightly integrated with sync in order to minimize time between exposing new strings for localization and pretranslating them. What's your plan with this?

Ah, I'd missed that! Yeah, that needs to happen the same as before.

requirements/default.txt (outdated, resolved)
@codecov-commenter

codecov-commenter commented Oct 1, 2024

Codecov Report

Attention: Patch coverage is 90.05266% with 170 lines in your changes missing coverage. Please review.

Project coverage is 79.20%. Comparing base (e3e7168) to head (ff2b00d).
Report is 9 commits behind head on main.

@eemeli
Member Author

eemeli commented Oct 1, 2024

This is now at the dangerous stage of looking like it works. But some verification work still remains:

  • Add tests for the fixtures used by the old sync tests.
  • Check the sync results against the active pontoon.mozilla.org projects.

pontoon/base/views.py (outdated, resolved)
@flodolo
Collaborator

flodolo commented Dec 12, 2024

Firefox times out for now.

It works for me locally, but it takes ~20s, so not surprised it times out on Heroku.

@eemeli
Member Author

eemeli commented Dec 13, 2024

> I managed to download AMO file (in 2nd attempt). I made a change to one of the strings and uploaded it. That stats went off.

After some investigation, the stats "went off" already during the preceding sync, not due to the upload.

This should now be fixed, by making the total_strings counts in aggregated stats depend on the locale's plural category count for gettext resources with plural messages.
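
As a rough illustration of that counting rule, the sketch below assumes that each plural message in a gettext resource contributes one string per plural category of the locale, while every other message contributes one. The function and parameter names are hypothetical and not taken from the Pontoon code.

```python
def total_strings(singular_count: int, plural_count: int, plural_categories: int) -> int:
    """Count the strings one locale has to translate in one gettext resource.

    Assumption for this sketch: a plural message counts once per plural
    category of the locale; a non-plural message counts once.
    """
    if singular_count < 0 or plural_count < 0:
        raise ValueError("message counts must not be negative")
    if plural_categories < 1:
        raise ValueError("a locale has at least one plural category")
    return singular_count + plural_count * plural_categories
```

Under this assumption, the same resource yields different totals per locale: with 10 non-plural and 2 plural messages, a locale with one plural category has 12 strings and a locale with three has 16.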

Most of the work to effect that went into cleaning up the optimizations introduced in #1140, and making the tests it introduced work with more real-world usage patterns. The update_stats argument of Translation.save() is dropped, as it was only used to test a code path that is only ever called from said tests.

It would be a good idea to call ./manage.py calculate_stats once this lands, rather than waiting for each project's first sync to fix the stats.

@eemeli
Member Author

eemeli commented Dec 13, 2024

@mathjazz I think I've now addressed all comments, PTAL?

@mathjazz
Collaborator

  • The app crashes when I try to download a file from the Firefox project.

This still happens.

  • I managed to download AMO file (in 2nd attempt). I made a change to one of the strings and uploaded it. That stats went off.

This is fixed! 👍

I downloaded translations from the pontoon-test-1 project and uploaded amo.po back, and the changes I made locally were reflected correctly (including the stats). However, all fuzzy strings got rejected. I checked locally on master, and it works as expected there.

@flodolo
Collaborator

flodolo commented Dec 13, 2024

Sync is now failing for me locally, although it had been working fine until yesterday. I just updated the project, translated one string, and tried a sync:

[INFO:pontoon.sync.core] 2024-12-13 16:35:22,941 [firefox-local] Sync start
[DEBUG:pontoon.sync.repositories.git] 2024-12-13 16:35:22,945 Git: Updating repo https://github.com/mozilla-l10n/firefox-l10n-source.git
[DEBUG:pontoon.sync.repositories.git] 2024-12-13 16:35:22,952 Git: Repo updated.
[DEBUG:pontoon.sync.repositories.git] 2024-12-13 16:35:22,955 Git: Branch update checked out.
[INFO:pontoon.sync.core.checkout] 2024-12-13 16:35:22,957 [firefox-local] Repo at 7fd0da7a2
[DEBUG:pontoon.sync.core.checkout] 2024-12-13 16:35:22,958 [firefox-local] source root: /app/media/projects/firefox-local/mozilla-l10n/firefox-l10n-source.git
[DEBUG:pontoon.sync.repositories.git] 2024-12-13 16:35:22,958 Git: Updating repo git@github.com:mozilla-l10n/firefox-l10n.git
[DEBUG:pontoon.sync.repositories.git] 2024-12-13 16:35:23,074 Git: Repo updated.
[DEBUG:pontoon.sync.repositories.git] 2024-12-13 16:35:23,158 Git: Branch main checked out.
[INFO:pontoon.sync.core.checkout] 2024-12-13 16:35:23,159 [firefox-local] Repo updated from 89d1f8cda72 to dafff26cecf
[DEBUG:pontoon.sync.core.checkout] 2024-12-13 16:35:23,160 [firefox-local] target root: /app/media/projects/firefox-local/git@github.com:mozilla-l10n/firefox-l10n.git
[DEBUG:pontoon.sync.core.paths] 2024-12-13 16:35:36,935 [firefox-local] Paths(auto): ref_root=. base=../../git@github.com:mozilla-l10n/firefox-l10n.git
[INFO:pontoon.sync.core.translations_from_repo] 2024-12-13 16:35:36,939 [firefox-local] Reading changes from 3 changed target files
[DEBUG:pontoon.sync.core.translations_from_repo] 2024-12-13 16:35:36,939 [firefox-local] Scanning for translation updates...
[DEBUG:pontoon.sync.core.translations_from_repo] 2024-12-13 16:35:37,026 [firefox-local] Filtering matches from translations...
[INFO:pontoon.sync.core.translations_to_repo] 2024-12-13 16:35:37,033 [firefox-local] Updating 2 changed resources
[INFO:pontoon.sync.core.translations_to_repo] 2024-12-13 16:35:37,033 [firefox-local:browser/browser/browserSets.ftl] Updating locales: it
[DEBUG:pontoon.sync.formats.ftl] 2024-12-13 16:35:37,056 Saving file: /app/media/projects/firefox-local/git@github.com:mozilla-l10n/firefox-l10n.git/it/browser/browser/browserSets.ftl
[INFO:pontoon.sync.core.translations_to_repo] 2024-12-13 16:35:37,057 [firefox-local:browser/browser/sidebar.ftl] Updating locales: it
[DEBUG:pontoon.sync.formats.ftl] 2024-12-13 16:35:37,085 Saving file: /app/media/projects/firefox-local/git@github.com:mozilla-l10n/firefox-l10n.git/it/browser/browser/sidebar.ftl
[DEBUG:pontoon.sync.repositories.git] 2024-12-13 15:56:41,106 Git: Commit to repository.
[WARNING:pontoon.sync.core.translations_to_repo] 2024-12-13 15:56:42,570 [firefox-local] git commit failed: Remote contains work that you do not have locally. To github.com:mozilla-l10n/firefox-l10n.git
 ! [rejected]                main -> main (fetch first)
error: failed to push some refs to 'github.com:mozilla-l10n/firefox-l10n.git'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

[ERROR:pontoon.base.errors] 2024-12-13 15:56:42,580 Remote contains work that you do not have locally. To github.com:mozilla-l10n/firefox-l10n.git
 ! [rejected]                main -> main (fetch first)
error: failed to push some refs to 'github.com:mozilla-l10n/firefox-l10n.git'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/celery/app/trace.py", line 453, in trace_task
    R = retval = fun(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^
  File "/app/pontoon/sync/tasks.py", line 44, in sync_project_task
    sync_project(project, sync_log, pull=pull, commit=commit, force=force)
  File "/app/pontoon/sync/core/__init__.py", line 69, in sync_project
    sync_translations_to_repo(
  File "/app/pontoon/sync/core/translations_to_repo.py", line 82, in sync_translations_to_repo
    raise error
  File "/app/pontoon/sync/core/translations_to_repo.py", line 78, in sync_translations_to_repo
    repo.commit(co.path, commit_msg, commit_author, co.repo.branch, co.url)
  File "/app/pontoon/sync/repositories/git.py", line 93, in commit
    raise CommitToRepositoryException(error)
pontoon.sync.repositories.utils.CommitToRepositoryException: Remote contains work that you do not have locally. To github.com:mozilla-l10n/firefox-l10n.git
 ! [rejected]                main -> main (fetch first)
error: failed to push some refs to 'github.com:mozilla-l10n/firefox-l10n.git'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

@eemeli
Member Author

eemeli commented Dec 16, 2024

@flodolo Based on the log and these lines in particular:

[DEBUG:pontoon.sync.repositories.git] 2024-12-13 16:35:22,958 Git: Updating repo git@github.com:mozilla-l10n/firefox-l10n.git
[DEBUG:pontoon.sync.repositories.git] 2024-12-13 16:35:23,074 Git: Repo updated.
[DEBUG:pontoon.sync.repositories.git] 2024-12-13 16:35:23,158 Git: Branch main checked out.
[INFO:pontoon.sync.core.checkout] 2024-12-13 16:35:23,159 [firefox-local] Repo updated from 89d1f8cda72 to dafff26cecf

As the commit dafff26cecf isn't on GitHub, I'm pretty sure that you have a local config that explicitly sets the repo branch to main, rather than leaving that field empty, and that you had some local changes committed to the main branch before the sync. Those changes should've been dismissed, but some of the code is run in the wrong order, so a reset that should follow a checkout is run before it.

This is technically an unrelated bug to the sync refactor, but I've fixed it here.
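
The ordering issue described above can be sketched as follows. This is only an illustration under assumptions: the helper name `update_commands` and the exact git invocations are hypothetical, not the commands Pontoon runs. The point is that the hard reset comes after the checkout, so that it discards local commits on the branch that was just checked out rather than on whatever branch happened to be current before.

```python
from typing import List


def update_commands(branch: str, remote: str = "origin") -> List[List[str]]:
    """Return the git invocations, in order, for bringing a checkout up to date.

    Hypothetical helper: it only builds the argument lists and runs nothing.
    """
    return [
        ["git", "fetch", remote],
        ["git", "checkout", branch],
        # Reset only after the checkout, so the reset applies to `branch`
        # and drops any local commits that are not on the remote.
        ["git", "reset", "--hard", f"{remote}/{branch}"],
    ]
```

Each entry could then be passed to `subprocess.run(cmd, cwd=checkout_path, check=True)`, where `checkout_path` is the local clone.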

@eemeli
Member Author

eemeli commented Dec 16, 2024

> The app crashes when I try to download a file from the Firefox project.

As a workaround, downloads are now handled by redirecting the request to the target repository. For GitHub & GitLab the target is the raw version of the current resource.

Because we currently only store the source path and not the target path in the database, this still requires an initial git clone or pull.
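
A minimal sketch of how a redirect target for the raw file could be built, assuming HTTPS repository URLs on github.com or gitlab.com and the raw-file URL layouts those hosts publicly use. The function name `raw_file_url` is hypothetical, and this is not the code added in this PR; SSH-style URLs such as git@github.com:... are deliberately not handled here.

```python
from typing import Optional
from urllib.parse import quote, urlsplit


def raw_file_url(repo_url: str, branch: str, path: str) -> Optional[str]:
    """Build a raw-file URL for a file in a GitHub or GitLab repository.

    Hypothetical helper: returns None for hosts it does not recognise,
    in which case a caller would need some other way to serve the file.
    """
    parts = urlsplit(repo_url)
    host = parts.netloc.lower()
    repo = parts.path.strip("/")
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    file_path = quote(path.lstrip("/"))  # keeps "/" separators intact
    if host == "github.com":
        return f"https://raw.githubusercontent.com/{repo}/{branch}/{file_path}"
    if host == "gitlab.com":
        return f"https://gitlab.com/{repo}/-/raw/{branch}/{file_path}"
    return None
```

Note that `path` here has to be the path in the target repository, which is why the target path must first be resolved from a local checkout when only the source path is stored.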

@flodolo
Collaborator

flodolo commented Dec 17, 2024

Different problem

[INFO:pontoon.sync.core] 2024-12-17 07:09:57,777 [firefox-local] Sync start
[DEBUG:pontoon.sync.repositories.git] 2024-12-17 07:09:57,779 Git: Updating repo https://github.com/mozilla-l10n/firefox-l10n-source.git
[DEBUG:pontoon.sync.repositories.git] 2024-12-17 07:09:58,381 Git: Repo updated.
[DEBUG:pontoon.sync.repositories.git] 2024-12-17 07:09:58,385 Git: Branch update checked out.
[INFO:pontoon.sync.core.checkout] 2024-12-17 07:09:58,393 [firefox-local] Repo updated from 444fda616 to eceb2cee5
[DEBUG:pontoon.sync.core.checkout] 2024-12-17 07:09:58,394 [firefox-local] source root: /app/media/projects/firefox-local/mozilla-l10n/firefox-l10n-source.git
[DEBUG:pontoon.sync.repositories.git] 2024-12-17 07:09:58,394 Git: Updating repo git@github.com:mozilla-l10n/firefox-l10n.git
[DEBUG:pontoon.sync.repositories.git] 2024-12-17 07:10:00,069 Git: Repo updated.
[DEBUG:pontoon.sync.repositories.git] 2024-12-17 07:10:00,163 Git: Branch main checked out.

The source repo was correctly updated, but no new strings were exposed for localization (there are quite a few in this update).
mozilla-l10n/firefox-l10n-source@2da0d05
mozilla-l10n/firefox-l10n-source@eceb2ce

EDIT: Even running manage.py sync_projects with --force didn't fix it. Running the sync on the main branch did discover the 5 new strings.

@eemeli
Member Author

eemeli commented Dec 17, 2024

> The source repo was correctly updated, but no new strings were exposed for localization

After some investigation, this issue was identified as having been caused by first running a sync from this branch, then main, then this branch again. That sequence of actions leaves a templates/ subdirectory with a second clone of the source repo under media/projects/firefox/, which was being mis-detected as the reference root.

This problem will not arise once the PR is merged.

Collaborator

@mathjazz mathjazz left a comment

Tested on stage one more time and didn't spot any issues.

Syncing a month's worth of Firefox changes took much less time than with the current code, and I haven't seen a single "Memory quota exceeded" message in the logs.

Well done!

Left some comments inline, but nothing major. We should file a few bugs.

Also, we need to update the README:
https://github.com/mozilla/pontoon/blob/main/pontoon/sync/README.md

docs/user/localizing-your-projects.rst (resolved)
@@ -390,6 +390,7 @@ def stats(self):
{
"title": "all-resources",
"resource__path": [],
# FIXME rename as total_strings
Collaborator

File a good first issue for this?

pontoon/sync/core/translations_from_repo.py (outdated, resolved)
pontoon/sync/core/translations_from_repo.py (outdated, resolved)
pontoon/sync/core/translations_to_repo.py (resolved)
pontoon/sync/utils.py (resolved)
@eemeli eemeli merged commit 42421cd into mozilla:main Dec 19, 2024
2 checks passed
@eemeli eemeli deleted the sync-refactor branch December 19, 2024 22:09