Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Also sync DB branches on push if necessary #28361

Merged
merged 13 commits into from
Dec 9, 2023

Conversation

lunny
Copy link
Member

@lunny lunny commented Dec 5, 2023

Fix #28056

This PR will check whether the repo has zero branch when pushing a branch. If that, it means this repository hasn't been synced.

The reason caused that is after user upgrade from v1.20 -> v1.21, he just push branches without visit the repository user interface. Because all repositories routers will check whether a branches sync is necessary but push has not such check.

For every repository, it has two states, synced or not synced. If there is zero branch for a repository, then it will be assumed as non-sync state. Otherwise, it's synced state. So if we think it's synced, we just need to update branch/insert new branch. Otherwise do a full sync. So that, for every push, there will be almost no extra load added. It's high performance than yours.

For the implementation, we in fact will try to update the branch first, if updated success with affect records > 0, then all are done. Because that means the branch has been in the database. If no record is affected, that means the branch does not exist in database. So there are two possibilities. One is this is a new branch, then we just need to insert the record. Another is the branches haven't been synced, then we need to sync all the branches into database.

…ly but not visit UI after upgrading from v1.20 -> v1.21
@lunny lunny added type/bug backport/v1.21 This PR should be backported to Gitea 1.21 labels Dec 5, 2023
@lunny lunny added this to the 1.22.0 milestone Dec 5, 2023
@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Dec 5, 2023
@pull-request-size pull-request-size bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Dec 5, 2023
@pull-request-size pull-request-size bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 5, 2023
@github-actions github-actions bot added the modifies/api This PR adds API routes or modifies them label Dec 5, 2023
models/git/branch.go Outdated Show resolved Hide resolved
modules/repository/branch.go Outdated Show resolved Hide resolved
@lunny
Copy link
Member Author

lunny commented Dec 7, 2023

@wolfogre done.

@GiteaBot GiteaBot added lgtm/need 1 This PR needs approval from one additional maintainer to be merged. and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels Dec 7, 2023
@lunny lunny mentioned this pull request Dec 7, 2023
@delvh delvh changed the title Fix the possible branches sync failure when user push branches directly but not visit UI after upgrading from v1.20 -> v1.21 Also sync DB branches on push if necessary Dec 8, 2023
models/db/context.go Outdated Show resolved Hide resolved
services/repository/push.go Outdated Show resolved Hide resolved
Comment on lines 20 to 23
// UpdateBranch updates the branch information in the database. If the branch exist, it will update latest commit of this branch information
// If it doest not exist, insert a new record into database
func UpdateBranch(ctx context.Context, repoID, pusherID int64, branchName string, commit *git.Commit) error {
cnt, err := git_model.UpdateBranch(ctx, repoID, pusherID, branchName, commit)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That function is completely wrong here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you think that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I move the function to service layer and renamed function name to syncBranchToDB. Maybe it fixed your problem?

Comment on lines 31 to 45
// if user haven't visit UI but directly push to a branch after upgrading from 1.20 -> 1.21,
// we cannot simply insert the branch but need to check we have branches or not
hasBranch, err := db.Exist[git_model.Branch](ctx, git_model.FindBranchOptions{
RepoID: repoID,
IsDeletedBranch: util.OptionalBoolFalse,
}.ToConds())
if err != nil {
return err
}
if !hasBranch {
if _, err = SyncRepoBranches(ctx, repoID, pusherID); err != nil {
return fmt.Errorf("repo_module.SyncRepoBranches %d:%s failed: %v", repoID, branchName, err)
}
return nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a lot of trouble understanding this precondition: I don't think this is the correct behavior we are doing here.
Shouldn't the workflow be:
Push -> Add all branches added by the push if they don't exist already?

Copy link
Member Author

@lunny lunny Dec 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your idea is OK to sync. But it will compare all branches every time. It will become a performance bottleneck for big repositories.

My method is for every repository, it has two states, synced or not synced. If there is zero branch for a repository, then it's non-sync state. otherwise, it's synced state. So if we think it's synced, we just need to update branch/insert new branch. Otherwise do a full sync. So that, for every push, there will be almost no extra load added. It's high performance than yours.

@GiteaBot GiteaBot added lgtm/blocked A maintainer has reservations with the PR and thus it cannot be merged and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels Dec 8, 2023
@GiteaBot GiteaBot added lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. and removed lgtm/blocked A maintainer has reservations with the PR and thus it cannot be merged labels Dec 9, 2023
@lunny lunny added the reviewed/wait-merge This pull request is part of the merge queue. It will be merged soon. label Dec 9, 2023
@lunny lunny enabled auto-merge (squash) December 9, 2023 12:57
@lunny lunny merged commit aeb3830 into go-gitea:main Dec 9, 2023
25 checks passed
@GiteaBot
Copy link
Collaborator

GiteaBot commented Dec 9, 2023

I was unable to create a backport for 1.21. @lunny, please send one manually. 🍵

go run ./contrib/backport 28361
...  // fix git conflicts if any
go run ./contrib/backport --continue

@GiteaBot GiteaBot added backport/manual No power to the bots! Create your backport yourself! and removed reviewed/wait-merge This pull request is part of the merge queue. It will be merged soon. labels Dec 9, 2023
@lunny lunny deleted the lunny/fix_branch_sync branch December 9, 2023 14:23
lunny added a commit to lunny/gitea that referenced this pull request Dec 9, 2023
Fix go-gitea#28056

This PR will check whether the repo has zero branch when pushing a
branch. If that, it means this repository hasn't been synced.

The reason caused that is after user upgrade from v1.20 -> v1.21, he
just push branches without visit the repository user interface. Because
all repositories routers will check whether a branches sync is necessary
but push has not such check.

For every repository, it has two states, synced or not synced. If there
is zero branch for a repository, then it will be assumed as non-sync
state. Otherwise, it's synced state. So if we think it's synced, we just
need to update branch/insert new branch. Otherwise do a full sync. So
that, for every push, there will be almost no extra load added. It's
high performance than yours.

For the implementation, we in fact will try to update the branch first,
if updated success with affect records > 0, then all are done. Because
that means the branch has been in the database. If no record is
affected, that means the branch does not exist in database. So there are
two possibilities. One is this is a new branch, then we just need to
insert the record. Another is the branches haven't been synced, then we
need to sync all the branches into database.
@lunny lunny added the backport/done All backports for this PR have been created label Dec 9, 2023
zjjhot added a commit to zjjhot/gitea that referenced this pull request Dec 11, 2023
* giteaofficial/main:
  [skip ci] Updated licenses and gitignores
  Actually recover from a panic in cron task (go-gitea#28409)
  Fix missing check (go-gitea#28406)
  Also sync DB branches on push if necessary (go-gitea#28361)
  Remove stale since giteabot has similiar feature (go-gitea#28401)
  [skip ci] Updated translations via Crowdin
lunny added a commit that referenced this pull request Dec 11, 2023
Fix #28056
Backport #28361 

This PR will check whether the repo has zero branch when pushing a
branch. If that, it means this repository hasn't been synced.

The reason caused that is after user upgrade from v1.20 -> v1.21, he
just push branches without visit the repository user interface. Because
all repositories routers will check whether a branches sync is necessary
but push has not such check.

For every repository, it has two states, synced or not synced. If there
is zero branch for a repository, then it will be assumed as non-sync
state. Otherwise, it's synced state. So if we think it's synced, we just
need to update branch/insert new branch. Otherwise do a full sync. So
that, for every push, there will be almost no extra load added. It's
high performance than yours.

For the implementation, we in fact will try to update the branch first,
if updated success with affect records > 0, then all are done. Because
that means the branch has been in the database. If no record is
affected, that means the branch does not exist in database. So there are
two possibilities. One is this is a new branch, then we just need to
insert the record. Another is the branches haven't been synced, then we
need to sync all the branches into database.
@kdumontnu
Copy link
Contributor

@lunny There is a pretty significant performance regression in this PR.

Most integration tests creating a branch now take minutes to complete.

You can see the effect of this in CI runtime. In a PR merged before this one, sqlite integration tests take 16min

In this PR and after, sqlite integration tests take >30min

@lunny
Copy link
Member Author

lunny commented Dec 22, 2023

I will investigate it.

wxiaoguang pushed a commit that referenced this pull request Dec 28, 2023
#28361 introduced `syncBranchToDB` in `CreateNewBranchFromCommit`. This
PR will revert the change because it's unnecessary. Every push will
already be checked by `syncBranchToDB`.
This PR also created a test to ensure it's right.
GiteaBot pushed a commit to GiteaBot/gitea that referenced this pull request Dec 28, 2023
go-gitea#28361 introduced `syncBranchToDB` in `CreateNewBranchFromCommit`. This
PR will revert the change because it's unnecessary. Every push will
already be checked by `syncBranchToDB`.
This PR also created a test to ensure it's right.
lunny added a commit that referenced this pull request Dec 29, 2023
Replace #28625

Backport #28624 by lunny

#28361 introduced `syncBranchToDB` in `CreateNewBranchFromCommit`. This
PR will revert the change because it's unnecessary. Every push will
already be checked by `syncBranchToDB`.
This PR also created a test to ensure it's right.
fuxiaohei pushed a commit to fuxiaohei/gitea that referenced this pull request Jan 17, 2024
Fix go-gitea#28056 

This PR will check whether the repo has zero branch when pushing a
branch. If that, it means this repository hasn't been synced.

The reason caused that is after user upgrade from v1.20 -> v1.21, he
just push branches without visit the repository user interface. Because
all repositories routers will check whether a branches sync is necessary
but push has not such check.

For every repository, it has two states, synced or not synced. If there
is zero branch for a repository, then it will be assumed as non-sync
state. Otherwise, it's synced state. So if we think it's synced, we just
need to update branch/insert new branch. Otherwise do a full sync. So
that, for every push, there will be almost no extra load added. It's
high performance than yours.

For the implementation, we in fact will try to update the branch first,
if updated success with affect records > 0, then all are done. Because
that means the branch has been in the database. If no record is
affected, that means the branch does not exist in database. So there are
two possibilities. One is this is a new branch, then we just need to
insert the record. Another is the branches haven't been synced, then we
need to sync all the branches into database.
fuxiaohei pushed a commit to fuxiaohei/gitea that referenced this pull request Jan 17, 2024
go-gitea#28361 introduced `syncBranchToDB` in `CreateNewBranchFromCommit`. This
PR will revert the change because it's unnecessary. Every push will
already be checked by `syncBranchToDB`.
This PR also created a test to ensure it's right.
silverwind pushed a commit to silverwind/gitea that referenced this pull request Feb 20, 2024
Fix go-gitea#28056 

This PR will check whether the repo has zero branch when pushing a
branch. If that, it means this repository hasn't been synced.

The reason caused that is after user upgrade from v1.20 -> v1.21, he
just push branches without visit the repository user interface. Because
all repositories routers will check whether a branches sync is necessary
but push has not such check.

For every repository, it has two states, synced or not synced. If there
is zero branch for a repository, then it will be assumed as non-sync
state. Otherwise, it's synced state. So if we think it's synced, we just
need to update branch/insert new branch. Otherwise do a full sync. So
that, for every push, there will be almost no extra load added. It's
high performance than yours.

For the implementation, we in fact will try to update the branch first,
if updated success with affect records > 0, then all are done. Because
that means the branch has been in the database. If no record is
affected, that means the branch does not exist in database. So there are
two possibilities. One is this is a new branch, then we just need to
insert the record. Another is the branches haven't been synced, then we
need to sync all the branches into database.
silverwind pushed a commit to silverwind/gitea that referenced this pull request Feb 20, 2024
go-gitea#28361 introduced `syncBranchToDB` in `CreateNewBranchFromCommit`. This
PR will revert the change because it's unnecessary. Every push will
already be checked by `syncBranchToDB`.
This PR also created a test to ensure it's right.
@go-gitea go-gitea locked as resolved and limited conversation to collaborators Mar 8, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
backport/done All backports for this PR have been created backport/manual No power to the bots! Create your backport yourself! backport/v1.21 This PR should be backported to Gitea 1.21 lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. modifies/api This PR adds API routes or modifies them size/L Denotes a PR that changes 100-499 lines, ignoring generated files. type/bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

branch does not exist
6 participants