Purpose
Why do we need an abstract layer for managed repositories? I think there are several benefits.
Convert to a different storage directory structure. Currently, renaming a user or repository requires renaming directories on disk, which makes it difficult to stay consistent when an operation fails. A better approach is to use fixed repository information as directory names, such as the user or repository ID, so that renaming a user or repository requires no disk operation.
Reduce the size of forked repositories. Git itself supports shared repositories, but Gitea hasn't used this feature to reduce the disk usage of forks. Some design questions need to be considered: which repository should be the root of the base and forked repositories? Should we have a hidden repository as the root? This is also related to the layer.
For big Gitea sites or high-availability systems, distributed git storage is a MUST. Currently, users can store the managed git repositories on NFS, but that still has a single-node problem.
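As a rough illustration of the first two points, here is a minimal sketch, not Gitea's actual layout. It assumes a hypothetical `<root>/<id>.git` directory scheme and uses git's `objects/info/alternates` file, which is the mechanism git provides for borrowing objects from another repository. The names `repoStorageDir`, `alternatesFile` and `linkToSharedObjects` are made up for this example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// repoStorageDir derives a repository directory from its immutable database ID.
// The "<root>/<id>.git" layout is an assumption made for this sketch.
func repoStorageDir(root string, repoID int64) string {
	return filepath.Join(root, strconv.FormatInt(repoID, 10)+".git")
}

// alternatesFile returns the path of git's alternates file inside a bare repository.
func alternatesFile(repoDir string) string {
	return filepath.Join(repoDir, "objects", "info", "alternates")
}

// linkToSharedObjects lets the bare repository at forkDir borrow objects from
// the one at rootDir by listing rootDir's object directory in the alternates file.
func linkToSharedObjects(forkDir, rootDir string) error {
	rootObjects, err := filepath.Abs(filepath.Join(rootDir, "objects"))
	if err != nil {
		return err
	}
	target := alternatesFile(forkDir)
	if err := os.MkdirAll(filepath.Dir(target), 0o755); err != nil {
		return err
	}
	return os.WriteFile(target, []byte(rootObjects+"\n"), 0o644)
}

func main() {
	tmp, err := os.MkdirTemp("", "gitrepo-sketch")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer os.RemoveAll(tmp)

	root := repoStorageDir(tmp, 1) // hypothetical hidden root repository
	fork := repoStorageDir(tmp, 2) // a fork that shares the root's objects
	if err := linkToSharedObjects(fork, root); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("fork", fork, "borrows objects via", alternatesFile(fork))
}
```

With a layout like this, renaming a user or repository would only touch database rows. Note that objects in the root must not be pruned while any fork still borrows them, which is one reason the choice of root repository needs care.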
Concepts
I previously sent some PRs that tried to introduce a layer in modules/git, but I found that was not the right direction. The modules/git package should be a base package that always focuses on handling disk operations, whether the repository is a managed one, a wiki one, a temporary one or a hidden one. So I think some concepts need to be introduced to clarify this.
Managed Git Repositories: All repositories recorded in Gitea's database, including wiki repositories and any future repository types, can be considered managed git repositories. Only these git repositories should be managed by the distributed system.
Temporary Git Repositories: Repositories that are created and deleted internally while Gitea performs some operations. They are stored on the system's temporary file system and are cleaned up after the related operations finish.
modules/git: This should be a low-level package that can handle any git repository on disk. For managed git repositories, a new package should be introduced.
modules/gitrepo: This is the new package, introduced as an abstract layer to handle managed git repositories. It may include different storage strategies, but its interface to other packages stays almost the same as before, hiding the implementation details. This package will depend on modules/git and should not depend on any models packages. Other modules and services layer packages can depend on it.
Refactoring
To achieve this, we need to do some refactoring.
Hide setting.RepoRootPath inside the modules/gitrepo package. No other non-test package should use it directly. That package could provide a method like GitStorageInfo that returns the storage method and storage path, but it should only be used for information displayed in the UI.
All managed git repositories should invoke functions in modules/gitrepo rather than modules/git, and all functions in modules/gitrepo should hide the absolute RepoPath, and even the relative storage path, using an ID as the directory name instead. Callers would pass some repository interface rather than a path.
RunGitCmd should be in the new package. It can become a proxy method that invokes different implementations.
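To make the proxy idea concrete, here is a minimal sketch under stated assumptions. `GetOwnerName` and `GetRepoName` are taken from the mock example in this proposal; the `Backend` interface, the `defaultBackend` variable, the `recordingBackend` stand-in and the use of a plain `[]string` instead of `*git.Command` are all hypothetical simplifications so the example compiles on its own.

```go
package main

import (
	"errors"
	"fmt"
)

// Repository identifies a managed repository without exposing any disk path.
type Repository interface {
	GetOwnerName() string
	GetRepoName() string
}

// Backend is a hypothetical storage strategy (local disk, HTTP service, ...).
type Backend interface {
	RunGitCmd(repo Repository, args []string) error
}

// defaultBackend would be chosen once from configuration in this sketch.
var defaultBackend Backend

// RunGitCmd is the proxy: callers never learn which backend handles the command.
func RunGitCmd(repo Repository, args []string) error {
	if defaultBackend == nil {
		return errors.New("gitrepo: no storage backend configured")
	}
	return defaultBackend.RunGitCmd(repo, args)
}

// recordingBackend is a stand-in that only records what it was asked to run.
type recordingBackend struct{ calls []string }

func (b *recordingBackend) RunGitCmd(repo Repository, args []string) error {
	b.calls = append(b.calls,
		fmt.Sprintf("%s/%s: git %v", repo.GetOwnerName(), repo.GetRepoName(), args))
	return nil
}

// simpleRepo is a minimal Repository used only for the demonstration.
type simpleRepo struct{ owner, name string }

func (r simpleRepo) GetOwnerName() string { return r.owner }
func (r simpleRepo) GetRepoName() string  { return r.name }

func main() {
	b := &recordingBackend{}
	defaultBackend = b
	if err := RunGitCmd(simpleRepo{"alice", "demo"}, []string{"gc"}); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(b.calls[0])
}
```

The point of the design is that services only hold a `Repository` value, so swapping the backend does not require changes in the calling code.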
Mocking
To make the abstraction work, we need a mock git storage server that can reuse the current repository root path, with all requests coming through HTTP operations. So there will be two implementations of the basic operations, i.e.
For local disk operations
func runGitCmdLocal(repo Repository, c *git.Command, opts *RunOpts) error {
	if opts.Dir != "" {
		// we must panic here, otherwise there would be bugs if developers set Dir by mistake, and it would be very difficult to debug
		panic("dir field must be empty when using RunStdBytes")
	}
	opts.Dir = getPath(repo, opts.IsWiki)
	return c.Run(&opts.RunOpts)
}
For the mock HTTP storage service
func runGitCmdForMockServer(repo Repository, c *git.Command, opts *RunOpts) error {
	if opts.Dir != "" {
		// we must panic here, otherwise there would be bugs if developers set Dir by mistake, and it would be very difficult to debug
		panic("dir field must be empty when using RunStdBytes")
	}
	return mockHTTPClient.RunGitCmd(ctx, repo.GetOwnerName(), repo.GetRepoName(), c, opts.RunOpts)
}
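For reference, a mock storage service of this kind could be prototyped in-process with the standard `net/http/httptest` package. The sketch below is only an assumption about how it might look: the `gitCmdRequest` wire format, the `/run` path and the helper names are hypothetical, and the handler merely acknowledges the request where a real mock would run git inside the existing repository root.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// gitCmdRequest is a hypothetical wire format for one git invocation.
type gitCmdRequest struct {
	Owner string   `json:"owner"`
	Repo  string   `json:"repo"`
	Args  []string `json:"args"`
}

// mockStorageHandler stands in for the storage server. It only echoes what it
// was asked to run; it does not execute git.
func mockStorageHandler(w http.ResponseWriter, r *http.Request) {
	var req gitCmdRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	fmt.Fprintf(w, "ran git %v in %s/%s", req.Args, req.Owner, req.Repo)
}

// runGitCmdHTTP sends one command to the storage server at baseURL.
func runGitCmdHTTP(baseURL string, req gitCmdRequest) (string, error) {
	body, err := json.Marshal(req)
	if err != nil {
		return "", err
	}
	resp, err := http.Post(baseURL+"/run", "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	out, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("storage server: %s: %s", resp.Status, out)
	}
	return string(out), nil
}

// demo starts an in-process server and performs one round trip.
func demo() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(mockStorageHandler))
	defer srv.Close()
	return runGitCmdHTTP(srv.URL, gitCmdRequest{Owner: "alice", Repo: "demo", Args: []string{"gc"}})
}

func main() {
	out, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Because the server runs inside the test process, the same test suite could exercise both the local and the HTTP implementation against the same repository root.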
That will be a massive benefit for big hosters with many forks per repo and this is also how GitHub works under the hood. A repo and all of its forks use a shared git repo on the server, so if a repo has 1000 forks, you are only storing their changed branches.
Care needs to be taken to prevent cross-repo influences. GitHub also had a number of issues related to this in the past (this comes to mind).
Related PRs
#28937
#28940
#28966