Migration of repositories with tags to organisation fails #8540
Comments
Is easier to read.
Thank you both for your quick response. I changed the database settings and tried migrating one of our biggest repositories (>6,000 tags, >14,000 commits) to our Gitea organisation. After creating 854 releases and around 100,000
I think this is an efficiency issue, as I don't see why there have to be so many
@mansshardt Thanks for your reports. Could you share your database settings?
Sure! Here they are:
Sorry, my fault about the nanoseconds.
My fault too. I should have checked.
No problem! I changed the setting to
This is most likely because of attempting too many connections to MariaDB, but I thought this would be solved by the new settings. As TCP can only create and destroy <64515 socket pairs in a short span of time (TIME_WAIT), and this import operation seems to be depleting that, perhaps you can try using UNIX sockets for your connection to MariaDB instead?
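For illustration, here is a minimal Go sketch of connecting to MariaDB over a UNIX socket with go-sql-driver/mysql. The socket path, credentials and database name are assumptions, and this is not Gitea's actual connection code; the point is that a socket connection consumes no TCP port pair, so it cannot contribute to TIME_WAIT exhaustion:

```go
package main

import (
	"database/sql"
	"log"

	// MariaDB/MySQL driver; registers itself under the name "mysql".
	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// unix(<socket path>) instead of tcp(host:port); path and credentials
	// are placeholders, adjust them to your MariaDB installation.
	dsn := "gitea:secret@unix(/var/run/mysqld/mysqld.sock)/gitea?charset=utf8mb4"
	db, err := sql.Open("mysql", dsn)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Verify the socket connection actually works.
	if err := db.Ping(); err != nil {
		log.Fatal(err)
	}
	log.Println("connected to MariaDB over a UNIX socket")
}
```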
I just pushed (--mirror) again and checked the connections to the database via
I don't understand why so many connections are necessary in the first place. In my opinion, fiddling with MariaDB properties or database settings to mitigate the issue shouldn't be a solution. I also can't use UNIX sockets, as the database is not on the same machine.
@mansshardt I agree that this should not be the solution. I was only offering alternatives. 😄 For some reason
You probably need my PR, which allows you to set MaxOpenConns to prevent too many open connections to the DB.
@zeripath If I'm not mistaken, the problem in this issue is the number of closed connections, not the open ones.
@guillep2k Sure, and thanks again for your input. The issue is not that there are too many open connections to the database, as @guillep2k mentioned. The problem is that during the creation of releases from tags there is a huge number of connections which get established and quickly closed. This doesn't seem very efficient. As the system holds a TCP socket in TIME_WAIT for some time, we have a lot of these (~25,000) during the creation of releases from tags. It seems that
I think two elements are relevant in this issue:
I've checked the docs and the sources and I couldn't find a reason why the connections are not pooled. https://www.alexedwards.net/blog/configuring-sqldb @zeripath It looks like your PR #8528 could be related after all, but the default value should be working nonetheless. In theory, by not calling
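For reference, a minimal sketch of the three pool knobs Go's database/sql exposes (illustrative only, not Gitea's actual code; the numbers are assumptions). With Go's defaults, MaxOpenConns is unlimited and MaxIdleConns is 2, so a burst of concurrent queries opens many connections and immediately closes most of them again, which is exactly the TIME_WAIT churn described above:

```go
package dbpool

import (
	"database/sql"
	"time"

	_ "github.com/go-sql-driver/mysql"
)

// OpenPooled opens a MySQL/MariaDB handle with an explicitly bounded pool.
func OpenPooled(dsn string) (*sql.DB, error) {
	db, err := sql.Open("mysql", dsn)
	if err != nil {
		return nil, err
	}
	db.SetMaxOpenConns(50)                 // cap concurrent connections to the server
	db.SetMaxIdleConns(50)                 // keep them reusable instead of closing them
	db.SetConnMaxLifetime(5 * time.Minute) // recycle connections gradually
	return db, nil
}
```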
@guillep2k yeah, so a long enough lifetime, a large enough MaxOpenConns, and a MaxIdleConns near MaxOpenConns should prevent the rapid opening and closing of connections, preventing the port-number depletion, or at least papering over this implementation fault. We need to think about these permission calls a bit better and consider whether we can cache the results in some way. In this particular case, if we look at Lines 124 to 186 in 280f4be:
Within the context of the hook we read each line of the provided stdin. We get one line per updated ref and they are of the form:
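For context, git feeds pre-receive and post-receive hooks one line per updated ref on stdin, in the form "<old-sha> <new-sha> <ref-name>". A minimal Go sketch of reading those lines (illustrative only, not Gitea's actual hook code):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		// Each line is "<old-sha> <new-sha> <ref-name>".
		fields := strings.Fields(scanner.Text())
		if len(fields) != 3 {
			continue
		}
		oldSHA, newSHA, refName := fields[0], fields[1], fields[2]
		// In the architecture described here, each of these updates becomes
		// its own request to the server, so a push of 854 tags means 854 round trips.
		fmt.Printf("ref %s updated: %s -> %s\n", refName, oldSHA, newSHA)
	}
}
```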
This then gets translated to a GET request to the Gitea server, calling the code at Lines 126 to 244 in 280f4be.
This has the benefit of meaning each commit SHA id is logged for free, but if you're updating a lot of refs, that means you get a lot of separate HTTP requests. Pre-receive has a similar architecture. Now, that architecture means that even if we were doing this within a single session we wouldn't get much benefit from session caching, although it might have some benefit. A better architecture would be to pass all of the refs in a POST; we could then create a
Unfortunately, when I made these changes to the hooks I considered but dismissed the idea that anyone would be likely to send almost a thousand updates in one push, so in terms of doing the least work I only made the simplest implementation.
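A hedged sketch of that batching idea: send every updated ref in one POST body instead of one GET per ref, so a push with a thousand tag updates becomes a single internal request. The endpoint path and payload shape below are hypothetical, not Gitea's actual internal API:

```go
package hooks

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// refUpdate mirrors one "<old-sha> <new-sha> <ref-name>" line from stdin.
type refUpdate struct {
	Old string `json:"old"`
	New string `json:"new"`
	Ref string `json:"ref"`
}

// postAllRefs sends every ref update in a single request. baseURL and the
// /internal/hook/post-receive path are placeholders for illustration.
func postAllRefs(baseURL, ownerRepo string, updates []refUpdate) error {
	body, err := json.Marshal(updates)
	if err != nil {
		return err
	}
	url := fmt.Sprintf("%s/internal/hook/post-receive/%s", baseURL, ownerRepo)
	resp, err := http.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("hook call failed: %s", resp.Status)
	}
	return nil
}
```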
Although optimization is always a good thing, I think the root of the problem here is the connection pool. It will come back to bite us at some point, not only with the migration of large repositories.
Yeah, at the end of the day it doesn't matter how efficient this bit of code is: if you have not configured the pool properly, you could run it out of connections with the correct kind of load. At least with #8528 we will expose all the configurables that Go provides to the user; if that's still not enough, then we'll have to think about writing our own pool (one which could at least handle this error and wait). If MaxOpenConns and MaxIdleConns are equal, then there should be at most MSL * MaxOpenConns / MaxLifetime TIME_WAIT connections. If you change MaxIdleConns to be different from MaxOpenConns, you're likely to need to increase the MaxLifetime, but there will be a point at which there is no stable solution. Without setting MaxOpenConns, a sufficient load will cause port exhaustion.
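As an illustration of that bound (with assumed numbers, not measurements from this issue): if the kernel keeps sockets in TIME_WAIT for roughly 60 seconds, MaxOpenConns = MaxIdleConns = 50 and MaxLifetime = 300 seconds, the pool retires at most 50 / 300 connections per second, so about 60 * 50 / 300 = 10 sockets sit in TIME_WAIT at any moment, far below the ~64k ephemeral-port limit. With the Go defaults (unbounded MaxOpenConns, MaxIdleConns = 2) the same load can instead close hundreds of connections per second.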
@zeripath Mmm... I was about to write a long explanation of how MaxOpenConns should not affect the number of closed connections, but now I think I see your point. The only way to keep the system from creating TIME_WAIT entries is to avoid closing connections as much as possible, so MaxIdleConns should be equal to MaxOpenConns in this type of application, where many users can be doing operations at the same time. Again, your PR seems on point. What I wonder is: what's the strategy in the database driver for scheduling connection requests when all connections are busy? FIFO? Are there others available?
Without looking at the code, I would guess it's actually "random": the most obvious implementation is a simple spin lock with a wait until you actually get the real lock. I would bet there is no formal queue (too expensive), so we're into OS-level queuing algorithms, suggesting a likely bias towards LIFO.
@mansshardt I know it's a bit of a hassle, but is it possible for you to grab @zeripath's PR to build from source and try?:
(It's important for the test that the first two values match)
@guillep2k I will try that on Monday when I am back at the office and get back to you.
I just had the chance to test with a build from @zeripath's PR and the following db settings:
With this build and these settings I can see proper database pooling. During migration I have two or three TCP sockets in
@mansshardt would you be able to try #8602? It would be a great test of the code if it worked.
Damn, that means that I have a bug...
OK @mansshardt, I think that 200 is too large a batch size for Gitea to process without the internal request timing out, and that's why you only get ~200 processed.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs during the next 2 weeks. Thank you for your contributions.
Description
We are currently migrating all of our 230 repositories to Gitea. Here's how we do it:
To keep it simple our Gitea instance has just one organisation with around 110 "Owner" team members. When we migrate repositories with more than 50 tags to that organisation, the git client pushes everything without an error, but Gitea fails when creating releases for every tag with the following message:
In the Gitea logfile I can see that Gitea calls repo_permission.go for every tag and every user of that organisation, which results in a lot of calls and presumably database queries.
When migrating that same repository to a "standalone" user account, everything works as expected. All releases get created. In the Gitea logfile I can see that Gitea doesn't call repo_permission.go in this case.
We also tried to migrate a very large repository with around 15,000 commits and 7,000 tags. Although we get an error from the git client, everything (tags, commits) gets transmitted and all releases get created by Gitea when migrating to a user repository.
Long story short: when we migrate repositories with more than 50 tags (sometimes far more) to a user account, everything works. When we migrate them to an organisation with a lot of members, Gitea gets stuck when creating releases from tags.
Workaround
As a workaround, we first migrate to a user account and then transfer the ownership of the repository to the organisation.