LFS: Cloning objects / batch not found #8273

gabyx · 2019-09-24T09:10:38Z

Gitea version: Bug contained in 1.8.3 - 1.9.3
Git version: 2.23.0 (local)
Operating system: Gitea (Linux, docker), Pushing from repo from: Windows
Database (use [x]):
- PostgreSQL
- MySQL
- MSSQL
- SQLite
Can you reproduce the bug at https://try.gitea.io:
- Yes (provide example URL)
- No
- Not relevant

Description

When I upload a repo with LFS objects, the upload mostly works.
While cloning, after some time, the lfs smudge filter (here 58%)
stalls always after some time, saying

After a night of debugging (updating sucessively through all versions with docker),
we come to the conclusions that

this issue arises in all versions from 1.8.3 till 1.9.3.
Version 1.7.4 - 1.8.2 all work correctly.
Setting the repository to private or public did not help (version 1.8.3)

Could it be that the following Submissions into 1.8.3 are problematic:

Always set userID on LFS authentication (Always set userID on LFS authentication #7224) (Part of Move serv hook functionality & drop GitLogger #6993)
Fix LFS Locks over SSH (Fix LFS Locks over SSH #6999) (Fix LFS Locks over SSH (#6999) #7223)

The hints/workarounds in the discussion below, did not solve this issue:
https://discourse.gitea.io/t/solved-git-lfs-upload-repeats-infinitely/635/2

Hopefully this gets some attention, since its a nasty LFS Bug which made us almost to apple crumble. 🍎

The text was updated successfully, but these errors were encountered:

m-a-v · 2019-09-27T20:16:35Z

I've made some more tests. After compiling the version of commit dbd0a2e Fix LFS Locks over SSH (#6999) (#7223) the error appears. The LFS data is large (approximately 10 GB). One commit before (7697a28) everthing works perfectly.

I've tried to disable the SSH server. But this doesn't change anything.

@zeripath Let me know if you need more information.

m-a-v · 2019-09-27T20:57:01Z

Here you can see the debug log output when the error occurs: PANIC:: runtime error: invalid memory address or nil pointer dereference,

2019/09/27 20:44:19 [D] Could not find repository: company/repository - dial tcp 172.18.0.6:3306: connect: cannot assign requested address, 2019/09/27 20:44:19 [D] LFS request - Method: GET, URL: /company/repository.git/info/lfs/objects/063e23a8631392cc939b6b609df91e02d064f3fe279522c3eefeb1c5f1d738a3, Status 404, 2019/09/27 20:44:19 [...les/context/panic.go:36 1()] [E] PANIC:: runtime error: invalid memory address or nil pointer dereference, /usr/local/go/src/runtime/panic.go:82 (0x44abc0), /usr/local/go/src/runtime/signal_unix.go:390 (0x44a9ef), /go/src/code.gitea.io/gitea/models/repo_permission.go:120 (0x108a0ed), /go/src/code.gitea.io/gitea/models/repo_permission.go:120 (0x108a0ed), /go/src/code.gitea.io/gitea/models/repo_permission.go:95 (0x1183338), /go/src/code.gitea.io/gitea/modules/lfs/server.go:501 (0x118330a), /go/src/code.gitea.io/gitea/modules/lfs/server.go:128 (0x117f2dd), /go/src/code.gitea.io/gitea/modules/lfs/server.go:146 (0x117f468), /go/src/code.gitea.io/gitea/modules/lfs/server.go:105 (0x117ef90), /usr/local/go/src/reflect/value.go:447 (0x4cb930), /usr/local/go/src/reflect/value.go:308 (0x4cb3b3), /go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:177 (0x9a1466), /go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:137 (0x9a0d5b), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:121 (0x9cff19), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:112 (0x11667e8), /go/src/code.gitea.io/gitea/modules/context/panic.go:40 (0x11667db), /usr/local/go/src/reflect/value.go:447 (0x4cb930), /usr/local/go/src/reflect/value.go:308 (0x4cb3b3), /go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:177 (0x9a1466), /go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:137 (0x9a0d5b), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:121 (0x9cff19), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:112 (0x9efe76), /go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/session/session.go:192 (0x9efe61), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:79 (0x9cfdc0), /go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:157 (0x9a1120), /go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:135 (0x9a0e4a), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:121 (0x9cff19), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:112 (0x9e197f), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/recovery.go:161 (0x9e196d), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/logger.go:40 (0x9d3bb3), /go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:157 (0x9a1120), /go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:135 (0x9a0e4a), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:121 (0x9cff19), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:112 (0x9e0ca0), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/logger.go:52 (0x9e0c8b), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/logger.go:40 (0x9d3bb3), /go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:157 (0x9a1120), /go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:135 (0x9a0e4a), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:121 (0x9cff19), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/router.go:187 (0x9e2bc6), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/router.go:303 (0x9dc635), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/macaron.go:220 (0x9d4f8c), /go/src/code.gitea.io/gitea/vendor/github.com/gorilla/context/context.go:141 (0xce374a), /usr/local/go/src/net/http/server.go:1995 (0x6f63a3), /usr/local/go/src/net/http/server.go:2774 (0x6f9677), /usr/local/go/src/net/http/server.go:1878 (0x6f5360), /usr/local/go/src/runtime/asm_amd64.s:1337 (0x464c20), , 2019/09/27 20:44:19 [D] Template: status/500, 2019/09/27 20:44:19 [...les/context/panic.go:36 1()] [E] PANIC:: runtime error: invalid memory address or nil pointer dereference, /usr/local/go/src/runtime/panic.go:82 (0x44abc0), /usr/local/go/src/runtime/signal_unix.go:390 (0x44a9ef), /go/src/code.gitea.io/gitea/models/repo_permission.go:120 (0x108a0ed), /go/src/code.gitea.io/gitea/models/repo_permission.go:120 (0x108a0ed), /go/src/code.gitea.io/gitea/models/repo_permission.go:95 (0x1183338), /go/src/code.gitea.io/gitea/modules/lfs/server.go:501 (0x118330a), /go/src/code.gitea.io/gitea/modules/lfs/server.go:128 (0x117f2dd), /go/src/code.gitea.io/gitea/modules/lfs/server.go:146 (0x117f468), /go/src/code.gitea.io/gitea/modules/lfs/server.go:105 (0x117ef90), /usr/local/go/src/reflect/value.go:447 (0x4cb930), /usr/local/go/src/reflect/value.go:308 (0x4cb3b3), /go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:177 (0x9a1466), /go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:137 (0x9a0d5b), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:121 (0x9cff19), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:112 (0x11667e8), /go/src/code.gitea.io/gitea/modules/context/panic.go:40 (0x11667db), /usr/local/go/src/reflect/value.go:447 (0x4cb930), /usr/local/go/src/reflect/value.go:308 (0x4cb3b3), /go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:177 (0x9a1466), /go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:137 (0x9a0d5b), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:121 (0x9cff19), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:112 (0x9efe76), /go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/session/session.go:192 (0x9efe61), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:79 (0x9cfdc0), /go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:157 (0x9a1120), /go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:135 (0x9a0e4a), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:121 (0x9cff19), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:112 (0x9e197f), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/recovery.go:161 (0x9e196d), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/logger.go:40 (0x9d3bb3), /go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:157 (0x9a1120), /go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:135 (0x9a0e4a), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:121 (0x9cff19), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:112 (0x9e0ca0), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/logger.go:52 (0x9e0c8b), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/logger.go:40 (0x9d3bb3), /go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:157 (0x9a1120), /go/src/code.gitea.io/gitea/vendor/github.com/go-macaron/inject/inject.go:135 (0x9a0e4a), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/context.go:121 (0x9cff19), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/router.go:187 (0x9e2bc6), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/router.go:303 (0x9dc635), /go/src/code.gitea.io/gitea/vendor/gopkg.in/macaron.v1/macaron.go:220 (0x9d4f8c), /go/src/code.gitea.io/gitea/vendor/github.com/gorilla/context/context.go:141 (0xce374a), /usr/local/go/src/net/http/server.go:1995 (0x6f63a3), /usr/local/go/src/net/http/server.go:2774 (0x6f9677), /usr/local/go/src/net/http/server.go:1878 (0x6f5360), /usr/local/go/src/runtime/asm_amd64.s:1337 (0x464c20),

m-a-v · 2019-09-28T20:52:16Z

I suppose that Gitea is exceeding the number of local socket connections permitted by the OS.

Failure: cannot assign requested address

See also explanation and possible solution here:
golang/go#16012 (comment)

Where could I change the setting MaxIdleConnsPerHost and other LFS server settings to make further tests?

m-a-v · 2019-09-28T20:54:19Z

BTW: The error PANIC:: runtime error: invalid memory address or nil pointer dereference does not always appear in the log output. Sometimes the server and client just hang.

m-a-v · 2019-09-28T20:56:03Z

@lunny Who could help to isolate this bug? Is there any Gitea programmer who could support us? I am willing to make more tests but I need some hints.

gabyx · 2019-09-29T10:19:41Z

@m-a-v: There is also a setting:

git -c lfs.concurrenttransfers=5 clone

which will affect the transfer probably, nevertheless it should not crash the server...

gabyx · 2019-09-29T10:32:41Z

Another interesting read: https://www.fromdual.com/huge-amount-of-time-wait-connections

Check ulimit, maxfiles, and somaxconn. Possibly system runs out of limits resources. Link

lunny · 2019-09-30T01:06:46Z

@m-a-v I think @zeripath maybe. But if not, I can take a look at this.

m-a-v · 2019-09-30T08:07:49Z

The problem seems to be the huge amount of connections for the Get request (more than 10k connections for one single client!). See also here:

https://medium.com/@valyala/net-http-client-has-the-following-additional-limitations-318ac870ce9d.
https://medium.com/@nate510/don-t-use-go-s-default-http-client-4804cb19f779

zeripath · 2019-10-10T01:11:33Z

@m-a-v I've been very busy doing other things for a while so have been away from Gitea. I'll take a look at this.

I think you're on the right trail with the number of connections thing. IIRC there's another person who had a similar issue.

zeripath · 2019-10-10T01:25:43Z

@m-a-v I can't understand why dbd0a2e should break things, but I'll double check.

Maybe it's possible the request body isn't being closed or something stupid like that. That would cause a leak if so and could explain the issue.

The other possiblity is that dbd0a2e has nothing to do with things and it's a Heisenbug relating to the number of connections thing.

guillep2k · 2019-10-10T02:18:24Z

A netstat -an could be usefull to see in what state are the connections when this happens. It doesn't need to make Gitea fail, but it will be useful as long as there is a large number of connections listed. It's not the same if the connections are in CONNECTED state, or CLOSE_WAIT, FIN_WAIT1, etc.

zeripath · 2019-10-10T16:22:07Z

OK, so all these calls to ReadCloser() don't Close():

gitea/modules/lfs/server.go

Line 330 in 57b0d9a

if err := contentStore.Put(meta, ctx.Req.Body().ReadCloser()); err != nil {

gitea/modules/lfs/server.go

Line 437 in 57b0d9a

dec := json.NewDecoder(r.Body().ReadCloser())

gitea/modules/lfs/server.go

Line 456 in 57b0d9a

dec := json.NewDecoder(r.Body().ReadCloser())

Whether that's the cause of your bug is another question - however, it would fit with dbd0a2e causing more issues because suddenly you get a lot more calls to unpack.

These should be closed so I guess that's at least a starting point for attempting to fix this. (If I find anything else I will update this.)

zeripath · 2019-10-10T16:36:57Z

@m-a-v would you be able to rebuild from my PR #8454 and see if that solves your issue?

m-a-v · 2019-10-11T22:23:12Z

@zeripath Thanks a lot. It may take some time until I can test it, but I certainly will.

zeripath · 2019-10-12T04:58:21Z

It's actually been merged in to 1.10 and 1.9 branches already.

m-a-v · 2019-10-15T12:35:26Z

I've tested it again with 1.10 and it seems that the described LFS bug has been solved or at least it made the error appear for this specific scenario. Before @zeropath fix we had more than 10k connections in a TIME_WAIT state. Now there are still approximately 3.5k connections in the TIME_WAIT state. I assume if multiple clients will access the LFS server the same problem could still occur.

Any idea how to improve this? Are there other possible leaks? I assume that a connection which closes will not remain in a TIME_WAIT state. Can anyone confirm this?

zeripath · 2019-10-15T13:38:06Z

Hi @m-a-v, I guess this means that I must have missed some others. Is there anyway of checking that they're all LFS connections?

m-a-v · 2019-10-15T13:42:08Z

Indirectly, yes. I had only one active client. Before LFS checkout I had two connections on the MariaDB database server instance. During LFS checkout about 3.5k connections and then some minutes later again 2 connections.

This article could be interesting:
http://www.serverframework.com/asynchronousevents/2011/01/time-wait-and-its-design-implications-for-protocols-and-scalable-servers.html

zeripath · 2019-10-15T13:46:37Z

LFS checkout causes 3.5K connections?! How many LFS objects do you have?

m-a-v · 2019-10-15T13:51:01Z

12k LFS objects.

guillep2k · 2019-10-15T15:00:59Z

@zeripath Any connections that Gitea leaves open should remain in either ESTABLISHED or CLOSE_WAIT states.

zeripath · 2019-10-15T16:18:10Z

Could it be that git lfs on the client is also leading connections?

guillep2k · 2019-10-15T16:27:01Z

Could it be that git lfs on the client is also leading connections?

That would be either FIN_WAIT_1 or FIN_WAIT_2.

TIME_WAIT is a state maintained by the OS to keep the port from being reused (by port I mean the client+server address & port pair).

guillep2k · 2019-10-15T16:30:06Z

This picture should help (but it's not easy to read, so I guess it doesn't):

m-a-v · 2019-10-15T17:43:08Z

I think the problem is more the following:

"Your problem is that you are not reusing your MySQL connections within your app but instead you are creating a new connection every time you want to run an SQL query. This involves not only setting up a TCP connection, but then also passing authentication credentials across it. And this is happening for every query (or at least every front-end web request) and it's wasteful and time consuming."

I think this would also speed up Gitea's LFS server a lot.

source: https://serverfault.com/questions/478691/avoid-time-wait-connections

zeripath · 2019-10-15T18:15:34Z

AHA! Excellent! Well done for finding that!

zeripath · 2019-10-15T18:35:40Z

OK We do recycle connections. We use the underlying go sql connection pool.

For MySQL there are the following in the [database] part of the app.ini:

MAX_IDLE_CONNS 0: Max idle database connections on connnection pool, default is 0
CONN_MAX_LIFETIME 3s: Database connection max lifetime

https://docs.gitea.io/en-us/config-cheat-sheet/#database-database

I think MAX_IDLE_CONNECTIONS was set to 0 because MySQL doesn't like long lasting connections.

~~I will however make a PR, exposing SetConnMaxLifetime.~~ Edit: I'm an idiot it's already exposed for MySQL.

zeripath · 2019-10-15T18:56:42Z

I think what you need to do is tune those variables better. I think our defaults are highly likely to be incorrect - however, I think they were set to this because of other users complaining of problems.

I suspect that MAX_IDLE_CONNECTIONS being set to 0 happened before we adjusted CONN_MAX_LIFETIME and it could be that we could be more generous with both of these. I.e. something like MAX_IDLE_CONNECTIONS 10 and CONN_MAX_LIFETIME 15m would work.

m-a-v · 2019-10-21T19:48:02Z

I could test it again with the repo. Which branch should I take? Which parameters (I've seen that discussions continued)?

m-a-v · 2019-10-21T19:49:42Z

So I've spotted another unclosed thing, which is unlikely to be causing your issue, however, I am suspicious that we're not closing the response body in modules/lfs/server.go.

Did you also fix this?

m-a-v · 2019-10-31T21:51:39Z

I have made several experiments with the currently running gitea server(v1.7.4 and with the new version v.1.9.5). The netstat snapshots were created at the peak of the number of open connections.

Version 1.7.4

root@917128b828cb:/# netstat -ant | awk '{print $6}' | sort | uniq -c | sort -n
      1 Foreign
      1 established)
      2 ESTABLISHED
      2 LISTEN
    162 TIME_WAIT

Version 1.9.5 (and same default settings as with 1.7.4

bash-5.0# netstat -ant | awk '{print $6}' | sort | uniq -c | sort -n
      1 ESTABLISHED
      1 Foreign
      1 established)
      5 LISTEN
  30064 TIME_WAIT

Version 1.9.5 (CONN_MAX_LIFETIME = 45s, MAX_IDLE_CONNS = 10, MAX_OPEN_CONNS = 10)

bash-5.0# netstat -ant | awk '{print $6}' | sort | uniq -c | sort -n
      1 ESTABLISHED
      1 Foreign
      1 established)
      5 LISTEN
  31095 TIME_WAIT

With both configurations the LFS servers has much too many open connections. So I think we still have serious problems with large LFS repos.

$ git clone https://domain.org/repo.git test
Cloning into 'test'...
remote: Enumerating objects: 157392, done.
remote: Counting objects: 100% (157392/157392), done.
remote: Compressing objects: 100% (97424/97424), done.
remote: Total 157392 (delta 63574), reused 151365 (delta 57755)
Receiving objects: 100% (157392/157392), 6.99 GiB | 57.68 MiB/s, done.
Resolving deltas: 100% (63574/63574), done.
Updating files: 100% (99264/99264), done.
Filtering content:  53% (6594/12372), 4.13 GiB | 2.38 MiB/s

The clone process just freezes at a certain percentage (as soon as there are too many connections).

I think this bug should be reopened.

zeripath · 2019-10-31T22:15:14Z

#8528 was only backported to 1.10 as #8618 . It was not backported to 1.9.5.

Setting MAX_OPEN_CONNS won't have any effect on 1.9.5.

Please try on 1.10-rc2 or master.

m-a-v · 2019-10-31T22:28:40Z

master (CONN_MAX_LIFETIME = 45s, MAX_IDLE_CONNS = 10, MAX_OPEN_CONNS = 10)

bash-5.0# netstat -ant | awk '{print $6}' | sort | uniq -c | sort -n
      1 FIN_WAIT1
      1 Foreign
      1 established)
      5 ESTABLISHED
      5 LISTEN
   8041 TIME_WAIT

The checkout succeeds but still many used connections remain in TIME_WAIT status. If multiple clients would access the LFS server it could not handle it.

zeripath · 2019-10-31T23:03:04Z

Your max lifetime is probably too low, 45s seems aggressive.

Are you sure all of those connections are db connections? Lots of http connections will be made when dealing with lots of lfs objects. (There's probably some more efficiencies we can do.)

If they're all db then multiple users won't change it - you're likely at your max as it should be mathematically determinable:

Total Connections = open +idle + timewait

If max open=max idle:
Max C = O + W

dC/dt = dO/dt + dW/dt

max dO/dt = 0 (as it's fixed)

max dW/dT = max_o/max_l - W/max_tw

dC/dt is positive around C=0 therefore dC/dt=0 should represent max for positive C and thence maximize W.

max_W = max_tw * max_o / max_l

If they're all db then you have a very long max tw or I've messed up in my maths somewhere.

You can set your time_wait at a server network stack level.

m-a-v · 2019-10-31T23:10:35Z

I've chosen the 45 seconds from the discussion between you and @guillep2k in #8528.

How are the connections reused? Where is this made in the code? I assume after a connection is closed it will go in the TIME_WAIT state.

I don't know if all are db connections. Why did it work with 1.7.4 almost perfectly (see above)?

m-a-v · 2019-10-31T23:20:12Z

This could be interesting:
https://stackoverflow.com/questions/1931043/avoiding-time-wait

"Probably the best option, if it's doable: refactor your protocol so that connections that are finished aren't closed, but go into an "idle" state so they can be re-used later, instead of opening up a new connection (like HTTP keep-alive)."

"Setting SO_REUSEADDR on the client side doesn't help the server side unless it also sets SO_REUSEADDR"

guillep2k · 2019-10-31T23:48:11Z

@zeripath @m-a-v It must be noticed that not all TIME_WAIT connections are from the database. Internal requests (e.g. internal router) and many others will create quick http connections that may or may not be reused.

@m-a-v it would be cool if you'd break your statistics down by listening port number.

guillep2k · 2019-10-31T23:57:28Z

"Probably the best option, if it's doable: refactor your protocol so that connections that are finished aren't closed, but go into an "idle" state so they can be re-used later, instead of opening up a new connection (like HTTP keep-alive)."

"Setting SO_REUSEADDR on the client side doesn't help the server side unless it also sets SO_REUSEADDR"

I don't think SO_REUSEADDR applies here. If you're down into this level of optimization, I'd suggest tuning the tcp_fin_timeout parameter in the kernel. Too short a value will have ill side effects, though; I'd wouldn't set it below 30 seconds.

But TIME_WAIT is actually the symptom, not the problem.

m-a-v · 2019-11-01T00:07:47Z

@guillep2k What do you exactly mean with "it would be cool if you'd break your statistics down by listening port number"?

tcp_fin_timeout is set to 60 seconds on my system. Ubuntu 18.04 LTS standard configuration.

The question still remains. Why did it work perfectly with 1.7.4 (and earlier) and now anymore?

guillep2k · 2019-11-01T00:36:22Z

@m-a-v

# netstat -ant | grep TIME_WAIT | awk '{print $5 " " $6}' | cut -d: -f2 | sort | uniq -c | sort -n

guillep2k · 2019-11-01T01:23:46Z

The question still remains. Why did it work perfectly with 1.7.4 (and earlier) and now anymore?

I don't know, I'd need to check the code. The important thing is that it's taken care of now. 😁

m-a-v · 2019-11-01T08:05:03Z

The question still remains. Why did it work perfectly with 1.7.4 (and earlier) and now anymore?

I don't know, I'd need to check the code. The important thing is that it's taken care of now. 😁

I meant "and now not anymore".

guillep2k · 2019-11-01T14:06:03Z

I meant "and now not anymore".

I meant it's now solved by properly handling CONN_MAX_LIFETIME, MAX_IDLE_CONNS and MAX_OPEN_CONNS.

@m-a-v If you want to investigate what's the specific change between 1.7.4 and 1.9.5 that caused this, I'd be interested in learning about your results.

gabyx · 2019-12-20T08:38:35Z

on 1.7.4 (9f33aa6) I had lots of connections when cloning too on the peak, when Filtering... -> LFS Smudge:

$ netstat -ant | awk '{print $6}' | sort | uniq -c | sort -n
      1 Foreign
      1 established)
      5 LISTEN
     10 ESTABLISHED
   8599 TIME_WAIT

When git lfs push --all origin at the peak

$ netstat -ant | grep TIME_WAIT | awk '{print $5 " " $6}' | cut -d: -f2 | sort | uniq -c
66

suddenly the client hangs on 97%. GIT_TRACE=true does not show anything... it just hangs... possible not related to gitea.

gabyx · 2020-01-07T14:10:11Z

on 1.11.0+dev-563-gbcac7cb93:
netstat -ant | grep TIME_WAIT | awk '{print $5 " " $6}' | cut -d: -f2 | sort | uniq -c
Peak is 280 connections in TIME_WAIT.

This was referenced Sep 26, 2019

Always set userID on LFS authentication #7224

Merged

Move serv hook functionality & drop GitLogger #6993

Merged

lunny added the type/bug label Sep 26, 2019

gabyx changed the title ~~LFS Upload objects/batch not found~~ LFS: Cloning objects / batch not found Sep 26, 2019

lunny added this to the 1.9.4 milestone Sep 30, 2019

lunny modified the milestones: 1.9.4, 1.9.5 Oct 8, 2019

zeripath mentioned this issue Oct 10, 2019

Ensure Request Body Readers are closed in LFS server #8454

Merged

zeripath closed this as completed in #8454 Oct 10, 2019

zeripath mentioned this issue Oct 15, 2019

Add missed close in ServeBlobLFS #8527

Merged

zeripath mentioned this issue Oct 15, 2019

Expose db.SetMaxOpenConns and allow non MySQL dbs to set conn pool params #8528

Merged

guillep2k mentioned this issue Oct 16, 2019

Migration of repositories with tags to organisation fails #8540

Closed

7 tasks

gabyx mentioned this issue Jan 8, 2020

To many TimeWait Connections when git clone (LFS) #9650

Closed

8 tasks

go-gitea locked and limited conversation to collaborators Nov 24, 2020

LFS: Cloning objects / batch not found #8273

LFS: Cloning objects / batch not found #8273

Comments

gabyx commented Sep 24, 2019 • edited Loading

Description

m-a-v commented Sep 27, 2019

m-a-v commented Sep 27, 2019

m-a-v commented Sep 28, 2019

m-a-v commented Sep 28, 2019

m-a-v commented Sep 28, 2019

gabyx commented Sep 29, 2019

gabyx commented Sep 29, 2019 • edited Loading

lunny commented Sep 30, 2019

m-a-v commented Sep 30, 2019

zeripath commented Oct 10, 2019 • edited Loading

zeripath commented Oct 10, 2019

guillep2k commented Oct 10, 2019

zeripath commented Oct 10, 2019

zeripath commented Oct 10, 2019

m-a-v commented Oct 11, 2019

zeripath commented Oct 12, 2019

m-a-v commented Oct 15, 2019

zeripath commented Oct 15, 2019

m-a-v commented Oct 15, 2019

zeripath commented Oct 15, 2019

m-a-v commented Oct 15, 2019 • edited Loading

guillep2k commented Oct 15, 2019

zeripath commented Oct 15, 2019

guillep2k commented Oct 15, 2019

guillep2k commented Oct 15, 2019

m-a-v commented Oct 15, 2019 • edited Loading

zeripath commented Oct 15, 2019

zeripath commented Oct 15, 2019 • edited Loading

zeripath commented Oct 15, 2019

m-a-v commented Oct 21, 2019

m-a-v commented Oct 21, 2019

m-a-v commented Oct 31, 2019

zeripath commented Oct 31, 2019

m-a-v commented Oct 31, 2019

zeripath commented Oct 31, 2019 • edited Loading

m-a-v commented Oct 31, 2019

m-a-v commented Oct 31, 2019

guillep2k commented Oct 31, 2019 • edited Loading

guillep2k commented Oct 31, 2019

m-a-v commented Nov 1, 2019

guillep2k commented Nov 1, 2019

guillep2k commented Nov 1, 2019

m-a-v commented Nov 1, 2019

guillep2k commented Nov 1, 2019

gabyx commented Dec 20, 2019 • edited Loading

gabyx commented Jan 7, 2020 • edited Loading

gabyx commented Sep 24, 2019 •

edited

Loading

gabyx commented Sep 29, 2019 •

edited

Loading

zeripath commented Oct 10, 2019 •

edited

Loading

m-a-v commented Oct 15, 2019 •

edited

Loading

m-a-v commented Oct 15, 2019 •

edited

Loading

zeripath commented Oct 15, 2019 •

edited

Loading

zeripath commented Oct 31, 2019 •

edited

Loading

guillep2k commented Oct 31, 2019 •

edited

Loading

gabyx commented Dec 20, 2019 •

edited

Loading

gabyx commented Jan 7, 2020 •

edited

Loading