Hang with 1.20.2/1.20.3 #26684

Closed
elburb opened this issue Aug 23, 2023 · 18 comments · Fixed by #26762

@elburb

elburb commented Aug 23, 2023

Description

I run a medium-sized, public Gitea instance; we have approximately 2.5k users with 1k repositories (about 6 GB in total).

Starting with 1.20.2 we began experiencing an issue where, after some seemingly random period, trying to access repositories appears to hang or deadlock, both via the web interface and via SSH. We've found nothing that correlates with when the issue occurs: sometimes it happens within a few minutes of restarting, sometimes after hours. When it does occur, we still seem able to access the web interface for non-repository things (e.g. the admin pages).
I've confirmed the issue still happens in 1.20.3 but it no longer occurs after rolling back to 1.20.1.

So far, I've not been able to properly narrow down what is causing the hang.
I see some 'slow' requests in the router logs which just seem indicative of the hanging, but nothing else that seems unusual:

2023/08/23 02:25:04 ...eb/routing/logger.go:68:func1() [W] router: slow      POST /api/internal/ssh/1985/update/3 for <redacted>:0, elapsed 3354.7ms @ private/key.go:16(private.UpdatePublicKeyInRepo)
2023/08/23 02:25:06 ...eb/routing/logger.go:68:func1() [W] router: slow      POST /api/internal/ssh/1985/update/3 for <redacted>:0, elapsed 3269.4ms @ private/key.go:16(private.UpdatePublicKeyInRepo)

On the host, I see a bunch of waiting processes, but I think that is also just showing normal activity that has hung?

git      1251330 1251329  0 Aug22 ?        00:00:00 -bash
git      1405726       1  0 06:58 ?        00:01:58 /home/git/gitea web
git      1417492 1405726  0 07:44 ?        00:00:00 /usr/bin/git -c protocol.version=2 -c credential.helper= -c filter.lfs.required= -c filter.lfs.smudge= -c filter.lfs.clean= cat-file --batch
git      1417493 1405726  0 07:44 ?        00:00:00 /usr/bin/git -c protocol.version=2 -c credential.helper= -c filter.lfs.required= -c filter.lfs.smudge= -c filter.lfs.clean= cat-file --batch-check
git      1417628 1405726  0 07:44 ?        00:00:00 /usr/bin/git -c protocol.version=2 -c credential.helper= -c filter.lfs.required= -c filter.lfs.smudge= -c filter.lfs.clean= cat-file --batch-check
git      1417629 1405726  0 07:44 ?        00:00:01 /usr/bin/git -c protocol.version=2 -c credential.helper= -c filter.lfs.required= -c filter.lfs.smudge= -c filter.lfs.clean= cat-file --batch
git      1420047       1  0 07:58 ?        00:00:00 /usr/lib/git-core/git remote-https origin https://oauth2:<redacted>@git.dn42.dev/dn42/registry.git
git      1420048 1420047  0 07:58 ?        00:00:02 /usr/lib/git-core/git-remote-https origin https://oauth2:<redacted>@git.dn42.dev/dn42/registry.git
git      1421921       1  0 08:08 ?        00:00:00 /usr/lib/git-core/git remote-https origin https://oauth2:<redacted>@git.dn42.dev/dn42/registry.git
git      1421922 1421921  0 08:08 ?        00:00:01 /usr/lib/git-core/git-remote-https origin https://oauth2:<redacted>@git.dn42.dev/dn42/registry.git
git      1422176       1  0 08:10 ?        00:00:00 /lib/systemd/systemd --user
git      1422177 1422176  0 08:10 ?        00:00:00 (sd-pam)
git      1423695       1  0 08:18 ?        00:00:00 /usr/lib/git-core/git remote-https origin https://oauth2:<redacted>@git.dn42.dev/dn42/registry.git
git      1423696 1423695  0 08:18 ?        00:00:01 /usr/lib/git-core/git-remote-https origin https://oauth2:<redacted>@git.dn42.dev/dn42/registry.git
git      1425427       1  0 08:28 ?        00:00:00 /usr/lib/git-core/git remote-https origin https://oauth2:<redacted>@git.dn42.dev/dn42/registry.git
git      1425428 1425427  0 08:28 ?        00:00:00 /usr/lib/git-core/git-remote-https origin https://oauth2:<redacted>@git.dn42.dev/dn42/registry.git
git      1428317 1405726  0 08:44 ?        00:00:00 /usr/bin/git -c protocol.version=2 -c credential.helper= -c filter.lfs.required= -c filter.lfs.smudge= -c filter.lfs.clean= cat-file --batch
git      1428318 1405726  0 08:44 ?        00:00:00 /usr/bin/git -c protocol.version=2 -c credential.helper= -c filter.lfs.required= -c filter.lfs.smudge= -c filter.lfs.clean= cat-file --batch-check
git      1428442       1  0 08:44 ?        00:00:00 /home/git/gitea --config=/home/git/custom/conf/app.ini serv key-1478
git      1428470 1428455  0 08:45 ?        00:00:00 sshd: git@notty
git      1428471 1428470  0 08:45 ?        00:00:00 /home/git/gitea --config=/home/git/custom/conf/app.ini serv key-1985
git      1428498 1428457  0 08:45 ?        00:00:00 sshd: git@notty
git      1428499 1428498  0 08:45 ?        00:00:00 /home/git/gitea --config=/home/git/custom/conf/app.ini serv key-358
git      1428521 1428485  0 08:45 ?        00:00:00 sshd: git@notty
git      1428531 1428487  0 08:45 ?        00:00:00 sshd: git@notty
git      1428532 1428521  0 08:45 ?        00:00:00 /home/git/gitea --config=/home/git/custom/conf/app.ini serv key-237
git      1428546 1428531  0 08:45 ?        00:00:00 /home/git/gitea --config=/home/git/custom/conf/app.ini serv key-586
git      1428571 1428557  0 08:45 ?        00:00:00 sshd: git@notty
git      1428572 1428571  0 08:45 ?        00:00:00 /home/git/gitea --config=/home/git/custom/conf/app.ini serv key-1985
git      1428599 1428588  0 08:45 ?        00:00:00 sshd: git@notty
git      1428600 1428599  0 08:45 ?        00:00:00 /home/git/gitea --config=/home/git/custom/conf/app.ini serv key-357

gitea doctor check --all came back clean and I've also run gitea doctor recreate-table without any obvious impact.

Without a specific error to go on, I'm at a loss as to what else to look for, so I'd appreciate some initial advice or guidance on what I can do to gather additional information and narrow down the root cause.

Gitea Version

1.20.2 & 1.20.3

Can you reproduce the bug on the Gitea demo site?

No

Log Gist

No response

Screenshots

No response

Git Version

2.34.1

Operating System

No response

How are you running Gitea?

We are running Gitea as a binary downloaded from https://github.com/go-gitea/gitea/releases, as a systemd service under Ubuntu 22.04.3. Gitea is proxied by Traefik for web connections.

Database

SQLite

@lunny
Member

lunny commented Aug 23, 2023

Hm, you are using SQLite with 2.5k users; that's surprising. Maybe you can switch to MySQL or Postgres.

2023/08/23 02:25:04 ...eb/routing/logger.go:68:func1() [W] router: slow POST /api/internal/ssh/1985/update/3 for <redacted>:0, elapsed 3354.7ms @ private/key.go:16(private.UpdatePublicKeyInRepo)
2023/08/23 02:25:06 ...eb/routing/logger.go:68:func1() [W] router: slow POST /api/internal/ssh/1985/update/3 for <redacted>:0, elapsed 3269.4ms @ private/key.go:16(private.UpdatePublicKeyInRepo)

This request only performs database operations, so it is the database operations that are too slow. Since SQLite allows shared reads but takes an exclusive lock for writes, it is expected to be very slow when there are many operations at the same time.
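
For illustration only (a minimal sketch, not Gitea's code): with the mattn/go-sqlite3 driver Gitea uses, the usual ways to soften that write-lock contention are a busy timeout in the DSN and a single-connection pool. The database file name and table here are placeholders.

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/mattn/go-sqlite3" // the driver Gitea uses for SQLite
)

func main() {
	// "_busy_timeout" makes a writer wait up to 5s for SQLite's file lock
	// instead of failing immediately with SQLITE_BUSY.
	db, err := sql.Open("sqlite3", "file:example.db?_busy_timeout=5000")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// SQLite only allows one writer at a time, so a single-connection pool
	// serialises writes in the application instead of piling goroutines up
	// on the database lock.
	db.SetMaxOpenConns(1)

	if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS demo (id INTEGER PRIMARY KEY, note TEXT)`); err != nil {
		log.Fatal(err)
	}
	if _, err := db.Exec(`INSERT INTO demo (note) VALUES (?)`, "one writer at a time"); err != nil {
		log.Fatal(err)
	}
}
```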

@elburb
Author

elburb commented Aug 23, 2023

Thanks for the super quick response.

Moving the database is certainly something I can look at doing, though I'm curious why it should only be a problem from 1.20.2 onward.

Is there any additional debug output or stats I can look at to confirm this is the problem before attempting to move everything?

@lunny
Member

lunny commented Aug 23, 2023

According to the code this should be a random event; even 1.20.1 or earlier versions should have the same problem. You could test it by pushing 10 times at the same time.
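
A rough sketch of such a test in Go, assuming the current directory is a clone with something to push (the remote name and ref are placeholders, not taken from this thread):

```go
package main

import (
	"log"
	"os/exec"
	"sync"
)

func main() {
	const concurrentPushes = 10
	var wg sync.WaitGroup
	for i := 0; i < concurrentPushes; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			// Each goroutine runs a plain "git push"; if the server side
			// deadlocks, some of these commands will never return.
			out, err := exec.Command("git", "push", "origin", "HEAD").CombinedOutput()
			log.Printf("push %d: err=%v, output: %s", n, err, out)
		}(i)
	}
	wg.Wait()
}
```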

@wxiaoguang
Contributor

wxiaoguang commented Aug 23, 2023

To debug the hanging problem:

  • If the admin panel still works, go to admin panel -> monitor -> stacktrace and download a diagnosis report.
  • If the admin panel stops working, use the "pprof" tool to get some profiling reports.

About "pprof":

  1. Set app.ini:

    [server]
    ENABLE_PPROF = true
    
  2. Restart Gitea

  3. Try to trigger the bug; when the requests get stuck for a while, use curl or a browser to visit http://127.0.0.1:6060/debug/pprof/goroutine?debug=1 (see the sketch after this list for what that endpoint serves).
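
For context, that endpoint is Go's standard pprof handler; a stand-alone sketch of roughly what ENABLE_PPROF turns on (Gitea's internal wiring may differ):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	// Visiting /debug/pprof/goroutine?debug=1 on this listener dumps every
	// goroutine's stack, which is the kind of report attached later in this issue.
	log.Fatal(http.ListenAndServe("127.0.0.1:6060", nil))
}
```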


But I would also suggest using a dedicated database server; SQLite is not suitable for such a large instance and does not handle concurrent requests well.

@elburb
Author

elburb commented Aug 23, 2023

Many thanks for the responses and suggestions, much appreciated.
I'll close this issue for now whilst I take a look at the additional debugging and test migrating away from SQLite.

@elburb elburb closed this as completed Aug 23, 2023
@smartEBL

We experience exactly the same behaviour with 1.20.3 (also with a sqlite3 database). We have ~100 users and everything was fine before with 1.19.3.
We're also trying to get more logs.

@wxiaoguang
Contributor

Feel free to provide more details, including the diagnosis report and stacktrace.

@elburb
Author

elburb commented Aug 25, 2023

For my instance, I was able to get both stacktrace and debug outputs.
The debug output clearly showed a bunch of goroutines stuck in SQLite; here's just one example (of ~300+):

1 @ 0x43df36 0x408f1d 0x408a18 0x89950d 0x890a0b 0x884818 0x8710fd 0x878e95 0x881c0c 0x8787e6 0x878508 0x8782f5 0x877867 0x87820c 0xeb43e5 0xf3c3fb 0xf41271 0x1773ce7 0x2664238 0x2664fbc 0x4a818b 0x4a743c 0x2312c46 0x9b4b8f 0x9b4b8f 0x2312cdc 0x9b4b8f 0x2312cdc 0x9b4b8f 0x2312cdc 0x9b4b8f 0x2312cdc
# labels: {"graceful-lifecycle":"with-hammer", "pid":"64e63796-2", "ppid":"", "process-description":"GET: /dn42/registry", "process-type":"request"}
#	0x89950c	github.com/mattn/go-sqlite3.(*SQLiteStmt).exec+0x22c		/go/pkg/mod/github.com/mattn/go-sqlite3@v1.14.16/sqlite3.go:2067
#	0x890a0a	github.com/mattn/go-sqlite3.(*SQLiteConn).exec+0x26a		/go/pkg/mod/github.com/mattn/go-sqlite3@v1.14.16/sqlite3.go:877
#	0x884817	github.com/mattn/go-sqlite3.(*SQLiteConn).ExecContext+0x157	/go/pkg/mod/github.com/mattn/go-sqlite3@v1.14.16/sqlite3_go18.go:41
#	0x8710fc	database/sql.ctxDriverExec+0xdc					/usr/local/go/src/database/sql/ctxutil.go:31
#	0x878e94	database/sql.(*DB).execDC.func2+0x174				/usr/local/go/src/database/sql/sql.go:1675
#	0x881c0b	database/sql.withLock+0x8b					/usr/local/go/src/database/sql/sql.go:3405
#	0x8787e5	database/sql.(*DB).execDC+0x265					/usr/local/go/src/database/sql/sql.go:1670
#	0x878507	database/sql.(*DB).exec+0xe7					/usr/local/go/src/database/sql/sql.go:1655
#	0x8782f4	database/sql.(*DB).ExecContext.func1+0x54			/usr/local/go/src/database/sql/sql.go:1634
#	0x877866	database/sql.(*DB).retry+0x46					/usr/local/go/src/database/sql/sql.go:1538
#	0x87820b	database/sql.(*DB).ExecContext+0xcb				/usr/local/go/src/database/sql/sql.go:1633
#	0xeb43e4	xorm.io/xorm/core.(*DB).ExecContext+0x1e4			/go/pkg/mod/xorm.io/xorm@v1.3.3-0.20230725140238-59b727260d35/core/db.go:273
#	0xf3c3fa	xorm.io/xorm.(*Session).exec+0x25a				/go/pkg/mod/xorm.io/xorm@v1.3.3-0.20230725140238-59b727260d35/session_raw.go:178
#	0xf41270	xorm.io/xorm.(*Session).Update+0x1190				/go/pkg/mod/xorm.io/xorm@v1.3.3-0.20230725140238-59b727260d35/session_update.go:196
#	0x1773ce6	code.gitea.io/gitea/models/activities.SetRepoReadBy+0x226	/source/models/activities/notification.go:777
#	0x2664237	code.gitea.io/gitea/routers/web/repo.checkHomeCodeViewable+0x77	/source/routers/web/repo/view.go:645
#	0x2664fbb	code.gitea.io/gitea/routers/web/repo.Home+0x5b			/source/routers/web/repo/view.go:727
#	0x4a818a	reflect.Value.call+0xb0a					/usr/local/go/src/reflect/value.go:586
#	0x4a743b	reflect.Value.Call+0xbb						/usr/local/go/src/reflect/value.go:370
#	0x2312c45	code.gitea.io/gitea/modules/web.toHandlerProvider.func2.1+0x1c5	/source/modules/web/handler.go:176
#	0x9b4b8e	net/http.HandlerFunc.ServeHTTP+0x2e				/usr/local/go/src/net/http/server.go:2122
#	0x9b4b8e	net/http.HandlerFunc.ServeHTTP+0x2e				/usr/local/go/src/net/http/server.go:2122
#	0x2312cdb	code.gitea.io/gitea/modules/web.toHandlerProvider.func2.1+0x25b	/source/modules/web/handler.go:186
#	0x9b4b8e	net/http.HandlerFunc.ServeHTTP+0x2e				/usr/local/go/src/net/http/server.go:2122
#	0x2312cdb	code.gitea.io/gitea/modules/web.toHandlerProvider.func2.1+0x25b	/source/modules/web/handler.go:186
#	0x9b4b8e	net/http.HandlerFunc.ServeHTTP+0x2e				/usr/local/go/src/net/http/server.go:2122
#	0x2312cdb	code.gitea.io/gitea/modules/web.toHandlerProvider.func2.1+0x25b	/source/modules/web/handler.go:186
#	0x9b4b8e	net/http.HandlerFunc.ServeHTTP+0x2e				/usr/local/go/src/net/http/server.go:2122
#	0x2312cdb	code.gitea.io/gitea/modules/web.toHandlerProvider.func2.1+0x25b	/source/modules/web/handler.go:186

Whilst I completely understand the point that SQLite isn't really suited to our usage, we had zero issues for the past 2.5 years and only started getting issues with 1.20.2; something clearly changed in that version to trigger the problem.

My intention is to migrate away from SQLite this weekend, and I expect that will resolve it for us.

@elburb
Author

elburb commented Aug 25, 2023

and just for reference, another slightly different example:

1 @ 0x43df36 0x408f1d 0x408a18 0x89d1fe 0x89e5c5 0x406651 0x406309 0x4724d4 0x4704e1 0x406185 0x88582c 0x899a85 0x899805 0x899752 0x472701
# labels: {"graceful-lifecycle":"with-hammer", "pid":"64e635a3-2", "ppid":"", "process-description":"GET: /dn42/registry.git/info/refs?service=git-upload-pack", "process-type":"request"}
#	0x89d1fd	github.com/mattn/go-sqlite3.unlock_notify_wait+0x15d			/go/pkg/mod/github.com/mattn/go-sqlite3@v1.14.16/sqlite3_opt_unlock_notify.go:90
#	0x89e5c4	_cgoexp_2a83583a9b45_unlock_notify_wait+0x24				_cgo_gotypes.go:1419
#	0x406650	runtime.cgocallbackg1+0x2b0						/usr/local/go/src/runtime/cgocall.go:315
#	0x406308	runtime.cgocallbackg+0x108						/usr/local/go/src/runtime/cgocall.go:234
#	0x4724d3	runtime.cgocallback+0xb3						/usr/local/go/src/runtime/asm_amd64.s:998
#	0x4704e0	runtime.systemstack_switch+0x0						/usr/local/go/src/runtime/asm_amd64.s:463
#	0x406184	runtime.cgocall+0x84							/usr/local/go/src/runtime/cgocall.go:167
#	0x88582b	github.com/mattn/go-sqlite3._Cfunc__sqlite3_step_row_internal+0x4b	_cgo_gotypes.go:387
#	0x899a84	github.com/mattn/go-sqlite3.(*SQLiteStmt).execSync.func3+0x64		/go/pkg/mod/github.com/mattn/go-sqlite3@v1.14.16/sqlite3.go:2084
#	0x899804	github.com/mattn/go-sqlite3.(*SQLiteStmt).execSync+0x64			/go/pkg/mod/github.com/mattn/go-sqlite3@v1.14.16/sqlite3.go:2084
#	0x899751	github.com/mattn/go-sqlite3.(*SQLiteStmt).exec.func1+0x31		/go/pkg/mod/github.com/mattn/go-sqlite3@v1.14.16/sqlite3.go:2055

@smartEBL

When I hear 1.20.2, SQLite, and bug, I have to think of #26280.
That bug was fixed with a backport in 1.20.3 (#26339).

Just a quick thought.

@elburb
Author

elburb commented Aug 25, 2023

When I hear 1.20.2, SQLite, and bug, I have to think of #26280. That bug was fixed with a backport in 1.20.3 (#26339).

Sadly, 1.20.3 has the same problem.

@smartEBL

Sadly, 1.20.3 has the same problem.

Hmm, okay. Nevermind then.

@wxiaoguang
Contributor

What's the full stacktrace? Partial information doesn't show the whole picture.

@lunny
Member

lunny commented Aug 25, 2023

I will take a look at this problem.

@elburb
Author

elburb commented Aug 25, 2023

The output from pprof, along with the goroutine-before.txt and goroutine-after.txt files, is here:

https://gist.github.com/sesa-me/7ed3050ae68576f73017c9362cdf7c90

@wxiaoguang wxiaoguang reopened this Aug 25, 2023
@wxiaoguang wxiaoguang added this to the 1.21.0 milestone Aug 25, 2023
@elburb
Author

elburb commented Aug 27, 2023

Just an update for my instance: I migrated the database to MariaDB yesterday and have been happily running 1.20.3 for the past 24 hours. With SQLite we typically saw it hang within a few hours, so migrating looks like it has resolved/worked around the issue for us.

@wxiaoguang
Contributor

I can see there seems to be a regression introduced by #26158.


@lunny lunny modified the milestones: 1.21.0, 1.20.4 Aug 28, 2023
lunny added a commit that referenced this issue Aug 28, 2023
wxiaoguang pushed a commit that referenced this issue Aug 28, 2023
@smartEBL

Great! Thanks a lot for the quick support! 🚀

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 13, 2023