About DB connection random lost #6804

Closed
cevin opened this issue Apr 30, 2019 · 19 comments

@cevin

cevin commented Apr 30, 2019

  • Gitea version (or commit ref): 1.8.0
  • Git version: 2.9.5
  • Operating system: CentOS 7
  • Database (use [x]):
    • PostgreSQL
    • MySQL
    • MSSQL
    • SQLite
  • Can you reproduce the bug at https://try.gitea.io:
    • Yes (provide example URL)
    • No
    • Not relevant
  • Log gist:
2019/04/30 12:59:41 [.../gogits/cron/cron.go:92 Run()] [E] MirrorUpdate: invalid connection
2019/04/30 13:01:36 [...user/notification.go:33 GetNotificationCount()] [E] GetNotificationCount: invalid connection
2019/04/30 13:02:12 [...s/context/context.go:238 func1()] [E] GetUserById: invalid connection
2019/04/30 13:05:34 [...s/context/context.go:238 func1()] [E] GetUserById: invalid connection

Description

Similar to #5736.
Aliyun Cloud ( www.aliyun.com ) & Aliyun RDS MySQL ( www.aliyun.com/product/rds/mysql )
As far as I know, the Aliyun RDS MySQL server recycles idle connections very quickly, so that is probably why the connection gets lost.

@cevin cevin changed the title About DB connection About DB connection random lost Apr 30, 2019
@bobemoe
Contributor

bobemoe commented May 13, 2019

I'm getting the same kinds of errors:

==> /home/gitea/log/gitea.log <==
2019/05/13 19:41:15 [...s/context/context.go:238 func1()] [E] GetUserById: invalid connection
2019/05/13 19:41:15 [...ules/context/repo.go:299 func1()] [E] GetUserByName: invalid connection

It seems pretty random, manifesting as 500/404 errors, sometimes on git push/pull and at other times while navigating the web frontend. A retry/refresh of the same URL always seems to succeed straight away.

I'm using Gitea 1.8.1
mysql Ver 14.14 Distrib 5.6.43, for Linux (x86_64) using EditLine wrapper
PHP 7.3.5-1+020190503093827.38+stretch1.gbp60a41b (cli) (built: May 3 2019 09:38:28) ( NTS

I'm not seeing anything in the MySQL log, but maybe I haven't enabled verbose enough logging.

Seeing similar reports in #5736 and #5612

@lunny lunny added the status/blocked (This PR cannot be merged yet, i.e. because it depends on another unmerged PR) label May 14, 2019
@lunny
Member

lunny commented May 14, 2019

I think this should be a go-sql-driver/mysql problem.

@bobemoe
Contributor

bobemoe commented May 14, 2019

How does Go load the driver? Is it all packaged into the binary? I didn't have to install any dependencies, so I'm guessing there isn't anything I can update at the OS level, apart from MySQL itself perhaps?

@bobemoe
Contributor

bobemoe commented May 14, 2019

I've upgraded MySQL to 5.7.26 and am still experiencing this.

The error in mysql.log is
2019-05-14T16:13:22.312637Z 3045 [Note] Aborted connection 3045 to db: 'gitea' user: 'gitea' host: 'localhost' (Got timeout reading communication packets)

@lafriks
Member

lafriks commented May 14, 2019

Could it be that xorm does not close connections in some cases (go-sql-driver/mysql#556)?

@bobemoe
Contributor

bobemoe commented May 14, 2019

Yeah I reckon it could be.

I've been tailing the logs for a bit and it seems that the MySQL timeout comes FIRST. Then, on the next page refresh, I get the 404/500 error accompanied by the Go error "GetUserById: invalid connection", and this is 100% repeatable.

I'm now also watching the MySQL process list. This is what appears to be happening:

  1. When I hit ANY frontend page, a gitea connection appears in mysql processlist.
  2. This connection returns its data and then enters "sleep" mode.
  3. As time passes the sleep count increases.
  4. If I hit another page within 60 secs, the connection is reused, and the sleep count resets to 0.
  5. If the sleep count reaches 60, the connection times out and the MySQL timeout error is logged in mysql.log.
  6. On the next page hit, the Go error "GetUserById: invalid connection" appears in gitea.log and the frontend 404s.
  7. A new connection now appears; go to step 3.

So, I'm guessing Gitea is (efficiently) only opening one connection, and keeping it idle until it needs to use it again?

However, when that time comes and the connection is no longer available, this is not caught properly so that the connection can be re-opened and the last query retried.

I can increase my MySQL timeout, but this will only make the errors less frequent. Unless Gitea itself closes the connection after a set time, in which case, if I make the MySQL timeout greater than that, it should solve it?

Should Gitea be closing its connection after use or is it by design keeping it open?

@lafriks
Member

lafriks commented May 14, 2019

I think the Go MySQL library keeps a pool of open connections to reuse, but it should open a new one when the current one has become invalid, IMHO.
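
(To illustrate what that pooling looks like, here is a minimal sketch, not Gitea's actual code; the DSN is a placeholder, and db.Stats().Idle needs Go 1.11 or later.)

package main

import (
    "database/sql"
    "fmt"
    "time"

    _ "github.com/go-sql-driver/mysql" // the driver registers itself via init()
)

func main() {
    // sql.Open returns a pool handle, not a single connection.
    db, err := sql.Open("mysql", "gitea:password@tcp(127.0.0.1:3306)/gitea")
    if err != nil {
        panic(err)
    }
    defer db.Close()

    var now string
    if err := db.QueryRow("SELECT NOW()").Scan(&now); err != nil {
        panic(err)
    }

    // The connection used above is now idle in the pool. If MySQL's
    // wait_timeout kills it server-side, the next query may fail with
    // "invalid connection" unless the pool detects the dead link first.
    time.Sleep(time.Second)
    fmt.Printf("open=%d idle=%d\n", db.Stats().OpenConnections, db.Stats().Idle)
}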

@bobemoe
Contributor

bobemoe commented May 14, 2019

I've tried setting the MySQL timeout on the server to 120, 180 and 300; each time the connection times out, I hit a page and still get the error. I've now set the timeout to 3000 and have been watching the logs and processes. It seems Gitea has an internal cron job that makes a query every 10 minutes, which is essentially acting as a keep-alive heartbeat that keeps resetting the connection's sleep time.

2019/05/14 20:25:13 [I] [SQL] SELECT `id`, `repo_id`, `interval`, `enable_prune`, `updated_unix`, `next_update_unix` FROM `mirror` WHERE (next_update_unix<=?) AND (next_update_unix!=0) []interface {}{1557861913}

This looks like a maintenance task and not a real keep-alive implementation, so it's a bit of a crazy workaround, but I'm not getting the 404/500 errors any more :)
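
(Gitea's cron is a mirror-sync task rather than a deliberate keep-alive, but its side effect is roughly what this hypothetical ticker would do, assuming a *sql.DB pool; the interval just has to stay below MySQL's wait_timeout.)

package keepalive

import (
    "database/sql"
    "log"
    "time"
)

// KeepAlive pings the pool periodically so the idle connection's wait_timeout
// counter on the MySQL side keeps getting reset.
// Run it in the background, e.g.: go keepalive.KeepAlive(db, time.Minute)
func KeepAlive(db *sql.DB, every time.Duration) {
    for range time.Tick(every) {
        if err := db.Ping(); err != nil {
            log.Println("keep-alive ping failed:", err)
        }
    }
}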

So how do I proceed? How can I see what version of go-sql-driver/mysql is in the binary? The issue go-sql-driver/mysql#556 was closed several months ago with no real solution other than either "don't forget to db.Close()" or "disable strict mode".

@lunny
Member

lunny commented May 15, 2019

The driver has been packaged into the Gitea binary.
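
(A rough sketch of how that works, assuming a binary built as a Go module with Go 1.12 or later: the driver is a pure-Go package compiled in via a blank import, which registers it with database/sql, and runtime/debug.ReadBuildInfo can report the pinned version from inside the binary. The program below is illustrative, not Gitea code.)

package main

import (
    "fmt"
    "runtime/debug"

    // Imported for its side effect: the package's init() calls
    // sql.Register("mysql", ...), and its code is linked into the binary.
    _ "github.com/go-sql-driver/mysql"
)

func main() {
    info, ok := debug.ReadBuildInfo()
    if !ok {
        fmt.Println("binary was not built with Go module support")
        return
    }
    for _, dep := range info.Deps {
        if dep.Path == "github.com/go-sql-driver/mysql" {
            fmt.Println(dep.Path, dep.Version)
        }
    }
}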

@cevin
Author

cevin commented May 15, 2019

The problem is very annoying.

@bobemoe
Contributor

bobemoe commented May 15, 2019

I've set my timeouts in /etc/mysql/my.cnf to just over 10 mins and can confirm that the issue has cleared up.

wait_timeout = 650         
interactive_timeout = 650

This is a shared server, so it's not an ideal solution.

@lunny how can I check which version is packaged into the binary? Is it worth me trying Gitea@dev-master?

@cevin are you able to try this on your cloud instance, or check the MySQL logs for that timeout error? It would help confirm whether we are describing the same issue. I guess you don't really have that level of access on a cloud?

@bobemoe
Contributor

bobemoe commented May 15, 2019

So it looks like Gitea 1.8.1 is using Go MySQL driver v1.4.
v1.4.1 is out now (2018-11-14) with a potential fix:
https://github.com/go-sql-driver/mysql/blob/v1.4.1/CHANGELOG.md
go-sql-driver/mysql#836 (committed Oct 1, 2018)

The Gitea 1.8.1 branch looks like it's using driver v1.4.
The Gitea 1.9.0+dev-254-g710245e81 branch looks like it's using driver c45f530f8e7fe40f4687eaa50d0c8c5f1b66f9e0, which was committed Dec 18, 2018, so it should include that fix??

My testing on 1.9.0+dev-254-g710245e81 still has the issue :(

So either my calculations are wrong and the fix has not been included (can anyone confirm this?) or it has not worked.

Hmmm :/

@sapk
Member

sapk commented May 15, 2019

We override the previous tag 1.4.0 with a commit dated 2018-12-18 that should contain this fix 🤔 https://github.com/go-gitea/gitea/blame/710245e81e0d65c72231dbb3b5c9f860cdc71899/go.mod#L140

@cevin
Author

cevin commented May 16, 2019

@bobemoe Yep, but I can set wait_timeout & interactive_timeout in the RDS (Aliyun MySQL cloud database) online dashboard. It looks like that's not a good idea, though?

@mrsdizzie
Member

Perhaps relevant here:

gogs/gogs@0a176df

Which claims to fix a similar problem and has more discussion/links here:

gogs/gogs#5532
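
(If I read gogs/gogs@0a176df right, the core of it is capping how long a pooled connection may live, so the Go side retires connections before MySQL's wait_timeout can kill them server-side. A minimal sketch of that idea on a plain *sql.DB; the 30-second value is illustrative, not necessarily what the commit uses.)

package dbtune

import (
    "database/sql"
    "time"
)

// CapConnLifetime retires pooled connections after a fixed lifetime so they
// are recycled on the Go side before the MySQL server's wait_timeout closes
// them and leaves a dead connection sitting in the pool.
func CapConnLifetime(db *sql.DB) {
    db.SetConnMaxLifetime(30 * time.Second) // must stay below MySQL's wait_timeout
}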

@lunny
Member

lunny commented May 18, 2019

gogs/gogs@0a176df should be good practice; we could take it.

@lafriks
Member

lafriks commented May 24, 2019

That should probably be set only if the database is MySQL.
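
(A sketch of that suggestion; dbType is a hypothetical stand-in for however Gitea's settings expose the configured database type, not an actual Gitea variable.)

package dbtune

import (
    "database/sql"
    "time"
)

// capIfMySQL applies the connection lifetime cap only when MySQL is the
// configured backend, since the wait_timeout problem is MySQL-specific.
func capIfMySQL(db *sql.DB, dbType string) {
    if dbType == "mysql" {
        db.SetConnMaxLifetime(30 * time.Second)
    }
}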

@lafriks lafriks added type/bug and removed status/blocked (This PR cannot be merged yet, i.e. because it depends on another unmerged PR) labels May 26, 2019
@lafriks lafriks added this to the 1.8.2 milestone May 26, 2019
@lunny
Member

lunny commented May 29, 2019

Closed by #7071.

@lunny lunny closed this as completed May 29, 2019
@bobemoe
Contributor

bobemoe commented Jun 9, 2019

Just to confirm I've tried 1.8.2 and set my mysql timeouts back to 60s. All seems to be functioning well.

Looking at the MySQL processlist, I can see that the behaviour has changed from my previous investigation. Gitea no longer keeps a connection open. This solves the issue for me. Thanks everyone :)

Although I see no connection kept open, the MySQL connection IDs now jump up by about 10 each time I hit a page, which suggests Gitea is not reusing its MySQL connections at all now and is opening a new one per query? I just want to make sure this won't be a performance hit on busier sites.
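
(On the performance question: database/sql has knobs that allow some reuse between requests while still bounding connection age. A general sketch, not Gitea's actual settings, with placeholder numbers.)

package dbtune

import "database/sql"

// TunePool keeps a few idle connections for reuse between requests while an
// upper bound limits how many connections can be open at once. Assuming the
// 1.8.2 fix is the lifetime cap discussed above, that cap still retires idle
// connections before MySQL's wait_timeout, so reuse does not reintroduce the
// stale-connection problem.
func TunePool(db *sql.DB) {
    db.SetMaxOpenConns(30) // cap concurrent connections to MySQL
    db.SetMaxIdleConns(5)  // keep a few idle connections so requests can reuse them
}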
