Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kill queries on timeout #791

Closed

Conversation

Sovianum
Copy link

Description

Added mechanism to kill orphan queries after context timeout.
Used approach is similar to one used in postgres driver: if context is cancelled, new connection is opened and than KILL query is executed.
Query is killed using id of corresponding connection which is saved on initial handshake.

Related issues: #731, #496

Checklist

  • Code compiles correctly
  • Created tests which fail without the change (if possible)
  • All tests passing
  • Extended the README / documentation, if necessary
  • Added myself / the copyright holder to the AUTHORS file

@Sovianum Sovianum force-pushed the kill-queries-on-timeout branch from 2c780b2 to 959332b Compare May 11, 2018 17:03
@Sovianum Sovianum force-pushed the kill-queries-on-timeout branch from 959332b to 9ed2daa Compare May 11, 2018 17:10
@methane
Copy link
Member

methane commented May 12, 2018

I think this should be opt-in option, because it creates unexpected connections and it won't work in some environment.

And I want to wait adding this feature until next version (v1.4) is tagged.

connection.go Outdated
}

func (mc *mysqlConn) kill() error {
conn, err := mc.d.Open(mc.dsn)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't creating new connection pool! Create mysqlConn directly.
Addtionally, opening new connection should be have short timeout.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree about timeout but didn't catch your idea about connection pool. Here I'm using MySQLDriver.Open method, which returns driver.Conn and error (underlying type of driver.Conn is *mysqlConn)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, I was wrong about the pool.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#705 should be used instead of Driver.Open.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will use after it merged to master

@methane
Copy link
Member

methane commented May 12, 2018

Killing query is difficult than you think. Apart from load balanced MySQL server, there are many race scenarios.

For example:

  1. The query is slow... Timeout happened
  2. Start opening connection for sending KILL query
  3. The query is finished and MySQL server sent normal response
  4. New connection for KILL query is opened / or timed out.

In this scenario, what should we do?
Query is finished successful, it's just bit slow. We can get normal result of the query.

@Sovianum
Copy link
Author

Do you mean that timeout shout be used on the query execution stage, but not while response is being sent back to client?

connection_go18.go Outdated Show resolved Hide resolved
@methane
Copy link
Member

methane commented May 15, 2018

Do you mean that timeout shout be used on the query execution stage, but not while response is being sent back to client?

I just meant there are many race conditions we should care about before merging this.

packets.go Outdated Show resolved Hide resolved
@methane
Copy link
Member

methane commented May 15, 2018

Even though MySQL is not clustered, connection can be not unique.
For example, server closed the connection and client failed to receive FIN packet.
When timeout happened, the connection id (=thread id) may be used for another connection.

Additionally, when MySQL is under very heavy state, many concurrent queries can be timed out at same time.
In this situation, DB.SetMaxConns(N) may use 2*N connections. (1/2 of them are for kill queries)

So I think (1) mc.kill() should be have very short timeout, and (2) this should be opt-in feature.

@Sovianum
Copy link
Author

Sovianum commented May 15, 2018

(1) 50 ms?
(2) Ok, I will add extra option to config.

@Sovianum Sovianum force-pushed the kill-queries-on-timeout branch 2 times, most recently from aea431d to 1251784 Compare May 15, 2018 17:01
@Sovianum Sovianum force-pushed the kill-queries-on-timeout branch from 1251784 to eabce5d Compare May 15, 2018 17:05
@Sovianum
Copy link
Author

Made this feature optional and added tests to check that query is not killed if flag is not activated.
You said, you don't want to add this feature until version 1.4, but do you have any forecast when it will be tagged? I'm asking this since we extensively use this driver in production and periodically find our database loaded with long orphaned queries. So this feature is really important for us.

@julienschmidt julienschmidt added this to the v1.5.0 milestone May 20, 2018
@ceshihao

This comment has been minimized.

@methane
Copy link
Member

methane commented Jun 13, 2018

@Sovianum @ceshihao
This feature is very difficult to review (many possible race conditions which can be very difficult to debug. I need to check whole code, not only diff). So I can't estimate when we can ship this.

I recommend using timeout by optimizer-hint. It's much safer than this.

@pjebs
Copy link

pjebs commented Mar 14, 2019

@methane
Copy link
Member

methane commented Mar 14, 2019

@pjebs Thanks for writing it. My comments:

  • It can't be used for load balanced environment. It may even kill different query running on another DB instance, because thread id is not UUID.
  • When you need only timeout for heavy select query, MAX_EXECUTION_TIME optimizer hint is better than KILL QUERY on timeout.

@pjebs
Copy link

pjebs commented Mar 16, 2019

I've improved my design for reverse proxy scenarios for mysql using SELECT @@global.server_uuid to identify the correct server. When it's time to send the kill signal, I first check if the server is correct.

Unfortunately I wasn't able to find a reliable way to get a unique server identifier for mariadb.

@methane Thanks for the feedback.

@julienschmidt julienschmidt modified the milestones: v1.5.0, v1.6.0 Apr 24, 2019
@RafeKettler
Copy link

@Sovianum @ceshihao
This feature is very difficult to review (many possible race conditions which can be very difficult to debug. I need to check whole code, not only diff). So I can't estimate when we can ship this.

I recommend using timeout by optimizer-hint. It's much safer than this.

@methane is there a path to getting this PR merged from here? While I'm not sure if @Sovianum still needs this feature or has found a workaround, I could really use this feature on an opt-in basis. The MAX_EXECUTION_TIME optimizer hint you refer to doesn't work for my use case -- I'd like to be able to cancel queries other than a SELECT with a timeout. I am currently using KILL QUERY as a workaround, but it is pretty painful to have to add this logic everywhere I want to ensure a query is canceled after it times out.

@pjebs
Copy link

pjebs commented Jul 1, 2019

@methane
Copy link
Member

methane commented Jul 1, 2019

@methane is there a path to getting this PR merged from here?

At least, new Connector interface should be used. At most one connection should be used for cancelling many queries.

@pjebs
Copy link

pjebs commented Jul 2, 2019

It definitely should be optional because you need to waste time querying for the connection id.
Most production queries would be milliseconds in duration, so cancellation itself is "overkill" or the query time to fetch the connection id is "overkill"

@Sovianum Sovianum closed this Jul 22, 2019
@jpcope
Copy link

jpcope commented Sep 12, 2019

I would love to see this functionality added to mysql.

I would not mind taking a stab at it. Circuit breaking is an important part of reliable, timely delivery in a microservice world so data stores remain highly available.

Is now the right time to add this functionality or are there other high priority refactors in the works that could lead to PRs being closed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants