Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug Report: Adding a new (unrelated) keyspace can break working queries because of the routing of dual #12674

Closed
aquarapid opened this issue Mar 20, 2023 · 2 comments · Fixed by #13359

Comments

@aquarapid
Copy link
Contributor

aquarapid commented Mar 20, 2023

Overview of the Issue

Because of the code:

https://github.com/vitessio/vitess/blob/release-16.0/go/vt/vtgate/vindexes/vschema.go#L695-L702

Any naked query for dual will be routed to the first keyspace alphabetically, including queries rewritten to use dual, like select now();. This query is sometimes used as a "are you alive" query by MySQL connectors.

This behavior is a footgun, with two scenarios:

  • You think your existing keyspace aaa is unused, it has no tables being used anymore. You go ahead and remove/shutdown the primary vttablet, and suddenly all queries of select now(); or other queries to dual start failing, and everything breaks.
  • You add a new keyspace called aaa. It is not supposed to be in production for another month, but as soon as you add it (CreateKeyspace aaa), things start failing because it's the first keyspace alphabetically and you do not have serving tablets in the keyspace yet, e.g.:
 ERROR 1105 (HY000): keyspace aaa fetch error: node doesn't exist: /vitess/zone1/keyspaces/aaa/SrvKeyspace

Workaround:

  • Create an explicit routing rule for dual to a working keyspace (preferably before add or remove a keyspace that is first alphabetically), something like:
    {
      "fromTable": "dual",
      "toTables": [
        "workingkeyspace.dual"
      ]
    },
    {
      "fromTable": "dual@replica",
      "toTables": [
        "workingkeyspace.dual"
      ]
    }

Discussion:

  • We can't route this to just a random keyspace, because then the behavior would be inconsistent (and differ from current behavior).
  • A solution could be to allow you to specify a "default" keyspace to route queries like this to; the only question would be how to express this in a global (cross-keyspace) fashion in the topo. An alternate way would be to make it a vtgate config option, but that seems less clean?

Reproduction Steps

See overview

Binary Version

16.0;  but this behavior has been in Vitess for years.

Operating System and Environment details

N/A

Log Fragments

N/A
@aquarapid aquarapid added Type: Bug Needs Triage This issue needs to be correctly labelled and triaged labels Mar 20, 2023
@dbussink
Copy link
Contributor

Any naked query for dual will be routed to the first keyspace alphabetically, including queries rewritten to use dual, like select now();. This query is sometimes used as a "are you alive" query by MySQL connectors.

This overlooks a simple but key element. This only happens when the evalengine doesn't support the query and can't execute it directly. So improving the evalengine can help significantly here to reduce the number of queries we need this for. Potentially to the point of it not being a problem.

#9647 tracks support in the evalengine that would help here significantly. We could focus here on some common query patterns that are often used against dual to resolve this for most cases. I'll look at adding at least support for NOW() (and associated versions of it) then.

@rohit-nayak-ps rohit-nayak-ps added Component: Query Serving and removed Needs Triage This issue needs to be correctly labelled and triaged labels Mar 28, 2023
@harshit-gangal
Copy link
Member

harshit-gangal commented May 9, 2023

I do not see a straightforward fix for this.
I propose we introduce a field in Vschema at the keyspace level to tell if it is available for routing the query or not. When Vitess has to pick any random keyspace for routing it can ignore the unavailable keyspaces.

@aquarapid @deepthi do you think this is an acceptable change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants