Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Foreign Key Support in Vitess #12967

Open
12 tasks done
GuptaManan100 opened this issue Apr 25, 2023 · 11 comments
Open
12 tasks done

RFC: Foreign Key Support in Vitess #12967

GuptaManan100 opened this issue Apr 25, 2023 · 11 comments
Assignees
Labels

Comments

@GuptaManan100
Copy link
Member

GuptaManan100 commented Apr 25, 2023

Introduction

This is an RFC for adding Foreign Key Support in Vitess.

Use Case

  • The use case is to support cross-shard foreign key constraints for a keyspace to establish relations between different tables.
  • Current restrictions require the schema to be structured such that rows for tables linked by foreign keys need to live on the same shard. This constrains schema design and sharding key options.
  • After the suggested changes, we would be able to support adding arbitrary foreign keys irrespective of the vschema / sharding key configuration.

Scope of the project

  • Foreign keys in a single keyspace.
  • Foreign keys will be created in MySQL. Online DDL with vanilla MySQL will not work if foreign keys are created.

Out of Scope

  • Cross-keyspace foreign key support.

Schema

The following is the schema we will use for providing examples as we discuss the design of the foreign key support.

MySQL Schema
mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show tables;
+---------------------------+
| Tables_in_foreign_key_rfc |
+---------------------------+
| area                      |
| contact                   |
| customer                  |
| orders                    |
| product                   |
+---------------------------+
5 rows in set (0.00 sec)
mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show create table area;
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                                        |
  +-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | area  | CREATE TABLE `area` (
                                  `id` int NOT NULL,
                                  `name` varchar(30) DEFAULT NULL,
                                  `zipcode` int DEFAULT NULL,
                                  PRIMARY KEY (`id`)
            ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show create table contact;
+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table   | Create Table                                                                                                                                                                                                                                                                                                                                  |
  +---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | contact | CREATE TABLE `contact` (
                                       `id` int NOT NULL,
                                       `contactnum` varchar(10) DEFAULT NULL,
                                       `customer_id` int DEFAULT NULL,
                                       PRIMARY KEY (`id`),
                                       KEY `customer_id` (`customer_id`),
                                       CONSTRAINT `contact_ibfk_1` FOREIGN KEY (`customer_id`) REFERENCES `customer` (`id`)
              ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show create table customer;
+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table    | Create Table                                                                                                                                                                                                                                                                                                          |
  +----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | customer | CREATE TABLE `customer` (
                                         `id` int NOT NULL,
                                         `name` varchar(30) DEFAULT NULL,
                                         `area_id` int DEFAULT NULL,
                                         PRIMARY KEY (`id`),
                                         KEY `area_id` (`area_id`),
                                         CONSTRAINT `customer_ibfk_1` FOREIGN KEY (`area_id`) REFERENCES `area` (`id`)
               ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show create table orders;
+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table  | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                              |
  +--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | orders | CREATE TABLE `orders` (
                                     `id` int DEFAULT NULL,
                                     `product_id` int DEFAULT NULL,
                                     `customer_id` int DEFAULT NULL,
                                     KEY `product_id` (`product_id`),
                                     KEY `customer_id` (`customer_id`),
                                     CONSTRAINT `orders_ibfk_1` FOREIGN KEY (`product_id`) REFERENCES `product` (`id`),
                                     CONSTRAINT `orders_ibfk_2` FOREIGN KEY (`customer_id`) REFERENCES `customer` (`id`)
             ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show create table product;
+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table   | Create Table                                                                                                                                                             |
  +---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | product | CREATE TABLE `product` (
                                       `id` int NOT NULL,
                                       `name` varchar(30) DEFAULT NULL,
                                       PRIMARY KEY (`id`)
              ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
erDiagram
    Area {
        id int PK
        name varchar
        zipcode int
    }

    Product {
        id int PK
        name varchar
    }

    Order {
        id int PK
        produce_id int FK
        customer_id int FK
    }

    Customer {
        id int PK
        name varchar
        area_id int FK
    }

    Contact {
        id int PK
        contact varchar
        customer_id int FK
    }

    Product ||--}o Order :""
    Customer ||--}o Order :""
    Customer ||--}o Contact :""
    Area ||--}o Customer :""
Loading

The data we insert for the examples that follow are -

delete from area;
delete from product;
delete from customer;
delete from orders;
delete from contact;
insert into product values ('1', 'Fan'), ('2', 'Cooler'), ('3', 'AC');
insert into area values(1, 'US', '521763'), (2, 'India', '432143');
insert into customer values(1, 'Manan', 2), (2, 'Andres', 1), (3, 'Harshit', 2);
insert into contact values (1, '897876876', 3), (2, '234123453', 1), (3, '789975234', 2), (4, '343214314', 1);
insert into orders values (1, 1, 2), (2, 1, 1), (3, 1, 1), (4, 2, 1), (4, 2, 3);

Design

This section dives into more specific details of what we intend to do to support foreign keys.

Basic Design

  • Foreign keys will be passed down to MySQL and created there. This will give us inherent support for information schema queries since foreign key definitions will exist on MySQL level.
  • We will set FOREIGN_KEY_CHECKS=0 on all vttablet connections so that DML queries don't fail because of these constraints on MySQL. The constraint verification will be done in vtgates.
  • Vitess will be aware of the foreign key constraints and enforce them on all DMLs.
    • We’ll change Schema Tracking to start using GetSchema RPC call instead of execute Select statements directly. Schema tracking will then be used for getting the create definition calls from GetSchema.
    • On vtgate we’ll create a graph of foreign key constraints (table-relationship graph). We’ll also need to store default values for all columns. (Required for SET DEFAULT).
    • Description of the data structure to follow.
    • We’ll also need to store the uniqueness constraints for the table. (Not required immediately, but will be needed for ON DUPLICATE KEY UPDATE support).
    • Note - There is a delay between the DDL operations executing and between schema tracking updating vtgate. Needs discussion.
  • foreign_key_mode is a flag that already exists in VTGate and it controls whether VTGates allow passing the foreign key constraints to MySQL or erroring out. We’ll deprecate that flag and put ForeignKeyMode as a VSchema configuration.

Planning INSERTs

  • While planning inserts, we’ll consult the table-relationship graph and get the foreign key constraints where the column is a child and plan the SELECT validation queries to make sure the corresponding parent row exists. We’ll get a shared lock on the SELECT.
  • If we’re in AUTOCOMMIT mode, we still have to start a transaction.
  • Overall steps look like -
    • Step 1: If there is no transaction, we’ll start one.
    • Step 2: Execute all the validation queries parallel and get a lock in share mode
    • Step 3: If any of those validation queries return empty results, we fail and rollback (only if the transaction was started implicitly).
    • Step 4: Execute the Insert and vindex updates.
    • Step 5: commit the transaction (if we started one in step 1).
  • ON DUPLICATE KEY UPDATE
    • Plan the update and insert. Runtime defines which one you execute.
    • Run a SELECT statement to know which side to run.

Example

For example, if the user was to execute insert into orders (id, product_id, customer_id) values (4, :a, :b);, the set of steps that Vitess would take -

  1. Start a transaction. START TRANSACTION
  2. Validation queries - SELECT 1 FROM product WHERE ID = :a FOR SHARE, SELECT 1 FROM customer WHERE ID = :b FOR SHARE
  3. If both the queries return a row, then we execute the insert.
  4. COMMIT

Planning UPDATEs and DELETEs

  • Planning of UPDATE and DELETE depends on the referential actions that they are configured with. Let's dive into each one that MySQL allows

RESTRICT/NO ACTION (default)

  • In this configuration, MySQL rejects any update or delete to a row which has a column that is a parent in a foreign key relation.
  • Planning for this, is very similar to INSERTS, in the part that before we run the update or delete we need to verify if there exists a foreign key constraint that references the row.
  • There are subtle differences between the two though, the first being that here we care for foreign key constraints where the table being updated is the parent in the relation, the second is that here we fail if we find a single row matching our SELECT validation query, the third and the most interesting difference is that while planning INSERTs we have the full list of column values being inserted at plan time, but for updates and deletes, we'll only know the column values once we run the query!
  • Overall, the steps look like the following -
    • Step 1: If there is no transaction, we’ll start one.
    • Step 2: Convert the UPDATE/DELETE to a SELECT that returns the rows that are being updated/deleted.
    • Step 3: Run the SELECT query and use these results to generate the validation queries.
    • Step 4: Execute all the validation queries parallel and get a lock in share mode
    • Step 5: If any of those validation queries return any results, we fail and roll back (only if the transaction was started implicitly).
    • Step 6: Execute the Update/Delete and vindex updates.
    • Step 7: commit the transaction (if we started one in step 1).
  • Possible abuse from the user.
    • If the user runs a query like DELETE FROM customer which tries to bulk delete a lot of rows, then vtgate will end up reading a huge list of rows and the SELECT validation query it executes might be extremely large too. This could lead to OOMs. We can add a LIMIT clause to the equivalent SELECT statement of the DELETE and reject such mass updates/deletes if we get more results than the LIMIT allows.

Example

For example, if the user was to execute DELETE FROM customer WHERE area_id = 2;, then Vitess would need to take the following steps -

  1. Start a transaction. START TRANSACTION
  2. Convert the DELETE query into a SELECT with the same WHERE clause. So Vitess would execute SELECT id FROM customer WHERE area_id = 2;. We would get back the following result -
+----+
| id |
+----+
|  1 |
|  3 |
+----+
2 rows in set (0.00 sec)
  1. These results will drive the next validation queries we run. Since the customer table has 2 foreign key constraints where it is the parent, we'll need a validation query for both of them - SELECT 1 FROM contact WHERE (customer_id) in ((1), (3)) Limit 1 FOR SHARE and SELECT 1 FROM orders WHERE (customer_id) in ((1), (3)) Limit 1 FOR SHARE.
  2. If any of the validation queries return any rows, we fail and rollback.
  3. Execute the delete and vindex updates (if required).
  4. COMMIT.

CASCADE

  • In this configuration, MySQL cascades any updates/deletes to the children in the foreign key constraints. So a deletion of the parent will trigger the deletion of the child rows that are referenced by it.
  • Planning for CASCADEs is a little complex since the rows that we need to cascade the delete too will only be available on run time. Those rows themselves could have foreign key constraints that could fail the operation, in which case, the entire transaction has to be rolled back. In the RESTRICT case, we didn't do any writes until we knew it was going to succeed, so we never had to rollback writes. In this case however, we might need to rollback writes if a cascaded delete fails down the line (because that could have a RESTRICT constraint on it).
  • Overall, the steps look like the following -
    • Step 1: If there is no transaction, we’ll start one.
    • Step 2: Convert the UPDATE/DELETE to a SELECT that returns the rows that are being updated/deleted.
    • Step 3: Run the SELECT query and use these results to find the rows that need to have DELETE/UPDATEs cascaded to.
    • Step 4: Execute the original Update/Delete and vindex updates.
    • Step 5: Repeat the UPDATE/DELETE for the children rows in the same transaction. Do this until no further cascades are required.
    • Step 5: If any of them fail, we'll need to ROLLBACK.
    • Step 6: commit the transaction (if we started one in step 1).
  • Possible abuse from the user.
    • Cascading deletes can end up deleting a lot of rows, especially since deleting a child row could trigger a cascade from a foreign key constraint for a different column other than the child column in the original foreign key constraint. We could add the LIMIT clause to the SELECTs, but in this case it won't be enough, since each row deletion would lead to another SELECT query. We would have to impose an overall limit on the vtgate to prevent OOMs.

SET NULL

  • The planning for SET NULL is very similar to CASCADE. After finding the children rows of the foreign key constraint, we would need to SET the children column to NULLs, so DELETE queries on the parents would trigger an UPDATE on the children rows.

SET DEFAULT

  • Not supported in InnoDb. We still need to decide if we need to support this in Vitess.
  • If we decide to support it, then it is very similar to SET NULL. Only difference being that instead of setting NULL, we'll set the default value after finding it from our schema tracking data.

Planning REPLACE

  • Currently REPLACE statements are only supported in unsharded mode.
  • To support REPLACE we'll plan the DELETE and INSERT and execute a SELECT query to decide if a DELETE is necessary.

Important Considerations

  • Checking for constraints will lead to cross-shard transactions. If the INSERT/UPDATE/DELETE only touch one row (including CASCADEs), then the cross-shard transaction will only be writing in one shard. All the queries executed in other shards will only be SELECT... FOR SHARE statements. So, in case of point updates, we don't have any risk any partial commits/inconsistent state. The write being successful will just be contingent on the COMMIT succeeding in the shard having the write. For DMLs that touch more than 1 row, this guarantee can't be provided and the cross-shard transaction will be best effort. It can leave the database in an inconsistent state in case of partial failure during commit phase.
  • Users should still try to design schema such that parent and child row lives in the same shard as it will be more optimized for enforcing FK constraints.
  • Cyclic foreign key constraints: The design we described above will work fine even when two tables have foreign key references to each other as long as it is not cyclic on the column level. Having a cycle of foreign key constraints where a single column is both a parent and a child in the cycle, then that would mean that no data can be inserted into that column ever! Since to insert data into that column, the data should already exist in that column! We don't plan to worry about this initially, we'll see if we need to add support for this eventually. (It can be done on MySQL. Inserting data has to be done by setting FOREIGN_KEY_CHECKS to 0).
flowchart TD
    subgraph This is acceptable
    Table1
    col1[[col1]]
    col2[[col2]]
    Table1 --- col1
    Table1 --- col2

    Table2
    col3[[col3]]
    col4[[col4]]
    Table2 --- col3
    Table2 --- col4

    col1 -.-> col3
    col4 -.-> col2
    end

    subgraph This won't work
    Table3
    col5[[col1]]
    col6[[col2]]
    Table3 --- col5
    Table3 --- col6

    Table4
    col8[[col3]]
    col7[[col4]]
    Table4 --- col8
    Table4 --- col7

    col5 -.-> col8
    col8 -.-> col5
    end
Loading

Data structure to store FK constraints in VSchema

Schema tracking will give us a list of foreign key constraints as part of the SHOW CREATE TABLE output. We want to store this
output in the VSchema struct in a form that gives us the best performance while planning.

We'll need to answer queries of the following sorts -

  1. For a given table, find all the foreign key constraints where it is a child. (Needed for planning INSERTs)
  2. For a given table, find all the foreign key constraints where it is a parent. (Needed for planning DELETE's and UPDATE`s)

The VSchema struct stores a map of KeyspaceSchema for each keyspace. Within a KeyspaceSchema we have a map of Table.
We'll store the foreign key constraints inside this Table struct.

We'll add 2 more fields to the Table struct -

type Table struct {
	...
	ParentFKs []*ForeignKeyConstraint
	ChildFKs []*ForeingKeyConstraint
}

Essentially, we'll store the list of foreign key constraints where the table is a parent and a list where it is a child.

The ForeignKeyConstraint struct would look something like this -

type ForeignKeyConstraint struct {
    ParentTable TableName
    ParentColumns []Columns
    ChildTable TableName
    ChildColumns []Columns
    OnDeleteAction int // This will be an enum
    OnUpdateAction int
}

Performance Improvements

  1. We can get MySQL to run the INSERT, UPDATE/DELETE (with Restrict) checks for us by using FOREIGN_KEY_CHECKS=1 on the connection for unsharded and single-sharded cases.
  2. For CASCADE and SET NULL we want vtgate to split them up into separate queries so that we have binlog entries for them and vreplication operations like Reshard and MoveTables work.

Phases

  1. Minimal Viable Product
    1. Support for basic INSERT, UPDATE and DELETE statements for unsharded.
    2. RESTRICT/NO ACTION, CASCADE, SET NULL mode for foreign key constraints will be supported.
    3. Support for ON DUPLICATE KEY UPDATE in INSERTs for unsharded.
    4. Support for REPLACE for unsharded.
  2. Phase two
    1. INSERT/UPDATE/DELETE ... (SELECT) (SELECT subquery in DMLs) for unsharded.
    2. Support for single-shard foreign key constraints.
  3. Phase three
    1. Cross-shard support.

Prerequisites

Tasks

@shlomi-noach
Copy link
Contributor

General tracking issue: #11975

@derekperkins
Copy link
Member

We heavily use FKs, so I am still wrapping my head around this proposal. Here are my thoughts in no particular order:

Latency

Obviously handling this at vtgate will increase latency vs being enforced in MySQL. For the majority of our use cases, I don't think we would be willing to pay that price, as we're just enforcing shard-local keys.

For cross-shard purposes, today we vreplicate PK tables into destination shards for critical FK relationships, but I do see where this would be a win for larger tables where it would be size/cost prohibitive to copy those around, and we would opt in for some of those

FOREIGN_KEY_CHECKS

We will set FOREIGN_KEY_CHECKS=0 on all vttablet connections

This would be a complete non-starter for us. I would much rather see a separate connection pool for this

Cascade

We don't use this at all today, so no preferences about Vitess support

foreign_key_mode

We’ll add a new mode for foreign_key_mode in VTGate called Vitess Managed

Following my above points, I would prefer to choose the mode at a per table level, instead of at the entire vtgate/vttablet level. I understand that increases the complexity, but if I had to choose, I would stay with MySQL enforced FKs over Vitess enforced

@shlomi-noach
Copy link
Contributor

We will set FOREIGN_KEY_CHECKS=0 on all vttablet connections

This would be a complete non-starter for us. I would much rather see a separate connection pool for this

@derekperkins this is driven by the idea that vitess would own validating and cascading foreign key writes. Enabling FOREIGN_KEY_CHECKS on MySQL means that on top of vitess already validating the relationship, and already e.g. cascading a DELETE, so would MySQL. Which means even more lookups on child/parent tables, which is wasteful.

@derekperkins
Copy link
Member

Enabling FOREIGN_KEY_CHECKS on MySQL means that on top of vitess already validating the relationship, and already e.g. cascading a DELETE, so would MySQL. Which means even more lookups on child/parent tables, which is wasteful

I totally understand that point, and am fine with this choice if I have opted into Vitess FK mode. The crux of it for me is that I would only opt into Vitess FK mode for a small subset of tables at most, and thus wouldn't be ok with FKs being disabled at the MySQL level for all connections.

At a practical level, I would think this could be supported, given my preference for a per table opt in. When the DML is parsed at the vtgate level, it checks to see if the table is Vitess managed or not. If Vitess managed, handle it as described in the RFC, and at the vttablet level, use the FK disabled pool. If not Vitess managed, use the normal pool with FKs enabled.

@harshit-gangal
Copy link
Member

@derekperkins Vitess knows about the schema and vschema so it knows when the foreign key constraint is applied at the shard level and when it is going cross-shard.
So, the query planner can take care of it.
External information would not be needed to do the optimization for the case you highlighted above.

We want to reduce the operational burden here.

@GuptaManan100
Copy link
Member Author

GuptaManan100 commented May 2, 2023

Me, @harshit-gangal and @shlomi-noach had a discussion today and we realised that it might be better to keep the information about how to deal with foreign keys as a key-space level configuration instead of a flag on vtgates. There are 2 reasons for this -

  1. It doesn't make sense for some vtgates to manage foreign keys while some don't. Similarly, the configuration has to be consistent with vttablets. We avoid all of this misconfiguration if we put it into the topo-server
  2. It gives the user more control on which keyspaces they want Vitess to manage the foreign keys and on which keyspaces they don't want Vitess to do anything. This will also address @derekperkins's point to some extant. I think it would be too much to do control Vitess/MySQL managed foreign keys on a table level, but doing it on a keyspace level looks like a good idea to me

@GuptaManan100
Copy link
Member Author

A couple of updates. We have reworked the phases of the project and we'll store the foreign key mode in the VSchema instead of storing it in the keyspace record.

@frouioui frouioui added this to v18.0.0 Jun 30, 2023
@frouioui frouioui moved this to In Progress in v18.0.0 Jun 30, 2023
@frouioui frouioui removed this from v17.0.0 Jun 30, 2023
@shlomi-noach
Copy link
Contributor

shlomi-noach commented Aug 30, 2023

VTGate should ignore foreign key constraints where one (or both) of the related tables is an internal Vitess table: #13894

@harshit-gangal
Copy link
Member

harshit-gangal commented Nov 20, 2023

Pending Task:

Addon:

  • Improve FKs performance for unsharded keyspace

Next Set of Support:

  • Multi Table Update
  • Multi Table Delete

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: In Progress
Development

No branches or pull requests

4 participants