RFC: Foreign Key Support in Vitess #12967
General tracking issue: #11975
We heavily use FKs, so I am still wrapping my head around this proposal. Here are my thoughts in no particular order:
Latency
Obviously, handling this at vtgate will increase latency compared with enforcement in MySQL. For the majority of our use cases, I don't think we would be willing to pay that price, as we're just enforcing shard-local keys. For cross-shard purposes, today we vreplicate PK tables into destination shards for critical FK relationships, but I do see where this would be a win for larger tables where it would be size/cost prohibitive to copy those around, and we would opt in for some of those.
FOREIGN_KEY_CHECKS
This would be a complete non-starter for us. I would much rather see a separate connection pool for this.
Cascade
We don't use this at all today, so no preferences about Vitess support.
foreign_key_mode
Following my above points, I would prefer to choose the mode at a per-table level instead of at the entire vtgate/vttablet level. I understand that increases the complexity, but if I had to choose, I would stay with MySQL-enforced FKs over Vitess-enforced ones.
@derekperkins this is driven by the idea that …
I totally understand that point, and am fine with this choice if I have opted into Vitess FK mode. The crux of it for me is that I would only opt into Vitess FK mode for a small subset of tables at most, and thus wouldn't be OK with FKs being disabled at the MySQL level for all connections. At a practical level, I would think this could be supported, given my preference for a per-table opt-in. When the DML is parsed at the vtgate level, it checks whether the table is Vitess-managed. If it is, handle it as described in the RFC, and at the vttablet level use the FK-disabled pool. If it is not, use the normal pool with FKs enabled.
@derekperkins Vitess knows about the schema and VSchema, so it knows when the foreign key constraint is applied at the shard level and when it is going cross-shard. We want to reduce the operational burden here.
@harshit-gangal, @shlomi-noach and I had a discussion today and realised that it might be better to keep the information about how to deal with foreign keys as a keyspace-level configuration instead of a flag on vtgates. There are 2 reasons for this: …
A couple of updates. We have reworked the phases of the project, and we'll store the foreign key mode in the VSchema instead of storing it in the keyspace record.
VTGate should ignore foreign key constraints where one (or both) of the related tables is an internal Vitess table: #13894
Pending Task:
Addon:
Next Set of Support:
Introduction
This is an RFC for adding Foreign Key Support in Vitess.
Use Case
Scope of the project
Out of Scope
Schema
The following is the schema we will use for providing examples as we discuss the design of the foreign key support.
MySQL Schema
The data we insert for the examples that follow are -
Design
This section dives into more specific details of what we intend to do to support foreign keys.
Basic Design
- Set `FOREIGN_KEY_CHECKS=0` on all vttablet connections so that DML queries don't fail because of these constraints on MySQL. The constraint verification will be done in vtgates.
- … (… `SET DEFAULT`).
- … `ON DUPLICATE KEY UPDATE` support).
- `foreign_key_mode` is a flag that already exists in VTGate; it controls whether VTGates allow passing foreign key constraints through to MySQL or error out. We'll deprecate that flag and make `ForeignKeyMode` a VSchema configuration instead.
Planning INSERTs
- … `ON DUPLICATE KEY UPDATE` …
Example
For example, if the user were to execute `insert into orders (id, product_id, customer_id) values (4, :a, :b);`, Vitess would take the following steps:
- `START TRANSACTION`
- … `SELECT 1 FROM product WHERE ID = :a FOR SHARE`, `SELECT 1 FROM customer WHERE ID = :b FOR SHARE` …
- … `COMMIT`
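A minimal Go sketch of the INSERT steps above. It assumes that the parent-existence checks run before the write and that an executor (not shown) rolls back if any check returns no row. `ParentRef` and `planInsert` are hypothetical names, not Vitess APIs. The plan is returned as SQL strings purely for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// ParentRef is a hypothetical description of one foreign key constraint in
// which the table being inserted into is the child.
type ParentRef struct {
	ChildColumns  []string // FK columns in the child table, e.g. product_id
	ParentTable   string   // referenced table, e.g. product
	ParentColumns []string // referenced columns, in the same order
}

// planInsert returns the statements for a single-row INSERT: one locking
// existence check per parent constraint, then the INSERT itself.
// values maps each child FK column to the bind variable being inserted.
func planInsert(insert string, parents []ParentRef, values map[string]string) []string {
	plan := []string{"START TRANSACTION"}
	for _, p := range parents {
		conds := make([]string, len(p.ChildColumns))
		for i, col := range p.ChildColumns {
			conds[i] = fmt.Sprintf("%s = %s", p.ParentColumns[i], values[col])
		}
		plan = append(plan, fmt.Sprintf("SELECT 1 FROM %s WHERE %s FOR SHARE",
			p.ParentTable, strings.Join(conds, " AND ")))
	}
	// Assumed executor contract (not shown): if any check above returns no
	// row, issue ROLLBACK instead of running the INSERT.
	return append(plan, insert, "COMMIT")
}

func main() {
	parents := []ParentRef{
		{ChildColumns: []string{"product_id"}, ParentTable: "product", ParentColumns: []string{"id"}},
		{ChildColumns: []string{"customer_id"}, ParentTable: "customer", ParentColumns: []string{"id"}},
	}
	values := map[string]string{"product_id": ":a", "customer_id": ":b"}
	insert := "insert into orders (id, product_id, customer_id) values (4, :a, :b)"
	for _, stmt := range planInsert(insert, parents, values) {
		fmt.Println(stmt)
	}
}
```

Running it prints the following statements in order: `START TRANSACTION`, the `FOR SHARE` checks on `product` and `customer`, the INSERT, and `COMMIT`. Because the checks take shared locks inside the transaction, the parent rows cannot be deleted before the INSERT commits.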
Planning UPDATEs and DELETEs
Planning `UPDATE` and `DELETE` depends on the referential actions they are configured with. Let's dive into each one that MySQL allows.
RESTRICT / NO ACTION (default)
- … `SELECT` validation query; the third, and most interesting, difference is that while planning INSERTs we have the full list of column values being inserted at plan time, but for updates and deletes we'll only know the column values once we run the query!
- Convert the `UPDATE`/`DELETE` to a `SELECT` that returns the rows that are being updated/deleted.
- Run the `SELECT` query and use these results to generate the validation queries.
- If the user runs a query like `DELETE FROM customer`, which tries to bulk delete a lot of rows, vtgate will end up reading a huge list of rows, and the `SELECT` validation query it executes might be extremely large too. This could lead to OOMs. We can add a `LIMIT` clause to the equivalent `SELECT` statement of the `DELETE` and reject such mass updates/deletes if we get more results than the `LIMIT` allows.
Example
For example, if the user were to execute `DELETE FROM customer WHERE area_id = 2;`, Vitess would need to take the following steps:
- `START TRANSACTION`
- Convert the `DELETE` query into a `SELECT` with the same `WHERE` clause. So Vitess would execute `SELECT id FROM customer WHERE area_id = 2;`. We would get back the following result: …
- Since the `customer` table has 2 foreign key constraints where it is the parent, we'll need a validation query for both of them: `SELECT 1 FROM contact WHERE (customer_id) IN ((1), (3)) LIMIT 1 FOR SHARE` and `SELECT 1 FROM orders WHERE (customer_id) IN ((1), (3)) LIMIT 1 FOR SHARE`.
- … `COMMIT`.
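The RESTRICT flow can be sketched in Go as follows. This is an illustration under stated assumptions. `ChildRef`, `selectForDelete`, `restrictChecks` and the `maxRows` threshold are hypothetical. Row values are passed in as already-fetched strings. The rewritten SELECT asks for one row more than the limit so that an over-limit DELETE can be detected and rejected.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ChildRef is a hypothetical description of one constraint in which the
// table being deleted from is the parent.
type ChildRef struct {
	ChildTable   string   // e.g. contact
	ChildColumns []string // e.g. customer_id
}

var errTooManyRows = errors.New("DML affects too many rows for foreign key validation")

// selectForDelete rewrites "DELETE FROM <table> WHERE <where>" into a SELECT of
// the referenced columns with the same WHERE clause. It asks for one row more
// than maxRows so the caller can tell that the limit was exceeded.
func selectForDelete(table, where string, cols []string, maxRows int) string {
	return fmt.Sprintf("SELECT %s FROM %s WHERE %s LIMIT %d",
		strings.Join(cols, ", "), table, where, maxRows+1)
}

// restrictChecks builds one validation query per child constraint from the
// parent key tuples returned by selectForDelete. If any of these queries
// returns a row, the DELETE would violate RESTRICT / NO ACTION.
func restrictChecks(children []ChildRef, rows [][]string, maxRows int) ([]string, error) {
	if len(rows) > maxRows {
		return nil, errTooManyRows // reject mass deletes instead of risking OOMs
	}
	if len(rows) == 0 {
		return nil, nil // nothing is deleted, so nothing can be orphaned
	}
	tuples := make([]string, len(rows))
	for i, r := range rows {
		tuples[i] = "(" + strings.Join(r, ", ") + ")"
	}
	queries := make([]string, 0, len(children))
	for _, c := range children {
		queries = append(queries, fmt.Sprintf(
			"SELECT 1 FROM %s WHERE (%s) IN (%s) LIMIT 1 FOR SHARE",
			c.ChildTable, strings.Join(c.ChildColumns, ", "), strings.Join(tuples, ", ")))
	}
	return queries, nil
}

func main() {
	const maxRows = 1000 // hypothetical per-statement limit
	fmt.Println(selectForDelete("customer", "area_id = 2", []string{"id"}, maxRows))

	children := []ChildRef{
		{ChildTable: "contact", ChildColumns: []string{"customer_id"}},
		{ChildTable: "orders", ChildColumns: []string{"customer_id"}},
	}
	// Parent keys as they appear in the example's validation queries: ids 1 and 3.
	queries, err := restrictChecks(children, [][]string{{"1"}, {"3"}}, maxRows)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, q := range queries {
		fmt.Println(q)
	}
}
```

Run as-is, it prints the rewritten SELECT (`... LIMIT 1001`) followed by the two `LIMIT 1 FOR SHARE` validation queries against `contact` and `orders` from the example.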
CASCADE
- In the `RESTRICT` case, we didn't do any writes until we knew the operation was going to succeed, so we never had to roll back writes. In this case, however, we might need to roll back writes if a cascaded delete fails further down the line (because that table could have a `RESTRICT` constraint on it).
- Convert the `UPDATE`/`DELETE` to a `SELECT` that returns the rows that are being updated/deleted.
- Run the `SELECT` query and use these results to find the rows that the `DELETE`/`UPDATE` needs to cascade to.
- Execute the `UPDATE`/`DELETE` for the children rows in the same transaction. Repeat until no further cascades are required.
- … `ROLLBACK`.
- We can add a `LIMIT` clause to the `SELECT`s, but in this case it won't be enough, since each row deletion could lead to another `SELECT` query. We would have to impose an overall limit on the vtgate to prevent OOMs.
SET NULL
`SET NULL` is very similar to `CASCADE`. After finding the children rows of the foreign key constraint, we need to set the child columns to `NULL`, so `DELETE` queries on the parents trigger an `UPDATE` on the children rows.
SET DEFAULT
`SET DEFAULT` is similar to `SET NULL`. The only difference is that instead of setting `NULL`, we'll set the default value after finding it from our schema tracking data.
Planning REPLACE
`REPLACE` statements are only supported in unsharded mode. For `REPLACE`, we'll plan the `DELETE` and `INSERT` and execute a `SELECT` query to decide if a `DELETE` is necessary.
Important Considerations
- If the `INSERT`/`UPDATE`/`DELETE` touches only one row (including `CASCADE`s), then the cross-shard transaction will only be writing in one shard. All the queries executed in other shards will only be `SELECT ... FOR SHARE` statements. So, for point updates, we don't run any risk of partial commits or inconsistent state: the write succeeding is contingent only on the `COMMIT` succeeding in the shard that has the write. For DMLs that touch more than 1 row, this guarantee can't be provided, and the cross-shard transaction will be best effort. It can leave the database in an inconsistent state in case of a partial failure during the commit phase.
- … `FOREIGN_KEY_CHECKS` to 0).
Data structure to store FK constraints in VSchema
Schema tracking will give us a list of foreign key constraints as part of the `SHOW CREATE TABLE` output. We want to store this output in the `VSchema` struct in a form that gives us the best performance while planning.
We'll need to answer queries of the following sorts:
- … `INSERT`s)
- … `DELETE`s and `UPDATE`s)
The `VSchema` struct stores a map of `KeyspaceSchema` for each keyspace. Within a `KeyspaceSchema` we have a map of `Table`. We'll store the foreign key constraints inside this `Table` struct.
We'll add 2 more fields to the `Table` struct. Essentially, we'll store the list of foreign key constraints where the table is a parent and a list where it is a child.
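One possible shape for this is shown below as a hypothetical Go sketch, not the actual VSchema types. Each parsed constraint is stored once and referenced from the lists on both of its tables. The planner can then find the parents an INSERT must check, or the children a DELETE/UPDATE affects, with one map lookup. `FKConstraint`, `FKsAsParent`, `FKsAsChild` and `indexConstraints` are illustrative names. The sample relationships (orders to product, orders to customer, contact to customer) are only the ones implied by the queries in the examples above.

```go
package main

import "fmt"

// FKConstraint is a hypothetical record for one constraint parsed from
// SHOW CREATE TABLE output.
type FKConstraint struct {
	Name          string
	ChildTable    string
	ChildColumns  []string
	ParentTable   string
	ParentColumns []string
	OnDelete      string // e.g. "RESTRICT", "CASCADE", "SET NULL"
	OnUpdate      string
}

// Table is a trimmed-down stand-in for the VSchema table type, showing only
// the two proposed lists.
type Table struct {
	Name        string
	FKsAsParent []*FKConstraint // consulted when planning DELETE/UPDATE on this table
	FKsAsChild  []*FKConstraint // consulted when planning INSERT/UPDATE on this table
}

// indexConstraints attaches each constraint to both of its tables, so each
// planner question becomes one map lookup plus a scan of a short slice.
func indexConstraints(tables map[string]*Table, fks []*FKConstraint) {
	for _, fk := range fks {
		if child, ok := tables[fk.ChildTable]; ok {
			child.FKsAsChild = append(child.FKsAsChild, fk)
		}
		if parent, ok := tables[fk.ParentTable]; ok {
			parent.FKsAsParent = append(parent.FKsAsParent, fk)
		}
	}
}

func main() {
	tables := map[string]*Table{}
	for _, name := range []string{"product", "customer", "orders", "contact"} {
		tables[name] = &Table{Name: name}
	}
	// Relationships implied by the example queries; referential actions omitted.
	indexConstraints(tables, []*FKConstraint{
		{ChildTable: "orders", ChildColumns: []string{"product_id"}, ParentTable: "product", ParentColumns: []string{"id"}},
		{ChildTable: "orders", ChildColumns: []string{"customer_id"}, ParentTable: "customer", ParentColumns: []string{"id"}},
		{ChildTable: "contact", ChildColumns: []string{"customer_id"}, ParentTable: "customer", ParentColumns: []string{"id"}},
	})
	for _, fk := range tables["orders"].FKsAsChild {
		fmt.Println("insert into orders checks", fk.ParentTable)
	}
	for _, fk := range tables["customer"].FKsAsParent {
		fmt.Println("delete from customer affects", fk.ChildTable)
	}
}
```

Storing pointers means both tables share one record per constraint, so an update from schema tracking only has to replace that record once.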
The `ForeignKeyConstraint` struct would look something like this: …
Performance Improvements
- Let MySQL perform the `INSERT` and `UPDATE`/`DELETE` (with `RESTRICT`) checks for us by using `FOREIGN_KEY_CHECKS=1` on the connection for unsharded and single-sharded cases.
Phases
- … `INSERT`, `UPDATE` and `DELETE` statements for unsharded. `RESTRICT`/`NO ACTION`, `CASCADE` and `SET NULL` modes for foreign key constraints will be supported.
- … `ON DUPLICATE KEY UPDATE` in `INSERT`s for unsharded.
- … `REPLACE` for unsharded.
- … `INSERT/UPDATE/DELETE ... (SELECT)` (`SELECT` subquery in DMLs) for unsharded.
Prerequisites
- … `INSERT` planning in Gen4. Gen4: move insert planner to gen4 #12934
- … `GetSchema` RPC to also work for `Table` and `All` type of input #13197
Tasks
- [ ] On the vttablet, we need `FOREIGN_KEY_CHECKS=0`.
- … `INSERT` planning #13676
- … `UPDATE` planning #13762
- … `DELETE` planning #13746