Co-partitioned tables #79

Open
rkarthik007 opened this issue Mar 6, 2018 · 9 comments
Assignees
Labels
area/docdb YugabyteDB core features kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue

Comments

@rkarthik007
Collaborator

rkarthik007 commented Mar 6, 2018

Jira Link: DB-1949
This is a feature request for implementing co-partitioned tables in order to achieve distributed transactions with high performance in certain scenarios.

This should allow creating a table A and then creating another table B with the same partition key as A, specifying that B be co-partitioned with A. The two tables would share tablets, Raft groups, load-balancing decisions, etc. Updates to these tables should be able to happen atomically when they involve the same partition key (i.e. userid in the example below).

For example:

CREATE TABLE user (
  userid int, 
  username text,
  PRIMARY KEY (userid)
);

CREATE TABLE messages (
  userid int,
  msgid int,
  msgtext text,
  PRIMARY KEY ((userid), msgid)
) with PARTITION SCHEME OF user;

This should ensure that the user and messages tables have the same tablet splits with respect to the primary hash keys, and the tablets serving a given key range across the two tables are always colocated on the same node.
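To make the intended guarantee concrete, here is a minimal sketch of hash-based routing in which messages reuses the split points of user. The 16-bit hash space, the CRC32 stand-in hash, and every function name are assumptions made for illustration; none of this is the actual YugabyteDB implementation.

```python
import zlib
from bisect import bisect_right

HASH_SPACE = 1 << 16  # assumed hash space size, for illustration only


def hash_key(partition_key):
    """Stand-in hash (CRC32); NOT the hash function a real database uses."""
    return zlib.crc32(repr(partition_key).encode()) % HASH_SPACE


def split_points(num_tablets):
    """Evenly spaced lower bounds of each tablet's hash range."""
    return [i * HASH_SPACE // num_tablets for i in range(num_tablets)]


def tablet_for(partition_key, splits):
    """Index of the tablet whose hash range contains the key's hash."""
    return bisect_right(splits, hash_key(partition_key)) - 1


# "user" owns the split points; "messages" reuses them (co-partitioned).
user_splits = split_points(8)
messages_splits = user_splits


def same_tablet(userid):
    """True when a userid lands in the same tablet index in both tables."""
    return tablet_for(userid, user_splits) == tablet_for(userid, messages_splits)
```

Because both tables hash only the partition key (userid) and share one list of split points, every row for a given userid resolves to the same tablet index in both tables, which is what would allow a single Raft group to serve them.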

Background in other NoSQL systems:

Apache Cassandra

In Apache Cassandra, if two tables share the same replication strategy and the same partition key, their partitions are co-located. So as long as the two tables are in the same keyspace AND their partition keys match, for example:

Customer table with PRIMARY KEY (customer_id)

and an

Invoice table with PRIMARY KEY ((customer_id), invoice_id)

Then, the data for a given customer_id across both tables is guaranteed to be colocated on the same set of nodes.

HBase

In HBase, co-partitioning happens across column families that share the same ROW KEY.

Additionally, in HBase, since a common WAL is shared for cross-column family mutations, cross-column family atomicity is also guaranteed.
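The shared-log idea can be sketched as follows: if mutations to several tables for one partition key are validated first and then written as a single log record, they become visible together or not at all. The class and method names are hypothetical, and the in-memory list only stands in for a real replicated WAL or Raft log.

```python
class SharedLogTablet:
    """Toy tablet hosting several tables behind one log (hypothetical model)."""

    def __init__(self, table_names):
        self.log = []  # stand-in for a shared WAL / Raft log
        self.tables = {name: {} for name in table_names}

    def apply(self, partition_key, mutations):
        """Apply (table, row_key, value) mutations atomically.

        row_key is a tuple whose first element is the partition key.
        """
        # Validate everything before touching any state.
        for table, row_key, _ in mutations:
            if table not in self.tables:
                raise KeyError(f"unknown table: {table}")
            if row_key[0] != partition_key:
                raise ValueError("all mutations must share the partition key")
        # One log record covers all tables, so the batch is all-or-nothing.
        self.log.append((partition_key, tuple(mutations)))
        for table, row_key, value in mutations:
            self.tables[table][row_key] = value
```

For example, `tablet.apply(7, [("user", (7,), "alice"), ("messages", (7, 1), "hi")])` writes one log record that covers both tables.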

@rkarthik007 rkarthik007 added this to To Do in YBase features via automation Mar 6, 2018
@rkarthik007 rkarthik007 added kind/enhancement This is an enhancement of an existing feature kind/new-feature This is a request for a completely new feature and removed kind/enhancement This is an enhancement of an existing feature labels Apr 8, 2018
@ddorian
Contributor

ddorian commented Oct 28, 2018

Will this have the ability to co-partition existing tables?

@kmuthukk
Collaborator

@ddorian - you mean "after the fact" co-partitioning of two tables that already have data but are arbitrarily partitioned? This requires a lot more engineering and is a bit more complex, and wasn't in the plans for the initial cut.

The initial plans are around supporting creation of a "new" table that's co-partitioned with an existing table (which may contain data already). Under the covers the co-partitioned table will share Raft groups with its parent table.

@ddorian
Contributor

ddorian commented Oct 28, 2018

@kmuthukk Yes. If the number of tablets and the partition key are the same, the data should be split the same way, I guess? You would just have to migrate the tablets onto the same servers.
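A rough sketch of that check, assuming each table's placement is a mapping from a tablet's starting hash to the set of servers hosting its replicas (the function names and data shape are hypothetical, not an existing API):

```python
def same_partitioning(splits_a, splits_b):
    """True when both tables use identical tablet boundaries."""
    return sorted(splits_a) == sorted(splits_b)


def tablets_to_move(placement_a, placement_b):
    """Return {tablet_start: target_servers} for B's tablets that are not
    yet hosted on the same servers as the matching tablet of A.

    Each placement maps a tablet's starting hash to a set of server names.
    """
    if not same_partitioning(placement_a, placement_b):
        raise ValueError("tables are split differently; re-sharding needed")
    return {
        start: placement_a[start]
        for start in sorted(placement_a)
        if placement_b[start] != placement_a[start]
    }
```

This only covers the placement side. It says nothing about merging the two tables' previously independent Raft groups.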

Mostly asking because it's a big performance improvement for colocated transactions/joins/aggregations.

TLDR: it's a pretty big feature, any ETA?

@kmuthukk
Collaborator

Yes - the number of tablets being the same would make things a bit easier. Merging previously independent Raft groups would be a bit trickier... especially if we try to support such an operation in an "online" fashion.

Will have to get back to you on ETA after discussion with team.

@ddorian
Contributor

ddorian commented Nov 16, 2018

@kmuthukk any type of ETA? Like Qx 20xx. The same for #531.

@kmuthukk
Collaborator

kmuthukk commented Nov 18, 2018

hi @ddorian

Don't have an exact timeline for this feature specifically, but by Q2'2019 quite likely. Specific use cases or customer requests might change/accelerate this.

@mbautin mbautin assigned ttyusupov and unassigned robertpang Jan 8, 2019
mbautin pushed a commit that referenced this issue Jun 20, 2019
* Correct mistakes in syntax description.

* Removed quote around keyword

* Fixed a few more typos in syntax and removed single quote from '(' and ','
@ndeodhar
Contributor

From @ddorian's input on #1133: Have you thought of colocation with range sharding? If a tablet becomes too big, split it (and split all colocated tables on the same keys, even though they're small). And on tablet distribution, move them together.
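One way to picture this suggestion is a group of range-sharded tables that share a single list of split keys, so that a split triggered by one large table applies to every table in the group. The class below is a hypothetical in-memory sketch, not a design taken from the codebase.

```python
from bisect import bisect_right, insort


class ColocatedRangeGroup:
    """Toy model: range-sharded tables that share one set of split keys."""

    def __init__(self, table_names):
        self.split_keys = []  # sorted split keys shared by all tables
        self.tables = {name: {} for name in table_names}

    def tablet_index(self, key):
        """Tablet that owns a key; a split key belongs to the right-hand tablet."""
        return bisect_right(self.split_keys, key)

    def put(self, table, key, row):
        self.tables[table][key] = row

    def split_at(self, key):
        """Split every colocated table at the same key, however small it is."""
        if key not in self.split_keys:
            insort(self.split_keys, key)

    def tablet_contents(self, index):
        """Rows of every table that fall into the given tablet."""
        return {
            name: {k: v for k, v in rows.items() if self.tablet_index(k) == index}
            for name, rows in self.tables.items()
        }
```

Since the split keys are shared, rows with the same key always sit in the same tablet index across tables, so moving that tablet moves all of its tables together.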

@bmatican bmatican added this to the v2.1 milestone Sep 1, 2019
@rkarthik007 rkarthik007 removed this from the v2.1 milestone Jun 5, 2020
@rkarthik007 rkarthik007 added the area/docdb YugabyteDB core features label Jun 5, 2020
@frozenspider frozenspider added this to To do in Colocation via automation Apr 20, 2022
@frozenspider
Contributor

This issue is quite old. While we haven't implemented co-partitioning as proposed here (we still might want to), we did introduce a concept of colocation (via either database (#3033) or tablegroups (#11665, currently in the works)).

@tverona1 tverona1 removed this from To do in Colocation May 18, 2022
@yugabyte-ci yugabyte-ci added the priority/medium Medium priority issue label Jun 9, 2022
@yugabyte-ci yugabyte-ci added kind/enhancement This is an enhancement of an existing feature and removed kind/new-feature This is a request for a completely new feature labels Jul 29, 2022
@Bessonov

Bessonov commented Mar 6, 2023

I am also interested in this feature. My use case is partitioning tables by tenant_id (not part of the PK) with a lot of relations, especially around permission checking. I raised a duplicate of this issue, #16316.

The implementation in Citus: Creating And Distributing Tables and Choosing Distribution Column. This is a very powerful way of associating data (especially with colocate_with and reference tables). From my point of view, this is exactly what I want to achieve. Of course, there are more fine-grained wishes like a partly (row- and column-wise) distributed reference table...

If I understand correctly, colocation of tables addresses another issue, not directly related to this one. And I am not sure about tablegroups. The statement 'in order to do this tables must share a "common" partition key (types & number of columns)' sounds right, but I can't match it with the rest of the text.

Note that CockroachDB already had interleaved tables, but removed them again. (I am glad that we didn't use them, even though they fit well into the project.)

Is there a plan for this feature request? And it would be nice to see how this feature affects the benchmarks when used.

druzac added a commit that referenced this issue Apr 4, 2023
Summary: Copartitioning was a feature from a few years ago. Original GH issue [[ #79 | request ]]. Some work was done to implement it but it was never completed and the feature was abandoned. This diff removes the logic wired into the master and tserver to support it.

Test Plan: builds & no new broken tests

Reviewers: timur, bogdan, jhe, yguan

Reviewed By: jhe, yguan

Subscribers: ybase, slingam, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D23661
premkumr pushed a commit to premkumr/yugabyte-db that referenced this issue Apr 10, 2023
jasonyb pushed a commit that referenced this issue Jun 11, 2024
10 participants