Support backfilling of secondary index #448

Closed
robertpang opened this issue Aug 20, 2018 · 3 comments
Labels
area/docdb YugabyteDB core features kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue roadmap-tracking-issue This issue tracks a major roadmap item, and usually appears in the roadmap list.

Comments

@robertpang
Contributor

robertpang commented Aug 20, 2018

Jira Link: DB-2465
Upon adding new indexes to a table that already has data, this feature would enable building these indexes in an online manner, while continuing to serve other traffic. Note that this feature should work across both YSQL and YCQL APIs. It should support:

  • Online builds: Support building the indexes without locking out reads or writes on the table. The index build itself will occur asynchronously.
  • Correctness: After the index builds are completed, they should be consistent with the data in the primary table.
  • Constraint violations: If a problem arises while scanning the table, such as a unique constraint violation in a unique index, the CREATE INDEX command should abort and fail. An aborted index will be cleaned up and deleted. Details (such as which constraint was violated) will be written to the logs.
  • Efficient for large datasets: Index build should occur in a distributed manner (utilizing multiple/all nodes in the cluster) to efficiently handle large datasets.
  • Resilience: The index build should be resilient to failures. A node failure in the cluster should not force the entire build process to restart from scratch.
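The resilience and large-dataset goals above are commonly met by processing each tablet in bounded pages and recording a resume key after every page, so a failure only repeats the unfinished page rather than the whole build. The Python sketch below illustrates that idea under stated assumptions and is not YugabyteDB's implementation: a tablet is modeled as an in-memory list of `(primary_key, row)` pairs sorted by key, the index is a plain dict, and `Checkpoint`, `backfill_page`, and `run_backfill` are hypothetical names. Where the checkpoint would be persisted durably is left out.

```python
import bisect
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]
IndexKey = Tuple[object, str]  # (indexed value, primary key)


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical progress record for one tablet's backfill."""
    last_key: Optional[str] = None  # last primary key already indexed
    done: bool = False


def backfill_page(rows: List[Tuple[str, Row]], index_col: str,
                  ckpt: Checkpoint, page_size: int,
                  index: Dict[IndexKey, None]) -> Checkpoint:
    """Index up to page_size rows after ckpt.last_key and return the new checkpoint.

    `rows` must be sorted by primary key; `index` is updated in place.
    """
    keys = [pk for pk, _ in rows]
    # Resume strictly after the last key recorded in the checkpoint.
    start = 0 if ckpt.last_key is None else bisect.bisect_right(keys, ckpt.last_key)
    page = rows[start:start + page_size]
    for pk, row in page:
        index[(row[index_col], pk)] = None
    last = page[-1][0] if page else ckpt.last_key
    return Checkpoint(last_key=last, done=start + len(page) >= len(rows))


def run_backfill(rows: List[Tuple[str, Row]], index_col: str, page_size: int,
                 index: Dict[IndexKey, None], ckpt: Checkpoint = Checkpoint(),
                 max_pages: Optional[int] = None) -> Checkpoint:
    """Process pages until done; max_pages simulates a crash part-way through."""
    pages = 0
    while not ckpt.done and (max_pages is None or pages < max_pages):
        ckpt = backfill_page(rows, index_col, ckpt, page_size, index)
        pages += 1
    return ckpt
```

After a simulated crash, calling `run_backfill(..., ckpt=saved_checkpoint)` continues from the saved key. Because writing an index entry here is idempotent, re-processing a page that was interrupted mid-way is harmless.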

Prerequisites

Status Feature
Design doc: https://github.com/yugabyte/yugabyte-db/blob/master/architecture/design/online-index-backfill.md
Basic online schema change framework
Multi phase alter table to support online index creation in YB-Master

Phase 1 - simple index backfill

Status Feature
YCQL: Backfill indexes for YCQL indexes (non-unique)
YSQL: Backfill indexes for YSQL indexes (including expression and partial indexes) - IN PROGRESS (target v2.2) #2301

Phase 2 - manageability features

Status Feature
Ability to view background index backfill tasks #3668
Expose backfill metrics (writes/sec being performed on index table, rows/sec being processed from primary table, size of index table, etc)
Ability to throttle the rate of backfill
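One common way to throttle a background job such as backfill is a token bucket, which caps the average rate of rows processed while allowing short bursts. The sketch below is an assumed technique chosen for illustration; it is not taken from the YugabyteDB backfill throttling work (#7889), and `TokenBucket` and its methods are hypothetical. Time is passed in explicitly so the logic can be tested without sleeping.

```python
class TokenBucket:
    """Allows `rate` units (e.g. rows) per second on average, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: float, now: float = 0.0):
        self.rate = rate          # tokens added per second
        self.capacity = burst     # maximum tokens that can accumulate
        self.tokens = burst       # start full
        self.last = now           # time of the last refill

    def _refill(self, now: float) -> None:
        # Add tokens for the elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self, n: float, now: float) -> bool:
        """Consume n tokens if available; n must not exceed `burst`."""
        self._refill(now)
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

    def wait_time(self, n: float, now: float) -> float:
        """Seconds until n tokens are available (0.0 if available now)."""
        self._refill(now)
        return max(0.0, (n - self.tokens) / self.rate)
```

A backfill worker could call `wait_time(page_size, time.monotonic())`, sleep for the returned duration, and then call `try_acquire` before processing the next page. Requests larger than `burst` can never succeed, so the page size should not exceed the burst size.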

Phase 3 - constraints and unique indexes

Status Feature
YCQL: Handle unique indexes
YSQL: Handle unique indexes
Backfill should not block on very long-running pending transactions #3471
Handle master failures during the backfill by saving read time #3611

Phase 4 - Other misc improvements

Status Feature
YSQL backfill pagination for large tablets #5326
YSQL backfill throttling #7889
Perf improvements #2615
⬜️ Batch multiple index rebuilds on the same table
⬜️ Enhance YSQL grammar to support a simple, language-level paradigm to view backfill tasks

Analytics

@robertpang robertpang added the kind/enhancement This is an enhancement of an existing feature label Aug 20, 2018
@robertpang robertpang self-assigned this Aug 20, 2018
@robertpang robertpang added this to To do in Secondary indexes Aug 20, 2018
@YourTechBud

Any ETA for this?

@bmatican
Contributor

@YourTechBud Sorry we did not follow up on this. We will be actively working on this for the next release!

@bmatican bmatican added this to To Do in YBase features via automation Jun 4, 2019
@bmatican bmatican moved this from To Do to In progress in YBase features Jun 4, 2019
@bmatican bmatican added the area/docdb YugabyteDB core features label Sep 1, 2019
@bmatican bmatican modified the milestones: v2.0, v2.1 Sep 1, 2019
@amitanandaiyer amitanandaiyer moved this from To do to In progress in Secondary indexes Oct 1, 2019
amitanandaiyer added a commit that referenced this issue Jan 28, 2020
Summary:
High level design doc : https://github.com/yugabyte/yugabyte-db/blob/master/architecture/design/docdb-index-backfill.md

This diff implements the master side changes to be done for a create index. Populating the index tablet at the appropriate time is implemented in D7104.

   - Implement a 4-phase index creation at the master to safely create an index and backfill the data in an online manner. The index is created with the INDEX_PERM_DELETE_ONLY permission, then updated to INDEX_PERM_WRITE_AND_DELETE, then INDEX_PERM_BACKFILLING, and finally, after the backfill is complete, set to INDEX_PERM_READ_WRITE_AND_DELETE.
   - At each phase, the master waits for all the tablets of the indexed table to be updated before moving on to the next phase.
   - Ensure that the CQL proxy/tablet server respects the IndexPermission and only performs the operations that are permitted.
   - Ensure that a master failover in the middle of an index backfill is handled.
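The four-permission progression, together with the rule that the master waits for every tablet before advancing, can be modeled as a small state machine. In this Python sketch the permission names follow the commit message, but the mapping from each permission to allowed operations is an assumption inferred from the names, and `ALLOWED`, `is_allowed`, and `next_permission` are hypothetical helpers rather than YugabyteDB code.

```python
from enum import Enum
from typing import Dict, FrozenSet, Iterable


class IndexPermission(Enum):
    # Names follow the commit message; the numeric values are arbitrary here.
    DELETE_ONLY = 1
    WRITE_AND_DELETE = 2
    BACKFILLING = 3
    READ_WRITE_AND_DELETE = 4


# Assumed mapping, inferred from the permission names.
ALLOWED: Dict[IndexPermission, FrozenSet[str]] = {
    IndexPermission.DELETE_ONLY: frozenset({"delete"}),
    IndexPermission.WRITE_AND_DELETE: frozenset({"write", "delete"}),
    IndexPermission.BACKFILLING: frozenset({"write", "delete"}),
    IndexPermission.READ_WRITE_AND_DELETE: frozenset({"read", "write", "delete"}),
}

ORDER = list(IndexPermission)  # members in definition order


def is_allowed(perm: IndexPermission, op: str) -> bool:
    """Would a tablet server holding `perm` apply operation `op` to the index?"""
    return op in ALLOWED[perm]


def next_permission(current: IndexPermission,
                    tablet_perms: Iterable[IndexPermission],
                    backfill_done: bool) -> IndexPermission:
    """Master-side step: advance only once every tablet has reached `current`."""
    if any(p != current for p in tablet_perms):
        return current  # wait for all tablets to catch up
    if current is IndexPermission.READ_WRITE_AND_DELETE:
        return current  # terminal state
    if current is IndexPermission.BACKFILLING and not backfill_done:
        return current  # reads are enabled only after the backfill completes
    return ORDER[ORDER.index(current) + 1]
```

Because the master advances only after every tablet has reached the current permission, two tablets in this model can differ by at most one adjacent step at any moment.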

A new flag, `disable_index_backfill`, is introduced to guard the new 4-stage index creation process. It defaults to `true` for now, until all the remaining parts needed for the backfill have landed.

Test Plan:
./yb_build.sh --cxx-test cassandra_cpp_driver-test --gtest_filter CppCassandraDriverTest.*TestCreateIndex*
./yb_build.sh --cxx-test master_failover-itest --gtest_filter MasterFailoverTestIndexCreation.*Index*

Reviewers: mihnea, sergei, hector, bogdan, rahuldesirazu, mikhail

Reviewed By: rahuldesirazu, mikhail

Subscribers: mikhail, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D7077
amitanandaiyer added a commit that referenced this issue Feb 6, 2020
…ng non-unique indices.

Summary:
High level design doc : https://github.com/yugabyte/yugabyte-db/blob/master/architecture/design/docdb-index-backfill.md

Depends on D7077 for the master side implementation

This diff implements:
2. Backfill implementation at the tablet level.
   - scan the main table as of the specified time and populate the entries for the index tables.
   - apply backfill write op(s) to the index tables at the desired hybrid time (backfill time).
   - support backfill time: tunnel it through Raft and use it while applying to DocDB.
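Stamping backfilled entries with the backfill (read) time matters because writes that arrive online after that time carry higher hybrid times and therefore override the older backfilled entries. The Python sketch below shows this with a toy multi-version store; it is a simplified model, not DocDB code. Each key maps to a time-ordered list of `(hybrid_time, value)` versions, `None` marks a delete, and `mvcc_put`, `read_as_of`, and `backfill_tablet` are hypothetical names.

```python
from typing import Dict, Hashable, List, Optional, Tuple

# One version: (hybrid_time, value); a value of None is a delete marker.
Version = Tuple[int, Optional[object]]
Store = Dict[Hashable, List[Version]]


def mvcc_put(store: Store, key: Hashable, ht: int, value: Optional[object]) -> None:
    """Record a version of `key` at hybrid time `ht`, keeping versions time-ordered."""
    versions = store.setdefault(key, [])
    versions.append((ht, value))
    versions.sort(key=lambda v: v[0])


def read_as_of(versions: List[Version], read_time: int) -> Optional[object]:
    """Return the newest value with hybrid time <= read_time (None if absent or deleted)."""
    visible = None
    for ht, value in versions:
        if ht > read_time:
            break
        visible = value
    return visible


def backfill_tablet(main: Store, index: Store, index_col: str, read_time: int) -> None:
    """Scan `main` as of read_time and write index entries stamped with read_time."""
    for pk, versions in main.items():
        row = read_as_of(versions, read_time)
        if row is not None:
            # Index key is (indexed value, primary key); the value points back at the row.
            mvcc_put(index, (row[index_col], pk), read_time, pk)
```

For example, if an online update at hybrid time 12 changes a row's indexed value from `a` to `b`, the index receives a delete marker for the `a` entry at time 12. A backfill with read time 10 that runs afterwards still writes the `a` entry at time 10, but reads after time 12 do not see it. This is also why the delete markers must survive until the backfill finishes: if the marker at time 12 were compacted away first, the stale entry would reappear.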

Handled in the next diffs:
3) Unique Indices D7668:
- test create index failure on unique violation
- implement reverse scan for uniqueness check
- disable major compactions from cleaning up the delete markers during backfill.
- split unique index write batches into multiple chunks upon collision

4) Perf improvements.
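For unique indexes, the invariant being enforced is that no two distinct primary-table rows map to the same index key; on a violation, CREATE INDEX aborts as described in the issue. This single-process Python sketch shows only that invariant. It does not model the distributed reverse-scan check or the write-batch splitting listed above, it assumes PostgreSQL-style semantics in which NULL values never conflict, and `UniqueViolation` and `build_unique_index` are hypothetical names.

```python
from typing import Dict, Iterable, List, Tuple


class UniqueViolation(Exception):
    """Raised when two different rows would share one unique-index key."""

    def __init__(self, value: object, pks: List[str]):
        super().__init__(f"duplicate key value {value!r} for rows {pks}")
        self.value = value
        self.pks = pks


def build_unique_index(rows: Iterable[Tuple[str, dict]], col: str) -> Dict[object, str]:
    """Map each non-NULL value of `col` to its row's primary key, or raise on a duplicate."""
    index: Dict[object, str] = {}
    for pk, row in rows:
        value = row.get(col)
        if value is None:
            continue  # assumption: NULLs never conflict (PostgreSQL default behavior)
        existing = index.get(value)
        if existing is not None and existing != pk:
            raise UniqueViolation(value, [existing, pk])
        index[value] = pk
    return index
```

The exception carries the conflicting value and both primary keys, which is the kind of detail the issue expects to surface in the logs when an index build is aborted.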

Test Plan: ybd --cxx-test cassandra_cpp_driver-test --gtest_filter CppCassandraDriverTest.*TestTableCreateIndex*

Reviewers: timur, sergei, bogdan, mihnea, hector

Reviewed By: hector

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D7104
@bmatican bmatican added this to To do in Index backfill via automation Mar 6, 2020
@amitanandaiyer amitanandaiyer moved this from To do to In progress in Index backfill Mar 8, 2020
@amitanandaiyer amitanandaiyer moved this from In progress to In Review in Index backfill Apr 14, 2020
@rkarthik007 rkarthik007 removed this from the v2.1 milestone Jun 5, 2020
@rkarthik007 rkarthik007 added the roadmap-tracking-issue This issue tracks a major roadmap item, and usually appears in the roadmap list. label Jul 24, 2020
@stevebang stevebang added this to the v2.3 milestone Sep 17, 2020
@yugabyte-ci yugabyte-ci added the priority/medium Medium priority issue label Jun 9, 2022
YBase features automation moved this from In progress to Done Jul 27, 2022
Secondary indexes automation moved this from In progress to Done Jul 27, 2022
Index backfill automation moved this from In Review to Done Jul 27, 2022
@ZhenNan2016

@robertpang
Excuse me.
I have some questions about the backfilling of secondary indexes and GIN indexes.
How can I contact you?
Maybe I can ask some questions on the yugabyte-db community Slack.
But when I try to log in to the community, it reports an error: bad_invite_domain.

Projects
Status: Done
Development

No branches or pull requests

8 participants