Tablet splitting: size-based strategy - monitor tablets and split automatically #1462
Closed
ttyusupov added the kind/enhancement (This is an enhancement of an existing feature) and area/docdb (YugabyteDB core features) labels on Jun 22, 2020
ttyusupov changed the title from "Tablet splitting: Size-based strategy" to "Tablet splitting: Size-based strategy - monitor tablets and split automatically" on Jun 23, 2020
ttyusupov changed the title from "Tablet splitting: Size-based strategy - monitor tablets and split automatically" to "Tablet splitting: size-based strategy - monitor tablets and split automatically" on Jun 23, 2020
ttyusupov added a commit that referenced this issue on Jul 1, 2020
…lit automatically

Summary:
This revision has two parts:

1) Implements the automated size-based tablet splitting strategy:
- Added the `tablet_split_size_threshold` flag for master. The flag value is propagated to tservers inside `TSHeartbeatResponsePB`.
- Implemented `TabletSplitHeartbeatDataProvider`, which checks tablet sizes every `tablet_split_monitor_heartbeat_interval_ms` (flag). Tablets whose SST file size exceeds the threshold are reported back to master inside `TSHeartbeatRequestPB`.
- `MasterServiceImpl::TSHeartbeat` triggers splitting for the tablets reported inside `TSHeartbeatRequestPB`.
- Added detection of whether a tablet's regular DB has been fully compacted since creation. Post-split tablets are split again only after full compaction, because `Tablet::GetEncodedMiddleSplitKey` and `Tablet::GetCurrentVersionSstFilesSize` do not yet respect the tablet's key bounds. A post-split tablet that is not yet fully compacted would return the same values as the pre-split tablet, because it still contains the same hard-linked SST files.

2) Changes that fix issues found while testing this strategy with a `YCSB` workload:
- Fixed the `MetaCache::ProcessTabletLocations` locations pre-check.
- Updated `MetaCache::ProcessTabletLocations` to support gaps inside received tablet locations (possible because some tablets may not be running yet).
- Added `SysTablesEntryPB::partitions_version`, whose value increments each time the set of table partitions changes (for now, only as a result of tablet splitting).
- Updated `MetaCache` to also trigger re-fetching of table partitions when it gets a newer `partitions_version` from master (`MetaCache::ProcessTabletLocations`).
- Added the ClientError Status error code category and used it to propagate the stale table partitions error from `MetaCache` to `Batcher`.

Limitations:
1) The client app should retry requests when it gets `usertableorg.postgresql.util.PSQLException: ERROR: Operation failed. Try again. ...` (see the `core_workload_insertion_retry_limit` property for YCSB in the test plan).
2) Since only fully compacted tablets are split again, by the time they are fully compacted their size could be larger than `tablet_split_size_threshold`.

Test Plan:
```
./bin/yb-ctl destroy
./bin/yb-ctl --rf=3 create --num_shards_per_tserver=1 --ysql_num_shards_per_tserver=1 --master_flags '"tablet_split_size_threshold=30000000"' --tserver_flags '"memstore_size_mb=10"'
./bin/ysqlsh -c "CREATE DATABASE ycsb;"
./bin/ysqlsh -d ycsb -c "CREATE TABLE usertable ( YCSB_KEY TEXT, FIELD0 TEXT, FIELD1 TEXT, FIELD2 TEXT, FIELD3 TEXT, FIELD4 TEXT, FIELD5 TEXT, FIELD6 TEXT, FIELD7 TEXT, FIELD8 TEXT, FIELD9 TEXT, PRIMARY KEY (YCSB_KEY ASC));"
~/code/YCSB/bin/ycsb load jdbc -s -P ~/code/YCSB/db-local.properties -P ~/code/YCSB/workloads/workloadc -p recordcount=500000 -p operationcount=1000000 -p threadcount=4
~/code/YCSB/bin/ycsb run jdbc -s -P ~/code/YCSB/db-local.properties -P ~/code/YCSB/workloads/workloadc -p recordcount=500000 -p operationcount=1000000 -p threadcount=4
```

`~/code/YCSB/db-local.properties` contents:
```
db.driver=org.postgresql.Driver
db.url=jdbc:postgresql://127.0.0.1:5433/ycsb;jdbc:postgresql://127.0.0.2:5433/ycsb;jdbc:postgresql://127.0.0.3:5433/ycsb
db.user=yugabyte
db.passwd=
core_workload_insertion_retry_limit=100
```

Tablet splitting progress can be checked at `http://127.0.0.1:9000/tablets`.

Reviewers: sergei, nicolas, hector, bogdan
Reviewed By: bogdan
Subscribers: hector, ybase, bogdan
Differential Revision: https://phabricator.dev.yugabyte.com/D8727
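The selection rule described in part 1 of the summary can be illustrated with a small Python sketch. This is not the actual C++ code in `TabletSplitHeartbeatDataProvider`; the `TabletInfo` type, its field names, and the `tablets_to_split` function are hypothetical and only model the two conditions the commit message states: SST file size above the threshold, and full compaction required for post-split tablets.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class TabletInfo:
    """Hypothetical per-tablet snapshot; field names are assumptions."""
    tablet_id: str
    sst_files_size: int    # bytes in current-version SST files
    is_post_split: bool    # tablet was created by an earlier split
    fully_compacted: bool  # regular DB fully compacted since creation


def tablets_to_split(tablets: Iterable[TabletInfo],
                     split_size_threshold: int) -> List[str]:
    """Return ids of tablets a tserver would report to master for splitting."""
    candidates = []
    for t in tablets:
        # Only tablets whose SST size exceeds the threshold qualify.
        if t.sst_files_size <= split_size_threshold:
            continue
        # A post-split tablet still hard-links its parent's SST files until
        # it is fully compacted, so its reported size is not yet meaningful.
        if t.is_post_split and not t.fully_compacted:
            continue
        candidates.append(t.tablet_id)
    return candidates
```

With the 30000000-byte threshold from the test plan, a 40 MB post-split tablet that has not been compacted is skipped, which is also why limitation 2 can occur: the tablet keeps growing while it waits for compaction.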
ttyusupov added a commit that referenced this issue on Jul 1, 2020
…lit automaticall Summary: same summary, limitations, and test plan as the first Jul 1, 2020 commit above. Reviewers: bogdan Subscribers: ybase Differential Revision: https://phabricator.dev.yugabyte.com/D8785
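The `partitions_version` mechanism from part 2 of the summary can be sketched as follows. This is a simplified Python model, not the real `MetaCache`: the `TablePartitionsCache` class, its method names, and the `fetch_partitions` callback are assumptions made for illustration. It shows only the idea that a newer version number from master invalidates the cached partition list.

```python
from typing import Callable, List, Tuple


class TablePartitionsCache:
    """Hypothetical client-side cache of a table's partition list."""

    def __init__(self, fetch_partitions: Callable[[], Tuple[int, List[str]]]):
        # fetch_partitions stands in for a round trip to master and returns
        # (partitions_version, list of partition start keys).
        self._fetch = fetch_partitions
        self.version, self.partitions = fetch_partitions()

    def on_tablet_locations(self, partitions_version: int) -> bool:
        """Handle a tablet-locations response; return True if re-fetched."""
        if partitions_version > self.version:
            # The partition set changed (for example, a tablet was split),
            # so the cached list is stale and must be fetched again.
            self.version, self.partitions = self._fetch()
            return True
        return False
```

In this model, an operation routed with a stale partition list would fail and be retried after the refresh, which matches the stale-partitions error that the commit propagates from `MetaCache` to `Batcher`.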
deeps1991 pushed a commit to deeps1991/yugabyte-db that referenced this issue on Jul 22, 2020
…s and split automatically Summary: same commit message as the first Jul 1, 2020 commit above. Differential Revision: https://phabricator.dev.yugabyte.com/D8727
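Limitation 1 in the commit message asks client apps to retry when they see `ERROR: Operation failed. Try again.` during a split. A minimal Python sketch of such a retry wrapper is shown below. The `OperationFailed` exception and `with_retries` helper are hypothetical stand-ins for whatever error type the client driver raises; the default limit of 100 mirrors `core_workload_insertion_retry_limit=100` from the test plan.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class OperationFailed(Exception):
    """Hypothetical stand-in for the driver error seen during a split."""


def with_retries(op: Callable[[], T], retry_limit: int = 100,
                 backoff_s: float = 0.0) -> T:
    """Run op, retrying on 'Try again' errors up to retry_limit times."""
    attempt = 0
    while True:
        try:
            return op()
        except OperationFailed as e:
            # Re-raise errors that are not marked retryable, and give up
            # once the retry budget is spent.
            if "Try again" not in str(e) or attempt >= retry_limit:
                raise
            attempt += 1
            time.sleep(backoff_s)
```

A caller would wrap each insert, for example `with_retries(lambda: insert_row(row))`, where `insert_row` is the application's own function.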
Parent ticket: "Tablet splitting" #1004.
Depends on #1461.