
HBASE-28837 Introduce row statistics coprocessor example #6327

Merged
merged 18 commits into from
Dec 20, 2024

Conversation

@eab148 eab148 (Contributor) commented Sep 30, 2024

Design doc

HBase provides many configuration options to help administrators tune their tables and clusters for desired performance and reliability. However, it can be difficult for administrators to leverage these configuration options because they do not have a nuanced understanding of the shape of their data in HBase.

In this PR, we introduce a row statistics coprocessor to the hbase-examples module. This coprocessor allows administrators to collect statistics on the rows in their HBase tables as these rows compact. With more information about the shape of their data in HBase, administrators can leverage the available configuration options to unlock performance and reliability gains for their HBase tables/clusters.

(Thank you to Bryan Beaudreault for collaborating with me on this!)

Jira

cc @hgromer @krconv @rmdmattingly @ndimiduk @bbeaudreault

@eab148 eab148 marked this pull request as ready for review October 2, 2024 15:01
@ndimiduk (Member) commented Oct 2, 2024

@eab148 if you want community eyes on this, I recommend that you send a note to the dev@ mailing list. I think most folks are not watching PRs or Jira issue creation.

@eab148 eab148 changed the title HBASE-28842: Row Statistics Coprocessor HBASE-28837: Row Statistics Coprocessor Oct 11, 2024
@ndimiduk (Member) commented

@stoty @virajjasani is this something that Phoenix could use to make better query plans?

@ndimiduk ndimiduk (Member) left a review comment

I think that this is a very important contribution, and we should discuss adding it as a core feature of HBase. I do wonder if there's a better solution than writing across region servers from within the coprocessor. The LMAX disruptor helps, but I wonder if a more scalable architecture would have all statistics populate something region-local. Maybe we add the statistics to the WAL's compaction marker? Or can we write the statistics to a header/trailer value in the resulting HFile?

hbase-examples/pom.xml (comment resolved)
regionCount += count;
}
if (regionCount == 0) {
regionEnv.getMetricRegistryForRegionServer().remove(ROW_STATISTICS_DROPPED,
@ndimiduk (Member) commented

Why do you need to remove metrics? Is this some kind of coprocessor hygiene best practice?

@eab148 eab148 (Contributor, Author) replied

If a table moves off a RegionServer (or is dropped), we leak the metrics, which hang around forever until the RS restarts. This is really only a problem for clusters that have a lot of table creations/deletions (for example clusters that have acceptance tests run against them), but I felt it made sense to have this cleanup logic everywhere.
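The cleanup idea described here can be sketched as a simple reference-counting pattern. This is a minimal illustration with hypothetical names, not the PR's actual code:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: track how many regions of each table are open on this server, and
// drop the table's metrics when the last one closes so nothing leaks until
// the RegionServer restarts.
public class MetricCleanup {
  private final ConcurrentHashMap<String, AtomicInteger> openRegions = new ConcurrentHashMap<>();
  private final ConcurrentHashMap<String, Long> registry = new ConcurrentHashMap<>();

  public void onRegionOpen(String table) {
    openRegions.computeIfAbsent(table, t -> new AtomicInteger()).incrementAndGet();
    registry.putIfAbsent(table, 0L); // lazily register the table's metric
  }

  public void onRegionClose(String table) {
    AtomicInteger count = openRegions.get(table);
    if (count != null && count.decrementAndGet() == 0) {
      openRegions.remove(table);
      registry.remove(table); // last region gone: unregister to avoid a leak
    }
  }

  public boolean hasMetrics(String table) {
    return registry.containsKey(table);
  }
}
```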

@ndimiduk (Member) commented

Since I cannot leave a comment on the design doc, would you mind filling in some detail about how these statistics are queried and used? Thanks @eab148!

@stoty (Contributor) commented Oct 14, 2024

@stoty @virajjasani is this something that Phoenix could use to make better query plans?

Most of these statistics are not particularly relevant for Phoenix, as extreme row/cell sizes are less typical in Phoenix's relational model.

However, I think this same infrastructure could be extended for key distribution, cardinality, and other statistics that are relevant for query plan optimization. Those would probably need some kind of pluggable interface where Phoenix can process the rowkeys as its own PK components as the data is collected.

Phoenix doesn't really do optimization based on table contents today, so there is a lot of room for improvement.

Unfortunately, my knowledge on optimization techniques is also lacking, so I cannot tell whether this infrastructure has any flaws that would later limit Phoenix's ability to use it.

@stoty (Contributor) commented Oct 14, 2024

As for the data collection, one alternative could be using a special CF for the statistics, instead of a separate table.
That way the statistics could be stored right with the table, and with the correct rowkey design (i.e. prepending the region's start key) it would also be local to the same region server.

Phoenix's local indexes use this trick.

If write performance is a problem, the rowkey for the statistics table could also be salted.

IIUC we now have one cell per region per CF, so the statistics write load might not even be an issue; it's a tradeoff between write and read performance, as usual.
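The locality trick described above (prepend the region's start key so the statistics row sorts into the same region, and thus the same region server, as the data it describes) might look roughly like this; the "__stats__" marker and method names are made up for illustration:

```java
import java.nio.charset.StandardCharsets;

// Sketch of a region-local statistics rowkey: regionStartKey + marker + statId.
// Any key built this way sorts at or after the region's start key, keeping the
// statistics row co-located with the region it describes.
public class StatsRowKey {
  private static final byte[] MARKER = "__stats__".getBytes(StandardCharsets.UTF_8);

  public static byte[] statsKey(byte[] regionStartKey, byte[] statId) {
    byte[] key = new byte[regionStartKey.length + MARKER.length + statId.length];
    System.arraycopy(regionStartKey, 0, key, 0, regionStartKey.length);
    System.arraycopy(MARKER, 0, key, regionStartKey.length, MARKER.length);
    System.arraycopy(statId, 0, key, regionStartKey.length + MARKER.length, statId.length);
    return key;
  }
}
```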

@bbeaudreault (Contributor) commented

I think if we were to build this natively into hbase as a feature, it probably shouldn't be a coprocessor. It could be written within the core internals, which would afford us some extra power. Namely, we could store these statistics directly into the HFiles in the metadata. The query layer could handle aggregating across the HFiles of a region. We could provide coprocessor hooks to let users add their own data to the same per-HFile metadata.

@eab148 eab148 (Contributor, Author) commented Oct 17, 2024

Since I cannot leave a comment on the design doc, would you mind filling in some detail about how these statistics are queried and used? Thanks @eab148!

Querying row statistics:

We have an API that fetches and performs aggregations/filtering on our row statistics. These results are cached for a duration of time.

The available API queries include:

  • Get a given region's row statistics
  • Sample X row statistics for a given table/CF
  • Aggregate all of the row statistics across table/CF pairs
  • Fetch the "top N" row statistics for a given field and table/CF pair
    • In other words, we may provide LARGEST_ROW_NUM_BYTES and fetch the top N row statistics in order of RowStatistics::largestRowNumBytes (descending order)

As a reminder, the cells in our internal row statistics table have the following fields:

  • rowKey: full region name
  • family: 0
  • qualifier: 1 for majorCompaction row statistics, 0 for minor compaction row statistics
  • value: JSON blob of the RowStatistic object
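A minimal sketch of these cell coordinates in client code, assuming the family and qualifiers are the literal byte strings "0" and "1"; a real caller would build an HBase Get with these coordinates and JSON-decode the returned value:

```java
import java.nio.charset.StandardCharsets;

// Sketch of the row statistics cell layout described above.
public class RowStatsCell {
  public static final byte[] FAMILY = "0".getBytes(StandardCharsets.UTF_8);
  private static final byte[] MAJOR = "1".getBytes(StandardCharsets.UTF_8);
  private static final byte[] MINOR = "0".getBytes(StandardCharsets.UTF_8);

  // rowKey: the full region name
  public static byte[] rowKey(String fullRegionName) {
    return fullRegionName.getBytes(StandardCharsets.UTF_8);
  }

  // qualifier: 1 for major-compaction statistics, 0 for minor-compaction ones
  public static byte[] qualifier(boolean majorCompaction) {
    return majorCompaction ? MAJOR : MINOR;
  }
}
```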

Using row statistics:

At my day job, we've used the row statistics to

  1. Tune the block sizes for tables that service random read workloads, reducing disk I/O on relevant clusters
  2. Remove huge cells (>512 KiB) from our tables. Huge cells are ticking time bombs in HBase, as they cannot be cached in memory without admin intervention/memory configuration changes.
  3. Implement smarter compaction schedules, reducing the daily network data transfer cost of our HBase setup
Tune Block Sizes

Our block size tuning job halves the block size for each family of a table if

  • The table’s "typical row" can fit into the smaller block size. At my day job, clients usually query full rows, so we want all cells for a given row to be in the same block.
  • The cluster’s memory has space for the larger index size
  • The table serves mostly random read traffic.

Tuning the block sizes for sequential read workloads is more complicated as estimating the number of blocks a Scan will require is an ambiguous task when one considers filtered, partial, and full table Scans.
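The halving rule above can be sketched as a pure decision function; the thresholds and parameter names here are illustrative assumptions, not the actual tuning job's logic:

```java
// Sketch: halve the block size only when all three criteria above hold.
public class BlockSizeTuner {
  /**
   * Returns the proposed block size: half the current size when the typical
   * row fits in the smaller block, the grown index still fits the memory
   * budget, and the table is predominantly random-read; otherwise unchanged.
   */
  public static long proposeBlockSize(long currentBlockSize, long typicalRowNumBytes,
      long projectedIndexBytes, long indexMemoryBudgetBytes, double randomReadFraction) {
    long halved = currentBlockSize / 2;
    boolean rowFits = typicalRowNumBytes <= halved;
    boolean indexFits = projectedIndexBytes <= indexMemoryBudgetBytes;
    boolean mostlyRandom = randomReadFraction >= 0.9; // assumed threshold
    return (rowFits && indexFits && mostlyRandom) ? halved : currentBlockSize;
  }
}
```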

Remove Huge Cells
  1. Use our Row Statistics API to aggregate all of the row statistics for a given table.
  2. Use this response to find the number of cells that exceed the max cache size, which we have set to 512 KiB.
  3. Alert the relevant teams within our organization that they need to remove or break up these cells in their HBase tables.
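Step 2 might look like the following, assuming a hypothetical per-region aggregate that carries the largest observed cell size:

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch: flag regions whose largest cell exceeds the 512 KiB cache limit.
public class HugeCellCheck {
  public static final long MAX_CACHE_BYTES = 512L * 1024;

  /** Hypothetical per-region aggregate from the Row Statistics API. */
  public record RegionStats(String regionName, long largestCellNumBytes) {}

  public static List<String> regionsWithHugeCells(List<RegionStats> stats) {
    return stats.stream()
        .filter(s -> s.largestCellNumBytes() > MAX_CACHE_BYTES)
        .map(RegionStats::regionName)
        .collect(Collectors.toList());
  }
}
```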
Smarter Compaction Schedules
  1. Use our Row Statistics API to get the most recent row statistics for a given region.
  2. Estimate the amount of useful work that a major compaction accomplishes based on the number of StoreFiles in that region, the number of tombstones in that region, and the time since the last compaction for that region.
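One way to sketch the estimate in step 2; the weights and their combination are assumptions for illustration, not the actual scheduling job's logic:

```java
// Sketch: higher scores mean a major compaction is more worthwhile.
public class CompactionScore {
  /**
   * Combines files merged away, tombstones purged (with diminishing returns),
   * and staleness since the last major compaction.
   */
  public static double score(int storeFileCount, long tombstoneCount,
      long hoursSinceLastCompaction) {
    double fileTerm = Math.max(0, storeFileCount - 1); // files merged away
    double tombstoneTerm = Math.log1p(tombstoneCount); // diminishing returns
    double ageTerm = hoursSinceLastCompaction / 24.0;  // staleness in days
    return fileTerm + tombstoneTerm + ageTerm;
  }
}
```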

@eab148 eab148 (Contributor, Author) commented Dec 19, 2024

@eab148 please address the static analysis nits (javac and checkstyle, linked in the yetus output above) as appropriate, and this is good to go.

@ndimiduk I addressed the static analysis nits. They are now passing with 💚. Everything should be good now. I just rebased my branch off of master to address the patch warnings. It is in the process of building, but I didn't make any code changes, so I expect it to compile and test without issue. Thank you so much for your eyes on this PR over the past few months!

@Apache-HBase commented

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 29s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 3m 29s master passed
+1 💚 compile 0m 18s master passed
+1 💚 javadoc 0m 15s master passed
+1 💚 shadedjars 5m 38s branch has no errors when building our shaded downstream artifacts.
-0 ⚠️ patch 5m 45s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 2m 50s the patch passed
+1 💚 compile 0m 17s the patch passed
+1 💚 javac 0m 17s hbase-examples generated 0 new + 5 unchanged - 3 fixed = 5 total (was 8)
+1 💚 javadoc 0m 14s the patch passed
+1 💚 shadedjars 5m 36s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
+1 💚 unit 1m 43s hbase-examples in the patch passed.
21m 51s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6327/25/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #6327
JIRA Issue HBASE-28837
Optional Tests javac javadoc unit shadedjars compile
uname Linux 20babf9cfa75 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 6f59296
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6327/25/testReport/
Max. process+thread count 1681 (vs. ulimit of 30000)
modules C: hbase-examples U: hbase-examples
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6327/25/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase commented

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 48s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+1 💚 mvninstall 3m 19s master passed
+1 💚 compile 0m 28s master passed
+1 💚 checkstyle 0m 9s master passed
+1 💚 spotbugs 0m 32s master passed
+1 💚 spotless 0m 45s branch has no errors when running spotless:check.
-0 ⚠️ patch 0m 52s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 3m 20s the patch passed
+1 💚 compile 0m 30s the patch passed
+1 💚 javac 0m 30s hbase-examples generated 0 new + 13 unchanged - 2 fixed = 13 total (was 15)
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 9s the patch passed
+1 💚 xmllint 0m 0s No new issues.
+1 💚 spotbugs 0m 41s the patch passed
+1 💚 hadoopcheck 11m 9s Patch does not cause any errors with Hadoop 3.3.6 3.4.0.
+1 💚 spotless 0m 41s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 9s The patch does not generate ASF License warnings.
28m 42s
Subsystem Report/Notes
Docker ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6327/25/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #6327
JIRA Issue HBASE-28837
Optional Tests dupname asflicense javac codespell detsecrets xmllint hadoopcheck spotless compile spotbugs checkstyle hbaseanti
uname Linux 61381697e0cb 5.4.0-200-generic #220-Ubuntu SMP Fri Sep 27 13:19:16 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 6f59296
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 83 (vs. ulimit of 30000)
modules C: hbase-examples U: hbase-examples
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6327/25/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3 xmllint=20913
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@ndimiduk ndimiduk (Member) left a review comment

Thank you @eab148 for being patient as we checked off all the boxes.

@ndimiduk (Member) commented

The patch warning is simply because our pre-commit build tool, Yetus, doesn't seem to like applying a sequence of commits. It appears to assume that it's applying a single patch file, not evaluating a branch. Maybe something that could be improved in that project too.

@ndimiduk ndimiduk merged commit 2990a0f into apache:master Dec 20, 2024
1 check passed
@ndimiduk ndimiduk deleted the HBASE-28842 branch December 20, 2024 11:58
@ndimiduk ndimiduk changed the title HBASE-28837: Row Statistics Coprocessor HBASE-28837 Introduce row statistics coprocessor example Dec 20, 2024
ndimiduk pushed a commit to ndimiduk/hbase that referenced this pull request Dec 20, 2024
Co-authored-by: Evie Boland <eboland@hubspot.com>
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
ndimiduk pushed a commit that referenced this pull request Dec 20, 2024
Co-authored-by: Evie Boland <eboland@hubspot.com>
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
ndimiduk pushed a commit to ndimiduk/hbase that referenced this pull request Dec 20, 2024
Co-authored-by: Evie Boland <eboland@hubspot.com>
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
ndimiduk added a commit that referenced this pull request Dec 20, 2024
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
Co-authored-by: eab148 <54775485+eab148@users.noreply.github.com>
Co-authored-by: Evie Boland <eboland@hubspot.com>
@eab148 eab148 (Contributor, Author) commented Dec 20, 2024

Thank you @eab148 for being patient as we checked off all the boxes.

No, thank you @ndimiduk!

ragarkar pushed a commit to ragarkar/hbase that referenced this pull request Jan 3, 2025
Co-authored-by: Evie Boland <eboland@hubspot.com>
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>