Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add HDFS operation jmx stats #17078

Merged
merged 2 commits into from
May 7, 2023

Conversation

weijiii
Copy link
Member

@weijiii weijiii commented Apr 17, 2023

Description

Additional context and related issues

Release notes

(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

@cla-bot cla-bot bot added the cla-signed label Apr 17, 2023
@github-actions github-actions bot added hive Hive connector tests:hive labels Apr 17, 2023
@weijiii weijiii force-pushed the add_hdfs_operation_jmx_stats branch 16 times, most recently from 362946f to 2f5419e Compare April 25, 2023 04:56
@@ -27,6 +28,7 @@
implements TrinoFileSystemFactory
{
private final HdfsEnvironment environment;
private static final TrinoHdfsFileSystemStats STATS = new TrinoHdfsFileSystemStats();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to make this static. It's only static for S3 because Hadoop file systems are singletons. But for this, we can have separate stats per instance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a look :) Do you mean separate stats per HdfsFileSystem instance? In this case we don't have to make TrinoHdfsFileSystemStats singleton?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I shouldn't have said per instance. We can make TrinoHdfsFileSystemStats injected into HdfsFileSystemFactory and bound as a singleton, rather than making it static. This means that if we have multiple connectors, each one will have separate stats.

Take a look at how NamenodeStats is bound and used.

@weijiii weijiii force-pushed the add_hdfs_operation_jmx_stats branch from 2f5419e to 5746180 Compare April 25, 2023 05:16
@weijiii weijiii marked this pull request as ready for review April 25, 2023 05:56
@weijiii weijiii requested a review from raunaqmorarka April 25, 2023 05:56
@weijiii weijiii force-pushed the add_hdfs_operation_jmx_stats branch 3 times, most recently from 236deec to 51a923f Compare April 26, 2023 20:23
@weijiii
Copy link
Member Author

weijiii commented Apr 26, 2023

Force-pushed to rebase.

@weijiii weijiii force-pushed the add_hdfs_operation_jmx_stats branch from 51a923f to f8afbf7 Compare April 27, 2023 20:58
@weijiii weijiii changed the title Add hdfs operation jmx stats Add HDFS operation jmx stats Apr 28, 2023
@weijiii weijiii force-pushed the add_hdfs_operation_jmx_stats branch 2 times, most recently from 06a0b5c to 1dbbb94 Compare April 28, 2023 16:58
@weijiii
Copy link
Member Author

weijiii commented Apr 28, 2023

@raunaqmorarka Can you give a review on this PR? Also can I re-run this failing task? The error does not look related. Thanks.

Error: LinkageError occurred while loading main class io.trino.tests.product.launcher.cli.Launcher
	java.lang.UnsupportedClassVersionError: io/trino/tests/product/launcher/cli/Launcher has been compiled by a more recent version of the Java Runtime (class file version 61.0), this version of the Java Runtime only recognizes class file versions up to 55.0

Copy link
Member

@electrum electrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rebase and then I'll merge this

@@ -30,6 +32,8 @@
public void configure(Binder binder)
{
binder.bind(HdfsFileSystemFactory.class).in(SINGLETON);
binder.bind(TrinoHdfsFileSystemStats.class).toInstance(HdfsFileSystemFactory.getFileSystemStats());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do

binder.bind(TrinoHdfsFileSystemStats.class).in(SINGLETON);

@@ -27,6 +28,7 @@
implements TrinoFileSystemFactory
{
private final HdfsEnvironment environment;
private static final TrinoHdfsFileSystemStats STATS = new TrinoHdfsFileSystemStats();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I shouldn't have said per instance. We can make TrinoHdfsFileSystemStats injected into HdfsFileSystemFactory and bound as a singleton, rather than making it static. This means that if we have multiple connectors, each one will have separate stats.

Take a look at how NamenodeStats is bound and used.

@@ -125,6 +123,10 @@ public void configure(Binder binder)
newExporter(binder).export(TrinoFileSystemCacheStats.class)
.as(generator -> generator.generatedNameOf(io.trino.plugin.hive.fs.TrinoFileSystemCache.class));

binder.bind(HdfsNamenodeStats.class).toInstance(HdfsFileSystemFactory.getNamenodeStats());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can keep this bound as a singleton, like it was before

@weijiii weijiii force-pushed the add_hdfs_operation_jmx_stats branch from 1dbbb94 to 446cbb4 Compare May 6, 2023 21:46
@github-actions github-actions bot added delta-lake Delta Lake connector iceberg Iceberg connector labels May 6, 2023
@weijiii weijiii force-pushed the add_hdfs_operation_jmx_stats branch 2 times, most recently from a20d756 to 220b8a3 Compare May 6, 2023 23:04
@weijiii
Copy link
Member Author

weijiii commented May 7, 2023

@electrum I've rebased and bound the stats to singleton. The test failure seems unrelated and it looks like we have an open issue on this #16315 . Please take a look. Thanks.

TestHiveTransactionalTable > testReadFullAcidPartitioned [groups: hive_transactional]
java.sql.SQLException: Query failed (#20230506_235422_01464_bx6fp): Found file in sub-directory of ACID directory: hdfs://hadoop-master:9000/user/hive/warehouse/test_read_full_acid_true_none_p24qoeaen9/part_col=2/delete_delta_0000002_0000005/delete_delta_0000002_0000005/bucket_00000

@weijiii weijiii force-pushed the add_hdfs_operation_jmx_stats branch from 220b8a3 to af1e759 Compare May 7, 2023 03:40
@weijiii
Copy link
Member Author

weijiii commented May 7, 2023

Force pushed to squash commits

@electrum electrum merged commit e3bdafb into trinodb:master May 7, 2023
@github-actions github-actions bot added this to the 417 milestone May 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed delta-lake Delta Lake connector hive Hive connector iceberg Iceberg connector
Development

Successfully merging this pull request may close these issues.

2 participants