JMX metrics for HDFS operations #1971

findepi · 2019-11-06T20:30:31Z

Similar to what we do for the S3 file system, we should have JMX-exported metrics for HDFS operations/invocations.
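For readers unfamiliar with JMX export, here is a minimal sketch of what "JMX-exported metrics" could look like. It uses only the JDK's `javax.management` API rather than whatever export mechanism the project itself uses, and every name in it (`HdfsJmxSketch`, `HdfsOpenStats`, the `OpenCalls` attribute, the object name passed in) is a hypothetical placeholder, not an existing class.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical names throughout; this is not the project's actual stats class.
class HdfsJmxSketch
{
    // Standard MBean convention: a public interface named <ImplementationClass>MBean.
    public interface HdfsOpenStatsMBean
    {
        long getOpenCalls();
    }

    public static class HdfsOpenStats
            implements HdfsOpenStatsMBean
    {
        private final AtomicLong openCalls = new AtomicLong();

        // Called by the (hypothetical) file system code each time a file is opened.
        public void recordOpen()
        {
            openCalls.incrementAndGet();
        }

        @Override
        public long getOpenCalls()
        {
            return openCalls.get();
        }
    }

    // Registers the stats object with the platform MBean server, reads the
    // "OpenCalls" attribute back through JMX, and unregisters the bean again.
    static long registerAndRead(HdfsOpenStats stats, String name)
    {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        try {
            ObjectName objectName = new ObjectName(name);
            server.registerMBean(stats, objectName);
            try {
                return (Long) server.getAttribute(objectName, "OpenCalls");
            }
            finally {
                server.unregisterMBean(objectName);
            }
        }
        catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Once such a bean is registered, any JMX client can read the counter as an attribute without the stats class knowing about the client.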

Anurag870 · 2019-11-07T12:59:41Z

@findepi Would like to work on this

findepi · 2019-11-07T13:24:13Z

@Anurag870 sure, thanks! if you need guidance, please join the #dev channel of our slack https://prestosql.io/slack.html

findepi · 2019-11-13T21:44:46Z

For the record, S3 stats are collected in the PrestoS3FileSystemStats class.

amommendes · 2020-09-26T02:48:36Z

Hey @findepi, is this issue still open to be worked on? Inspecting the code, I see we already have NamenodeStats being exported in HiveHdfsModule. I'm probably missing something, but do we need to implement more statistics here (as we have for PrestoS3FileSystem)?

findepi · 2020-09-27T18:01:10Z

NamenodeStats exposes only a few operations (file metadata queries). We might want HDFS stats such as file open or read operations (data throughput).
cc @sopel39
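
To illustrate the "read operations (data throughput)" kind of stat, here is a rough sketch of a stream wrapper that counts bytes as they are read. `CountingInputStream` and `countBytes` are hypothetical names, and a plain `java.io.InputStream` stands in for the stream an HDFS client would return, so this only shows the idea.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not part of Trino or Hadoop.
class CountingInputStream
        extends FilterInputStream
{
    private final AtomicLong bytesRead;

    CountingInputStream(InputStream in, AtomicLong bytesRead)
    {
        super(in);
        this.bytesRead = bytesRead;
    }

    @Override
    public int read()
            throws IOException
    {
        int value = super.read();
        if (value >= 0) {
            bytesRead.incrementAndGet();
        }
        return value;
    }

    @Override
    public int read(byte[] buffer, int offset, int length)
            throws IOException
    {
        // read(byte[]) in FilterInputStream funnels into this method,
        // so bulk reads are counted exactly once.
        int count = super.read(buffer, offset, length);
        if (count > 0) {
            bytesRead.addAndGet(count);
        }
        return count;
    }

    // Reads the whole array through the wrapper and returns the recorded byte count.
    static long countBytes(byte[] data)
    {
        AtomicLong counter = new AtomicLong();
        try (InputStream in = new CountingInputStream(new ByteArrayInputStream(data), counter)) {
            byte[] buffer = new byte[4];
            while (in.read(buffer) != -1) {
                // discard the data; only the counter matters here
            }
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counter.get();
    }
}
```

The shared `AtomicLong` could be the same object a stats class exposes over JMX; bytes passed over with `skip` are deliberately not counted in this sketch.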

amommendes · 2020-09-28T14:08:11Z

> NamenodeStats exposes only a few operations (file metadata queries). We might want HDFS stats such as file open or read operations (data throughput).
> cc @sopel39

Great!

amommendes · 2020-11-11T15:24:30Z

Hey @findepi, I would like to work on this!

tnatssb · 2021-02-18T14:21:26Z

Hi @amommendes, any progress on this?

amommendes · 2021-02-24T14:10:05Z

> Hi @amommendes, any progress on this?

Hey @tnatssb, not yet! I had a discussion about these statistics on Slack a while ago.
If someone wants to take this issue, no problem!

tangjiangling · 2021-12-09T12:10:35Z

@findepi As a newcomer, I want to start integrating into the community and contributing from this issue.

tangjiangling · 2021-12-23T20:50:45Z

> NamenodeStats exposes only a few operations (file metadata queries). We might want HDFS stats such as file open or read operations (data throughput).

@findepi I have an idea to implement this feature, and I hope you can review it:

Let me introduce some background first:

Hadoop FileSystem instances are obtained through a cache, as shown by this code in io.trino.plugin.hive.HdfsEnvironment:
FileSystemManager.registerCache(TrinoFileSystemCache.INSTANCE);
The values in the cache are io.trino.plugin.hive.fs.TrinoFileSystemCache.FileSystemWrapper instances. This class indirectly inherits from Hadoop FileSystem and overrides some of its methods (e.g., open, append, create).

Therefore, the simplest way to implement this feature is to extend io.trino.plugin.hive.fs.TrinoFileSystemCache.FileSystemWrapper, override more methods (e.g., delete), and add per-method metrics (e.g., latency, error count).
We would also need a class similar to TrinoS3FileSystemStats, which could be called TrinoHdfsFileSystemStats, and the metrics in NamenodeStats could be merged into it.
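
As a rough illustration of the per-method metrics described above (latency and error count), a stats holder might look something like the following. `OperationStats` and `ThrowingSupplier` are hypothetical names made up for this sketch; it is not the shape of TrinoS3FileSystemStats or of any existing class.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical functional interface so that operations may throw checked exceptions.
interface ThrowingSupplier<T, E extends Exception>
{
    T get()
            throws E;
}

// Hypothetical stats holder for a single operation (e.g. open or delete).
class OperationStats
{
    private final AtomicLong calls = new AtomicLong();
    private final AtomicLong failures = new AtomicLong();
    private final AtomicLong totalNanos = new AtomicLong();

    // Runs the operation and records the call, its elapsed time, and any failure.
    <T, E extends Exception> T record(ThrowingSupplier<T, E> operation)
            throws E
    {
        calls.incrementAndGet();
        long start = System.nanoTime();
        try {
            return operation.get();
        }
        catch (Exception e) {
            failures.incrementAndGet();
            throw e;
        }
        finally {
            totalNanos.addAndGet(System.nanoTime() - start);
        }
    }

    long getCalls()
    {
        return calls.get();
    }

    long getFailures()
    {
        return failures.get();
    }

    long getTotalNanos()
    {
        return totalNanos.get();
    }
}
```

An overridden method would then wrap its delegate call, along the lines of `deleteStats.record(() -> delegate.delete(path))`, keeping the timing and error handling in one place.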

tangjiangling · 2021-12-27T12:06:47Z

> Therefore, the simplest way to implement this feature is to extend io.trino.plugin.hive.fs.TrinoFileSystemCache.FileSystemWrapper, override more methods (e.g., delete), and add per-method metrics (e.g., latency, error count).

After thinking about it for a while, I don't think I should extend io.trino.plugin.hive.fs.TrinoFileSystemCache.FileSystemWrapper directly, because it is the base class for all Hadoop-ecosystem file systems (e.g., S3, HDFS). Instead, I would add a class similar to io.trino.plugin.hive.s3.TrinoS3FileSystem, called TrinoHdfsFileSystem, to better monitor operations (e.g., open, create, delete).

@findepi @electrum
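
The idea of a dedicated monitoring class that wraps one file system type can be sketched as a plain delegating wrapper. Everything here is a hypothetical stand-in: `SimpleFileSystem` is a one-method substitute for the much larger Hadoop FileSystem API, and `MonitoredFileSystem` only hints at what a TrinoHdfsFileSystem might do.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, minimal stand-in for the file system API.
interface SimpleFileSystem
{
    boolean delete(String path);
}

// Toy implementation used only to exercise the wrapper.
class InMemoryFileSystem
        implements SimpleFileSystem
{
    private final Set<String> paths = new HashSet<>();

    void add(String path)
    {
        paths.add(path);
    }

    @Override
    public boolean delete(String path)
    {
        return paths.remove(path);
    }
}

// Wrapper dedicated to one file system type: counts each call, then delegates.
class MonitoredFileSystem
        implements SimpleFileSystem
{
    private final SimpleFileSystem delegate;
    private final AtomicLong deleteCalls = new AtomicLong();

    MonitoredFileSystem(SimpleFileSystem delegate)
    {
        this.delegate = delegate;
    }

    @Override
    public boolean delete(String path)
    {
        deleteCalls.incrementAndGet();
        return delegate.delete(path);
    }

    long getDeleteCalls()
    {
        return deleteCalls.get();
    }
}
```

Because the counting lives in a wrapper specific to one file system, other file systems sharing the same base class are left untouched.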

weijiii · 2022-12-04T18:24:34Z

@findepi Hi I'd like to try taking on this :)

hashhar · 2023-05-26T06:09:39Z

Seems to be addressed by #17078

findepi added the enhancement label Nov 6, 2019

kokosing added the good first issue label Nov 7, 2019

ebyhr assigned Anurag870 Nov 7, 2019

Anurag870 removed their assignment Apr 14, 2020

weijiii mentioned this issue Dec 3, 2022

Add time stats for ORC file open time #15260

Closed

weijiii self-assigned this Dec 4, 2022

weijiii mentioned this issue Apr 17, 2023

Add HDFS operation jmx stats #17078

Merged

hashhar closed this as completed May 26, 2023
