Implement datasource metrics #75

kortemik · 2024-08-26T15:32:49Z

Description
Implement datasource metrics

Use case or motivation behind the feature request
Users currently cannot see the progress of a running query, which is frustrating and needs to be fixed. Now that the datasource uses the Spark 3 APIs, it can report metrics about its own progress.

Please create at least the following metrics, aggregated into JSON format:
Driver:

Current archive offset
Kafka offset

Task:

Number of records processed
Number of bytes processed
Bytes per second
Records per second
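
As a rough illustration of the task-side metrics listed above, the sketch below keeps record and byte counters, derives the per-second rates from elapsed time, and renders them as JSON. The class name, method names and JSON keys are all hypothetical, since the schema is not defined yet, and wiring the values into Spark's metric reporting is left out.

```java
import java.util.Locale;

// Hypothetical per-task counters; names and JSON keys are illustrative only.
final class TaskMetricsSketch {
    private long records;
    private long bytes;

    // Count one processed record of the given size in bytes.
    void add(long recordBytes) {
        records++;
        bytes += recordBytes;
    }

    // Average rate over the elapsed time; 0 when no time has elapsed.
    static double perSecond(long count, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            return 0.0;
        }
        return count * 1000.0 / elapsedMillis;
    }

    // Render the counters and derived rates as a flat JSON object.
    String toJson(long elapsedMillis) {
        return String.format(
                Locale.ROOT,
                "{\"records\":%d,\"bytes\":%d,\"recordsPerSecond\":%.1f,\"bytesPerSecond\":%.1f}",
                records,
                bytes,
                perSecond(records, elapsedMillis),
                perSecond(bytes, elapsedMillis));
    }
}
```

The rates are computed on demand rather than stored, so a reader of the JSON always sees values consistent with the counters at that moment.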

Please consider pre-creating (hourly/automatic) buckets in the driver for the earliest-latest span, and binning the data processed by the tasks into these buckets.
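
One possible reading of the bucket idea is sketched below: buckets are created up front, one per hour across the earliest-latest span, and each processed record is then counted into the bucket that contains its timestamp. The class name, the use of epoch seconds, and the choice to reject timestamps outside the span are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical hourly bucket table; timestamps are epoch seconds.
final class HourlyBucketsSketch {
    static final long HOUR_SECONDS = 3600L;

    // Bucket start time -> number of records binned into that hour.
    private final TreeMap<Long, Long> counts = new TreeMap<>();

    // Pre-create one empty bucket per hour covering [earliest, latest].
    HourlyBucketsSketch(long earliest, long latest) {
        long first = Math.floorDiv(earliest, HOUR_SECONDS) * HOUR_SECONDS;
        for (long start = first; start <= latest; start += HOUR_SECONDS) {
            counts.put(start, 0L);
        }
    }

    // Bin one record; returns false if its hour was not pre-created.
    boolean add(long epochSeconds) {
        long start = Math.floorDiv(epochSeconds, HOUR_SECONDS) * HOUR_SECONDS;
        return counts.computeIfPresent(start, (key, value) -> value + 1L) != null;
    }

    // Read-only view of the buckets, ordered by start time.
    Map<Long, Long> counts() {
        return Collections.unmodifiableMap(counts);
    }
}
```

Creating the buckets up front means hours with no data still appear with a count of zero, which makes gaps in the span visible.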

Please define the JSON schema once the initial development is done.

Related issues
teragrep/ajs_01#70 depends on this

Additional context
See the example in #74, and close this issue when implemented.

kortemik · 2024-08-28T12:07:51Z

This feature replaces "metricsLogger" in DPLParserCatalystContext on pth_10.

kortemik added the enhancement label Aug 26, 2024

kortemik assigned 51-code Aug 26, 2024

51-code assigned eemhu and unassigned 51-code Sep 16, 2024
