[Discussion] Metrics API design. #2

kevinten10 · 2021-09-09T01:18:01Z

Goal

Design a Metrics API for application-level indicator monitoring.

Progress

We can start by reviewing the references below and then define a first version of the API.

Reference

dapr/dapr#2817
mosn/layotto#90
dapr/dapr#2988
dapr/dapr#100
dapr/dapr#3449
dapr/dapr#3455
dapr/dapr#3549
mosn/layotto#214

JasmineJ1230 · 2021-09-20T12:30:58Z

In most business scenarios, Event Logs, Digital Indexes and Action Execution Sequences are widely used in application monitoring. I think we should provide good support for these metric forms.

Here are some ideas for these functions.
Just a very simple sketch~ I hope these roughly defined APIs express my understanding and assumptions about this module.

1. Events

An Event Log marks the occurrence of a specified situation, which is often related to alarms.
An event does not need to hold much information. It should be light and simple. We can simply build an event with a specified name, which is unique within the current application, and an optional short description.
Perhaps something like this...

service Runtime {
  // log event.
  rpc OnEvent(OnEventRequest) returns (google.protobuf.Empty) {}
}

message Event {
    required string event_name = 1;
    optional string description = 2;
    // proto has no `long` type; int64 is the equivalent.
    required int64 timestamp = 3;
}
message OnEventRequest {
    required string app_id = 1;
    required Event event = 2;
}

Alarms can be set for specified events. Email notification is the most common way to handle an alarm. Users can also define their own handlers as an enhanced function if necessary.

service Runtime {
  // create an alarm for the specified event.
  rpc CreateEventAlarm(CreateEventAlarmRequest) returns (CreateEventAlarmResponse) {}

  // delete an alarm from the specified event.
  rpc DeleteEventAlarm(DeleteEventAlarmRequest) returns (google.protobuf.Empty) {}
}

message Alarm {
    optional string alarm_name = 1;
    // a repeated field takes no optional/required label.
    repeated string handlers = 2;
}
message EventAlarm {
    optional Alarm alarm = 1;
    optional string event_name = 2;
}
message CreateEventAlarmRequest {
    required string app_id = 1;
    required string event_name = 2;
    required string alarm_name = 3;
    repeated string handlers = 4;
}
message CreateEventAlarmResponse {
    required string app_id = 1;
    optional EventAlarm event_alarm = 2;
}

message DeleteEventAlarmRequest {
    required string app_id = 1;
    required string event_name = 2;
    required string alarm_name = 3;
}
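
To make the event and alarm flow above concrete, here is a minimal in-memory sketch in Python. It is not part of the proposal: `EventAlarmRegistry` and its methods are hypothetical names chosen for illustration, and handlers are modeled as plain callables rather than the handler name strings used in the proto sketch.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Event:
    """Mirrors the Event message: a name, an optional description, a timestamp."""
    event_name: str
    description: Optional[str] = None
    # Assumption: epoch milliseconds; the proposal does not fix a unit.
    timestamp: int = field(default_factory=lambda: int(time.time() * 1000))


Handler = Callable[[str, Event], None]  # called as handler(alarm_name, event)


class EventAlarmRegistry:
    """Hypothetical stand-in for OnEvent / CreateEventAlarm / DeleteEventAlarm."""

    def __init__(self) -> None:
        # (app_id, event_name) -> {alarm_name: [handlers]}
        self._alarms: Dict[Tuple[str, str], Dict[str, List[Handler]]] = defaultdict(dict)

    def create_event_alarm(self, app_id: str, event_name: str,
                           alarm_name: str, handlers: Sequence[Handler]) -> None:
        self._alarms[(app_id, event_name)][alarm_name] = list(handlers)

    def delete_event_alarm(self, app_id: str, event_name: str, alarm_name: str) -> None:
        self._alarms[(app_id, event_name)].pop(alarm_name, None)

    def on_event(self, app_id: str, event: Event) -> int:
        """Run every handler of every alarm set on this event; return how many ran."""
        fired = 0
        alarms = self._alarms.get((app_id, event.event_name), {})
        for alarm_name, handlers in alarms.items():
            for handler in handlers:
                handler(alarm_name, event)
                fired += 1
        return fired
```

An email handler would then be just one callable registered under an alarm name, and user-defined handlers plug in the same way.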

2. Digital Index.

A Digital Index describes how an application's performance changes over a period of time. Index data can be processed in different ways and serves well for further data analysis.

service Runtime {
  // create an index with its data type and processors.
  rpc CreateIndex(CreateIndexRequest) returns (CreateIndexResponse) {}

  // publish a data point for an index.
  rpc PublishIndexData(PublishIndexDataRequest) returns (google.protobuf.Empty) {}
}

message Index {
    optional string index_name = 1;
    optional string data_type = 2;
    repeated string processors = 3;
}
message CreateIndexRequest {
    required string app_id = 1;
    required string index_name = 2;
    required string data_type = 3;
    repeated string processors = 4;
}
message CreateIndexResponse {
    required string app_id = 1;
    optional Index index = 2;
}

message PublishIndexDataRequest {
    required string app_id = 1;
    required string index_name = 2;
    required string value = 3;
    // proto has no `long` type; int64 is the equivalent.
    required int64 timestamp = 4;
}
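
The `processors` field is left open in the sketch above. One possible reading, assuming each processor name maps to an aggregation over the published values, could look like this. The processor names (`sum`, `avg`, `max`) and the `IndexStore` class are hypothetical, values are parsed as floats even though the request carries them as strings with a separate `data_type`, and timestamps are ignored for brevity.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical processor names; the proposal does not define a fixed set.
PROCESSORS: Dict[str, Callable[[Sequence[float]], float]] = {
    "sum": sum,
    "avg": mean,
    "max": max,
}


class IndexStore:
    """In-memory stand-in for CreateIndex / PublishIndexData."""

    def __init__(self) -> None:
        self._processors: Dict[Tuple[str, str], List[str]] = {}
        self._values: Dict[Tuple[str, str], List[float]] = {}

    def create_index(self, app_id: str, index_name: str,
                     processors: Sequence[str]) -> None:
        unknown = [p for p in processors if p not in PROCESSORS]
        if unknown:
            raise ValueError(f"unknown processors: {unknown}")
        self._processors[(app_id, index_name)] = list(processors)
        self._values[(app_id, index_name)] = []

    def publish_index_data(self, app_id: str, index_name: str, value: str) -> None:
        # The request carries the value as a string; this sketch assumes it is numeric.
        self._values[(app_id, index_name)].append(float(value))

    def summarize(self, app_id: str, index_name: str) -> Dict[str, float]:
        """Apply every configured processor to the values published so far."""
        values = self._values[(app_id, index_name)]
        names = self._processors[(app_id, index_name)]
        return {name: PROCESSORS[name](values) for name in names}
```

Keeping processors in a name-to-function table means an index only has to store the names, which matches the `repeated string processors` field.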

Alarms can also be set on an index and triggered when its value reaches a specified threshold.

service Runtime {
  // create an alarm for the specified index.
  rpc CreateIndexAlarm(CreateIndexAlarmRequest) returns (CreateIndexAlarmResponse) {}

  // publish a data point for an index.
  rpc PublishIndexData(PublishIndexDataRequest) returns (google.protobuf.Empty) {}

  // delete an alarm from the specified index.
  rpc DeleteIndexAlarm(DeleteIndexAlarmRequest) returns (google.protobuf.Empty) {}
}

message IndexAlarm {
    optional string index_name = 1;
    optional Alarm alarm = 2;
    // perhaps a regular expression? or use structures with some pre-defined enums.
    optional string rule = 3;
}

message CreateIndexAlarmRequest {
    required string app_id = 1;
    required string index_name = 2;
    required string alarm_name = 3;
    repeated string handlers = 4;
    required string rule = 5;
}

message CreateIndexAlarmResponse {
    required string app_id = 1;
    optional IndexAlarm index_alarm = 2;
}

message DeleteIndexAlarmRequest {
    required string app_id = 1;
    required string index_name = 2;
    required string alarm_name = 3;
}
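
The comment on `rule` leaves its format open. Below is a sketch of the second option mentioned there (a structure with pre-defined enums instead of a free-form string), assuming a rule is a single comparison against a numeric threshold. `Comparison`, `ThresholdRule`, and `triggered` are hypothetical names, not part of the proposal.

```python
import operator
from dataclasses import dataclass
from enum import Enum


class Comparison(Enum):
    """Pre-defined comparison operators a rule may use."""
    GT = ">"
    GE = ">="
    LT = "<"
    LE = "<="


# Map each enum member to the function that evaluates it.
_OPS = {
    Comparison.GT: operator.gt,
    Comparison.GE: operator.ge,
    Comparison.LT: operator.lt,
    Comparison.LE: operator.le,
}


@dataclass(frozen=True)
class ThresholdRule:
    """Structured replacement for the free-form `rule` string."""
    comparison: Comparison
    threshold: float

    def triggered(self, value: float) -> bool:
        """True when the published value satisfies the comparison."""
        return _OPS[self.comparison](value, self.threshold)


# Example: raise the alarm when memory usage reaches 90 percent.
memory_rule = ThresholdRule(Comparison.GE, 90.0)
```

Compared with a regular expression, an enum-based rule can be validated when the alarm is created, at the cost of supporting only the comparisons that are defined up front.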

Although all the functions should be customizable, we can also provide easy access to common metric attributes, especially system indicators such as memory usage, CPU usage and so on.

3. Action Execution Sequence.

An Action Execution Sequence records in detail how a function was performed, which is useful for troubleshooting. It might be the most difficult form of metric logging.
To string the actions together, we need a unique id for the current sequence, and each request in the sequence must carry the same sequence id.

service Runtime {
  // start transaction, get the unique transaction id.
  rpc StartTransaction(StartTransactionRequest) returns (StartTransactionAlarmResponse) {}

  // record action in transaction.
  rpc RecordAction(RecordActionRequest) returns (google.protobuf.Empty) {}
}
message StartTransactionRequest {
    required string app_id = 1;
    required string transaction_name = 2;
}

message StartTransactionAlarmResponse {
    required string app_id = 1;
    optional string transaction_id = 2;
    optional string transaction_name = 3;
}

message RecordActionRequest {
    required string app_id = 1;
    required string transaction_id = 2;
    required string action_name = 3;
    // map fields take no optional/required label.
    map<string, string> action_details = 4;
    optional int64 timestamp = 5;
}
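
The id handling described above can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: `TransactionRecorder` and `sequence` are hypothetical names, the id is generated with `uuid4`, and `app_id` and `transaction_name` are accepted but not stored.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Action:
    action_name: str
    action_details: Dict[str, str]
    timestamp: int


class TransactionRecorder:
    """Hypothetical stand-in for StartTransaction / RecordAction."""

    def __init__(self) -> None:
        # transaction_id -> actions recorded under that id
        self._actions: Dict[str, List[Action]] = {}

    def start_transaction(self, app_id: str, transaction_name: str) -> str:
        """Open a new sequence and return its unique id."""
        transaction_id = uuid.uuid4().hex
        self._actions[transaction_id] = []
        return transaction_id

    def record_action(self, transaction_id: str, action_name: str,
                      action_details: Optional[Dict[str, str]] = None,
                      timestamp: Optional[int] = None) -> None:
        if transaction_id not in self._actions:
            raise KeyError(f"unknown transaction: {transaction_id}")
        if timestamp is None:
            # Assumption: epoch milliseconds when the caller gives no timestamp.
            timestamp = int(time.time() * 1000)
        self._actions[transaction_id].append(
            Action(action_name, dict(action_details or {}), timestamp))

    def sequence(self, transaction_id: str) -> List[str]:
        """Action names ordered by timestamp (stable for equal timestamps)."""
        ordered = sorted(self._actions[transaction_id], key=lambda a: a.timestamp)
        return [a.action_name for a in ordered]
```

Because every `record_action` call carries the id returned by `start_transaction`, actions reported out of order can still be reassembled into one sequence by sorting on the timestamp.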

Let's discuss the API design further. Looking forward to your reply~

kevinten10 · 2021-09-22T06:24:46Z

cool! perfect

Please give me a moment to understand your design.

kevinten10 · 2021-09-27T07:51:56Z

@JasmineJ1230 Can you pack these definitions into one proto file?

If you have time, can you provide Java implementations of these interfaces?

JasmineJ1230 · 2021-09-29T06:10:37Z

> @JasmineJ1230 Can you pack these definitions into one proto file?
>
> If you have time, can you provide Java implementations of these interfaces?

OK~ I will work out a more complete API design during the National Day holiday. (Perhaps 10/1~3?)

Let's discuss the API definition in more detail after that. I will contact you when there is any progress~

Also, once we have completed the API design, I think providing the Java implementations will not be difficult.

kevinten10 · 2021-09-29T06:16:51Z

I put your definitions above here: https://github.com/reactivegroup/cloud-runtimes-jvm/blob/feature/metrics/spec/proto.runtime.v1/Metrics.proto

You can also submit a proposal directly to Layotto.

pinxiong · 2021-09-30T03:20:37Z

We'd better refer to OpenTelemetry, which is already the publicly accepted standard for monitoring, tracing, and metrics.

github: https://github.com/open-telemetry

JasmineJ1230 · 2021-10-10T04:11:29Z

I have studied OpenTelemetry and finished some demos and experiments.
OpenTelemetry defines a set of reasonable models and APIs for telemetry data, which have been widely recognized by mainstream cloud vendors. If we follow its specifications and develop with its API and SDK, we can reduce a lot of migration costs at a later stage.
But OpenTelemetry is still under development, and some functions are immature and incomplete. Maybe we can discuss how and where to use OpenTelemetry in our development, so that we follow the mainstream standard while ensuring the stability of our project.

I put my learning report in #5, which can serve as a reference.

kevinten10 mentioned this issue Sep 9, 2021

[Cloud][Project] Design and development of the Capa Metric module reactivegroup/sigs#8
