-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add OTel metrics #98
Add OTel metrics #98
Conversation
core/metric.go
Outdated
name := fmt.Sprintf("%s.processed_block_height", namespaceRoot) | ||
callback := func(ctx context.Context, observer api.Int64Observer) error { | ||
observer.Observe(metricSrcProcessedBlockHeight.Load(), api.WithAttributes(attribute.String("chain", "src"))) | ||
observer.Observe(metricDstProcessedBlockHeight.Load(), api.WithAttributes(attribute.String("chain", "dst"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding attributes expressing more detailed context?
Being ambiguous with chain:src
and chain:dst
only. A relayer process treats multiple src/dest pairs.
We may add attributes like chain_id:xxx
and direction:src|dest
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
core/metric.go
Outdated
|
||
// create the instrument "relayer.processed_block_height" | ||
name := fmt.Sprintf("%s.processed_block_height", namespaceRoot) | ||
callback := func(ctx context.Context, observer api.Int64Observer) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: How about writing callbacks more directly as follows?
- Define
func Observe(ctx context.Context, observer api.Observer)
onSyncHeaders
- Make
InitializeMetrics
to take aSyncHeaders
value and registerSyncHeaders.Observe
as callback
The advantage is avoiding global atomic variables metricXXX
and reducing codes treat them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your good suggestion! However there is a reason why I can't adopt this soon.
An instance of SyncHeaders
is created, used, and dropped for each relay cycle.
Therefore it is meaningless to observer SyncHeaders
to obtain metrics.
Not only SyncHeaders
but all other states in the relayer are also short-lived and not held longer than one relay cycle.
Therefore we have to add some "global variables" to store metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot to say that we basically want to keep the core structure of the relayer stateless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed my big mistake ... when I tried to move metric.go
to another directory.
metric.go
can be referred to by any other components in the relayer.
Therefore, to avoid cyclic reference, metric.go
itself should not depends on any types or functions in the relayer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The correct structure seems the following:
- The metrics subsystem (metric.go) keeps:
- The instance of
MeterProvider
- The instance of
Meter
- The instruments like
Counter
s andGauge
s- Since we want to easily list up all metrics exported by the relayer, I put them together.
- The instance of
- The other components (e.g. relayer core, tendermint module, mock prover) keeps:
- Original data needed to calculate final metrics values
- Logics to calculate final metrics values from original data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explaining! I have a question and a comment. (These aren't fixing requests. I agree the direction implementing core logics stateless.)
Question
Are you saying that SyncHeader
is not ALWAYS long-lived?
SyncHeader
seems long-lived structure when a relayer starts in service mode.
https://github.com/siburu/yui-relayer/blob/e6c5f9eca4fbde03c04218149776f068e948bb1d/core/service.go#L13
https://github.com/siburu/yui-relayer/blob/e6c5f9eca4fbde03c04218149776f068e948bb1d/core/naive-strategy.go#L65
It's OK that metrics are availbable only running in service mode.
Comment
We can also avoid cyclic reference by defining an interface like type SubjectI interface { Observe() }
at metrics.go. (like here)
Other types depend on SubjectI
. metrics.go will keep independence from other go files/packages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mizuochik
Right. I was wrong about the lifespan of SyncHeader
.
Please forget the 3 comments that start from #98 (comment).
My current position is #98 (comment).
I think all the gauges that I added are more suitable for synchronous gauge, but the current release of OTel-go doesn't support synchronous gauge yet.
Therefore I defined our own implementation of synchronous gauge in metrics/sync_gauge.go
.
As a result, I didn't need to implement Observe
in SyncHeader
or NaiveStrategy
.
I believe synchronous gauge will be supported by OTel-go in the near future.
open-telemetry/opentelemetry-specification#3540
At that time, we can easily replace metrics/sync_gauge.go
with the official implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I agreed!
SyncGauge
seems suitable to me as well.
core/metric.go
Outdated
|
||
// create the instrument "relayer.backlog_size" | ||
name = fmt.Sprintf("%s.backlog_size", namespaceRoot) | ||
callback = func(ctx context.Context, observer api.Int64Observer) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: Same as above.
We may move this callback and the backlog variables to NaiveStrategy
's method and fields.
core/metric.go
Outdated
name = fmt.Sprintf("%s.backlog_size", namespaceRoot) | ||
callback = func(ctx context.Context, observer api.Int64Observer) error { | ||
observer.Observe(metricSrcBacklogSize.Load(), api.WithAttributes(attribute.String("chain", "src"))) | ||
observer.Observe(metricDstBacklogSize.Load(), api.WithAttributes(attribute.String("chain", "dst"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as processed_block_height
.
core/metric.go
Outdated
|
||
// create the instrument "relayer.backlog_oldest_height" | ||
name = fmt.Sprintf("%s.backlog_oldest_height", namespaceRoot) | ||
callback = func(ctx context.Context, observer api.Int64Observer) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: Same as above.
This callback also able to be move to NaiveStrategy
.
In summary, observable callbacks may be implemented as follows.
metrics.go
//
// Measurement subject
//
type SubjectI struct {
Observe(ctx context.Context, observer metricapi.Observer) error
}
func InitializeMetrics(....., subjects SubjectI...) error { // Taking measurement subjects (syncHeaders, naiveStrategy, etc...)
// Initialize metric providers, instruments, etc...
:
// Register measurement subjects (syncHeaders, naiveStrategy, etc...)
for _, subject := range subjects {
if _, err := meter.RegisterCallback(
subject.Observe,
ProcessedBlockHeightGauge,
BacklogSizeGauge,
BacklogOldestHeightGauge,
..... // rest observable instruments
); err != nil {
panic(err)
}
}
:
}
headers.go
func (sh *syncHeaders) Observe(ctx context.Context, observer api.Observer) error {
:
observer.Observe(ProcessedBlockHeightGauge, .....)
:
}
naive-strategy.go
type NaiveStrategy struct {
:
// backlogs may have to be keep here
srcBacklog PacketInfoList
destBacklog PacketInfoList
}
:
// The below callback taking generic `api.Observer` (not like `api.Int64Observer`)
// handles *multiple* observable gauges at once.
func (st *NaiveStrategy) Observe(ctx context.Context, observer api.Observer) error {
:
observer.Observe(BacklogSizeGauge, ...)
:
observer.Observe(BacklogOldestHeightGauge, ...)
:
}
func (st NaiveStrategy) UnrelayedPackets(src, dst *ProvableChain, sh SyncHeaders, includeRelayedButUnfinalized bool) (*RelayPackets, error) {
:
ReceivePacketsFinalizedCounter.Add(1) // This counter is non-observable, so increment in main relayer logics directly
:
}
core/metric.go
Outdated
instProcessedBlockHeight api.Int64ObservableGauge | ||
instBacklogSize api.Int64ObservableGauge | ||
instBacklogOldestHeight api.Int64ObservableGauge | ||
instReceivePacketsFinalized api.Int64Counter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: Naming xxxGauge
and xxxCounter
may be better.
Gauge
and Counter
are domain-specific terms around observability, so better express the intent of variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
core/metric.go
Outdated
instReceivePacketsFinalized.Add(ctx, 1, api.WithAttributes(attribute.String("chain", "dst"))) | ||
} | ||
} | ||
metricDstBacklog = newDstBacklog |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any QA or tests for the above logic?
core/metric.go
Outdated
name = fmt.Sprintf("%s.backlog_oldest_height", namespaceRoot) | ||
callback = func(ctx context.Context, observer api.Int64Observer) error { | ||
observer.Observe(metricSrcBacklogOldestHeight.Load(), api.WithAttributes(attribute.String("chain", "src"))) | ||
observer.Observe(metricDstBacklogOldestHeight.Load(), api.WithAttributes(attribute.String("chain", "dst"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as processed_block_height
.
core/metric.go
Outdated
go func() { | ||
mux := http.NewServeMux() | ||
mux.Handle("/metrics", promhttp.Handler()) | ||
if err := http.ListenAndServe(addr, mux); err != nil && err != http.ErrServerClosed { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the above handler stop, the entire process never stops. Metrics will be lost even if the process is alive.
We want relayer processes stopping when metrics become unavailable.
Cloud infrastructures like K8s heal those crashed processes by restarting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the error handling here to panic on any types of errors.
https://github.com/hyperledger-labs/yui-relayer/pull/98/files#diff-d715f391f2ffa038c3167bfea1adf741c1f5e39f08349fbbacab780a594146a4R101
Signed-off-by: Masanori Yoshida <masanori.yoshida@datachain.jp>
Signed-off-by: Masanori Yoshida <masanori.yoshida@datachain.jp>
Signed-off-by: Masanori Yoshida <masanori.yoshida@datachain.jp>
Signed-off-by: Masanori Yoshida <masanori.yoshida@datachain.jp>
Signed-off-by: Masanori Yoshida <masanori.yoshida@datachain.jp>
Signed-off-by: Masanori Yoshida <masanori.yoshida@datachain.jp>
Signed-off-by: Masanori Yoshida <masanori.yoshida@datachain.jp>
Signed-off-by: Masanori Yoshida <masanori.yoshida@datachain.jp>
9029439
to
f1d5fe5
Compare
Signed-off-by: Masanori Yoshida <masanori.yoshida@datachain.jp>
Signed-off-by: Masanori Yoshida <masanori.yoshida@datachain.jp>
Signed-off-by: Masanori Yoshida <masanori.yoshida@datachain.jp>
… down Signed-off-by: Masanori Yoshida <masanori.yoshida@datachain.jp>
Signed-off-by: Masanori Yoshida <masanori.yoshida@datachain.jp>
@siburu Thanks for fixing. I commented. LGTM about the others! |
Signed-off-by: Masanori Yoshida <masanori.yoshida@datachain.jp>
…rometheus-addr` field from the global config Signed-off-by: Masanori Yoshida <masanori.yoshida@datachain.jp>
Signed-off-by: Masanori Yoshida <masanori.yoshida@datachain.jp>
meterProvider *metric.MeterProvider | ||
meter api.Meter | ||
|
||
ProcessedBlockHeightGauge *Int64SyncGauge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@siburu
It would be helpful to write a succinct description for each metric here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, I noticed that they are written in the initialization of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but I should move them to here! Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM👍
No description provided.