Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Miscellaneous traces and counters #2874

Merged
merged 2 commits into from
Jan 27, 2021
Merged

Miscellaneous traces and counters #2874

merged 2 commits into from
Jan 27, 2021

Conversation

newhoggy
Copy link
Contributor

@newhoggy newhoggy commented Jan 15, 2021

This PR tracks the work done for IntersectMBO/cardano-node#2261.

There is also work being done on the cardano-node repository for the same ticket: IntersectMBO/cardano-node#2289

@newhoggy newhoggy changed the base branch from master to fix-lint-warnings January 15, 2021 10:09
@newhoggy newhoggy force-pushed the fix-lint-warnings branch 2 times, most recently from 25ab7c7 to 6a9c17c Compare January 15, 2021 10:25
@newhoggy newhoggy force-pushed the miscellaneous-traces branch 5 times, most recently from 545bb2a to 4aa347b Compare January 18, 2021 09:06
@nfrisby
Copy link
Contributor

nfrisby commented Jan 19, 2021

@newhoggy I think this is for IntersectMBO/cardano-node#2261, is that right?

The existing "metric" pattern seems to do the accumulation of counters in cardano-node:Cardano.Tracing.Metrics. So the Consensus Layer should just emit the "I just now served a block" event, and the downstream "metric" will interpret that by atomically incrementing a counter. And so on for the similar metrics.

@nfrisby
Copy link
Contributor

nfrisby commented Jan 19, 2021

As you noticed, there is no "I just now served a block" event currently emitted. However, we do trace every mini protocol message sent/received, via the following tracers.

https://github.com/input-output-hk/ouroboros-network/blob/389a00b5c2b89b48aa98a96ca10f72f085b6e32c/ouroboros-consensus/src/Ouroboros/Consensus/Network/NodeToNode.hs#L319-L326

And they can be set here. (I think you can ignore the Serialised variants. I don't yet know what TxSub2 is about.)

https://github.com/input-output-hk/ouroboros-network/blob/389a00b5c2b89b48aa98a96ca10f72f085b6e32c/ouroboros-consensus/src/Ouroboros/Consensus/Node.hs#L122-L126

So in a sense the metrics could use those. But maybe more precise events could should be added for some extra stability, even in the "power user" tracer event API. @dcoutts, what do you think?

@nfrisby
Copy link
Contributor

nfrisby commented Jan 19, 2021

Are you counting roll forwards and rollbacks for the sake of https://jira.iohk.io/browse/CAD-2371? If so, I think you can instead use the existing SwitchedToAFork event. I elaborated in my comments on that Jira Issue -- I think it has sufficient info for that task.

@newhoggy
Copy link
Contributor Author

newhoggy commented Jan 19, 2021

This PR is actually for https://jira.iohk.io/browse/CAD-2487.

I understood roll forward to mean a header was served in chain sync. Is that an accurate interpretation or am I way off base?

@newhoggy
Copy link
Contributor Author

I rebased the PR to an earlier point in history so I don't have to refactor anything when working on cardano-node.

@nfrisby
Copy link
Contributor

nfrisby commented Jan 19, 2021

I understood roll forward to mean a header was served in chain sync. Is that an accurate interpretation or am I way off base?

I think that's accurate, yes. (My suggestion of SwitchedToAFork above was because I thought this PR might also relate to CAD-2371.)

The only caveat to that interpretation is that we might send the same header to the same peer multiple times. I think this is ultimately a caveat on the wording of the Issue IntersectMBO/cardano-node#2261: it's just slightly ambiguous.

My guess is that these metrics should indeed count each send of a repeated header, in which case I agree with your interpretation: every RollForward message contains exactly one block header. (A RollBackward includes neither a header nor a block -- it's just a "point".)

@newhoggy newhoggy force-pushed the miscellaneous-traces branch 2 times, most recently from 4364473 to 9373997 Compare January 20, 2021 05:26
@newhoggy newhoggy changed the base branch from fix-lint-warnings to master January 20, 2021 06:21
@newhoggy newhoggy marked this pull request as ready for review January 20, 2021 09:47
@newhoggy newhoggy requested a review from coot January 20, 2021 09:47
@coot
Copy link
Contributor

coot commented Jan 26, 2021

@nfrisby that's a good point. What I am worried about is that when setting logging output off by passing nullTracer (which is possible though node configuration), one will accidentally remove metrics or vice versa. Also tracing is causing quite a bit cpu usage, if we turn them more of them on this will cause more problems. I don't mind if there are logging tracers that are used for statistics as well, I just prefer the default to be keep them separate, and if not put them on the logging side.

@newhoggy newhoggy force-pushed the miscellaneous-traces branch 2 times, most recently from 7f93791 to b45d148 Compare January 26, 2021 11:07
@newhoggy newhoggy force-pushed the miscellaneous-traces branch 2 times, most recently from 6356302 to 463fa09 Compare January 26, 2021 23:16
@newhoggy newhoggy changed the title Miscellaneous traces Miscellaneous traces and counters Jan 26, 2021
@nfrisby
Copy link
Contributor

nfrisby commented Jan 27, 2021

What I am worried about is that when setting logging output off by passing nullTracer (which is possible though node configuration), one will accidentally remove metrics or vice versa.

Ah, I see. That seems like a good reason to have two different inputs in the Consensus Layer's interface (say, "loggingTracers" vs "metricTracers" or maybe "countingTracers") , which at some point in the Consensus Layer implementation get combined via <>. They could reuse the same event type, which would avoid the concern I shared above about having to classify each event.

That distinction/duplication could be reflected in the -node UX too.

@newhoggy
Copy link
Contributor Author

I've separated the counters from the tracers. How does it look now?

@newhoggy newhoggy force-pushed the miscellaneous-traces branch 5 times, most recently from 94a3ebd to 053da54 Compare January 27, 2021 04:28
@newhoggy newhoggy force-pushed the miscellaneous-traces branch 2 times, most recently from 88ad9eb to 9a4ce52 Compare January 27, 2021 12:09
@deepfire
Copy link
Contributor

@newhoggy, @nfrisby, I'm afraid that the Counter abstraction is making me feel ill at ease -- for several reasons:

  1. It violates the principle of separation of concerns that the contravariant tracers establish -- the library user just posts the information -- the policy (interpretation) happens elsewhere.

  2. The separation of concerns allows you to concentrate the policy definition -- whereas if you violate it, you disperse that definition across all source points.

  3. Another consequence is that tomorrow you can see someone else willing to enrich that trace -- and it will no longer be a Counter -- it already barely is (are two counters bundled together still a counter? Should it be called Counters instead?).

cc @dcoutts

@newhoggy
Copy link
Contributor Author

newhoggy commented Jan 27, 2021

There is concern about getting the abstraction right and the refactoring adds downstream refactoring risk.

Choosing the right abstraction for handling counters and tracers separately seems to be the sole remaining concern.

To make progress on IntersectMBO/cardano-node#2261, I will back out the refactoring changes and merge.

Then I'll create a new PR for the refactoring where more time can be set aside for careful consideration.

@newhoggy
Copy link
Contributor Author

I've moved the refactoring here: #2902

@newhoggy
Copy link
Contributor Author

bors merge

@iohk-bors
Copy link
Contributor

iohk-bors bot commented Jan 27, 2021

@iohk-bors iohk-bors bot merged commit ed24495 into master Jan 27, 2021
@iohk-bors iohk-bors bot deleted the miscellaneous-traces branch January 27, 2021 23:12
@nfrisby
Copy link
Contributor

nfrisby commented Jan 28, 2021

I like the decision to spit the PR into two separate decisions design steps. 👍

Unfortunately, just to clarify for future reference, it looks like Ouroboros.Network.Counter was still included in this PR and used in Ouroboros.Network.TxSubmission.Inbound. It's all local, so not a huge problem.

But if the work on the follow-up PR #2902 doesn't proceed very quickly, let's go ahead and remove that essentially unused Counter stuff in a small fixup PR. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants