Context building slog sender #10300

usmanmani1122 · 2024-10-20T12:55:29Z

Description

Adds a slog sender which builds various contexts along the way and reports them along with the slogs, for better log querying and identification.

Security Considerations

None

Scaling Considerations

This uses JSON file storage

Documentation Considerations

This is a new slogger which can be opted into

Testing Considerations

This will be deployed on testnets (already deployed on one of the testnets and log link is added in a comment below)

Upgrade Considerations

This can be configured on existing deployments by bumping the telemetry package

cloudflare-workers-and-pages · 2024-10-20T12:56:06Z

Deploying agoric-sdk with Cloudflare Pages

Latest commit:	`3ddcc6d`
Status:	✅ Deploy successful!
Preview URL:	https://7204180f.agoric-sdk.pages.dev
Branch Preview URL:	https://usman-context-aware-slogs.agoric-sdk.pages.dev

usmanmani1122 · 2024-10-22T11:09:27Z

Grafana logs link

mhofman

This looks like a really good start!

Besides the specific comments there are 2 changes I'd like to see:

I am really skeptical of using a SQLite DB for persisting a simple JSON object. Can we just sync write as a JSON serialized value to a file? No need to fsync either, I don't care about failure recovery in this case.
Can we extract the function that does transformation from a slog event into a log record into a separate module that doesn't depend on otel stuff? I believe @warner may be interested in using this context building logic on its own. It can be wrapped in a maker that takes the db object as power to persist / restore the trigger context. The rest is fully stateless.
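
One way to read the two requests above, sketched under assumptions: the names `makeContextBuilder`, `makeJsonStore`, `persistContext`, and `restoreContext` are hypothetical, and the event type and attribute key are illustrative rather than taken from the PR. Persistence is passed in as a power, so the context logic needs neither OpenTelemetry nor a database.

```javascript
// Hypothetical sketch: persistence is an injected power, so the context
// logic has no dependency on OpenTelemetry or on a particular store.
const makeContextBuilder = ({ persistContext, restoreContext }) => {
  // Restore whatever trigger context a previous process left behind.
  let triggerContext = restoreContext() || null;

  return slog => {
    if (slog.type === 'trigger-start') {
      triggerContext = { 'run.trigger.id': slog.id };
      persistContext(triggerContext);
    }
    // Attach the current context to every event.
    return { body: slog, attributes: { ...triggerContext } };
  };
};

// A store that serializes JSON through injected read/write functions. In
// Node these could wrap fs.readFileSync and fs.writeFileSync on one path;
// no fsync is attempted, since failure recovery is not a goal here.
const makeJsonStore = ({ read, write }) => ({
  restoreContext: () => {
    const text = read();
    return text ? JSON.parse(text) : null;
  },
  persistContext: context => write(JSON.stringify(context)),
});

// In-memory stand-in for the file, for illustration only.
let fileText = '';
const store = makeJsonStore({
  read: () => fileText,
  write: text => {
    fileText = text;
  },
});

const build = makeContextBuilder(store);
const record = build({ type: 'trigger-start', id: 'bridge-1' });
console.log(record.attributes); // { 'run.trigger.id': 'bridge-1' }
```

Because the store is the only stateful dependency, a second builder created from the same store picks up the persisted context, which is the restart behaviour the review asks for.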

packages/telemetry/src/context-aware-slog.js

mhofman · 2024-10-22T21:53:28Z

Oh yeah I'd also love to see this processing on an existing slog file from a mainnet follower! I have plenty around if needed.

packages/telemetry/src/context-aware-slog.js

usmanmani1122 · 2024-10-23T13:16:07Z

I am really skeptical of using a SQLite DB for persisting a simple JSON object. Can we just sync write as a JSON serialized value to a file? No need to fsync either, I don't care about failure recovery in this case.

Done

Can we extract the function that does transformation from a slog event into a log record into a separate module that doesn't depend on otel stuff? I believe @warner may be interested in using this context building logic on its own. It can be wrapped in a maker that takes the db object as power to persist / restore the trigger context. The rest is fully stateless.

Done. Separated a function logCreator for this

mhofman

Thanks for addressing the feedback so far. Some more refactoring suggestions. I also really want us to figure out this timestamp issue as I'd like to be able to upload old slogs to GCP with the original time the events were generated as the native timestamp of the event.

packages/telemetry/src/context-aware-slog.js

mhofman

A few more refactoring nits. Let's continue the timestamp discussion offline.

packages/telemetry/src/context-aware-slog.js

mhofman · 2024-10-24T13:45:32Z

packages/telemetry/src/context-aware-slog.js

+      case SLOG_TYPES.COSMIC_SWINGSET.END_BLOCK.FINISH:
+      case SLOG_TYPES.COSMIC_SWINGSET.COMMIT.START:
+      case SLOG_TYPES.COSMIC_SWINGSET.COMMIT.FINISH: {
+        assert(!!blockContext);
+        break;
+      }
+      case SLOG_TYPES.COSMIC_SWINGSET.AFTER_COMMIT_STATS: {
+        assert(!!blockContext && !triggerContext);


I think all these can be merged together

Suggested change

-      case SLOG_TYPES.COSMIC_SWINGSET.END_BLOCK.FINISH:
-      case SLOG_TYPES.COSMIC_SWINGSET.COMMIT.START:
-      case SLOG_TYPES.COSMIC_SWINGSET.COMMIT.FINISH: {
-        assert(!!blockContext);
-        break;
-      }
-      case SLOG_TYPES.COSMIC_SWINGSET.AFTER_COMMIT_STATS: {
-        assert(!!blockContext && !triggerContext);
+      case SLOG_TYPES.COSMIC_SWINGSET.END_BLOCK.FINISH:
+      case SLOG_TYPES.COSMIC_SWINGSET.BOOTSTRAP_BLOCK.FINISH:
+      case SLOG_TYPES.COSMIC_SWINGSET.COMMIT.START:
+      case SLOG_TYPES.COSMIC_SWINGSET.COMMIT.FINISH:
+      case SLOG_TYPES.COSMIC_SWINGSET.AFTER_COMMIT_STATS: {
+        assert(!!blockContext && !triggerContext);

packages/telemetry/src/context-aware-slog.js

mhofman

This looks good, but holding my approval until I review the output of a processed slog in case I missed anything. Is the grafana link up top still valid? I'll also look at the processed slog file you sent.

packages/telemetry/src/context-aware-slog.js

mhofman · 2024-10-25T19:57:53Z

packages/telemetry/src/otel-context-aware-slog.js

+    const [secondsStr, fractionStr] = String(timestamp).split('.');
+    const seconds = parseInt(secondsStr, 10);
+    const nanoSeconds = parseInt(
+      (fractionStr || String(0)).padEnd(9, String(0)),


I don't trust numbers, and want to be resilient in case the fractionStr is more than 9 chars already.

In general I prefer string literals where possible

Suggested change

-      (fractionStr || String(0)).padEnd(9, String(0)),
+      (fractionStr.substr(0, 9) || '').padEnd(9, '0'),

According to microtime.nowDouble documentation, it returns microseconds precision so I don't think we should ever get 9 decimal places right?

Precision of the source data doesn't mean anything for the representation as an IEEE 754 number. Those tend to do weird things often enough that I wouldn't bet on there being a max of 9 digits when stringified.
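
A minimal sketch of the string-based split discussed in this thread. The function name `splitTimestamp` is hypothetical. It differs from the suggestion above in two ways: it defaults the fraction to an empty string, because calling `.substr` on a missing fraction would throw for whole-second timestamps, and it uses `slice` in place of the legacy `substr`.

```javascript
// Sketch: split a "seconds.fraction" timestamp into [seconds, nanoseconds]
// using string slicing rather than floating-point arithmetic.
const splitTimestamp = timestamp => {
  // Default the fraction to '' so whole-second timestamps do not throw.
  const [secondsStr, fractionStr = ''] = String(timestamp).split('.');
  const seconds = parseInt(secondsStr, 10);
  // Truncate to 9 digits first, then right-pad to exactly 9 digits.
  const nanoSeconds = parseInt(fractionStr.slice(0, 9).padEnd(9, '0'), 10);
  return [seconds, nanoSeconds];
};

console.log(splitTimestamp('1729886273.25')); // [ 1729886273, 250000000 ]
console.log(splitTimestamp('1729886273')); // [ 1729886273, 0 ]
console.log(splitTimestamp('1729886273.1234567891234')); // [ 1729886273, 123456789 ]
```

Truncating before padding is what makes the result safe when the stringified number carries more than 9 fractional digits.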

mhofman

Well I knew I wouldn't get it right just by staring at the code generating slog events. After looking at the output, looks like we have some tweaks to make.

packages/telemetry/src/context-aware-slog.js

mhofman · 2024-10-25T21:53:45Z

packages/telemetry/src/context-aware-slog.js

+      }
+      // eslint-disable-next-line no-restricted-syntax
+      case SLOG_TYPES.COSMIC_SWINGSET.RUN.FINISH: {
+        triggerContext = null;


I made a mistake on the logic of the trigger context persistence. We need to conditionally persist the context here:

Suggested change

-        triggerContext = null;
+        persistContext(finalBody.remainingBeans > 0 ? null : triggerContext);
+        triggerContext = null;

Then we can simply restore the context on run start for runNum === 0 I think.

Actually I think we can persist an empty context instead (so you can keep null as missing).

Suggested change

-        triggerContext = null;
+        persistContext(finalBody.remainingBeans > 0 ? {} : triggerContext);
+        triggerContext = null;

Since we have no assertion on triggerContext here, it could be null at this point. So I think we should either do

persistContext(finalBody.remainingBeans || !triggerContext ? {} : triggerContext);

Or have an assertion:

assert(!!triggerContext); persistContext(finalBody.remainingBeans ? {} : triggerContext)

There should be a trigger context by this point. Let's add an assertion

I just realized that in the final version the > 0 part of the check was dropped, which causes the persisted context to always be empty.

So we are missing the context in case of negative remaining beans right?

We are missing the persisted context roughly always because the remaining beans is almost never 0.
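
The difference between the two checks can be seen in isolation. This is a sketch with hypothetical helper names, and it assumes `remainingBeans` is a bigint that goes negative when a run overshoots its budget; the thread itself only says the value is almost never 0.

```javascript
// Hypothetical helper isolating the decision discussed above. Assumes
// remainingBeans is a bigint that goes negative when a run overshoots.
const contextToPersist = (remainingBeans, triggerContext) =>
  remainingBeans > 0 ? {} : triggerContext;

// The variant without `> 0`: any non-zero value, including a negative
// one, is truthy, so the empty context is chosen almost every time.
const truthyVariant = (remainingBeans, triggerContext) =>
  remainingBeans ? {} : triggerContext;

const ctx = { 'run.trigger.type': 'bridge' };
console.log(contextToPersist(-5n, ctx) === ctx); // true
console.log(truthyVariant(-5n, ctx) === ctx); // false
console.log(contextToPersist(10n, ctx) === ctx); // false
```

With the explicit comparison, the trigger context survives exactly when the beans are exhausted, which is the case where the next block needs to restore it.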

packages/telemetry/src/context-aware-slog.js

packages/telemetry/src/otel-context-aware-slog.js

mhofman

I have verified that the processor seems to generate coherent context data.

Once the slicing of timestamp fraction is fixed, feel free to merge.

Also consider adding a context-aware-slog-file.js (or better name) that simply writes the contextualized data to a file. Here is what I used (to experiment with ingest-slog), but feel free to adapt (or add support for persisting context, probably by extracting the tool from the otel file into a standalone module).

import { makeFsStreamWriter } from '@agoric/internal/src/node/fs-stream.js';
import { makeContextualSlogProcessor } from './context-aware-slog.js';
import { serializeSlogObj } from './serialize-slog-obj.js';

/**
 * @import {MakeSlogSenderOptions as Options} from './index.js'
 */

/**
 * @param {Options} options
 */
export const makeSlogSender = async options => {
  const { CHAIN_ID, CONTEXTUAL_SLOGFILE } = options.env || {};
  if (!CONTEXTUAL_SLOGFILE)
    return console.warn(
      'Ignoring invocation of slogger "context-aware-slog-file" without the presence of "CONTEXTUAL_SLOGFILE"',
    );

  const stream = await makeFsStreamWriter(CONTEXTUAL_SLOGFILE);

  if (!stream) {
    return undefined;
  }

  const contextualSlogProcessor = makeContextualSlogProcessor({
    'chain-id': CHAIN_ID,
  });

  /**
   * @param {import('./context-aware-slog.js').Slog} slog
   */
  const slogSender = slog => {
    const contextualizedSlog = contextualSlogProcessor(slog);

    // eslint-disable-next-line prefer-template
    stream.write(serializeSlogObj(contextualizedSlog) + '\n').catch(() => {});
  };

  return Object.assign(slogSender, {
    forceFlush: () => stream.flush(),
    shutdown: () => stream.close(),
  });
};

packages/telemetry/src/otel-context-aware-slog.js

mhofman

I forgot about the return value of slogSender. Let's make sure we don't end up with unhandled rejections.

packages/telemetry/src/context-aware-slog-file.js

packages/telemetry/src/otel-context-aware-slog.js

refs: #10300

Incidental

Best reviewed commit-by-commit

Description

While verifying #10300 I ran into some errors and lack of stdout streaming features. This is what I arrived at to let me process some slog files manually.

Security Considerations

None

Scaling Considerations

None production impacting

This adds a new block throttle mechanism to the ingest-slog tool, while relaxing the line based throttle.

Documentation Considerations

None

Testing Considerations

Manually tested with the slog sender detailed in #10300 (review).

Upgrade Considerations

Affects chain software, but only the optional telemetry side. Not consensus affecting.

mhofman

Let's fix the usage of non bound methods.
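
The thread does not show the offending lines, so the following is only a generic illustration of the hazard, using a hypothetical `BufferedWriter` class in place of the real stream. A method passed around without its receiver loses `this`; wrapping it in an arrow function (as the file-writer snippet earlier does) or using `bind` keeps it attached.

```javascript
// Hypothetical stand-in for a stream object whose methods rely on `this`.
class BufferedWriter {
  constructor() {
    this.pending = [];
  }

  write(line) {
    this.pending.push(line);
  }

  // Returns how many lines were flushed.
  flush() {
    const count = this.pending.length;
    this.pending = [];
    return count;
  }
}

const stream = new BufferedWriter();
stream.write('a');

// Unbound: the method is detached from its receiver, so `this` is
// undefined inside flush (class bodies are always strict mode).
const unboundFlush = stream.flush;
let error;
try {
  unboundFlush();
} catch (e) {
  error = e;
}
console.log(error instanceof TypeError); // true

// Bound: either wrap in an arrow function or use bind.
const sender = {
  forceFlush: () => stream.flush(),
  shutdown: stream.flush.bind(stream),
};
console.log(sender.forceFlush()); // 1
```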

packages/telemetry/src/otel-context-aware-slog.js

closes: #10269

usmanmani1122 added 2 commits October 18, 2024 17:14

some cases with logging to file

2181df1

cases handled, remaining persistence and reporting to otel libraries

aba689a

usmanmani1122 self-assigned this Oct 20, 2024

usmanmani1122 and others added 7 commits October 21, 2024 12:17

Merge branch 'master' into usman/context-aware-slogs

b285bf3

persistence

36eeddf

otel reporting

99cbfa7

minor final slog changes and incorporate scaling considerations

f258b97

cleanup

1d40420

Merge branch 'master' into usman/context-aware-slogs

175affb

remove unnecessary async usage

7d4a9b4

usmanmani1122 changed the title ~~Usman/context aware slogs~~ Context building slog sender Oct 22, 2024

usmanmani1122 requested a review from mhofman October 22, 2024 11:23

usmanmani1122 marked this pull request as ready for review October 22, 2024 11:23

usmanmani1122 requested a review from a team as a code owner October 22, 2024 11:23

move context data to labels

696d937

mhofman requested changes Oct 22, 2024

usmanmani1122 added 2 commits October 23, 2024 16:13

address mathieu comments

43fec2d

merge master

5b432dd

mhofman reviewed Oct 23, 2024

packages/telemetry/src/context-aware-slog.js (outdated, resolved)

usmanmani1122 added 2 commits October 23, 2024 16:39

yarn

00bc5e4

remove async usage

14deff7

mhofman reviewed Oct 23, 2024

usmanmani1122 added 3 commits October 24, 2024 14:04

address mathieu comments 2.0

638d536

local runner script

67d577a

remove local runner script

bc8baa1

mhofman reviewed Oct 24, 2024

fix timestamp reporting

4cb1b6f

usmanmani1122 requested a review from mhofman October 25, 2024 19:37

mhofman reviewed Oct 25, 2024

usmanmani1122 added 3 commits October 26, 2024 14:25

address mathieu comments 4.0

f5d5722

revert temp file

6b6927f

Merge branch 'master' into usman/context-aware-slogs

8eea7b9

usmanmani1122 requested a review from mhofman October 26, 2024 09:27

usmanmani1122 and others added 2 commits October 27, 2024 12:27

address mathieu comments 4.1

f47d99c

Merge branch 'master' into usman/context-aware-slogs

d432dea

mhofman reviewed Oct 27, 2024

packages/telemetry/src/otel-context-aware-slog.js (resolved)

mhofman approved these changes Oct 28, 2024

mhofman mentioned this pull request Oct 28, 2024

Telemetry fixes #10343

Merged

usmanmani1122 added 2 commits October 28, 2024 14:28

address mathieu comments 4.0

d10fccd

Merge branch 'master' into usman/context-aware-slogs

e292196

mhofman reviewed Oct 28, 2024

packages/telemetry/src/context-aware-slog-file.js (outdated, resolved)

packages/telemetry/src/otel-context-aware-slog.js (outdated, resolved)

Merge branch 'master' into usman/context-aware-slogs

301e3bf

usmanmani1122 and others added 2 commits October 29, 2024 01:01

address mathien comments 6.0

4392faa

Merge branch 'master' into usman/context-aware-slogs

25723d6

mhofman reviewed Oct 28, 2024

packages/telemetry/src/otel-context-aware-slog.js (outdated, resolved)

packages/telemetry/src/otel-context-aware-slog.js (outdated, resolved)

packages/telemetry/src/otel-context-aware-slog.js (outdated, resolved)

bound methods

3ddcc6d

usmanmani1122 requested a review from mhofman October 28, 2024 21:04

mhofman added the automerge:squash Automatically squash merge label Oct 28, 2024

mhofman approved these changes Oct 28, 2024

mhofman force-pushed the usman/context-aware-slogs branch from 7046bbb to 3ddcc6d Compare October 28, 2024 23:42

mergify bot merged commit acbd3ae into master Oct 29, 2024
150 checks passed

mergify bot deleted the usman/context-aware-slogs branch October 29, 2024 00:19

usmanmani1122 mentioned this pull request Dec 6, 2024

fix(telemetry): Empty context persisted when remaining beans are negative after run finish #10635

Merged
