Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Context building slog sender #10300

Merged
merged 31 commits into from
Oct 29, 2024
Merged

Context building slog sender #10300

merged 31 commits into from
Oct 29, 2024

Conversation

usmanmani1122
Copy link
Contributor

@usmanmani1122 usmanmani1122 commented Oct 20, 2024

closes: #10269

Description

Adds a slog sender which will build various contexts along the way and report them along with the slogs for better logs querying and identification

Security Considerations

None

Scaling Considerations

This uses a json file storage

Documentation Considerations

This is a new slogger which can be opted into

Testing Considerations

This will be deployed on testnets (already deployed on one of the testnets and log link is added in a comment below)

Upgrade Considerations

This can be configured on existing deployments by bumping the telemetry package

@usmanmani1122 usmanmani1122 self-assigned this Oct 20, 2024
Copy link

cloudflare-workers-and-pages bot commented Oct 20, 2024

Deploying agoric-sdk with  Cloudflare Pages  Cloudflare Pages

Latest commit: 3ddcc6d
Status: ✅  Deploy successful!
Preview URL: https://7204180f.agoric-sdk.pages.dev
Branch Preview URL: https://usman-context-aware-slogs.agoric-sdk.pages.dev

View logs

@usmanmani1122 usmanmani1122 changed the title Usman/context aware slogs Context building slog sender Oct 22, 2024
@usmanmani1122
Copy link
Contributor Author

Grafana logs link

@usmanmani1122 usmanmani1122 requested a review from mhofman October 22, 2024 11:23
@usmanmani1122 usmanmani1122 marked this pull request as ready for review October 22, 2024 11:23
@usmanmani1122 usmanmani1122 requested a review from a team as a code owner October 22, 2024 11:23
Copy link
Member

@mhofman mhofman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a really good start!

Besides the specific comments there are 2 changes I'd like to see:

  • I am really skeptical of using a SQLite DB for persisting a simple JSON object. Can we just sync write as a JSON serialized value to a file? No need to fsync eother, I don't care about failure recovery in this case.
  • Can we extract the function that does transformation from a slog event into a log record into a separate module that doesn't depend on otel stuff? I believe @warner may be interested in using this context building logic on its own. I can be wrapped in a maker that takes the db object as power to persist / restore the trigger context. The rest is fully stateless.

packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
@mhofman
Copy link
Member

mhofman commented Oct 22, 2024

Oh yeah I'd also love to see this processing on an existing slog file from a mainnet follower! I have plenty around if needed.

@usmanmani1122
Copy link
Contributor Author

  • I am really skeptical of using a SQLite DB for persisting a simple JSON object. Can we just sync write as a JSON serialized value to a file? No need to fsync eother, I don't care about failure recovery in this case.

Done

  • Can we extract the function that does transformation from a slog event into a log record into a separate module that doesn't depend on otel stuff? I believe @warner may be interested in using this context building logic on its own. I can be wrapped in a maker that takes the db object as power to persist / restore the trigger context. The rest is fully stateless.

Done. Separated a function logCreator for this

Copy link
Member

@mhofman mhofman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the feedback so far. Some more refactoring suggestions. I also really want us to figure out this timestamp issue as I'd like to be able to upload old slogs to GCP with the original time the events were generated as the native timestamp of the event.

packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/context-aware-slog.js Show resolved Hide resolved
Copy link
Member

@mhofman mhofman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more refactoring nits. Let's continue the timestamp discussion offline.

packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/context-aware-slog.js Show resolved Hide resolved
packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
Comment on lines 158 to 165
case SLOG_TYPES.COSMIC_SWINGSET.END_BLOCK.FINISH:
case SLOG_TYPES.COSMIC_SWINGSET.COMMIT.START:
case SLOG_TYPES.COSMIC_SWINGSET.COMMIT.FINISH: {
assert(!!blockContext);
break;
}
case SLOG_TYPES.COSMIC_SWINGSET.AFTER_COMMIT_STATS: {
assert(!!blockContext && !triggerContext);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all these can be merged together

Suggested change
case SLOG_TYPES.COSMIC_SWINGSET.END_BLOCK.FINISH:
case SLOG_TYPES.COSMIC_SWINGSET.COMMIT.START:
case SLOG_TYPES.COSMIC_SWINGSET.COMMIT.FINISH: {
assert(!!blockContext);
break;
}
case SLOG_TYPES.COSMIC_SWINGSET.AFTER_COMMIT_STATS: {
assert(!!blockContext && !triggerContext);
case SLOG_TYPES.COSMIC_SWINGSET.END_BLOCK.FINISH:
case SLOG_TYPES.COSMIC_SWINGSET.BOOTSTRAP_BLOCK.FINISH:
case SLOG_TYPES.COSMIC_SWINGSET.COMMIT.START:
case SLOG_TYPES.COSMIC_SWINGSET.COMMIT.FINISH:
case SLOG_TYPES.COSMIC_SWINGSET.AFTER_COMMIT_STATS: {
assert(!!blockContext && !triggerContext);

packages/telemetry/src/context-aware-slog.js Show resolved Hide resolved
packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
@usmanmani1122 usmanmani1122 requested a review from mhofman October 25, 2024 19:37
Copy link
Member

@mhofman mhofman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, but holding my approval until I review the output of aa processed slog in case I missed anything. Is the grafana link up top still valid? I'll also look at the processed slog file you sent

packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
const [secondsStr, fractionStr] = String(timestamp).split('.');
const seconds = parseInt(secondsStr, 10);
const nanoSeconds = parseInt(
(fractionStr || String(0)).padEnd(9, String(0)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't trust numbers, and want to be resilient in case the fractionStr is more than 9 chars already.

In general I prefer string literals where possible

Suggested change
(fractionStr || String(0)).padEnd(9, String(0)),
(fractionStr.substr(0, 9) || '').padEnd(9, '0'),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to microtime.nowDouble documentation, it returns microseconds precision so I don't think we should ever get 9 decimal places right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Precision of the source data doesn't mean anything for the representation as an IEEE 754 number. Those tend to do weird things often enough that I wouldn't bet on there being a max of 9 digits when stringified.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

@mhofman mhofman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I knew I wouldn't get it right just by staring at the code generating slog events. After looking at the output, looks like we have some tweaks to make.

packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
}
// eslint-disable-next-line no-restricted-syntax
case SLOG_TYPES.COSMIC_SWINGSET.RUN.FINISH: {
triggerContext = null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a mistake on the logic of the trigger context persistence. We need to conditionally persist the context here:

Suggested change
triggerContext = null;
persistContext(finalBody.remainingBeans > 0 ? null : triggerContext);
triggerContext = null;

Then we can simply restore the context on run start for runNum === 0 I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think we can persist an empty context instead (so you can keep null as missing).

Suggested change
triggerContext = null;
persistContext(finalBody.remainingBeans > 0 ? {} : triggerContext);
triggerContext = null;

Copy link
Contributor Author

@usmanmani1122 usmanmani1122 Oct 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have no assertion on triggerContext here, it could be null at this point. So I think we should either do

persistContext(finalBody.remainingBeans || !triggerContext ? {} : triggerContext);

Or have an assertion:

assert(!!triggerContext);
persistContext(finalBody.remainingBeans ? {} : triggerContext)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a trigger context by this point. Let's add an assertion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized that in the final version the > 0 part of the check was dropped, which causes the persisted context to always be empty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we are missing the context in case of negative remaining beans right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are missing the persisted context roughly always because the remaining beans is almost never 0.

packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/context-aware-slog.js Show resolved Hide resolved
packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/context-aware-slog.js Outdated Show resolved Hide resolved
@usmanmani1122 usmanmani1122 requested a review from mhofman October 26, 2024 09:27
Copy link
Member

@mhofman mhofman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have verified that the processor seem to generate coherent context data.

Once the slicing of timestamp fraction is fixed, feel free to merge.

Also consider adding a context-aware-slog-file.js (or better name) that simply writes the contextualized data to a file. Here is what I used (to experiment with ingest-slog), but feel free to adapt (or add support for persisting context, probably by extracting the tool from the otel file into a standalone module).

import { makeFsStreamWriter } from '@agoric/internal/src/node/fs-stream.js';
import { makeContextualSlogProcessor } from './context-aware-slog.js';
import { serializeSlogObj } from './serialize-slog-obj.js';

/**
 * @import {MakeSlogSenderOptions as Options} from './index.js'
 */

/**
 * @param {Options} options
 */
export const makeSlogSender = async options => {
  const { CHAIN_ID, CONTEXTUAL_SLOGFILE } = options.env || {};
  if (!CONTEXTUAL_SLOGFILE)
    return console.warn(
      'Ignoring invocation of slogger "context-aware-slog-file" without the presence of "CONTEXTUAL_SLOGFILE"',
    );

  const stream = await makeFsStreamWriter(CONTEXTUAL_SLOGFILE);

  if (!stream) {
    return undefined;
  }

  const contextualSlogProcessor = makeContextualSlogProcessor({
    'chain-id': CHAIN_ID,
  });

  /**
   * @param {import('./context-aware-slog.js').Slog} slog
   */
  const slogSender = slog => {
    const contextualizedSlog = contextualSlogProcessor(slog);

    // eslint-disable-next-line prefer-template
    stream.write(serializeSlogObj(contextualizedSlog) + '\n').catch(() => {});
  };

  return Object.assign(slogSender, {
    forceFlush: () => stream.flush(),
    shutdown: () => stream.close(),
  });
};

packages/telemetry/src/otel-context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/otel-context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/otel-context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/otel-context-aware-slog.js Outdated Show resolved Hide resolved
@mhofman mhofman mentioned this pull request Oct 28, 2024
Copy link
Member

@mhofman mhofman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot about the return value of slogSender. Let's make sure we don't end up with unhandled rejections.

packages/telemetry/src/context-aware-slog-file.js Outdated Show resolved Hide resolved
packages/telemetry/src/otel-context-aware-slog.js Outdated Show resolved Hide resolved
mergify bot added a commit that referenced this pull request Oct 28, 2024
refs: #10300

Incidental

Best reviewed commit-by-commit

## Description

While verifying #10300 I ran into some errors and lack of stdout streaming features. This is what I arrived at to let me process some slog files manually.

### Security Considerations

None

### Scaling Considerations

None production impacting

This adds a new block throttle mechanism to the ingest-slog tool, while relaxing the line based throttle.

### Documentation Considerations

None

### Testing Considerations

Manually tested with the slog sender detailed in #10300 (review).

### Upgrade Considerations

Affects chain software, but only the optional telemetry side. Not consensus affecting.
Copy link
Member

@mhofman mhofman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's fix the usage of non bound methods.

packages/telemetry/src/otel-context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/otel-context-aware-slog.js Outdated Show resolved Hide resolved
packages/telemetry/src/otel-context-aware-slog.js Outdated Show resolved Hide resolved
@usmanmani1122 usmanmani1122 requested a review from mhofman October 28, 2024 21:04
@mhofman mhofman added the automerge:squash Automatically squash merge label Oct 28, 2024
@mhofman mhofman force-pushed the usman/context-aware-slogs branch from 7046bbb to 3ddcc6d Compare October 28, 2024 23:42
@mergify mergify bot merged commit acbd3ae into master Oct 29, 2024
150 checks passed
@mergify mergify bot deleted the usman/context-aware-slogs branch October 29, 2024 00:19
Jorge-Lopes pushed a commit that referenced this pull request Oct 29, 2024
closes: #10269

## Description
Adds a slog sender which will build various contexts along the way and report them along with the slogs for better logs querying and identification

### Security Considerations
None

### Scaling Considerations
This uses a json file storage

### Documentation Considerations
This is a new slogger which can be opted into

### Testing Considerations
This will be deployed on testnets (already deployed on one of the testnets and log link is added in a comment below)

### Upgrade Considerations
This can be configured on existing deployments by bumping the telemetry package
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automerge:squash Automatically squash merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

context building slog sender
2 participants