Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

slogSender using positions in file instead of mmap #8889

Merged
merged 11 commits into from
Feb 14, 2024
Merged

Conversation

turadg
Copy link
Member

@turadg turadg commented Feb 10, 2024

refs: #8636

Description

On the path to Node 20 compatibility, we need to drop the bufferfromfile dependency. (It doesn't work in Node 20 and fixing it is prohibitive: agoric-labs/BufferFromFile#2). We decided to give on on mmap and just use file offsets.

This refactors flight-recorder.js to abstract the file IO and implements an alternate IO using sync FS methods.

Then it changes the code using the mmap-ed IO to use the fs IO.

Finally it removes the mmap-ed version. The factoring could be simplified more after that but imo it helps to have the separate concerns in separate functions.

Security Considerations

n/a

Scaling Considerations

Writes are async. Clients can forceFlush when it matters. The tests do this to ensure they're reading after the writes.

Documentation Considerations

Nothing relevant.

Testing Considerations

I think the existing test coverage is enough.

Upgrade Considerations

n/a

@turadg turadg requested review from mhofman and michaelfig February 10, 2024 01:44
Copy link
Member

@michaelfig michaelfig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach and architecture looks good!

I don't know what your plan is for what you want to do before you take this PR out of draft, so please forgive me if I'm pointing out what you consider to be already obvious.

packages/telemetry/src/flight-recorder.js Show resolved Hide resolved
packages/telemetry/src/flight-recorder.js Outdated Show resolved Hide resolved
packages/telemetry/src/flight-recorder.js Outdated Show resolved Hide resolved
@turadg
Copy link
Member Author

turadg commented Feb 10, 2024

I don't know what your plan is for what you want to do before you take this PR out of draft, so please forgive me if I'm pointing out what you consider to be already obvious.

To get it out of draft I want:

  • CI passing
  • some input on whether it's okay to use synchronous IO everywhere
  • input on a test plan. I've never used this tool so I don't know all its uses and whether CI has enough coverage

@turadg turadg requested a review from michaelfig February 12, 2024 17:12
@mhofman
Copy link
Member

mhofman commented Feb 12, 2024

We discussed using async IO but slogSender itself is synchronous so I don't see a safe way to make it async without changing the slogSender API. If that's advised I'll do that as well.

haven't reviewed or read any comment yet, but while the send is sync, there is an async flush operation which can be used to guarantee previous writes succeeded.

@turadg
Copy link
Member Author

turadg commented Feb 12, 2024

We discussed using async IO but slogSender itself is synchronous so I don't see a safe way to make it async without changing the slogSender API. If that's advised I'll do that as well.

haven't reviewed or read any comment yet, but while the send is sync, there is an async flush operation which can be used to guarantee previous writes succeeded.

Thanks for pointing that out. I'm convinced I can make the writes async without any API changes. But it does add a bit of work so please let me know whether you think it's necessary.

@mhofman
Copy link
Member

mhofman commented Feb 12, 2024

Thanks for pointing that out. I'm convinced I can make the writes async without any API changes. But it does add a bit of work so please let me know whether you think it's necessary.

I am slightly concerned by the perf impact of the sync fs API for something enabled by default by all validators. I would not be as concerned if the slog sender agent defaulted to a subprocess, which we should consider doing instead. If not, we do have utilities around to build async queues.

@turadg turadg marked this pull request as ready for review February 12, 2024 22:11
Copy link
Member

@mhofman mhofman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a first pass reviewing commit-by-commit, spread over the day when I had time. As such some earlier comments may no longer be accurate.

The request for change is for the queueing mechanism, and the params of the writeJSON.

packages/telemetry/src/make-slog-sender.js Outdated Show resolved Hide resolved
packages/telemetry/src/flight-recorder.js Outdated Show resolved Hide resolved
packages/telemetry/src/flight-recorder.js Show resolved Hide resolved
@@ -31,10 +31,14 @@ const I_ARENA_START = 4 * BigUint64Array.BYTES_PER_ELEMENT;

const RECORD_HEADER_SIZE = BigUint64Array.BYTES_PER_ELEMENT;

/**
* Initializes a circular buffer with the given size, creating the buffer file if it doesn't exist or is not large enough.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI I've observed that with the current mmap implementation, resizing does not actually work (crashes if I remember, I forgot to write down my findings in #8425)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, more reason for this PR I suppose. Worth writing it there now?

packages/telemetry/src/flight-recorder.js Show resolved Hide resolved
packages/telemetry/src/flight-recorder.js Outdated Show resolved Hide resolved
packages/telemetry/src/flight-recorder.js Outdated Show resolved Hide resolved
packages/telemetry/src/flight-recorder.js Outdated Show resolved Hide resolved
packages/telemetry/src/ingest-slog-entrypoint.js Outdated Show resolved Hide resolved
packages/telemetry/src/flight-recorder.js Outdated Show resolved Hide resolved
@mhofman
Copy link
Member

mhofman commented Feb 13, 2024

I think the existing test coverage is enough.

For good measure I'll force integration as that causes more stress on the flight-recorder

@mhofman mhofman added the force:integration Force integration tests to run on PR label Feb 13, 2024
@turadg turadg force-pushed the 8636-fs-slogSender branch from a859475 to 3612f98 Compare February 13, 2024 16:33
@turadg turadg requested a review from mhofman February 13, 2024 16:33
Copy link
Member

@mhofman mhofman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great so far.

I'd like to re-do a full pass after a rebase, especially if you can split the commit titled feat: simple CircularBuffer with fs offsets

packages/telemetry/src/flight-recorder.js Show resolved Hide resolved
packages/telemetry/src/flight-recorder.js Show resolved Hide resolved
packages/telemetry/src/flight-recorder.js Outdated Show resolved Hide resolved
packages/telemetry/src/flight-recorder.js Show resolved Hide resolved
Comment on lines 242 to 227
fs.readSync(file.fd, outbuf, {
length: firstReadLength,
position: Number(readStart) + I_ARENA_START,
});
if (firstReadLength < outbuf.byteLength) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about this suggestion?

packages/telemetry/src/flight-recorder.js Outdated Show resolved Hide resolved
@turadg turadg force-pushed the 8636-fs-slogSender branch from 3612f98 to e7b2d46 Compare February 13, 2024 23:16
@turadg turadg requested a review from mhofman February 13, 2024 23:17
@turadg turadg added the automerge:rebase Automatically rebase updates, then merge label Feb 14, 2024
@mhofman mhofman removed the automerge:rebase Automatically rebase updates, then merge label Feb 14, 2024
Copy link
Member

@mhofman mhofman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for refactoring this!

Approving, but I removed the automerge label in case you want to address the read bytes / length checks.

Comment on lines -221 to +175
arena.set(record.subarray(0, firstWriteLength), Number(circEnd));
if (firstWriteLength < record.byteLength) {
// Write to the beginning of the arena.
arena.set(record.subarray(firstWriteLength, record.byteLength), 0);
}
header.setBigUint64(
I_CIRC_END,
(circEnd + BigInt(record.byteLength)) % arenaSize,
);

return writeRecord(record, firstWriteLength, circEnd);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I note that technically the arena header is now updated before the record is written, but that is not consequential

Comment on lines +130 to 122
// Allocate for the data and a header
const record = new Uint8Array(RECORD_HEADER_SIZE + data.byteLength);
// Set the data, after the header
record.set(data, RECORD_HEADER_SIZE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I never realized we were copying the data so many times, but that under the future improvement category.

Comment on lines 238 to 229
const bytesRead = fs.readSync(file.fd, outbuf, {
length: firstReadLength,
position: Number(readStart) + I_ARENA_START,
});

if (bytesRead < outbuf.byteLength) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so I went back to read the caller's logic, and basically we can use either bytesRead or firstReadLength here because we should technically be checking that bytesRead === firstReadLength in the first place.

Since we're pre-allocating a file the data should be available, but there are technically signal cases where the read may still return less data than was requested.

None of the read in this implementation seem to check the actual number of bytes read.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out bytesRead could come up short. This never had a way to handle that so I'll just assert that it didn't happen.

// TS saying options bag not available
0,
firstWriteLength,
Number(circEnd) + header.byteLength,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: to be consistent this should also be I_ARENA_START (or revert the others and use header.byteLength everywhere)

@turadg turadg force-pushed the 8636-fs-slogSender branch from e7b2d46 to 515ef12 Compare February 14, 2024 15:37
@turadg turadg added the automerge:rebase Automatically rebase updates, then merge label Feb 14, 2024
@mergify mergify bot merged commit 23a970d into master Feb 14, 2024
69 of 75 checks passed
@mergify mergify bot deleted the 8636-fs-slogSender branch February 14, 2024 20:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automerge:rebase Automatically rebase updates, then merge force:integration Force integration tests to run on PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants