slogSender using positions in file instead of mmap #8889

turadg · 2024-02-10T01:44:14Z

Description

On the path to Node 20 compatibility, we need to drop the bufferfromfile dependency. (It doesn't work in Node 20 and fixing it is prohibitive: agoric-labs/BufferFromFile#2). We decided to give on on mmap and just use file offsets.

This refactors flight-recorder.js to abstract the file IO and implements an alternate IO using sync FS methods.

Then it changes the code using the mmap-ed IO to use the fs IO.

Finally it removes the mmap-ed version. The factoring could be simplified more after that but imo it helps to have the separate concerns in separate functions.

Security Considerations

n/a

Scaling Considerations

Writes are async. Clients can forceFlush when it matters. The tests do this to ensure they're reading after the writes.

Documentation Considerations

Nothing relevant.

Testing Considerations

I think the existing test coverage is enough.

Upgrade Considerations

n/a

michaelfig

Approach and architecture looks good!

I don't know what your plan is for what you want to do before you take this PR out of draft, so please forgive me if I'm pointing out what you consider to be already obvious.

packages/telemetry/src/flight-recorder.js

turadg · 2024-02-10T02:39:29Z

I don't know what your plan is for what you want to do before you take this PR out of draft, so please forgive me if I'm pointing out what you consider to be already obvious.

To get it out of draft I want:

CI passing
some input on whether it's okay to use synchronous IO everywhere
input on a test plan. I've never used this tool so I don't know all its uses and whether CI has enough coverage

mhofman · 2024-02-12T17:16:18Z

We discussed using async IO but slogSender itself is synchronous so I don't see a safe way to make it async without changing the slogSender API. If that's advised I'll do that as well.

haven't reviewed or read any comment yet, but while the send is sync, there is an async flush operation which can be used to guarantee previous writes succeeded.

turadg · 2024-02-12T19:18:52Z

We discussed using async IO but slogSender itself is synchronous so I don't see a safe way to make it async without changing the slogSender API. If that's advised I'll do that as well.

haven't reviewed or read any comment yet, but while the send is sync, there is an async flush operation which can be used to guarantee previous writes succeeded.

Thanks for pointing that out. I'm convinced I can make the writes async without any API changes. But it does add a bit of work so please let me know whether you think it's necessary.

mhofman · 2024-02-12T19:43:41Z

Thanks for pointing that out. I'm convinced I can make the writes async without any API changes. But it does add a bit of work so please let me know whether you think it's necessary.

I am slightly concerned by the perf impact of the sync fs API for something enabled by default by all validators. I would not be as concerned if the slog sender agent defaulted to a subprocess, which we should consider doing instead. If not, we do have utilities around to build async queues.

mhofman

I did a first pass reviewing commit-by-commit, spread over the day when I had time. As such some earlier comments may no longer be accurate.

The request for change is for the queueing mechanism, and the params of the writeJSON.

packages/telemetry/src/make-slog-sender.js

packages/telemetry/src/flight-recorder.js

mhofman · 2024-02-12T17:52:44Z

packages/telemetry/src/flight-recorder.js

@@ -31,10 +31,14 @@ const I_ARENA_START = 4 * BigUint64Array.BYTES_PER_ELEMENT;

 const RECORD_HEADER_SIZE = BigUint64Array.BYTES_PER_ELEMENT;

+/**
+ * Initializes a circular buffer with the given size, creating the buffer file if it doesn't exist or is not large enough.


FYI I've observed that with the current mmap implementation, resizing does not actually work (crashes if I remember, I forgot to write down my findings in #8425)

Ah, more reason for this PR I suppose. Worth writing it there now?

packages/telemetry/src/flight-recorder.js

packages/telemetry/src/ingest-slog-entrypoint.js

packages/telemetry/src/flight-recorder.js

mhofman · 2024-02-13T01:16:30Z

I think the existing test coverage is enough.

For good measure I'll force integration as that causes more stress on the flight-recorder

mhofman

Looks great so far.

I'd like to re-do a full pass after a rebase, especially if you can split the commit titled feat: simple CircularBuffer with fs offsets

packages/telemetry/src/flight-recorder.js

mhofman · 2024-02-13T21:23:36Z

packages/telemetry/src/flight-recorder.js

+    fs.readSync(file.fd, outbuf, {
+      length: firstReadLength,
+      position: Number(readStart) + I_ARENA_START,
+    });
+    if (firstReadLength < outbuf.byteLength) {


What do you think about this suggestion?

packages/telemetry/src/flight-recorder.js

mhofman

Thanks for refactoring this!

Approving, but I removed the automerge label in case you want to address the read bytes / length checks.

mhofman · 2024-02-14T02:32:58Z

packages/telemetry/src/flight-recorder.js

-    arena.set(record.subarray(0, firstWriteLength), Number(circEnd));
-    if (firstWriteLength < record.byteLength) {
-      // Write to the beginning of the arena.
-      arena.set(record.subarray(firstWriteLength, record.byteLength), 0);
-    }
    header.setBigUint64(
      I_CIRC_END,
      (circEnd + BigInt(record.byteLength)) % arenaSize,
    );
+
+    return writeRecord(record, firstWriteLength, circEnd);


I note that technically the arena header is now updated before the record is written, but that is not consequential

mhofman · 2024-02-14T02:42:46Z

packages/telemetry/src/flight-recorder.js

+    // Allocate for the data and a header
    const record = new Uint8Array(RECORD_HEADER_SIZE + data.byteLength);
+    // Set the data, after the header
    record.set(data, RECORD_HEADER_SIZE);


I never realized we were copying the data so many times, but that under the future improvement category.

mhofman · 2024-02-14T02:52:21Z

packages/telemetry/src/flight-recorder.js

+    const bytesRead = fs.readSync(file.fd, outbuf, {
+      length: firstReadLength,
+      position: Number(readStart) + I_ARENA_START,
+    });
+
+    if (bytesRead < outbuf.byteLength) {


Ok so I went back to read the caller's logic, and basically we can use either bytesRead or firstReadLength here because we should technically be checking that bytesRead === firstReadLength in the first place.

Since we're pre-allocating a file the data should be available, but there are technically signal cases where the read may still return less data than was requested.

None of the read in this implementation seem to check the actual number of bytes read.

Thanks for pointing out bytesRead could come up short. This never had a way to handle that so I'll just assert that it didn't happen.

mhofman · 2024-02-14T02:57:57Z

packages/telemetry/src/flight-recorder.js

+      // TS saying options bag not available
+      0,
+      firstWriteLength,
+      Number(circEnd) + header.byteLength,


Nit: to be consistent this should also be I_ARENA_START (or revert the others and use header.byteLength everywhere)

endojs/endo#2032

turadg requested review from mhofman and michaelfig February 10, 2024 01:44

michaelfig reviewed Feb 10, 2024

View reviewed changes

packages/telemetry/src/flight-recorder.js Show resolved Hide resolved

packages/telemetry/src/flight-recorder.js Outdated Show resolved Hide resolved

packages/telemetry/src/flight-recorder.js Outdated Show resolved Hide resolved

turadg requested a review from michaelfig February 12, 2024 17:12

turadg marked this pull request as ready for review February 12, 2024 22:11

mhofman requested changes Feb 13, 2024

View reviewed changes

mhofman added the force:integration Force integration tests to run on PR label Feb 13, 2024

turadg force-pushed the 8636-fs-slogSender branch from a859475 to 3612f98 Compare February 13, 2024 16:33

turadg requested a review from mhofman February 13, 2024 16:33

mhofman reviewed Feb 13, 2024

View reviewed changes

turadg force-pushed the 8636-fs-slogSender branch from 3612f98 to e7b2d46 Compare February 13, 2024 23:16

turadg requested a review from mhofman February 13, 2024 23:17

turadg added the automerge:rebase Automatically rebase updates, then merge label Feb 14, 2024

mhofman removed the automerge:rebase Automatically rebase updates, then merge label Feb 14, 2024

mhofman approved these changes Feb 14, 2024

View reviewed changes

turadg added 10 commits February 14, 2024 07:37

lint: work around Endo lint bug

8b3d751

endojs/endo#2032

lint: conform array-foreach

3aeabd2

chore(types): MakeSlogSender

cf8ebe1

refactor(flight-recorder): makeSlogSenderFromBuffer

45fad0e

refactor: fsp

be2221b

chore(flight-record): require buffer size

c0c715b

refactor: IO out of CircularBuffer

7622c94

feat: simple CircularBuffer with fs offsets

8d9cb7a

test: pre-serialization

a043651

feat: use writeSync slogSender

47a2add

chore: remove mmap-based CircularBuffer

515ef12

turadg force-pushed the 8636-fs-slogSender branch from e7b2d46 to 515ef12 Compare February 14, 2024 15:37

turadg added the automerge:rebase Automatically rebase updates, then merge label Feb 14, 2024

mergify bot merged commit 23a970d into master Feb 14, 2024
69 of 75 checks passed

mergify bot deleted the 8636-fs-slogSender branch February 14, 2024 20:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

slogSender using positions in file instead of mmap #8889

slogSender using positions in file instead of mmap #8889

turadg commented Feb 10, 2024 •

edited

Loading

michaelfig left a comment

turadg commented Feb 10, 2024

mhofman commented Feb 12, 2024

turadg commented Feb 12, 2024

mhofman commented Feb 12, 2024

mhofman left a comment •

edited

Loading

mhofman Feb 12, 2024

turadg Feb 13, 2024

mhofman commented Feb 13, 2024

mhofman left a comment

mhofman Feb 13, 2024

mhofman left a comment

mhofman Feb 14, 2024

mhofman Feb 14, 2024

mhofman Feb 14, 2024

turadg Feb 14, 2024

mhofman Feb 14, 2024

slogSender using positions in file instead of mmap #8889

slogSender using positions in file instead of mmap #8889

Conversation

turadg commented Feb 10, 2024 • edited Loading

Description

Security Considerations

Scaling Considerations

Documentation Considerations

Testing Considerations

Upgrade Considerations

michaelfig left a comment

Choose a reason for hiding this comment

turadg commented Feb 10, 2024

mhofman commented Feb 12, 2024

turadg commented Feb 12, 2024

mhofman commented Feb 12, 2024

mhofman left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mhofman commented Feb 13, 2024

mhofman left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mhofman left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

turadg commented Feb 10, 2024 •

edited

Loading

mhofman left a comment •

edited

Loading