nsq_to_file: cleanup #1117
Conversation
(individual commits are probably easier to review than the combined diff)
(force-pushed from 5805d68 to 5c1c4eb)
(force-pushed from 7c316a2 to 3cbdbe4)
I think I'm done here; it still passes @mccutchen's nice "jepsen lite" test program.
All looks pretty good to me. I guess you probably plan to squash all commits down before merge, but otherwise I see some runs of commits that could be squashed together, leaving 5 or so commits in the sequence to be merged.
Hah, I wasn't. So I'm all ears as to how you think this should be squashed.
If you want my 2c on how to squash, sure ;)

- Leave as-is:
- FileLogger refactor commits could be squashed:
- FileLogger cleanup/rename commits could be squashed:
- Leave as-is:
- Gzip/Sync commits could be squashed:
- Leave as-is:
- Finally, the logging commits at the end can be squashed
This changes the structure of output files to be continuous GZIP streams rather than concatenated GZIP streams. This is likely slightly more compatible and closer to what consumers expect.
For ordering correctness, this ensures that any pending GZIP data is written _before_ the fsync, though it is unlikely to matter under normal operation. Also, check and exit on errors for file-related operations in Close().
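A minimal sketch of the ordering described in that commit message, assuming a hypothetical pared-down `FileLogger` with an `out *os.File` and an optional `gzipWriter` (the real struct in nsq_to_file has more fields and differs in detail):

```go
package filelogger

import (
	"compress/gzip"
	"os"
)

// FileLogger is a hypothetical, simplified shape used only for illustration.
type FileLogger struct {
	out        *os.File
	gzipWriter *gzip.Writer
}

// Sync pushes any buffered compressed data into the underlying file before
// calling fsync, so the fsync actually covers everything written so far.
func (f *FileLogger) Sync() error {
	if f.gzipWriter != nil {
		if err := f.gzipWriter.Flush(); err != nil {
			return err
		}
	}
	return f.out.Sync()
}
```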
(force-pushed from bea47f3 to 2feed06)
squashed 💪
```
f.writer = f.gzipWriter
} else {
	err = f.out.Sync()
err := f.gzipWriter.Flush()
```
I think I'm 👎 on this switch, but I acknowledge that it is possible to encounter unexpected issues with a stream of independent gzip chunks (i.e. the equivalent of `cat file1.gz file2.gz`).
Without separate gzip streams, recovering from a corrupt file (kill -9 with partial data written but not sync'd) becomes problematic.
I think it comes down to the fact that Flush does not write a gzip footer, which contains the checksum. Close does.
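For reference, a small self-contained demonstration of that distinction (a hedged illustration, not code from this PR): `gzip.Writer.Flush` emits a flush point so everything written so far can be decompressed, while only `Close` writes the footer carrying the CRC-32 and length, so a flushed-but-unclosed stream decodes but can't be integrity-checked.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"log"
)

func main() {
	var flushed, closed bytes.Buffer

	zw := gzip.NewWriter(&flushed)
	zw.Write([]byte("hello\n"))
	zw.Flush() // data becomes readable, but no footer (CRC-32 + length) yet

	zw = gzip.NewWriter(&closed)
	zw.Write([]byte("hello\n"))
	zw.Close() // writes the footer, completing the stream

	for _, buf := range []struct {
		name string
		data []byte
	}{{"flushed", flushed.Bytes()}, {"closed", closed.Bytes()}} {
		zr, err := gzip.NewReader(bytes.NewReader(buf.data))
		if err != nil {
			log.Fatal(err)
		}
		out, err := io.ReadAll(zr)
		// "flushed" yields the payload but errors (unexpected EOF) because
		// there is no footer to verify against; "closed" yields it cleanly.
		fmt.Printf("%s: %q err=%v\n", buf.name, out, err)
	}
}
```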
Unless I'm misunderstanding something, the recoverable section would be equivalent in either case (bounded by the configured sync interval)?
You're suggesting that by writing out closing footers, somehow more data would be recoverable? Given gzip's streamable quality, I assume that a complete block (regardless of a closed stream) would be decompressible?
> I assume that a complete block (regardless of a closed stream) would be decompressible?

Decompressible yes, error-checkable no. The footer contains the final CRC for a stream, which you don't get from a sync(): https://www.forensicswiki.org/wiki/Gzip#File_footer

Recoverability: with separate streams for each logical sync, I've found it easy to step through the streams of a file (using compress/gzip's `Reader.Multistream`) and use the valid streams to recover a corrupt file. When you don't have a stream boundary, recovery is less precise because you can't disambiguate between written-but-not-sync'd data and written-and-sync'd data. That increases the risk of including records that were never ack'd back to nsqd, which isn't desirable.
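To make that concrete, here is a hedged sketch (assumed filenames, not nsq_to_file's actual recovery tooling) of stepping through a multi-stream file with `compress/gzip`'s `Reader.Multistream(false)`, committing only streams that decode completely (header, data, and CRC-verified footer) and stopping at the first truncated or corrupt one:

```go
package main

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"io"
	"log"
	"os"
)

func main() {
	in, err := os.Open("corrupt.log.gz") // assumed input path
	if err != nil {
		log.Fatal(err)
	}
	defer in.Close()

	out, err := os.Create("recovered.log") // assumed output path
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	// A bufio.Reader implements io.ByteReader, so gzip/flate reads from it
	// directly and doesn't buffer ahead past stream boundaries internally.
	br := bufio.NewReader(in)
	zr, err := gzip.NewReader(br)
	if err != nil {
		log.Fatal(err)
	}

	for {
		zr.Multistream(false) // stop at the end of the current gzip stream

		// Decode one stream into a scratch buffer; only commit it to the
		// output if it decodes completely (including the footer CRC check).
		var stream bytes.Buffer
		if _, err := io.Copy(&stream, zr); err != nil {
			log.Printf("discarding truncated/corrupt stream: %v", err)
			break
		}
		if _, err := stream.WriteTo(out); err != nil {
			log.Fatal(err)
		}

		// Advance to the next stream, if any.
		if err := zr.Reset(br); err == io.EOF {
			break // clean end of file
		} else if err != nil {
			log.Printf("stopping: %v", err)
			break
		}
	}
}
```

With a single continuous stream there is no such boundary to resume from, which is the loss of precision described above.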
Nice, that feels like a strong argument to revert, thanks.
I've long been frustrated with nsq_to_file's codebase, and didn't want to schlep it off onto @mccutchen in #1110, so here goes.