
nsq_to_file: cleanup #1117

Merged — 6 commits merged into nsqio:master from nsq_to_file-cleanup, Jan 5, 2019
Conversation

@mreiferson (Member) commented Jan 3, 2019

I've long been frustrated with nsq_to_file's codebase, and didn't want to schlep it off onto @mccutchen in #1110, so here goes.

@mreiferson (Member, Author)

(individual commits are probably easier to review than the combined diff)

@mreiferson force-pushed the nsq_to_file-cleanup branch 2 times, most recently from 5805d68 to 5c1c4eb, on January 3, 2019 18:20
@mreiferson (Member, Author) commented Jan 4, 2019

Fixes #880 in 3cbdbe4

@mreiferson force-pushed the nsq_to_file-cleanup branch 2 times, most recently from 7c316a2 to 3cbdbe4, on January 4, 2019 00:57
@mreiferson (Member, Author)

I think I'm done here; it still passes @mccutchen's nice "jepsen lite" test program.

@ploxiln (Member) commented Jan 4, 2019

All looks pretty good to me.

I guess you probably plan to squash all the commits down before merging, but if not, I see some runs of commits that could be squashed together, leaving five or so commits in the sequence to merge.

@mreiferson (Member, Author)

> I guess you probably plan to squash all the commits down before merging

Hah, I wasn't. So I'm all ears to how you think this should be squashed.

@mreiferson (Member, Author)

@jehiah you might be particularly interested in deb7c33

@ploxiln (Member) commented Jan 4, 2019

If you want my 2c on how to squash, sure ;)

Leave as-is:

  • nsq_to_file: move to go-options; pass Options struct around
  • nsq_to_file: cleanup startup

FileLogger refactor commits could be squashed:

  • nsq_to_file: remove redundant closing var
  • nsq_to_file: cleanup variable names/cfl responsibilities
  • nsq_to_file: drop ConsumerFileLogger; refactor FileLogger

FileLogger cleanup/rename commits could be squashed:

  • nsq_to_file: un-indent FileLogger.Close()
  • nsq_to_file: s/atomicRename/exclusiveRename
  • nsq_to_file: drop superfluous makeOutputPath
  • nsq_to_file: cleanup FileLogger var names
  • nsq_to_file: dead return; whitespace cleanup
  • nsq_to_file: s/needsFileRotate/needsRotation
  • nsq_to_file: s/makeOutputDir/makeDirFromPath

Leave as-is:

  • nsq_to_file: don't open multiple GZIP streams

Gzip/Sync commits could be squashed:

  • nsq_to_file: fsync after GZIP close
  • nsq_to_file: check and exit on errors for file-related operations

Leave as-is:

  • nsq_to_file: refactor confusing Write interface

Finally, the logging commits at the end can be squashed.

From the commit messages of the squashed commits:

> This changes the structure of output files to be continuous GZIP
> streams rather than concatenated GZIP streams. This is likely
> slightly more compatible and closer to what consumers expect.
>
> For ordering correctness, this ensures that any pending GZIP data is
> written _before_ the fsync, though this is unlikely to matter under
> normal operation.
>
> Also check and exit on errors for file-related operations in Close().
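A minimal sketch of the ordering the commit message describes, assuming a FileLogger that holds a `*gzip.Writer` (`f.gzipWriter`) wrapping its output `*os.File` (`f.out`); the method shape is illustrative, not copied from the PR:

```go
// Sync flushes any data buffered in the GZIP writer down to the
// underlying file before fsyncing it, so the bytes made durable
// include everything compressed so far. Note that Flush writes
// pending compressed data but no footer/CRC; only Close writes
// the footer.
func (f *FileLogger) Sync() error {
	if f.gzipWriter != nil {
		if err := f.gzipWriter.Flush(); err != nil {
			return err
		}
	}
	return f.out.Sync()
}
```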
@mreiferson force-pushed the nsq_to_file-cleanup branch from bea47f3 to 2feed06, on January 4, 2019 21:28
@mreiferson (Member, Author)

squashed 💪

@mreiferson merged commit cf4a8c8 into nsqio:master on Jan 5, 2019
@mreiferson deleted the nsq_to_file-cleanup branch on January 5, 2019 00:02
The review thread below is anchored on these lines from the diff:

```
f.writer = f.gzipWriter
} else {
err = f.out.Sync()
err := f.gzipWriter.Flush()
```
@jehiah (Member)
I think I'm 👎 on this switch, but I acknowledge that it is possible to encounter unexpected issues with a stream of independent gzip chunks (i.e. equivalent to `cat file1.gz file2.gz`).

Without separate gzip streams, recovering from a corrupt file (kill -9 with partial data written but not sync'd) becomes problematic.

I think it comes down to the fact that Flush does not write a gzip footer, which has the checksum. Close does.
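A self-contained sketch (not from the PR) of the distinction being drawn here: `Flush` pushes buffered data into the stream, but only `Close` terminates it with the 8-byte footer carrying the CRC-32 and uncompressed length:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

func main() {
	var flushed, closed bytes.Buffer

	zw1 := gzip.NewWriter(&flushed)
	zw1.Write([]byte("hello\n"))
	zw1.Flush() // data is written out, but the stream is never terminated

	zw2 := gzip.NewWriter(&closed)
	zw2.Write([]byte("hello\n"))
	zw2.Close() // writes the final block plus the CRC-32/length footer

	fmt.Printf("flushed: %d bytes, closed: %d bytes\n", flushed.Len(), closed.Len())

	// The flushed-only stream decodes, but draining it reports an
	// unexpected EOF because there is no footer to verify against.
	zr, _ := gzip.NewReader(bytes.NewReader(flushed.Bytes()))
	data, err := io.ReadAll(zr)
	fmt.Printf("recovered %q, err: %v\n", data, err)
}
```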

@mreiferson (Member, Author)

Unless I'm misunderstanding something, the recoverable section would be equivalent in either case (bound by the configured sync interval)?

You're suggesting that by writing out closing footers, somehow more data would be recoverable? Given gzip's streamable quality, I assume that a complete block (regardless of a closed stream) would be decompressible?

@jehiah (Member) commented Jan 5, 2019

> I assume that a complete block (regardless of a closed stream) would be decompressible?

Decompressible yes; error-checkable no. The footer contains the final CRC for the stream, which you don't get from a sync().

https://www.forensicswiki.org/wiki/Gzip#File_footer

Recoverability:

With separate streams for each logical sync, I've found it easy to step through the streams of a file (using compress/gzip.Reader.Multistream) and use the valid streams to recover a corrupt file. When you don't have a stream boundary, recovery is less precise, because you can't disambiguate between data that was written but not sync'd and data that was written and sync'd. That increases the risk of including records that were never ack'd back to nsqd, which isn't desirable.
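A sketch of that recovery pass, assuming input produced as concatenated per-sync streams; `salvageStreams` and the file name are illustrative, not nsq tooling:

```go
package main

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"io"
	"log"
	"os"
)

// salvageStreams copies every intact GZIP stream from r to w, stopping at
// the first stream that is truncated or fails its footer CRC check.
func salvageStreams(r io.Reader, w io.Writer) error {
	br := bufio.NewReader(r)
	zr, err := gzip.NewReader(br)
	if err != nil {
		return err // not even a valid first header
	}
	for {
		zr.Multistream(false) // decode exactly one stream, then report EOF

		var stream bytes.Buffer
		if _, err := io.Copy(&stream, zr); err != nil {
			// Truncated tail or checksum mismatch: drop this partial
			// stream; everything already written to w verified cleanly.
			return nil
		}
		if _, err := w.Write(stream.Bytes()); err != nil {
			return err
		}

		// Reset positions the reader at the next stream's header;
		// io.EOF means the file ended cleanly on a stream boundary.
		if err := zr.Reset(br); err != nil {
			return nil
		}
	}
}

func main() {
	in, err := os.Open("corrupt.log.gz") // hypothetical input file
	if err != nil {
		log.Fatal(err)
	}
	defer in.Close()
	if err := salvageStreams(in, os.Stdout); err != nil {
		log.Fatal(err)
	}
}
```

Each sync'd chunk ends with its own footer, so anything this pass emits has checksummed cleanly; with one continuous stream there is no such boundary to stop at, which is the imprecision described above.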

@mreiferson (Member, Author)

Nice, that feels like a strong argument to revert, thanks.
