Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

in_tail: add open_on_every_update setting / support for UTF-16 and UTF-32 #1409

Merged

Conversation

moriyoshi
Copy link
Member

@moriyoshi moriyoshi commented Jan 8, 2017

When do we find this useful?

  • On Windows, the tail target may well be opened in "exclusive" sharing mode, where no other processes are allowed to open it. This patch adds an option called open_on_every_update, which enables the plugin to tail such a target by letting the plugin open it only when it gets updated.
  • In the current line splitter code, lookup for end-of-line characters is always performed byte-wise, which means it cannot deal with the file in an encoding whose minimum encoded unit consist of more than a byte (namely UTF-16(LE|BE), UCS-2(LE|BE) and UTF-32(LE|BE)). This patch removes such a limitation.

Known problem(s)

  • Access to the file by a process that tries to exclusively open it is effectively prevented by any preceding process that opens it in any mode (https://msdn.microsoft.com/library/windows/desktop/aa363874). For getting the things to work well, the implementation simply lays down a premise that the application writing the log to the target file would try to open and write to it later again when it fails to open it.

Why did you end up with a single patch addressing two different issues?

  • First I thought of splitting it into two, but the parts those patches would modify overlaps each other and it wouldn't make much sense to me if I had just one of them merged.

@moriyoshi moriyoshi changed the title Add open_on_every_update setting / support for UTF-16 and UTF-32 in_tail: add open_on_every_update setting / support for UTF-16 and UTF-32 Jan 8, 2017
@moriyoshi moriyoshi force-pushed the moriyoshi/tail-open-on-every-update branch from 394c8ad to cdf7ee1 Compare January 9, 2017 19:08
@moriyoshi
Copy link
Member Author

moriyoshi commented Jan 10, 2017

It seems some builds on travis-ci were stuck. Could anyone please rebuild them? I haven't got a modification permission on CI.

@tagomoris
Copy link
Member

tagomoris commented Jan 10, 2017

I cannot understand why open_on_every_update works. It seems to work well to open files in shared mode via FileWrapper. Does it have any other positive thing?

Another question: What's happen when someone opens the file tailed by in_tail with open_on_every_update in exclusive sharing mode? Does it crash?

@moriyoshi
Copy link
Member Author

moriyoshi commented Jan 10, 2017

I cannot understand why open_on_every_update works. It seems to work well to open files in shared mode via FileWrapper.

Generally, if a file is opened by a process with neither of sharing modes (FILE_SHARE_READ and FILE_SHARE_WRITE) specified, the file cannot be opened by other processes. Inversely, even if a file is opened by a process with some sharing modes specified, it cannot be opened by another process that tries to open it with no sharing modes specified because the process requires exclusive access to it.

FileWrapper opens the given file R/W shared (FILE_SHARE_READ and FILE_SHARE_WRITE). If another process tries to open the target file for exclusive access that has already been opened by fluentd, the attempt simply fails. If fluentd leaves the file opened, the other process will have no chance to open it. open_on_every_update instructs fluentd to open the target file only when it gets updated so that such other process can touch it. That's how it works.

Copy link
Member

@tagomoris tagomoris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The feature added looks reasonable.
Please add some test cases which includes newlines (2 or more events in file) to test non-"\n" EOLs.


def next_line
idx = @buffer.index(@eol)
convert(@buffer.slice!(0, idx + 1)) unless idx.nil?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line should be @buffer.slice!(0, idx + @eol.bytesize)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is correct. @buffer is searched character-wise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this code:

"\n".encode("utf-16").size #=> 2

Copy link
Member Author

@moriyoshi moriyoshi Jan 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When encoded in Encoding::UTF_16, a BOM is prepended to the resulting string.

irb(main):006:0> "\n".encode("utf-16").size
=> 2
irb(main):007:0> "\n".encode("utf-16le").size
=> 1
irb(main):008:0> "\n".encode("utf-16")[0]
=> "\uFEFF"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I've forgotten about BOMs. Sorry.

pe # This pe will be updated in on_rotate after TailWatcher is initialized
end

class StatWatcher < Coolio::StatWatcher
def initialize(path, log, &callback)
def initialize(watcher, &callback)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class just refers watcher.path and watcher.log, and doesn't refer any other attributes of watcher.
What is this change for?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other helper classes needed a similar change for constructor parameters, and it made sense for me to apply the same policy to StatWatcher as well.

convert(@buffer.slice!(0, idx + 1)) unless idx.nil?
end

def size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use bytesize for bytesize.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll fix this.

if !io.nil? && @lines.empty?
begin
while true
@fifo << io.readpartial(2048, @iobuf)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any reasons to use @iobuf?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's exactly what I was wondering about. I just did so as the original code does.

end

def on_change(prev, cur)
@callback.call
rescue
# TODO log?
@log.error $!.to_s
@log.error_backtrace
@watcher.log.error $!.to_s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add any error messages to make us known where the error occurs, and use error "message", error: e style logging instead of $!.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just an adoption for the changes to the instance attributes. Totally agreed but I'd open the separate issue for this.


attr_reader :from_encoding, :encoding, :buffer

def <<(chunk)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the argument chunk object reused in anywhere? (it's related with my comment about using @iobuf).
If no, there seems no reason to re-do force_encoding(orig_encoding).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh, this is the weirdest part. As IO.readpartial(len, buf) returns the same object as the second argument, the encoding of the reused buffer (@iobuf) eventually becomes the same encoding as that specfied to chunk.force_encoding(). Reverting back to the original encoding is necessary because for unknown reasons an assertion failure occurs for older versions of Ruby when a buffer with any double-byte or quad-byte encoding (UTF-16 / UCS-2 / UTF-32) is given to readpartial() as the second argument.

Try this:

a = open('/dev/urandom')
buf = ''.force_encoding(Encoding::UTF_16BE)
a.readpartial(2048, buf)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay - we need code comment for that here :)

yield @io
end
rescue
@watcher.log.error $!.to_s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add error message to find where the error occurs (and use rescue => e and error: e to dump error objects).

end
@inode = inode
@fsize = fsize
end

rescue
@log.error $!.to_s
@log.error_backtrace
@watcher.log.error $!.to_s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

@@ -410,7 +398,7 @@ def parse_multilines(lines, tail_watcher)
end

class TailWatcher
def initialize(path, rotate_wait, pe, log, read_from_head, enable_watch_timer, read_lines_limit, update_watcher, line_buffer_timer_flusher, &receive_lines)
def initialize(path, rotate_wait, pe, log, read_from_head, enable_watch_timer, read_lines_limit, update_watcher, line_buffer_timer_flusher, from_encoding, encoding, open_on_every_update, &receive_lines)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think time to add TailWatcherSetting like object to pass parameters.
Hard to maintain these long arguments.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely, I hate these long arguments!

@tagomoris tagomoris dismissed their stale review January 16, 2017 00:33

Points were fixed and newlines are not problem.

@tagomoris
Copy link
Member

LGTM.

if @from_encoding == @encoding
s
else
c = Encoding::Converter.new(@from_encoding, @encoding)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cannot pre-create this converter in initialize?
Converter#finish doesn't reset internal state?

Copy link
Member Author

@moriyoshi moriyoshi Jan 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be no such descriptions in the document that indicate the method resets the internal state of a converter. It would definitely be better if we could do so.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, in_tail sometimes consumes lots of CPU and it causes high pressure on application server.
So I want to avoid more CPU usage with typical cases if possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nurse How about this point?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Encoding::Converter#finish just finishes the converter; it doesn't reset the state. And there's no way to recycle the converter.

For this code, it should be s.encode(@encoding, @from_encoding)

inode = stat.ino
if inode == @pe.read_inode # truncated
@pe.update_pos(stat.size)
io_handler = IOHandler.new(io, @pe, @log, @read_lines_limit, &method(:wrap_receive_lines))
@pe.update_pos(0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 instead of stat.size is correct?
What happen when file is truncated to non-zero size?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current behavior is somewhat strange. Because the file position of io, which must have been newly opened by the caller, isn't updated while the position is updated to stat.size here, IOHandler later starts reading from the head anyway.

The new code tries to update the file handle's position to that stored in @pe before reading from the target file, so the position has to be set to 0 in order to preserve the behavior.

If this is a bug, it needs to be corrected to @pe.update_pos(stat.size).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.

…use the explicit converter, which I don't remember clearly.
@repeatedly
Copy link
Member

LGTM

@repeatedly repeatedly merged commit 26ab1bd into fluent:master Jan 23, 2017
@moriyoshi
Copy link
Member Author

Sorry, I overlooked the notice. Thank you all for looking into this!

@moriyoshi moriyoshi deleted the moriyoshi/tail-open-on-every-update branch January 27, 2017 00:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants