in_tail: add `open_on_every_update` setting / support for UTF-16 and UTF-32 #1409

moriyoshi · 2017-01-08T20:28:21Z

When do we find this useful?

On Windows, the tail target may well be opened in "exclusive" sharing mode, where no other process is allowed to open it. This patch adds an option called open_on_every_update, which lets the plugin tail such a target by opening it only when it gets updated.
In the current line splitter code, the lookup for end-of-line characters is always performed byte-wise, which means it cannot deal with a file in an encoding whose minimum code unit consists of more than one byte (namely UTF-16(LE|BE), UCS-2(LE|BE) and UTF-32(LE|BE)). This patch removes that limitation.
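
To make the byte-wise limitation concrete, here is a minimal sketch (not taken from the patch) that compares a byte-wise and a character-wise search for "\n" in UTF-16LE data. The variable names are made up for illustration.

```ruby
# Illustration only: why a byte-wise EOL search breaks on UTF-16LE data.
data = "a\nb\n".encode(Encoding::UTF_16LE) # bytes: 61 00 0A 00 62 00 0A 00

# Byte-wise: the first 0x0A byte sits at byte offset 2, so cutting one byte
# past it splits the two-byte "\n" code unit in half.
bytes = data.b
byte_idx = bytes.index("\n".b)
byte_line = bytes.byteslice(0, byte_idx + 1).force_encoding(Encoding::UTF_16LE)

# Character-wise: search for the EOL in the same encoding as the data and
# slice by character count instead.
eol = "\n".encode(Encoding::UTF_16LE)
char_idx = data.index(eol)
char_line = data.slice(0, char_idx + eol.size)

puts byte_line.valid_encoding?                # the byte-wise cut is not valid UTF-16LE
puts char_line.encode(Encoding::UTF_8).inspect
```

The byte-wise cut yields three bytes, which cannot be a whole number of UTF-16 code units, while the character-wise cut yields a complete line.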

Known problem(s)

Access to the file by a process that tries to open it exclusively is effectively prevented by any preceding process that has opened it in any mode (https://msdn.microsoft.com/library/windows/desktop/aa363874). To make things work, the implementation simply assumes that the application writing the log to the target file will retry opening and writing to it later when an attempt to open it fails.

Why did you end up with a single patch addressing two different issues?

At first I thought of splitting it into two, but the parts those patches would modify overlap each other, and it wouldn't make much sense to me to have just one of them merged.


moriyoshi · 2017-01-10T05:42:22Z

It seems some builds on travis-ci were stuck. Could anyone please rebuild them? I don't have permission to modify the CI.

tagomoris · 2017-01-10T06:12:25Z

I cannot understand why open_on_every_update works. It seems to work well to open files in shared mode via FileWrapper. Does it have any other benefit?

Another question: what happens when someone opens the file tailed by in_tail with open_on_every_update in exclusive sharing mode? Does it crash?

moriyoshi · 2017-01-10T06:43:51Z

I cannot understand why open_on_every_update works. It seems to work well to open files in shared mode via FileWrapper.

Generally, if a file is opened by a process with neither of the sharing modes (FILE_SHARE_READ and FILE_SHARE_WRITE) specified, the file cannot be opened by other processes. Conversely, even if a file is opened by a process with some sharing modes specified, it cannot be opened by another process that tries to open it with no sharing modes specified, because that process requires exclusive access to it.

FileWrapper opens the given file R/W shared (FILE_SHARE_READ and FILE_SHARE_WRITE). If another process tries to open, for exclusive access, a target file that has already been opened by fluentd, the attempt simply fails. If fluentd leaves the file open, the other process will have no chance to open it. open_on_every_update instructs fluentd to open the target file only when it gets updated so that such a process can touch it. That's how it works.
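
The open-only-when-updated idea can be sketched roughly as follows. This is a platform-neutral illustration, not the in_tail implementation: `ReopeningReader` and `reopening_reader_demo` are hypothetical names, and plain `File.open` stands in for the Windows-specific FileWrapper.

```ruby
require "tempfile"

# Hypothetical helper for illustration; not the in_tail implementation.
# It remembers the read position and holds the file open only while reading.
class ReopeningReader
  attr_reader :pos

  def initialize(path)
    @path = path
    @pos = 0
  end

  # Call this when the file is known to have changed.
  def read_update
    File.open(@path, "rb") do |io|
      io.seek(@pos)   # resume from where the previous read stopped
      data = io.read  # read everything appended since then
      @pos = io.pos
      data
    end               # the handle is closed here, so other processes can open the file
  end
end

def reopening_reader_demo
  result = nil
  Tempfile.create("tail-target") do |f|
    f.close
    reader = ReopeningReader.new(f.path)
    File.write(f.path, "first\n", mode: "ab")
    a = reader.read_update
    File.write(f.path, "second\n", mode: "ab")
    b = reader.read_update
    result = [a, b, reader.pos]
  end
  result
end

p reopening_reader_demo
```

Between calls to `read_update` no handle is held, which is the window in which a writer that wants exclusive access could open the file.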

tagomoris

The feature added looks reasonable.
Please add some test cases that include newlines (2 or more events in a file) to test non-"\n" EOLs.

tagomoris · 2017-01-11T05:14:57Z

lib/fluent/plugin/in_tail.rb

+
+        def next_line
+          idx = @buffer.index(@eol)
+          convert(@buffer.slice!(0, idx + 1)) unless idx.nil?


This line should be @buffer.slice!(0, idx + @eol.bytesize)

No, this is correct. @buffer is searched character-wise.

I tried this code:

"\n".encode("utf-16").size #=> 2

When encoded in Encoding::UTF_16, a BOM is prepended to the resulting string.

irb(main):006:0> "\n".encode("utf-16").size
=> 2
irb(main):007:0> "\n".encode("utf-16le").size
=> 1
irb(main):008:0> "\n".encode("utf-16")[0]
=> "\uFEFF"

Ahh, I've forgotten about BOMs. Sorry.

tagomoris · 2017-01-11T05:27:36Z

lib/fluent/plugin/in_tail.rb

        pe # This pe will be updated in on_rotate after TailWatcher is initialized
      end

      class StatWatcher < Coolio::StatWatcher
-        def initialize(path, log, &callback)
+        def initialize(watcher, &callback)


This class just refers to watcher.path and watcher.log, and doesn't refer to any other attributes of watcher.
What is this change for?

Other helper classes needed a similar change to their constructor parameters, and it made sense to me to apply the same policy to StatWatcher as well.

tagomoris · 2017-01-11T05:29:02Z

lib/fluent/plugin/in_tail.rb

+          convert(@buffer.slice!(0, idx + 1)) unless idx.nil?
+        end
+
+        def size


Use bytesize for bytesize.

I'll fix this.

tagomoris · 2017-01-11T05:39:11Z

lib/fluent/plugin/in_tail.rb

+              if !io.nil? && @lines.empty?
+                begin
+                  while true
+                    @fifo << io.readpartial(2048, @iobuf)


Are there any reasons to use @iobuf?

That's exactly what I was wondering about. I just did so as the original code does.

tagomoris · 2017-01-11T05:45:01Z

lib/fluent/plugin/in_tail.rb

        end

        def on_change(prev, cur)
          @callback.call
        rescue
          # TODO log?
-          @log.error $!.to_s
-          @log.error_backtrace
+          @watcher.log.error $!.to_s


Please add an error message that tells us where the error occurs, and use error "message", error: e style logging instead of $!.

It's just an adaptation to the changes to the instance attributes. Totally agreed, but I'd open a separate issue for this.

tagomoris · 2017-01-11T05:47:04Z

lib/fluent/plugin/in_tail.rb

+
+        attr_reader :from_encoding, :encoding, :buffer
+
+        def <<(chunk)


Is the chunk argument object reused anywhere? (This is related to my comment about using @iobuf.)
If not, there seems to be no reason to redo force_encoding(orig_encoding).

Uh, this is the weirdest part. As IO#readpartial(len, buf) returns the same object as the second argument, the encoding of the reused buffer (@iobuf) eventually becomes the same encoding as that specified to chunk.force_encoding(). Reverting to the original encoding is necessary because, for unknown reasons, an assertion failure occurs on older versions of Ruby when a buffer with any double-byte or quad-byte encoding (UTF-16 / UCS-2 / UTF-32) is given to readpartial() as the second argument.

Try this:

a = open('/dev/urandom')
buf = ''.force_encoding(Encoding::UTF_16BE)
a.readpartial(2048, buf)

Okay - we need a code comment for that here :)
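
The save-and-restore dance described above might look roughly like this. It is a sketch under assumptions: `EncodedBuffer` and `readpartial_demo` are hypothetical names, not the classes in the patch, and the demo only checks that the buffer returned by `readpartial` is the same object that was passed in and that its encoding is put back afterwards.

```ruby
require "tempfile"

# Hypothetical stand-in for the patch's line buffer; names are illustrative.
class EncodedBuffer
  attr_reader :buffer

  def initialize(from_encoding)
    @from_encoding = from_encoding
    @buffer = String.new.force_encoding(from_encoding)
  end

  def <<(chunk)
    # The chunk may be the caller's reusable read buffer, so remember its
    # encoding, relabel it for the append, and restore it afterwards.
    orig_encoding = chunk.encoding
    chunk.force_encoding(@from_encoding)
    @buffer << chunk
    self
  ensure
    chunk.force_encoding(orig_encoding)
  end
end

def readpartial_demo
  result = nil
  Tempfile.create("utf16") do |f|
    f.close
    File.binwrite(f.path, "a\n".encode(Encoding::UTF_16LE))
    File.open(f.path, "rb") do |io|
      iobuf = String.new
      chunk = io.readpartial(2048, iobuf) # returns iobuf itself
      before = iobuf.encoding
      lines = EncodedBuffer.new(Encoding::UTF_16LE)
      lines << chunk
      result = [chunk.equal?(iobuf), iobuf.encoding == before,
                lines.buffer.encode(Encoding::UTF_8)]
    end
  end
  result
end

p readpartial_demo
```

Because the appended bytes are copied into `@buffer`, restoring the chunk's encoding does not affect the accumulated data.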

tagomoris · 2017-01-11T05:53:51Z

lib/fluent/plugin/in_tail.rb

+              yield @io
+            end
+          rescue
+            @watcher.log.error $!.to_s


Add an error message that shows where the error occurs (and use rescue => e and error: e to dump error objects).
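
A rough sketch of the requested logging style follows. `StubLog` is a stand-in written for this example and only mimics the `error "message", error: e` call shape mentioned in the review; it is not fluentd's logger, and `read_head` is a hypothetical method.

```ruby
# Stand-in logger for illustration only; fluentd's real logger is not used here.
class StubLog
  attr_reader :entries

  def initialize
    @entries = []
  end

  def error(message, error: nil)
    line = message.dup
    line << " error_class=#{error.class} error=#{error.message.inspect}" if error
    @entries << line
  end
end

# Hypothetical method: the message says where the failure happened, and the
# exception object is passed along instead of relying on $!.
def read_head(log, path)
  File.open(path, "rb") { |io| io.read(16) }
rescue => e
  log.error "failed to read the tail target", error: e
  nil
end

log = StubLog.new
read_head(log, "/nonexistent/dir/no-such-file")
puts log.entries
```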

tagomoris · 2017-01-11T05:55:50Z

lib/fluent/plugin/in_tail.rb

            end
            @inode = inode
            @fsize = fsize
          end

        rescue
-          @log.error $!.to_s
-          @log.error_backtrace
+          @watcher.log.error $!.to_s


repeatedly · 2017-01-11T06:10:34Z

lib/fluent/plugin/in_tail.rb

@@ -410,7 +398,7 @@ def parse_multilines(lines, tail_watcher)
    end

    class TailWatcher
-      def initialize(path, rotate_wait, pe, log, read_from_head, enable_watch_timer, read_lines_limit, update_watcher, line_buffer_timer_flusher, &receive_lines)
+      def initialize(path, rotate_wait, pe, log, read_from_head, enable_watch_timer, read_lines_limit, update_watcher, line_buffer_timer_flusher, from_encoding, encoding, open_on_every_update, &receive_lines)


I think it's time to add a TailWatcherSetting-like object to pass parameters.
These long argument lists are hard to maintain.

Absolutely, I hate these long arguments!
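
One possible shape for such a parameter object, purely as a sketch: the `TailWatcherSetting` struct below is an assumption for illustration and not part of this patch. Its fields mirror some of the constructor arguments quoted above, the values are made up, and `keyword_init:` requires Ruby 2.5 or later.

```ruby
# Hypothetical parameter object; not part of the patch.
TailWatcherSetting = Struct.new(
  :path, :rotate_wait, :read_from_head, :enable_watch_timer,
  :read_lines_limit, :from_encoding, :encoding, :open_on_every_update,
  keyword_init: true
)

# Example values are invented for the demonstration.
setting = TailWatcherSetting.new(
  path: "/var/log/app.log",
  rotate_wait: 5,
  read_from_head: false,
  enable_watch_timer: true,
  read_lines_limit: 1000,
  from_encoding: Encoding::UTF_16LE,
  encoding: Encoding::UTF_8,
  open_on_every_update: true
)

puts setting.open_on_every_update
```

A constructor could then take the single `setting` object plus the callback, so adding an option would not change its signature.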

The points were fixed, and newlines are not a problem.

tagomoris · 2017-01-16T00:34:59Z

LGTM.

repeatedly · 2017-01-16T03:53:25Z

lib/fluent/plugin/in_tail.rb

+          if @from_encoding == @encoding
+            s
+          else
+            c = Encoding::Converter.new(@from_encoding, @encoding)


Can't this converter be pre-created in initialize?
Doesn't Converter#finish reset the internal state?

There seems to be no description in the documentation indicating that the method resets the internal state of a converter. It would definitely be better if we could do so.

Currently, in_tail sometimes consumes a lot of CPU, which puts heavy pressure on application servers.
So I want to avoid additional CPU usage in typical cases if possible.

@nurse How about this point?

Encoding::Converter#finish just finishes the converter; it doesn't reset the state. And there's no way to recycle the converter.

For this code, it should be s.encode(@encoding, @from_encoding)
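
A minimal sketch of the suggested form, written as a standalone method rather than the patch's instance method (the free-standing `convert` signature is an assumption made so the example runs on its own):

```ruby
# Standalone illustration of the String#encode(dst, src) form suggested above.
def convert(s, from_encoding, encoding)
  return s if from_encoding == encoding
  # The bytes of s are interpreted as from_encoding and transcoded to encoding,
  # without creating an Encoding::Converter for each call.
  s.encode(encoding, from_encoding)
end

raw = "a\n".encode(Encoding::UTF_16LE).b # bytes as read from a file
line = convert(raw, Encoding::UTF_16LE, Encoding::UTF_8)
puts line.inspect
puts line.encoding
```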

repeatedly · 2017-01-16T04:05:44Z

lib/fluent/plugin/in_tail.rb

            inode = stat.ino
            if inode == @pe.read_inode # truncated
-              @pe.update_pos(stat.size)
-              io_handler = IOHandler.new(io, @pe, @log, @read_lines_limit, &method(:wrap_receive_lines))
+              @pe.update_pos(0)


Is 0 instead of stat.size correct?
What happens when the file is truncated to a non-zero size?

The current behavior is somewhat strange. The file position of io, which must have been newly opened by the caller, isn't updated, while the stored position is updated to stat.size here, so IOHandler later starts reading from the head anyway.

The new code tries to update the file handle's position to that stored in @pe before reading from the target file, so the position has to be set to 0 in order to preserve the behavior.

If this is a bug, it needs to be corrected to @pe.update_pos(stat.size).


repeatedly · 2017-01-23T06:40:30Z

LGTM

moriyoshi · 2017-01-27T00:37:19Z

Sorry, I overlooked the notice. Thank you all for looking into this!

moriyoshi changed the title ~~Add open_on_every_update setting / support for UTF-16 and UTF-32~~ in_tail: add open_on_every_update setting / support for UTF-16 and UTF-32 Jan 8, 2017

Add open_on_every_update setting for tailing a file opened in exclu…

cdf7ee1

…sive sharing mode.

moriyoshi force-pushed the moriyoshi/tail-open-on-every-update branch from 394c8ad to cdf7ee1 Compare January 9, 2017 19:08

tagomoris previously requested changes Jan 11, 2017


repeatedly reviewed Jan 11, 2017


moriyoshi added 2 commits January 14, 2017 11:36

size -> bytesize

a7bb9ee

Add comments as per the request.

6ceb5c7

repeatedly reviewed Jan 16, 2017


Use s.encode() instead of Encoding::Converter; there was a reason to …

bd89bb5

…use the explicit converter, which I don't remember clearly.

repeatedly merged commit 26ab1bd into fluent:master Jan 23, 2017

moriyoshi deleted the moriyoshi/tail-open-on-every-update branch January 27, 2017 00:37
