undefined method '<' for nil:NilClass #1589

chrisrecalis · 2017-06-01T23:06:16Z

Problem

Consistently getting the following error undefined method '<' for nil:NilClass inside lib/fluent/plugin/output.rb. From first glance it looks as though the meta data timekey is nil somehow. Fluent then backs up and is unable to move on and flush anymore buffers. I'm unable to tell which chunk is causing this

Configuration

fluentd version: 0.14.17

<source>
    @type forward
    bind 0.0.0.0
    port 24284
</source>

<match account_*.event>
    @type bigquery

    json_key "#{ENV['BIGQUERY_JSON_KEY']}"
    project_id "#{ENV['BIGQUERY_PROJECT_ID']}"
    dataset ${tag_parts[0]}

    time_partitioning_type DAY

    num_threads 4

    <tabletracker>
        key "#{ENV['TABLE_TRACKER_KEY'] || 'name' }"
        include_key "#{ENV['TABLE_TRACKER_INCLUDE_KEY'] || 'false' }"
    </tabletracker>

    <buffer tag,time>
        @type file
        path /var/log/cruncher/
        chunk_limit_size 300000000 
        timekey "#{ENV['TIME_KEY'] || '15m' }"
        flush_mode "#{ENV['FLUSH_MODE'] || 'default' }"
    </buffer>
</match>

type bigquery is a custom plugin which inherits from Fluent::Plugin::Output

Error Logs

#0 fluent/engine.rb:218:run: starting fluentd worker pid=10 ppid=1 worker=0
2017-06-01 21:42:07 +0000 [debug]: #0 plugin/buffer.rb:114:start: buffer started instance=26110880 stage_size=2104194374 queue_size=0
2017-06-01 21:42:07 +0000 [info]: #0 plugin/in_forward.rb:157:start: listening port port=24284 bind="0.0.0.0"
2017-06-01 21:42:07 +0000 [info]: #0 fluent/engine.rb:227:run: fluentd worker is now running worker=0
2017-06-01 21:42:07 +0000 [debug]: #0 plugin/output.rb:1207:enqueue_thread_run: enqueue_thread actually running
2017-06-01 21:42:07 +0000 [error]: #0 plugin/output.rb:1235:rescue in enqueue_thread_run: unexpected error while checking flushed chunks. ignored. error_class=NoMethodError error="undefined method `<' for nil:NilClass"
  2017-06-01 21:42:07 +0000 [error]: #0 /opt/cruncher/vendor/bundle/ruby/2.3.0/gems/fluentd-0.14.17/lib/fluent/plugin/output.rb:1231:in `block in enqueue_thread_run'
  2017-06-01 21:42:07 +0000 [error]: #0 /opt/cruncher/vendor/bundle/ruby/2.3.0/gems/fluentd-0.14.17/lib/fluent/plugin/buffer.rb:355:in `block (2 levels) in enqueue_all'
  2017-06-01 21:42:07 +0000 [error]: #0 /opt/cruncher/vendor/bundle/ruby/2.3.0/gems/fluentd-0.14.17/lib/fluent/plugin/buffer.rb:353:in `each'
  2017-06-01 21:42:07 +0000 [error]: #0 /opt/cruncher/vendor/bundle/ruby/2.3.0/gems/fluentd-0.14.17/lib/fluent/plugin/buffer.rb:353:in `block in enqueue_all'
  2017-06-01 21:42:07 +0000 [error]: #0 /usr/local/lib/ruby/2.3.0/monitor.rb:214:in `mon_synchronize'
  2017-06-01 21:42:07 +0000 [error]: #0 /opt/cruncher/vendor/bundle/ruby/2.3.0/gems/fluentd-0.14.17/lib/fluent/plugin/buffer.rb:351:in `enqueue_all'
  2017-06-01 21:42:07 +0000 [error]: #0 /opt/cruncher/vendor/bundle/ruby/2.3.0/gems/fluentd-0.14.17/lib/fluent/plugin/output.rb:1231:in `enqueue_thread_run'
  2017-06-01 21:42:07 +0000 [error]: #0 /opt/cruncher/vendor/bundle/ruby/2.3.0/gems/fluentd-0.14.17/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2017-06-01 21:42:07 +0000 [debug]: #0 plugin/output.rb:1259:flush_thread_run: flush_thread actually running

The text was updated successfully, but these errors were encountered:

chrisrecalis · 2017-06-02T13:56:18Z

After some investigation I have a few meta files that are empty. I used the following to find them

Dir.glob('/var/log/cruncher/*.meta') do |meta_file|
    begin
        f = MessagePack.unpack(File.read(meta_file))
        if f["timekey"].nil?
            puts "BAD METAFILE"
            puts meta_file
        end
    rescue Exception => e
        puts "BAD METAFILE"
        puts meta_file
    end
end

I am unsure how it got in this state where the meta is empty

repeatedly · 2017-06-03T17:47:36Z

Thanks for the report.
How much traffic do you send?
I want to reproduce this problem on my side.

And is your setup only v0.14?

chrisrecalis · 2017-06-05T01:37:14Z

Yup we only use v0.14 and do around 20~40GB a day. This bug actually showed up on version 0.14.9 we tried upgrading to 0.14.17 to see if it would go away before investing time to find the issue. There was in total 6 blank meta files across 3 different servers. The odd thing about this is that they showed up all within 5 mins of each other.

RubenHoms · 2017-07-03T08:23:26Z

Can confirm that I'm seeing the same issue on my production servers. We're doing a similar amount of traffic as @chrisrecalis (around ~30GB a day). However I'm having a lot more bad metafiles than him, 65516 of them to be exact. (which for some reason is nearly the same as the open file limit which is 65536) I'm having some buffering issues with the open file limit being reached but I'll open another issue for that, maybe it is somehow related to this issue.

EDIT: In fact, after I remove all the .log and .meta files in the buffer directory and restarting td-agent the error keeps appearing in the logs.

repeatedly · 2017-07-13T00:24:20Z

I continue to investigate this problem but the problem is not reproduced on my environment.
So I want more information for fixing this bug.

traffic and data pattern, size, the number of tags, one record size
actual configuration like chrisrecalis case
the content of broken meta file. 0 byte or broken msgpack value?
Are there other errors in the log?
used plugin and fluentd version

repeatedly · 2017-07-13T09:51:51Z

Does anyone test #1628 patch for this case?

repeatedly · 2017-07-21T05:33:30Z

I released v0.14.20.rc1 with #1628 patch.
You can test this version by typing fluent-gem install fluentd --prerelease.

repeatedly · 2017-08-04T07:20:37Z

We released v0.14.20 with above patch and build new docker image.
I want the feedback v0.14.20 resolves this problem or not.

chrisrecalis · 2017-08-04T11:43:50Z

Thanks @repeatedly! We just updated our deployment yesterday to v0.14.20 I'll let you know if this issue comes up again or not.

Side Note: we did get the empty meta file again on v0.14.9 it seems as though it happens when the server is abruptly shutdown and doesn't give fluentd the necessary time to gracefully terminate

chrisrecalis · 2018-03-23T17:52:42Z

Sorry for not checking back in quickly. We've been running v0.14.20 since November 2017 and haven't had this issue come up again. I'm going to close this issue as it has been resolved for us

repeatedly self-assigned this Jun 3, 2017

repeatedly added v0.14 bug Something isn't working labels Jun 3, 2017

RubenHoms mentioned this issue Jul 3, 2017

Fluentd open file limit reached while processing buffered files #1612

Closed

repeatedly mentioned this issue Jul 13, 2017

unexpected error while checking flushed chunks fluent/fluent-plugin-s3#155

Closed

tagomoris mentioned this issue Jul 13, 2017

Write buffer metadata into file ASAP #1628

Merged

chrisrecalis closed this as completed Mar 23, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

undefined method '<' for nil:NilClass #1589

undefined method '<' for nil:NilClass #1589

chrisrecalis commented Jun 1, 2017

chrisrecalis commented Jun 2, 2017

repeatedly commented Jun 3, 2017 •

edited

Loading

chrisrecalis commented Jun 5, 2017

RubenHoms commented Jul 3, 2017 •

edited

Loading

repeatedly commented Jul 13, 2017

repeatedly commented Jul 13, 2017

repeatedly commented Jul 21, 2017

repeatedly commented Aug 4, 2017

chrisrecalis commented Aug 4, 2017

chrisrecalis commented Mar 23, 2018

undefined method '<' for nil:NilClass #1589

undefined method '<' for nil:NilClass #1589

Comments

chrisrecalis commented Jun 1, 2017

Problem

Configuration

Error Logs

chrisrecalis commented Jun 2, 2017

repeatedly commented Jun 3, 2017 • edited Loading

chrisrecalis commented Jun 5, 2017

RubenHoms commented Jul 3, 2017 • edited Loading

repeatedly commented Jul 13, 2017

repeatedly commented Jul 13, 2017

repeatedly commented Jul 21, 2017

repeatedly commented Aug 4, 2017

chrisrecalis commented Aug 4, 2017

chrisrecalis commented Mar 23, 2018

repeatedly commented Jun 3, 2017 •

edited

Loading

RubenHoms commented Jul 3, 2017 •

edited

Loading