Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

undefined method '<' for nil:NilClass #1589

Closed
chrisrecalis opened this issue Jun 1, 2017 · 10 comments
Closed

undefined method '<' for nil:NilClass #1589

chrisrecalis opened this issue Jun 1, 2017 · 10 comments
Assignees
Labels
bug Something isn't working v0.14

Comments

@chrisrecalis
Copy link

Problem

Consistently getting the following error undefined method '<' for nil:NilClass inside lib/fluent/plugin/output.rb. From first glance it looks as though the meta data timekey is nil somehow. Fluent then backs up and is unable to move on and flush anymore buffers. I'm unable to tell which chunk is causing this

Configuration

fluentd version: 0.14.17

<source>
    @type forward
    bind 0.0.0.0
    port 24284
</source>

<match account_*.event>
    @type bigquery

    json_key "#{ENV['BIGQUERY_JSON_KEY']}"
    project_id "#{ENV['BIGQUERY_PROJECT_ID']}"
    dataset ${tag_parts[0]}

    time_partitioning_type DAY

    num_threads 4

    <tabletracker>
        key "#{ENV['TABLE_TRACKER_KEY'] || 'name' }"
        include_key "#{ENV['TABLE_TRACKER_INCLUDE_KEY'] || 'false' }"
    </tabletracker>

    <buffer tag,time>
        @type file
        path /var/log/cruncher/
        chunk_limit_size 300000000 
        timekey "#{ENV['TIME_KEY'] || '15m' }"
        flush_mode "#{ENV['FLUSH_MODE'] || 'default' }"
    </buffer>
</match>

type bigquery is a custom plugin which inherits from Fluent::Plugin::Output

Error Logs

#0 fluent/engine.rb:218:run: starting fluentd worker pid=10 ppid=1 worker=0
2017-06-01 21:42:07 +0000 [debug]: #0 plugin/buffer.rb:114:start: buffer started instance=26110880 stage_size=2104194374 queue_size=0
2017-06-01 21:42:07 +0000 [info]: #0 plugin/in_forward.rb:157:start: listening port port=24284 bind="0.0.0.0"
2017-06-01 21:42:07 +0000 [info]: #0 fluent/engine.rb:227:run: fluentd worker is now running worker=0
2017-06-01 21:42:07 +0000 [debug]: #0 plugin/output.rb:1207:enqueue_thread_run: enqueue_thread actually running
2017-06-01 21:42:07 +0000 [error]: #0 plugin/output.rb:1235:rescue in enqueue_thread_run: unexpected error while checking flushed chunks. ignored. error_class=NoMethodError error="undefined method `<' for nil:NilClass"
  2017-06-01 21:42:07 +0000 [error]: #0 /opt/cruncher/vendor/bundle/ruby/2.3.0/gems/fluentd-0.14.17/lib/fluent/plugin/output.rb:1231:in `block in enqueue_thread_run'
  2017-06-01 21:42:07 +0000 [error]: #0 /opt/cruncher/vendor/bundle/ruby/2.3.0/gems/fluentd-0.14.17/lib/fluent/plugin/buffer.rb:355:in `block (2 levels) in enqueue_all'
  2017-06-01 21:42:07 +0000 [error]: #0 /opt/cruncher/vendor/bundle/ruby/2.3.0/gems/fluentd-0.14.17/lib/fluent/plugin/buffer.rb:353:in `each'
  2017-06-01 21:42:07 +0000 [error]: #0 /opt/cruncher/vendor/bundle/ruby/2.3.0/gems/fluentd-0.14.17/lib/fluent/plugin/buffer.rb:353:in `block in enqueue_all'
  2017-06-01 21:42:07 +0000 [error]: #0 /usr/local/lib/ruby/2.3.0/monitor.rb:214:in `mon_synchronize'
  2017-06-01 21:42:07 +0000 [error]: #0 /opt/cruncher/vendor/bundle/ruby/2.3.0/gems/fluentd-0.14.17/lib/fluent/plugin/buffer.rb:351:in `enqueue_all'
  2017-06-01 21:42:07 +0000 [error]: #0 /opt/cruncher/vendor/bundle/ruby/2.3.0/gems/fluentd-0.14.17/lib/fluent/plugin/output.rb:1231:in `enqueue_thread_run'
  2017-06-01 21:42:07 +0000 [error]: #0 /opt/cruncher/vendor/bundle/ruby/2.3.0/gems/fluentd-0.14.17/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2017-06-01 21:42:07 +0000 [debug]: #0 plugin/output.rb:1259:flush_thread_run: flush_thread actually running
@chrisrecalis
Copy link
Author

After some investigation I have a few meta files that are empty. I used the following to find them

Dir.glob('/var/log/cruncher/*.meta') do |meta_file|
    begin
        f = MessagePack.unpack(File.read(meta_file))
        if f["timekey"].nil?
            puts "BAD METAFILE"
            puts meta_file
        end
    rescue Exception => e
        puts "BAD METAFILE"
        puts meta_file
    end
end

I am unsure how it got in this state where the meta is empty

@repeatedly
Copy link
Member

repeatedly commented Jun 3, 2017

Thanks for the report.
How much traffic do you send?
I want to reproduce this problem on my side.

And is your setup only v0.14?

@repeatedly repeatedly self-assigned this Jun 3, 2017
@repeatedly repeatedly added v0.14 bug Something isn't working labels Jun 3, 2017
@chrisrecalis
Copy link
Author

Yup we only use v0.14 and do around 20~40GB a day. This bug actually showed up on version 0.14.9 we tried upgrading to 0.14.17 to see if it would go away before investing time to find the issue. There was in total 6 blank meta files across 3 different servers. The odd thing about this is that they showed up all within 5 mins of each other.

@RubenHoms
Copy link

RubenHoms commented Jul 3, 2017

Can confirm that I'm seeing the same issue on my production servers. We're doing a similar amount of traffic as @chrisrecalis (around ~30GB a day). However I'm having a lot more bad metafiles than him, 65516 of them to be exact. (which for some reason is nearly the same as the open file limit which is 65536) I'm having some buffering issues with the open file limit being reached but I'll open another issue for that, maybe it is somehow related to this issue.

EDIT: In fact, after I remove all the .log and .meta files in the buffer directory and restarting td-agent the error keeps appearing in the logs.

@repeatedly
Copy link
Member

I continue to investigate this problem but the problem is not reproduced on my environment.
So I want more information for fixing this bug.

  • traffic and data pattern, size, the number of tags, one record size
  • actual configuration like chrisrecalis case
  • the content of broken meta file. 0 byte or broken msgpack value?
  • Are there other errors in the log?
  • used plugin and fluentd version

@repeatedly
Copy link
Member

Does anyone test #1628 patch for this case?

@repeatedly
Copy link
Member

I released v0.14.20.rc1 with #1628 patch.
You can test this version by typing fluent-gem install fluentd --prerelease.

@repeatedly
Copy link
Member

We released v0.14.20 with above patch and build new docker image.
I want the feedback v0.14.20 resolves this problem or not.

@chrisrecalis
Copy link
Author

Thanks @repeatedly! We just updated our deployment yesterday to v0.14.20 I'll let you know if this issue comes up again or not.

Side Note: we did get the empty meta file again on v0.14.9 it seems as though it happens when the server is abruptly shutdown and doesn't give fluentd the necessary time to gracefully terminate

@chrisrecalis
Copy link
Author

Sorry for not checking back in quickly. We've been running v0.14.20 since November 2017 and haven't had this issue come up again. I'm going to close this issue as it has been resolved for us

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working v0.14
Projects
None yet
Development

No branches or pull requests

3 participants