Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

on EOF, write to sincedb #24

Closed
wants to merge 2 commits into from

Conversation

buckelij
Copy link

I was expecting that if :sincedb_write_interval had passed before the tail was shut down, the sincedb would be up-to-date. In fact, up to :sincedb_write_interval worth of writes will be re-read on restart:

#start
echo a > log.txt
sleep $sincedb_write_interval
echo b > log.txt #causes _sincedb_write
echo c > log.txt
#wait a minute (longer than :sincedb_write_interval)
pkill -f /usr/bin/ruby.*test.rb
#on restart, 'c' will be re-read.

This patch checks if :sincedb_write_interval has passed if at EOF.

@wiibaa
Copy link
Contributor

wiibaa commented Feb 19, 2015

@buckelij any change you could rebase this PR, please ?

@wiibaa wiibaa mentioned this pull request Feb 19, 2015
11 tasks
@buckelij
Copy link
Author

@wiibaa I've rebased.

@jsvd jsvd self-assigned this Feb 22, 2015
@wiibaa
Copy link
Contributor

wiibaa commented Feb 23, 2015

@buckelij in fact your issue report could be simplified to tail should always write sincedb when quiting,no ? Why would you make it dependant of sincedb_interval ?

If you look to logstash-input-file plugin, you can see that sincedb_write is called before calling quit in teardown logic https://github.com/wiibaa/logstash-input-file/blob/master/lib/logstash/inputs/file.rb#L146

@jsvd this a design decision, either tail#quit should itself call sincedb_write or it should be clearly documented that it is the responsbility of the lib user, what do you think?

@buckelij
Copy link
Author

I guess that's true, and should be good enough in most cases. That being
said, the logstash docs say

How often (in seconds) to write a since database with the current
position of monitored log files.
It would be nice if it behaved like that, and I speculate writing on EOF
doesn't cause too much overhead.

On Sunday, February 22, 2015, Philippe Weber notifications@github.com
wrote:

@buckelij https://github.com/buckelij in fact your issue report could
be simplified to tail should always write sincedb when quiting,no ? Why
would you make it dependant of sincedb_interval ?

If you look to logstash-input-file plugin, you can see that
sincedb_write is called before calling quit in teardown logic
https://github.com/wiibaa/logstash-input-file/blob/master/lib/logstash/inputs/file.rb#L146

@jsvd https://github.com/jsvd this a design decision, either tail#quit
should itself call sincedb_write or it should be clearly documented that it
is the responsbility of the lib user, what do you think?


Reply to this email directly or view it on GitHub
#24 (comment)
.

@buckelij
Copy link
Author

(I realize that's maybe a logstash docs issue)

On Sunday, February 22, 2015, Elijah Buck elijah@github.com wrote:

I guess that's true, and should be good enough in most cases. That being
said, the logstash docs say

How often (in seconds) to write a since database with the current
position of monitored log files.
It would be nice if it behaved like that, and I speculate writing on EOF
doesn't cause too much overhead.

On Sunday, February 22, 2015, Philippe Weber <notifications@github.com
javascript:_e(%7B%7D,'cvml','notifications@github.com');> wrote:

@buckelij https://github.com/buckelij in fact your issue report could
be simplified to tail should always write sincedb when quiting,no ?
Why would you make it dependant of sincedb_interval ?

If you look to logstash-input-file plugin, you can see that
sincedb_write is called before calling quit in teardown logic
https://github.com/wiibaa/logstash-input-file/blob/master/lib/logstash/inputs/file.rb#L146

@jsvd https://github.com/jsvd this a design decision, either tail#quit
should itself call sincedb_write or it should be clearly documented that it
is the responsbility of the lib user, what do you think?


Reply to this email directly or view it on GitHub
#24 (comment)
.

@wiibaa
Copy link
Contributor

wiibaa commented Feb 23, 2015

@buckelij your point is then that sincedb_write_interval should happen regularly to flush the sincedb , even when no changes was made to watched files, This would only improve the non-clean exit.
But then your patch does not help this usecase because EOF is only reached when a watched file has changed.

@jsvd
Copy link
Collaborator

jsvd commented Feb 23, 2015

sincedb_write should definitely be called during quit, I don't see a situation where that is a bad thing. As for calling it on EOF, I'm more skeptical, as that'll be very aggressive (every line read will trigger an EOF, if I'm not mistaken). My suggestion would be to either change this PR to @wiibaa's suggestion or close this one and open one for sincedb_write on tail#quit.

@wiibaa
Copy link
Contributor

wiibaa commented Feb 23, 2015

@jsvd if we go for the change I could incorporate it in #50 as there is a test for it, currently calling sincedb_write before quit. Just let me know

@buckelij
Copy link
Author

This would only improve the non-clean exit.

That's correct.

every line read will trigger an EOF, if I'm not mistaken

No, only when sysread(32768) from _read_file reads to the end of the file.

EOF is only reached when a watched file has changed.

That's fine; this is for the corner case where there have been changes within the last sincedb_write_interval (and then no changes after sincedb_write_interval passes) and then the process is killed.

@wiibaa
Copy link
Contributor

wiibaa commented Feb 25, 2015

What I meant by EOF is only reached when a watched file has changed. is that file read is triggered by the @watch.subscribe when file size change, otherwise the _read_file method is not reached so when it is reached data will be read for the file so the boolean changed will be true and sincedb write depending of interval will be called when exiting the loop after EOF. Looking to the code in details, my feeling is that your proposal does not change anything in the behaviour of #_read_file method

@buckelij
Copy link
Author

changed will be true and sincedb will be written when exiting the loop after EOF

sincedb will not always be written when exiting the loop. It depends on if the sincedb_write_interval has elapsed or not.

@wiibaa
Copy link
Contributor

wiibaa commented Feb 25, 2015

@buckelij corrected my comment, but it does not change the fact that in your initial test-logic _sincedb_write_if_due would be called twice in your patch because changed=true as you come from the when :modify in subscribe

@buckelij
Copy link
Author

@wiibaa ah, that is true that _sincedb_write_if_due would be called twice. I guess I can set changed=false as part of the rescue. Should I make that change, or is there still a question of whether this PR brings any value?

@buckelij
Copy link
Author

Oh, no. Hmm, this doesn't seem right. I would have to call _sincedb_write directly from the EOF rescue, otherwise this doesn't help at all.

@buckelij buckelij changed the title on EOF, check if sincedb should be written on EOF, write to sincedb Feb 25, 2015
@buckelij
Copy link
Author

I've updated the PR to always write when EOF is encountered. @jsvd if you're concerned that writing on EOF is too aggressive, we can just close this.

Regardless of the state of this PR, I think the following should be considered.

sincedb_write_interval should happen regularly to flush the sincedb , even when no changes was made to watched files

tail should always write sincedb when quiting

Thanks!

@bdanziger
Copy link

Guys,

I have a version of this that I submitted a long time back that took care of the EOF issue but it never made it in.  Not writing at EOF was a big problem for me and I have to assume is incorrect behavior.  It really needs to be flushed.

  1. YES, WRITING AT EOF is too aggressive…I did that in my first version just to try it.

  2. I remember what I did instead was ONLY updated at EOF if a since_db write/flush was pending.  
        So, you have a pending flag that gets cleared every time you flush out sincedb file.
        Every time you go to update sincedb….it might  not flush because it only flushes at some interval…so, you set the PENDING flag to TRUE.
        When you get an EOF, you would ONLY flush sincedb if it is PENDING….then, you would clear it after you flush it.

I have a version of this working for me that we use and it works great.  I think if you make this simple modification, it would be great.

Thanks 

On February 25, 2015 at 5:18:08 PM, Elijah Buck (notifications@github.com) wrote:

I've updated the PR to always write when EOF is encountered. @jsvd if you're concerned that writing on EOF is too aggressive, we can just close this.

Regardless of the state of this PR, I think the following should be considered.

sincedb_write_interval should happen regularly to flush the sincedb , even when no changes was made to watched files

tail should always write sincedb when quiting

Thanks!


Reply to this email directly or view it on GitHub.

@wiibaa
Copy link
Contributor

wiibaa commented Feb 26, 2015

@bdanziger thanks for joining the discussion. In fact your PR #29 is next on my radar, I expected this one to be an easy warm-up. Let's try to conclude and move forward with following actions:

  1. add call to _sincedb_write in tail#quit (done in Clean-up watch inode vs tail sincedb identifier usage + first test on writing sincedb #50), this impact only pure-filewatch user as logstash-input-file was already calling the public #sincedb_write method before quit.
  2. add a usage documentation to tail.rb about calling #quit on process exit as done in https://github.com/jordansissel/ruby-filewatch/blob/master/bin/globtail#L50
  3. merge Clean-up watch inode vs tail sincedb identifier usage + first test on writing sincedb #50
  4. write spec proving the advantage of enhancing sincedb flushing #29
  5. rebase enhancing sincedb flushing #29 after Clean-up watch inode vs tail sincedb identifier usage + first test on writing sincedb #50 is merged, potentially taking Merge all other contributions #32 (comment) into account
  6. continue the disscusion in enhancing sincedb flushing #29
  7. close this PR
  8. think about removing the public sincedb_write method to avoid additional issues as I do not see the added value frankly.

Am I missing something ? @jsvd @buckelij do you agree ?

@buckelij
Copy link
Author

close this PR

Works for me; this pr appears redundant to #29.

On Wednesday, February 25, 2015, Philippe Weber <notifications@github.com
javascript:_e(%7B%7D,'cvml','notifications@github.com');> wrote:

@bdanziger https://github.com/bdanziger thanks for joining the
discussion. In fact your PR #29
#29 is next on my
radar, I expected this one to be an easy warm-up. Let's try to conclude and
move forward with following actions:

  1. add call to _sincedb_write in tail#quit (done in Clean-up watch inode vs tail sincedb identifier usage + first test on writing sincedb #50
    Clean-up watch inode vs tail sincedb identifier usage + first test on writing sincedb #50), this impact
    only pure-filewatch user as logstash-input-file was already calling the
    public #sincedb_write method before quit.
  2. add a usage documentation to tail.rb about calling #quit on process
    exit as done in
    https://github.com/jordansissel/ruby-filewatch/blob/master/bin/globtail#L50
  3. merge Clean-up watch inode vs tail sincedb identifier usage + first test on writing sincedb #50 Clean-up watch inode vs tail sincedb identifier usage + first test on writing sincedb #50
  4. write spec proving the advantage of enhancing sincedb flushing #29
    enhancing sincedb flushing #29
  5. rebase enhancing sincedb flushing #29 enhancing sincedb flushing #29
    after Clean-up watch inode vs tail sincedb identifier usage + first test on writing sincedb #50 Clean-up watch inode vs tail sincedb identifier usage + first test on writing sincedb #50 is
    merged, potentially taking Merge all other contributions #32 (comment)
    Merge all other contributions #32 (comment)
    into account
  6. continue the disscusion in enhancing sincedb flushing #29
    enhancing sincedb flushing #29
  7. close this PR
  8. think about removing the public sincedb_write method to avoid
    additional issues as I do not see the added value frankly.

Am I missing something ? @jsvd https://github.com/jsvd @buckelij
https://github.com/buckelij do you agree ?


Reply to this email directly or view it on GitHub
#24 (comment)
.

@jsvd
Copy link
Collaborator

jsvd commented Feb 26, 2015

Agreed!

regarding "1." I opened an issue on logstash-input-file to remove the explicit sincedb_write call once we merge this and release a new filewatch gem. Also needed if we end up doing 8. (which I'm all for it).

Also +1 on refactoring #29 according to #32 (comment)

❤️ to @buckelij @wiibaa @bdanziger

@bdanziger
Copy link

Not to slow anyone down…just curious about the public sincedb_write removal.

I recall when I enhanced this  (it’s been a while)  that logstash called the sincedb_write in it’s cleanup (If I am remembering correctly).

Which is why I passed in a flag to determine if you HAD to flush, or you could just set the pending flag to write.  The thought was that if the called wants to flush sincedb
then, it should be force written.

Are we saying we do not need that because it will always be flushed when the tail finishes?

I seemed to have missed some discussion on this and was confused.

If you don’t have time to answer…that’s fine.  Was just curious what I missed.

On February 26, 2015 at 8:03:02 AM, João Duarte (notifications@github.com) wrote:

Agreed!

regarding "1." I opened an issue on logstash-input-file to remove the explicit sincedb_write call once we merge this and release a new filewatch gem. Also needed if we end up doing 8. (which I'm all for it).

Also +1 on refactoring #29 according to #32 (comment)

to @buckelij @wiibaa @bdanziger


Reply to this email directly or view it on GitHub.

@wiibaa
Copy link
Contributor

wiibaa commented Feb 26, 2015

@bdanziger Indeed this issue is initially not related to logstash and you need to dig in the history.

  1. 2011 #quit method did not exist, so to flush sincedb at exit you needed a way, thus the public sincedb_write as used in https://github.com/jordansissel/ruby-filewatch/blob/master/bin/globtail#L50
  2. 2012 #quit was added and logstash still use the sequence sincedb_write then quit
  3. 2015 thinking about tail.rb design again it is not obvious why an external implementor would want to write the sincedb if it is not to stop the tail, thus we now incude a call to the private _sincedb_write into #quit method and propose removing the public one as everybody should now use the #quit method as mentioned in my point 2., this is a breaking change but will be cleaner, sincedb remaining managed internally only, avoiding potential threadsafety issue etc

@bdanziger
Copy link

Ok, thanks for the info.  I would agree that this should be handled internally.  

On February 26, 2015 at 9:22:18 AM, Philippe Weber (notifications@github.com) wrote:

@bdanziger Indeed this issue is initially not related to logstash and you need to dig in the history.

2011 #quit method did not exist, so to flush sincedb at exit you needed a way, thus the public sincedb_write as used in https://github.com/jordansissel/ruby-filewatch/blob/master/bin/globtail#L50
2012 #quit was added and logstash still use the sequence sincedb_write then quit 3 2015 thinking about tail.rb design again it is not obvious why an external implementor would want to write the sincedb if it is not to stop the tail, thus we now incude a call to the private _sincedb_write into #quit method and propose removing the public one as everybody should now use the #quit method as mentioned in my point 2., this is a breaking change but will be cleaner, sincedb remaining managed internally only, avoiding potential threadsafety issue etc

Reply to this email directly or view it on GitHub.

@guyboertje
Copy link
Contributor

It is not a good idea to write the sincedb on every EOF.
Closing.

@guyboertje guyboertje closed this Jan 26, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants