doc: promtail known failure modes #924

Merged

pracucci
Contributor

What this PR does / why we need it:

From the user's perspective, I believe it's important to address known failure modes in the documentation, in order to set expectations and point the user in the right direction when it comes to configuration settings.

In this PR I propose that we start documenting promtail's known failure modes.

Checklist

  • Documentation added
  • Tests updated

- `/app.log` size is >= the position before truncating

If the `/app.log` file size is less than the previous position, the file is detected as truncated and logs will be tailed starting from position `0`. Otherwise, if the `/app.log` file size is >= the previous position, `promtail` can't detect that the file was truncated while it was not running, and it will continue tailing the file from position `100`.
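
The decision described above can be sketched in a few lines. This is only an illustration of the rule as stated in the paragraph, not promtail's actual code; the function name `resume_offset` and its parameters are hypothetical.

```python
def resume_offset(saved_position: int, current_size: int) -> int:
    """Return the offset to resume tailing from (hypothetical helper)."""
    if current_size < saved_position:
        # The file is shorter than where we left off: treat it as truncated
        # and start again from the beginning.
        return 0
    # Size >= saved position: a truncation that happened while the tailer
    # was down cannot be detected, so tailing resumes from the saved position.
    return saved_position


# With a saved position of 100, as in the example above:
assert resume_offset(100, 40) == 0     # smaller file: detected as truncated
assert resume_offset(100, 100) == 100  # same size: resumes at 100
assert resume_offset(100, 250) == 100  # truncated and regrown: undetected
```

The last case is the failure mode: a file that was truncated and then grew past the saved position looks the same as one that was simply appended to.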

Collaborator

@slim-bean slim-bean Aug 20, 2019

Might be worth noting that if the log file rolled multiple times while promtail wasn't running, and the current file's size is greater than the position stored in the positions file, promtail will start at that stored position and not at the beginning. Put another way, promtail does not do anything fancy like tracking a hash of the log file to know whether it's actually continuing from the same file.

Collaborator

@slim-bean slim-bean Aug 20, 2019

Somewhere I feel like we need to document the advantages of using larger log files with regard to promtail and decreasing the odds of lost log lines. The less frequently log files are rolled, the better promtail fares when things go wrong... Not sure where this might go, but maybe it fits on this page?


When `promtail` shuts down gracefully, it saves the last read offsets in the positions file, so that on a subsequent restart it will continue tailing logs without duplicates or losses.

In the unlikely event of a crash, `promtail` can't save the last read offsets in the positions file. When restarted, `promtail` will read the positions file saved at the last sync period and will continue tailing the files from there. This means that if new log entries have been read and pushed to the ingester between the last sync period and the crash, these log entries will be sent again to the ingester on `promtail` restart.
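
To make the crash scenario concrete, here is a toy simulation. It is a sketch under stated assumptions, not promtail's implementation: the `ToyTailer` class, its fields, and the idea of syncing every N lines (standing in for the time-based sync period) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTailer:
    lines: list              # log file contents, one entry per line
    sync_every: int          # stand-in for the sync period, counted in lines
    synced_offset: int = 0   # last offset persisted to the "positions file"
    offset: int = 0          # in-memory read offset
    sent: list = field(default_factory=list)  # entries pushed downstream

    def read(self, n: int) -> None:
        """Read and push up to n lines, syncing the offset periodically."""
        for _ in range(n):
            if self.offset >= len(self.lines):
                return
            self.sent.append(self.lines[self.offset])
            self.offset += 1
            if self.offset % self.sync_every == 0:
                self.synced_offset = self.offset

    def crash_and_restart(self) -> None:
        """A crash loses the in-memory offset; restart uses the synced one."""
        self.offset = self.synced_offset


tailer = ToyTailer(lines=[f"line {i}" for i in range(10)], sync_every=4)
tailer.read(6)              # pushes lines 0..5; last sync happened at offset 4
tailer.crash_and_restart()  # offset falls back to 4
tailer.read(10)             # pushes lines 4..9, so lines 4 and 5 go out twice

duplicates = sorted(l for l in set(tailer.sent) if tailer.sent.count(l) > 1)
assert duplicates == ["line 4", "line 5"]
```

Only the entries read between the last sync and the crash are resent, so a shorter sync period narrows the window of duplicates but does not remove it.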
Collaborator

Haha, I'm not sure how unlikely this is :) but I appreciate your optimism... Crashing is hopefully unlikely, but OOMs in a Kubernetes-type environment could happen.

Maybe also note that resending logs would be ignored by loki as loki currently rejects logs with older timestamps than it has already received? So you don't need to crank down the sync_period, sending of some duplicates is ok

Contributor Author

> Maybe also note that resending logs would be ignored by loki as loki currently rejects logs with older timestamps than it has already received? So you don't need to crank down the sync_period, sending of some duplicates is ok

You're definitely right. I will revisit that paragraph accordingly.

docs/promtail.md (outdated, resolved)
@pracucci pracucci force-pushed the document-promtail-known-failure-modes branch 2 times, most recently from 120ef85 to d282ce1 Compare August 20, 2019 16:21
@pracucci
Contributor Author

Thanks @slim-bean for taking the time to read it. I've tried to address your comments. Could you re-review it, please?

@pracucci pracucci force-pushed the document-promtail-known-failure-modes branch from d282ce1 to 0cba30f Compare August 21, 2019 07:46
@pracucci
Contributor Author

Thanks again @slim-bean for reviewing it. I have addressed your last comment.

Collaborator

@slim-bean slim-bean left a comment

Thanks @pracucci ! LGTM!

@slim-bean slim-bean merged commit 75a3e61 into grafana:master Aug 21, 2019
@pracucci pracucci deleted the document-promtail-known-failure-modes branch August 22, 2019 07:15