doc: promtail known failure modes #924
Conversation
docs/promtail-failure-modes.md
Outdated
- `/app.log` size is >= the position before truncating

If the `/app.log` file size is less than the previous position, the file is detected as truncated and logs will be tailed starting from position `0`. Otherwise, if the `/app.log` file size is >= the previous position, `promtail` can't detect that it was truncated while not running and will continue tailing the file from position `100`.
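For illustration, here is a minimal Go sketch of the heuristic described above. It is not promtail's actual code, and the path and saved position are made up for the example:

```go
package main

import (
	"fmt"
	"os"
)

// resumeOffset sketches the size-vs-position heuristic: if the file is now
// smaller than the saved position, assume it was truncated and restart from 0;
// otherwise resume from the saved position. Note that this check alone cannot
// tell a truncated-and-regrown (or rolled) file apart from the original one.
func resumeOffset(path string, savedPosition int64) (int64, error) {
	info, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	if info.Size() < savedPosition {
		// File shrank while we were not watching: treat it as truncated.
		return 0, nil
	}
	// File is at least as large as before: keep tailing from the saved offset.
	return savedPosition, nil
}

func main() {
	// Hypothetical path and saved position, matching the example above.
	offset, err := resumeOffset("/app.log", 100)
	if err != nil {
		fmt.Println("stat failed:", err)
		return
	}
	fmt.Println("resuming tail at offset", offset)
}
```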
Might be worth noting that if the log file rolled multiple times while promtail wasn't running and the size is greater than the position from the positions file, it will start at the position from the positions file and not the beginning. Or, put another way, promtail does not do anything fancy like track a hash of the log file to know whether it's actually continuing from the same file or not.
Somewhere I feel like we need to document the advantages of using larger log files with regard to promtail and decreasing the odds of lost log lines. The less frequently log files are rolled, the better success promtail has when things go wrong... Not sure where this might go, but maybe it fits on this page?
docs/promtail-failure-modes.md
Outdated
When `promtail` shuts down gracefully, it saves the last read offsets in the positions file, so that on a subsequent restart it will continue tailing logs without duplicates or losses.

In the unlikely event of a crash, `promtail` can't save the last read offsets in the positions file. When restarted, `promtail` will read the positions file saved at the last sync period and will continue tailing the files from there. This means that if new log entries have been read and pushed to the ingester between the last sync period and the crash, these log entries will be sent again to the ingester on `promtail` restart.
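To illustrate the behaviour described above, here is a small Go sketch, not promtail's actual implementation; the interval value is only a stand-in for the configurable sync period:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Illustrative sketch: read offsets are tracked in memory and flushed to the
// positions file only once per sync period. A crash loses any in-memory
// progress made since the last flush, so those log lines are read and pushed
// again after a restart.

type positions struct {
	mu      sync.Mutex
	offsets map[string]int64
}

func (p *positions) advance(path string, n int64) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.offsets[path] += n
}

func (p *positions) sync() {
	p.mu.Lock()
	defer p.mu.Unlock()
	// In promtail this would write the offsets to the positions file on disk;
	// here we just print them.
	fmt.Println("syncing positions:", p.offsets)
}

func main() {
	p := &positions{offsets: map[string]int64{}}
	syncPeriod := 100 * time.Millisecond // stand-in for the real sync period

	go func() {
		for range time.Tick(syncPeriod) {
			p.sync()
		}
	}()

	// Simulated tailing: offsets advance between syncs. A crash at any point
	// here means everything read since the last sync() is re-sent on restart.
	for i := 0; i < 5; i++ {
		p.advance("/app.log", 10)
		time.Sleep(40 * time.Millisecond)
	}
}
```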
haha, I'm not sure how unlikely this is :) but I appreciate your optimism... Though yes, crashing is hopefully unlikely, but OOMs in a Kubernetes-type environment could happen.
Maybe also note that resent logs would be ignored by Loki, as Loki currently rejects logs with older timestamps than it has already received? So you don't need to crank down the sync_period; sending some duplicates is ok.
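As an illustration of that out-of-order rejection, here is a simplified Go sketch, not Loki's actual implementation: an ingester that tracks the last accepted timestamp per stream and refuses anything at or before it would silently drop resent duplicates.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Simplified per-stream out-of-order rejection: entries with a timestamp at or
// before the last accepted one are refused, which is why re-pushing
// already-sent lines after a promtail crash is effectively harmless.

var errOutOfOrder = errors.New("entry out of order")

type stream struct {
	lastTimestamp time.Time
}

func (s *stream) push(ts time.Time, line string) error {
	if !ts.After(s.lastTimestamp) {
		return errOutOfOrder
	}
	s.lastTimestamp = ts
	fmt.Println("accepted:", line)
	return nil
}

func main() {
	s := &stream{}
	t0 := time.Now()

	s.push(t0, "line 1")                  // accepted
	s.push(t0.Add(time.Second), "line 2") // accepted
	err := s.push(t0, "line 1")           // resent duplicate: rejected
	fmt.Println("duplicate push:", err)
}
```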
> Maybe also note that resent logs would be ignored by Loki, as Loki currently rejects logs with older timestamps than it has already received? So you don't need to crank down the sync_period; sending some duplicates is ok.
You're definitely right. I will revisit that paragraph accordingly.
Force-pushed from 120ef85 to d282ce1
Thanks @slim-bean for taking the time to read it. I've tried to address your comments. Could you re-review it, please?
Force-pushed from d282ce1 to 0cba30f
Thanks again @slim-bean for reviewing it. I have addressed your last comment.
Thanks @pracucci! LGTM!
What this PR does / why we need it:
From the user perspective, I believe it's important to address known failure modes in the documentation, in order to set expectations and point the user in the right direction when it comes to configuration settings.
In this PR I'm suggesting to start documenting `promtail` known failure modes.