Skip to content

Commit

Permalink
fix(kubernetes_logs source): Always use file checkpoints if they exist
Browse files Browse the repository at this point in the history
The `kubernetes_logs` source exposes a `PathProvider` that breaks one of
the `FileServer`s assumptions that all available files will be listed at
Vector startup time. Instead, the files are only returned once the k8s
metadata is available to the `kubernetes_logs` source. This caused the
`FileServer` to ignore any checkpoints that existed for these files.

As a short-term fix, we just always use the checkpoint, if available,
for any new files that are seen. This fixes the case for the
`kubernetes_logs` source where they are seen as "new" after start-up.

#6564 exists to test this
behavior, but it seems to pass even without this change, so that test
will need to be updated.

Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
  • Loading branch information
jszwedko committed Apr 15, 2021
1 parent 0c9b441 commit 60d5df1
Showing 1 changed file with 17 additions and 11 deletions.
28 changes: 17 additions & 11 deletions lib/file-source/src/file_server.rs
Original file line number Diff line number Diff line change
Expand Up @@ -392,24 +392,30 @@ where
// Determine the initial _requested_ starting point in the file. This can be overridden
// once the file is actually opened and we determine it is compressed, older than we're
// configured to read, etc.
let read_from = if startup {
// If we are starting up, use the stored checkpoint unless the user has opted out. If
// they have opted out or there is no checkpoint present, fall back to `read_from`.
if self.ignore_checkpoints {
self.read_from
} else {
checkpoints
.get(file_id)
.map(ReadFrom::Checkpoint)
.unwrap_or(self.read_from)
}
let fallback = if startup {
self.read_from
} else {
// Always read new files that show up while we're running from the beginning. There's
// not a good way to determine if they were moved or just created and written very
// quickly, so just make sure we're not missing any data.
ReadFrom::Beginning
};

// Always prefer the stored checkpoint unless the user has opted out.
// Previously, the checkpoint was only loaded for new files when Vector wast started up,
// but the `kubernetes_logs` source returns the files well after start-up, once it has
// populated them from the k8s metadata, so we now just always use the checkpoints unless
// opted out.
// https://github.com/timberio/vector/issues/7139
let read_from = if !self.ignore_checkpoints {
checkpoints
.get(file_id)
.map(ReadFrom::Checkpoint)
.unwrap_or(fallback)
} else {
fallback
};

match FileWatcher::new(
path.clone(),
read_from,
Expand Down

0 comments on commit 60d5df1

Please sign in to comment.