Add the take_over mode for filestream inputs #34292

Merged: rdner merged 24 commits into elastic:main from filestream-take-over on Jan 27, 2023

Member

@rdner rdner commented Jan 17, 2023

What does this PR do?

If a filestream input has the configuration parameter take_over set to true, every loginput state record in the registry whose source matches at least one of the filestream's paths/globs will be taken over by this filestream input.

This means the existing loginput state entry gets converted into a filestream entry (the loginput entry gets deleted).

The purpose of this mode is to make migration from loginput to filestream as simple and smooth as possible by adding take_over: true to the new filestream configuration. All offsets for input files will be preserved and the filestream will continue ingesting the files at the same point where the loginput stopped. This solves the previously occurring data duplication (file re-ingestion) problem.
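
The matching and conversion rule described above can be sketched roughly as follows. This is an illustrative Python model, not the Go code added in this PR: the function name `take_over_states`, the flat dictionary used as a registry, and the use of `fnmatchcase` for glob matching are all assumptions made for the example. The key formats are the ones that appear in the debug log in the test instructions.

```python
from fnmatch import fnmatchcase

# Key prefix of log input entries, as seen in the debug log of this PR.
LOG_PREFIX = "filebeat::logs::"


def take_over_states(registry, filestream_id, patterns):
    """Return a new registry where matching log input entries belong to the filestream.

    `registry` maps a state key to a dict that has at least a "source" path.
    This is a simplified, hypothetical model of the registry.
    """
    result = {}
    for key, state in registry.items():
        is_log_entry = key.startswith(LOG_PREFIX)
        # Assumption: fnmatchcase lets "*" cross "/" boundaries, so real glob
        # matching may be stricter than this.
        matches = any(fnmatchcase(state["source"], p) for p in patterns)
        if is_log_entry and matches:
            suffix = key[len(LOG_PREFIX):]  # e.g. "native::95638108-16777232"
            new_key = f"filestream::{filestream_id}::{suffix}"
            result[new_key] = state  # the offset is carried over unchanged
        else:
            result[key] = state  # entries that do not match stay as they are
    return result


registry = {
    "filebeat::logs::native::95638108-16777232": {
        "source": "/path/to/logs/loginput.log",
        "offset": 18,
    },
}
migrated = take_over_states(registry, "my-filestream-id", ["/path/to/logs/loginput*"])
print(migrated)
```

The old key disappears from the result and the offset moves with the entry, which is the property that prevents re-ingestion.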

Out of scope
This does not address migration of any integration or other input type (like container #34393) that uses loginput under the hood. It addresses only the user-facing migration from the log input to the filestream input. The rest will follow later.

Why is it important?

To improve the UX of the loginput->filestream migration and allow all our integrations to migrate to filestream smoothly as well.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

  1. Create a file loginput.log with a few lines like this:
line1
line2
line3
  2. Prepare this configuration file filebeat-log.yml for Filebeat:
filebeat.inputs:
  - type: log
    paths:
      - "/path/to/logs/loginput*"
path.data: "/path/to/data" # make sure you start with an empty registry
logging:
  level: debug
output.console:
  enabled: true

Replace /path/to with your test directory here and in all the configs and commands that follow.

  3. Run Filebeat with this configuration file using this command:
./filebeat run -e -c /path/to/config/filebeat-log.yml 2> >(jq . > output-log.json)

You'll see the events on the console containing the lines from the log file:

{"@timestamp":"2023-01-19T18:02:45.534Z","@metadata":{"beat":"filebeat","type":"_doc","version":"8.7.0"},"log":{"offset":0,"file":{"path":"/Users/rdner/Projects/es_confs/take-over/logs/loginput.log"}},"message":"line1","input":{"type":"log"},"ecs":{"version":"8.0.0"},"host":{"name":"elastic-space.localdomain"},"agent":{"name":"elastic-space.localdomain","type":"filebeat","version":"8.7.0","ephemeral_id":"800c4352-9540-40e5-90e8-ad5087103553","id":"9559d47c-1c35-4ec7-a105-7db8ce539988"}}
{"@timestamp":"2023-01-19T18:02:45.534Z","@metadata":{"beat":"filebeat","type":"_doc","version":"8.7.0"},"input":{"type":"log"},"ecs":{"version":"8.0.0"},"host":{"name":"elastic-space.localdomain"},"agent":{"version":"8.7.0","ephemeral_id":"800c4352-9540-40e5-90e8-ad5087103553","id":"9559d47c-1c35-4ec7-a105-7db8ce539988","name":"elastic-space.localdomain","type":"filebeat"},"log":{"offset":6,"file":{"path":"/Users/rdner/Projects/es_confs/take-over/logs/loginput.log"}},"message":"line2"}
{"@timestamp":"2023-01-19T18:02:45.534Z","@metadata":{"beat":"filebeat","type":"_doc","version":"8.7.0"},"input":{"type":"log"},"ecs":{"version":"8.0.0"},"host":{"name":"elastic-space.localdomain"},"agent":{"ephemeral_id":"800c4352-9540-40e5-90e8-ad5087103553","id":"9559d47c-1c35-4ec7-a105-7db8ce539988","name":"elastic-space.localdomain","type":"filebeat","version":"8.7.0"},"log":{"file":{"path":"/Users/rdner/Projects/es_confs/take-over/logs/loginput.log"},"offset":12},"message":"line3"}
  4. Now prepare a new config file filebeat-filestream.yml (or follow the new migration guide):
filebeat.inputs:
  - type: filestream
    id: my-filestream-id
    take_over: true
    enabled: true
    paths:
      - "/path/to/logs/loginput*"
path.data: "/path/to/data" # make sure it's the same path as used before
logging:
  level: debug
output.console:
  enabled: true
  5. Run Filebeat with this new configuration file using this command:
./filebeat run -e -c /path/to/config/filebeat-filestream.yml 2> >(jq . > output-filestream.json)

Wait for ~30 seconds; you should not see any new output on the console.

  6. Add new lines to the log file with this command (printf is used because echo does not expand \n in every shell):
printf 'line4\nline5\n' >> /path/to/logs/loginput.log

After a few seconds you should see the output on the console containing the added lines:

{"@timestamp":"2023-01-19T18:06:40.796Z","@metadata":{"beat":"filebeat","type":"_doc","version":"8.7.0"},"host":{"name":"elastic-space.localdomain"},"log":{"offset":18,"file":{"path":"/Users/rdner/Projects/es_confs/take-over/logs/loginput.log"}},"message":"line4","input":{"type":"filestream"},"agent":{"id":"9559d47c-1c35-4ec7-a105-7db8ce539988","name":"elastic-space.localdomain","type":"filebeat","version":"8.7.0","ephemeral_id":"c64baccf-a97a-4450-ad4b-d105afd7303c"},"ecs":{"version":"8.0.0"}}
{"@timestamp":"2023-01-19T18:06:40.797Z","@metadata":{"beat":"filebeat","type":"_doc","version":"8.7.0"},"log":{"offset":24,"file":{"path":"/Users/rdner/Projects/es_confs/take-over/logs/loginput.log"}},"message":"line5","input":{"type":"filestream"},"host":{"name":"elastic-space.localdomain"},"agent":{"type":"filebeat","version":"8.7.0","ephemeral_id":"c64baccf-a97a-4450-ad4b-d105afd7303c","id":"9559d47c-1c35-4ec7-a105-7db8ce539988","name":"elastic-space.localdomain"},"ecs":{"version":"8.0.0"}}
  7. Stop Filebeat and check the output-filestream.json file created next to the Filebeat binary. It should contain log entries describing the take-over process:
{
  "log.level": "debug",
  "@timestamp": "2023-01-19T19:05:36.770+0100",
  "log.logger": "filestream-takeover",
  "log.origin": {
    "file.name": "takeover/takeover.go",
    "file.line": 181
  },
  "message": "recursive glob enabled for filestream `my-filestream-id`",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}
{
  "log.level": "debug",
  "@timestamp": "2023-01-19T19:05:36.770+0100",
  "log.logger": "filestream-takeover",
  "log.origin": {
    "file.name": "takeover/takeover.go",
    "file.line": 195
  },
  "message": "found 1 patterns for filestream `my-filestream-id`",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}
{
  "log.level": "debug",
  "@timestamp": "2023-01-19T19:05:36.770+0100",
  "log.logger": "filestream-takeover",
  "log.origin": {
    "file.name": "takeover/takeover.go",
    "file.line": 168
  },
  "message": "found 1 filestream inputs in `take over` mode",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}
{
  "log.level": "debug",
  "@timestamp": "2023-01-19T19:05:36.771+0100",
  "log.logger": "filestream-takeover",
  "log.origin": {
    "file.name": "takeover/takeover.go",
    "file.line": 129
  },
  "message": "found loginput state `filebeat::logs::native::95638108-16777232` to take over by `filestream::my-filestream-id::native::95638108-16777232`",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}
{
  "log.level": "debug",
  "@timestamp": "2023-01-19T19:05:36.771+0100",
  "log.logger": "filestream-takeover",
  "log.origin": {
    "file.name": "backup/registry.go",
    "file.line": 60
  },
  "message": "Attempting to find the checkpoint...",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}
{
  "log.level": "debug",
  "@timestamp": "2023-01-19T19:05:36.771+0100",
  "log.logger": "filestream-takeover",
  "log.origin": {
    "file.name": "backup/registry.go",
    "file.line": 68
  },
  "message": "Checkpoint not found",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}
{
  "log.level": "debug",
  "@timestamp": "2023-01-19T19:05:36.771+0100",
  "log.logger": "filestream-takeover",
  "log.origin": {
    "file.name": "backup/registry.go",
    "file.line": 72
  },
  "message": "Checking if the registry log exists at /Users/rdner/Projects/es_confs/take-over/data/registry/filebeat/log.json...",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}
{
  "log.level": "debug",
  "@timestamp": "2023-01-19T19:05:36.771+0100",
  "log.logger": "filestream-takeover",
  "log.origin": {
    "file.name": "backup/registry.go",
    "file.line": 79
  },
  "message": "Found the registry log at /Users/rdner/Projects/es_confs/take-over/data/registry/filebeat/log.json",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}
{
  "log.level": "debug",
  "@timestamp": "2023-01-19T19:05:36.771+0100",
  "log.logger": "filestream-takeover",
  "log.origin": {
    "file.name": "backup/registry.go",
    "file.line": 82
  },
  "message": "Creating backups for [/Users/rdner/Projects/es_confs/take-over/data/registry/filebeat/log.json]...",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}
{
  "log.level": "debug",
  "@timestamp": "2023-01-19T19:05:36.771+0100",
  "log.logger": "filestream-takeover",
  "log.origin": {
    "file.name": "takeover/takeover.go",
    "file.line": 87
  },
  "message": "filestream inputs took over 1 file(s) from loginputs",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}

This verifies that the filestream input took over the state from the log input, didn't duplicate events on startup and ingested new data from the original log file.
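
One way to double-check this from the captured console events is to confirm that no offset was emitted twice and that the filestream events start after the last log input event. The sketch below is a hypothetical helper, not part of this PR. It assumes the events were saved one JSON object per line, and it uses trimmed copies of the events shown above that keep only the fields the check needs.

```python
import json


def offsets_by_input(lines):
    """Group log.offset values by input.type, preserving event order."""
    grouped = {}
    for line in lines:
        event = json.loads(line)
        grouped.setdefault(event["input"]["type"], []).append(event["log"]["offset"])
    return grouped


# Trimmed events from the two runs (all refer to the same file).
events = [
    '{"log":{"offset":0},"message":"line1","input":{"type":"log"}}',
    '{"log":{"offset":6},"message":"line2","input":{"type":"log"}}',
    '{"log":{"offset":12},"message":"line3","input":{"type":"log"}}',
    '{"log":{"offset":18},"message":"line4","input":{"type":"filestream"}}',
    '{"log":{"offset":24},"message":"line5","input":{"type":"filestream"}}',
]

grouped = offsets_by_input(events)
all_offsets = grouped["log"] + grouped["filestream"]

# No offset was ingested twice, and filestream resumed after the log input stopped.
no_duplicates = len(all_offsets) == len(set(all_offsets))
resumed_later = min(grouped["filestream"]) > max(grouped["log"])
print(no_duplicates, resumed_later)  # prints "True True"
```

If the filestream input had re-read the file from the beginning, offsets 0, 6 and 12 would appear again under the filestream type and both checks would fail.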

Also, you can see that the registry log was backed up (/path/to/data/registry/filebeat, see the configs above):

[Screenshot, 2023-01-20: the registry directory with the backed-up registry log]

Related issues

@rdner rdner self-assigned this Jan 17, 2023
@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Jan 17, 2023
@mergify
Contributor

mergify bot commented Jan 17, 2023

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @rdner? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@rdner rdner added the backport-skip Skip notification from the automated backport with mergify label Jan 17, 2023
@elasticmachine
Collaborator

elasticmachine commented Jan 17, 2023

💚 Build Succeeded

Build stats

  • Start Time: 2023-01-26T11:28:23.408+0000

  • Duration: 76 min 30 sec

Test stats 🧪

Test Results
Failed 0
Passed 7227
Skipped 746
Total 7973

💚 Flaky test report

Tests succeeded.


@rdner rdner force-pushed the filestream-take-over branch 2 times, most recently from f828e16 to d14a4f9 Compare January 18, 2023 14:19
@rdner rdner marked this pull request as ready for review January 19, 2023 18:21
@rdner rdner requested a review from a team as a code owner January 19, 2023 18:21
@rdner rdner requested review from cmacknz and fearful-symmetry and removed request for a team January 19, 2023 18:21
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@rdner rdner removed the request for review from fearful-symmetry January 19, 2023 18:22
@v1v
Member

v1v commented Jan 25, 2023

/test

Member

@cmacknz cmacknz left a comment

Did a high level review and this looks really good! Very well described and documented.

I want to come back and manually test this before I give it a final approval.

Review threads (outdated, resolved):
  • filebeat/input/filestream/takeover/takeover.go
  • filebeat/docs/howto/migrate-to-filestream.asciidoc (5 threads)
  • CHANGELOG.next.asciidoc
@mergify
Contributor

mergify bot commented Jan 26, 2023

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b filestream-take-over upstream/filestream-take-over
git merge upstream/main
git push upstream filestream-take-over

@belimawr
Contributor

@belimawr

> Also, for the tests, I believe it's better to use a disabled logger.

> I looked at other tests and didn't find how to create a disabled logger, could you point to an example?

Here is an example:
https://github.com/belimawr/elastic-agent/blob/c9024b24c9a867ba42bce93853b3fd7bf41dc4e7/internal/pkg/agent/application/upgrade/upgrade_integration_test.go#L98-L121

It's a function to enable/disable the logger based on the status of test.v. The default state seems to be a disabled logger, but I've seen some tests logging stuff 🤷‍♂️ . No need to spend much time on it ;)

Contributor

@belimawr belimawr left a comment

A small comment about an error message: I believe it needs to be more precise. But I'm open to discussing it if you think differently ;)

Review threads (resolved):
  • CHANGELOG.next.asciidoc
  • filebeat/input/filestream/takeover/takeover.go (outdated)
Member

@cmacknz cmacknz left a comment

Tested locally and LGTM! Thanks! I really like the way this turned out. I did have one thought while I was testing this to follow up on.

I think it should be possible to set the take_over flag and execute it as a dry run where it shows you what paths it would take over without actually doing the migration. This will let people be confident that the filestream configuration they wrote only takes over what they actually want it to. I think this will be important when there are many log inputs in the same Filebeat instance. Thoughts @rdner? We don't have to do it for this PR but we can create a follow up issue and do it separately.

For the purposes of migration it would also be quite nice to define the new filestream and log inputs in the same configuration file and run them next to each other to compare the output. Probably this can more easily be achieved by just running Filebeat twice with two different configuration files and registry locations. Could we document how to set this up in the migration guide?

I think both of those options would make people feel more comfortable switching to filestream, because they can easily verify that their logs are harvested with the right format and that the filestream inputs they wrote will take over from the correct log inputs.

@cmacknz
Member

cmacknz commented Jan 26, 2023

It may also be worth testing how long the take over mode takes for very large registries so we can set the expectations properly in the docs if it ends up taking a few minutes or something.

@rdner
Member Author

rdner commented Jan 27, 2023

@cmacknz I like the idea of having the dry-run mode, but I have yet to understand how to run it and get the results. It could be a separate command on the Filebeat binary that just outputs the results and exits. Otherwise, the result would be lost in the logs.

> For the purposes of migration it would also be quite nice to define the new filestream and log inputs in the same configuration file and run them next to each other to compare the output. Probably this can more easily be achieved by just running Filebeat twice with two different configuration files and registry locations. Could we document how to set this up in the migration guide?

Running log inputs alongside the filestream inputs that take over from them is not supported, and the migration guide says so explicitly. If users choose (or we tell them) to run log inputs first and filestream inputs with take_over: true next, the two runs will not produce the same events (the file offsets differ). They might be able to compare the formatting, but they would not be able to compare the output exactly.

I don't see how it brings any value.

> It may also be worth testing how long the take over mode takes for very large registries so we can set the expectations properly in the docs if it ends up taking a few minutes or something.

take_over mode does not depend on the size of the registry. It engages after the whole registry log has been loaded and the final state has been computed. The only thing that affects the performance of the take_over mode is the number of files opened for ingestion.
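
To illustrate the point about the registry log: an append-only log can hold many records for the same file, but replaying it yields a single final entry per key, and the take-over step only has to look at those final entries. The sketch below is a simplified model. The `op`/`k`/`v` record layout and the `replay` function are assumptions for illustration and should not be read as the exact on-disk registry format.

```python
import json


def replay(log_lines):
    """Fold an append-only operation log into its final key/value state."""
    state = {}
    for line in log_lines:
        record = json.loads(line)
        if record["op"] == "set":
            state[record["k"]] = record["v"]  # later writes win
        elif record["op"] == "remove":
            state.pop(record["k"], None)
    return state


# Three updates for the same (hypothetical) file key.
log_lines = [
    '{"op":"set","k":"filebeat::logs::native::1-1","v":{"offset":6}}',
    '{"op":"set","k":"filebeat::logs::native::1-1","v":{"offset":12}}',
    '{"op":"set","k":"filebeat::logs::native::1-1","v":{"offset":18}}',
]
final_state = replay(log_lines)
print(len(log_lines), "log records ->", len(final_state), "state entry")
```

However long the log grows, the work done after the replay scales with the number of distinct keys in the final state, not with the number of log records.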

@rdner rdner merged commit 06c4856 into elastic:main Jan 27, 2023
@rdner rdner deleted the filestream-take-over branch January 27, 2023 08:21
chrisberkhout pushed a commit that referenced this pull request Jun 1, 2023
Labels: 8.7-candidate, backport-skip, enhancement, Filebeat, Team:Elastic-Agent-Data-Plane, v8.7.0
Linked issue (may be closed by this pull request): [Design] Automate the migration of log input states to filestream
6 participants