[libbeat] Add Syslog parser and processor #30541

Merged 13 commits on Mar 21, 2022

@taylor-swanson taylor-swanson commented Feb 22, 2022

What does this PR do?

  • Add Syslog parser
  • Add Syslog processor
  • Add unit tests and benchmarks
  • Add processor documentation

Why is it important?

This change allows us to detach syslog message parsing from a specific filebeat input. A new processor and parser have been added to libbeat, each providing the ability to parse RFC 3164 and RFC 5424-formatted syslog messages.
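
Both RFC 3164 and RFC 5424 messages begin with a priority value (PRI) in angle brackets, which encodes the facility and severity as facility*8 + severity. The decoding can be sketched as below. This is a minimal illustration only: `decodePriority` is a hypothetical name and is not taken from this PR's parser. The value 165 is the PRI used in the RFC 5424 example messages.

```go
package main

import "fmt"

// decodePriority splits a syslog PRI value into its facility and severity.
// Hypothetical helper for illustration; not the parser added in this PR.
// RFC 3164 and RFC 5424 both define PRI = facility*8 + severity.
func decodePriority(pri int) (facility, severity int) {
	return pri / 8, pri % 8
}

func main() {
	// <165> decodes to facility 20 (local4) and severity 5 (notice).
	facility, severity := decodePriority(165)
	fmt.Printf("facility=%d severity=%d\n", facility, severity)
}
```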

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • [ ] I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

Run unit tests in these packages:

libbeat/processors/syslog
libbeat/reader/syslog

Benchmark tests are also available.

Example config for a filebeat processor:

filebeat.inputs:
- type: udp
  host: "localhost:9000"
  processors:
    - syslog:
        field: message
        format: auto
        timezone: Local

Example config for a filebeat parser:

filebeat.inputs:
- type: filestream
  paths:
    - /tmp/syslog.txt
  parsers:
    - syslog:
        format: auto
        timezone: Local

Related issues

- Add Syslog parser
- Add Syslog processor
- Add unit tests and benchmarks
- Add processor documentation
@botelastic botelastic bot added and then removed the needs_team label (indicates that the issue/PR needs a Team:* label) Feb 22, 2022

mergify bot commented Feb 22, 2022

This pull request does not have a backport label. Could you fix it @taylor-swanson? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 7./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip label (skip notification from the automated backport with mergify) Feb 22, 2022

elasticmachine commented Feb 22, 2022

💚 Build Succeeded

Build stats

  • Start Time: 2022-03-21T13:45:23.423+0000

  • Duration: 81 min 47 sec

Test stats 🧪

Test Results
Failed 0
Passed 22523
Skipped 1940
Total 24463

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@taylor-swanson taylor-swanson marked this pull request as ready for review February 22, 2022 19:41
@elasticmachine

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@andrewkroh andrewkroh left a comment

Leaving some high-level comments. I did not dive into any of the parser.

libbeat/processors/syslog/docs/syslog.asciidoc (outdated, resolved)
libbeat/processors/syslog/syslog.go (outdated, resolved)
libbeat/processors/syslog/syslog.go (outdated, resolved)
libbeat/reader/parser/parser.go (resolved)
libbeat/reader/parser/parser.go (resolved)
- Improve timestamp comments
- Append error.message if field key already exists
- Fix failure and missing counters with respect to ignore options
- Remap syslog fields to ECS fields
- Add tag option to processor for debug annotation
- Add parser documentation

mergify bot commented Feb 28, 2022

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b syslog-parser upstream/syslog-parser
git merge upstream/main
git push upstream syslog-parser

- Fix tabs in yaml in docs
- Add link to HTTP endpoint documentation
- Use errors.Is
- Don't set event.original if data was empty string
- Remove whitespace trim when setting message field
@taylor-swanson taylor-swanson requested a review from a team as a code owner March 8, 2022 16:10
<165>1 2003-08-24T05:14:30.000003-07:00 192.0.2.1 myproc 8710 - - at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndices(IndexNameExpressionResolver.java:77)
<165>1 2003-08-24T05:14:30.000003-07:00 192.0.2.1 myproc 8710 - - at org.elasticsearch.action.admin.indices.delete.TransportDeleteIndexAction.checkBlock(TransportDeleteIndexAction.java:75)
`,
expectedMessages: []string{

👍

libbeat/reader/syslog/message.go (outdated, resolved)
libbeat/reader/parser/parser.go (resolved)
- These fields will be set by other processors if desired by the user

@andrewkroh andrewkroh left a comment

LGTM. I recommend running golangci-lint before merging.

It might be worthwhile to explain the behavior of dates without years in the docs. Sometimes people ingesting logs from the previous year are surprised when those logs appear to be in the future (e.g. logs from Dec 31 ingested on Jan 1 get the wrong year).
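
The surprise described above can be reproduced with a small sketch. It assumes the missing year is filled in from a reference time such as the ingestion time; `withReferenceYear` and `utcDate` are hypothetical helpers written for this illustration and are not the code in this PR.

```go
package main

import (
	"fmt"
	"time"
)

// utcDate builds a midnight UTC time; hypothetical convenience helper.
func utcDate(year, month, day int) time.Time {
	return time.Date(year, time.Month(month), day, 0, 0, 0, 0, time.UTC)
}

// withReferenceYear parses a year-less BSD timestamp (time.Stamp layout)
// and assigns it the year of ref. Hypothetical helper for illustration.
func withReferenceYear(stamp string, ref time.Time) (time.Time, error) {
	parsed, err := time.ParseInLocation(time.Stamp, stamp, ref.Location())
	if err != nil {
		return time.Time{}, err
	}
	// A layout without a year parses as year 0, so adding ref.Year()
	// years moves the timestamp into the reference year.
	return parsed.AddDate(ref.Year(), 0, 0), nil
}

func main() {
	// A log line written on Dec 31 but ingested on Jan 1 of the next year.
	ingested := utcDate(2022, 1, 1)
	ts, err := withReferenceYear("Dec 31 23:59:59", ingested)
	if err != nil {
		panic(err)
	}
	// The event lands at the end of the ingestion year, i.e. in the future.
	fmt.Println(ts.Format(time.RFC3339), "after ingestion:", ts.After(ingested))
}
```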

@cmacknz cmacknz left a comment

Didn't do a thorough review but LGTM.

@taylor-swanson

LGTM. I recommend running golangci-lint before merging.

It might be worthwhile to explain the behavior of dates without years in the docs. Sometimes people ingesting logs from the previous year are surprised when those logs appear to be in the future (e.g. logs from Dec 31 ingested on Jan 1 get the wrong year).

What are your thoughts on something like this:

func (m *message) setTimestampBSD(v string, ref time.Time) {
	if parsed, err := time.ParseInLocation(time.Stamp, v, ref.Location()); err == nil {
		year := ref.Year()
		if parsed.Month() == time.December && ref.Month() == time.January {
			year--
		}
		m.timestamp = parsed.AddDate(year, 0, 0)
	}
}

If the parsed month is December and the current month is now January, we decrement the year by 1. This gives a month on either side of the year boundary to make sure the year is correct. If we're outside of this window, all bets are off. Another benefit with this is the function now accepts a time.Time. We can write unit tests to explicitly test the year boundary and any other cases we can think of.
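
To make the year boundary testable in isolation, the proposal can be sketched as a standalone function. This is an adaptation for illustration, not the merged code: `resolveBSDTimestamp` and `utcDate` are hypothetical names, and the function returns the result instead of setting a field on `message`.

```go
package main

import (
	"fmt"
	"time"
)

// utcDate builds a midnight UTC time; hypothetical convenience helper.
func utcDate(year, month, day int) time.Time {
	return time.Date(year, time.Month(month), day, 0, 0, 0, 0, time.UTC)
}

// resolveBSDTimestamp applies the December/January heuristic from the
// proposal above. Hypothetical standalone adaptation for illustration.
func resolveBSDTimestamp(stamp string, ref time.Time) (time.Time, error) {
	parsed, err := time.ParseInLocation(time.Stamp, stamp, ref.Location())
	if err != nil {
		return time.Time{}, err
	}
	year := ref.Year()
	// A December timestamp seen in January is assumed to be from last year.
	if parsed.Month() == time.December && ref.Month() == time.January {
		year--
	}
	// The parsed value has year 0, so AddDate sets the final year.
	return parsed.AddDate(year, 0, 0), nil
}

func main() {
	ref := utcDate(2022, 1, 1)
	for _, stamp := range []string{"Dec 31 23:59:59", "Jan  1 00:00:01", "Nov 30 12:00:00"} {
		ts, err := resolveBSDTimestamp(stamp, ref)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%q -> %d\n", stamp, ts.Year())
	}
}
```

With a reference time of Jan 1, 2022, the December timestamp resolves to 2021, while the November timestamp still resolves to 2022. The November case is outside the one-month window, which is the kind of edge case the heuristic does not cover.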

Either way, I'll document this situation and suggest that if logs are meant to be stored long term and ingested at a later date, users should switch to a timestamp format that carries more information (ISO 8601/RFC 3339). Heck, the RFC even says this:

It has been found that some network administrators like to archive their syslog messages over long periods of time. [...] Implementers may wish to utilize the ISO 8601 [7] date and time formats if they want to include more explicit date and time information.

@andrewkroh

What are your thoughts on something like this:

I would avoid the heuristic based solution (or at least make it optional behavior). There are a lot of edge cases. Logstash has some discussion on the topic in logstash-plugins/logstash-filter-date#51 (comment).

@taylor-swanson

What are your thoughts on something like this:

I would avoid the heuristic based solution (or at least make it optional behavior). There are a lot of edge cases. Logstash has some discussion on the topic in logstash-plugins/logstash-filter-date#51 (comment).

That's fair. In that case, I think this is something that should be handled in a separate issue. The current implementation behaves just like the existing input, so behavior hasn't changed.

Having configuration for this that could handle several different scenarios would give customers better options. What I proposed would not help a customer ingest logs older than a year, for example. It was only meant to handle year boundaries, and even there some edge cases remain.

@leehinman leehinman merged commit 2e04486 into elastic:main Mar 21, 2022
@taylor-swanson taylor-swanson deleted the syslog-parser branch March 21, 2022 15:22
@kvch kvch mentioned this pull request Apr 19, 2022
21 tasks
kush-elastic pushed a commit to kush-elastic/beats that referenced this pull request May 2, 2022
* [libbeat] Add Syslog parser and processor

- Add Syslog parser
- Add Syslog processor
- Add unit tests and benchmarks
- Add processor documentation
- Append error.message if field key already exists
- Fix failure and missing counters with respect to ignore options
- Remap syslog fields to ECS fields
- Add tag option to processor for debug annotation
- Add parser documentation
- Add docs for time zone and year enrichment
chrisberkhout pushed a commit that referenced this pull request Jun 1, 2023
* [libbeat] Add Syslog parser and processor
Labels: backport-skip, enhancement, libbeat
Projects: None yet
Linked issue (may be closed by merging this pull request): Create Syslog Processor
5 participants