Convert apache2.access to ECS #8901

Closed
1 change: 1 addition & 0 deletions CHANGELOG.asciidoc
@@ -182,6 +182,7 @@ https://github.com/elastic/beats/compare/v6.5.0...v7.0.0-alpha1[View commits]
- Make docker input check if container strings are empty {pull}7960[7960]
- Keep unparsed user agent information in user_agent.original. {pull}8537[8537]
- Allow to force CRI format parsing for better performance {pull}8424[8424]
- Migrate Apache access fileset to ECS. {pull}8901[8901]

*Heartbeat*

14 changes: 2 additions & 12 deletions auditbeat/docs/fields.asciidoc
@@ -4387,10 +4387,10 @@ URL fields provide a complete URL, with scheme, host, and path. The URL object c



*`url.href`*::
*`url.original`*::
+
--
type: text
type: keyword

example: https://elastic.co:443/search?q=elasticsearch#top

@@ -4399,16 +4399,6 @@ Full url. The field is stored as keyword.
`href` is an analyzed field so the parsed information can be accessed through `href.analyzed` in queries.


*`url.href.raw`*::
+
--
type: keyword

The full URL. This is a non-analyzed field that is useful for aggregations.


--

--

*`url.scheme`*::
2 changes: 1 addition & 1 deletion auditbeat/include/fields.go

Large diffs are not rendered by default.

62 changes: 62 additions & 0 deletions dev-tools/ecs-migration.yml
@@ -45,6 +45,10 @@
alias: true
copy_to: false

# Filebeat modules

## Suricata module

Review comment (Contributor):

You'll have to rebase on master to get the Suricata source_ecs fields replaced.

- from: source_ecs.ip
to: source.ip
alias: true
@@ -85,6 +89,64 @@
alias: true
copy_to: false

## Apache

- from: apache2.access.remote_ip
to: source.ip
Review comment (Contributor):

This alias cannot stay, since the semantics of the two fields differ.

In the grok pattern you split this value into two possible fields. In most cases source.ip is the one that gets populated, but if HostnameLookups is set to On, some of your events will only populate source.domain.

I think there's value in keeping the ambiguous field around (and therefore not making it an alias), while still doing the split into IP and domain.
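
One way to keep the original field and still do the split could be an extra grok step. This is only a sketch under two assumptions that are not part of the PR: the main pattern keeps capturing %{IPORHOST:apache2.access.remote_ip} as before, and the processor below is added right after it.

```json
{
  "grok": {
    "field": "apache2.access.remote_ip",
    "patterns": [
      "^%{IP:source.ip}$",
      "^%{HOSTNAME:source.domain}$"
    ],
    "ignore_missing": true
  }
}
```

The patterns are tried in order, so a value that parses as an IP address populates source.ip and anything else that looks like a hostname populates source.domain. With this approach the apache2.access.remote_ip entry in this file would presumably use alias: false.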

alias: true
copy_to: false

- from: apache2.access.user_name
to: user.name
alias: true
copy_to: false

- from: apache2.access.method
to: http.request.method
alias: true
copy_to: false

- from: apache2.access.url
to: url.original
alias: true
copy_to: false

- from: apache2.access.http_version
to: http.version
alias: true
copy_to: false

- from: apache2.access.response_code
to: http.response.status_code
alias: true
copy_to: false

- from: apache2.access.body_sent.bytes
to: http.response.body_sent.bytes
alias: true
copy_to: false

- from: apache2.access.referrer
to: http.request.referer
alias: true
copy_to: false

- from: apache2.access.agent
to: user_agent.original
alias: true
copy_to: false

- from: read_timestamp
to: event.created
alias: false
copy_to: false

# This expands all geoip fields
- from: apache2.access.geoip.*
to: source.geoip.*
alias: false
copy_to: false
Review comment (Contributor):

When you get around to actually creating the aliases in the module's field defs, you can check out how I did it for the other modules (example). The definitions are so regular that with judicious find & replace, you can get them adjusted to this module very quickly.


# From Auditbeat's auditd module.
- from: source.hostname
to: source.domain
84 changes: 43 additions & 41 deletions filebeat/docs/fields.asciidoc
@@ -45,79 +45,91 @@ Apache2 Module



[float]
== apache2 fields

Apache2 fields.


*`http.response.body_sent.bytes`*::
+
--
type: long

[float]
== access fields
format: bytes

Contains fields for the Apache2 HTTPD access logs.
The number of bytes of the server response body.


--

*`apache2.access.remote_ip`*::
*`source.hostname`*::
+
--
type: keyword

Client IP address.
test


--

*`apache2.access.user_name`*::
*`http.request.referer`*::
+
--
type: keyword

The user name used when basic authentication is used.
Http request referer.


--

*`apache2.access.method`*::
+
--
type: keyword
[float]
== apache2 fields

example: GET
Apache2 fields.

The request HTTP method.


--
[float]
== access fields

Contains fields for the Apache2 HTTPD access logs.

*`apache2.access.url`*::


*`source.ip`*::
+
--
type: keyword
type: alias

The request HTTP URL.
--

*`user.name`*::
+
--
type: alias

--

*`apache2.access.http_version`*::
*`http.request.method`*::
+
--
type: keyword
type: alias

The HTTP version.
--

*`url.original`*::
+
--
type: alias

--

*`apache2.access.response_code`*::
*`http.version`*::
+
--
type: long
type: alias

The HTTP response code.
--

*`http.response.status_code`*::
+
--
type: alias

--

@@ -2444,10 +2456,10 @@ URL fields provide a complete URL, with scheme, host, and path. The URL object c



*`url.href`*::
*`url.original`*::
+
--
type: text
type: keyword

example: https://elastic.co:443/search?q=elasticsearch#top

@@ -2456,16 +2468,6 @@ Full url. The field is stored as keyword.
`href` is an analyzed field so the parsed information can be accessed through `href.analyzed` in queries.


*`url.href.raw`*::
+
--
type: keyword

The full URL. This is a non-analyzed field that is useful for aggregations.


--

--

*`url.scheme`*::
2 changes: 1 addition & 1 deletion filebeat/include/fields.go

Large diffs are not rendered by default.

18 changes: 18 additions & 0 deletions filebeat/module/apache2/_meta/fields.yml
@@ -4,8 +4,26 @@
Apache2 Module
short_config: true
fields:
- name: http.response.body_sent.bytes
Review comment (Contributor):

I want to include HTTP size metrics in ECS, but I don't think they'll look like this. The metrics typically available are request header size, request body size, response header size, and response body size. For both request and response, one could also be interested in a total size. I'm not saying all of those necessarily need to make it into ECS. But given all of these options, I doubt response.body_sent.bytes will stick around as is.

I say we hash that out in ECS first, and revisit all affected modules afterwards. Personally, I'd leave the field as it was for now.

But it's not a big deal either way. Once we've figured this out in ECS, we'll revisit this here anyway :-)

type: long
format: bytes
description: >
The number of bytes of the server response body.

- name: source.hostname
Review comment (Contributor):

source.hostname is no longer in ECS.

type: keyword
description: >
test

- name: http.request.referer
Review comment (Contributor):

ECS doesn't reproduce the typo ;-) It should be http.request.referrer

type: keyword
description: >
Http request referer.

- name: apache2
type: group
description: >
Apache2 fields.
fields:


41 changes: 22 additions & 19 deletions filebeat/module/apache2/access/_meta/fields.yml
@@ -4,30 +4,31 @@
Contains fields for the Apache2 HTTPD access logs.
fields:
- name: remote_ip
type: keyword
description: >
Client IP address.
type: alias
path: source.ip

- name: user_name
type: keyword
description: >
The user name used when basic authentication is used.
type: alias
path: user.name

- name: method
type: keyword
example: GET
description: >
The request HTTP method.
type: alias
path: http.request.method

- name: url
type: keyword
description: >
The request HTTP URL.
type: alias
webmat marked this conversation as resolved.
path: url.original

- name: http_version
type: keyword
description: >
The HTTP version.
type: alias
path: http.version

- name: response_code
type: long
description: >
The HTTP response code.
type: alias
path: http.response.status_code



- name: body_sent.bytes
type: long
format: bytes
@@ -37,6 +38,7 @@
type: keyword
description: >
The HTTP referrer.

- name: agent
type: text
description: >
@@ -121,3 +123,4 @@
type: keyword
description: >
Region ISO code.

18 changes: 6 additions & 12 deletions filebeat/module/apache2/access/ingest/default.json
@@ -4,8 +4,8 @@
"grok": {
"field": "message",
"patterns":[
"%{IPORHOST:apache2.access.remote_ip} - %{DATA:apache2.access.user_name} \\[%{HTTPDATE:apache2.access.time}\\] \"%{WORD:apache2.access.method} %{DATA:apache2.access.url} HTTP/%{NUMBER:apache2.access.http_version}\" %{NUMBER:apache2.access.response_code} (?:%{NUMBER:apache2.access.body_sent.bytes}|-)( \"%{DATA:apache2.access.referrer}\")?( \"%{DATA:apache2.access.agent}\")?",
"%{IPORHOST:apache2.access.remote_ip} - %{DATA:apache2.access.user_name} \\[%{HTTPDATE:apache2.access.time}\\] \"-\" %{NUMBER:apache2.access.response_code} -"
"(%{IP:source.ip}|%{HOSTNAME:source.domain}) - %{DATA:user.name} \\[%{HTTPDATE:apache2.access.time}\\] \"%{WORD:http.request.method} %{DATA:url.original} HTTP/%{NUMBER:http.version}\" %{NUMBER:http.response.status_code} (?:%{NUMBER:http.response.body_sent.bytes}|-)( \"%{DATA:http.request.referer}\")?( \"%{DATA:user_agent.original}\")?",
"(%{IP:source.ip}|%{HOSTNAME:source.domain}) - %{DATA:user.name} \\[%{HTTPDATE:apache2.access.time}\\] \"-\" %{NUMBER:http.response.status_code} -"
],
"ignore_missing": true
}
@@ -16,7 +16,7 @@
}, {
"rename": {
"field": "@timestamp",
"target_field": "read_timestamp"
"target_field": "event.created"
}
}, {
"date": {
@@ -31,19 +31,13 @@
}, {
"user_agent": {
"field": "apache2.access.agent",
"target_field": "apache2.access.user_agent",
"ignore_failure": true
}
}, {
"rename": {
"field": "apache2.access.agent",
"target_field": "apache2.access.user_agent.original",
"target_field": "user_agent",
Review comment (Contributor):

Your unparsed user agent is no longer in apache2.access.agent, so the UA parser currently skips all of your events. If you look at your expected JSON log results, you'll see they no longer include the parsed user agent :-)

However, as I've discovered, you can't start by setting user_agent.original and then have the UA parser output to user_agent. It won't merge its results around user_agent.original; it will overwrite everything there. So you'll have your parsed UA but no longer the original.

The process has to be:

  1. Extract the original UA to a temporary field
  2. Perform UA parsing with the default target_field (the default is already user_agent)
  3. Move the original UA from the temporary field to user_agent.original
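
A minimal sketch of steps 2 and 3 as ingest processors. It assumes the grok pattern goes back to capturing the raw string into apache2.access.agent, which then serves as the temporary field; that choice is only an illustration, and any other temporary field name would work the same way.

```json
[
  {
    "user_agent": {
      "field": "apache2.access.agent",
      "ignore_failure": true
    }
  },
  {
    "rename": {
      "field": "apache2.access.agent",
      "target_field": "user_agent.original",
      "ignore_missing": true
    }
  }
]
```

Order matters here: running the rename first would leave the parser with no input field, and writing user_agent.original before parsing would be overwritten, as described above.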

"ignore_failure": true
}
}, {
"geoip": {
"field": "apache2.access.remote_ip",
"target_field": "apache2.access.geoip"
"field": "source.ip",
"target_field": "source.geo"
}
}],
"on_failure" : [{