Base structure for further work on ECS (elastic#1)
This PR is intended to provide a foundation for all upcoming work on the Elastic Common Schema. It contains some example schemas, a script to generate the docs from these example schemas, and a basic README.md.

All follow-up work will happen in PRs and can be tracked in GitHub issues.
ruflin authored Nov 9, 2017
1 parent 5b8fa6d commit add5c46
Showing 10 changed files with 311 additions and 2 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
.DS_Store
2 changes: 2 additions & 0 deletions Makefile
@@ -0,0 +1,2 @@
generate:
	python generate.py
56 changes: 54 additions & 2 deletions README.md
@@ -1,2 +1,54 @@
# ecs
Elastic Common Schema
**WARNING: THIS IS WORK IN PROGRESS**

# Elastic Common Schema (ECS)

These are the definitions of the Elastic Common Schema (ECS). The schemas are stored as `.yml` files in the `schemas` directory. Each namespace of the schema has its own file. These files can be used to generate the docs by running `make generate` and, later, to create an Elasticsearch template or Kibana index pattern. The format is mostly based on the `fields.yml` structure from Beats.

## Rules

Here come the rules on how to name and create the fields for ECS.

## Docs

The generated ECS documentation output can be found [here](./schema.md).

## Fields

`fields.yml` files are used to describe the Elastic Common Schema in a structured way. These files make it possible to generate an Elasticsearch index template, a Kibana index pattern, or documentation output in an automated way.

The structure of each document looks as follows:

```
- namespace: agent
  title: Agent fields
  level: 2
  description: >
    The agent fields contain all the data about the agent/client/shipper that collected / generated the events.

    As an example, in the case of Beats for logs, `agent.name` is `filebeat`.
  fields:
    - name: version
      type: keyword
      description: >
        Agent version.
      example: 6.0.0-rc2
      phase: 0
```

Each namespace has its own file to keep the individual files small. Each namespace contains a `fields` list that holds all of its fields. `title` and `description` describe the namespace. `level` is used purely for sorting in the documentation output.
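
As a rough illustration of how `level` and the namespace name are used, the sketch below orders already-parsed namespace documents by `level` and builds the full field names, leaving `base` fields unprefixed in the same way `generate.py` does. The function names and the inline sample data are assumptions made for this example and are not part of the repository.

```python
# Hypothetical, already-parsed namespace documents (normally loaded from schemas/*.yml).
namespaces = [
    {"namespace": "agent", "level": 2, "fields": [{"name": "version"}, {"name": "id"}]},
    {"namespace": "base", "level": 1, "fields": [{"name": "timestamp"}, {"name": "id"}]},
]


def sort_namespaces(namespaces):
    """Order namespaces by `level`; the value carries no other meaning."""
    return sorted(namespaces, key=lambda ns: ns["level"])


def full_field_names(namespace):
    """Return the sorted field names, prefixed with the namespace unless it is `base`."""
    prefix = "" if namespace["namespace"] == "base" else namespace["namespace"] + "."
    return sorted(prefix + field["name"] for field in namespace["fields"])


ordered = sort_namespaces(namespaces)
names = [name for ns in ordered for name in full_field_names(ns)]
print(names)
```

Because Python's `sorted` is stable, namespaces that share a `level` keep the order in which they were loaded.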

Each entry under `fields` starts with the field `name`. The `type` is the [Elasticsearch field type](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html). `description` adds details about the field itself. `example` can be used to provide an example value. `phase` indicates which phase the field is currently in (more about this below). If `phase` is left out, it defaults to 0.
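
To make the field keys more concrete, here is a minimal sketch, assuming a namespace has already been parsed into a Python dict, of how the entries could be turned into the `properties` section of an Elasticsearch mapping. `apply_defaults` and `to_mapping_properties` are hypothetical helper names. The repository does not contain a template generator yet, so this is only one possible shape for it.

```python
def apply_defaults(field):
    """Return a copy of a field entry with the documented default `phase` of 0."""
    normalized = dict(field)
    normalized.setdefault("phase", 0)
    return normalized


def to_mapping_properties(namespace):
    """Build nested Elasticsearch mapping `properties` from one namespace dict.

    Dotted names such as `cluster.id` become nested objects. Fields of the
    `base` namespace stay at the top level.
    """
    root = {}
    for field in namespace["fields"]:
        path = field["name"].split(".")
        if namespace["namespace"] != "base":
            path = [namespace["namespace"]] + path
        node = root
        # Walk (and create) the intermediate objects for every path segment but the last.
        for part in path[:-1]:
            node = node.setdefault(part, {"properties": {}})["properties"]
        node[path[-1]] = {"type": field["type"]}
    return root


# Sample data mirroring the `agent` example above.
agent = {
    "namespace": "agent",
    "fields": [{"name": "version", "type": "keyword", "example": "6.0.0-rc2"}],
}

print(apply_defaults(agent["fields"][0])["phase"])
print(to_mapping_properties(agent))
```

The sketch does not handle a name that is used both as a leaf field and as a parent object; a real generator would need to decide how to treat that case.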

## Phases

The goal of the phase value for each field is to indicate whether the field is already part of the standard. Different phases exist to make it easy to contribute new fields while still being able to iterate on them. The phases are defined as follows:

* 0 (alpha): The field is new and up for discussion on whether it should be added. The field might be removed again at any time.
* 1 (beta): It's clear that there is value in having the field in ECS, and discussions about naming, namespaces, etc. have started. It's unlikely that the field will be removed again, but naming might change at any time.
* 2 (rc): The field has been accepted and is unlikely to change. It is now tested in the field.
* 3 (GA): The field is part of ECS and breaking changes to it happen only on major releases.
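
The phase list above can be sketched as a small lookup, assuming fields are handled as Python dicts. `PHASE_LABELS` and `phase_label` are names made up for this illustration; the repository itself only stores the numeric `phase` value.

```python
# Phase numbers and labels as listed above.
PHASE_LABELS = {0: "alpha", 1: "beta", 2: "rc", 3: "GA"}


def phase_label(field):
    """Return the label for a field's phase, treating a missing `phase` as 0."""
    phase = field.get("phase", 0)
    if phase not in PHASE_LABELS:
        raise ValueError("unknown phase: {!r}".format(phase))
    return PHASE_LABELS[phase]


print(phase_label({"name": "version"}))
print(phase_label({"name": "timestamp", "phase": 1}))
```

Rejecting unknown values early keeps typos such as `phase: 4` from silently ending up in the generated output.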

## Links

* Foundation: https://docs.google.com/spreadsheets/d/1RUS-nwMLaU4U9YistexTCG3EcHzeY-L2p3V0iX0G9h0/edit#gid=1862780542
* Beats draft: https://docs.google.com/document/d/1pmzli3x33AQbhyqxpd024ggi3r9KuAfGBFDS4vYHWw0/edit
18 changes: 18 additions & 0 deletions TODO.md
@@ -0,0 +1,18 @@
TODO

* Decide on schema name (Elastic Common Schema)
* Decide on naming rules
* Get in all schema fields
* Document example of field structure
* Document verified fields
* How can we link solutions and format
* Format should not know about solutions, but link it back
* Create solutions pages?
* Introduce phase key:
  * 0: proposal for new field (default)
  * 1: accepted as necessary field, figuring out naming
  * 2: accepted as field, testing in field, same should stay
  * 3: verified ecs field. all breaking changes should only happen in major versions
* Goal of Phases: Make it easy to contribute new ideas and then iterate on top of it
* Describe fields.yml in README
* Level is only used for sorting, no logical meaning
82 changes: 82 additions & 0 deletions generate.py
@@ -0,0 +1,82 @@
import yaml
import os
import argparse

if __name__ == "__main__":

    # Load schema files into yaml (sorted so the output order is deterministic)
    files = sorted(os.listdir("schemas"))
    content = ""
    for file in files:
        with open("schemas/" + file) as f:
            content = content + f.read()

    # Load all fields into object (safe_load avoids constructing arbitrary Python objects)
    fields = yaml.safe_load(content)
    sortedNamespaces = sorted(fields, key=lambda field: field["level"])

    # Create markdown schema output file
    output = open("schema.md", 'w')

    for namespace in sortedNamespaces:
        output.write("# " + namespace["title"] + "\n\n")

        # Replace each newline with two, as a single newline does not start a new paragraph in Markdown
        output.write(namespace["description"].replace("\n", "\n\n") + "\n")

        titles = ["Field", "Description", "Type", "Phase", "Example"]

        for title in titles:
            output.write("| {} ".format(title))
        output.write("|\n")

        for title in titles:
            output.write("|---")
        output.write("|\n")

        # Sort fields for easier readability
        namespaceFields = sorted(namespace["fields"], key=lambda field: field["name"])

        # Print fields into a table
        for field in namespaceFields:
            description = ""
            if 'description' in field.keys():
                # Remove all spaces and newlines from beginning and end
                description = field["description"].strip()

            # Replace newlines with HTML representation as otherwise newlines don't work in Markdown
            description = description.replace("\n", "<br/>")

            example = ""
            if 'example' in field.keys():
                # Convert to a string (YAML may parse numbers) and strip surrounding whitespace
                example = str(field["example"]).strip()

            type = ""
            if 'type' in field.keys():
                # Remove all spaces and newlines from beginning and end
                type = field["type"].strip()

            field_name = field["name"]



            # Prefix if not base namespace
            if namespace["namespace"] != "base":
                field_name = namespace["namespace"] + "." + field_name

            # Verified and accepted fields are bold
            verified = False
            if 'verified' in field.keys() and field["verified"]:
                field_name = "**" + field_name + "**"

            phase = 0
            if 'phase' in field.keys():
                # Use the explicit phase if one is set, otherwise keep the default of 0
                phase = field["phase"]

            output.write("| {} | {} | {} | {} | {} |\n".format(field_name, description, type, phase, example))

        output.write("\n\n")

    output.close()
52 changes: 52 additions & 0 deletions schema.md
@@ -0,0 +1,52 @@
# Base

The base namespace contains all fields which are on the top level without a namespace. These are fields which are common across all types of events.


| Field | Description | Type | Phase | Example |
|---|---|---|---|---|
| id | Unique id to describe the event. | keyword | 1 | 8a4f500d |
| timestamp | Timestamp when the event was created.<br/>For log events this is expected to be when the event was generated and not when it was read. | date | 1 | 2016-05-23T08:05:34.853Z |


# Agent fields

The agent fields contain all the data about the agent/client/shipper that collected / generated the events.

As an example, in the case of Beats for logs, `agent.name` is `filebeat`.


| Field | Description | Type | Phase | Example |
|---|---|---|---|---|
| agent.id | Unique identifier of this agent if one exists.<br/>In the case of Beats this would be beat.id. | keyword | 0 | 8a4f500d |
| agent.name | Agent name.<br/>Name of the agent. | keyword | 0 | filebeat |
| agent.version | Agent version. | keyword | 0 | 6.0.0-rc2 |


# Host fields

All fields related to a host. A host can be a physical machine, a virtual machine, or a Docker container.

Normally the host information is related to the machine on which the event was generated / collected, but it can also be used differently if needed.


| Field | Description | Type | Phase | Example |
|---|---|---|---|---|
| host.id | Unique host id.<br/>As hostname is not always unique, this often can be configured by the user. An example here is the current usage of `beat.name`. | keyword | 1 | |
| host.name | Name of the host | keyword | 1 | |
| host.timezone | Timezone of the host | date | 1 | |


# Elasticsearch fields

Common fields for Elasticsearch metrics and logs


| Field | Description | Type | Phase | Example |
|---|---|---|---|---|
| elasticsearch.cluster.id | Elasticsearch cluster id | keyword | 1 | |
| elasticsearch.cluster.name | Elasticsearch cluster name | keyword | 1 | |
| elasticsearch.node.name | Elasticsearch node name | keyword | 1 | |
| elasticsearch.node.version | Elasticsearch node version | keyword | 1 | |


28 changes: 28 additions & 0 deletions schemas/agent.yml
@@ -0,0 +1,28 @@
- namespace: agent
  title: Agent fields
  level: 2
  description: >
    The agent fields contain all the data about the agent/client/shipper that collected / generated the events.

    As an example, in the case of Beats for logs, `agent.name` is `filebeat`.
  fields:
    - name: version
      type: keyword
      description: >
        Agent version.
      example: 6.0.0-rc2
    - name: name
      type: keyword
      description: >
        Agent name.

        Name of the agent.
      example: filebeat
    - name: id
      type: keyword
      description: >
        Unique identifier of this agent if one exists.

        In the case of Beats this would be beat.id.
      example: 8a4f500d
21 changes: 21 additions & 0 deletions schemas/base.yml
@@ -0,0 +1,21 @@
- namespace: base
  title: Base
  level: 1
  description: >
    The base namespace contains all fields which are on the top level without a namespace.
    These are fields which are common across all types of events.
  fields:
    - name: id
      type: keyword
      description: >
        Unique id to describe the event.
      example: 8a4f500d
      phase: 1
    - name: timestamp
      type: date
      phase: 1
      example: "2016-05-23T08:05:34.853Z"
      description: >
        Timestamp when the event was created.

        For log events this is expected to be when the event was generated and not when it was read.
26 changes: 26 additions & 0 deletions schemas/elasticsearch.yml
@@ -0,0 +1,26 @@
- namespace: elasticsearch
  title: Elasticsearch fields
  level: 3
  description: >
    Common fields for Elasticsearch metrics and logs
  fields:
    - name: cluster.id
      type: keyword
      description: >
        Elasticsearch cluster id
      phase: 1
    - name: cluster.name
      type: keyword
      description: >
        Elasticsearch cluster name
      phase: 1
    - name: node.version
      type: keyword
      description: >
        Elasticsearch node version
      phase: 1
    - name: node.name
      type: keyword
      description: >
        Elasticsearch node name
      phase: 1
27 changes: 27 additions & 0 deletions schemas/host.yml
@@ -0,0 +1,27 @@
- namespace: host
  title: Host fields
  level: 2
  description: >
    All fields related to a host. A host can be a physical machine, a virtual machine, or a Docker container.

    Normally the host information is related to the machine on which the event was generated / collected, but it can also
    be used differently if needed.
  fields:
    - name: timezone
      type: date
      description: >
        Timezone of the host
      phase: 1
    - name: name
      type: keyword
      description: >
        Name of the host
      phase: 1
    - name: id
      type: keyword
      phase: 1
      description: >
        Unique host id.

        As hostname is not always unique, this often can be configured by the user.
        An example here is the current usage of `beat.name`.
