Create syslog receiver to run as daemon process #238

Open
ThomasLohner opened this issue Feb 13, 2022 · 1 comment

@ThomasLohner
Member

Logexplorer stores its data in a ClickHouse database. Sending messages to Logexplorer via the REST API is very easy, but it does not scale well. The syslog protocol scales much better and is non-blocking when sent over UDP instead of TCP, so applications are not affected if Logexplorer has an outage.

We will create a syslog receiver that buffers messages and writes them to ClickHouse in bulk inserts. See the syslog protocol description here: https://datatracker.ietf.org/doc/html/rfc5424
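
To illustrate the buffering idea, here is a minimal sketch in Python (for illustration only; the daemon itself is meant to be written in PHP/Swoole). The class name `BulkBuffer`, the `flush` callback, and the `max_rows` / `max_age_seconds` thresholds are all assumptions, not part of the actual implementation. Rows are collected per table and handed to the callback as one batch once either threshold is reached.

```python
import time
from typing import Callable, Dict, List, Optional


class BulkBuffer:
    """Collects rows per table and flushes them in batches (hypothetical helper)."""

    def __init__(
        self,
        flush: Callable[[str, List[dict]], None],
        max_rows: int = 1000,
        max_age_seconds: float = 1.0,
    ) -> None:
        self._flush = flush  # assumed callback that performs the bulk insert
        self._max_rows = max_rows
        self._max_age = max_age_seconds
        self._rows: Dict[str, List[dict]] = {}
        self._oldest: Dict[str, float] = {}

    def add(self, table: str, row: dict, now: Optional[float] = None) -> None:
        """Buffer one row; flush the table if it is full or too old."""
        now = time.monotonic() if now is None else now
        rows = self._rows.setdefault(table, [])
        if not rows:
            # Remember when the first row of this batch arrived.
            self._oldest[table] = now
        rows.append(row)
        too_many = len(rows) >= self._max_rows
        too_old = now - self._oldest[table] >= self._max_age
        if too_many or too_old:
            self.flush_table(table)

    def flush_table(self, table: str) -> None:
        """Hand all buffered rows of one table to the flush callback."""
        rows = self._rows.pop(table, [])
        self._oldest.pop(table, None)
        if rows:
            self._flush(table, rows)
```

In this sketch the age check only runs when a new row arrives. A real daemon would probably also need a periodic timer that flushes tables which have gone quiet.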

The daemon should be implemented in Swoole (https://openswoole.com) for maximum performance.

The TAG in the syslog message matches the table name in ClickHouse. Messages can be either JSON or a plain string, which is then parsed via a GROK pattern. For this we need to find a PHP implementation of GROK patterns.

Config per TAG / Table:

tag: <name of clickhouse table>

type: json or grok

pattern (only if type=grok): <grok pattern>
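
As a rough sketch of how the three settings above relate to each other, the following Python snippet (illustration only, not the PHP implementation) models one config entry and validates it. The class name `TagConfig` and the choice to reject a `grok` entry without a pattern are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TagConfig:
    """One config entry per TAG / table (hypothetical structure)."""

    tag: str                       # name of the ClickHouse table
    type: str                      # "json" or "grok"
    pattern: Optional[str] = None  # grok pattern, only used if type == "grok"

    def __post_init__(self) -> None:
        if self.type not in ("json", "grok"):
            raise ValueError(f"unsupported type for tag {self.tag!r}: {self.type!r}")
        if self.type == "grok" and not self.pattern:
            raise ValueError(f"tag {self.tag!r} has type grok but no pattern")
```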
nhatdo-nfq self-assigned this Feb 14, 2022
nhatdo-nfq assigned ThomasLohner and unassigned nhatdo-nfq Mar 1, 2022
@ThomasLohner
Member Author

I have tested this ticket and I think we need a few more improvements before it is production-ready:

Change syslog pattern
RFC 5424 defines a STRUCTURED-DATA element, so we must assume that some syslog clients will send it. The new pattern should be:

<%{POSINT:pri}>%{POSINT:version} %{TIMESTAMP_ISO8601:timestamp} %{HOSTNAME:hostname} %{USERNAME:table_name} %{USERNAME:proc_id} %{USERNAME:app_name} (\[%{DATA:structured_data}\]|\-) %{GREEDYDATA:message}
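
To make the effect of the pattern easier to see, here is an approximate plain-regex equivalent in Python (illustration only; the real daemon would use a PHP grok library). The sub-expressions are simplified stand-ins for the grok definitions: `TIMESTAMP_ISO8601` and `HOSTNAME` are reduced to `\S+`, so this sketch accepts more input than the grok pattern would. The field names are taken from the pattern above; the sample lines in the usage note are made up.

```python
import re

# Simplified stand-in for the grok pattern; not a faithful translation.
SYSLOG_RE = re.compile(
    r"<(?P<pri>[1-9][0-9]*)>(?P<version>[1-9][0-9]*) "
    r"(?P<timestamp>\S+) "
    r"(?P<hostname>\S+) "
    r"(?P<table_name>[a-zA-Z0-9._-]+) "
    r"(?P<proc_id>[a-zA-Z0-9._-]+) "
    r"(?P<app_name>[a-zA-Z0-9._-]+) "
    r"(?:\[(?P<structured_data>.*?)\]|-) "  # either [structured data] or "-"
    r"(?P<message>.*)",
    re.DOTALL,
)


def parse_syslog_line(line: str):
    """Return the named fields as a dict, or None if the line does not match."""
    match = SYSLOG_RE.match(line)
    return match.groupdict() if match else None
```

For a line such as `<13>1 2022-03-01T10:00:00+00:00 web01 access_log 1234 nginx [exampleSDID@32473 iut="3"] {"text":"hello"}`, the `structured_data` group holds the text between the brackets. For a line that carries `-` in that position, the group is `None`.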

Invalid JSON crashes server process
If invalid JSON is sent, the server process just dies. To reproduce:

docker-compose exec php logger --udp --server api --port 9506 --tag {table_name} '{"text":"hello", FOO}'
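
One way to keep the process alive is to treat a parse failure as a rejected message instead of a fatal error. The sketch below shows the idea in Python (illustration only; in PHP the equivalent would be checking the result of the JSON decoding step). The function name `parse_json_message` and the decision to also reject non-object JSON are assumptions.

```python
import json
from typing import Optional


def parse_json_message(raw: str) -> Optional[dict]:
    """Return the decoded JSON object, or None if the payload is unusable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Invalid JSON: drop (or log) the message instead of crashing.
        return None
    # Only JSON objects can be mapped to table columns.
    return data if isinstance(data, dict) else None
```

With this approach the payload from the reproduction command above would simply be skipped, and the receiver would continue with the next message.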

Ignore unknown fields in message
If the message contains fewer fields than the ClickHouse table, it is still written to the table. This is correct behavior. But if the message contains fields that are missing in the table, nothing is written to ClickHouse. To fix this, we need to load the table structure on server start and compare it to the message before executing the ClickHouse query.
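
The comparison step could look roughly like the following Python sketch (illustration only). It assumes the column names of each table have already been loaded once at startup, for example from ClickHouse's `system.columns` table, and are available as a set. The function name `drop_unknown_fields` is hypothetical.

```python
from typing import Dict, List, Set, Tuple


def drop_unknown_fields(row: Dict[str, object], columns: Set[str]) -> Tuple[Dict[str, object], List[str]]:
    """Split a message into insertable fields and the names that were dropped."""
    known = {key: value for key, value in row.items() if key in columns}
    unknown = sorted(key for key in row if key not in columns)
    return known, unknown
```

Returning the dropped names separately makes it possible to report them when verbose logging is enabled.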

Make Timestamp optional
If there is no timestamp in the data part of the message, use the timestamp from the syslog header.
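
A minimal Python sketch of this fallback (illustration only). The field name `timestamp` and the function name `with_timestamp` are assumptions; the actual column name depends on the table.

```python
from typing import Dict


def with_timestamp(row: Dict[str, object], header_timestamp: str, field: str = "timestamp") -> Dict[str, object]:
    """Return the row, filling in the syslog header timestamp if none is present."""
    if row.get(field) in (None, ""):
        # Copy the row so the caller's dict is not modified.
        return {**row, field: header_timestamp}
    return row
```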

Use Tag or App-Name for table detection
Some applications, such as nginx, don't allow setting a custom syslog tag, but they do send a syslog app-name. So we need to check the syslog tag first and, if it is empty, use the app-name to extract the table name from the message.
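
The lookup order could be sketched like this in Python (illustration only). The function name `resolve_table_name` is hypothetical. Treating `-` as empty follows RFC 5424, where `-` is the NILVALUE for an absent header field.

```python
from typing import Optional


def resolve_table_name(tag: Optional[str], app_name: Optional[str]) -> Optional[str]:
    """Prefer the syslog tag; fall back to app-name; None if both are empty."""
    for candidate in (tag, app_name):
        if candidate and candidate != "-":
            return candidate
    return None
```

A `None` result means the message cannot be assigned to a table and would have to be dropped or logged.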

Verbose logging
Add a config option that enables verbose logging to stdout. Debugging is much easier if you can see which message string was received and whether parsing (grok or JSON) succeeded.
