vector validate causes the dnstap socket file to get removed & recreated, preventing further connections #19064

Open
james-stevens opened this issue Nov 6, 2023 · 5 comments
Labels: source: dnstap (anything `dnstap` source related), type: bug (a code related bug)

Comments

@james-stevens

Problem

When you run vector validate, the unix socket file specified for a dnstap source gets removed & recreated, so the running instance of vector can no longer accept connections on that socket path until you restart it.

This does not happen for vector's syslog unix socket files.

The file will also get recreated as owned by the user that ran the validation, e.g. root:root, instead of the user that vector is running under (vector:vector by default in the supplied RPM).

Only tested on RHEL9 using the supplied RPM.

We regularly run server validations, which include running vector validate. We also run vector validate as part of a standard ansible run, even when vector's config file has not changed. This leaves our DNS software unable to connect to the dnstap source, so we can no longer record DNS data.

We are seeing this issue with both unbound and dnsdist. unbound actually logs Connection refused to syslog for the new sock file, but dnsdist doesn't log any error.

Although this is quite a minor issue, it's quite annoying.

Incredibly minor, but also a little annoying, is the inconsistent option naming. The syslog source has

syslog.path
syslog.socket_file_mode

but

dnstap.socket_path
dnstap.socket_file_mode

socket_path for both would be a little nicer ;)
(told you it was incredibly minor)

Configuration

Can be reproduced using `unbound` and the config in this issue

https://github.com/vectordotdev/vector/issues/18854

-----------------------------------------------------------
  dnstap:
    type: dnstap
    socket_path: /var/lib/vector/dnstap.sock
    socket_file_mode: 0o777
-----------------------------------------------------------

Version

vector 0.33.1 (x86_64-unknown-linux-gnu 3cc27b9 2023-10-30 16:50:49.747931844)

@james-stevens james-stevens added the type: bug A code related bug. label Nov 6, 2023
@james-stevens
Author

james-stevens commented Nov 7, 2023

NOTE: this bug only happens if vector validate is run by a user that has permission to remove the sock file. As systemctl restart vector requires root, I would normally run vector validate (in ansible) as root.

To reproduce this & see it happening you can just run vector (on its own) with a dnstap source. Then

  1. run ls -li on the sock file & note the inode number in the first column
  2. run vector validate
  3. repeat (1) and note that the inode number has changed - if you run the validate as a different user than the running vector, then the ownership of the sock will also change, e.g. to root:root.
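The inode check in the steps above can be scripted. This is a minimal Python sketch, not part of vector: the names `SocketIdentity`, `socket_identity` and `was_replaced` are made up for illustration, and the path you pass in is whatever `socket_path` your own config uses.

```python
import os
from typing import NamedTuple


class SocketIdentity(NamedTuple):
    """Snapshot of what `ls -li` shows: inode number plus numeric owner."""
    inode: int
    uid: int
    gid: int


def socket_identity(path: str) -> SocketIdentity:
    """Return the inode number and numeric owner of the file at `path`."""
    st = os.stat(path)
    return SocketIdentity(st.st_ino, st.st_uid, st.st_gid)


def was_replaced(before: SocketIdentity, after: SocketIdentity) -> bool:
    """True if the inode or the ownership differs between two snapshots."""
    return before != after
```

Take one snapshot before running vector validate and one after. If `was_replaced` returns True, the sock file the running vector bound to is no longer the one on disk.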

To see it actually causing a problem

  1. recreate the config with unbound from "In v0.33.0 dnstap can no longer parse DNS records with DNSSEC/RRSIG RRs" (#18854)
  2. run vector validate
  3. restart unbound - unbound will work just fine, but you should now see Connection refused errors logged by unbound and it won't be able to record query & response data to the dnstap.
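The Connection refused in the steps above is what you would expect whenever a live unix socket file is replaced by a process that then exits. The sketch below reproduces that mechanism with plain Python sockets and no vector involved. It assumes Linux, and the function name `demo_stale_listener` is hypothetical. That vector validate does the equivalent of the middle step is an inference from the behaviour reported here, not something taken from vector's source.

```python
import os
import socket
import tempfile


def demo_stale_listener() -> str:
    """Replace a live Unix socket file and report what a new client sees."""
    path = os.path.join(tempfile.mkdtemp(), "demo.sock")

    # Stand-in for the long-running process: bind and listen on the path.
    running = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    running.bind(path)
    running.listen(1)

    # Stand-in for the short-lived check: remove the file, bind a fresh
    # socket at the same path, then exit (close) without serving anything.
    os.unlink(path)
    checker = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    checker.bind(path)
    checker.close()

    # A client connecting by path now reaches the new, unserved socket file,
    # not the listener that is still open in the "running" process.
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        client.connect(path)
        return "connected"
    except ConnectionRefusedError:
        return "connection refused"
    finally:
        client.close()
        running.close()
```

The original listener is still open and healthy throughout, which matches the report: the running process has no way to notice that its path now points at a different file.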

vector validate should be non-destructive, really - if I'm editing the config and want to check my changes before restarting vector, I don't want/expect the functionality of the currently running vector to be destroyed by the config file check.

Unfortunately, for the application I am working on, this means vector is OK to use in development (so long as you are aware of this bug), but if deployed in production with this bug we would probably see all sorts of unexpected failures to capture the DNS query & response data, which would put us in breach of contract.

@StephenWakely
Contributor

StephenWakely commented Nov 7, 2023

Yes, I agree this could get annoying!

It's likely this could be fixed by moving the code that binds to the socket into the async block at the end. This is how the syslog source works, and the two should be consistent.
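To illustrate the idea of deferring the bind, here is a rough Python asyncio analogy. It is not vector's code (vector is written in Rust) and `build_source` is a hypothetical name. The assumption is that a validate-style check only calls the build step, while a real run also awaits the returned coroutine function.

```python
import asyncio
import os
from typing import Awaitable, Callable


def build_source(path: str) -> Callable[[], Awaitable[None]]:
    """Build step: check the config only; do not touch the filesystem."""
    if not path:
        raise ValueError("socket path must not be empty")

    async def run() -> None:
        # Run step: only a source that is actually started replaces a stale
        # socket file and binds the listener.
        if os.path.exists(path):
            os.unlink(path)

        async def handle(reader: asyncio.StreamReader,
                         writer: asyncio.StreamWriter) -> None:
            # Placeholder handler: a real source would read frames here.
            writer.close()

        server = await asyncio.start_unix_server(handle, path=path)
        async with server:
            await server.serve_forever()

    return run
```

With this split, building the source for a config check leaves any existing sock file alone, because the unlink and bind only happen once `run()` is awaited.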

In the meantime, you may find the --no-environment flag useful. https://vector.dev/docs/reference/cli/#validate
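For anyone scripting the workaround, a minimal Python sketch follows. The `--no-environment` flag is the one from the linked docs; the helper names and the config path in the usage note are assumptions for illustration. Whether skipping the environment checks fully avoids the socket recreation has not been verified here.

```python
import subprocess
from typing import List


def validate_command(config_path: str, check_environment: bool = False) -> List[str]:
    """Build the argument list for `vector validate`."""
    cmd = ["vector", "validate"]
    if not check_environment:
        # Skip the environment checks, as suggested above.
        cmd.append("--no-environment")
    cmd.append(config_path)
    return cmd


def validate_config(config_path: str) -> bool:
    """Run the validation and report whether vector exited successfully."""
    result = subprocess.run(validate_command(config_path), check=False)
    return result.returncode == 0
```

For example, `validate_config("/etc/vector/vector.yaml")` would run `vector validate --no-environment /etc/vector/vector.yaml`. The trade-off is that checks which depend on the environment are no longer performed.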

@StephenWakely
Contributor

I've pulled the config option naming out into a separate issue #19074.

Whilst it may seem minor, details matter!

@james-stevens
Author

> I've pulled the config option naming out into a separate issue #19074.
>
> Whilst it may seem minor, details matter!

I didn't have the heart to create a separate issue for it; it seemed so minor. Thank you.

@james-stevens
Author

While you're playing with copying code from syslog to dnstap, any chance this could get some attention?

#15203

Seems to me all it needs is the full socket code copied from syslog to dnstap.

@dsmith3197 dsmith3197 added the source: dnstap Anything `dnstap` source related label Dec 19, 2023