Add a file lock to the data directory on startup to prevent multiple agents. #18483
Pinging @elastic/ingest-management (Team:Ingest Management)
@@ -42,7 +42,7 @@ func NewDownloader(config *artifact.Config) *Downloader {
 func (e *Downloader) Download(_ context.Context, programName, version string) (string, error) {
 	// create a destination directory root/program
 	destinationDir := filepath.Join(e.config.TargetDirectory, programName)
-	if err := os.MkdirAll(destinationDir, os.ModeDir); err != nil {
+	if err := os.MkdirAll(destinationDir, 0755); err != nil {
This is a drive-by fix: on macOS, os.ModeDir does not create the directory with the proper permissions, so 0755 must be used.
if err := locker.TryLock(); err != nil {
	return err
}
defer locker.Unlock()
Can you make sure this is handled even if we are killed? Deferred calls are skipped if the process is terminated by SIGINT or SIGTERM, and a lock that is never released could prevent us from restarting.
I have verified that the deferred Unlock does get called for all of the signals registered below.
signals := make(chan os.Signal, 1)
signal.Notify(signals, syscall.SIGINT, syscall.SIGKILL, syscall.SIGTERM, syscall.SIGQUIT)
<-signals
So the defer does get called. I did find a bug in periodic that was preventing app.Start from returning to catch the signals. I have fixed that in my most recent commit.
Tested using various combinations of KILL, STOP, TERMINAL STOP ... looks OK.
…agents. (elastic#18483)
* Add a file lock to the data directory on startup to prevent multiple agents.
* Add export comments to AppLocker.
* Fix periodic to not block startup.
(cherry picked from commit e1a4741)
What does this PR do?
Adds an agent.lock to the path.data directory.
Why is it important?
Prevents multiple agents from running on the same host.
Checklist
[ ] I have made corresponding changes to the documentation
[ ] I have made corresponding changes to the default configuration files
Author's Checklist
How to test this PR locally
Try to start two elastic-agent instances at the same time on the same host; the second one should error out with "another elastic-agent is already running".
Related issues