[Fleet] Improve agent observability #78188
@mukeshelastic I filed this design issue for planning purposes. Please review and update as desired.
@mostlyjason it says here "...Potential scope, PM will need to better define it...". When do you think this issue will be ready to be picked up?
@mukeshelastic is the PM lead for this issue, so I'll defer to him. I believe some parts are ready, such as including the logstream component on the agent details page (#77189).
@hbharding and I discussed the two buckets in which we will need design support:
Small update: per @mukeshelastic and @ravikesarwani, we want to scope the initial work for this ticket in #81872 and treat this issue as an ongoing epic that will extend beyond 7.11.
Pinging @elastic/fleet (Team:Fleet)
We had an offline conversation with @joshdover about improvements. We are seeing a noticeable number of SDH issues whose root cause, or one of the possible causes, turns out to be proxy connectivity problems. The customer has to dig through logs to figure out whether the proxy in use operates properly (whether connections are established, whether there are no 503s, etc.). I believe we could be more proactive and verify the connectivity between Agent and Elasticsearch, and between Agent and Fleet Server. I was first thinking about a special technical policy to verify all connections and settings, but maybe we can start by picking up the It would definitely help with researching customer problems ("Has your proxy ever worked?" vs. "Is there a proxy outage now?").
Summary of the problem
We'd like to improve the observability of agents so that operators have better insight into problems and enough information to troubleshoot and fix them in a timely manner. Additionally, the more insight we share with users to fix issues on their own, the less often they will get stuck and need to file a support issue.
Potential scope, PM will need to better define it:
User stories*
List known (technical) restrictions and requirements
Other
PM Lead @mukeshelastic
Design lead @hbharding
Collaborators @mostlyjason