Improve Jenkins alerts and reporting #3088
Comments
I have reconsidered this issue and, drawing on the Security WG's experience implementing the OpenSSF Scorecard Monitor, I believe we can adopt a similar approach. We can create a GitHub Action that parses the inventory file, extracts the IPs, and attempts to ping the machines (or, in the future, SSH into them). The output will be stored as a markdown file (similar to this one), making it easy to see which machines are UP or DOWN. We can even generate new issues automatically (similar to this one) when a machine becomes unreachable. The process can be triggered on demand and/or scheduled as a daily cron job.
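A minimal sketch of the check-and-report idea in this comment, not the Action's actual implementation. It assumes a plain TCP connection attempt (port 22 by default) as a stand-in for "ping or SSH", and the function names `is_reachable` and `render_report` are hypothetical.

```python
import socket


def is_reachable(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Assumption: an open TCP port is used as a proxy for "the machine is up".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


def render_report(statuses: dict[str, bool]) -> str:
    """Render a markdown table with one row per machine, sorted by name."""
    lines = ["| Machine | Status |", "| --- | --- |"]
    for name in sorted(statuses):
        state = "UP" if statuses[name] else "DOWN"
        lines.append(f"| {name} | {state} |")
    return "\n".join(lines)
```

A scheduled job could call `is_reachable` for each inventory entry, pass the resulting mapping to `render_report`, and commit the markdown output.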
I created this GitHub Action, Jenkins status alerts and reporting, in the marketplace based on the idea in my last message. I still need to fine-tune some details, such as unit testing, but in general it is already stable and we can use it. What can this Action do for us?
Additional features:
Setup proposal

I will need to create a new GitHub Action pipeline in

name: "Jenkins Nodes"
on:
  workflow_dispatch:
permissions:
  contents: write
  pull-requests: none
  issues: write
  packages: none
jobs:
  security-scoring:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Jenkins Alert and Reporting
        uses: UlisesGascon/jenkins-status-alerts-and-reporting@v1.0.0
        id: jenkins-status-alerts-and-reporting
        with:
          database: experimental/database.json
          jenkins-domain: 'ci.nodejs.org'
          jenkins-username: ${{ secrets.JENKINS_USERNAME }}
          jenkins-token: ${{ secrets.JENKINS_TOKEN }}
          # Issues
          generate-issue: true
          issue-assignees: 'UlisesGascon'
          issue-labels: 'incident,infra'
          create-issues-for-new-offline-nodes: false
          # Report
          report: experimental/jenkins-report.md
          report-tags-enabled: true
          # Git changes
          auto-commit: true
          auto-push: true
          github-token: ${{ secrets.GITHUB_TOKEN }}

This setup will require generating a Jenkins API token and including it in the repo settings (

Next steps

I would love to try the tool in the Build team and collect feedback to improve the GitHub Action for the next release, as we did for OpenSSF Scorecard Monitor in the Security WG when we adopted the tool (nodejs/security-wg#886).

There is also an opportunity to evolve the tool to create tickets when a node's disk usage is high, giving us time to fix it before the node goes offline.

What do you think, @nodejs/build? Should we try it? Do we want to wait and discuss it in the next meeting?
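The offline and disk-usage checks mentioned in this comment could look roughly like the sketch below. It is not how the Action works internally. It assumes the node list comes from the Jenkins JSON API (`/computer/api/json`) and that each entry exposes `displayName`, `offline`, and a `monitorData` entry keyed by `hudson.node_monitors.DiskSpaceMonitor` whose `size` is the free space in bytes. The 5 GiB threshold, the function names, and the sample payload are all hypothetical.

```python
from typing import Any

# Assumed key of the disk space monitor in each node's "monitorData".
DISK_MONITOR = "hudson.node_monitors.DiskSpaceMonitor"
# Hypothetical alert threshold: 5 GiB of free space.
MIN_FREE_BYTES = 5 * 1024**3


def offline_nodes(payload: dict[str, Any]) -> list[str]:
    """Return the names of nodes flagged as offline, sorted by name."""
    nodes = payload.get("computer", [])
    return sorted(n["displayName"] for n in nodes if n.get("offline"))


def low_disk_nodes(payload: dict[str, Any], min_free: int = MIN_FREE_BYTES) -> list[str]:
    """Return the names of nodes reporting less free disk space than min_free."""
    low = []
    for node in payload.get("computer", []):
        monitor = (node.get("monitorData") or {}).get(DISK_MONITOR)
        # Nodes without disk data (often offline ones) are skipped, not guessed.
        if not monitor or monitor.get("size") is None:
            continue
        if monitor["size"] < min_free:
            low.append(node["displayName"])
    return sorted(low)


# Invented sample data, shaped like the assumed API response.
SAMPLE_PAYLOAD = {
    "computer": [
        {"displayName": "example-node-a", "offline": False,
         "monitorData": {DISK_MONITOR: {"size": 50 * 1024**3}}},
        {"displayName": "example-node-b", "offline": False,
         "monitorData": {DISK_MONITOR: {"size": 1 * 1024**3}}},
        {"displayName": "example-node-c", "offline": True,
         "monitorData": {DISK_MONITOR: None}},
    ]
}
```

Opening a ticket for every name returned by `low_disk_nodes` would give the early warning described above, before the node shows up in `offline_nodes`.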
As agreed in #3299, I will transfer the alerts demo repository to the Node.js org. I renamed it, and once it is migrated I will open a separate PR to clean up the
The migration seems to be complete: https://github.com/nodejs/jenkins-alerts
I believe we will need some kind of settings change to make the @nodejs/build team the owner of the repo, with the expected write access and so on.
We should probably have followed the steps in https://github.com/nodejs/admin/blob/main/transfer-repo-into-the-org.md before doing the transfer. Maybe open an issue in the admin repo explaining that transferring the repo into the org was suggested during the Build WG call, and detailing whatever needs to happen next?
Thank you for bringing this to my attention, @richardlau. I did not read the documentation before submitting the transfer request, and I also mistakenly believed that the transfer would not be automatic 🤦. I will create an issue in the Admin repository to clarify the transfer process and outline the expected future for this tool.
I added the build team to the repo with the "Maintain" role.
Next steps, as agreed in #3362:
@richardlau can you grant me access? I requested the GitHub integration access between the This will push the notifications to the
@UlisesGascon We should probably run that by https://github.com/nodejs/admin
Thanks for the suggestion @richardlau! I moved the discussion to nodejs/admin#799
As the pending items in #3088 (comment) are completed, I will close the issue. 🎉
I was checking some issues about machines that are down (#3083, #3084...) and I thought that maybe we could implement a small dashboard in Grafana to check machine status (ping + latencies), perhaps SSH connectivity in the future, and trigger alerts (if we want).
I created this POC repo that parses the current inventory (excluding localhost IPs, etc.) and generates a local dockerized environment (Telegraf + InfluxDB + Grafana). It is just a quick, rough prototype to illustrate the idea.
I saw in #3084 that we use the same stack, so it won't be very complex to adapt. What do you think? Should we work on it? Are there other alternatives like Jenkins-status that cover this gap currently?
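As an illustration of the inventory-parsing step described above, here is a minimal sketch that pulls IPv4 addresses out of inventory text and drops localhost entries. It is not the POC's code: the regex-based extraction, the function name `extract_target_ips`, and the sample inventory (which uses documentation-only address ranges) are assumptions, and the real inventory format may call for a proper YAML parser instead.

```python
import ipaddress
import re

# Loose pattern for dotted-quad candidates; validity is checked separately.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_target_ips(inventory_text: str) -> list[str]:
    """Return unique, valid, non-loopback IPv4 addresses in order of appearance."""
    found: list[str] = []
    for candidate in IPV4_PATTERN.findall(inventory_text):
        try:
            address = ipaddress.ip_address(candidate)
        except ValueError:
            continue  # e.g. 999.1.1.1 matches the pattern but is not a valid IP
        if address.is_loopback or address.is_unspecified:
            continue  # skip localhost-style entries such as 127.0.0.1 or 0.0.0.0
        if candidate not in found:
            found.append(candidate)
    return found


# Hypothetical inventory fragment, invented for this example.
SAMPLE_INVENTORY = """
hosts:
  - test-example-ubuntu-x64-1: {ip: 192.0.2.10}
  - test-example-local: {ip: 127.0.0.1}
  - test-example-ubuntu-x64-2: {ip: 198.51.100.7}
  - test-example-duplicate: {ip: 192.0.2.10}
"""
```

The resulting list could then be fed to whatever collects ping latencies, for example as Telegraf targets.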