This repository has been archived by the owner on Sep 17, 2024. It is now read-only.

chore: double wait time for SSH #2066

Merged (3 commits) on Jan 28, 2022

Conversation

mdelapenya
Contributor

It seems that on CentOS machines, SSH takes more than 1 minute to come up.

What does this PR do?

It doubles the timeout used when waiting for SSH to come up after the AWS instance has been created, from 60 seconds to 120 seconds.

Why is it important?

We have noticed that SSH takes more than 1 minute to come up only on CentOS. I've double-checked that SSH is enabled in the AMI used for CentOS by launching it manually from the AWS console.

[2022-01-28T05:09:35.845Z] TASK [Wait for SSH to come up] *************************************************
[2022-01-28T05:10:43.578Z] failed: [localhost] (item={'id': 'i-0c532060aaefd5c30', 'ami_launch_index': '0', 'private_ip': '172.31.13.135', 'private_dns_name': 'ip-172-31-13-135.us-east-2.compute.internal', 'public_ip': '3.145.48.242', 'dns_name': 'ec2-3-145-48-242.us-east-2.compute.amazonaws.com', 'public_dns_name': 'ec2-3-145-48-242.us-east-2.compute.amazonaws.com', 'state_code': 16, 'architecture': 'x86_64', 'image_id': 'ami-045b0a05944af45c1', 'key_name': 'e2essh-893fd0ca', 'placement': 'us-east-2a', 'region': 'us-east-2', 'kernel': None, 'ramdisk': None, 'launch_time': '2022-01-28T05:09:29.000Z', 'instance_type': 'c5.4xlarge', 'root_device_type': 'ebs', 'root_device_name': '/dev/sda1', 'state': 'running', 'hypervisor': 'xen', 'tags': {'BuildURL': 'https://beats-ci.elastic.co/job/e2e-tests/job/e2e-testing-mbp/job/7.17/181/', 'Name': 'e2e-centos8_amd64-893fd0ca', 'ReaperMark': 'e2e-testing-vm', 'Kind': 'centos8_amd64', 'GitSHA': 'd43198bb010c072d7f3a32884f897321c2df5967'}, 'groups': {'sg-0235f273e64143f28': 'e2e'}, 'virtualization_type': 'hvm', 'ebs_optimized': True, 'block_device_mapping': {'/dev/sda1': {'status': 'attached', 'volume_id': 'vol-0510dfd02a6696361', 'delete_on_termination': True}, '/dev/xvda': {'status': 'attached', 'volume_id': 'vol-0ca940b8fef614abc', 'delete_on_termination': True}}, 'tenancy': 'default'}) => {"ansible_loop_var": "nodeItem", "changed": false, "elapsed": 60, "msg": "Timeout when waiting for 3.145.48.242:22", "nodeItem": {"ami_launch_index": "0", "architecture": "x86_64", "block_device_mapping": {"/dev/sda1": {"delete_on_termination": true, "status": "attached", "volume_id": "vol-0510dfd02a6696361"}, "/dev/xvda": {"delete_on_termination": true, "status": "attached", "volume_id": "vol-0ca940b8fef614abc"}}, "dns_name": "ec2-3-145-48-242.us-east-2.compute.amazonaws.com", "ebs_optimized": true, "groups": {"sg-0235f273e64143f28": "e2e"}, "hypervisor": "xen", "id": "i-0c532060aaefd5c30", "image_id": "ami-045b0a05944af45c1", 
"instance_type": "c5.4xlarge", "kernel": null, "key_name": "e2essh-893fd0ca", "launch_time": "2022-01-28T05:09:29.000Z", "placement": "us-east-2a", "private_dns_name": "ip-172-31-13-135.us-east-2.compute.internal", "private_ip": "172.31.13.135", "public_dns_name": "ec2-3-145-48-242.us-east-2.compute.amazonaws.com", "public_ip": "3.145.48.242", "ramdisk": null, "region": "us-east-2", "root_device_name": "/dev/sda1", "root_device_type": "ebs", "state": "running", "state_code": 16, "tags": {"BuildURL": "https://beats-ci.elastic.co/job/e2e-tests/job/e2e-testing-mbp/job/7.17/181/", "GitSHA": "d43198bb010c072d7f3a32884f897321c2df5967", "Kind": "centos8_amd64", "Name": "e2e-centos8_amd64-893fd0ca", "ReaperMark": "e2e-testing-vm"}, "tenancy": "default", "virtualization_type": "hvm"}}
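
The two Jenkins timestamps in the log above bracket the failed task, so its wall-clock duration can be compared with the `"elapsed": 60` that Ansible reports. A minimal sketch, assuming the bracketed prefix is an ISO-8601 UTC timestamp; the helper names `parse_jenkins_timestamp` and `task_wall_time` are hypothetical:

```python
from datetime import datetime

# Timestamps copied from the two Jenkins log lines above.
TASK_STARTED = "2022-01-28T05:09:35.845Z"
TASK_FAILED = "2022-01-28T05:10:43.578Z"


def parse_jenkins_timestamp(stamp: str) -> datetime:
    """Parse a timestamp such as 2022-01-28T05:09:35.845Z (assumed UTC)."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")


def task_wall_time(started: str, failed: str) -> float:
    """Return the seconds between two Jenkins log timestamps."""
    delta = parse_jenkins_timestamp(failed) - parse_jenkins_timestamp(started)
    return delta.total_seconds()


wall_time = task_wall_time(TASK_STARTED, TASK_FAILED)
print(f"task wall time: {wall_time:.3f}s")
```

The result is about 67.7 seconds, a few seconds more than the 60 seconds reported in `elapsed`. The log alone does not say where the extra time went.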

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have run the Unit tests (make unit-test), and they are passing locally
  • I have run the End-2-End tests for the suite I'm working on, and they are passing locally
  • I have noticed new Go dependencies (run make notice in the proper directory)

Author's Checklist

  • An instance using CentOS AMI has been manually created and SSH comes in time

@mdelapenya mdelapenya added backport-v7.16.0 Automated backport with mergify backport-v7.17.0 Automated backport with mergify backport-v8.0.0 Automated backport with mergify labels Jan 28, 2022
@mdelapenya mdelapenya self-assigned this Jan 28, 2022
@mdelapenya mdelapenya requested a review from a team January 28, 2022 10:03
@elasticmachine
Contributor

elasticmachine commented Jan 28, 2022

🐛 Flaky test report

❕ There are test failures but not known flaky tests.


Genuine test errors 66

💔 There are test failures but not known flaky tests, most likely a genuine test failure.

  • Name: Initializing / End-To-End Tests / helm_debian_amd64_filebeat / [empty] – TEST-x86_64-helm-71df4378-2022-01-28-12:07:13.xml
  • Name: Initializing / End-To-End Tests / helm_debian_amd64_metricbeat / [empty] – TEST-x86_64-helm-d4de1488-2022-01-28-12:07:29.xml
  • Name: Initializing / End-To-End Tests / helm_debian_amd64_apm-server / [empty] – TEST-x86_64-helm-9df470a9-2022-01-28-12:08:06.xml
  • Name: Initializing / End-To-End Tests / fleet_sles15_linux_integration / Adding the Linux Integration to an Agent ... – Linux Integration
  • Name: Initializing / End-To-End Tests / fleet_debian_amd64_linux_integration / Adding the Linux Integration to an Agent ... – Linux Integration
  • Name: Initializing / End-To-End Tests / fleet_debian_arm64_linux_integration / Adding the Linux Integration to an Agent ... – Linux Integration
  • Name: Initializing / End-To-End Tests / fleet_debian_amd64_running_on_beats / Deploying the Elastic-Agent with enroll and then run on top of filebeat #1 – Running on top of Beats
  • Name: Initializing / End-To-End Tests / fleet_debian_amd64_running_on_beats / Deploying the Elastic-Agent with enroll and then run on top of metricbeat #1 – Running on top of Beats
  • Name: Initializing / End-To-End Tests / fleet_debian_amd64_running_on_beats / Deploying the Elastic-Agent with enroll and then run on top of filebeat #2 – Running on top of Beats
  • Name: Initializing / End-To-End Tests / fleet_debian_amd64_running_on_beats / Deploying the Elastic-Agent with enroll and then run on top of metricbeat #2 – Running on top of Beats
  • Name: Initializing / End-To-End Tests / fleet_sles15_running_on_beats / Deploying the Elastic-Agent with enroll and then run on top of filebeat #1 – Running on top of Beats
  • Name: Initializing / End-To-End Tests / fleet_sles15_running_on_beats / Deploying the Elastic-Agent with enroll and then run on top of metricbeat #1 – Running on top of Beats
  • Name: Initializing / End-To-End Tests / fleet_sles15_running_on_beats / Deploying the Elastic-Agent with enroll and then run on top of filebeat #2 – Running on top of Beats
  • Name: Initializing / End-To-End Tests / fleet_sles15_running_on_beats / Deploying the Elastic-Agent with enroll and then run on top of metricbeat #2 – Running on top of Beats
  • Name: Initializing / End-To-End Tests / fleet_debian_arm64_running_on_beats / Deploying the Elastic-Agent with enroll and then run on top of filebeat #1 – Running on top of Beats
  • Name: Initializing / End-To-End Tests / fleet_debian_arm64_running_on_beats / Deploying the Elastic-Agent with enroll and then run on top of metricbeat #1 – Running on top of Beats
  • Name: Initializing / End-To-End Tests / fleet_debian_arm64_running_on_beats / Deploying the Elastic-Agent with enroll and then run on top of filebeat #2 – Running on top of Beats
  • Name: Initializing / End-To-End Tests / fleet_debian_arm64_running_on_beats / Deploying the Elastic-Agent with enroll and then run on top of metricbeat #2 – Running on top of Beats
  • Name: Initializing / End-To-End Tests / fleet_debian_amd64_fleet_mode_agent / Deploying the agent – Fleet Mode Agent
  • Name: Initializing / End-To-End Tests / fleet_debian_amd64_fleet_mode_agent / Restarting the installed agent – Fleet Mode Agent
  • Name: Initializing / End-To-End Tests / fleet_debian_amd64_fleet_mode_agent / Un-enrolling the agent deactivates the agent – Fleet Mode Agent
  • Name: Initializing / End-To-End Tests / fleet_debian_amd64_fleet_mode_agent / Re-enrolling the agent activates the agent in Fleet – Fleet Mode Agent
  • Name: Initializing / End-To-End Tests / fleet_debian_amd64_fleet_mode_agent / Revoking the enrollment token for the agent – Fleet Mode Agent
  • Name: Initializing / End-To-End Tests / fleet_debian_amd64_fleet_mode_agent / Un-installing the installed agent – Fleet Mode Agent
  • Name: Initializing / End-To-End Tests / fleet_debian_arm64_fleet_mode_agent / Deploying the agent – Fleet Mode Agent
  • Name: Initializing / End-To-End Tests / fleet_debian_arm64_fleet_mode_agent / Restarting the installed agent – Fleet Mode Agent
  • Name: Initializing / End-To-End Tests / fleet_debian_arm64_fleet_mode_agent / Un-enrolling the agent deactivates the agent – Fleet Mode Agent
  • Name: Initializing / End-To-End Tests / fleet_debian_arm64_fleet_mode_agent / Re-enrolling the agent activates the agent in Fleet – Fleet Mode Agent
  • Name: Initializing / End-To-End Tests / fleet_debian_arm64_fleet_mode_agent / Revoking the enrollment token for the agent – Fleet Mode Agent
  • Name: Initializing / End-To-End Tests / fleet_debian_arm64_fleet_mode_agent / Un-installing the installed agent – Fleet Mode Agent
  • Name: Initializing / End-To-End Tests / fleet_debian_amd64_backend_processes / Deploying the agent – Backend Processes
  • Name: Initializing / End-To-End Tests / fleet_debian_amd64_backend_processes / Stopping the agent stops backend processes – Backend Processes
  • Name: Initializing / End-To-End Tests / fleet_debian_amd64_backend_processes / Restarting the installed agent – Backend Processes
  • Name: Initializing / End-To-End Tests / fleet_debian_amd64_backend_processes / Un-enrolling the agent stops backend processes – Backend Processes
  • Name: Initializing / End-To-End Tests / fleet_debian_amd64_backend_processes / Re-enrolling the agent starts the elastic-agent process – Backend Processes
  • Name: Initializing / End-To-End Tests / fleet_debian_amd64_backend_processes / Un-installing the installed agent – Backend Processes
  • Name: Initializing / End-To-End Tests / fleet_debian_amd64_backend_processes / Un-enrolling Elastic Agent stops Elastic Endpoint – Backend Processes
  • Name: Initializing / End-To-End Tests / fleet_debian_amd64_backend_processes / Removing Endpoint from Agent policy stops the connected Endpoint – Backend Processes
  • Name: Initializing / End-To-End Tests / fleet_sles15_fleet_mode_agent / Deploying the agent – Fleet Mode Agent
  • Name: Initializing / End-To-End Tests / fleet_sles15_fleet_mode_agent / Restarting the installed agent – Fleet Mode Agent
  • Name: Initializing / End-To-End Tests / fleet_sles15_fleet_mode_agent / Un-enrolling the agent deactivates the agent – Fleet Mode Agent
  • Name: Initializing / End-To-End Tests / fleet_sles15_fleet_mode_agent / Re-enrolling the agent activates the agent in Fleet – Fleet Mode Agent
  • Name: Initializing / End-To-End Tests / fleet_sles15_fleet_mode_agent / Revoking the enrollment token for the agent – Fleet Mode Agent
  • Name: Initializing / End-To-End Tests / fleet_sles15_fleet_mode_agent / Un-installing the installed agent – Fleet Mode Agent
  • Name: Initializing / End-To-End Tests / fleet_centos8_amd64_fleet_mode_agent / Deploying the agent – Fleet Mode Agent
  • Name: Initializing / End-To-End Tests / fleet_centos8_amd64_fleet_mode_agent / Restarting the installed agent – Fleet Mode Agent
  • Name: Initializing / End-To-End Tests / fleet_centos8_amd64_fleet_mode_agent / Un-enrolling the agent deactivates the agent – Fleet Mode Agent
  • Name: Initializing / End-To-End Tests / fleet_centos8_amd64_fleet_mode_agent / Re-enrolling the agent activates the agent in Fleet – Fleet Mode Agent
  • Name: Initializing / End-To-End Tests / fleet_centos8_amd64_fleet_mode_agent / Revoking the enrollment token for the agent – Fleet Mode Agent
  • Name: Initializing / End-To-End Tests / fleet_centos8_amd64_fleet_mode_agent / Un-installing the installed agent – Fleet Mode Agent
  • Name: Initializing / End-To-End Tests / fleet_sles15_backend_processes / Deploying the agent – Backend Processes
  • Name: Initializing / End-To-End Tests / fleet_sles15_backend_processes / Stopping the agent stops backend processes – Backend Processes
  • Name: Initializing / End-To-End Tests / fleet_sles15_backend_processes / Restarting the installed agent – Backend Processes
  • Name: Initializing / End-To-End Tests / fleet_sles15_backend_processes / Un-enrolling the agent stops backend processes – Backend Processes
  • Name: Initializing / End-To-End Tests / fleet_sles15_backend_processes / Re-enrolling the agent starts the elastic-agent process – Backend Processes
  • Name: Initializing / End-To-End Tests / fleet_sles15_backend_processes / Un-installing the installed agent – Backend Processes
  • Name: Initializing / End-To-End Tests / fleet_sles15_backend_processes / Un-enrolling Elastic Agent stops Elastic Endpoint – Backend Processes
  • Name: Initializing / End-To-End Tests / fleet_sles15_backend_processes / Removing Endpoint from Agent policy stops the connected Endpoint – Backend Processes
  • Name: Initializing / End-To-End Tests / fleet_debian_arm64_backend_processes / Deploying the agent – Backend Processes
  • Name: Initializing / End-To-End Tests / fleet_debian_arm64_backend_processes / Stopping the agent stops backend processes – Backend Processes
  • Name: Initializing / End-To-End Tests / fleet_debian_arm64_backend_processes / Restarting the installed agent – Backend Processes
  • Name: Initializing / End-To-End Tests / fleet_debian_arm64_backend_processes / Un-enrolling the agent stops backend processes – Backend Processes
  • Name: Initializing / End-To-End Tests / fleet_debian_arm64_backend_processes / Re-enrolling the agent starts the elastic-agent process – Backend Processes
  • Name: Initializing / End-To-End Tests / fleet_debian_arm64_backend_processes / Un-installing the installed agent – Backend Processes
  • Name: Initializing / End-To-End Tests / fleet_debian_arm64_backend_processes / Un-enrolling Elastic Agent stops Elastic Endpoint – Backend Processes
  • Name: Initializing / End-To-End Tests / fleet_debian_arm64_backend_processes / Removing Endpoint from Agent policy stops the connected Endpoint – Backend Processes

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@mdelapenya mdelapenya changed the title chore: dible wait time for SSH chore: double wait time for SSH Jan 28, 2022
@mdelapenya
Contributor Author

Mmm, interestingly, SSH did come up, but thanks to the retry rather than to Ansible's wait_for timeout.

@elastic/observablt-robots do you see anything weird here?

@mdelapenya
Contributor Author

BTW @v1v we could enable the ansible otel plugin for this deployment!

@v1v
Member

v1v commented Jan 28, 2022

Mmm, interestingly, SSH did come up, but thanks to the retry rather than to Ansible's wait_for timeout.

If you use the multiline format in the ansible tasks, does it work?

wait_for:
  # Quote the Jinja2 expression so YAML does not parse it as a mapping.
  host: "{{ inventory_hostname }}"
  port: 22
  delay: 10
  timeout: 120
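
The `delay` and `timeout` parameters in the task above can be read as: wait `delay` seconds before the first check, then poll the port until `timeout` seconds have passed. A rough Python sketch of that polling loop, not Ansible's actual implementation; the function name `wait_for_port` is hypothetical, and counting the delay against the timeout is an assumption:

```python
import socket
import time


def wait_for_port(host: str, port: int, delay: float = 0.0,
                  timeout: float = 120.0, sleep: float = 1.0,
                  connect_timeout: float = 5.0) -> float:
    """Poll host:port until a TCP connection succeeds.

    Returns the elapsed seconds, or raises TimeoutError once `timeout`
    seconds have passed. Assumption: the initial delay counts against
    the timeout.
    """
    start = time.monotonic()
    deadline = start + timeout
    time.sleep(delay)  # give the instance time to boot before polling
    while time.monotonic() < deadline:
        remaining = max(deadline - time.monotonic(), 0.01)
        try:
            # Never let a single connection attempt outlive the deadline.
            with socket.create_connection(
                (host, port), timeout=min(connect_timeout, remaining)
            ):
                return time.monotonic() - start
        except OSError:
            # Port closed or unreachable: pause briefly, then retry.
            time.sleep(min(sleep, max(deadline - time.monotonic(), 0.0)))
    raise TimeoutError(f"Timeout when waiting for {host}:{port}")
```

A call such as `wait_for_port(host, 22, delay=10, timeout=120)` would mirror the values in the task. If the task still fails after about 60 seconds, as the log in this PR shows, that suggests the new timeout is not the value in effect.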

BTW @v1v we could enable the ansible otel plugin for this deployment!

Run the snippet below within the withOtelEnv step context:

ansible-galaxy collection install community.general
ansible-playbook ...

in addition to a change in the ansible.cfg:

[defaults]
callbacks_enabled = community.general.opentelemetry

And it should just work
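
One way to confirm the setting is present before a run is to parse the config. A minimal sketch, assuming the INI content shown above; the helper name `otel_callback_enabled` is hypothetical, and the config is read from a string here rather than from the real ansible.cfg on disk:

```python
import configparser

# Content assumed to match the ansible.cfg change described above.
ANSIBLE_CFG = """\
[defaults]
callbacks_enabled = community.general.opentelemetry
"""


def otel_callback_enabled(cfg_text: str) -> bool:
    """Return True if the opentelemetry callback is listed in callbacks_enabled."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    raw = parser.get("defaults", "callbacks_enabled", fallback="")
    # callbacks_enabled holds a comma-separated list of callback names.
    enabled = [name.strip() for name in raw.split(",") if name.strip()]
    return "community.general.opentelemetry" in enabled


print(otel_callback_enabled(ANSIBLE_CFG))
```

The check only covers the config file; the collection still has to be installed with `ansible-galaxy` as shown above.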

@mdelapenya
Contributor Author

@v1v I tried the multiline definition, but the Jenkins logs show that the task still takes about 1 minute to run (12:05 to 12:06):

[2022-01-28T12:05:26.497Z] TASK [Wait for SSH to come up] *************************************************
[2022-01-28T12:06:34.241Z] failed: [localhost] (item={'id': 'i-03a31a37d77fe9c7c', 'ami_launch_index': '0', 'private_ip': '172.31.12.193', 'private_dns_name': 'ip-172-31-12-193.us-east-2.compute.internal', 'public_ip': '18.224.73.233', 'dns_name':

@mdelapenya
Contributor Author

The errors are coming from the latest beat snapshot, not from the SSH access. I'd say this is ready to go.

@mdelapenya mdelapenya merged commit a176d63 into elastic:main Jan 28, 2022
mergify bot pushed a commit that referenced this pull request Jan 28, 2022
* chore: dible wait time for SSH

It seems that for CentOS machines, SSH takes more than 1min to come

* chore: use multiline

* chore: include OpenTelemetry traces for Ansible

(cherry picked from commit a176d63)
mdelapenya added a commit to mdelapenya/e2e-testing that referenced this pull request Jan 28, 2022
* main:
  chore: double wait time for SSH (elastic#2066)
mdelapenya added a commit that referenced this pull request Jan 28, 2022
* chore: dible wait time for SSH

It seems that for CentOS machines, SSH takes more than 1min to come

* chore: use multiline

* chore: include OpenTelemetry traces for Ansible

(cherry picked from commit a176d63)

Co-authored-by: Manuel de la Peña <mdelapenya@gmail.com>
@mdelapenya mdelapenya deleted the fix-ssh-timeout branch March 9, 2022 06:43