# Migrate the Installation assistant test pipelines to GHA #20
## Update Report

### Investigation

I have been investigating the mentioned Jenkins pipeline, and we decided to migrate the

Related to the rest of the tests:
## Update Report
## Update Report

### Test installation assistant workflow
These steps have been tested here: https://github.com/wazuh/wazuh-installation-assistant/actions/runs/10576987236/job/29303972255

Currently adding the necessary steps to execute the
### Problem with CentOS 8 🔴

> [!CAUTION]
> It seems that the CentOS 8 allocator VM does not have Python installed, which is required to execute Ansible playbooks. We need to decide whether to add the CentOS 8 AMI that was used in the old Jenkins pipeline, or to replace the specified CentOS 8 AMI with another one that has Python installed.

The reported error is the following:

```
fatal: [ec2-3-95-210-126.compute-1.amazonaws.com]: FAILED! => {"ansible_facts": {}, "changed": false, "failed_modules": {"ansible.legacy.setup": {"failed": true, "module_stderr": "OpenSSH_8.9p1 Ubuntu-3ubuntu0.10, OpenSSL 3.0.2 15 Mar 2022\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: /etc/ssh/ssh_config line 19: include /etc/ssh/ssh_config.d/*.conf matched no files\r\ndebug1: /etc/ssh/ssh_config line 21: Applying options for *\r\ndebug1: auto-mux: Trying existing master\r\ndebug1: mux_client_request_session: master session id: 2\r\nShared connection to ec2-3-95-210-126.compute-1.amazonaws.com closed.\r\n", "module_stdout": "/bin/sh: /usr/bin/python3: No such file or directory\r\n", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error", "rc": 127}}, "msg": "The following modules failed to execute: ansible.legacy.setup\n"}
```

### Problem with CentOS 8 - Solved 🟢

After some time debugging, I found a way to install Python on the remote machine (CentOS 8) before executing the playbook. The commit with the changes is: eae9a6c

The workaround is:
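The exact tasks live in the referenced commit; as a rough illustration of the approach (installing Python with the `raw` module before anything that needs a Python interpreter runs), a sketch might look like the following. The task names and the `dnf` package name are assumptions for illustration, not the committed code:

```yaml
# Hypothetical sketch: install Python on the target before any module that needs it runs.
# The raw module works over plain SSH, so it does not require Python on the remote host.
- name: Install Python on CentOS 8 before running the playbook
  hosts: all
  gather_facts: false          # fact gathering would already need Python
  become: true
  tasks:
    - name: Install python3 with the raw module
      raw: dnf install -y python3
      changed_when: true
```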
> [!IMPORTANT]
> This way, there is no need to set the Python interpreter before executing the playbook, no need to change the CentOS 8 Allocator AMI, and it is a very useful approach for installing packages before running the playbook.

The changes were tested in the following runs:
## Update Report

The problem was that, in the case of Amazon Linux 2, the package manager could not be detected, so the YUM module could not be used, because internally it relies on the package manager variable (a variable that defines which package manager the system uses):

```
fatal: [ec2-44-212-66-219.compute-1.amazonaws.com]: FAILED! => {"changed": false, "msg": "Could not find a matching action for the \"unknown\" package manager."}
```

> [!CAUTION]
> The YUM and the APT modules were not the only ones affected. Some Ansible variables such as the

After many failed attempts to solve this, the provisional solution I came up with was installing a specific version of Ansible core. The mentioned problem was caused by the deprecation and removal of the YUM module and YUM action here: https://github.com/ansible/ansible/blob/stable-2.17/changelogs/CHANGELOG-v2.17.rst#removed-features-previously-deprecated. The approach was therefore to change the Ansible installation in the GH runner. See this commit.

✔️ Now, Ansible is installed using:

```yaml
- name: Install Ansible
  run: sudo apt-get update && sudo apt install -y python3 && python3 -m pip install --user ansible-core==2.16
```

With this change, the two use cases (running in Ubuntu 22 and running in AL2) succeeded:
> [!WARNING]
> We are aware that installing a specific version of Ansible core is not the best solution, because we become highly dependent on that version. However, due to the urgency of the migration, we will take this approach as a provisional solution until we find a better one, or we may rework these GHAs in 5.0.
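If we keep this provisional pin, one way to limit the coupling (not what the workflow currently does, only a sketch) is to declare the pinned version once at workflow level so it can be bumped in a single place:

```yaml
# Hypothetical sketch: keep the pinned ansible-core version in one place.
env:
  ANSIBLE_CORE_VERSION: "2.16"

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Install Ansible
        run: |
          sudo apt-get update && sudo apt-get install -y python3 python3-pip
          python3 -m pip install --user "ansible-core==${ANSIBLE_CORE_VERSION}"
```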
On hold due to: #44
## Update Report

### Problem with Allocator VM deletion

I encountered a problem when setting the final conditional of the Allocator VM deletion. The original task was:

```yaml
- name: Delete allocated VM
  if: always() && steps.allocator_instance.outcome == 'success' && inputs.DESTROY == 'true'
  run: python3 wazuh-automation/deployability/modules/allocation/main.py --action delete --track-output /tmp/allocator_instance/track.yml
```

This task was never executed. After many hours of debugging, I found that the
✔️ Changing

Here I have tested, with the applied condition, the possible scenarios that occurred to me:
Cancellation cases:
### Next steps

Now, I have to work on the following:
```
TASK [Install assistant installer] *********************************************
task path: /home/runner/work/wazuh-installation-assistant/wazuh-installation-assistant/.github/workflows/ansible-playbooks/aio.yml:27
<ec2-3-83-65-190.compute-1.amazonaws.com> ESTABLISH SSH CONNECTION FOR USER: ubuntu
<ec2-3-83-65-190.compute-1.amazonaws.com> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o Port=2200 -o 'IdentityFile="/tmp/allocator_instance/gha_10664819701_assistant_test-5903/gha_10664819701_assistant_test-key-2194"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="ubuntu"' -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o 'ControlPath="/home/runner/.ansible/cp/285abfb711"' ec2-3-83-65-190.compute-1.amazonaws.com '/bin/sh -c '"'"'echo ~ubuntu && sleep 0'"'"''
<ec2-3-83-65-190.compute-1.amazonaws.com> (0, b'/home/ubuntu\n', b'')
<ec2-3-83-65-190.compute-1.amazonaws.com> ESTABLISH SSH CONNECTION FOR USER: ubuntu
<ec2-3-83-65-190.compute-1.amazonaws.com> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o Port=2200 -o 'IdentityFile="/tmp/allocator_instance/gha_10664819701_assistant_test-5903/gha_10664819701_assistant_test-key-2194"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="ubuntu"' -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o 'ControlPath="/home/runner/.ansible/cp/285abfb711"' ec2-3-83-65-190.compute-1.amazonaws.com '/bin/sh -c '"'"'( umask 77 && mkdir -p "` echo /home/ubuntu/.ansible/tmp `"&& mkdir "` echo /home/ubuntu/.ansible/tmp/ansible-tmp-1725270564.1284177-2850-24144868290821 `" && echo ansible-tmp-1725270564.1284177-2850-24144868290821="` echo /home/ubuntu/.ansible/tmp/ansible-tmp-1725270564.1284177-2850-24144868290821 `" ) && sleep 0'"'"''
<ec2-3-83-65-190.compute-1.amazonaws.com> (0, b'ansible-tmp-1725270564.1284177-2850-24144868290821=/home/ubuntu/.ansible/tmp/ansible-tmp-1725270564.1284177-2850-24144868290821\n', b'')
Using module file /home/runner/.local/lib/python3.10/site-packages/ansible/modules/command.py
<ec2-3-83-65-190.compute-1.amazonaws.com> PUT /home/runner/.ansible/tmp/ansible-local-2821wja4gasf/tmpl5cix_ik TO /home/ubuntu/.ansible/tmp/ansible-tmp-1725270564.1284177-2850-24144868290821/AnsiballZ_command.py
<ec2-3-83-65-190.compute-1.amazonaws.com> SSH: EXEC sftp -b - -C -o ControlMaster=auto -o ControlPersist=60s -o Port=2200 -o 'IdentityFile="/tmp/allocator_instance/gha_10664819701_assistant_test-5903/gha_10664819701_assistant_test-key-2194"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="ubuntu"' -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o 'ControlPath="/home/runner/.ansible/cp/285abfb711"' '[ec2-3-83-65-190.compute-1.amazonaws.com]'
<ec2-3-83-65-190.compute-1.amazonaws.com> (0, b'sftp> put /home/runner/.ansible/tmp/ansible-local-2821wja4gasf/tmpl5cix_ik /home/ubuntu/.ansible/tmp/ansible-tmp-1725270564.1284177-2850-24144868290821/AnsiballZ_command.py\n', b'')
<ec2-3-83-65-190.compute-1.amazonaws.com> ESTABLISH SSH CONNECTION FOR USER: ubuntu
<ec2-3-83-65-190.compute-1.amazonaws.com> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o Port=2200 -o 'IdentityFile="/tmp/allocator_instance/gha_10664819701_assistant_test-5903/gha_10664819701_assistant_test-key-2194"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="ubuntu"' -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o 'ControlPath="/home/runner/.ansible/cp/285abfb711"' ec2-3-83-65-190.compute-1.amazonaws.com '/bin/sh -c '"'"'chmod u+x /home/ubuntu/.ansible/tmp/ansible-tmp-1725270564.1284177-2850-24144868290821/ /home/ubuntu/.ansible/tmp/ansible-tmp-1725270564.1284177-2850-24144868290821/AnsiballZ_command.py && sleep 0'"'"''
<ec2-3-83-65-190.compute-1.amazonaws.com> (0, b'', b'')
<ec2-3-83-65-190.compute-1.amazonaws.com> ESTABLISH SSH CONNECTION FOR USER: ubuntu
<ec2-3-83-65-190.compute-1.amazonaws.com> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o Port=2200 -o 'IdentityFile="/tmp/allocator_instance/gha_10664819701_assistant_test-5903/gha_10664819701_assistant_test-key-2194"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="ubuntu"' -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o 'ControlPath="/home/runner/.ansible/cp/285abfb711"' -tt ec2-3-83-65-190.compute-1.amazonaws.com '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-wscnucwrcojkmeiydxxxyssdwcaapjnp ; /usr/bin/python3 /home/ubuntu/.ansible/tmp/ansible-tmp-1725270564.1284177-2850-24144868290821/AnsiballZ_command.py'"'"'"'"'"'"'"'"' && sleep 0'"'"''
Escalation succeeded
<ec2-3-83-65-190.compute-1.amazonaws.com> (255, b'', b'Shared connection to ec2-3-83-65-190.compute-1.amazonaws.com closed.\r\n')
<ec2-3-83-65-190.compute-1.amazonaws.com> ESTABLISH SSH CONNECTION FOR USER: ubuntu
<ec2-3-83-65-190.compute-1.amazonaws.com> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o Port=2200 -o 'IdentityFile="/tmp/allocator_instance/gha_10664819701_assistant_test-5903/gha_10664819701_assistant_test-key-2194"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="ubuntu"' -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o 'ControlPath="/home/runner/.ansible/cp/285abfb711"' ec2-3-83-65-190.compute-1.amazonaws.com '/bin/sh -c '"'"'rm -f -r /home/ubuntu/.ansible/tmp/ansible-tmp-1725270564.1284177-2850-24144868290821/ > /dev/null 2>&1 && sleep 0'"'"''
<ec2-3-83-65-190.compute-1.amazonaws.com> (0, b'', b'')
fatal: [ec2-3-83-65-190.compute-1.amazonaws.com]: UNREACHABLE! => {
"changed": false,
"msg": "Failed to connect to the host via ssh: Shared connection to ec2-3-83-65-190.compute-1.amazonaws.com closed.",
"unreachable": true
}
```
## Update Report

### Progress

#### Allocator VM upload

I have developed the upload artifact logic with the following code:

```yaml
- name: Compress Allocator directory
  id: compress_allocator_files
  if: always() && steps.allocator_instance.outcome == 'success' && inputs.DESTROY == false
  run: |
    zip -P "${{ secrets.ZIP_ARTIFACTS_PASSWORD }}" -r $ALLOCATOR_PATH.zip $ALLOCATOR_PATH

- name: Upload Allocator directory as artifact
  if: always() && steps.compress_allocator_files.outcome == 'success' && inputs.DESTROY == false
  uses: actions/upload-artifact@v4
  with:
    name: allocator-instance
    path: ${{ env.ALLOCATOR_PATH }}.zip
```

The new development was tested:
#### SSH connection problem

The SSH connection problem was solved by modifying the Ansible playbook. It seems that, if the playbook takes a considerable time to execute, Ansible closes the connection. This was solved with the following code:

```yaml
- name: Install assistant installer
  command: "bash {{ script_name }} -a -v"
  args:
    chdir: "{{ script_path }}"
  register: install_results
  async: 500
  poll: 5
```

With `async: 500` the task may run in the background for up to 500 seconds while Ansible polls its status every 5 seconds, so the connection is no longer held open for the whole installation. The new development was tested in the OSs where the playbook was failing:
## Update Report

The current state of the migration of the
```
TASK [Gather facts] ************************************************************
fatal: [ec2-18-204-213-215.compute-1.amazonaws.com]: FAILED! => {"ansible_facts": {"discovered_interpreter_python": "/usr/bin/python3"}, "changed": false, "msg": "ansible-core requires a minimum of Python2 version 2.7 or Python3 version 3.6. Current version: 3.5.2 (default, Jul 10 2019, 11:58:48) [GCC 5.4.0 20160609]"}
TASK [Set up Python 3.9 repository] ********************************************
fatal: [ec2-44-212-63-189.compute-1.amazonaws.com]: FAILED! => {"changed": false, "msg": "failed to fetch PPA information, error was: Connection failure: The read operation timed out"}
```
## Update Report

### Ubuntu 20 problem ✔️

As Python 3.9 was only being installed on Ubuntu Jammy (22.04), but the repository to install Python 3.9 was being added on every Ubuntu distribution, I changed the conditional of the repository addition so that it is only added on Ubuntu Jammy. These tasks were grouped in a

(An illustrative sketch of this kind of distribution-conditional grouping is included at the end of this report.)

### Ubuntu 18 problem ✔️

Ubuntu 18 (Bionic) presented another problem related to pip installation:

```
"ERROR: This script does not work on Python 3.6. The minimum supported Python version is 3.8. Please use https://bootstrap.pypa.io/pip/3.6/get-pip.py instead.", "stdout_lines": ["ERROR: This script does not work on Python 3.6. The minimum supported Python version is 3.8. Please use https://bootstrap.pypa.io/pip/3.6/get-pip.py instead."
```

Then, I tried to change the link as the error suggested, and the result was another error:

```
root@ip-172-31-85-170:/home/ubuntu# curl https://bootstrap.pypa.io/pip/3.6/get-pip.py | python3 -
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 2108k 100 2108k 0 0 39.6M 0 --:--:-- --:--:-- --:--:-- 39.6M
Traceback (most recent call last):
File "<stdin>", line 27079, in <module>
File "<stdin>", line 137, in main
File "<stdin>", line 113, in bootstrap
File "<stdin>", line 94, in monkeypatch_for_cert
File "/tmp/tmphpdgbd10/pip.zip/pip/_internal/commands/__init__.py", line 9, in <module>
File "/tmp/tmphpdgbd10/pip.zip/pip/_internal/cli/base_command.py", line 13, in <module>
File "/tmp/tmphpdgbd10/pip.zip/pip/_internal/cli/cmdoptions.py", line 23, in <module>
File "/tmp/tmphpdgbd10/pip.zip/pip/_internal/cli/parser.py", line 12, in <module>
File "/tmp/tmphpdgbd10/pip.zip/pip/_internal/configuration.py", line 26, in <module>
File "/tmp/tmphpdgbd10/pip.zip/pip/_internal/utils/logging.py", line 13, in <module>
File "/tmp/tmphpdgbd10/pip.zip/pip/_internal/utils/misc.py", line 40, in <module>
File "/tmp/tmphpdgbd10/pip.zip/pip/_internal/locations/__init__.py", line 14, in <module>
File "/tmp/tmphpdgbd10/pip.zip/pip/_internal/locations/_distutils.py", line 9, in <module>
ModuleNotFoundError: No module named 'distutils.cmd'
```

Installing the

### Ubuntu 16 problem ✔️

In Ubuntu 16, Python 3.6 is needed. Before, we could use the
Also, it was needed to create a link to
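As referenced above, here is a minimal sketch of how this kind of distribution-conditional Python provisioning could be grouped. The PPA name, package names, and task grouping are assumptions for illustration, not the exact tasks in the workflow:

```yaml
# Hypothetical sketch: add the Python 3.9 repository and package only on Ubuntu Jammy,
# leaving other releases on their distribution defaults.
- name: Set up Python 3.9 on Ubuntu Jammy
  block:
    - name: Add Python 3.9 repository
      apt_repository:
        repo: ppa:deadsnakes/ppa    # assumed PPA, for illustration only
    - name: Install Python 3.9
      apt:
        name: python3.9
        state: present
        update_cache: true
  when:
    - ansible_distribution == "Ubuntu"
    - ansible_distribution_release == "jammy"
```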
## Update Report - Tier assistant test

I started to develop the tier of the assistant test. This workflow will launch several runs of the assistant test workflow: #46

Some things to highlight:
In a first test, the workflow was executed correctly, parsing the operating systems and launching different instances of the assistant workflow: https://github.com/wazuh/wazuh-installation-assistant/actions/runs/10696903344/job/29653020167

This execution launched the following runs:
> [!NOTE]
> The assistant test workflows have failed because they are using a branch based on 4.10.0, and there are no 4.10.0 packages to install yet.

Now working on making the tier wait for its child runs.
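For context, one possible shape of such a tier step, dispatching the assistant workflow once per parsed operating system via the GitHub CLI. The workflow file name, input names, and the JSON parsing are assumptions, not the actual implementation:

```yaml
# Hypothetical sketch: launch one assistant-test run per operating system.
- name: Launch assistant test workflows
  env:
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  run: |
    for system in $(echo '${{ inputs.SYSTEMS }}' | jq -r '.[]'); do
      gh workflow run test-assistant.yml \
        --ref ${{ github.ref_name }} \
        -f SYSTEM="$system"
    done
```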
After talking with the team leaders, we are going to change the approach. We want to test whether it is possible to unify the
## Update Report

The

```yaml
jobs:
  run-test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false # If a job fails, the rest of the jobs will not be cancelled
      matrix:
        system: ${{ fromJson(inputs.SYSTEMS) }}
```

Then, the steps given below will be executed in different jobs, where each job will execute the test with one of the selected OSs.
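For reference, `fromJson` expects the input to be a JSON array encoded as a string. A sketch of how the `SYSTEMS` input might be declared (the description and default value are assumptions):

```yaml
# Hypothetical sketch of the input consumed by fromJson(inputs.SYSTEMS).
on:
  workflow_dispatch:
    inputs:
      SYSTEMS:
        description: 'Operating systems to test, as a JSON array'
        required: true
        default: '["ubuntu-22", "centos-8"]'
        type: string
```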
## Update Report - Distributed

Currently working on the distributed pipeline migration.
## Update Report - Distributed

### Jenkins pipeline explanation

We have decided to rework the distributed pipeline. The Jenkins pipeline performed the following installation:
This is a strange scenario: the distributed installation is being checked, but we do not think it is a realistic scenario to test. Instead, we want to rework this test, taking into account that this would be done in 5.0.0. We are also doing this to avoid the development effort and time it would take to migrate the old Jenkins pipeline exactly, when it does not seem very practical.

### Possible workaround

The proposed solution is to rework this pipeline. If we want to test the distributed installation using the assistant, we will perform the steps given in the Wazuh indexer, Wazuh server, and Wazuh dashboard documentation for this kind of installation. The infrastructure would be the following:
> [!IMPORTANT]
> This scenario does not test the connectivity between instances of different OSs. We should review whether we also want to test that scenario.

Thus, the workflow will consist of the following tasks:
This scenario will not be tested for now.
## Update Report - Distributed

The tasks I have performed are the following:
### Allocating instances ✔️

Regarding instance allocation and destruction, these tasks have been parallelized to provision and destroy the machines simultaneously. This makes the workflow spend less time on these tasks.

```bash
# Provision instance in parallel
(
python3 wazuh-automation/deployability/modules/allocation/main.py \
--action create --provider aws --size large \
--composite-name ${{ env.COMPOSITE_NAME }} \
--working-dir $ALLOCATOR_PATH --track-output $ALLOCATOR_PATH/track_${instance_name}.yml \
--inventory-output $ALLOCATOR_PATH/inventory_${instance_name}.yml \
--instance-name gha_${{ github.run_id }}_${{ env.TEST_NAME }}_${instance_name} --label-team devops --label-termination-date 1d
...
) &
done
# Wait for all provisioning tasks to complete
wait
```

> [!NOTE]
> This strategy can be used in other GHAs where we need to deploy several instances using the allocator.

### Generating the certificates ✔️

Regarding the certificates generation, this was done using a Jinja2 template, similar to the old Jenkins pipeline, due to the complexity of building the

The Jinja2 template:

```jinja
nodes:
  # Wazuh indexer nodes
  indexer:
{% for indexer in groups['indexers'] %}
    - name: {{ hostvars[indexer]['inventory_hostname'] }}
      ip: "{{ hostvars[indexer]['private_ip'] }}"
{% endfor %}
  server:
{% for manager in groups['managers'] %}
    - name: {{ hostvars[manager]['inventory_hostname'] }}
      ip: "{{ hostvars[manager]['private_ip'] }}"
      node_type: "{{ hostvars[manager]['manager_type'] }}"
{% endfor %}
  dashboard:
{% for dashboard in groups['dashboards'] %}
    - name: {{ hostvars[dashboard]['inventory_hostname'] }}
      ip: "{{ hostvars[dashboard]['private_ip'] }}"
{% endfor %}
```
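For context, a minimal sketch of how a template like this might be rendered before running the certificates tool. The template file name and destination path are assumptions, not the actual workflow code:

```yaml
# Hypothetical sketch: render the Jinja2 template into the config.yml
# consumed by the certificates generation step.
- name: Render config.yml from the inventory
  template:
    src: config.yml.j2      # hypothetical template file name
    dest: /tmp/config.yml
    mode: "0644"
```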
### Wazuh server installation ⛏️

Now working on the Wazuh server installation on the several nodes. I found the following problem:
It seems that the Wazuh manager worker nodes are failing the manager check. At first, I thought this problem was related to the simultaneous installation of the Wazuh manager, which would lead a worker to connect to a not-yet-existing Wazuh manager master node installation. After debugging this behavior, I confirmed that this was not the problem.

> [!IMPORTANT]
> To check this, I installed the Wazuh manager master node in one playbook and, in another playbook (executed long after), the Wazuh manager worker nodes, and the result was the same. Evidence: https://github.com/wazuh/wazuh-installation-assistant/actions/runs/10794298716/job/29938263624

I have asked @wazuh/devel-pyserver and @wazuh/devel-cppserver about the Wazuh API check, that is, whether it is necessary to perform an API call to every Wazuh manager node or only to the Wazuh manager master node.
## Update Report - Distributed

### Wazuh server installation ⛏️

There was a problem in this stage, reported in #51. The related PR aims to fix the manager check service in the distributed deployment. With the PR, the Wazuh server installation could be done simultaneously on every node (master and workers). However, we want to replicate the most common scenario, which is to install the Wazuh manager nodes sequentially, as specified in the documentation.
Now working on a new logic: the Wazuh worker nodes should wait for the Wazuh master node to be installed before starting their own installation.
## Update Report - Distributed

### Wazuh server installation ✔️

I finally developed a logic in which the Wazuh manager worker nodes wait for the Wazuh manager master node to be installed:

```yaml
- name: Install Wazuh server on master
  block:
    - name: Install Wazuh server (Master)
      command: "bash {{ tmp_path }}/wazuh-install.sh -ws {{ inventory_hostname }} -v"
      register: wazuh
    - name: Save Wazuh installation log (Master)
      blockinfile:
        marker: ""
        path: "{{ test_dir }}/{{ test_name }}_{{ inventory_hostname }}.log"
        block: |
          {{ wazuh.stderr }}
          --------------------------------
          {{ wazuh.stdout }}
  when: hostvars[inventory_hostname].manager_type == 'master'

- name: Install Wazuh server on worker nodes
  block:
    - name: Wait for Wazuh master to be ready on port {{ check_port }}
      wait_for:
        host: "{{ master_ip }}"
        port: "{{ check_port }}"
        delay: "{{ delay }}"
        timeout: 300
      when: hostvars[inventory_hostname].manager_type == 'worker'
      async: 500
      poll: 5
    - name: Install Wazuh server (Workers)
      command: "bash {{ tmp_path }}/wazuh-install.sh -ws {{ inventory_hostname }} -v"
      register: wazuh
```

Here, the workers poll the master node's API (port 55000), and when the port is open and ready, the worker nodes start their installation. Successful GHA here: https://github.com/wazuh/wazuh-installation-assistant/actions/runs/10833326420/job/30059832055

### Wazuh dashboard installation ✔️

The Wazuh dashboard installation follows the same logic as the indexer installation. Successful GHA here: https://github.com/wazuh/wazuh-installation-assistant/actions/runs/10833846856/job/30061601201

### Python tests execution ✔️

The

### Extra
## Update Report - Distributed

Random
## Description

Because of the Wazuh packages redesign tier 2 objective, we need to migrate the Wazuh installation assistant pipelines (`Test_unattended_distributed`, `Test_unattended_distributed_cases`, `Test_unattended` and `Test_unattended_tier`) to GHA.

## Tasks

- `Test_unattended_distributed`
- `Test_unattended_distributed_cases`
- `Test_unattended`
- `Test_unattended_tier`
- Rename `unattended_installer` to `installation_assistant` if applies

## Related