Skip to content
This repository has been archived by the owner on Jan 6, 2023. It is now read-only.

Fix multi-host provisioning #110

Merged
merged 12 commits into from
Nov 4, 2021
Merged

Conversation

apatard
Copy link
Member

@apatard apatard commented May 25, 2021

This patchset is fixing multiplatform support. Until now, when multiple instances are declared, the create.yml playbook is looping over the instance list and calls the vagrant module. This leads to one Vagrantfile per instance which are overwritten at each step of the loop.
This make debugging harder and - in some cases - may confuse vagrant.
This patchset is allowing to directly give molecule.yml instances list to the vagrant module and then the module is producing a proper Vagrantfile.

@apatard
Copy link
Member Author

apatard commented May 25, 2021

recheck

@sio
Copy link
Contributor

sio commented May 25, 2021

Vagrant Cloud is broken at the moment, these CI failures are not because of your changes: hashicorp/vagrant#12390

@apatard
Copy link
Member Author

apatard commented May 25, 2021

@sio oh... Thanks for the info. I will refrain me from sending new PR or recheck then.

@sio
Copy link
Contributor

sio commented May 25, 2021

I didn't mean to dissuade you from submitting new PRs, I was just letting you know that CI will not be functional for some time due to upstream service provider outage :)

It took me quite some time this morning to understand that the problem is not on my end and I wanted to save you the same trouble.

@apatard
Copy link
Member Author

apatard commented May 25, 2021

recheck

@apatard
Copy link
Member Author

apatard commented May 26, 2021

hm. weird. Probably not related to this PR but ...

2021-05-26T07:59:49.8041450Z TASK [Bootstrap python for Ansible] ********************************************
2021-05-26T07:59:49.8151860Z Wednesday 26 May 2021  07:59:49 +0000 (0:00:00.065)       0:00:00.065 *********
2021-05-26T08:00:06.3782880Z �[32mok: [instance]�[0m

So the prepare.yml did install python but

2021-05-26T08:00:08.6030260Z TASK [sample task] *************************************************************
2021-05-26T08:00:08.6045050Z Wednesday 26 May 2021  08:00:08 +0000 (0:00:00.039)       0:00:00.039 *********
2021-05-26T08:00:09.4077040Z �[1;35m[WARNING]: No python interpreters found for host instance (tried�[0m
2021-05-26T08:00:09.4098840Z �[1;35m['/usr/bin/python', 'python3.9', 'python3.8', 'python3.7', 'python3.6',�[0m
2021-05-26T08:00:09.4157470Z �[1;35m'python3.5', 'python2.7', 'python2.6', '/usr/libexec/platform-python',�[0m
2021-05-26T08:00:09.4160370Z �[1;35m'/usr/bin/python3', 'python'])�[0m
2021-05-26T08:00:09.4163550Z �[31mfatal: [instance]: FAILED! => {"ansible_facts": {"discovered_interpreter_python": "/usr/bin/python"}, "changed": false, "module_stderr": "Shared connection to 127.0.0.1 closed.\r\n", "module_stdout": "/bin/sh: /usr/bin/python: not found\r\n", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error", "rc": 127}�[0m
2021-05-26T08:00:09.4165610Z

Given that the scenario is the default configuration, aka:

---
dependency:
  name: galaxy
driver:
  name: vagrant
  provider:
    name: libvirt
platforms:
  - name: instance
provisioner:
  name: ansible

I'm not convinced that setting the box to use the test one is a good idea....

@apatard apatard marked this pull request as ready for review May 29, 2021 08:42
@ssbarnea ssbarnea added the bug Something isn't working label Jul 23, 2021
@ssbarnea ssbarnea changed the title Proper Multiplatform support Fix multi-host provisioning Jul 23, 2021
@apatard apatard requested a review from ssbarnea as a code owner July 23, 2021 14:13
Copy link
Contributor

@yajo yajo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been testing this PR in my local dev workflow.

Without it, I was almost about to dump molecule-vagrant. It became too complicated.

With it, it works like a charm. VMs spin up faster. If I need to debug anything or connect to one of them, I just have to cd to the scenario ephemeral directory and run vagrant ssh server0.

Non-deep code review looks good also.

Thanks! This is a must have. 😃

@apatard
Copy link
Member Author

apatard commented Aug 30, 2021

With it, it works like a charm. VMs spin up faster. If I need to debug anything or connect to one of them, I just have to cd to the scenario ephemeral directory and run vagrant ssh server0.

hm. molecule login is not working for you ?

@yajo
Copy link
Contributor

yajo commented Aug 31, 2021

hm. molecule login is not working for you ?

I didn't know it existed 😅

It's working great! (I mean: without the PR)

@yajo
Copy link
Contributor

yajo commented Aug 31, 2021

In any case, I still think this different approach is a good enhancement. It is more similar to what one would do if not using molecule, and is faster.

One problem I noticed is that, when using this PR, machines were created without a network interface.

When using a4bd8b1, they got created with networking.

My platforms definition is:

platforms:
  - &server
    box: generic/ubuntu2004
    groups:
      - k8s_server
      - k8s_node
    name: server0
  - <<: *server
    name: server1
  - <<: *server
    name: server2

@apatard
Copy link
Member Author

apatard commented Aug 31, 2021

@yajo Networking should work. Can you share your molecule.yml so that I can try to reproduce and fix over the week ? It would be bad to see this PR merged if there's a know bug...

@yajo
Copy link
Contributor

yajo commented Aug 31, 2021

Here it is:

dependency:
  # name: galaxy
  # options:
  #   role-file: requirements.yaml
  #   requirements-file: requirements.yaml
  name: shell
  command: ansible-galaxy install -r requirements.yaml
driver:
  name: vagrant
  provider:
    name: libvirt
platforms:
  - &server
    box: generic/ubuntu2004
    groups:
      - k8s_server
      - k8s_node
    name: server0
  - <<: *server
    name: server1
  - <<: *server
    name: server2
provisioner:
  name: ansible
  connection_options:
    ansible_become: true
  config_options:
    defaults:
      vault_password_file: ${MOLECULE_PROJECT_DIRECTORY}/.vault_password.txt
  inventory:
    links:
      group_vars: ../../group_vars
verifier:
  name: ansible

@apatard
Copy link
Member Author

apatard commented Sep 3, 2021

@yajo tried quickly your setup and I can use molecule login to connect so the network seems to work. Can you have a look at the vagrant .err and .out log files to see if there's a clue ?

@yajo
Copy link
Contributor

yajo commented Sep 7, 2021

It seems related to using libvirt.qemu_use_session = true which seems to have some problems on creating network interfaces. Not something specific to this PR.

@yajo
Copy link
Contributor

yajo commented Sep 28, 2021

Definitely the issues I'm experiencing are lower in the stack. Check vagrant-libvirt/vagrant-libvirt#1342 if you're interested. But nothing related to this PR.

Current code only handle 1 vagrant VM. Adding a 'instances'
parameter and a loop in the jinja template is mostly enough.
Unfortunately, some extra work has been needed:

New options
- An option called 'cachier' used to control vagrant-cachier
  has been added. From what I understant it's not possible to configure
  it at instance level, so the cachier option has been removed from
  instance definition.
- An option called 'default_box' has been added since it can't be
  specified when using instances list directly from molecule.yml.
- An option called 'parallel' has been added to allow setting VAGRANT_NO_PARALLEL
  environment variable, since people may want to disable parallel VM
  creation with multiple instances.

Moreover, some care has been added to avoid breaking current playbooks.
This allows people updating theirs and the ones from molecule-vagrant
will be updated in a later commit.

Signed-off-by: Arnaud Patard <apatard@hupstream.com>
Now that the vagrant module can take directly the list of instances
from molecule, update the playbooks accordingly.
This should solve a bunch of issues when the module is overwriting
the Vagrantfile when bringing up each VM. Moreover this should make
things easier to debug and more robust.

Since the provision option is not VM specific, the templates are now
expecting it in the molecule.yml driver: dict.

Signed-off-by: Arnaud Patard <apatard@hupstream.com>
…ule.yml: fix provision usage

the provision parameter should now be specified outside the instances
list, so update the molecule configuration for that.

Signed-off-by: Arnaud Patard <apatard@hupstream.com>
… options

The default scenario should really be minimal, so remove extra options.

Signed-off-by: Arnaud Patard <apatard@hupstream.com>
Since the vagrant module is supposed to remain compatible atm with old
create/destroy playbooks, add a test scenario for that.

Signed-off-by: Arnaud Patard <apatard@hupstream.com>
…enario

This scenario had some troubles:
- the prepare.yml scenario was useless
- no verify.yml
- the converge.yml tasks were not so useful since better testing
  is provided by verify.yml
- fix molecule.yml network definition to be provider-agnostic and
  not use .1 IP since it's usually the IP of the hypervisor with libvirt
- add provider options for libvirt to not use session, since we're not
  setting network configuration to allow working such configuration.
- reduce memory usage to 256 for each box, to avoid needing to much
  memory

Signed-off-by: Arnaud Patard <apatard@hupstream.com>
- add the default and default-compat scenarii
- add the multi-node scenario and ensure that the generated Vagrantfile
  has both instances in it.

Signed-off-by: Arnaud Patard <apatard@hupstream.com>
Update README.rst molecule.yml section according to the changes done
in the create.yml/destroy.yml playbooks.

Signed-off-by: Arnaud Patard <apatard@hupstream.com>
pre-commit-ci bot and others added 2 commits November 3, 2021 16:55
… network

Avoid using dhcp as there are high chances it'll use a network forbidden
by default vbox configuration in github actions.

Signed-off-by: Arnaud Patard <apatard@hupstream.com>
@ssbarnea ssbarnea merged commit 929c3ad into ansible-community:main Nov 4, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants