
PARKED - switch from add_host to dynamic inventory #804

Closed
wants to merge 18 commits

Conversation

@IPvSean (Contributor) commented Apr 28, 2020

SUMMARY

This PR switches from the add_host method to the aws_ec2 dynamic inventory plugin.

PROBLEM STATEMENT WITH ADD_HOST
  • add_host required a separate loop for each instance type. With 50 students, node1 alone required 50 loop iterations; with four node types (ansible, node1, node2, node3) across 50 students, that meant 200 iterations. It also required a fact-gathering call (e.g. ec2_instance_info) once per instance type, four times in this case (see example below).
  • the add_host tasks were also long: 13+ lines per host, which added up to a 57+ line add_host section per workshop type, or roughly 360 lines of Ansible playbook dealing solely with inventory creation. The dynamic inventory approach shrinks six files of 360+ lines and multiple tasks into a single 44-line file (and without the security workshop it would be closer to 30 lines); a minimal sketch follows this list.
  • the templating with add_host was horrendous. We could not use Ansible's natural parallelism to template files onto each device until inventory was sorted out, so we had to loop through the facts for each instance type and map them to the correct student using tags; only the subsequent roles were clean again (see example below).
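For illustration, a minimal sketch of what the single aws_ec2 inventory file could look like (the filters and tag names here are assumptions, not the exact file from this PR):

# workshop_inventory/aws_ec2.yml -- hypothetical sketch
plugin: aws_ec2
regions:
  - us-east-1
filters:
  instance-state-name: running
  # assumed tag applied by the provisioner to every workshop instance
  "tag:Workshop": my-workshop
keyed_groups:
  # one group per distinct Student tag value: student1, student2, ...
  - key: tags.Student
    prefix: ''
    separator: ''
hostnames:
  - tag:Name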

There may be an issue with this method, as outlined in ansible/ansible#69227.

I am going to work on a workaround.

ISSUE TYPE
  • Feature Pull Request
COMPONENT NAME
  • provisioner
ADDITIONAL INFORMATION

IPvSean added 8 commits April 28, 2020 19:31
My favorite person ever @liquidat
windows, network, rhel_90 and rhel are all working with dynamic inventory now
make sure workshop_inventory directory is present, failing zuul tests
@cigamit (Contributor) commented Apr 29, 2020

This will break workshops when VMs fail to tag because of AWS rate limiting (with add_host they still get set up properly).

This will also break the ability to provision multiple workshops at once: the second workshop will overwrite "{{ playbook_dir }}/workshop_inventory/aws_ec2.yml", which stops you from being able to tear down the first workshop, since teardown relies on that inventory file.
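For illustration, a task shaped like the following (the src and file layout are assumptions, not taken from this PR) writes every workshop's inventory to the same fixed path, so a second provision run clobbers the first workshop's file:

- name: template out aws_ec2 dynamic inventory config
  template:
    src: aws_ec2.yml.j2
    # fixed destination: a second workshop overwrites the first one's inventory
    dest: "{{ playbook_dir }}/workshop_inventory/aws_ec2.yml"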

@IPvSean (Contributor, Author) commented Apr 29, 2020

> This will also break the ability to provision multiple workshops at once: the second workshop will overwrite "{{ playbook_dir }}/workshop_inventory/aws_ec2.yml", which stops you from being able to tear down the first workshop, since teardown relies on that inventory file.

This makes sense to me; I have been thinking about how to solve this with Sloane.

> This will break workshops when VMs fail to tag because of AWS rate limiting (with add_host they still get set up properly).

The instances get tagged with ec2_tag, not through inventory. I am confused here.

@IPvSean (Contributor, Author) commented Apr 29, 2020

add_host method:
ansible-playbook provision_lab.yml -e @sean.yml  275.34s user 62.35s system 16% cpu 33:15.19 total

dynamic inventory method:
ansible-playbook provision_lab.yml -e @sean.yml  226.91s user 57.68s system 15% cpu 30:10.12 total

Time savings with dynamic inventory, from initial testing with 2 students, is 3 minutes and 5 seconds.

making provisioner more stable
Comment on lines +30 to +32
{% for n in range(1, student_total + 1) %}
student{{ n }}: "'student{{ n }}' == tags.Student"
{% endfor %}


Suggested change:

keyed_groups:
  - key: tags.Student
    prefix: ''
    separator: ''
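With keyed_groups, the plugin derives one group per distinct tags.Student value on its own, so the config no longer needs to be templated against student_total. Assuming two students, ansible-inventory --graph against such a config would show groups roughly like this (host names are illustrative):

@all:
  |--@student1:
  |  |--example-student1-node1
  |--@student2:
  |  |--example-student2-node1
  |--@ungrouped: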

@cigamit (Contributor) commented Apr 29, 2020

> This will break workshops when VMs fail to tag because of AWS rate limiting (with add_host they still get set up properly).

> The instances get tagged with ec2_tag, not through inventory. I am confused here.

Currently, the ec2 tagging can fail because of rate limiting. It's a known problem that doesn't affect the running of the workshop (the add_host tasks ensure the hosts still end up in the running inventory). They really need to fix the module, as some of the others do, to detect rate limiting and retry.

With the move to a dynamic inventory, though, the playbook won't know about these hosts, since they no longer match the filter. Thus they don't get added to the running inventory and don't get set up. The only people who will notice are the students, when they go to use their lab environment and can't connect to the nodes. The playbook run itself will pass, because it sets up every host it "knows" about.
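One possible playbook-level mitigation (an illustrative sketch, not from this PR; created_instances and the tag values are assumptions) is to retry the tagging task itself when rate limiting makes it fail:

- name: tag instances, retrying on failure (e.g. AWS rate limiting)
  ec2_tag:
    region: "{{ ec2_region }}"
    resource: "{{ item.id }}"
    tags:
      Student: "student{{ idx + 1 }}"
  loop: "{{ created_instances.instances }}"
  loop_control:
    index_var: idx
  register: tag_result
  # retry each item until the tag call succeeds
  until: tag_result is succeeded
  retries: 5
  delay: 10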

@IPvSean (Contributor, Author) commented Apr 30, 2020

> Currently, the ec2 tagging can fail because of rate limiting. It's a known problem that doesn't affect the running of the workshop (the add_host tasks ensure the hosts still end up in the running inventory). They really need to fix the module, as some of the others do, to detect rate limiting and retry.

But we use tags to drive the add_host method:

---
- name: grab facts for node1 node
  ec2_instance_info:
    region: "{{ ec2_region }}"
    filters:
      instance-state-name: running
      "tag:Workshop_node1": "{{ec2_name_prefix}}-node1"
  register: node1_node_facts
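(For context, a minimal sketch of the add_host task those facts would feed, assuming the register name and fields above; this is illustrative, not the exact task from the repo:)

- name: add node1 instances to inventory
  add_host:
    name: "{{ item.tags.Name }}"
    ansible_host: "{{ item.public_ip_address }}"
    groups: node1
    # carry the Student tag along as a host var
    student: "{{ item.tags.Student }}"
  loop: "{{ node1_node_facts.instances }}"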

Are you saying count_tag works no matter what, while the ec2_tag module gets rate limited?

IPvSean added 3 commits April 30, 2020 08:18
also removing old templates
fix windows workshop, accidentally removed this template
@cigamit (Contributor) commented Apr 30, 2020

We do an ec2 call to create the instances (but we apply no tags at creation, not even the same tag we use for count_tag... we should change that!) and then we run those through a loop and apply the ec2_tag tasks.

The instances I see that are untagged have zero tags applied to them, so yes, they wouldn't even be picked up by our ec2_instance_info call; it seems they aren't being set up even today. I wish we could reproduce this reliably; then we could check the hosts and see exactly what is going on.

For reference, in Skylight we don't use the ec2_instance_info call at all. We already have all the information we need about the hosts from the registered ec2 variables, so why make more API calls to get the same data?
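A minimal sketch of that pattern (module options and names are assumptions, not Skylight's actual tasks): the ec2 module's registered result already carries the instance data, so hosts can be added straight from it with no info call:

- name: create node1 instances
  ec2:
    region: "{{ ec2_region }}"
    image: "{{ node_ami }}"
    instance_type: t2.medium
    count: "{{ student_total }}"
    wait: true
  register: node1_instances

- name: add hosts straight from the creation result
  add_host:
    name: "node1-student{{ idx + 1 }}"
    ansible_host: "{{ item.public_ip }}"
    groups: node1
  loop: "{{ node1_instances.instances }}"
  loop_control:
    index_var: idx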

@IPvSean (Contributor, Author) commented Apr 30, 2020

> For reference, in Skylight we don't use the ec2_instance_info call at all. We already have all the information we need about the hosts from the registered ec2 variables, so why make more API calls to get the same data?

Ah, this makes way more sense to me now. I need to noodle on this... we are on the same wavelength now.

@cigamit (Contributor) commented Apr 30, 2020

I will point out that you use the info call because you use that variable in places such as the inventory template, and you use the tags from it too, which aren't present in the registered ec2 variable, since you tag later.

@IPvSean (Contributor, Author) commented May 1, 2020

This PR is parked; it will wait until we at least have an import_inventory.

@IPvSean changed the title from "switch from add_host to dynamic inventory" to "PARKED - switch from add_host to dynamic inventory - DO NOT CLOSE" on May 1, 2020
@IPvSean closed this on May 2, 2020
@IPvSean changed the title from "PARKED - switch from add_host to dynamic inventory - DO NOT CLOSE" to "PARKED - switch from add_host to dynamic inventory" on May 2, 2020
@IPvSean deleted the gh-pages branch on May 2, 2020 13:11