- create issues on terraform for issue #1
- create issues on terraform for issue #2
- create issues on terraform for issue #6
- refactoring: move the terraform plan step into a separate script; always source it in apply.sh, and source it in destroy.sh only if no plan file is present. Be aware of the development-mode features.
- investigate Azure OS disk size vs. filesystem size
- Azure disk encryption
- don't destroy the DNS zone
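
One hedged option for this (not necessarily what was implemented later; resource and variable names are illustrative) is a lifecycle guard on the zone, which makes any destroying plan abort instead of deleting it:

```hcl
resource "azurerm_dns_zone" "cluster" {
  name                = var.cluster_fqdn
  resource_group_name = var.resource_group_name

  lifecycle {
    # any plan that would delete this zone (including terraform destroy) fails with an error
    prevent_destroy = true
  }
}
```
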
- make the cluster config path configurable

This list is kept for development documentation purposes only. All of these issues were solved prior to the first open-source release.

- resource group not ready when the second module is invoked -> error -> solved by outputting resource_group_name and referencing the output variable in the other modules
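
A minimal sketch of that fix, assuming the resource group is created in the essentials module (paths and resource names are illustrative). Referencing the module output creates an implicit dependency, so dependent modules wait until the resource group actually exists:

```hcl
# modules/essentials/outputs.tf -- export the name of the resource group created here
output "resource_group_name" {
  value = azurerm_resource_group.cluster.name
}

# root module -- the reference to module.essentials.resource_group_name orders the modules
module "vnet" {
  source              = "./modules/vnet"
  resource_group_name = module.essentials.resource_group_name
}
```
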
- create security rules for application security groups rather than CIDRs
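
Rough sketch of what an ASG-based rule looks like; ASG/NSG names and the priority are assumptions, not taken from the repo:

```hcl
resource "azurerm_application_security_group" "bastion" {
  name                = "bastion-asg"
  location            = var.location
  resource_group_name = var.resource_group_name
}

resource "azurerm_application_security_group" "masters" {
  name                = "masters-asg"
  location            = var.location
  resource_group_name = var.resource_group_name
}

# allow SSH from the bastion ASG to the masters ASG instead of whitelisting CIDR ranges
resource "azurerm_network_security_rule" "ssh_bastion_to_masters" {
  name                                       = "allow-ssh-bastion-to-masters"
  priority                                   = 200
  direction                                  = "Inbound"
  access                                     = "Allow"
  protocol                                   = "Tcp"
  source_port_range                          = "*"
  destination_port_range                     = "22"
  source_application_security_group_ids      = [azurerm_application_security_group.bastion.id]
  destination_application_security_group_ids = [azurerm_application_security_group.masters.id]
  resource_group_name                        = var.resource_group_name
  network_security_group_name                = var.network_security_group_name
}
```
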
- split the .tf files per module into type groups
- split vars into mandatory and optional (with default)
- make everything configurable (CIDRs etc.)
- var descriptions
- refactor resource var names
- create nodes
- split compute and infra nodes
- make plural/singular naming consistent for node(s), master(s)
- count examples: https://github.com/coreos/tectonic-installer/blob/1.9.6-tectonic.3/modules/azure/vnet/nic-master.tf https://github.com/coreos/tectonic-installer/blob/1.9.6-tectonic.3/modules/azure/vnet/outputs.tf https://github.com/Azure/terraform-azurerm-network-security-group/blob/master/main.tf
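
In the same spirit as the linked examples, a minimal count pattern for per-master NICs; variable and resource names are illustrative:

```hcl
# one NIC per master, named by index
resource "azurerm_network_interface" "master" {
  count               = var.master_count
  name                = "master-nic-${count.index}"
  location            = var.location
  resource_group_name = var.resource_group_name

  ip_configuration {
    name                          = "internal"
    subnet_id                     = var.subnet_id
    private_ip_address_allocation = "Dynamic"
  }
}

# expose all generated IDs as a list, as in the linked outputs.tf
output "master_nic_ids" {
  value = azurerm_network_interface.master[*].id
}
```
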
- DNS
- multiple bastions + refactor vars to plural
- separate bastion resources from the module
- static IPs for Ansible? -> YES: https://www.terraform.io/docs/configuration/functions/cidrhost.html
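
A fragment of a NIC's ip_configuration using cidrhost() so Ansible can rely on fixed addresses; subnet_cidr and ip_offset are assumed variables:

```hcl
ip_configuration {
  name                          = "internal"
  subnet_id                     = var.subnet_id
  private_ip_address_allocation = "Static"
  # cidrhost("10.0.1.0/24", 10) == "10.0.1.10"
  private_ip_address            = cidrhost(var.subnet_cidr, var.ip_offset + count.index)
}
```
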
- ensure nic1 is bound to master1 etc.
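
Fragment illustrating the index-based pairing (other required VM arguments omitted): element(..., count.index) keeps VM i paired with NIC i, so master-0 always gets master-nic-0 and so on.

```hcl
resource "azurerm_virtual_machine" "master" {
  count                 = var.master_count
  name                  = "master-${count.index}"
  network_interface_ids = [element(azurerm_network_interface.master[*].id, count.index)]
  # ... remaining required arguments omitted ...
}
```
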
- tags
- VM scale set -> use availability zones instead -> https://social.technet.microsoft.com/wiki/contents/articles/51828.azure-vms-availability-sets-and-availability-zones.aspx
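
Fragment showing how the masters could be spread across zones via count.index; the availability_zones variable is an assumption and the other VM arguments are omitted. element() wraps around when the index exceeds the list length.

```hcl
resource "azurerm_virtual_machine" "master" {
  count = var.master_count
  zones = [element(var.availability_zones, count.index)]
  # ... remaining required arguments omitted ...
}
```
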
- environment as tag
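
A sketch covering the two tag items above: one shared tag map (names and values illustrative) that includes the environment and is attached to every resource:

```hcl
locals {
  common_tags = {
    environment = var.environment
    cluster     = var.cluster_fqdn
  }
}

resource "azurerm_resource_group" "cluster" {
  name     = var.resource_group_name
  location = var.location
  tags     = local.common_tags
}
```
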
- count breaks re-runs at the moment -> upgraded to Terraform 0.12.4, working now
- refactor to HCL2 expressions
- availability zones per region? -> document which regions are supported; availability zone docs: https://social.technet.microsoft.com/wiki/contents/articles/51828.azure-vms-availability-sets-and-availability-zones.aspx https://blogs.msdn.microsoft.com/igorpag/2018/05/03/azure-availability-zones-quick-tour-and-guide/ https://msdnshared.blob.core.windows.net/media/2018/05/3.jpg https://docs.microsoft.com/en-us/azure/availability-zones/az-overview https://docs.microsoft.com/en-us/azure/load-balancer/load-balancer-standard-public-zone-redundant-cli https://www.c-sharpcorner.com/article/availability-set-fault-domains-and-update-domains-in-azure-virtual-machie/
- set all vars explicitly
- static IP config won't work when adding new nodes -> start masters at IP 10, compute nodes at IP 100, infra nodes at IP 200
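
Illustrative variable defaults for that offset scheme (names are assumptions, following the computenodes naming mentioned further down), leaving room to add nodes of each type without address collisions:

```hcl
variable "master_ip_offset" {
  default = 10
}

variable "computenode_ip_offset" {
  default = 100
}

variable "infranode_ip_offset" {
  default = 200
}
```
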
- apply and destroy scripts should call docker run
- bug: disk RG in caps? -> known issue
- don't remove the resource group and the vaults on destroy
- backup
- switch to 2 networks instead of 2 subnets -> not reasonable, as subnets are needed for the NICs anyway
- create load balancer security rules
- only add 1 NIC to the bastion? Reasons: SSH to the bastion is not working when the private NIC is the primary NIC; SSH to the masters is not working when only the private NIC's app sec group is allowed to SSH to the masters in the net rules
- NAT gateway for node outbound traffic -> cluster internet access not working; is it already handled automatically? https://docs.microsoft.com/de-de/azure/load-balancer/load-balancer-outbound-connections#lb -> yes, it works automatically as long as disable_outbound_snat is NOT set to true on the load balancer rules
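
Fragment of a load balancer rule reflecting that finding; resource names are illustrative, and disable_outbound_snat = false is the provider default anyway, it is only spelled out here for clarity:

```hcl
resource "azurerm_lb_rule" "https" {
  name                           = "https"
  resource_group_name            = var.resource_group_name
  loadbalancer_id                = azurerm_lb.public.id
  protocol                       = "Tcp"
  frontend_port                  = 443
  backend_port                   = 443
  frontend_ip_configuration_name = "public"
  backend_address_pool_id        = azurerm_lb_backend_address_pool.masters.id
  # keep implicit outbound SNAT through the public LB working
  disable_outbound_snat          = false
}
```
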
- remove the default security rules from the security_group -> blocked all traffic with a cluster rule at priority 110 -> the lower the number, the higher the priority -> NSG not usable for internet routing, put a firewall in front https://stackoverflow.com/questions/41559854/azure-load-balancer-nsg-rules-remove-access-directly -> fixed, see comments in the net sec rules
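
Sketch of such a deny-all rule at priority 110 (names illustrative); since lower numbers take precedence, it overrides any rule with a higher number:

```hcl
resource "azurerm_network_security_rule" "deny_all_inbound" {
  name                        = "deny-all-inbound"
  priority                    = 110
  direction                   = "Inbound"
  access                      = "Deny"
  protocol                    = "*"
  source_port_range           = "*"
  destination_port_range      = "*"
  source_address_prefix       = "*"
  destination_address_prefix  = "*"
  resource_group_name         = var.resource_group_name
  network_security_group_name = var.network_security_group_name
}
```
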
- local var clusterFQDN in the main playbook + pass it everywhere
- script the Azure login, subscription ID, etc. via ENV configuration -> use an Azure answer file and redirect (>) into it in the container tmpfs (see the provider sketch below)
  - try to get values from ENV
  - get values from stdin
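
Provider sketch for the ENV approach: the azurerm provider picks up ARM_SUBSCRIPTION_ID, ARM_CLIENT_ID, ARM_CLIENT_SECRET and ARM_TENANT_ID from the environment, so the provider block can stay empty and no credentials end up in code:

```hcl
provider "azurerm" {
  # intentionally empty: all credentials come from the ARM_* environment variables
}
```
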
- refactor comments from # to //
- PER MODULE: fix todos, sort vars and update them to the patterns from the conventions above; also fix compute_nodes to computenodes etc.
  - vnet
  - loadbalancer
  - dns -> removed
  - masters
  - nodes
  - bastions
  - essentials
  - backup
- state file in the cluster directory
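
One hedged way to do that is pointing the local backend at the cluster directory; the path is illustrative, and since backend blocks cannot use variables it would have to be templated or passed via terraform init -backend-config:

```hcl
terraform {
  backend "local" {
    # cluster config folder is named after the clusterFQDN (see the documentation notes below)
    path = "clusters/example.cluster.fqdn/terraform.tfstate"
  }
}
```
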
- test whether an existing, manually created backup in the vault is kept when running a destroy (as the vault is not deleted anymore) -> works, the backup is kept with the vault
- move the known issues to GitHub issues
- Terraform issue for: KNOWNISSUE - setting tags here results in Terraform always detecting changes, as these tags are not really created by Terraform and can neither be added to backup targets nor to backup policies in the Azure portal
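
Until that is resolved upstream, one workaround is to ignore tag drift on the affected backup resource; shown here on the protected-VM resource as an illustrative fragment:

```hcl
resource "azurerm_recovery_services_protected_vm" "master" {
  # ... backup target arguments omitted ...

  lifecycle {
    # tags are managed outside Terraform, so don't report them as changes on every plan
    ignore_changes = [tags]
  }
}
```
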
- rethink the module separation -> dns module removed, backup migrated to a separate module
- network architecture diagram
- Tectonic comparison
- only remove the resource group and the vaults from the state in destroy.sh when the env var development_mode is not set -> document that this will fail if a backup is present -> don't prompt for "yes" on terraform destroy when in dev mode -> document this
- output that the master disk was not deleted if the corresponding var was set - get this value from terraform output
- remove bash stdout where reasonable (e.g. replace raw docker pull output with something like "updating FormKube")
- test whether backup actually works when NOT in dev mode (check the backup items after apply)
- generate SSH keys automatically via Azure and place them in the Azure key store -> #7
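
A hedged sketch for #7 using the tls provider to generate the key pair at apply time (rather than an Azure-side generator) and storing the private key in the cluster's key vault; resource and secret names are illustrative:

```hcl
resource "tls_private_key" "node_ssh" {
  algorithm = "RSA"
  rsa_bits  = 4096
}

resource "azurerm_key_vault_secret" "node_ssh_private_key" {
  name         = "node-ssh-private-key"
  value        = tls_private_key.node_ssh.private_key_pem
  key_vault_id = azurerm_key_vault.cluster.id
}
```
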
- SSH access not working when no admin password is set AND password authentication is disabled -> admin_password has to be set to ""
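
Fragment of the VM's os_profile blocks illustrating that fix: password authentication stays disabled, but admin_password is still set explicitly to an empty string (admin_username and the key reference, reusing the generated key from the sketch above, are illustrative):

```hcl
os_profile {
  computer_name  = "master-${count.index}"
  admin_username = var.admin_username
  # must be set explicitly to an empty string, otherwise SSH access breaks
  admin_password = ""
}

os_profile_linux_config {
  disable_password_authentication = true

  ssh_keys {
    path     = "/home/${var.admin_username}/.ssh/authorized_keys"
    key_data = tls_private_key.node_ssh.public_key_openssh
  }
}
```
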
- search for todos in the code
- DEV + PRD example inventories INCLUDING kubespray + openshift-ansible okd inventory
  - dev config
  - dev okd
  - dev kubespray
  - prd config
  - prd okd
  - prd kubespray
  - docs + howto example
- document
  - destroy excluding the resource group and the vaults
  - the cluster config folder MUST be named like the clusterFQDN
  - SSH keys are placed inside the cluster config after the initial bootstrap
  - supported regions