
Switch to vendored terraform codebase rather than invoking separate binary #822

Merged

Conversation

abhinavdahiya
Contributor

@openshift-ci-robot openshift-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Dec 7, 2018
@abhinavdahiya abhinavdahiya changed the title Vendor terraform Switch to vendored terraform codebase rather than invoking separate binary Dec 7, 2018
Adds a Go subproject `pkg/terraform/exec` with its own vendor dir.

The `exec` package vendors terraform@v0.11.10 and a few other dependencies required to glue terraform commands into a library.
Adds `exec.go` that mimics the bare minimum functionality of `https://github.com/hashicorp/terraform/blob/v0.11.10/main.go`.

It provides `Apply`, `Destroy`, `Init` and `Version` functions corresponding to the `terraform` subcommands.
Switches from invoking a separate binary via os/exec to calling the `exec` package functions.
All output from terraform's stdout is directed to `logrus.Debug`, and output from its stderr is directed to `logrus.Error`.
@abhinavdahiya
Contributor Author

abhinavdahiya commented Dec 7, 2018

So an example run looks like:

Installer Output

[9:39:22]installer git:(terraform_vendor) ✗ TAGS=libvirt_destroy ./hack/build.sh && rm -rf dev && yes | ./scripts/maintenance/virsh-cleanup.sh && OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE= && ./bin/openshift-install --dir dev create cluster
+ minimum_go_version=1.10
++ go version
++ cut -d ' ' -f 3
+ current_go_version=go1.11.2
++ version 1.11.2
++ IFS=.
++ printf '%03d%03d%03d\n' 1 11 2
++ unset IFS
++ version 1.10
++ IFS=.
++ printf '%03d%03d%03d\n' 1 10
++ unset IFS
+ '[' 001011002 -lt 001010000 ']'
+ LAUNCH_PATH=/home/adahiya/go/src/github.com/openshift/installer
++ dirname ./hack/build.sh
+ cd ./hack/..
++ go list -e -f '{{.Dir}}' github.com/openshift/installer
+ PACKAGE_PATH=/home/adahiya/go/src/github.com/openshift/installer
+ test -z /home/adahiya/go/src/github.com/openshift/installer
+ LOCAL_PATH=/home/adahiya/go/src/github.com/openshift/installer
+ test /home/adahiya/go/src/github.com/openshift/installer '!=' /home/adahiya/go/src/github.com/openshift/installer
+ MODE=release
++ git describe --always --abbrev=40 --dirty
+ LDFLAGS=' -X main.version=v0.5.0-master-47-g2fccf1282236e63d44168d1881f7cbba5a4f4a2b-dirty'
+ TAGS=libvirt_destroy
+ OUTPUT=bin/openshift-install
+ export CGO_ENABLED=0
+ CGO_ENABLED=0
+ case "${MODE}" in
+ TAGS='libvirt_destroy release'
+ test '' '!=' y
+ go generate ./data
writing assets_vfsdata.go
+ echo 'libvirt_destroy release'
+ grep -q libvirt_destroy
+ export CGO_ENABLED=1
+ CGO_ENABLED=1
+ go build -ldflags ' -X main.version=v0.5.0-master-47-g2fccf1282236e63d44168d1881f7cbba5a4f4a2b-dirty' -tags 'libvirt_destroy release' -o bin/openshift-install ./cmd/openshift-install
Warning: This will destroy effectively all libvirt resources
? SSH Public Key /home/adahiya/.ssh/id_rsa_libvirt.pub
? Base Domain tt.testing
? Cluster Name adahiya-0
? Platform libvirt
? Libvirt Connection URI qemu+tcp://192.168.122.1/system
INFO Using Terraform to create cluster...
INFO Waiting 30m0s for the Kubernetes API...
INFO API v1.11.0+9d2874f up
INFO Waiting 30m0s for the bootstrap-complete event...
ERROR: logging before flag.Parse: E1207 09:57:33.404694   27949 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=3, ErrCode=NO_ERROR, debug=""
WARNING RetryWatcher - getting event failed! Re-creating the watcher. Last RV: 2091
INFO Destroying the bootstrap resources...
INFO Using Terraform to destroy bootstrap resources...
INFO kubeadmin user password: [REDACTED]
INFO Install complete! The kubeconfig is located here: /home/adahiya/go/src/github.com/openshift/installer/dev/auth/kubeconfig

.openshift_install.log
openshift_intall.log

Also, the binary size is almost the same as in #797 (comment):

$ ls -lah ./bin/openshift-install
-rwxrwxr-x. 1 adahiya adahiya 84M Dec  7 09:48 ./bin/openshift-install
$ size ./bin/openshift-install
   text    data     bss     dec     hex filename
64282294        1334634  287568 65904496        3ed9f70 ./bin/openshift-install

@abhinavdahiya
Contributor Author

/retest

@cdrage
Contributor

cdrage commented Dec 7, 2018

@abhinavdahiya This means we won't have to install the binary separately?

What about the libvirt / kvm Terraform plugin? Perhaps we can vendor that too.

@abhinavdahiya
Contributor Author

@abhinavdahiya This means we won't have to install the binary separately?

Yes, you don't need to download the terraform binary separately.

What about the libvirt / kvm Terraform plugin? Perhaps we can vendor that too.

libvirt is for dev use, so getting that plugin is not that bad. And as for AWS, terraform already downloads the correct plugin for you, so users don't have to do anything extra.

@abhinavdahiya
Contributor Author

/retest

1 similar comment
@abhinavdahiya
Contributor Author

/retest

@crawford
Contributor

crawford commented Dec 7, 2018

/lgtm

Waiting for the v0.6.0 tag...
/hold

@openshift-ci-robot openshift-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. labels Dec 7, 2018
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, crawford

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [abhinavdahiya,crawford]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@wking
Member

wking commented Dec 8, 2018

Are we landing this in 0.7.0 without doc updates, etc.? For example, we should be able to drop get-terraform.sh, but we will need to also update the Dockerfiles if we do that. We should be able to update at least the README (get-terraform.sh section) and overview.md ("invoking a locally-installed Terraform") in this PR without too much trouble.

@derekwaynecarr
Member

Can we omit the explicit reference of Terraform and just state the action performed and not the implementation tool? This has been requested from PM.

/cc @abhinavdahiya @crawford @eparis @smarterclayton

@openshift-ci-robot
Contributor

@derekwaynecarr: GitHub didn't allow me to request PR reviews from the following users: abhinavdahiya.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

Can we omit the explicit reference of Terraform and just state the action performed and not the implementation tool? This has been requested from PM.

/cc @abhinavdahiya @crawford @eparis @smarterclayton

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@crawford
Contributor

crawford commented Dec 8, 2018

@derekwaynecarr done in #840.

@crawford
Contributor

crawford commented Dec 8, 2018

/retest

@crawford
Contributor

@wking we should probably just pull this into 0.6.0.

/hold cancel
/retest

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 10, 2018
@abhinavdahiya
Contributor Author

https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_installer/822/pull-ci-openshift-installer-master-e2e-aws/2097

level=fatal msg="Error executing openshift-install: waiting for bootstrap-complete: timed out waiting for the condition"

The bootstrapping is not completing because the cluster-kube-apiserver-operator is reporting Failing, blocking creation of all other operators.
The CVO is reporting this error:

I1210 07:17:35.456979       1 operatorstatus.go:84] ClusterOperator /openshift-cluster-kube-apiserver-operator is reporting (v1.ClusterOperatorStatus) {
 Conditions: ([]v1.ClusterOperatorStatusCondition) (len=3 cap=4) {
  (v1.ClusterOperatorStatusCondition) {
   Type: (v1.ClusterStatusConditionType) (len=7) "Failing",
   Status: (v1.ConditionStatus) (len=4) "True",
   LastTransitionTime: (v1.Time) 2018-12-10 07:11:57 +0000 UTC,
   Reason: (string) (len=7) "Failing",
   Message: (string) (len=38) "Failing: etcdserver: request timed out"
  },
  (v1.ClusterOperatorStatusCondition) {
   Type: (v1.ClusterStatusConditionType) (len=9) "Available",
   Status: (v1.ConditionStatus) (len=4) "True",
   LastTransitionTime: (v1.Time) 2018-12-10 07:11:57 +0000 UTC,
   Reason: (string) "",
   Message: (string) (len=30) "3 of 3 nodes are at revision 1"
  },
  (v1.ClusterOperatorStatusCondition) {
   Type: (v1.ClusterStatusConditionType) (len=11) "Progressing",
   Status: (v1.ConditionStatus) (len=5) "False",
   LastTransitionTime: (v1.Time) 2018-12-10 07:11:57 +0000 UTC,
   Reason: (string) (len=24) "AllNodesAtLatestRevision",
   Message: (string) (len=30) "0 of 3 nodes are at revision 1"
  }
 },
 Version: (string) "",
 Extension: (runtime.RawExtension) &RawExtension{Raw:nil,}
}
I1210 07:17:35.459292       1 operatorstatus.go:109] ClusterOperator /openshift-cluster-kube-apiserver-operator is not done for version ; it is version=, available=true, progressing=false, failing=true

cluster-kube-apiserver-operator reporting error

sh-4.2$ oc -n openshift-cluster-kube-apiserver-operator get kubeapiserveroperatorconfigs.kubeapiserver.operator.openshift.io instance -oyaml
apiVersion: kubeapiserver.operator.openshift.io/v1alpha1
kind: KubeAPIServerOperatorConfig
metadata:
  creationTimestamp: 2018-12-10T06:50:11Z
  generation: 3
  name: instance
  resourceVersion: "4397"
  selfLink: /apis/kubeapiserver.operator.openshift.io/v1alpha1/kubeapiserveroperatorconfigs/instance
  uid: d6ff918e-fc47-11e8-b7a9-12e3447f3af0
spec:
  forceRedeploymentReason: ""
  managementState: Managed
  observedConfig:
    admissionPluginConfig:
      openshift.io/RestrictedEndpointsAdmission:
        configuration:
          restrictedCIDRs:
          - 10.128.0.0/14
          - 172.30.0.0/16
    servicesSubnet: 172.30.0.0/16
    storageConfig:
      urls:
      - https://ci-op-wbixcq14-1d3f3-etcd-0.origin-ci-int-aws.dev.rhcloud.com:2379
      - https://ci-op-wbixcq14-1d3f3-etcd-1.origin-ci-int-aws.dev.rhcloud.com:2379
      - https://ci-op-wbixcq14-1d3f3-etcd-2.origin-ci-int-aws.dev.rhcloud.com:2379
  operandSpecs: null
  unsupportedConfigOverrides:
    oauthConfig:
      masterCA: /etc/kubernetes/static-pod-resources/configmaps/client-ca/ca-bundle.crt
      masterPublicURL: https://ci-op-wbixcq14-1d3f3-api.origin-ci-int-aws.dev.rhcloud.com:6443
      masterURL: https://ci-op-wbixcq14-1d3f3-api.origin-ci-int-aws.dev.rhcloud.com:6443
status:
  conditions:
  - lastTransitionTime: 2018-12-10T06:50:11Z
    status: "False"
    type: ConfigObservationFailing
  - lastTransitionTime: 2018-12-10T06:50:12Z
    status: "False"
    type: InstallerControllerFailing
  - lastTransitionTime: null
    message: 3 of 3 nodes are at revision 1
    status: "True"
    type: Available
  - lastTransitionTime: null
    message: 0 of 3 nodes are at revision 1
    reason: AllNodesAtLatestRevision
    status: "False"
    type: Progressing
  - lastTransitionTime: null
    status: "False"
    type: RevisionControllerFailing
  - lastTransitionTime: 2018-12-10T06:50:26Z
    status: "False"
    type: TargetConfigReconcilerFailing
  - lastTransitionTime: 2018-12-10T06:52:44Z
    message: 'etcdserver: request timed out'
    reason: StatusUpdateError
    status: "True"
    type: Failing
  generations: null
  latestAvailableRevision: 1
  nodeStatuses:
  - currentRevision: 1
    lastFailedRevision: 0
    lastFailedRevisionErrors: null
    nodeName: ip-10-0-8-225.ec2.internal
    targetRevision: 0
  - currentRevision: 1
    lastFailedRevision: 0
    lastFailedRevisionErrors: null
    nodeName: ip-10-0-25-131.ec2.internal
    targetRevision: 0
  - currentRevision: 1
    lastFailedRevision: 0
    lastFailedRevisionErrors: null
    nodeName: ip-10-0-40-97.ec2.internal
    targetRevision: 0
  readyReplicas: 0
  version: ""

/retest

@wking
Member

wking commented Dec 10, 2018

The bootstrapping is not completing because the cluster-kube-apiserver-operator is reporting Failing, blocking creation of all other operators.

I was concerned that we'd still be running against old 4.0.0-0.alpha-2018-12-07-201539 update payloads, because 4.0.0-0.alpha-2018-12-10-052428 failed CI. But comparing the two payloads for a random constituent:

$ oc adm release info registry.svc.ci.openshift.org/openshift/origin-release:4.0.0-0.alpha-2018-12-10-052428 --changes-from=registry.svc.ci.openshift.org/openshift/origin-release:4.0.0-0.alpha-2018-12-07-201539 | grep hyperkube
  hyperkube                 sha256:4d0106d7428828c87ed905728742fbc11bd8b30d0c87165359699d0a475e2315 sha256:4ec2101aa8958c7d654a53180107e8840f8af253957ae9b10df90a2242e39380

And looking in the logs of your job 2097 for the new digest:

$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/822/pull-ci-openshift-installer-master-e2e-aws/2097/artifacts/e2e-aws/pods.json | grep sha256:4ec2101aa8958c7d654a53180107e8840f8af253957ae9b10df90a2242e39380
                                "value": "registry.svc.ci.openshift.org/ci-op-wbixcq14/stable@sha256:4ec2101aa8958c7d654a53180107e8840f8af253957ae9b10df90a2242e39380"
                                "value": "registry.svc.ci.openshift.org/ci-op-wbixcq14/stable@sha256:4ec2101aa8958c7d654a53180107e8840f8af253957ae9b10df90a2242e39380"
                        "image": "registry.svc.ci.openshift.org/ci-op-wbixcq14/stable@sha256:4ec2101aa8958c7d654a53180107e8840f8af253957ae9b10df90a2242e39380",
...

So it looks like we are indeed testing against the recent (and good, although it failed a CI test) 4.0.0-0.alpha-2018-12-10-052428.

@abhinavdahiya
Contributor Author

/retest

@smarterclayton
Contributor

PRs always test against the latest images from master of the other components.

@abhinavdahiya
Contributor Author

/retest

@openshift-merge-robot openshift-merge-robot merged commit 79f057c into openshift:master Dec 10, 2018
wking added a commit to wking/openshift-installer that referenced this pull request Dec 10, 2018
Through 79f057c (Merge pull request openshift#822 from
abhinavdahiya/terraform_vendor, 2018-12-10).
wking added a commit to wking/openshift-installer that referenced this pull request Dec 11, 2018
Through e810901 (Merge pull request openshift#817 from
staebler/add_openshiftClusterID, 2018-12-10, openshift#822).
abhinavdahiya added a commit to abhinavdahiya/installer that referenced this pull request Dec 13, 2018
The old implementation depended on terraform automatically picking up `*.auto.tfvars` in the current directory.
And since we ran terraform as a separate process, we changed its CWD to the `tempDir`.

With openshift#822 terraform is run as part of the installer, so the var and state files need to be given explicitly in the `tempDir`.