Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

installer/pkg/config/libvirt: Caching for libvirt pulls #280

Merged
merged 4 commits into from
Sep 23, 2018

Conversation

wking
Copy link
Member

@wking wking commented Sep 19, 2018

Checking our RHCOS source against ETag / If-None-Match and Last-Modified / If-Modified-Since, it seems to support In-None-Match well, but only supports If-Modified-Since for exact matches:

$ URL=http://aos-ostree.rhev-ci-vms.eng.rdu2.redhat.com/rhcos/images/cloud/latest/rhcos-qemu.qcow2.gz
$ curl -I "${URL}"
HTTP/1.1 200 OK
Server: nginx/1.8.0
Date: Wed, 19 Sep 2018 04:32:19 GMT
Content-Type: application/octet-stream
Content-Length: 684934062
Last-Modified: Tue, 18 Sep 2018 20:05:24 GMT
Connection: keep-alive
ETag: "5ba15a84-28d343ae"
Accept-Ranges: bytes
$ curl -sIH 'If-None-Match: "5ba15a84-28d343ae"' "${URL}" | head -n1
HTTP/1.1 304 Not Modified
$ curl -sIH 'If-Modified-Since: Tue, 18 Sep 2018 20:05:24 GMT' "${URL}" | head -n1
HTTP/1.1 304 Not Modified
$ curl -sIH 'If-Modified-Since: Tue, 18 Sep 2015 20:05:24 GMT' "${URL}" | head -n1
HTTP/1.1 200 OK
$ curl -sIH 'If-Modified-Since: Tue, 18 Sep 2018 20:05:25 GMT' "${URL}" | grep 'HTTP\|Last-Modified'
HTTP/1.1 200 OK
Last-Modified: Tue, 18 Sep 2018 20:05:24 GMT

That last entry should have 304ed, although the spec has:

When used for cache updates, a cache will typically use the value of the cached message's Last-Modified field to generate the field value of If-Modified-Since. This behavior is most interoperable for cases where clocks are poorly synchronized or when the server has chosen to only honor exact timestamp matches (due to a problem with Last-Modified dates that appear to go "back in time" when the origin server's clock is corrected or a representation is restored from an archived backup)...

So the server is violating the SHOULD by not 304ing dates greater than Last-Modified, but it's not violating a MUST-level requirement. The server requirements around If-None-Match are MUST-level, so using it should be more portable. The RFC also seems to prefer clients use If(-None)-Match.

I'm using gregjones/httpcache for the caching, since that implemenation seems reasonably popular and the repo's been around for a few years. That library uses the belt-and-suspenders approach of setting both If-None-Match (to the cached ETag) and If-Modified-Since (to the cached Last-Modified), so we should be fine.

UserCacheDir requires Go 1.11.

No unit tests, but you can excercise the logic with:

$ go1.11 build ./installer/cmd/tectonic
$ cat libvirt.yaml 
admin:
  email: a@b.c
  password: verysecure
  sshKey: "ssh-rsa AAAA..."
baseDomain: installer.testing
name: wking
platform: libvirt
libvirt:
  uri: qemu+tcp://192.168.122.1/system
  network:
    name: tectonic
    ifName: tt0
    dnsServer: 8.8.8.8
    ipRange: 192.168.124.0/24
  image: http://aos-ostree.rhev-ci-vms.eng.rdu2.redhat.com/rhcos/images/cloud/latest/rhcos-qemu.qcow2.gz
pullSecretPath: /home/trking/src/openshift/installer/pull-secret.json
master:
  nodePools:
    - master
worker:
  nodePools:
    - worker
nodePools:
  - count: 1
    name: master
  - count: 2
    name: worker
$ rm -rf wking && ./tectonic init --config=libvirt.yaml && cat wking/config.yaml  # slow....
...
  image: file:///tmp/openshift-install-092942808
...
$ rm -rf wking && ./tectonic init --config=libvirt.yaml && cat wking/config.yaml  # fast :)
...
  image: file:///tmp/openshift-install-567329896
...
$ sha1sum /tmp/openshift-install-*
d4554ab97c91f1d807fafab0c81b5492a98219a8  /tmp/openshift-install-092942808
d4554ab97c91f1d807fafab0c81b5492a98219a8  /tmp/openshift-install-567329896
$ tree -a ~/.cache/openshift-install/
/home/trking/.cache/openshift-install/
└── libvirt
    └── 914a1ca2be4f5adecaea8ceda514b02e

1 directory, 1 file

@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Sep 19, 2018
@wking
Copy link
Member Author

wking commented Sep 19, 2018

The errors are because we aren't using Go 1.11 yet, so we get complaints about os.UserCacheDir being undefined. Can we bump our Go (like we bumped to 1.10 in openshift/release#1132)? Should I create a dummy shim for earlier Gos? How dumb can my shim be?

@wking
Copy link
Member Author

wking commented Sep 19, 2018

I also haven't bothered vendoring httpcache until folks have time to look this over and review the general direction. I don't think our CI will green out without the vendor bump, but just in case, I'll drop a...

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 19, 2018
@wking wking force-pushed the cache-libvirt-image branch 2 times, most recently from 9f90bb1 to 04d2575 Compare September 19, 2018 16:38
@@ -29,6 +29,7 @@ func ParseConfig(data []byte) (*Cluster, error) {
return nil, err
}
cluster.PullSecret = string(data)
cluster.PullSecretPath = ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to be in here?

Copy link
Member Author

@wking wking Sep 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to be in here?

Yeah. We used to just copy the input config over. Now that I'm saving the updates config (because it might contain a new libvirt image URI), I need to clear PullSecretPath (after reading the content into PullSecret) to avoid hitting this when we reload the config for the install step.

@wking wking force-pushed the cache-libvirt-image branch from 04d2575 to 5f705cf Compare September 19, 2018 20:57
@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 19, 2018
@wking
Copy link
Member Author

wking commented Sep 19, 2018

Rebased around #273 with 04d2575 -> 5f705cf.

Checking our RHCOS source against ETag [1] / If-None-Match [2] and
Last-Modified [3] / If-Modified-Since [4], it seems to support
In-None-Match well, but only supports If-Modified-Since for exact
matches:

  $ URL=http://aos-ostree.rhev-ci-vms.eng.rdu2.redhat.com/rhcos/images/cloud/latest/rhcos-qemu.qcow2.gz
  $ curl -I "${URL}"
  HTTP/1.1 200 OK
  Server: nginx/1.8.0
  Date: Wed, 19 Sep 2018 04:32:19 GMT
  Content-Type: application/octet-stream
  Content-Length: 684934062
  Last-Modified: Tue, 18 Sep 2018 20:05:24 GMT
  Connection: keep-alive
  ETag: "5ba15a84-28d343ae"
  Accept-Ranges: bytes
  $ curl -sIH 'If-None-Match: "5ba15a84-28d343ae"' "${URL}" | head -n1
  HTTP/1.1 304 Not Modified
  $ curl -sIH 'If-Modified-Since: Tue, 18 Sep 2018 20:05:24 GMT' "${URL}" | head -n1
  HTTP/1.1 304 Not Modified
  $ curl -sIH 'If-Modified-Since: Tue, 18 Sep 2015 20:05:24 GMT' "${URL}" | head -n1
  HTTP/1.1 200 OK
  $ curl -sIH 'If-Modified-Since: Tue, 18 Sep 2018 20:05:25 GMT' "${URL}" | grep 'HTTP\|Last-Modified'
  HTTP/1.1 200 OK
  Last-Modified: Tue, 18 Sep 2018 20:05:24 GMT

That last entry should have 304ed, although the spec has [4]:

  When used for cache updates, a cache will typically use the value of
  the cached message's Last-Modified field to generate the field value
  of If-Modified-Since.  This behavior is most interoperable for cases
  where clocks are poorly synchronized or when the server has chosen
  to only honor exact timestamp matches (due to a problem with
  Last-Modified dates that appear to go "back in time" when the origin
  server's clock is corrected or a representation is restored from an
  archived backup).  However, caches occasionally generate the field
  value based on other data, such as the Date header field of the
  cached message or the local clock time that the message was
  received, particularly when the cached message does not contain a
  Last-Modified field.
  ...
  When used for cache updates, a cache will typically use the value of
  the cached message's Last-Modified field to generate the field value
  of If-Modified-Since.  This behavior is most interoperable for cases
  where clocks are poorly synchronized or when the server has chosen
  to only honor exact timestamp matches (due to a problem with
  Last-Modified dates that appear to go "back in time" when the origin
  server's clock is corrected or a representation is restored from an
  archived backup).  However, caches occasionally generate the field
  value based on other data, such as the Date header field of the
  cached message or the local clock time that the message was
  received, particularly when the cached message does not contain a
  Last-Modified field.

So the server is violating the SHOULD by not 304ing dates greater than
Last-Modified, but it's not violating a MUST-level requirement.  The
server requirements around If-None-Match are MUST-level [2], so using
it should be more portable.  The RFC also seems to prefer clients use
If(-None)-Match [4,5].

I'm using gregjones/httpcache for the caching, since that
implemenation seems reasonably popular and the repo's been around for
a few years.  That library uses the belt-and-suspenders approach of
setting both If-None-Match (to the cached ETag) and If-Modified-Since
(to the cached Last-Modified) [6], so we should be fine.

UserCacheDir requires Go 1.11 [7,8,9].

[1]: https://tools.ietf.org/html/rfc7232#section-2.3
[2]: https://tools.ietf.org/html/rfc7232#section-3.2
[3]: https://tools.ietf.org/html/rfc7232#section-2.2
[4]: https://tools.ietf.org/html/rfc7232#section-3.3
[5]: https://tools.ietf.org/html/rfc7232#section-2.4
[6]: https://github.com/gregjones/httpcache/blob/9cad4c3443a7200dd6400aef47183728de563a38/httpcache.go#L169-L181
[7]: golang/go#22536
[8]: golang/go@816154b
[9]: golang/go@50bd1c4
Assume the caller has HOME set and XDG_CACHE_HOME not set, because
that's easy.  The ~/.cache default is from [1].  Once we bump to Go
1.11, we can revert this commit and get the more-robust stdlib
implementation, which is why I haven't bothered with a robust
implementation here.

[1]: https://standards.freedesktop.org/basedir-spec/basedir-spec-0.7.html
Passing the compressed images to libvirt was giving me "no bootable
device" errors with my:

  $ rpm -q libvirt qemu-kvm-rhev
  libvirt-3.9.0-14.el7_5.7.x86_64
  qemu-kvm-rhev-2.9.0-16.el7_4.13.x86_64

Ideally, the HTTP headers would tell us that the image was compressed
(via Content-Type [1] or Content-Encoding [2]), but
aos-ostree.rhev-ci-vms.eng.rdu2.redhat.com isn't configured to do that
at the moment:

  $ curl -I http://aos-ostree.rhev-ci-vms.eng.rdu2.redhat.com/rhcos/images/cloud/latest/rhcos-qemu.qcow2.gz
  HTTP/1.1 200 OK
  Server: nginx/1.8.0
  Date: Sat, 22 Sep 2018 04:59:53 GMT
  Content-Type: application/octet-stream
  Content-Length: 684751099
  Last-Modified: Fri, 21 Sep 2018 13:56:07 GMT
  Connection: keep-alive
  ETag: "5ba4f877-28d078fb"
  Accept-Ranges: bytes

  $ curl -I http://aos-ostree.rhev-ci-vms.eng.rdu2.redhat.com/rhcos/images/cloud/latest/rhcos-qemu.qcow2
  HTTP/1.1 404 Not Found
  Server: nginx/1.8.0
  Date: Sat, 22 Sep 2018 04:59:56 GMT
  Content-Type: text/html
  Content-Length: 168
  Connection: keep-alive

I've opened [3] about getting Content-Encoding support via gzip_static
[4], but in the meantime, just assume that anything with a ".gz"
suffix is gzipped.

Unzipping is also somewhat expensive (although not as expensive as
network requests).  It would be nice for callers to be able to
configure whether we cache the compressed images (less disk
consumption) or the uncompressed images (less CPU load when launching
clusters).  For now, we're just caching the network response.

[1]: https://tools.ietf.org/html/rfc7231#section-3.1.1.5
[2]: https://tools.ietf.org/html/rfc7231#section-3.1.2.2
[3]: https://projects.engineering.redhat.com/browse/COREOS-593
[4]: http://nginx.org/en/docs/http/ngx_http_gzip_static_module.html
Generated with:

  $ bazel run //:gazelle

using:

  $ bazel version
  Build label: 0.17.2- (@non-git)
  Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
  Build time: Fri Sep 21 15:04:25 2018 (1537542265)
  Build timestamp: 1537542265
  Build timestamp as int: 1537542265
@wking wking force-pushed the cache-libvirt-image branch from 5f705cf to 85286c5 Compare September 22, 2018 05:14
@wking
Copy link
Member Author

wking commented Sep 22, 2018

Updated to unzip .gz URIs for you.

@crawford crawford mentioned this pull request Sep 23, 2018
@crawford
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 23, 2018
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: crawford, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@crawford
Copy link
Contributor

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 23, 2018
@openshift-merge-robot openshift-merge-robot merged commit 4cfda6f into openshift:master Sep 23, 2018
@wking wking deleted the cache-libvirt-image branch September 23, 2018 21:13
wking added a commit to wking/openshift-installer that referenced this pull request Nov 6, 2018
The new pipeline handles Content-Encoding gzip:

  $ curl -LI --compressed https://releases-rhcos.svc.ci.openshift.org/storage/releases/maipo/47.73/redhat-coreos-maipo-47.73-qemu.qcow2
  HTTP/1.1 302 Moved Temporarily
  Server: nginx/1.12.1
  Date: Tue, 06 Nov 2018 20:55:06 GMT
  Content-Type: text/html
  Content-Length: 161
  Location: https://d26v6vn1y7q7fv.cloudfront.net/releases/maipo/47.73/redhat-coreos-maipo-47.73-qemu.qcow2
  Set-Cookie: ...; path=/; HttpOnly; Secure

  HTTP/1.1 200 OK
  Content-Type: binary/octet-stream
  Content-Length: 726878219
  Connection: keep-alive
  Date: Tue, 06 Nov 2018 20:23:37 GMT
  Last-Modified: Tue, 06 Nov 2018 17:56:36 GMT
  ETag: "021080ef3b515d2443be3749ebbb0b08-87"
  Content-Encoding: gzip
  Accept-Ranges: bytes
  Server: AmazonS3
  Age: 1890
  X-Cache: Hit from cloudfront
  Via: 1.1 d85d7507ed6501757cfe600c02a26c7d.cloudfront.net (CloudFront)
  X-Amz-Cf-Id: ...

so we can rely on Go's default compression handling.  See
DisableCompression, which defaults to false, in [1].  To confirm the
default handling, try to retrieve from a local server:

  $ export OPENSHIFT_INSTALL_LIBVIRT_IMAGE=http://localhost:8080/example.qcow
  $ openshift-install --dir=wking create cluster
  INFO Fetching OS image...
  FATAL Error executing openshift-install: failed to fetch Terraform Variables: failed to generate asset "Terraform Variables": failed to get Tfvars: failed to use cached libvirt image: Get http://localhost:8080/example.qcow: EOF

In another terminal, I was using Ncat as a dummy server to capture
request headers:

  $ nc -l -p 8080 </dev/null
  GET /example.qcow HTTP/1.1
  Host: localhost:8080
  User-Agent: Go-http-client/1.1
  Accept-Encoding: gzip

This commit drops the now-unnecessary suffix handling from 7abe94d
(config/libvirt/cache: Decompress when URI has ".gz" suffix,
2018-09-21, openshift#280).

[1]: https://golang.org/pkg/net/http/#Transport
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants