Transition jQuery CDN from StackPath to Fastly #30

Closed
20 tasks done
Krinkle opened this issue Sep 10, 2023 · 10 comments

@Krinkle
Member

Krinkle commented Sep 10, 2023

General

  • Document recent traffic profile. https://github.com/jquery/infrastructure-puppet/blob/be7518e07a/doc/cdn.md#latest-statistics
  • Document current CDN settings at StackPath (from Highwinds StrikeTracker). https://github.com/jquery/infrastructure-puppet/blob/be7518e07a/doc/cdn.md#highwinds-configuration
  • Create Fastly account, set up delegate access to 2+ admins with 2FA enabled.
  • TLS: Upload custom *.jquery.com certificate.
  • DNS: We prefer CNAME flattening to reduce lookups. Okay?
  • DNS: We generally prefer 24h TTL to reduce lookups (shorter during switchover). Okay?
  • DNS: Figure out the correct entrypoint that satisfies our TLS and Networking preferences:
    • Dual stack IPv4 + IPv6.
    • HTTPS with HTTP2 and HTTP1.1
    • HTTP with HTTP1.1 (no redirects).
    • TLS 1.2+ configured such that it is compatible with at least IE9/Win7 for compat with current setup and customer expectations. Ref Renew star.jquery.com cert (expires 14 July 2023) #21.
  • Service: Gzip enabled with strongest settings.
  • Service: Ignore URL query parameters for caching, to reduce origin load.
  • Service: Treat URLs as case-insensitive such that /jQuery-foo.js is able to match /jquery-foo.js.
  • Final confirmation that account is ready to handle 2.2 PB bandwidth per month with peaks of 30K req/s and 8.9Gbps (see traffic profile). E.g. no relevant limitations, quotas, or trial modes in place.
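
As a rough cross-check of the last item, the monthly volume can be converted into an average bit rate and compared with the quoted peak. This is only a back-of-the-envelope sketch: the decimal units (1 PB = 10**15 bytes) and the 30-day month are assumptions, not figures from the traffic profile.

```python
# Back-of-the-envelope check of the quoted traffic figures.
# Assumptions (not stated in the thread): decimal units, 30-day month.
BYTES_PER_MONTH = 2.2e15          # 2.2 PB per month, from the checklist
QUOTED_PEAK_GBPS = 8.9            # peak bandwidth, from the checklist
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def average_gbps(bytes_per_month: float, seconds: int = SECONDS_PER_MONTH) -> float:
    """Convert a monthly transfer volume into an average rate in Gbit/s."""
    return bytes_per_month * 8 / seconds / 1e9


avg = average_gbps(BYTES_PER_MONTH)
print(f"average: {avg:.2f} Gbps, peak/average ratio: {QUOTED_PEAK_GBPS / avg:.2f}")
```

Under those assumptions the average works out to roughly 6.8 Gbps, so the quoted 8.9 Gbps peak sits only modestly above the sustained load.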

Testing

  • Compression doesn't poison the cache (either split, or shared and decompressed by edge).
  • Case-insensitive URLs don't poison the cache.
  • Various desktop and mobile browsers on real devices.
  • Use curl to try every combination of -4, -6, --http1.1, --http2, --tls-max 1.2, --tls-max 1.3, http+https URLs (except http2 over HTTP) and confirm HTTP 200 OK (esp no redirect). Use --connect-to ::SOMETHING.global.fastly.net to test prior to deploying any DNS changes.
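
The curl matrix in the last item could be generated rather than typed by hand. The sketch below only builds and prints the command lines; it does not run them. The entrypoint and the test path are placeholders, and emitting each plain-HTTP case once (without `--tls-max`, which has no effect there) is a simplification of "every combination".

```python
import shlex
from itertools import product

# Hypothetical placeholders: substitute the real entrypoint and a real asset path.
ENTRYPOINT = "SOMETHING.global.fastly.net"
HOST_PATH = "codeorigin.jquery.com/jquery-3.7.1.min.js"


def curl_matrix(entrypoint: str = ENTRYPOINT, host_path: str = HOST_PATH) -> list[list[str]]:
    """Build one curl argument list per protocol combination in the checklist."""
    commands = []
    for ip, http, tls, scheme in product(
        ("-4", "-6"), ("--http1.1", "--http2"), ("1.2", "1.3"), ("http", "https")
    ):
        if scheme == "http" and http == "--http2":
            continue  # the checklist excludes HTTP/2 over plain HTTP
        if scheme == "http" and tls != "1.2":
            continue  # TLS version is irrelevant for plain HTTP; emit it once
        cmd = ["curl", "-sS", "-I", ip, http]
        if scheme == "https":
            cmd += ["--tls-max", tls]
        # Same --connect-to form as used elsewhere in this thread.
        cmd += ["--connect-to", f"::{entrypoint}", f"{scheme}://{host_path}"]
        commands.append(cmd)
    return commands


for cmd in curl_matrix():
    print(shlex.join(cmd))
```

Each printed command still has to be run and checked for `200` (and the absence of a redirect) by hand or by a wrapper script.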

Deployment

Three services overall: code, content, releases.

  1. code: Switch low-traffic alias codeorigin.jquery.com for functional testing.
  2. content: Switch completely, including aliases.
  3. releases: Switch stage.releases.jquery.com for functional testing.
  4. releases: Switch releases.jquery.com. First significant exposure. This is aimed at developers during development, not in production, not in critical path.
  5. code: Update our high-traffic doc sites https://jquery.com and https://api.jquery.com to use codeorigin.jquery.com instead of code.jquery.com. This significantly increases exposure to learn of any connectivity issues that may be specific to uncommon browsers, geography/ISPs, firewalls.
  6. code: The big one. Switch code.jquery.com.
  7. code: Switch our high-traffic doc sites back to using the "code.jquery.com" canonical name.

Examples of past issues:

Post-deployment

@Krinkle Krinkle added the Service: jQuery CDN code.jquery.com label Sep 10, 2023
@Krinkle Krinkle self-assigned this Sep 10, 2023
@Krinkle
Member Author

Krinkle commented Sep 10, 2023

  • DNS: We prefer CNAME flattening to reduce lookups. Okay?
  • DNS: We generally prefer 24h TTL to reduce lookups (shorter during switchover). Okay?

Confirmed with Fastly Support. These are fine.

@Krinkle
Member Author

Krinkle commented Sep 10, 2023

DNS: Figure out the correct entrypoint that satisfies our TLS and Networking preferences.

@supertassu and I read through these pages:

We settled on dualstack.t.sni.global.fastly.net to start the first deployment stages.

@Krinkle
Member Author

Krinkle commented Sep 10, 2023

After experimenting with ignoring query strings via VCL-like header configuration (https://docs.fastly.com/en/guides/making-query-strings-agnostic), cache objects seem to get mixed up between compressed and uncompressed responses.

$ curl https://releases.jquery.com/qunit/?42a --connect-to ::dualstack.t.sni.global.fastly.net -I --compressed

HTTP/2 200 
…
content-encoding: gzip
accept-ranges: bytes
age: 0
x-served-by: cache-lhr7351-LHR
x-cache: MISS
$ curl https://releases.jquery.com/qunit/?42a --connect-to ::dualstack.t.sni.global.fastly.net -i
HTTP/2 200 
…
content-encoding: gzip
accept-ranges: bytes
age: 41
x-served-by: cache-lhr7345-LHR
x-cache: HIT
x-cache-hits: 1
Warning: Binary output can mess up your terminal.

Krinkle added a commit to jquery/jquery-wp-content that referenced this issue Sep 10, 2023
Only calls that are hidden, i.e. not in docs or examples.

Ref jquery/infrastructure-puppet#30.
@Krinkle
Member Author

Krinkle commented Sep 10, 2023

Fastly Support helped us realize that this was actually an issue on our end due to the origin server for releases.jquery.com not sending Vary: Accept-Encoding.

@supertassu traced this down to a mistake on the new WordPress servers. We forgot to set gzip_vary on in the nginx config. Oddly enough, Debian defaults to:

        gzip on;
        gzip_comp_level 6;

But leaves gzip_vary unset, which defaults to off per http://nginx.org/en/docs/http/ngx_http_gzip_module.html#gzip_vary. That seems like a bug in the Debian nginx package. Something we should look at upstreaming.

The old WordPress servers did the same in the private repo at https://github.com/jquery/infrastructure, but we missed it during the conversion.

We caught this before switching DNS, and codeorigin was not affected either way as it already sets the vary header correctly.
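
To make the failure mode concrete, here is a toy model of a shared cache sitting in front of an origin that gzips on request. It is a deliberately simplified sketch, not a description of how Fastly's cache works internally; the `origin` function and `ToyCache` class are hypothetical names used only for illustration.

```python
# Toy model: why a missing "Vary: Accept-Encoding" mixes up cache objects.

def origin(accept_encoding: str, send_vary: bool) -> dict:
    """Hypothetical origin: compresses when asked, optionally sends Vary."""
    headers = {}
    if "gzip" in accept_encoding:
        headers["content-encoding"] = "gzip"
    if send_vary:
        headers["vary"] = "Accept-Encoding"
    return headers


class ToyCache:
    def __init__(self, origin_sends_vary: bool):
        self.origin_sends_vary = origin_sends_vary
        self.vary_seen = {}  # url -> True if the origin response carried Vary
        self.objects = {}    # (url, variant) -> cached response headers

    def fetch(self, url: str, accept_encoding: str = "") -> dict:
        # Accept-Encoding only becomes part of the key once Vary was seen.
        variant = accept_encoding if self.vary_seen.get(url) else ""
        if (url, variant) in self.objects:
            return self.objects[(url, variant)]  # cache HIT
        response = origin(accept_encoding, self.origin_sends_vary)
        varies = response.get("vary") == "Accept-Encoding"
        self.vary_seen[url] = varies
        self.objects[(url, accept_encoding if varies else "")] = response
        return response


broken = ToyCache(origin_sends_vary=False)
broken.fetch("/qunit/", "gzip")         # MISS, stores the gzip response
print(broken.fetch("/qunit/", ""))      # HIT: gzip served to a client that did not ask for it

fixed = ToyCache(origin_sends_vary=True)
fixed.fetch("/qunit/", "gzip")
print(fixed.fetch("/qunit/", ""))       # separate object, no content-encoding
```

The first case mirrors the two curl transcripts above: the second request did not send `--compressed` yet received `content-encoding: gzip` from cache.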

@Krinkle
Member Author

Krinkle commented Sep 10, 2023

Service: Ignore URL query parameters for caching, to reduce origin load.

This can be done via custom VCL per https://developer.fastly.com/reference/vcl/variables/client-request/req-url-path/. But, an easier way is in the GUI under "Headers". This is slightly confusing as it's not actually a header, but you can use it to configure VCL expressions with the rest done automatically.

Documented at https://docs.fastly.com/en/guides/making-query-strings-agnostic.

Worked great.

Service: Treat URLs as case-insensitive

Similarly, done through another "Header" using the std.tolower() expression per https://developer.fastly.com/reference/vcl/functions/strings/std-tolower/.

Ignore query strings (Request / Set)

  • Destination: url
  • Source: req.url.path
  • Ignore if set: No
  • Priority: 10

Case-insensitive URLs (Request / Set)

  • Destination: url
  • Source: std.tolower(req.url)
  • Ignore if set: No
  • Priority: 15
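
The combined effect of the two rules can be emulated outside Fastly to reason about which request URLs end up sharing a cache object. This is a sketch in Python rather than VCL, and it assumes the rule with the lower priority number runs first; `cache_url` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def cache_url(req_url: str) -> str:
    """Emulate the two "Header" rules above, applied in priority order."""
    url = urlsplit(req_url).path  # priority 10: url = req.url.path (drops the query string)
    url = url.lower()             # priority 15: url = std.tolower(req.url)
    return url


for candidate in ("/jQuery-3.7.1.min.js?v=1", "/jquery-3.7.1.min.js", "/qunit/?42a"):
    print(candidate, "->", cache_url(candidate))
```

With this normalisation the first two URLs map to the same path, which is the intended behaviour of both settings.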

@Krinkle
Member Author

Krinkle commented Sep 10, 2023

In the first rounds of testing we ran into a connectivity problem that looked like it might have to do with the TLS configuration at Fastly. Here is what we knew:

Our base expectation is for HTTPS support to start at IE9-11 on Windows 7. Ref #21.

We don't expect IE8 or Windows XP to work, since we already moved to TLS 1.2 at some point during the StackPath era.

Via BrowserStack, in Windows 8 and IE 11, I can load these URLs without issue:

They also work fine in IE 11, IE 10, and IE 9 on Windows 7.

Using the same Win8/IE11 browser, https://codeorigin.jquery.com/mobile/1.4.0/images/icons-png/eye-black.png consistently fails with a connection error. It also fails in IE 11, IE 10, and IE 9 on Win 7. Note "codeorigin" vs "code", where codeorigin uses our new Fastly deployment.


Fastly Support responded:

[…] Fastly provides the following TLS cipher suite.

sslscan codeorigin.jquery.com
Preferred TLSv1.2 128 bits ECDHE-RSA-AES128-GCM-SHA256 Curve 25519 DHE 253
Accepted TLSv1.2 256 bits ECDHE-RSA-AES256-GCM-SHA384 Curve 25519 DHE 253
Accepted TLSv1.2 256 bits ECDHE-RSA-CHACHA20-POLY1305 Curve 25519 DHE 253

OpenSSL name -> IANA name
ECDHE-RSA-AES128-GCM-SHA256 -> TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
ECDHE-RSA-AES256-GCM-SHA384 -> TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
ECDHE-RSA-CHACHA20-POLY1305 -> TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256

Please see below URL. https://learn.microsoft.com/en-us/windows/win32/secauthn/tls-cipher-suites-in-windows-7

Windows 7 does not support those cipher suites and the connection will fail.

For compatibility, we have a TLS configuration (CNAME k.sni.global.fastly.net) with cipher suites in CBC mode.

% sslscan k.sni.global.fastly.net
Preferred TLSv1.2 128 bits ECDHE-RSA-AES128-GCM-SHA256 Curve 25519 DHE 253
Accepted TLSv1.2 256 bits ECDHE-RSA-AES256-GCM-SHA384 Curve 25519 DHE 253
Accepted TLSv1.2 256 bits ECDHE-RSA-CHACHA20-POLY1305 Curve 25519 DHE 253
Accepted TLSv1.2 128 bits ECDHE-RSA-AES128-SHA256 Curve 25519 DHE 253
Accepted TLSv1.2 256 bits ECDHE-RSA-AES256-SHA384 Curve 25519 DHE 253
Accepted TLSv1.2 128 bits ECDHE-RSA-AES128-SHA Curve 25519 DHE 253
Accepted TLSv1.2 256 bits ECDHE-RSA-AES256-SHA Curve 25519 DHE 253

OpenSSL name -> IANA name
ECDHE-RSA-AES128-SHA256 -> TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
ECDHE-RSA-AES256-SHA384 -> TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384
ECDHE-RSA-AES128-SHA -> TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
ECDHE-RSA-AES256-SHA -> TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA

[…]

Switching our experimental deployment from t.sni to k.sni (dualstack.k.sni.global.fastly.net) resolved the issue. The new deployment now also works in IE11 on Windows 8 and IE 9 on Windows 7.
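
The difference between the two entrypoints boils down to whether client and server share at least one cipher suite. The sketch below uses the server lists from the sslscan output above; the client list is an illustrative stand-in for a legacy client that offers only CBC-mode ECDHE-RSA suites, not an authoritative Windows 7 list (see the Microsoft page linked above for that).

```python
# Server suites as reported by sslscan in this thread (IANA names).
T_SNI = {
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
    "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
    "TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256",
}
K_SNI = T_SNI | {
    "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256",
    "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384",
    "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA",
    "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA",
}

# Assumption for illustration only: a legacy client limited to CBC suites.
LEGACY_CLIENT = {
    "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256",
    "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA",
    "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA",
}


def shared_suites(client: set[str], server: set[str]) -> list[str]:
    """Return the suites both sides support; an empty list means the handshake fails."""
    return sorted(client & server)


print("t.sni:", shared_suites(LEGACY_CLIENT, T_SNI))
print("k.sni:", shared_suites(LEGACY_CLIENT, K_SNI))
```

An empty intersection for t.sni and a non-empty one for k.sni matches the behaviour seen in the BrowserStack tests.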

@Krinkle
Member Author

Krinkle commented Sep 18, 2023

Traffic levels at Fastly over the past week, prior to the big switch. This represents the "codeorigin.jquery.com" canary experiment at about 4 requests per second (or 11-22K requests per hour), with an increase on 9 September when we updated our first-party documentation sites (jquery.com, api.jquery.com) to load jquery-3.7.1.min.js from the canary deployment at codeorigin.jquery.com instead of the canonical code.jquery.com.

Screenshot

We built most of our confidence with the canary traffic over the two weeks, using https://releases.jquery.com and codeorigin.jquery.com. Prior to the big switch we also made sure the code.jquery.com DNS entry was set to the highest DNS TTL that Cloudflare supports (1 day, i.e. 24 hours), so that the big switch would go as slowly as possible within the limitations of Cloudflare's DNS system (no geographic variance, unfortunately). DNS tends to roll over pretty quickly for 99% of traffic, so in practice this doesn't make much difference, but it's something.

The big switch took place on Friday 15 Sept around 17:55 UTC. Over the two weeks prior we were doing around 4 req/s (180/min, or 0.1% of jQuery CDN traffic). Within the first five minutes this went up to 16,000 requests per second (1M per minute, or about half of our traffic):

Screenshot

Over the course of the next hour we received around 90% of our normal traffic, and within a day 99% of the 22K-30K requests per second we normally do.

Screenshot

From the other side

From the StackPath side, we started around 27,000 requests per second (HTTPS+HTTP) on Friday 15 Sep 2023 at 14:30 UTC, a few hours before the switch.

Screenshot

Draining down to about 300 req/s by 21:00 UTC:

Screenshot

Today, we are still serving about 100 requests per second through the StackPath service. It's been two full days since the switch, and it's 24 hours after the old DNS entry should have expired from the DNS resolvers of Internet service providers, device operating systems, and web browsers. While this is proportionally small (<1%), it's still more than 20X the "experimental" amount of traffic we received on Fastly during the two weeks prior to the switch. Hopefully this will drain within another week or so.

Screenshot

Krinkle added a commit to jquery/jquery-wp-content that referenced this issue Sep 18, 2023
@Krinkle
Member Author

Krinkle commented Sep 25, 2023

Continuing to slowly drain from 100 rps on 17 Sept 2023 to about 60 rps today (HTTPS: 10 rps, HTTP: 50 rps).

Still totalling about 50 million requests between 18 Sept and 25 Sept.
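
As a quick consistency check, the weekly total can be converted back into an average request rate and compared with the 100 rps and 60 rps endpoints quoted above. The sketch assumes the period is exactly seven full days.

```python
SECONDS_PER_DAY = 86_400


def average_rps(total_requests: float, days: float) -> float:
    """Average requests per second over a period of the given length."""
    return total_requests / (days * SECONDS_PER_DAY)


# About 50 million requests between 18 Sept and 25 Sept (assumed to be 7 days).
weekly_average = average_rps(50e6, 7)
print(f"average over the week: {weekly_average:.0f} rps")
```

The result lands between the two endpoints, which is what a slow, steady drain would produce.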

Breakdown:

Screenshot

@Krinkle
Member Author

Krinkle commented Oct 13, 2023

Fastly logo added to https://releases.jquery.com as of jquery/jquery-wp-content@28542b5.

@Krinkle
Member Author

Krinkle commented Mar 29, 2024

It turns out, major Internet infrastructure doesn't "just" shut down, does it?

The Highwinds StrikeTracker portal is still up six months later, and there's even some decent traffic still coming through.

From 25 Sep 2023 to 25 March 2024

Screenshot 2024-03-29 at 12 01 57

Last 6 months (1 Oct 2023 to 29 March 2024)

  • The slow decline stabilised around 23 October.
  • Another major drop around 15 February. Hard to tell if this is downstream DNS finally updating, or whether this is StackPath infrastructure turning pieces off.
  • Overall 760 million requests in the past 6 months.
Screenshot 2024-03-29 at 12 03 06

Year to date (1 Jan to 29 March 2024)

Seems to not want to go below 40 rps. These could be health checks for all I know, although it seems a bit much.

Screenshot 2024-03-29 at 12 03 55

Krinkle added a commit to jquery/jquerymobile.com that referenced this issue Apr 16, 2024
add link to jQuery CDN landing page instead,
where current sponsor logo is shown.

Ref jquery/infrastructure-puppet#30.