Github action for mirroring repository content to an S3 bucket #47

Merged
chriseth merged 1 commit into gh-pages from github-action-s3-mirror
Jul 29, 2020


cameel
Member

@cameel cameel commented Jul 23, 2020

Part of ethereum/solidity#9258.

The action is configured to run on every push to gh-pages. It uses the AWS CLI to sync modified files; whether a file counts as modified is determined by its timestamp and size.

Secrets

This action requires two secrets to be added in repository settings:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY

To get them:

It will show you the key ID and secret. You can't see them again once you close the dialog; if that happens, just generate a new pair and delete the old one.

I created this user specifically for this action and its privileges are limited to reading/writing S3 buckets (note: all buckets, not just solc-bin). It can't access any other AWS services.
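
As a hedged illustration of how the two secrets would typically be consumed: the AWS CLI reads `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` from the environment, so a sync step can fail early when either one is missing. The helper name `require_aws_credentials` is hypothetical and not part of this PR.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of this PR): fail before any upload is
# attempted if one of the two secrets was not exported to the environment.
require_aws_credentials() {
    if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
        echo "AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must both be set" >&2
        return 1
    fi
}
```

A workflow step could call `require_aws_credentials` right before the sync command, after mapping the repository secrets to environment variables.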

@cameel cameel requested review from aarlt and chriseth July 23, 2020 17:00
@cameel cameel self-assigned this Jul 23, 2020
@cameel
Member Author

cameel commented Jul 23, 2020

Here's a bash script you can use to make sure that all files are actually available on S3 (you could actually use it against any host, it's not S3-specific). If you run it in your local checkout of solc-bin it will go through all the files, download the equivalents from S3 one by one and diff them.

For simplicity I did not make it use git ignore rules, so make sure you have no stray unversioned files in there (e.g. node_modules). It does ignore top-level files and directories whose names start with a dot or an underscore, though.

#!/usr/bin/env bash

set -e

s3_url="https://solc-bin.s3.eu-central-1.amazonaws.com"
tmp_dir="$(mktemp --directory)"

readarray -t files < <(find . -type f,l -regex '\./[^._].*' -printf "%P\n" | sort)

missing_file_count=0
different_file_count=0
file_count=0

mkdir -p "$tmp_dir"
for file in "${files[@]}"; do
    printf "%s: " "$file"

    if ! curl "${s3_url}/${file}" --output "${tmp_dir}/${file}" --fail --silent --no-progress-meter --create-dirs; then
        echo MISSING
        ((++missing_file_count))
    else
        if ! cmp --silent -- "$file" "${tmp_dir}/${file}"; then
            echo DIFFERENT
            ((++different_file_count))
        else
            echo OK
        fi
    fi

    # --force: for a missing file curl --fail may not have created anything to remove
    rm --force -- "${tmp_dir}/${file}"
    ((++file_count))
done
rm -r "$tmp_dir"

echo "Total files: ${file_count}"
echo "Missing:     ${missing_file_count}"
echo "Different:   ${different_file_count}"
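
To illustrate which paths the `find` filter in the script above selects, here is a small sketch. The function name `list_checked_files` is hypothetical; it wraps the same expression (GNU find is assumed) and pins the sort order with `LC_ALL=C` so the result does not depend on the locale. Because `-regex` matches the whole path, the `[^._]` test applies only to the first character after `./`, so a file nested under a regular directory is still listed even if its own name starts with a dot.

```shell
#!/usr/bin/env bash

# Hypothetical wrapper around the find expression used by the script above.
# Prints paths relative to the given directory, one per line.
list_checked_files() {
    (
        cd "$1" || exit 1
        # Skips only top-level entries whose names start with '.' or '_'.
        find . -type f,l -regex '\./[^._].*' -printf '%P\n' | LC_ALL=C sort
    )
}
```

Running `list_checked_files .` in a checkout that contains `bin/list.json`, `.git/config` and `_config.yml` would print only `bin/list.json`.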

@cameel
Member Author

cameel commented Jul 23, 2020

You can see the script in action in my fork of solc-bin: cameel/solc-bin > actions > Mirror repository content to an S3 bucket.

It's actually surprisingly fast. Even the full multi-GB upload takes only 5 min (plus another 5 for solc-bin checkout). I was expecting much worse.

I have the action in my fork hooked up to the solc-bin bucket right now so before merging this please remember to disable it - by removing current credentials of its user on AWS and generating new ones for this repo (as described above). Or just tell me to do it myself.

@cameel
Member Author

cameel commented Jul 24, 2020

Looks like the nightly action committing a new build does not trigger the S3 sync action. That's a limitation of GitHub Actions (pushes made with the default GITHUB_TOKEN do not trigger other workflows).

I see two possible workarounds:

  1. Copy the S3 sync job to the nightly action. This adds some code duplication.
  2. Schedule the S3 sync to run e.g. 1 hour after the nightly. No duplication but there's either unnecessary delay or we risk syncing too soon and missing the build if it takes longer than usual. It also requires us to remember to update the sync schedule if we ever change the nightly schedule.

@chriseth
Contributor

I think scheduling at roughly the right time is the better solution. We do not depend on this having a small delay - even 24 hours would be acceptable short-term.

@cameel cameel force-pushed the github-action-s3-mirror branch 5 times, most recently from 9008acd to 27cdb4b Compare July 27, 2020 16:02
@cameel
Member Author

cameel commented Jul 27, 2020

Fine. Scheduled to run at 1:00 now.

And two more changes to account for S3 bucket updates not being atomic:

  • Syncing list.* files last. Clients won't see an updated list with new entries until all the files are already in the bucket, at least as long as we only add new binaries and never remove old ones.
  • Added turnstyle action to prevent running multiple sync operations at the same time (see https://github.com/ethereum/solc-bin/pull/47/files#r461004973).
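
The "list files last" ordering can be sketched as two `aws s3 sync` passes. This is only an illustration under assumptions, not the workflow from this PR: the function name, its arguments and the exact `--exclude`/`--include` patterns are hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical sketch: upload everything except list.* first, then list.* alone,
# so a client never sees a list entry whose binary is not in the bucket yet.
sync_with_lists_last() {
    local source_dir="$1" bucket_url="$2"

    # Pass 1: all files except the list files (at the top level or nested).
    aws s3 sync "$source_dir" "$bucket_url" --no-progress \
        --exclude "list.*" --exclude "*/list.*" || return 1

    # Pass 2: only the list files. Later filters take precedence, so everything
    # is excluded first and the list files are then re-included.
    aws s3 sync "$source_dir" "$bucket_url" --no-progress \
        --exclude "*" --include "list.*" --include "*/list.*"
}
```

Adding `--dryrun` to both commands is a cheap way to inspect what each pass would upload before pointing the function at a real bucket.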

Comment on lines +23 to +27
- name: Wait for other instances of this workflow to finish
  # It's not safe to run two S3 sync operations concurrently with different files
  uses: softprops/turnstyle@v1
  with:
    same-branch-only: no
Member Author

@cameel cameel Jul 27, 2020

Unfortunately there's no way to tell github to run only one instance of the action at a time. As a workaround I used the turnstyle action.

Is it OK to use it? I looked at its code briefly and didn't see anything nefarious. The docs say that giving it GITHUB_TOKEN is required, but it seems to work just fine without it. So this PR should be completely safe (turnstyle can't access any secrets). It is a third-party action though, so if we ever update it without reviewing the new code and start putting any secrets (GITHUB_TOKEN or S3 keys) in env variables, it could theoretically steal them and use them to modify files on GitHub or in S3.

@cameel cameel force-pushed the github-action-s3-mirror branch 4 times, most recently from 97d87a8 to 517e31c Compare July 28, 2020 15:23
@cameel cameel force-pushed the github-action-s3-mirror branch from 517e31c to 4894a0f Compare July 28, 2020 15:32
@cameel
Member Author

cameel commented Jul 28, 2020

Just one minor change: added --no-progress to aws s3 sync to get clearer logs.

@chriseth chriseth merged commit fdbc708 into gh-pages Jul 29, 2020
@chriseth chriseth deleted the github-action-s3-mirror branch July 29, 2020 21:45
@cameel
Member Author

cameel commented Jul 30, 2020

Below is our current S3/cloudfront config.

I don't have a better place to put it and it's not important enough to be preserved in the repo but I'd like to be able to point someone at it if we ever need to revisit the configuration.

There's nothing here that could not be recreated with some trial and error but some values I set were different from defaults so having a record of it may make things easier for us in the future.

S3 bucket

  • Name: solc-bin

Bucket policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AddPerm",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::solc-bin/*"
        },
        {
            "Sid": "AddPerm",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::solc-bin"
        }
    ]
}

EDIT 2021-01-07: originally the policy did not contain a permission for listing bucket contents and this made the server return HTTP 403 Forbidden (rather than 404) for missing files.
EDIT 2021-09-06: Amazon converted the config to JSON.
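
The 403-versus-404 behaviour mentioned in the first edit note can be checked from the command line. A minimal sketch, assuming `curl` is available; both function names and the probe path in the usage note are hypothetical.

```shell
#!/usr/bin/env bash

# Print only the HTTP status code returned for a URL (the body is discarded).
http_status() {
    curl --silent --output /dev/null --write-out '%{http_code}' "$1"
}

# Interpret the status code S3 returns for an object that does not exist.
describe_missing_object_status() {
    case "$1" in
        404) echo "not found: the policy allows listing the bucket" ;;
        403) echo "forbidden: the policy likely lacks s3:ListBucket" ;;
        *)   echo "unexpected status: $1" ;;
    esac
}
```

For example: `describe_missing_object_status "$(http_status "https://solc-bin.s3.eu-central-1.amazonaws.com/no-such-file")"`, where `no-such-file` stands for any key known not to exist.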

CORS configuration

[
    {
        "AllowedHeaders": [
            "*"
        ],
        "AllowedMethods": [
            "GET",
            "HEAD"
        ],
        "AllowedOrigins": [
            "*"
        ],
        "ExposeHeaders": []
    }
]
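
One way to confirm that this CORS configuration is in effect is to send a request with an `Origin` header and look at the `Access-Control-Allow-Origin` response header. The sketch below assumes `curl` is available; the function names and the origin used in the usage note are hypothetical.

```shell
#!/usr/bin/env bash

# Send a HEAD request with the given Origin header and print the response headers.
cors_headers() {
    local url="$1" origin="$2"
    curl --silent --head --header "Origin: ${origin}" "$url"
}

# Read response headers on stdin and print the Access-Control-Allow-Origin value.
# Header names are matched case-insensitively (HTTP/2 sends them in lowercase).
allowed_origin() {
    tr -d '\r' | grep -i '^access-control-allow-origin:' | sed 's/^[^:]*: *//'
}
```

For example: `cors_headers "https://solc-bin.s3.eu-central-1.amazonaws.com/README.md" "https://example.com" | allowed_origin`. With `AllowedOrigins` set to `*` as above, the expected value is `*`.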

Cloudfront

Values in bold differ from defaults or values pre-filled by Amazon.

General

| Parameter | Value |
| --- | --- |
| Domain Name | d1ilmvg25bc920.cloudfront.net |
| Alternate Domain Names (CNAMEs) | solc-bin.ethereum.org |
| SSL Certificate | Custom SSL Certificate (example.com) |
| Default Root Object | README.html |
| Supported HTTP Versions | HTTP/2, HTTP/1.1, HTTP/1.0 |
| IPv6 | Enabled |
| Delivery Method | Web |
| Cookie Logging | Off |
| Price Class | Use All Edge Locations (Best Performance) |
| AWS WAF Web ACL | - |
| Custom SSL Client Support | - |
| Security Policy | TLSv1.2_2021 |

EDIT 2021-03-12: we're serving README.html rather than README.md now.
EDIT 2021-09-06: Security Policy changed from TLSv1 to TLSv1.2_2021 (which might be the default now)

Origins and Origin Groups

| Parameter | Value |
| --- | --- |
| Origin ID | S3-solc-bin |
| Origin Domain Name | solc-bin.s3.amazonaws.com |
| Origin Path | |
| Restrict bucket access | No |

| Parameter | Value |
| --- | --- |
| Origin ID | S3-solc-bin-with-static-Origin-header |
| Origin Domain Name | solc-bin.s3.amazonaws.com |
| Origin Path | |
| Restrict bucket access | No |
| Add custom header | Origin: https://static-origin.example.com |

EDIT 2021-09-14: Added S3-solc-bin-with-static-Origin-header origin.

Behaviors

  • Defaults for all paths
    | Parameter | Value |
    | --- | --- |
    | Path Pattern | Default (*) |
    | Cache Policy | Managed-CachingOptimized |

All behaviors have these settings in common:

| Parameter | Value |
| --- | --- |
| Origin or Origin Group | S3-solc-bin-with-static-Origin-header |
| Viewer Protocol Policy | HTTP and HTTPS |
| Allowed HTTP Methods | GET, HEAD, OPTIONS |
| Field-level Encryption Config | - |
| Cached HTTP Methods | GET, HEAD, OPTIONS |
| Cache and origin request settings | Use a cache policy and origin request policy |
| Origin Request Policy | Managed-CORS-S3Origin |
| Smooth Streaming | No |
| Restrict Viewer Access | No |
| Compress Objects Automatically | Yes |

EDIT 2021-09-06: Removed the behavior that disabled caching for list files.
EDIT 2021-09-14: Switched origin to S3-solc-bin-with-static-Origin-header and enabled caching for OPTIONS requests.
