migrate from netlify to GCS #849

Open
atvaccaro opened this issue Aug 23, 2023 · 4 comments
Assignees
Labels
portfolio work Work related to the analytics portfolio tooling Work related to the management of our tooling and shared modules

Comments

@atvaccaro
Contributor

atvaccaro commented Aug 23, 2023

We briefly discussed future plans for the portfolio site a while back (e.g. #715) and we decided it's time to finally start exploring this migration. We've struggled with Netlify recently (specifically deploying individual sites under redirects) and we don't have paid support, so it's probably worthwhile to just migrate to a static site served from a GCS bucket.

The general steps are:

  1. Create a new bucket and configure it as a GCP-hosted static site behind a load balancer; see this document created by the services team for gtfs.calitp.org
  2. Change portfolio.py to write to the proper subpaths in the bucket for individual sites; this would involve replacing netlify deploy with gsutil or gcsfs
  3. (Potential) May have to configure JupyterBook to render links etc. under a subpath
  4. Remove _redirects and modify index.html if needed to function properly as the bucket home page
  5. (Optional) Configure CI to allow deploys with a button (i.e. workflow dispatches) for at least the index
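
To make step 2 a little more concrete, here is a minimal Python sketch of how the files of one built site could be mapped to object names under a per-site subpath. The function name `iter_site_objects`, the site name, and the build directory layout are assumptions for illustration only; none of them come from portfolio.py.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_site_objects(build_dir: Path, site: str) -> Iterator[Tuple[Path, str]]:
    """Yield (local_file, object_name) pairs for every file under build_dir.

    Object names are prefixed with the site name so that each site lives
    under its own subpath in the bucket (hypothetical layout).
    """
    prefix = site.strip("/")
    for local_file in sorted(build_dir.rglob("*")):
        if local_file.is_file():
            # Use POSIX separators so object names look the same on any OS
            relative = local_file.relative_to(build_dir).as_posix()
            yield local_file, f"{prefix}/{relative}"
```

Listing the pairs before uploading also gives a cheap dry run: the object names show where each page would end up relative to the bucket root, which is what step 3 (links under a subpath) depends on.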
@atvaccaro atvaccaro changed the title Proposal: migrate from netlify to GCS migrate from netlify to GCS Aug 31, 2023
@tiffanychu90 tiffanychu90 added tooling Work related to the management of our tooling and shared modules portfolio work Work related to the analytics portfolio labels Nov 15, 2023
@evansiroky evansiroky added portfolio work Work related to the analytics portfolio tooling Work related to the management of our tooling and shared modules and removed portfolio work Work related to the analytics portfolio tooling Work related to the management of our tooling and shared modules labels Sep 19, 2024
@mjumbewu
Contributor

mjumbewu commented Oct 15, 2024

I'm not sure whether it is new since the Cal-ITP GTFS Hosting doc was written, but GCP has a set of steps for this at https://cloud.google.com/storage/docs/hosting-static-website#command-line_1. The general steps above still apply, but that documentation is useful for the specific gcloud commands.

  • Verify that analysts' GCP accounts can publish to GCS (for manual deployment workflow)

    It looks like analysts' accounts have the roles/storage.objectAdmin role assigned to them. At the very least, new users should have the CustomGCSPublisher role defined below assigned to them.

  • Update the documentation for publishing to the portfolio.

  • Create new draft and production buckets and configure them as GCP-hosted static sites; see the GCS docs

    Looking through the existing buckets (for naming patterns), there are three that are public:

    • calitp-map-tiles which (I hope) contains map tiles
    • calitp-publish-data-analysis which contains downloadable data assets that are useful for analysis
    • calitp-metabase-data-public which contains GeoJSON boundary files for metabase to use in aggregations

    I'll stick to the calitp- prefix and call this one calitp-data-analyses-portfolio. Note that a common pattern is to use the site's domain name as the bucket name (or as part of it); this name does not follow that pattern.

    Command:

    # Production bucket
    gcloud storage buckets create gs://calitp-data-analyses-portfolio \
      --project=cal-itp-data-infra \
      --location=us-west2
      
    gcloud storage buckets add-iam-policy-binding gs://calitp-data-analyses-portfolio \
      --member=allUsers \
      --role=roles/storage.objectViewer
    
    gcloud storage buckets update gs://calitp-data-analyses-portfolio \
      --web-main-page-suffix=index.html
    
    # Draft bucket
    gcloud storage buckets create gs://calitp-data-analyses-portfolio-draft \
      --project=cal-itp-data-infra \
      --location=us-west2
      
    gcloud storage buckets add-iam-policy-binding gs://calitp-data-analyses-portfolio-draft \
      --member=allUsers \
      --role=roles/storage.objectViewer
    
    gcloud storage buckets update gs://calitp-data-analyses-portfolio-draft \
      --web-main-page-suffix=index.html \
      --lifecycle-file=<(echo '{ "rule": [ { "action": {"type": "Delete"}, "condition": {"age": 365} } ] }')
  • Configure the production bucket behind a load balancer following the instructions in the GCS docs; use a single-region deployment in us-west2; name it calitp-application-lb

  • Create a GCP role and a service account that can be used to deploy from GH Actions

    This role should be set up similarly to the GTFS flex hosting IAM configuration. Ideally these would be configured as IaC with Terraform in the data-infra repo.

    • Create IAM role

      ID projects/cal-itp-data-infra/roles/CustomGCSPublisher

      8 assigned permissions

      • resourcemanager.projects.get
      • storage.buckets.get
      • storage.buckets.list
      • storage.objects.create
      • storage.objects.delete
      • storage.objects.get
      • storage.objects.list
      • storage.objects.update

      Command:

      gcloud iam roles create CustomGCSPublisher \
        --project=cal-itp-data-infra \
        --title="Custom GCS Publisher" \
        --description="Custom role for publishing to GCS" \
        --permissions=resourcemanager.projects.get,storage.buckets.get,storage.buckets.list,storage.objects.create,storage.objects.delete,storage.objects.get,storage.objects.list,storage.objects.update
    • Create Service account

      Email gh-actions-publisher@cal-itp-data-infra.iam.gserviceaccount.com

      Assign Role projects/cal-itp-data-infra/roles/CustomGCSPublisher

      Command:

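      gcloud iam service-accounts create gh-actions-publisher \
        --project=cal-itp-data-infra \
        --description="Service account for GH Actions to publish to GCS" \
        --display-name="GH Actions Publisher"

      gcloud projects add-iam-policy-binding cal-itp-data-infra \
        --member="serviceAccount:gh-actions-publisher@cal-itp-data-infra.iam.gserviceaccount.com" \
        --role="projects/cal-itp-data-infra/roles/CustomGCSPublisher"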
    • Create a Workload Identity Pool

      Instead of storing a service account key in the GH repo secrets, we can use Workload Identity Federation to let GH Actions authenticate as the service account. As the GCP docs say:

      [S]ervice account keys are powerful credentials, and can present a security risk if they are not managed correctly. Workload Identity Federation eliminates the maintenance and security burden associated with service account keys.

      https://cloud.google.com/iam/docs/workload-identity-federation#why

      Here are a couple good resources that give some background:

      Command:

      gcloud iam workload-identity-pools create gh-actions-publisher-pool \
        --project=cal-itp-data-infra \
        --location=global \
        --description="Workload Identity Pool for GH Actions to publish to GCS"
    • Create a Workload Identity Provider

      Command:

      gcloud iam workload-identity-pools providers create-oidc gh-actions-publisher-provider \
        --project=cal-itp-data-infra \
        --location=global \
        --display-name="GH Actions Publisher Provider" \
        --workload-identity-pool=gh-actions-publisher-pool \
        --attribute-mapping="google.subject=assertion.sub,attribute.repository_owner=assertion.repository_owner,attribute.repository=assertion.repository" \
        --attribute-condition="assertion.repository_owner == 'cal-itp'" \
        --issuer-uri=https://token.actions.githubusercontent.com
    • Allow authentication from the provider to impersonate the service account by adding the iam.workloadIdentityUser role to the service account

      Command:

      gcloud iam service-accounts add-iam-policy-binding "gh-actions-publisher@cal-itp-data-infra.iam.gserviceaccount.com" \
        --project=cal-itp-data-infra \
        --role=roles/iam.workloadIdentityUser \
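        --member="principalSet://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/gh-actions-publisher-pool/attribute.repository_owner/cal-itp"
      # PROJECT_NUMBER is a placeholder for the numeric project number, which principalSet identifiers use in place of the project ID.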
  • Update the GH Actions workflow to authenticate as the service account

    Use the google-github-actions/auth action. This sets up Application Default Credentials, which the gcloud CLI then picks up.

    permissions:
      contents: read
      id-token: write
    
    steps:
      ...
    - name: Authenticate as GCP service account
      uses: google-github-actions/auth@v2
      with:
        # PROJECT_NUMBER is a placeholder for the numeric project number (not the project ID)
        workload_identity_provider: 'projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/gh-actions-publisher-pool/providers/gh-actions-publisher-provider'
        service_account: 'gh-actions-publisher@cal-itp-data-infra.iam.gserviceaccount.com'
  • Add the bucket to the action as an environment variable

    env:
      GCS_BUCKET: calitp-data-analyses-portfolio
  • Change portfolio.py to write to the proper subpaths in the bucket for individual sites; this would involve replacing netlify deploy with gcloud storage cp. Note that there is a GitHub Action that Google quasi-maintains for uploading to Cloud Storage, but I am opting to use the gcloud CLI instead, as it makes for easier local testing and is more modular (in case we deploy to something besides GCS).

  • (Potential) May have to configure JupyterBook to render links etc. under a subpath

  • Remove _redirects and modify index.html if needed to function properly as the bucket home page

  • (Optional) Configure CI to allow deploys with a button (i.e. workflow dispatches) for at least the index
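
As a rough illustration of the portfolio.py change, here is a minimal Python sketch that reads the bucket from the GCS_BUCKET environment variable and shells out to the gcloud CLI. The function names, the `_build/html` directory, and the site name are hypothetical. The sketch also swaps in `gcloud storage rsync` for `gcloud storage cp` so that a directory's contents are mirrored into the site's subpath; that substitution is an assumption on my part, not something decided above.

```python
import os
import subprocess
from typing import List


def build_upload_command(build_dir: str, site: str, bucket: str) -> List[str]:
    """Return a gcloud command that mirrors build_dir into the site's subpath."""
    destination = f"gs://{bucket}/{site.strip('/')}"
    return [
        "gcloud", "storage", "rsync",
        "--recursive",  # descend into subdirectories of the build output
        build_dir.rstrip("/"),
        destination,
    ]


def deploy_site(build_dir: str, site: str) -> None:
    """Upload one built site to the bucket named by GCS_BUCKET."""
    bucket = os.environ["GCS_BUCKET"]  # set in the workflow's env block
    subprocess.run(build_upload_command(build_dir, site, bucket), check=True)
```

Keeping the command construction separate from the subprocess call means the destination logic can be checked locally without credentials, for example `build_upload_command("_build/html", "my_site", "calitp-data-analyses-portfolio")`.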

A consideration:

  • We use Netlify aliases for deploying preview instances. How would we want to manage that here?
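
One possible direction, sketched here only as a starting point for discussion: treat each preview alias as a prefix in the draft bucket, so a preview build lands at a predictable path and is cleaned up by the draft bucket's lifecycle rule. The `previews/` prefix, the function name, and the slug rules are all assumptions; nothing here reproduces how Netlify aliases behave.

```python
import re

DRAFT_BUCKET = "calitp-data-analyses-portfolio-draft"


def preview_destination(site: str, alias: str, bucket: str = DRAFT_BUCKET) -> str:
    """Return a gs:// URL for a preview deploy of `site` under `alias`."""
    # Collapse anything that is not a lowercase letter, digit, or hyphen
    slug = re.sub(r"[^a-z0-9-]+", "-", alias.lower()).strip("-")
    if not slug:
        raise ValueError("alias must contain at least one letter or digit")
    return f"gs://{bucket}/previews/{slug}/{site.strip('/')}/"
```

For example, an alias taken from a branch name such as `Feature/New Charts` would map to the `previews/feature-new-charts/` prefix in the draft bucket.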

@ohrite

ohrite commented Oct 29, 2024

@mjumbewu once you have GCS credentials set up, let's make sure there's a runbook about GitHub GCS credential rotation as part of acceptance. Maybe this is something to pair on?

@mjumbewu
Contributor

Execution of the above steps is on hold until mid-January, when we have a conversation with Compiler and Caltrans folks about a strategy for static sites going forward. See this Slack thread for more information.

@mjumbewu
Contributor

After a call with @evansiroky and Compiler on Jan 14, here are a few updates:

  • At a high level, Evan is interested in ensuring that all the static sites are hosted and administered with Caltrans-owned infrastructure.
  • There are a number of parts to consider across hosting, CDN, DNS
  • One of the main features they use in Netlify is preview sites, which don't have a direct replacement with GCS
  • There is a recording available for all but the first ~10 minutes of the meeting -- see https://docs.google.com/document/d/1G_bGlyv1DoJuFCT9romNtyaDLYKoG4sj57qkwMWNWQw/edit?usp=sharing

@HaroldBooker HaroldBooker assigned evansiroky and unassigned mjumbewu Jan 21, 2025