✨ update docs / remove references to old gdrive images flow
ikesau committed Nov 26, 2024
1 parent 267c426 commit b55f2f4
Showing 5 changed files with 6 additions and 66 deletions.
2 changes: 1 addition & 1 deletion db/model/Gdoc/GdocBase.ts
@@ -663,7 +663,7 @@ export class GdocBase implements OwidGdocBaseInterface {
}

/**
* Load image metadata from the database. Does not check Google Drive or sync to S3
* Load image metadata from the database.
*/
async loadImageMetadataFromDB(
knex: db.KnexReadonlyTransaction,
2 changes: 0 additions & 2 deletions ops/buildkite/deploy-content
@@ -75,8 +75,6 @@ sync_to_r2_aws() {
sync_baked_data_to_r2() {
echo '--- Sync baked data to R2'
# Cloudflare Pages has limit of 20000 files
# NOTE: There's also images/published, which are the gdocs images synced from GDrive.
# There's currently a small-enough amount of them, but we need to sync them to R2 or Cloudflare Images at some point.
# NOTE: aws is about 3x faster than rclone
sync_to_r2_aws grapher/exports # 9203 files
sync_to_r2_aws exports # 3314 files
12 changes: 0 additions & 12 deletions packages/@ourworldindata/types/src/gdocTypes/Image.ts
@@ -1,17 +1,5 @@
import { DbEnrichedImage } from "../dbTypes/Images.js"

// This is the JSON we get from Google's API before remapping the keys to be consistent with the rest of our interfaces
export interface GDriveImageMetadata {
name: string // -> filename
modifiedTime: string // -> updatedAt e.g. "2023-01-11T19:45:27.000Z"
id: string // -> googleId e.g. "1dfArzg3JrAJupVl4YyJpb2FOnBn4irPX"
description?: string // -> defaultAlt
imageMediaMetadata?: {
width?: number // -> originalWidth
height?: number // -> originalHeight
}
}

// All the data we use in the client to render images
// everything except the ID, effectively
export type ImageMetadata = Pick<
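For reference, the arrow comments in the removed interface describe a key remapping from Google's API shape to our own field names. A minimal sketch of that remapping — every name below is a hypothetical stand-in, not the actual sync code:

```
// Hypothetical sketch of the remapping described by the removed arrow comments.
// None of these names are the actual implementation.
interface GDriveImageMetadataSketch {
    name: string
    modifiedTime: string
    id: string
    description?: string
    imageMediaMetadata?: { width?: number; height?: number }
}

function remapGDriveMetadata(gdrive: GDriveImageMetadataSketch) {
    return {
        filename: gdrive.name,
        updatedAt: new Date(gdrive.modifiedTime).getTime(), // assumption: stored as epoch milliseconds
        googleId: gdrive.id,
        defaultAlt: gdrive.description ?? "",
        originalWidth: gdrive.imageMediaMetadata?.width,
        originalHeight: gdrive.imageMediaMetadata?.height,
    }
}
```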
2 changes: 1 addition & 1 deletion packages/@ourworldindata/types/src/index.ts
@@ -349,7 +349,7 @@ export {
type UnformattedSpan,
} from "./gdocTypes/Spans.js"

export type { GDriveImageMetadata, ImageMetadata } from "./gdocTypes/Image.js"
export type { ImageMetadata } from "./gdocTypes/Image.js"
export {
ALL_CHARTS_ID,
LICENSE_ID,
54 changes: 4 additions & 50 deletions site/README.md
@@ -22,65 +22,19 @@ A Google Doc can be written and registered via the `/admin/gdocs` view in the ad

This content is only updated in an environment's database when someone presses "publish" from the Google Doc preview (`/admin/gdocs/google_doc_id/preview`)

## Images in Google Docs
## Images

To match Google Docs' "one document, many environments" paradigm, the source of images for all environments is a Shared Drive. An image is referenced in Archie by filename, which we use to find the file via Google Drive's API.

e.g.
Image blocks can be added to gdocs via the following Archie syntax:

```
{.image}
filename: my_image.png
{}
```

This means that the filenames of images uploaded to the Shared Drive **must be unique**.

We chose to do it this way instead of via Google Drive File ID because it's easier to read and sanity check. We also considered inline images, but Google Docs doesn't support inline SVGs and downsizes images wider than 1600px.

We mirror these images to Cloudflare's R2 to allow environments to have some amount of independence from one another. For OWID developers, the env variables needed for this functionality are stored in our password manager.

It is recommended to use a unique folder in R2 for each environment. By convention, this is `dev-$NAME` for your local development server (when you run `make create-if-missing.env.full` for the first time, it will be generated from your unix `$USER` variable by default), plus one folder for your staging server (e.g. `neurath`).

### Baking images

During the baking process (`bakeDriveImages`) we do the following (see the sketch after this list):

1. Find the filenames of all the images that are currently referenced in published Google documents (the `posts_gdocs_x_images` table stores this data)
2. See if we've already uploaded them to R2 (by checking the `images` table, which is only updated after we've successfully mirrored the image to R2)
3. Mirror them to R2 if not
4. Pull all the images from R2
5. Create optimized WEBP versions of each image at multiple resolutions
6. Save them, and the original file, into the assets folder
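A rough sketch of those six steps, assuming hypothetical helper names rather than the actual implementation:

```
// Rough sketch of the old bakeDriveImages flow described in the list above.
// Every name and signature here is a hypothetical stand-in, not the real code.
interface ImageBakingDeps {
    getReferencedImageFilenames: () => Promise<string[]> // step 1: posts_gdocs_x_images
    isAlreadyMirrored: (filename: string) => Promise<boolean> // step 2: check the images table
    mirrorFromDriveToR2: (filename: string) => Promise<void> // step 3
    downloadFromR2: (filename: string) => Promise<Uint8Array> // step 4
    makeWebpVariants: (image: Uint8Array, widths: number[]) => Promise<Uint8Array[]> // step 5
    writeToAssets: (filename: string, original: Uint8Array, variants: Uint8Array[]) => Promise<void> // step 6
}

async function bakeDriveImagesSketch(deps: ImageBakingDeps): Promise<void> {
    const filenames = await deps.getReferencedImageFilenames()
    for (const filename of filenames) {
        if (!(await deps.isAlreadyMirrored(filename))) {
            await deps.mirrorFromDriveToR2(filename)
        }
        const original = await deps.downloadFromR2(filename)
        const variants = await deps.makeWebpVariants(original, [350, 850, 1350]) // widths are illustrative
        await deps.writeToAssets(filename, original, variants)
    }
}
```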

### Previewing images

The preview flow is slightly different. If a document with an image in it is previewed, we also fetch the image from the Shared Drive and upload it to R2, but we don't do the resizing - we just display the source image via R2's CDN. This logic is all contained in the [Image component](gdocs/Image.tsx).
where `my_image.png` is an image that has been uploaded via the `/admin/images` view in the admin client, and thus exists in Cloudflare Images.

### Gotchas

#### Updating images

If an image has changed since we last uploaded it to R2 (e.g. a new version has been uploaded, or its description has changed), we'll re-upload the file. This happens even if you're only previewing a document that references the image, regardless of whether or not you re-publish it.

This means that any other documents that reference the image will use the updated version during the next bake, even if they haven't been republished. This seemed preferable to tracking version state and having to manually update every article whenever you update an image.
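A minimal sketch of that "has the image changed?" check, with hypothetical field names:

```
// Hypothetical sketch of the re-upload check described above; field names are assumptions.
interface MirroredImageRecord {
    filename: string
    updatedAt: number // when we last mirrored the file to R2, as epoch milliseconds
    defaultAlt: string
}

function needsReupload(
    stored: MirroredImageRecord | undefined,
    driveModifiedTime: string,
    driveDescription: string
): boolean {
    if (!stored) return true // never mirrored before
    const driveUpdatedAt = new Date(driveModifiedTime).getTime()
    // a newer file or a changed description (used as alt text) triggers a re-upload
    return driveUpdatedAt > stored.updatedAt || driveDescription !== stored.defaultAlt
}
```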

#### Refreshing a database

If you are refreshing your environment's database by importing a database dump from prod, the prod `images` table may make claims about the existence of files in your environment's S3 folder that aren't true, which will lead to 403 errors when trying to bake.

In this project's root Makefile, we have a make command (`make sync-images`) that runs `rclone sync` from prod to your environment to solve this problem. Make sure your `~/.config/rclone/rclone.conf` is configured correctly and contains

```
[owid-r2]
type = s3
provider = Cloudflare
env_auth = true
access_key_id = xxx
secret_access_key = xxx
region = auto
endpoint = https://078fcdfed9955087315dd86792e71a7e.r2.cloudflarestorage.com
```
We store each image's dimensions and alt text in the database; this metadata is shared via React context with any component that needs to render the image. See `Image.tsx` for the (many) implementation details.
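As a minimal sketch of how a component might consume that metadata (the context shape, prop names, and image URL below are assumptions, not the actual `Image.tsx` API):

```
// Minimal sketch only — the context shape, prop names, and image URL are assumptions.
import React, { createContext, useContext } from "react"

interface ImageMetadataSketch {
    filename: string
    defaultAlt: string
    originalWidth: number
    originalHeight: number
}

const ImageMetadataContext = createContext<Record<string, ImageMetadataSketch>>({})

export function SketchImage({ filename }: { filename: string }) {
    const metadata = useContext(ImageMetadataContext)[filename]
    if (!metadata) return null // unknown image: render nothing rather than a broken tag
    return (
        <img
            src={`/images/${filename}`} // assumption: a static path; the real URL comes from Cloudflare Images
            alt={metadata.defaultAlt}
            width={metadata.originalWidth}
            height={metadata.originalHeight}
        />
    )
}
```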

## Data Catalog

