Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(assets): Use entity-tags to revalidate cached remote images #12426

Merged
merged 12 commits into from
Dec 18, 2024
Merged
5 changes: 5 additions & 0 deletions .changeset/red-poems-pay.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
---
'astro': minor
---

Improve asset caching allowing stale assets to be revalidated with entity tags
ematipico marked this conversation as resolved.
Show resolved Hide resolved
76 changes: 54 additions & 22 deletions packages/astro/src/assets/build/generate.ts
Original file line number Diff line number Diff line change
Expand Up @@ -16,18 +16,18 @@ import { getConfiguredImageService } from '../internal.js';
import type { LocalImageService } from '../services/service.js';
import type { AssetsGlobalStaticImagesList, ImageMetadata, ImageTransform } from '../types.js';
import { isESMImportedImage } from '../utils/imageKind.js';
import { type RemoteCacheEntry, loadRemoteImage } from './remote.js';
import { type RemoteCacheEntry, loadRemoteImage, revalidateRemoteImage } from './remote.js';

interface GenerationDataUncached {
cached: false;
cached: CacheStatus.Miss;
weight: {
before: number;
after: number;
};
}

interface GenerationDataCached {
cached: true;
cached: CacheStatus.Revalidated | CacheStatus.Hit;
ematipico marked this conversation as resolved.
Show resolved Hide resolved
}

type GenerationData = GenerationDataUncached | GenerationDataCached;
Expand All @@ -44,7 +44,17 @@ type AssetEnv = {
assetsFolder: AstroConfig['build']['assets'];
};

type ImageData = { data: Uint8Array; expires: number };
type ImageData = {
data: Uint8Array;
expires: number;
etag?: string | null;
};

enum CacheStatus {
Miss = 0,
Revalidated,
Hit,
}

oliverlynch marked this conversation as resolved.
Show resolved Hide resolved
export async function prepareAssetsGenerationEnv(
pipeline: BuildPipeline,
Expand Down Expand Up @@ -136,7 +146,9 @@ export async function generateImagesForPath(
const timeChange = getTimeStat(timeStart, timeEnd);
const timeIncrease = `(+${timeChange})`;
const statsText = generationData.cached
? `(reused cache entry)`
? generationData.cached === CacheStatus.Hit
? `(reused cache entry)`
: `(revalidated cache entry)`
: `(before: ${generationData.weight.before}kB, after: ${generationData.weight.after}kB)`;
const count = `(${env.count.current}/${env.count.total})`;
env.logger.info(
Expand Down Expand Up @@ -166,7 +178,7 @@ export async function generateImagesForPath(
await fs.promises.copyFile(cachedFileURL, finalFileURL, fs.constants.COPYFILE_FICLONE);

return {
cached: true,
cached: CacheStatus.Hit,
};
} else {
const JSONData = JSON.parse(readFileSync(cachedFileURL, 'utf-8')) as RemoteCacheEntry;
Expand All @@ -184,11 +196,29 @@ export async function generateImagesForPath(
await fs.promises.writeFile(finalFileURL, Buffer.from(JSONData.data, 'base64'));

return {
cached: true,
cached: CacheStatus.Hit,
};
} else {
await fs.promises.unlink(cachedFileURL);
}

// Try to revalidate the cache
if (JSONData.etag) {
const revalidatedData = await revalidateRemoteImage(options.src as string, JSONData.etag);
oliverlynch marked this conversation as resolved.
Show resolved Hide resolved

if (revalidatedData.data.length) {
// Image cache was stale, update original image to avoid redownload
originalImage = revalidatedData;
} else {
revalidatedData.data = Buffer.from(JSONData.data, 'base64');
oliverlynch marked this conversation as resolved.
Show resolved Hide resolved

// Freshen cache on disk
await writeRemoteCacheFile(cachedFileURL, revalidatedData);

await fs.promises.writeFile(finalFileURL, revalidatedData.data);
return { cached: CacheStatus.Revalidated };
}
}

await fs.promises.unlink(cachedFileURL);
}
} catch (e: any) {
if (e.code !== 'ENOENT') {
Expand All @@ -209,6 +239,7 @@ export async function generateImagesForPath(
let resultData: Partial<ImageData> = {
data: undefined,
expires: originalImage.expires,
etag: originalImage.etag,
};

const imageService = (await getConfiguredImageService()) as LocalImageService;
Expand Down Expand Up @@ -239,13 +270,7 @@ export async function generateImagesForPath(
if (isLocalImage) {
await fs.promises.writeFile(cachedFileURL, resultData.data);
} else {
await fs.promises.writeFile(
cachedFileURL,
JSON.stringify({
data: Buffer.from(resultData.data).toString('base64'),
expires: resultData.expires,
}),
);
await writeRemoteCacheFile(cachedFileURL, resultData as ImageData);
}
}
} catch (e) {
Expand All @@ -259,7 +284,7 @@ export async function generateImagesForPath(
}

return {
cached: false,
cached: CacheStatus.Miss,
weight: {
// Divide by 1024 to get size in kilobytes
before: Math.trunc(originalImage.data.byteLength / 1024),
Expand All @@ -269,6 +294,17 @@ export async function generateImagesForPath(
}
}

async function writeRemoteCacheFile(cachedFileURL: URL, resultData: ImageData) {
return await fs.promises.writeFile(
cachedFileURL,
JSON.stringify({
data: Buffer.from(resultData.data).toString('base64'),
expires: resultData.expires,
etag: resultData.etag,
}),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSON.stringify and Buffer.from can be both fail at runtime, we should handle the errors somehow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've wrapped this in a try catch block which will print a warning and no-op, rather than throwing an Error. The asset should just skip the cache in this case.

);
}

export function getStaticImageList(): AssetsGlobalStaticImagesList {
if (!globalThis?.astroAsset?.staticImages) {
return new Map();
Expand All @@ -279,11 +315,7 @@ export function getStaticImageList(): AssetsGlobalStaticImagesList {

async function loadImage(path: string, env: AssetEnv): Promise<ImageData> {
if (isRemotePath(path)) {
const remoteImage = await loadRemoteImage(path);
return {
data: remoteImage.data,
expires: remoteImage.expires,
};
Comment on lines -282 to -286
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea what happened here, but thank you for cleaning it, ha

return await loadRemoteImage(path);
}

return {
Expand Down
36 changes: 35 additions & 1 deletion packages/astro/src/assets/build/remote.ts
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
import CachePolicy from 'http-cache-semantics';

export type RemoteCacheEntry = { data: string; expires: number };
export type RemoteCacheEntry = { data: string; expires: number; etag?: string };

export async function loadRemoteImage(src: string) {
const req = new Request(src);
Expand All @@ -19,6 +19,40 @@ export async function loadRemoteImage(src: string) {
return {
data: Buffer.from(await res.arrayBuffer()),
expires: Date.now() + expires,
etag: res.headers.get('Etag'),
};
}

export async function revalidateRemoteImage(src: string, etag: string) {
oliverlynch marked this conversation as resolved.
Show resolved Hide resolved
const req = new Request(src, { headers: { 'If-None-Match': etag } });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be worth also supporting If-Modified-Since, and storing the Last-Modified or Date for image responses that don't include an etag.

const res = await fetch(req);

if (!res.ok && res.status != 304) {
oliverlynch marked this conversation as resolved.
Show resolved Hide resolved
throw new Error(
`Failed to revalidate cached remote image ${src}. The request did not return a 200 OK / 304 NOT MODIFIED response. (received ${res.status}))`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error isn't actionable. If a user sees that, they don't know what to do in order to fix it. We should provide a better error that gives the user an action.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this is a bad error, however this is handled the same way in the existing loadRemoteImage function I based the revalidateRemoteImage function off of, which provides similarly un-actionable errors.

I can potentially handle common cases like 404 with custom messages although there are many status codes where it would be difficult to give any actionable advice, like 5xx errors.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for providing more context. Looking at the current use case, we don't need to throw an error because we are attempting to revalidate the image, so I assume we already have the image.

Maybe we could log a warning instead—for any non-200 status code—and inform the user that Astro couldn't revalidate the image and will use the existing one (we could add more info e.g. status code). What do you think?

Copy link
Contributor Author

@oliverlynch oliverlynch Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah a warning would be a better solution. I've added the status text (e.g. NOT FOUND/FORBIDDEN) to the error, which should make it more understandable without needing to look up the status code first.

I've added a try catch block in generate.ts to handle this error and errors from Request itself, which should fall back to using the stale cache and print a warning in the log.

Example warning:

[WARN] An error was encountered while revalidating a cached remote asset. Proceeding with stale cache. Error: Failed to revalidate cached remote image https://<domain>/test.jpg. The request did not return a 200 OK / 304 NOT MODIFIED response. (received 403 Forbidden)
  ▶ /_astro/test_Z2qHSPG.webp (reused cache entry) (+295ms) (31/33)

);
}

const data = Buffer.from(await res.arrayBuffer());

if (res.ok && !data.length) {
// Server did not include body but indicated cache was stale
return await loadRemoteImage(src);
}

// calculate an expiration date based on the response's TTL
const policy = new CachePolicy(
webToCachePolicyRequest(req),
webToCachePolicyResponse(
res.ok ? res : new Response(null, { status: 200, headers: res.headers }),
), // 304 responses themselves are not cachable, so just pretend to get the refreshed TTL
);
const expires = policy.storable() ? policy.timeToLive() : 0;

return {
data,
expires: Date.now() + expires,
etag: res.headers.get('Etag'),
};
}

Expand Down
Loading