Content Layer #982

Merged
merged 14 commits into from
Sep 17, 2024

Conversation

ascorbic
Contributor

@ascorbic ascorbic commented Jul 23, 2024

Summary

Creates a successor to content collections with expanded use cases, including remote data sources and improved performance.

// src/content/config.ts
import { defineCollection, z } from "astro:content";
import { glob, file } from "astro/loaders";
import { feedLoader } from "@ascorbic/feed-loader";

// Loaders can be defined inline 
const countries = defineCollection({
  loader: async () => {
    const response = await fetch("https://restcountries.com/v3.1/all");
    const data = await response.json();
    // Must return an array of entries with an id property, or an object with IDs as keys and entries as values
    return data.map((country) => ({
      id: country.cca3,
      ...country,
    }));
  },
});

// Loaders can also be distributed as packages
const podcasts = defineCollection({
  loader: feedLoader({
    url: "https://feeds.99percentinvisible.org/99percentinvisible",
  }),
});

// The `glob()` loader loads multiple files, with one entry per file
const spacecraft = defineCollection({
  loader: glob({ pattern: "*.md", base: "src/data/spacecraft" }),
  // A schema is optional, but provides validation and type safety for data.
  // It can also be used to transform data before it is stored.
  schema: ({ image }) =>
    z.object({
      title: z.string(),
      description: z.string(),
      heroImage: image().optional(),
    }),
});

// The `file()` loader loads multiple entries from one file
const dogs = defineCollection({
  loader: file("src/data/dogs.json"),
  schema: z.object({
    id: z.string(),
    breed: z.string(),
    temperament: z.array(z.string()),
  }),
});

export const collections = { spacecraft, dogs, countries, podcasts };
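
The inline loader comment above names two accepted return shapes: an array of entries with an `id`, or an object keyed by ID. As a rough illustration only, both shapes can be reduced to a single list of entries. The `normalizeEntries` helper and the `RawEntry` and `LoaderResult` types are hypothetical names used for this sketch, not Astro internals.

```typescript
// Hypothetical helper, not an Astro API: accepts either return shape an inline
// loader may produce and normalizes it to an array of entries with an `id`.
type RawEntry = Record<string, unknown>;
type LoaderResult = Array<RawEntry & { id: string }> | Record<string, RawEntry>;

function normalizeEntries(result: LoaderResult): Array<RawEntry & { id: string }> {
  if (Array.isArray(result)) {
    // Array form: every entry must already carry its own string id.
    for (const entry of result) {
      if (typeof entry.id !== "string") {
        throw new Error("Every entry in an array result needs a string `id`");
      }
    }
    return result;
  }
  // Object form: the key becomes the entry's id.
  return Object.entries(result).map(([id, entry]) => ({ ...entry, id }));
}
```

Under these assumptions, `normalizeEntries({ NZL: { name: "New Zealand" } })` and `normalizeEntries([{ id: "NZL", name: "New Zealand" }])` describe the same single entry.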

Links

@ascorbic ascorbic changed the title from "Initial commit" to "Content Layer" Jul 23, 2024
@ascorbic ascorbic requested a review from ematipico July 25, 2024 08:38
@ascorbic ascorbic marked this pull request as ready for review August 8, 2024 12:45
@ascorbic ascorbic mentioned this pull request Aug 8, 2024
@krishna-santosh

@ascorbic can we define a schema when fetching data from a remote URL?

@ascorbic
Contributor Author

@krishna-santosh yes. You can either define it at the top level (inside the defineCollection object) in the same way as now, or a loader can define it. In that case the loader can generate it dynamically, e.g. by introspecting the API.

@simonswiss

Multiple entries per file is great — but how about multiple files per entry?

There are many cases where an entry has a heroIntro paragraph which wants to be a rich text field, as well as the main entry body.

The ability to store multiple files as part of the same entry would be great!

Alternatively, storing other Markdown (Markdoc/MDX) fields in the frontmatter (we do this with Keystatic's markdoc.inline field) and providing a similar .render() method on those would be awesome.

Apologies if this is not the right place for this feedback 🤗

@joelvarty

Let me know if this is the wrong place for this, but I wanted to provide feedback on what I feel would be important for the success of this layer from a CMS perspective:

Dependency Tracking

  • Being able to track dependencies automatically is VERY useful if I have content that is being used in multiple places and I invalidate it.
  • Being able to incrementally update the data layer should work hand in hand with that dependency checking.

Function Support

  • As I understand it, this layer will exist as files somewhere. How will that work in runtimes that don't allow file system access?

@ascorbic
Contributor Author

@joelvarty yes, we'd love to do dependency tracking at some point. We are in a good place to do that, but it's not in scope for this version.

The data is read-only at runtime, and even though it's implemented as files on disk it's loaded as a virtual module, so it's all handled by rollup and works on serverless runtimes. I haven't tested it with Cloudflare yet though, so we'll need to ensure that works.

@joelvarty

@ascorbic It sounds like we would need to do a build to get new data into the project. Am I correct in that?

@ascorbic
Contributor Author

@joelvarty Yes, there's no way to update a deployed site without a build. Locally you can sync by running astro build or astro sync, or, when running astro dev, by using the s + enter hotkey.

@Suven

Suven commented Aug 14, 2024

@ascorbic there is no chance of exporting globalContentLayer, right? I am generally doing SSG, but I'm currently using an instance of npm run dev for live previews of the CMS. If I had access to the global instance, I could do some trickery like resyncing every n page views or every X seconds.

@ascorbic
Contributor Author

@Suven No (and please don't try: you will break things), but I will be exposing a refresh method in astro:server:start for integrations to use

@ascorbic
Contributor Author

ascorbic commented Aug 16, 2024

I have a proposed addition to the RFC, which I'd welcome feedback on.

Integration support for content layer

This is a proposal to add support for syncing the content layer to integrations. It would allow integrations to trigger a sync during dev, optionally of just certain loaders. It would also allow them to pass metadata, such as a webhook body, to the loader.

Use cases

  • Adding a sync button to the Astro toolbar
  • Opening a web socket to a CMS that listens for updates
  • Creating a webhook URL that can be tunnelled to a public address for CMSs to trigger
  • Allowing e.g. code sandboxes or hosted dev servers to trigger reloads

API

Adds a syncContent function to the astro:server:setup hook options, with the following signature:

async syncContent(options: { 
   loaders?: Array<string>,
   context?: Record<string, any>
})

loaders is an optional array of loader names. If set, only those loaders will be synced. This allows integrations to selectively sync their own content.

context is an optional object with arbitrary data that is passed to the loader's load function as syncContext.
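
To make the two options concrete, here is a minimal sketch of how a sync call could filter loaders by name and hand `context` through as `syncContext`. Only the `loaders` and `context` option names and the `syncContext` argument come from the proposal; the `NamedLoader` and `SyncOptions` shapes and the `selectLoaders` helper are assumptions made for illustration, and the real implementation may differ.

```typescript
// Hypothetical shapes for illustration; only `loaders`, `context` and
// `syncContext` are names taken from the proposal.
type SyncContextData = Record<string, unknown>;

interface NamedLoader {
  name: string;
  load(args: { syncContext?: SyncContextData }): Promise<void>;
}

interface SyncOptions {
  loaders?: string[];
  context?: SyncContextData;
}

// Pick the loaders to run: all of them, or only those named in `options.loaders`.
function selectLoaders(all: NamedLoader[], options: SyncOptions = {}): NamedLoader[] {
  const wanted = options.loaders;
  return wanted ? all.filter((loader) => wanted.includes(loader.name)) : all;
}

// Run the selected loaders in order, passing the caller's context as `syncContext`.
// Resolves to the names of the loaders that ran.
async function syncContent(all: NamedLoader[], options: SyncOptions = {}): Promise<string[]> {
  const selected = selectLoaders(all, options);
  for (const loader of selected) {
    await loader.load({ syncContext: options.context });
  }
  return selected.map((loader) => loader.name);
}
```

With this sketch, calling `syncContent(allLoaders, { loaders: ["my-loader"], context: { webhookBody } })` would run only the loader named `my-loader`.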

Usage

This shows an integration that creates a refresh webhook endpoint:

export default function() {
    return {
        name: '@astrojs/my-integration',
        hooks: {
            'astro:server:setup': async ({ server, refreshContent }) => {
                server.middlewares.use('/_refresh', async (req, res) => {
                    if (req.method !== 'POST') {
                        res.statusCode = 405;
                        res.end('Method Not Allowed');
                        return;
                    }
                    let body = '';
                    req.on('data', chunk => {
                        body += chunk.toString();
                    });
                    req.on('end', async () => {
                        try {
                            const webhookBody = JSON.parse(body);
                            await refreshContent({
                                // Include the parsed request body. `webhookBody` is an arbitrary name
                                context: { webhookBody },
                                // Only refresh a particular loader
                                loaders: ['my-loader']
                            });
                            res.writeHead(200, { 'Content-Type': 'application/json' });
                            res.end(JSON.stringify({ message: 'Content refreshed successfully' }));
                        } catch (error) {
                            res.writeHead(500, { 'Content-Type': 'application/json' });
                            res.end(JSON.stringify({ error: 'Failed to refresh content' }));
                        }
                    });
                });
            }
        }
    }
}

Inside the loader:

import { type Loader } from "astro/loaders"
export function myLoader(): Loader {
  return {
    name: "my-loader",
    load: async ({ store, logger, syncContext, meta }) => {
      if(syncContext?.webhookBody?.action) {
        logger.info("Received incoming webhook")
        // do something with the webhook body
      }
      // this is a normal sync...
    }
  }
}

Questions

How should we handle the case where a sync is already in progress? Should it be queued? Should it be skipped?

@matthewp
Contributor

Outside of you using refreshContent in one of the examples (which I kind of like better...) I think the idea sounds reasonable and astro:server:setup is the place to do it.

@Suven

Suven commented Aug 19, 2024

That proposal sounds great! Regarding your question: I guess triggering a sync while syncing is very likely in CMS preview contexts, as the user is continuously making changes. In those cases, I guess the user is only interested in the latest version of their change, and thus cancelling the current sync and starting a new one would make sense.
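
As a rough illustration of the "latest wins" behaviour suggested here, the sketch below gives each sync a generation number and lets a running sync check whether a newer one has started before it commits anything. This is only one possible policy, not what the RFC specifies; `createLatestWinsSync`, `RefreshContext` and `RefreshTask` are hypothetical names.

```typescript
// Hypothetical helper, not part of Astro: only the most recently started sync
// is considered current; older ones can detect that they are stale and bail out.
type RefreshContext = Record<string, unknown>;
type RefreshTask = (context: RefreshContext, isStale: () => boolean) => Promise<void>;

function createLatestWinsSync(task: RefreshTask) {
  let generation = 0;
  return async function sync(context: RefreshContext = {}): Promise<boolean> {
    const mine = ++generation;
    const isStale = () => mine !== generation;
    await task(context, isStale);
    // Resolves to true only if no newer sync started while this one ran.
    return !isStale();
  };
}

// Usage: the task checks `isStale()` after its slow step and skips the commit
// when a newer sync has superseded it.
const applied: string[] = [];
const sync = createLatestWinsSync(async (context, isStale) => {
  await Promise.resolve(); // stand-in for a slow fetch from a CMS
  if (isStale()) return;
  applied.push(String(context.version));
});
```

A stale sync still runs to its next `isStale()` check, so this is cooperative cancellation rather than a hard abort; queuing or skipping would be equally valid answers to the question.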

@NotWoods

Is it possible to set up deferred rendering by passing a function/promise to rendered? The current deferredRender property seems like it's designed around local files, but loaders that pull from a third-party API could get a performance boost by just fetching metadata then fetching the HTML only when needed.

Did you have anything in particular in mind? The deferred renderers are designed around virtual modules, so you can pull your content from them?

The virtual module API is a lot of overhead if you're not working with files on a file system. I'm writing a loader for the Notion REST API and it feels weird to make a virtual module per API call. After chatting with folks on the Discord I get the impression virtual modules aren't designed for this use case.

From an ease-of-loader-implementation perspective a function that can use dynamic imports feels much simpler. I don't know if that makes it harder to optimize the file loader use case.
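
To show what the function-based approach requested here might look like from a loader author's side, below is a small sketch in which metadata is available immediately and the HTML is fetched once, on first use. This is not an existing Content Layer feature; `LazyEntry`, `lazy`, `fetchPageHtml` and the `page` entry are all hypothetical.

```typescript
// Hypothetical sketch of the idea: `rendered` as a lazy, memoized function.
// None of these names are Astro APIs.
interface LazyEntry {
  id: string;
  data: Record<string, unknown>;
  rendered: () => Promise<{ html: string }>;
}

// Wrap a fetcher so the expensive call happens at most once, on first use.
function lazy<T>(fetcher: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (!cached) cached = fetcher();
    return cached;
  };
}

// Stand-in for a remote API call; counts how often the body is requested.
let htmlFetches = 0;
async function fetchPageHtml(id: string): Promise<{ html: string }> {
  htmlFetches += 1;
  return { html: `<p>Body of ${id}</p>` };
}

const page: LazyEntry = {
  id: "page-1",
  data: { title: "Example" }, // metadata is available immediately
  rendered: lazy(() => fetchPageHtml("page-1")), // HTML is fetched only when needed
};
```

Repeated calls to `page.rendered()` share one underlying request, which is the performance benefit described for API-backed loaders.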

@yeehaa123

Maybe I'm doing something wrong, but references don't seem to work with this API. Is this correct? Is this something that will be implemented down the line? Is there any way I can help out?

@twodft

twodft commented Aug 22, 2024

I'm wondering whether render also works for remote MDX. If not, how can I implement my own loader to correctly render our remote MDX content from APIs?

@ascorbic
Contributor Author

Maybe I'm doing something wrong, but references don't seem to work with this API. Is this correct? Is this something that will be implemented down the line? Is there any way I can help out?

They should work in the same way as existing collections. If you can't get them working, can you ask in Discord?

@matthewp
Contributor

@ascorbic what happens if there's a conflict between the schema the loader is providing and the schema the user provides in defineCollection?

@ascorbic
Contributor Author

@matthewp the user-defined config will override any loader-defined one
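
The precedence rule stated here can be sketched as a small resolver: use the collection's own schema when present, otherwise fall back to whatever the loader provides. The `SchemaLike`, `LoaderWithSchema` and `CollectionLike` interfaces and the `resolveSchema` function are stand-ins invented for this example (a real schema would be a Zod schema), and this is not the actual Astro implementation.

```typescript
// Stand-in for a Zod schema: anything that can validate unknown input.
interface SchemaLike {
  parse(input: unknown): unknown;
}

// Hypothetical shapes: a loader may provide a schema directly or via a function.
interface LoaderWithSchema {
  name: string;
  schema?: SchemaLike | (() => SchemaLike);
}

interface CollectionLike {
  loader: LoaderWithSchema;
  schema?: SchemaLike;
}

// The user-defined schema wins; the loader's schema is only a fallback.
function resolveSchema(collection: CollectionLike): SchemaLike | undefined {
  if (collection.schema) return collection.schema;
  const fromLoader = collection.loader.schema;
  return typeof fromLoader === "function" ? fromLoader() : fromLoader;
}
```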

@ascorbic
Contributor Author

I've added a section on integration support to the RFC, and have a PR with an implementation.

@HiDeoo
Member

HiDeoo commented Aug 30, 2024

Tiny feedback on the Content Layer loader API, not sure if it's on purpose or not in this context: if you throw an AstroError with a hint in a loader, the hint is never displayed to the user. I think it would be nice to display it, e.g. to help users with obvious configuration mistakes, etc.
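
For illustration, the behaviour being asked for could look roughly like the sketch below, where a formatter appends the hint whenever the thrown error carries one. `LoaderConfigError` is a local stand-in class rather than Astro's `AstroError`, and `formatLoaderError` is a hypothetical function; how Astro actually reports loader errors is not shown here.

```typescript
// Stand-in for an error type that carries a hint; this is not Astro's `AstroError`.
class LoaderConfigError extends Error {
  hint?: string;
  constructor(message: string, hint?: string) {
    super(message);
    this.name = "LoaderConfigError";
    this.hint = hint;
  }
}

// Format an error for display, appending the hint when one is present.
function formatLoaderError(error: unknown): string {
  if (!(error instanceof Error)) return String(error);
  const base = `${error.name}: ${error.message}`;
  const hint = (error as { hint?: unknown }).hint;
  return typeof hint === "string" && hint.length > 0 ? `${base}\n  Hint: ${hint}` : base;
}
```

A loader could then throw `new LoaderConfigError("Missing url option", "Pass url to the loader")` and have the second argument shown alongside the message.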

@ascorbic
Contributor Author

ascorbic commented Sep 4, 2024

I've made a few small changes to the RFC, with the only API change being that the type for the data store is now DataStore. We're now ready for a call for consensus, with a goal to ship this as stable in 5.0. If you have any final comments on the RFC please make them here now. This process will last for a minimum of three days before this PR is merged. Please make any bug reports in the main astro repo.

There will be follow-up RFCs for future features, particularly the libSQL backend.

Co-authored-by: Erika <3019731+Princesseuh@users.noreply.github.com>
@jcayzac

jcayzac commented Sep 4, 2024

Some user with a lot of content recently asked about it on Discord, if I recall correctly: how about making the loaders generator functions that yield entries, rather than async functions that have to return entire collections?

Edit: actually, anything that returns an AsyncIterable would do the trick?
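
A minimal sketch of the idea, assuming a loader could return an `AsyncIterable` of entries: the generator yields entries page by page, and a consumer stores each one as it arrives instead of waiting for the whole collection. `StreamedEntry`, `pagedLoader` and `collectInto` are hypothetical names, and the `Map` stands in for the data store; this is not part of the Content Layer API described in the RFC.

```typescript
// Hypothetical sketch: a loader written as an async generator, plus a consumer
// that stores entries as they arrive. Neither is part of the Content Layer API.
interface StreamedEntry {
  id: string;
  data: Record<string, unknown>;
}

// Yields entries one page at a time, so the full collection never has to be
// held in a single array by the loader.
async function* pagedLoader(pages: StreamedEntry[][]): AsyncGenerator<StreamedEntry> {
  for (const page of pages) {
    // A real loader would await a network request for each page here.
    await Promise.resolve();
    for (const entry of page) yield entry;
  }
}

// Consumes any AsyncIterable of entries and writes each one into the store.
async function collectInto(
  store: Map<string, StreamedEntry>,
  source: AsyncIterable<StreamedEntry>,
): Promise<number> {
  let count = 0;
  for await (const entry of source) {
    store.set(entry.id, entry);
    count += 1;
  }
  return count;
}
```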

@hfournier

The comment on the built-in file loader example is:

// The file loader loads a single file which contains multiple entries. The path is relative to the project root, or an absolute path.
// The data must be an array of objects, each with a unique `id` property, or an object with IDs as keys and entries as values.

So a data structure that may already have some other unique identifier must also have an id property, which seems redundant. It results in data that looks like this:

[
  {
    id: 'abc',
    data: {
      id: 'abc',
      myUniqueId: 'abc',
      name: 'Abc'
    },
    filePath: 'src/myFolder/myData.json',
    collection: 'myData'
  },
  ...

with 3 properties with the same value.
Would it be possible to add an optional 2nd param to the file() loader that indicates which property to use as the id?
This would eliminate one redundant property and not require existing json files to be altered with an additional id for each entry.
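
Until something like that exists, one possible workaround is to re-key the records in an inline loader, using the "object with IDs as keys" shape that the loader comment already allows. The sketch below assumes that shape is accepted as described; `keyBy` is a hypothetical helper, not part of `file()` or Astro.

```typescript
// Hypothetical workaround sketch: re-key existing records by a field other than
// `id`, producing an object with IDs as keys and entries as values.
function keyBy<T extends Record<string, unknown>>(
  records: T[],
  idField: keyof T & string,
): Record<string, T> {
  const byId: Record<string, T> = {};
  for (const record of records) {
    const id = record[idField];
    if (typeof id !== "string" || id === "") {
      throw new Error(`Record is missing a string "${idField}"`);
    }
    if (Object.prototype.hasOwnProperty.call(byId, id)) {
      throw new Error(`Duplicate id "${id}"`);
    }
    byId[id] = record;
  }
  return byId;
}
```

An inline loader could then read the JSON file itself and return `keyBy(records, "myUniqueId")`, leaving the existing file untouched.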

@ascorbic
Contributor Author

@hfournier that's an interesting idea. I'm not sure about having it as a second argument, but it is something I'll look at

@ascorbic
Contributor Author

This is now included in Astro 5 beta

@ascorbic ascorbic merged commit a1fbce2 into main Sep 17, 2024
@ascorbic ascorbic deleted the content-layer branch September 17, 2024 13:20
@ascorbic ascorbic restored the content-layer branch September 17, 2024 13:53
@jurajkapsz

Not sure if this is still the right place to give feedback (I followed a link from the docs, as Astro 5 beta recently came out), but I've tried out this API and my page styles were rendered broken afterwards. I have the latest Astro v4 release, v4.15.6, and I use MDX in content collections.

Does this have something to do with @astrojs/mdx, which the Astro 5 beta docs mention as v4, while with Astro v4 it's v2.3.1?

Looking forward to this API.

@ascorbic
Contributor Author

ascorbic commented Sep 17, 2024

@jurajkapsz can you open an issue on the main astro repo please. I'm investigating styles with MDX in content layer at the moment, and it would be helpful if you had a minimal reproduction of the problem

@jurajkapsz

@ascorbic OK, will do, atm I am doing some tests on my side to better understand what happened and to eventually write a more precise bug report.

I've noticed that what visually broke pages after switching to the new Content Layer API render was a different final order of processed CSS styles, which changed the style cascade, making e.g. CSS BEM modifiers unusable.

I happen to have certain style definitions in one component, and I modify the style definitions of that component in another component. I'd say it is correct to have it that way, but I'll give it a second thought; anyhow, it worked before, and the final styles were somehow in the correct order.

@withastro withastro locked as resolved and limited conversation to collaborators Sep 18, 2024