WORK IN PROGRESS Add a NodeID (or something) to content entities #6318

Closed
bep opened this issue Sep 9, 2019 · 5 comments

@bep
Member

bep commented Sep 9, 2019

This is work in progress

This ties directly into #6310 -- but is important enough to warrant its own discussion.

I will discuss 2 slightly related IDs in this proposal. The latter may only be of internal interest and possibly not exposed in any API, but time will show.

Content NodeID

Given this example content tree:

content
├── _content.py
├── _index.md
├── blog
│   ├── _content.py
│   ├── _index.md
│   ├── image.png
│   ├── post
│   │   └── index.md
│   └── post1.md
└── docs
    ├── _content.py
    └── _index.md

So, the different _content.py content adapters can, in general, produce any of:

  • New content pages, including bundles.
  • New images, data etc. resources
  • Pages, images etc. that complement other pages/bundles

I suggest something fairly simple that would fit nicely into a prefix tree for fast lookups:

/en/blog/index.branch
/en/blog/index.branch/image.png
/en/blog/post/index.leaf
/en/blog/post/index.leaf/image.png

The above has some nice features that we need:

  • We know that everything below /blog/post/index.leaf/ belongs to the same bundle, which means fast prefix scans and inserts.
  • We know that everything below /blog/index.branch belongs to the same section etc.
  • We know that everything below /en/ is English content.
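A minimal sketch of the prefix scan idea, assuming the IDs are kept in a plain sorted list instead of a real prefix tree. The `prefix_scan` helper and the `NODE_IDS` list are hypothetical and only illustrate the lookup; they are not Hugo code.

```python
import bisect

# Hypothetical node IDs, copied from the examples above.
NODE_IDS = sorted([
    "/en/blog/index.branch",
    "/en/blog/index.branch/image.png",
    "/en/blog/post/index.leaf",
    "/en/blog/post/index.leaf/image.png",
])


def prefix_scan(sorted_ids, prefix):
    """Return every ID in sorted_ids that starts with prefix."""
    # All IDs sharing a prefix sit next to each other in sorted order,
    # so we can jump to the first candidate and stop at the first miss.
    start = bisect.bisect_left(sorted_ids, prefix)
    matches = []
    for node_id in sorted_ids[start:]:
        if not node_id.startswith(prefix):
            break
        matches.append(node_id)
    return matches


bundle = prefix_scan(NODE_IDS, "/en/blog/post/index.leaf")
english = prefix_scan(NODE_IDS, "/en/")
```

Here `bundle` holds the leaf bundle and its image, and `english` holds all four IDs. A radix tree gives the same grouping with cheaper inserts.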

There are some open questions re language etc., but we'll sort out those details when that comes...

The above looks rather odd, and we can work on that, but this isn't something that you print on a T-shirt, so it doesn't have to be pretty.

The important part is that /blog/index.branch identifies the Blog section etc.

_content.py will then also produce pages/resources, potentially with the same IDs, and these will form some kind of composite relationship. We could probably create a formal/custom definition for this, but I suspect that the default order will be (1) content file, (2) content plugin, with resources from (2) complementing (1) (new files and metadata will be added). Logically:

/blog/post/index.leaf/0
/blog/post/index.leaf/1
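A rough sketch of how such a composite could be merged, assuming that "complement" means later contributions only add metadata keys and resources that earlier ones did not provide. That conflict rule, the `merge_contributions` name and the dict layout are all assumptions made for illustration.

```python
def merge_contributions(contributions):
    """Merge all contributions to one node ID in ascending index order.

    contributions maps an integer index (0 = content file, 1 = content
    plugin) to a dict with optional "metadata" and "resources" entries.
    """
    merged = {"metadata": {}, "resources": []}
    for index in sorted(contributions):
        part = contributions[index]
        # Assumption: earlier contributions win on conflicting metadata keys.
        for key, value in part.get("metadata", {}).items():
            merged["metadata"].setdefault(key, value)
        # New resources are appended; already known ones are kept as-is.
        for resource in part.get("resources", []):
            if resource not in merged["resources"]:
                merged["resources"].append(resource)
    return merged


post = merge_contributions({
    0: {"metadata": {"title": "Post"}, "resources": ["image.png"]},
    1: {"metadata": {"title": "From plugin", "tags": ["a"]},
        "resources": ["image.png", "extra.json"]},
})
```

With these inputs the content file keeps its title, while the plugin adds the `tags` key and the `extra.json` resource.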

Content SourceID

Today, when we get a change event for a content file, we do (slightly simplified):

  • Remove or replace the data from internal content tree starting from a given directory (we store the pages in a mutable Radix tree)
  • Rebuild/re-render site

That change event will be of the form /Users/bep/sites/mySite/themes/content/blog/article/sunset.jpg. Hugo has this clever concept of a big union file system, so that filename does not map directly to where the object lives in the content tree. Instead, we iterate over all the module paths until we find one with a matching prefix (/Users/bep/sites/mySite/themes), and then we walk up the directory tree to find the best directory to start walking from (blog/article if it's a bundle).
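The lookup described above could be sketched roughly like this. The `find_rebuild_dir` function, the `mount` parameter and the set of known bundle directories are hypothetical stand-ins, not Hugo's actual implementation.

```python
from pathlib import PurePosixPath


def find_rebuild_dir(filename, module_roots, bundle_dirs, mount="content"):
    """Map an absolute filename from a change event to a content directory.

    module_roots: absolute module paths, tried in order.
    bundle_dirs:  content-relative directories known to be bundles.
    Returns None if no module root matches.
    """
    path = PurePosixPath(filename)
    for root in module_roots:
        try:
            rel = path.relative_to(PurePosixPath(root) / mount)
        except ValueError:
            continue  # The file is not below this module's content mount.
        # Walk up from the file's directory to the nearest known bundle.
        for candidate in rel.parents:
            if str(candidate) in bundle_dirs:
                return str(candidate)
        # No bundle found: fall back to the file's own directory.
        return str(rel.parent)
    return None


start = find_rebuild_dir(
    "/Users/bep/sites/mySite/themes/content/blog/article/sunset.jpg",
    ["/Users/bep/sites/other", "/Users/bep/sites/mySite/themes"],
    {"blog/article"},
)
```

For the example event this yields `blog/article` as the directory to start walking from.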

This works, and is not particularly hard to reason about.

But the above falls short when we start composing content from multiple sources, and these sources are either not files or have filenames that do not map particularly well into the content tree.

The file /blog/_content.py may look something like this:

def GetFilenames():
	return ["data/content1.yaml", "data/content2.yaml"]

So, when we get a change event for /Users/bep/sites/mySite/themes/assets/data/content2.yaml we need a way to rebuild from the owning node(s) without having to rebuild the whole thing.
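One possible shape for this is a reverse index from source filename to the node IDs that depend on it. The sketch below is only an illustration: the `SourceIndex` class is hypothetical, and it assumes the absolute path from the change event has already been mapped to the same relative form that `GetFilenames()` returns.

```python
from collections import defaultdict


class SourceIndex:
    """Reverse index: source filename -> IDs of the nodes built from it."""

    def __init__(self):
        self._owners = defaultdict(set)

    def register(self, node_id, filenames):
        """Record that node_id depends on each of the given source files."""
        for filename in filenames:
            self._owners[filename].add(node_id)

    def owners(self, filename):
        """Return the node IDs to rebuild when filename changes."""
        return sorted(self._owners.get(filename, ()))


index = SourceIndex()
# What /blog/_content.py reported from GetFilenames().
index.register("/blog/index.branch", ["data/content1.yaml", "data/content2.yaml"])
# A second, hypothetical adapter that reads one of the same files.
index.register("/docs/index.branch", ["data/content2.yaml"])

to_rebuild = index.owners("data/content2.yaml")
```

A change to `data/content2.yaml` then touches both sections, while a change to `data/content1.yaml` touches only the blog.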

@bep bep added the Proposal label Sep 9, 2019
@bep bep added this to the v0.59 milestone Sep 9, 2019
@regisphilibert
Member

regisphilibert commented Sep 9, 2019

> The latter may only be of internal interest and possibly not exposed in any API, but time will show.

We'll eventually need this on the page context for partialCached variants.

@bep
Member Author

bep commented Sep 9, 2019

> We'll eventually need this on the page context for partialCached variants.

The first variant, yes, but not the "latter". I have added one more section about that one, but this is still a draft.

@moorereason
Contributor

> So, when we get a change event for /Users/bep/sites/mySite/themes/assets/data/content2.yaml

It's hard to stay out of the weeds here, but some observations. Correct my misconceptions.

  1. Multiple content plugins could be relying upon the same local datafile, so we need to maintain a list of listeners for each datafile. If true, does that change the SourceID idea?
  2. In this use case, the change event came from the local filesystem. (In other scenarios, change events may be hidden behind the plugin itself?)
  3. When a datafile is changed, the plugin will need a way to receive change events so that it can decide which output to re-render, if any.

@bep
Member Author

bep commented Sep 9, 2019

@moorereason I think you can call this a README-based implementation, and it's safe to say that even if the above has headlines and pretty formatting, these are early thoughts.

I can answer your questions more generally:

  • For file-based content (md, yaml etc.) we assume that the files are watched by Hugo (in /assets or /content).
  • We need to keep some kind of plugin registry.
  • For the plugins that tell us what files they want, this is straightforward.
  • For the plugins that read from a stream (http.Get "somejson"), this is not straightforward.

Some ideas for the "not straightforward" part:

  • Poll-based updates. The plugins will be responsible for implementing the check, but Hugo maintains the registry and triggers rebuilds whenever the plugins tell us.
  • Similar to the above: I know some CMSes support server-sent events.
  • Provide a web hook per plugin, i.e. https://mydevserver:1313/hooks/myplugin
  • Define an optional ShouldUpdate(IDs) plugin method....
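A minimal sketch of the poll-based variant, assuming that ShouldUpdate(IDs) receives the IDs a plugin owns and returns the subset that needs a rebuild. Those semantics, and the `PluginRegistry` class itself, are assumptions; nothing above pins them down.

```python
class PluginRegistry:
    """Keeps the registered plugins and asks each one what is stale."""

    def __init__(self):
        self._plugins = {}

    def register(self, name, ids, should_update):
        """Register a plugin, the node IDs it owns and its check callback."""
        self._plugins[name] = (list(ids), should_update)

    def poll_once(self):
        """Ask every plugin once and return the node IDs to rebuild."""
        stale = set()
        for ids, should_update in self._plugins.values():
            stale.update(should_update(ids))
        return sorted(stale)


registry = PluginRegistry()
# Hypothetical plugin that reports its leaf bundles as changed.
registry.register(
    "cms",
    ["/blog/post/index.leaf", "/blog/index.branch"],
    lambda ids: [i for i in ids if i.endswith(".leaf")],
)
# Hypothetical plugin with nothing to report.
registry.register("static", ["/docs/index.branch"], lambda ids: [])

stale_ids = registry.poll_once()
```

In server mode something like `poll_once` would run on a timer and feed the returned IDs into the rebuild; the web hook and server-sent events variants would push into the same path instead of being polled.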

Note that the above is mostly about the "server watch mode". There are some additional thoughts to be had around doing the full build effectively, keywords being date-based remote updates ("give me all updates since ..."), caching with hashes of the items to detect changes on a more fine-grained level, etc., which is also relevant in watch mode.

The above looks complex, and probably is. My goal is, however, to make it look simple to the end user. I have seen other SSGs where creating "source adapters" seems like a very daunting task.
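The hash-based part could look roughly like the sketch below. It assumes each remote item is JSON-serializable and has a stable ID; `item_hash` and `changed_ids` are hypothetical names, and deleted items are not handled.

```python
import hashlib
import json


def item_hash(item):
    """Return a stable SHA-256 digest of a JSON-serializable item."""
    payload = json.dumps(item, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def changed_ids(previous_hashes, items):
    """Return the IDs whose content differs from the cached hash.

    previous_hashes is updated in place so it can be reused on the next run.
    """
    changed = []
    for item_id, item in items.items():
        digest = item_hash(item)
        if previous_hashes.get(item_id) != digest:
            changed.append(item_id)
        previous_hashes[item_id] = digest
    return changed


cache = {}
first = changed_ids(cache, {"a": {"title": "One"}, "b": {"title": "Two"}})
second = changed_ids(cache, {"a": {"title": "One"}, "b": {"title": "Two!"}})
```

On the first run every item counts as changed; on the second run only `b` does.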

@bep bep modified the milestones: v0.59, v0.60 Oct 21, 2019
@bep bep modified the milestones: v0.60, v0.61 Nov 25, 2019
@bep bep closed this as completed Nov 28, 2019
@github-actions

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 12, 2022