Pages from data, take 5 #6310

Closed
bep opened this issue Sep 5, 2019 · 33 comments

@bep
Member

bep commented Sep 5, 2019

There are existing issues about this, but I prefer to start fresh when I have new ideas on a subject. I thought about this again when having my hair washed at the hairdresser today. Maybe the hair massage helped.

I think I have been too hung up on the technical challenges of this (remote adapters, how to effectively do partial updates etc.), making the whole issue too big to start with. What we have talked about earlier has also been "something different on the side of what we already have".

But what we have is:

  • A virtual /content directory that can be composed via Hugo Modules (with overrides on file level)
  • Virtual mounts support (you can mount any directory or file, even from remote GitHub repos into /content)
  • A front matter based metadata model with cascade keyword etc.
  • Partial server updates based on filesystem events.
  • ...

With that in mind, I thought about adding a new reserved filename in /content starting with _content.

Given the example below:

  • _content.json, _content.toml (and YAML) would be fairly straightforward, i.e. metadata + content. We should probably support a tree structure somehow, so you can build a complete content structure from one root _content file. We should probably support multiple files per directory so you could do _content_products.json etc.
  • _content.go would represent the dynamic content, some kind of content adapter, possibly remote. This is obviously the area with most open questions (_content.js would be a thought), but having the naming in place is a start.
content
├── _content.json
├── _index.md
├── blog
│   ├── _content.toml
│   ├── _index.md
│   ├── image.png
│   └── post1.md
└── docs
    ├── _content.go
    └── _index.md
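
To make the "complete content structure from one root _content file" idea concrete, here is a minimal Go sketch. The `Node` shape (`slug`, `title`, `children`) is purely an assumption for illustration; the proposal does not define a schema. It only shows how a nested file could be flattened into content paths.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path"
)

// Node is a hypothetical tree-shaped entry in a root _content.json:
// one page plus any nested child pages. Not a real Hugo type.
type Node struct {
	Slug     string `json:"slug"`
	Title    string `json:"title"`
	Children []Node `json:"children,omitempty"`
}

// flatten walks the tree depth-first and returns the content path
// of every node, parents before their children.
func flatten(parent string, nodes []Node) []string {
	var paths []string
	for _, n := range nodes {
		p := path.Join(parent, n.Slug)
		paths = append(paths, p)
		paths = append(paths, flatten(p, n.Children)...)
	}
	return paths
}

// pathsFromJSON decodes a tree-shaped document and flattens it.
func pathsFromJSON(doc string) ([]string, error) {
	var roots []Node
	if err := json.Unmarshal([]byte(doc), &roots); err != nil {
		return nil, err
	}
	return flatten("/", roots), nil
}

func main() {
	const doc = `[
	  {"slug": "blog", "title": "Blog", "children": [
	    {"slug": "post2", "title": "Post 2"}
	  ]},
	  {"slug": "docs", "title": "Docs"}
	]`
	paths, err := pathsFromJSON(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(paths) // [/blog /blog/post2 /docs]
}
```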

/cc @regisphilibert @onedrawingperday @digitalcraftsman @budparr @moorereason @kaushalmodi and gang.

@bep bep added the Proposal label Sep 5, 2019
@bep bep added this to the v0.59 milestone Sep 5, 2019
@regisphilibert
Member

Yes. This is clearer than using the section's _index.md.

We should probably support multiple files per directory so you could do _content_products.json etc.

If we need to create another file, might as well create the directory: /products/_content.json

@bep
Member Author

bep commented Sep 5, 2019

If we need to create another file, might as well create the directory: /products/_content.json

Yes, probably.

Also thinking, I think I'm going to restrict this to JSON in its first iteration, as that is the only format supporting stream decoding.
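
As a rough illustration of what stream decoding buys here, the sketch below reads a JSON array one page at a time with `encoding/json`'s `Decoder`, so a file with thousands of pages never has to be held in memory at once. The `Page` fields and the helper names are assumptions, not a proposed format.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// Page is a hypothetical shape for one entry in _content.json:
// front matter style metadata plus the content body.
type Page struct {
	Path    string                 `json:"path"`
	Title   string                 `json:"title"`
	Params  map[string]interface{} `json:"params,omitempty"`
	Content string                 `json:"content"`
}

// decodePages reads a JSON array element by element and calls
// handle for each decoded page.
func decodePages(r io.Reader, handle func(Page) error) error {
	dec := json.NewDecoder(r)

	// Consume the opening '[' of the array.
	tok, err := dec.Token()
	if err != nil {
		return err
	}
	if d, ok := tok.(json.Delim); !ok || d != '[' {
		return fmt.Errorf("expected a JSON array, got %v", tok)
	}

	// Decode one page per iteration until the array ends.
	for dec.More() {
		var p Page
		if err := dec.Decode(&p); err != nil {
			return err
		}
		if err := handle(p); err != nil {
			return err
		}
	}

	// Consume the closing ']'.
	_, err = dec.Token()
	return err
}

// titlesIn collects the page titles found in a JSON document.
func titlesIn(doc string) ([]string, error) {
	var titles []string
	err := decodePages(strings.NewReader(doc), func(p Page) error {
		titles = append(titles, p.Title)
		return nil
	})
	return titles, err
}

func main() {
	const doc = `[
	  {"path": "blog/post2", "title": "Post 2", "content": "Hello"},
	  {"path": "blog/post3", "title": "Post 3", "content": "World"}
	]`
	titles, err := titlesIn(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(titles) // [Post 2 Post 3]
}
```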

@bep
Member Author

bep commented Sep 5, 2019

Also thinking, I think I'm going to restrict this to JSON in its first iteration

And possibly YAML:

go-yaml/yaml#4

bep added a commit to bep/hugo that referenced this issue Sep 6, 2019
@regisphilibert
Member

That's great. Now I'm seeing those _content.yaml files with lots of information about the metadata: which key goes where in a "page" object.

But I was also under the impression that Hugo would fetch the data based on some parameters (endpoints, pagination, etc...) which might have been a bit optimistic.

With _content.go? Does this mean, we'll be able to write our own data fetcher/parser in Go or Javascript and if so, can't the metadata/front_matter be addressed from there?

@bep
Member Author

bep commented Sep 6, 2019

With _content.go?

So, there are two stories to this issue.

The main story being that we need to break this down into smaller pieces to be able to grasp it and possibly also implement it in iterations.

So:

  • _content.yaml is raw page data just wrapped in a more "data like" format than a markdown file with frontmatter. The big benefit being that you can create thousands of pages in one file.
  • _content.go would be the "create those thousands of articles by some kind of custom scripting against a Hugo API" (which would handle all the caching/partial update logic etc).
  • _content.wordpress (and now I'm just making stuff up) would use a built-in Hugo adapter to pull in those thousand articles.

@regisphilibert
Member

Got it! Thanks for clarifying.

@regisphilibert
Member

regisphilibert commented Sep 6, 2019

Oh and, if all Hugo needs from a _content.go is to produce/return an array of items, we could use Go Template and return.

That would allow us to use partialCached (for transformers) and other familiar Hugo features to prep the data grabbed from getJSON or elsewhere...

@bep
Member Author

bep commented Sep 7, 2019

I have done some experiments with @natefinch 's https://github.com/starlight-go/starlight today (his library wraps https://github.com/google/starlark-go/ by Google), and I'm very impressed and think it would be a good fit for the above (and also other uses).

It will be yet another thing to learn for Hugo users (it's a Python dialect), but I think well worth it.

It integrates very well with the Go side of the fence. It returns all the variable definitions when evaluating a script, even functions, so it should be possible to define "plugin interfaces" with default implementations, and implement whatever needed in the _content.py file, e.g:

// DataGetter is implemented by source plugins that can stream their data.
type DataGetter interface {
	GetDataStream() io.ReadCloser
}

Would be implemented in _content.py as:

def GetDataStream():
	return http.Get(site.Param("wordPressAPI"))
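
On the Go side, consuming such a plugin could look roughly like the sketch below. `staticSource` is a hypothetical stand-in for a script-backed plugin (no Starlark is involved here), and `readAll` is an assumed helper name. The point is only that the host reads the stream and takes responsibility for closing it.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// DataGetter mirrors the interface sketched in the comment above.
type DataGetter interface {
	GetDataStream() io.ReadCloser
}

// staticSource is a hypothetical plugin that serves a fixed body
// instead of calling a remote API.
type staticSource struct {
	body string
}

// GetDataStream wraps the fixed body in a ReadCloser.
func (s staticSource) GetDataStream() io.ReadCloser {
	return io.NopCloser(strings.NewReader(s.body))
}

// readAll drains the stream of any DataGetter and closes it.
func readAll(g DataGetter) (string, error) {
	rc := g.GetDataStream()
	defer rc.Close()

	b, err := io.ReadAll(rc)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	data, err := readAll(staticSource{body: `[{"title": "Post 1"}]`})
	if err != nil {
		panic(err)
	}
	fmt.Println(data) // [{"title": "Post 1"}]
}
```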

@bep
Member Author

bep commented Sep 7, 2019

See my previous comments, and push the appropriate button below whether you think adding Starlark (a Python dialect) as a scripting language in Go is a good idea or not. First as a way for users to write custom "source adapters" for content, but we will most likely find other use cases, eventually (@natefinch did a PR some time ago with custom template functions in Python).

EDIT: Those votes came in fast ... Note that if you push the "no, that is a bad idea", it would be good if you could elaborate in a comment. What would be a good alternative etc.?


@regisphilibert
Member

I think the learning curve of Go Template is hard enough for many new users, letting them know that learning Python is a requisite for Data source sounds a bit harsh.

I'd be willing to invest time into learning some Go, but Python, not so excited. Did you drop JS because of speed?

@bep
Member Author

bep commented Sep 7, 2019

I'd be willing to invest time into learning some Go, but Python, not so excited. Did you drop JS because of speed?

The options that I have evaluated are Python and Lua. To my knowledge there is no solid embedded JS implementation in Go. But I assume some day it will happen.

@regisphilibert
Member

And for my own curiosity, why not Go?

@bep
Member Author

bep commented Sep 7, 2019

And for my own curiosity, why not Go?

Mostly security related.

@kaushalmodi
Contributor

I think the learning curve of Go Template is hard enough for many new users, letting them know that learning Python is a requisite for Data source sounds a bit harsh.

+1

@bep
Member Author

bep commented Sep 7, 2019

Mostly security related.

Note that that remark was about "compiled Go" (not Go templates).

@regisphilibert
Member

regisphilibert commented Sep 7, 2019

I trust you in choosing the fastest most reliable way of letting coders build their own data parsers.

So I'll mention Go Template one last time.
With returning partials, merge, transform.Unmarshal etc., Hugo has gotten much better at handling data.
With getJSON (might need improvements and complementing methods), Scratch and the new features mentioned above, I know I can build that parser with Hugo's Go Template. Is it a bad idea?

@bep
Member Author

bep commented Sep 8, 2019

The main problem/challenge with using Go templates for this is that it's procedural, it's "one script per file", one method.

I will try to think of a better example, but in the context we're talking about (plugins), it becomes hard/ugly to then create plugin APIs with life cycle methods, e.g.:

def PluginType():
  return "source"

def ShouldUpdate(after):
  return True

def GetDataStream():
  return http.Get(site.Param("wordPressAPI"))

In the above, Hugo could look at the plugin and say "Oh, it supports JSON via a reader (stream), we can optimize for that".
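
One way the host could "look at the plugin" is with optional interfaces and type assertions, as in this Go sketch. All type and function names are assumptions for illustration, and a real implementation would inspect the definitions returned by the script rather than Go structs.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"time"
)

// Plugin is the minimum every plugin would have to provide (hypothetical).
type Plugin interface {
	PluginType() string
}

// StreamSource is an optional capability: data delivered as a stream.
type StreamSource interface {
	GetDataStream() io.ReadCloser
}

// ConditionalUpdater is an optional capability: the plugin decides
// whether a refresh is needed.
type ConditionalUpdater interface {
	ShouldUpdate(after time.Time) bool
}

// capabilities reports the plugin type followed by the optional
// interfaces the plugin implements.
func capabilities(p Plugin) []string {
	caps := []string{p.PluginType()}
	if _, ok := p.(StreamSource); ok {
		caps = append(caps, "stream")
	}
	if _, ok := p.(ConditionalUpdater); ok {
		caps = append(caps, "conditional-update")
	}
	return caps
}

// remoteSource implements every life cycle method.
type remoteSource struct{}

func (remoteSource) PluginType() string                { return "source" }
func (remoteSource) ShouldUpdate(after time.Time) bool { return true }
func (remoteSource) GetDataStream() io.ReadCloser {
	return io.NopCloser(strings.NewReader(`[]`))
}

// minimalSource implements only the required method.
type minimalSource struct{}

func (minimalSource) PluginType() string { return "source" }

func main() {
	fmt.Println(capabilities(remoteSource{}))  // [source stream conditional-update]
	fmt.Println(capabilities(minimalSource{})) // [source]
}
```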

With Go templates, parts of the above may look like:

{{ if .LastUpdate.After ... }}
{{ .Result.NotModified }}
{{ else }}
{{ .Result.Set (getJSON "foo") }}
{{ end }}

Note again that I'm not saying that the above represents a "real plugin interface", but I'm fairly sure that most real plugin scenarios would require some level of "branching". And it would be good if we could write those plugins in something that doesn't look like code from the 80s.

Also note that Starlark is a Python dialect, a sub-set of Python built for this particular purpose (and as an embedded scripting option it is, in my eyes, done very well -- supporting both Go's garbage collection and multithreading).

Starlark is a small and simple language with a familiar and highly readable syntax. You can use it as an expressive notation for structured data, defining functions to eliminate repetition, or you can use it to add scripting capabilities to an existing application.

For me, my Python skills are just about on par with my JavaScript skills, but I still think I would prefer Python for the use cases above if I were a Python newbie. I only think JS would make sense if you also bring in all the (in)sanity of NPM (but we really need for it to somehow integrate with the Go side of the fence).

Also, being able to define these plugin interfaces (as proper interfaces), we can also provide a set of implementations in Go which you then can configure from your plugin, e.g. (and again, I'm just quickly making this stuff up):

def SourcePlugin():
  # One of "many" supported adapters with implementation in Go.
  return "wordpress"

def PluginConfig():
  return site.Param("myWordPressConfig")
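
The "configure a built-in adapter by name" idea could be backed by a simple registry on the Go side. The sketch below is made up in the same spirit as the script above: the `Adapter` signature, the `endpoint` config key and the example URL are all assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// Adapter is a hypothetical built-in source adapter implemented in Go.
// It receives the config that the script's PluginConfig() returned.
type Adapter func(config map[string]interface{}) (string, error)

// adapters maps the name returned by SourcePlugin() to an implementation.
var adapters = map[string]Adapter{
	"wordpress": func(config map[string]interface{}) (string, error) {
		endpoint, ok := config["endpoint"].(string)
		if !ok || endpoint == "" {
			return "", fmt.Errorf("wordpress: missing %q in config", "endpoint")
		}
		// A real adapter would fetch and convert posts here.
		return "would fetch posts from " + endpoint, nil
	},
}

// run looks up the named adapter and invokes it with the given config.
func run(name string, config map[string]interface{}) (string, error) {
	adapter, ok := adapters[name]
	if !ok {
		names := make([]string, 0, len(adapters))
		for n := range adapters {
			names = append(names, n)
		}
		sort.Strings(names)
		return "", fmt.Errorf("unknown source plugin %q (available: %v)", name, names)
	}
	return adapter(config)
}

func main() {
	msg, err := run("wordpress", map[string]interface{}{
		"endpoint": "https://example.org/wp-json",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(msg) // would fetch posts from https://example.org/wp-json

	_, err = run("drupal", nil)
	fmt.Println(err != nil) // true
}
```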

The above is, I think, valid Python. Most editors will provide syntax highlighting for it (if you suffix the file "*.py").

So, the above could be rewritten to:

{{ .SetSourcePlugin "wordpress" }}
{{ .SetPluginConfig .Site.Params.myWordPressConfig  }}

Which isn't bad, but since there is no way for Hugo to look at the file _content.tpl and know what it is/needs, you end up sending in the full plugin API as the "dot context" and you get some level of spaghetti when putting it all together.
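
For comparison, here is a minimal sketch of the "dot context" approach using Go's `text/template`. The `Context` and `Result` types and the `fetch` function are hypothetical. The host cannot tell in advance what the template will do; it can only execute it and inspect the side effects afterwards.

```go
package main

import (
	"fmt"
	"io"
	"text/template"
)

// Result collects what the template decided (hypothetical API).
type Result struct {
	Modified bool
	Data     string
}

// NotModified records that no refresh was needed. It returns an
// empty string so that it can be called from a template action.
func (r *Result) NotModified() string {
	r.Modified = false
	return ""
}

// Set stores new data and marks the result as modified.
func (r *Result) Set(v string) string {
	r.Modified = true
	r.Data = v
	return ""
}

// Context is the full plugin API handed to the template as the dot.
type Context struct {
	Stale  bool
	Result *Result
}

const script = `{{ if .Stale }}{{ .Result.Set (fetch "foo") }}{{ else }}{{ .Result.NotModified }}{{ end }}`

// runScript executes the template and returns the mutated result.
func runScript(stale bool) (*Result, error) {
	funcs := template.FuncMap{
		// fetch stands in for something like getJSON.
		"fetch": func(name string) string { return "data for " + name },
	}
	tmpl, err := template.New("_content.tpl").Funcs(funcs).Parse(script)
	if err != nil {
		return nil, err
	}
	ctx := Context{Stale: stale, Result: &Result{}}
	if err := tmpl.Execute(io.Discard, ctx); err != nil {
		return nil, err
	}
	return ctx.Result, nil
}

func main() {
	res, err := runScript(true)
	if err != nil {
		panic(err)
	}
	fmt.Println(res.Modified, res.Data) // true data for foo
}
```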

I will also add that, if you think the above is hard and you still want/need to use it (people have happily lived without the above for a long time), Hugo Modules allows people to borrow from other people's work. This will be even more true if we extend this to writing plugins that get exposed as template functions (see @natefinch 's PR).

As one last note: We should be able to access all of Hugo's template functions inside these scripts, e.g. resources.Get "foo.jpg".

@HenrySkup

Most editors will provide syntax highlighting for it (if you suffix the file "*.py") and is, in my head, much easier to explain to people than some Go template conditional spaghetti.

Yes, yes! Let Go templates do the templating ... and use other interfaces for the more ambiguous, likely-to-branch-and-to-error data generation. Since Go is 1) a security risk and 2) rather esoteric, Python and JS seem like robust and widely used options (with Python being particularly well crafted and good for beginners). That being said, there would be some aesthetic nicety in having an "all-Go-based" solution.

@regisphilibert
Member

I’m convinced and cast my vote. :). Can’t wait to see some parser example. I guess I wanted an excuse to learn Go.

@regisphilibert
Member

regisphilibert commented Sep 8, 2019

I realize this is a config language.

This might be an example that Hugo users can more easily relate to:

https://github.com/bazelbuild/starlark/blob/master/README.md#tour

This is pretty exciting actually. It gives so much control over a directory’s data, remote or otherwise.

@bep
Member Author

bep commented Sep 8, 2019

I guess I wanted an excuse to learn Go.

Still plenty of reasons to learn Go ...

For completeness, my previous answer about "Go as plugin" assumed some kind of "compile on the fly" and communication via "os/exec". Go has a "plugin package" built-in that could have worked for us, but it's Linux and macOS only and considered very experimental and buggy (and no one has worked on it for years).

bep added a commit to bep/hugo that referenced this issue Sep 9, 2019
@gour
Contributor

gour commented Sep 9, 2019

The options that I have evaluated are Python and Lua.

Lua is commonly used for such things, e.g. Pandoc uses it for custom filters, but considering that Starlark is implemented in Go and looks like 'python-lite', I believe it's a better option than full-fledged Lua, being easy and/or simple enough.

To my knowledge there is no solid embedded JS implementation in Go.

I won't cry because of that. :-)

@bep bep added this to the v0.117.0 milestone Aug 1, 2023
@bep bep modified the milestones: v0.117.0, v0.118.0 Aug 30, 2023
@bep bep modified the milestones: v0.118.0, v0.119.0 Sep 15, 2023
@bep bep modified the milestones: v0.119.0, v0.120.0 Oct 4, 2023
@bep bep modified the milestones: v0.120.0, v0.121.0 Oct 31, 2023
@bep bep modified the milestones: v0.121.0, v0.122.0 Dec 6, 2023
@bep bep modified the milestones: v0.122.0, v0.123.0, v0.124.0 Jan 27, 2024
@bep bep closed this as completed Jan 31, 2024
@bep bep reopened this Jan 31, 2024
@bep bep modified the milestones: v0.124.0, v0.125.0 Mar 4, 2024
@bep bep modified the milestones: v0.125.0, v0.126.0 Apr 23, 2024
bep added a commit to bep/hugo that referenced this issue May 13, 2024
bep added a commit to bep/hugo that referenced this issue May 13, 2024
bep added a commit to bep/hugo that referenced this issue May 13, 2024
bep added a commit to bep/hugo that referenced this issue May 14, 2024
bep added a commit to bep/hugo that referenced this issue May 14, 2024
bep added a commit to bep/hugo that referenced this issue May 14, 2024
@bep bep closed this as completed in e2d66e3 May 14, 2024