Improve the way we consume collection(s) to avoid sending all metadata in all pages #713

DavidWells · 2016-09-02T22:24:00Z

Related to trimming down the window.collection. #712

When you add in new custom meta (like in this blog post example) all those extra fields are also added to the window.collection data. This could potentially make it HUGE

Example:

---
layout: Post
title: 'Defining Serverless and Why It Matters to Developers'
date: 2016-09-01
description: "You’ve probably heard the term _serverless._ But what does it actually mean? And more importantly, as a developer, why should you care?"
author:
  name: Serverless
  url: http://twitter.com/goServerless
  avatar: https://avatars3.githubusercontent.com/u/13742415?v=3&s=60
tags:
- serverless

---

```json
// printed in window.collection, notice the custom meta values
{
    "layout": "Post",
    "comments": true,
    "title": "Defining Serverless and Why It Matters to Developers",
    "date": "2016-09-01T00:00:00.000Z",
    "description": "You’ve probably heard the term _serverless._ But what does it actually mean? And more importantly, as a developer, why should you care?",
    "author": {
        "name": "Serverless",
        "url": "http://twitter.com/goServerless",
        "avatar": "https://avatars3.githubusercontent.com/u/13742415?v=3&s=60"
    },
    "tags": ["serverless"],
    "__filename": "blog/defining-serverless-and-why-it-matters-to-developers.md",
    "__url": "/blog/defining-serverless-and-why-it-matters-to-developers/",
    "__resourceUrl": "/blog/defining-serverless-and-why-it-matters-to-developers/index.html",
    "__dataUrl": "/blog/defining-serverless-and-why-it-matters-to-developers/index.html.bf1d5e0db467ef721b8508992749379b.json"
}
```

All the data also exists in the index.html.bf1d5e0db467ef721b8508992749379b.json file.

So the question is, would it be possible/worth it to remove the additional custom fields from outputting into window.collection and just have the .json files for the individual pages handle that additional data?
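One way to picture the idea is an allow-list applied when the list data is built. This is a minimal sketch, assuming each page's metadata is available as a plain object at build time; `LIST_FIELDS` and `toListEntry` are hypothetical names, not part of Phenomic.

```javascript
// Hypothetical allow-list of fields a list view needs; everything else
// (description, author, tags, ...) would stay in the per-page .json file.
const LIST_FIELDS = ["title", "date", "__url", "__dataUrl"];

// Returns a copy of `meta` containing only the allow-listed fields.
function toListEntry(meta, fields = LIST_FIELDS) {
  const entry = {};
  for (const key of fields) {
    if (key in meta) entry[key] = meta[key];
  }
  return entry;
}

const page = {
  layout: "Post",
  title: "Defining Serverless and Why It Matters to Developers",
  date: "2016-09-01T00:00:00.000Z",
  author: { name: "Serverless", url: "http://twitter.com/goServerless" },
  tags: ["serverless"],
  __url: "/blog/defining-serverless-and-why-it-matters-to-developers/",
  __dataUrl: "/blog/defining-serverless-and-why-it-matters-to-developers/index.html.bf1d5e0db467ef721b8508992749379b.json",
};

const entry = toListEntry(page);
console.log(Object.keys(entry)); // ["title", "date", "__url", "__dataUrl"]
```

Which fields belong on the allow-list depends on what the list components actually render, so it would likely need to be configurable per site.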

thangngoc89 · 2016-09-02T22:30:41Z

would it be possible/worth it to remove the additional custom fields from outputting into window.collection

It depends on the use case. If users don't need any of these fields in their codebase to generate a list of posts, it's possible to remove them. See the collections API: https://phenomic.io/docs/usage/collections/

Is it worth it? I don't know. On a huge site like yours, how many bytes would you save by removing all of these custom fields?

MoOx · 2016-09-03T04:58:33Z

I am currently discussing with @bloodyowl how to improve the Phenomic collection API so that we stop sending almost all data to all pages. The idea is mainly to put in each page only the data it uses (minimal JSON) and to create JSON for each possible page.

For now I don't plan to add a custom way to retrieve only some fields, because if we choose the solution above, it won't be a big deal (tell me if I am wrong) to get all fields for, let's say, the 10 pages you are listing.

Again I repeat the idea we are working on: only put in the html (& json files for client nav) the data requested by a page. So no more extra unused json.

To achieve that while keeping the API simple (not by adding a GraphQL server - please @bloodyowl and others, take a look at this & tell me what you think; I personally think it's a bit crazy to go down this path, but I may be wrong), the idea is to provide HoCs.

Here is some pseudo code we have in mind (API subject to changes):

class YouPageThatListContent extends ...

export default Phenomic.createContainer(
  YouPageThatListContent,
  (store) => ({
     pages: store.get("pages", { sortBy: "date", order: "DESC", limit: 5})
   })
)

// alternative
export default Phenomic.createContainer(
  YouPageThatListContent,
  (state) => ({
     pages: Phenomic.queryCollection(state.pages, { sortBy: "date", order: "DESC" }).slice(0, 5)
   })
)
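
To make the query shape concrete, here is a minimal sketch of what a `queryCollection` helper could do. It assumes a collection is a plain array of metadata objects; the function body is an illustration only, not Phenomic's implementation.

```javascript
// Hypothetical helper: returns a sorted copy of `items`.
// `sortBy` is a field name, `order` is "ASC" (default) or "DESC".
function queryCollection(items, { sortBy, order = "ASC" } = {}) {
  const sorted = items.slice(); // never mutate the source collection
  if (sortBy) {
    const direction = order === "DESC" ? -1 : 1;
    sorted.sort((a, b) => {
      if (a[sortBy] < b[sortBy]) return -1 * direction;
      if (a[sortBy] > b[sortBy]) return 1 * direction;
      return 0;
    });
  }
  return sorted;
}

const pages = [
  { title: "A", date: "2016-08-01" },
  { title: "B", date: "2016-09-01" },
  { title: "C", date: "2016-07-15" },
];

const latest = queryCollection(pages, { sortBy: "date", order: "DESC" }).slice(0, 2);
console.log(latest.map((p) => p.title)); // ["B", "A"]
```

Because the query runs over static data, its result could be computed at build time and only the resulting items shipped with the page.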

Now you are going to ask: what about pagination? Imo this should be in core, not in a plugin...

Ok then here is an idea:

<Route
  path="/tag/:tag(/:page??)" component={ YouPageComponent }
  collection="posts" pageSize={20} sortBy="date" order="DESC"
  filter={ (item, routeParams) => item.tags.indexOf(routeParams.tag) > -1 }
/>

// ...

// YouPageComponent
export default Phenomic.createContainer(
  YouPageThatListContent,
  (state, page) => ({
     taggedPosts: page.items,
     // you can send any kind of data, injected as props into YouPageThatListContent
     totalPages: page.numberOfItems,

     // while pagination is allowed for one resource at a time, you might get other data as well
     authors: Phenomic.queryCollection(state.authors, { sortBy: "commits", order: "DESC" }).slice(0, 5),
     someRandomPosts: Phenomic.queryCollection(state.posts).randomMethodToImplement(5)
   })
)

Note that the code above assumes we will introduce a new way to register collectionS (yes, multiple collections, instead of having to filter via layout or something else).

By doing that, we will be able to statically retrieve collection fragments and inject only those into the HTML (and also create JSON fragments for client navigation).

Any thoughts on this approach?

thangngoc89 · 2016-09-03T05:22:59Z

Totally agree with this approach. Filtering at runtime in JS can be slow, especially if you have a lot of pages.

bloodyowl · 2016-09-03T08:37:52Z

Added a few changes to my proposal:

Content definition

import Phenomic from "phenomic"

// I think that we should accept:
// type Data = { [key: string]: any | Promise<any> }
// Data | Promise<Data>
module.exports = {
  // See: https://gist.github.com/bloodyowl/27e159aa9e02c5ac40fd6ff5c2bb93e8
  posts: Phenomic.createCollection(
    requireAll(require.context("markdown!./posts", true, /\.md/)),
    { indexes: ["id", "url"] } // will create JS Maps to improve query time on build & dev server
  ),
  authors: requireAll(require.context("json!./authors", true, /\.json/)),
  // accept promises
  someExternalData: require("isomorphic-fetch")(someURL),
}

Consuming the data

Indexed queries

import React from "react"
import Phenomic from "phenomic"

const PostRoute = (props) => (
  <div>
    <h1>{props.post.title}</h1>
    <p>{props.post.content}</p>
  </div>
)

export default Phenomic.createContainer(PostRoute, {
  queries: (state, params) => ({
    // O(1) if indexed, O(N) otherwise 
    post: state.posts.getBy("id", params.id),
  })
})
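
The indexing comment in the content definition can be sketched as follows. This assumes `createCollection` builds one `Map` per indexed field and that `getBy` falls back to a linear scan for other fields; both are guesses at the proposal's intent, not a real Phenomic API. It also assumes indexed values are unique per item.

```javascript
// Hypothetical sketch of an indexed collection.
function createCollection(items, { indexes = [] } = {}) {
  // One Map per indexed field: field value -> item.
  const maps = new Map(
    indexes.map((field) => [field, new Map(items.map((item) => [item[field], item]))])
  );

  return {
    items,
    getBy(field, value) {
      const index = maps.get(field);
      if (index) return index.get(value); // O(1) lookup
      return items.find((item) => item[field] === value); // O(N) scan
    },
  };
}

const posts = createCollection(
  [
    { id: "hello", url: "/post/hello", title: "Hello" },
    { id: "world", url: "/post/world", title: "World" },
  ],
  { indexes: ["id", "url"] }
);

console.log(posts.getBy("id", "world").title); // "World"
console.log(posts.getBy("title", "Hello").id); // "hello" (not indexed, linear scan)
```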

"Special" queries

import React from "react"
import Phenomic from "phenomic"

const HomepageRoute = (props) => (
  <div>
    <ul>
      {props.posts.map((post) =>
        <li>{post.title}</li>
      )}
    </ul>
    <ul>
      {props.authors.map((author) =>
        <li>{author.username}</li>
      )}
    </ul>
  </div>
)

export default Phenomic.createContainer(HomepageRoute, {
  queries: (state, params) => ({
    posts: state.posts.queryCollection({ sortBy: "date", order: "DESC" }).slice(0, 5),
    authors: state.authors.slice(0, 5),
  })
})

Pagination

import React from "react"
import { Router, Route } from "react-router"

import PostRoute from "./PostRoute"
import HomepageRoute from "./HomepageRoute"
import PostList from "./PostList" // assumed module: PostList is used below but not defined in this proposal

export default (
  <Router>
    <Route path="/" component={HomepageRoute} />
    <Route path="/post/:id" component={PostRoute} collection="posts"/>
    <Route path="/posts/page/:page" component={PostList} collection="posts" pageSize={20} />
  </Router>
)

This configuration leaves us enough information to just generate Math.ceil(collection.length / props.pageSize) pages at build time.

Phenomic.createContainer(Component, {
  queries: (state, routeParams, page) => ({
    hasNextPage: page.hasNextPage,
    posts: page.items,
  }),
})
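
The page count formula above can be sketched as a small build-time helper. This assumes the collection is a plain array; `paginate` and the `number` field are hypothetical, while `items` and `hasNextPage` mirror the names used in the snippet above.

```javascript
// Hypothetical helper: splits `items` into pages of `pageSize` entries.
function paginate(items, pageSize) {
  const pageCount = Math.ceil(items.length / pageSize);
  const pages = [];
  for (let index = 0; index < pageCount; index++) {
    pages.push({
      number: index + 1, // 1-based, as it would appear in /posts/page/:page
      items: items.slice(index * pageSize, (index + 1) * pageSize),
      hasNextPage: index + 1 < pageCount,
    });
  }
  return pages;
}

// 45 posts with pageSize 20 -> Math.ceil(45 / 20) = 3 pages
const posts = Array.from({ length: 45 }, (_, i) => ({ id: i + 1 }));
const pages = paginate(posts, 20);

console.log(pages.length); // 3
console.log(pages[2].items.length); // 5
console.log(pages[2].hasNextPage); // false
```

Each entry of `pages` would then map to one generated HTML file plus one JSON fragment for client navigation.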

Configuration

You basically provide your content and an instance of ReactRouter, we do the rest.

module.exports = {
  content: require("./content"),
  router: require("./web/routes/Router"),
}

Build configuration

I think that in order to prevent colliding stuff and forcing us to provide stuff for all configurations, the webpack.config.js should remain in user-land.

If you don't use the development server, webpack shouldn't even be mandatory (e.g. you get the data from an external API).

var webpack = require("webpack")
var path = require("path")

module.exports = {
  // maybe autofill entry & output, not quite sure about this yet 
  entry: {
    bundle: "phenomic/lib/entry",
  },
  output: {
    path: path.join(__dirname, "./.phenomic"),
    filename: "[name].js",
  },
  module: {
    loaders: [
      {
        test: /\.js$/,
        exclude: /node_modules/,
        loader: "babel",
        query: {
          presets: ["es2015", "react"],
        },
      },
    ],
  },
  plugins: [
    new webpack.DefinePlugin({
      "process.env": {
        NODE_ENV: JSON.stringify(process.env.NODE_ENV),
      },
    })
  ],
}

thangngoc89 · 2016-09-04T04:40:17Z

What can I do to help with this? Do you have any POC?

bloodyowl · 2016-09-04T07:21:29Z

I'm working on a POC, just need to add the PhenomicCollection and the pagination mechanism and I'll share it, that should be a good starting point 😊

thangngoc89 · 2016-09-04T07:59:44Z

Nice !

DavidWells · 2016-09-19T17:57:07Z

I wanted to follow up on this thread.

What do you guys think about normalizing the collection by perhaps URL?

This way we can have a constant lookup time

{
    "url/xyz/lolz": {
        "__dataUrl": "/blog/defining-serverless-and-why-it-matters-to-developers/index.html.bf1d5e0db467ef721b8508992749379b.json"
    },
    "url/two": {
        "__dataUrl": "/blog/defining-serverless-and-why-it-matters-to-developers/index.html.bf1d5e0db467ef721b8508992749379b.json"
    }
}

You could still map over the data with Object.keys if you want =)
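
As a rough sketch of that normalization in user land, assuming the collection is an array of objects that carry `__url` and `__dataUrl` as in the example at the top of this issue. `normalizeByUrl` is a hypothetical helper and the `.json` paths below are placeholders.

```javascript
// Hypothetical helper: turns an array of pages into an object keyed by __url.
function normalizeByUrl(collection) {
  const byUrl = {};
  for (const page of collection) {
    byUrl[page.__url] = { __dataUrl: page.__dataUrl };
  }
  return byUrl;
}

const collection = [
  { title: "One", __url: "/blog/one/", __dataUrl: "/blog/one/index.html.json" },
  { title: "Two", __url: "/blog/two/", __dataUrl: "/blog/two/index.html.json" },
];

const byUrl = normalizeByUrl(collection);

// Constant-time lookup by URL
console.log(byUrl["/blog/two/"].__dataUrl); // "/blog/two/index.html.json"

// Still iterable via Object.keys
console.log(Object.keys(byUrl)); // ["/blog/one/", "/blog/two/"]
```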

Maybe even going a step further with https://github.com/paularmstrong/normalizr

I can probably just do this in user land but I wanted to float the idea around here as well.

bloodyowl · 2016-09-19T18:00:12Z

With the idea we have, there's not really a need for this. But if you want to create a "static API" from your contents, that should be totally possible in user-space 😃

thangngoc89 · 2016-09-19T18:00:55Z

@bloodyowl any progress on this?

MoOx · 2016-09-19T19:04:09Z

I am thinking about the fact that currently we use the entire collection to know if a click must be done using browser push + preventDefault :/

DavidWells · 2016-09-19T20:50:49Z

@MoOx I was thinking about this too.

Perhaps it could be solved with a 'smarter' link component.

Each link component would get additional data attributes added to it at build time. Then the link listener wouldn't need to check the collection; it could just use the inline data attributes on the a tag.

example:

<!-- on click use router to go to /url/xyz -->
<a href='/url/xyz' data-phenomic-path='/url/xyz' data-phenomic-data='/path/index.html.bf1d5e0db467ef721b8508992749379b.json'>Local link</a>

This might even let us remove the need for the entire collection to be placed on the window?
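
A minimal sketch of such a listener, assuming the two attributes from the example above are present on local links. `createLinkHandler` and the `navigate` callback are hypothetical stand-ins for whatever the router exposes, and the sketch assumes the click target is the `<a>` element itself.

```javascript
// Hypothetical click handler factory; `navigate(path, dataUrl)` is supplied by the caller.
function createLinkHandler(navigate) {
  return function onClick(event) {
    const link = event.target;
    const path = link.getAttribute("data-phenomic-path");
    if (!path) return false; // not a local page: let the browser follow the href

    event.preventDefault();
    navigate(path, link.getAttribute("data-phenomic-data"));
    return true;
  };
}

// Usage with stand-in objects (in a browser, `event` would be a real click event)
const attrs = {
  "data-phenomic-path": "/url/xyz",
  "data-phenomic-data": "/path/index.html.bf1d5e0db467ef721b8508992749379b.json",
};
const fakeEvent = {
  target: { getAttribute: (name) => (name in attrs ? attrs[name] : null) },
  preventDefault() { this.prevented = true; },
};

const visited = [];
const onClick = createLinkHandler((path, dataUrl) => visited.push({ path, dataUrl }));
console.log(onClick(fakeEvent)); // true
console.log(visited[0].path); // "/url/xyz"
```

With this shape the handler never touches a global collection: everything it needs travels with the link.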

lapidus · 2016-09-29T18:02:10Z

Newcomer to Phenomic ... Apologies for interjecting but I hope this can help others too.

Should one expect major changes to how collections are handled within the next month?
If not, it would be super helpful to have another practical example in the docs on how to use the current context-based system. For example, what are the steps required to list a bunch of 'recipes' on a recipes page ... :)

MoOx · 2016-09-30T06:29:17Z

@lapidus you can expect a major change in the coming weeks! ;)

bloodyowl · 2016-12-22T19:23:32Z

Going to be fixed with #925

DavidWells · 2016-12-22T19:43:11Z

@bloodyowl awesome!

How is it being approached? I didn't see it mentioned in #925

bloodyowl · 2016-12-22T19:46:24Z

in 1.0.0, parsers output partial (to be used in lists) & data (used when fetching the item itself) + the window.collection disappears completely

DavidWells · 2016-12-22T19:52:46Z

@bloodyowl Cool.

A couple questions:

is all the site data still output into the DOM? Or referenced via script tag
are .json files still nested in their respective /url/path/blah/index.json folders or centralized in a single location? ref

My main concerns are making phenomic a viable option for larger website implementations =). There are certain things in the current setup that make it not an option for sites with 1000+ pages.

MoOx · 2016-12-23T06:36:36Z

No, HTML will only contain relevant data for its own page
That's not really a problem. For now we are creating a sort of "static" API, so files are not just hashes, but I guess we could improve that in the future.

The goal of 1.0 is really to make something scalable by default. We are having the same concerns as you do :)

@bloodyowl tell me if it's incorrect

bloodyowl · 2016-12-23T09:34:53Z

yeah, basically the JSON files are put in dist/phenomic and organised with the same shape they have in the dev server API

DavidWells changed the title ~~Question: Does window.collection need all the default values?~~ Question: Does window.collection need all the custom frontmatter values? Sep 2, 2016

MoOx changed the title ~~Question: Does window.collection need all the custom frontmatter values?~~ Improve the way we consume collection(s) to avoid sending all metadata in all pages Sep 3, 2016

MoOx added the feature request label Sep 3, 2016

MoOx mentioned this issue Sep 3, 2016

Add a way to consume Wordpress API #719

Closed

MoOx mentioned this issue Sep 4, 2016

Improve collection item private metadata #712

Closed

MoOx mentioned this issue Sep 6, 2016

Generate routes for data coming in from outside phenomic #732

Closed

MoOx mentioned this issue Sep 15, 2016

Simplify collection API #686

Closed

MoOx mentioned this issue Sep 21, 2016

Easier access to current page metadata #471

Closed

DavidWells mentioned this issue Oct 10, 2016

Idea: Preload .json files on link hover #680

Closed

MoOx mentioned this issue Nov 22, 2016

New data/collection API #889

Closed

DavidWells mentioned this issue Dec 14, 2016

Proposal for shrinking page object for large site implementations #913

Closed

MoOx mentioned this issue Jan 12, 2017

1.0 #936

Merged

MoOx closed this as completed in #936 May 11, 2017
