RFC: Incremental builds stage 1 - rebuilding affected packages/apps. #147

Closed
threepointone opened this issue Nov 16, 2020 · 2 comments
Labels
enhancement New feature or request

Comments

@threepointone
Contributor

threepointone commented Nov 16, 2020

(This is about 'builds', i.e. generating production artefacts. This isn't about incremental tests/lint/typecheck, though they're deeply informed by this document, and we'll talk about them separately later.)

We can approach incremental builds in phases, starting at a coarse granularity and moving to progressively smaller scopes. This document covers the coarsest granularity: the workspace.

Rebuilding only affected workspaces:

First, there exists the idea of a production revision id; the last git revision at which a release was made. This id can either be stored in a database and retrieved/updated by your build processes, or it can be derived in some way from the project repository itself; for example, examining git tags which update on every release.
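
As a minimal sketch of the "derive it from git tags" option: assuming a hypothetical convention where every release creates a tag named `release-<n>` with an increasing number, the most recent release tag can be picked out of the repository's tag list. The tag format and the function name are assumptions for illustration, not something modular defines.

```typescript
// Hypothetical convention: every release creates a tag named "release-<n>",
// where <n> increases with each release (e.g. release-41, release-42).
const RELEASE_TAG = /^release-(\d+)$/;

// Given the tag names in a repository (e.g. the lines printed by
// `git tag --list`), return the tag of the most recent release, if any.
function latestReleaseTag(tags: string[]): string | undefined {
  let best: { tag: string; n: number } | undefined;
  for (const raw of tags) {
    const tag = raw.trim();
    const match = RELEASE_TAG.exec(tag);
    if (!match) continue; // ignore tags that are not release tags
    const n = Number(match[1]);
    if (best === undefined || n > best.n) best = { tag, n };
  }
  return best?.tag;
}
```

The resulting tag can then be resolved to a revision with `git rev-list -n 1 <tag>`, which gives the production revision id.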

Second, there exists the idea of a dependency graph, represented as a mapping of workspace name => dependencies@(optional version). It can be generated by walking the codebase and extracting imports/requires (excluding any imports local to that workspace) from all source files (excluding tests). These dependencies will be of two types:

  • Third-party deps/peerdeps (like 'react') in the workspace's package.json, or in the root package.json. For these we also pick up the specific version that this dependency resolves to (probably from yarn.lock).
  • A package exported by a local workspace.
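
A rough sketch of the extraction step, under loud assumptions: it uses a regular expression rather than a real parser (so it can be fooled by comments or strings), and the type and function names are hypothetical. It pulls import/require specifiers out of one source file, drops relative (workspace-local) imports, and reduces each specifier to its package name.

```typescript
// Shape of the graph described above: workspace name => dependency name =>
// resolved version (null for packages exported by a local workspace).
type Dependencies = Record<string, string | null>;
type DependencyGraph = Record<string, Dependencies>;

// "lodash/get" -> "lodash", "@scope/ui/styles.css" -> "@scope/ui",
// "./helper" -> null (relative imports are local to the workspace).
function packageNameOf(specifier: string): string | null {
  if (specifier === "" || specifier.startsWith(".") || specifier.startsWith("/")) {
    return null;
  }
  const parts = specifier.split("/");
  if (specifier.startsWith("@")) {
    return parts.length >= 2 ? `${parts[0]}/${parts[1]}` : null;
  }
  return parts[0] ?? null;
}

// Collect the package names imported or required by one source file.
function extractDependencies(source: string): string[] {
  // Matches: from "x", import "x", import("x"), require("x")
  const re = /(?:\bfrom\s*|\bimport\s*\(?\s*|\brequire\s*\(\s*)(["'])([^"']+)\1/g;
  const found = new Set<string>();
  let m: RegExpExecArray | null;
  while ((m = re.exec(source)) !== null) {
    const name = packageNameOf(m[2] ?? "");
    if (name !== null) found.add(name);
  }
  return Array.from(found).sort();
}
```

Running this over every non-test source file in a workspace and then looking up each name (in yarn.lock for third-party deps, or in the list of workspaces for local packages) would fill in one entry of the `DependencyGraph`.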

So, given a production revision id, we can determine which workspaces have to be rebuilt by following these steps:

  • Find all source files that have changed between the production revision id and the current revision id, and group them by workspace.
  • Generate 2 dependency graphs; one at the production revision id, and one at the current revision id.
  • If any of a workspace's source files has changed, mark it as 'dirty' (i.e. it requires a rebuild).
  • If any of the workspace's third-party dependencies have changed versions, mark it as dirty.
  • If any local workspace that the workspace depends on is dirty, mark it as dirty too. Repeat this until no more workspaces change state.
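
The steps above can be sketched as follows. This is an illustration under stated assumptions, not an implementation from modular: the type and function names are hypothetical, the changed-file list is assumed to come from something like `git diff --name-only <production-rev> <current-rev>`, each workspace is assumed to live in its own directory, and a workspace that did not exist at the production revision is treated as dirty.

```typescript
// workspace name => dependency name => resolved version (null for local workspaces)
type Dependencies = Record<string, string | null>;
type DependencyGraph = Record<string, Dependencies>;

// Step 1: group changed file paths by the workspace directory containing them.
// `workspaceDirs` maps workspace name => directory relative to the repo root.
function groupByWorkspace(
  changedFiles: string[],
  workspaceDirs: Record<string, string>
): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const file of changedFiles) {
    for (const [name, dir] of Object.entries(workspaceDirs)) {
      if (!file.startsWith(dir + "/")) continue;
      const files = groups.get(name) ?? [];
      files.push(file);
      groups.set(name, files);
    }
  }
  return groups;
}

// Steps 3-5: mark workspaces dirty, given the two graphs from step 2.
function findDirtyWorkspaces(
  changedWorkspaces: Iterable<string>,
  before: DependencyGraph,
  after: DependencyGraph
): Set<string> {
  // Workspaces with changed source files are dirty.
  const dirty = new Set<string>(changedWorkspaces);

  // Workspaces whose third-party dependencies resolve to a new version are dirty.
  for (const [name, deps] of Object.entries(after)) {
    const previous = before[name];
    if (previous === undefined) {
      dirty.add(name); // assumption: a brand new workspace always needs a build
      continue;
    }
    for (const [dep, version] of Object.entries(deps)) {
      if (version !== null && previous[dep] !== version) dirty.add(name);
    }
  }

  // Propagate: a workspace that depends on a dirty workspace is dirty.
  // Loop until a full pass adds nothing, so transitive dependents are caught.
  let changed = true;
  while (changed) {
    changed = false;
    for (const [name, deps] of Object.entries(after)) {
      if (dirty.has(name)) continue;
      if (Object.keys(deps).some((dep) => dirty.has(dep))) {
        dirty.add(name);
        changed = true;
      }
    }
  }
  return dirty;
}
```

Filtering tests out of the changed-file list is left out here for brevity; it would happen before `groupByWorkspace`.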

The workspaces marked as 'dirty' are the packages/apps that have to be rebuilt. Run builds on them, and deploy. Done.

The above is useful for a vast number of use cases. It also feels detailed enough that we can work on it fairly quickly and ship it, so I think that's what we should do. (I might try to take this on myself soon, unless someone else does before me.)


Addendum: Better.

In the next phase, we'll work on using features like webpack 5's persistent caching/module federation to split a big app's build into smaller chunks (possibly aligned with our idea of 'views', much like nextjs uses the same strategy with 'pages') and build them individually, finally combining them on deploy. This lets us apply caching/no-ops to each of those chunks, reducing the build time for a given application. The cons are figuring out which deps are common to all packages, optimising the delivery of those packages, and working out how it'll affect overall builds, but there should be an opportunity for best-effort attempts there. webpack 5 support will land in create-react-app at some point (probably in 4.1, tracked in facebook/create-react-app#9994), so we can start exploring this as an option whenever.
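
For the persistent caching part, a small sketch of what a per-workspace config fragment might look like. `cache.type: "filesystem"` and `cache.name` are webpack 5 options; the helper name and the idea of keying the cache by workspace name are assumptions made for illustration, and module federation is not covered here.

```typescript
// Returns a fragment to merge into a workspace's webpack 5 configuration.
// Hypothetical helper: one on-disk cache per workspace.
function persistentCacheFor(workspaceName: string) {
  return {
    cache: {
      // Persist the cache on disk so later builds can reuse it.
      type: "filesystem" as const,
      // Different names give separate, coexisting caches. Characters such as
      // "@" and "/" in scoped names are replaced to keep the name path-safe.
      name: workspaceName.replace(/[^a-zA-Z0-9_-]/g, "_"),
    },
  };
}
```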

And in a further phase, we could possibly build a whole new bundler that's deeply integrated with serving infrastructure; depending on actual usage patterns, we could use product analytics/module usage to determine optimal chunks to be constructed, and even predict which chunks to load based on where we think a user will navigate to next. This is a longer term project, and it's unclear just yet what steps to take; regardless it'll be something that we (JP Morgan) will likely invest time and effort into at some point.

@threepointone threepointone added the enhancement New feature or request label Nov 16, 2020
@threepointone
Contributor Author

threepointone commented Nov 17, 2020

An alternate proposal: if one is using changesets, then the presence of changesets determines which packages are to be built. The con here is that you have to be super disciplined, e.g. having a changeset even before starting work on a feature. The benefit, though, is that it's trivial to know which packages to build by just examining active changesets.

(Well, maybe it's not as trivial as that. You still have to track package.json changes, and dependencies across workspaces.)
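
A minimal sketch of the "examine active changesets" idea, assuming the usual changeset file layout (markdown files under `.changeset/` whose front matter lists `"package-name": bump-type`). The function names are hypothetical, the parser only recognises `major`, `minor` and `patch` entries, and it deliberately ignores the package.json and cross-workspace tracking mentioned above.

```typescript
// Extract the package names listed in one changeset file's front matter.
function packagesInChangeset(contents: string): string[] {
  const frontMatter = /^---\r?\n([\s\S]*?)\r?\n---/.exec(contents);
  if (!frontMatter) return [];
  const names: string[] = [];
  for (const line of (frontMatter[1] ?? "").split(/\r?\n/)) {
    // e.g.  "@scope/ui": minor   or   utils: patch
    const entry = /^\s*["']?([^"':]+)["']?\s*:\s*(major|minor|patch)\s*$/.exec(line);
    if (entry) names.push((entry[1] ?? "").trim());
  }
  return names;
}

// Union of all packages mentioned by the active changesets.
function packagesToBuild(changesets: string[]): Set<string> {
  const result = new Set<string>();
  for (const contents of changesets) {
    for (const name of packagesInChangeset(contents)) result.add(name);
  }
  return result;
}
```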

@LukeSheard
Contributor

Closing this in favour of something like turborepo which abstracts incremental changes / builds already. There should be some CI layer which is responsible for this - not modular.


2 participants