Advanced build caching #1052

Closed · 4 tasks · FredKSchott opened this issue Sep 14, 2020 · 11 comments

@FredKSchott (Owner) commented Sep 14, 2020

Background

While Snowpack was originally released to explore the power of ESM-based build tooling, we quickly realized that one of the biggest benefits of this new paradigm was cache efficiency. Because every file is built individually, every build result can be cached individually. If a file never changes and your Snowpack config never changes, you are technically guaranteed to be able to cache that file's build result forever.

Our dev server was able to take advantage of this right away, but with a fallback in case we were wrong: serve the cached value, THEN re-build the file behind the scenes to confirm that no config has changed that would affect the final build. If the two don't match, we clear the entire cache and reload the page. All of this happens fast enough that the user generally won't even notice.

Feature Overview

If we can create a system that lets us actually guarantee that a file's cached build is still accurate, then we can dramatically speed up our builds by only re-building the files that have changed since last time.

The same goes for installs: if we cache the exact install targets that were used to create each web_modules installation, then we can reuse that installation across builds whenever the dependency install targets haven't changed.

If both of these are implemented, warm builds for sites of all sizes (builds that would normally take seconds or even full minutes) could drop down to single-digit seconds.

Feature Request

  • If your node_modules/ directory changes at all, we clear the cache (already handled for dependencies during dev; we just need to clear the build cache when this happens as well).
  • New snowpack config value: watchConfig: string[]
    • Users can provide their own file system globs (see the sketch below).
    • If one of these files changes, the entire build cache is cleared on startup (or immediately, if the build is already running).
    • ex: "watchConfig": ["./config/*"], or ["../../babel.config.json"] to watch a file outside the project directory.
  • Plugins can add common globs to this config:
    • We already have a config() hook for this exact reason.
    • Each plugin can provide a list of globs to watch in the current working directory.
    • ex: the babel plugin would provide watchConfig.push("**/babel.config.*").
  • New --hyperspeed mode tells build to use & update a cold cache:
    • dev already has a cold cache; now build needs one too.
    • This "Incremental" mode is designed to be opt-in.
    • When you enable it, you understand that any non-standard build config must be added to watchConfig.
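
A rough sketch of how this could look from the user's side (hedged: watchConfig and --hyperspeed are the names proposed in this issue, not shipped options):

```js
// snowpack.config.js — hypothetical; `watchConfig` is the option proposed above.
module.exports = {
  plugins: ['@snowpack/plugin-babel'],
  // Any change to a file matching one of these globs clears the entire build cache.
  watchConfig: [
    './config/*',              // a project-local config folder
    '../../babel.config.json', // a config file outside the project directory
  ],
};
```

You would then opt in to the cached build with something like `snowpack build --hyperspeed`.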
Previously: Open Question: how to detect config changes?

Only a few things cause a file's cache to become stale: a change to the source file, a change to your dependencies, or a change to any build plugin's configuration.

  • Change to source file: We already include this in our cache, so this can be considered done.
  • Change to dependencies: We already trigger a cache clean if we detect that your dependencies have changed.
  • ⚠️ Change to any plugin's configuration: This one is tricky, and it's where most of the work will go.

How can we detect changes to config like babel.config.json or postcss.config.js? A few ideas:

  • Automatically watch any file with .config. in its name (or any file in a config folder; the folder location could be configurable).
  • Plugins tell Snowpack which config files to listen to. For example, @snowpack/plugin-babel tells Snowpack about babel.config.js/json.
  • Plugins are responsible for loading config, and then returning a hash of the config used to build the file (sketched below)? Babel provides a way to load the exact config before transpiling a file, but I'm unsure if postcss provides a way to do this (it may just do the file reading itself). Either way, this is more work for plugin authors and wouldn't support CLI-based build scripts.
  • Some other idea?
  • Some combination of the above?

If we can guarantee that Snowpack will detect all relevant config changes, then I believe we can reuse built files much better than we currently do.
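
As a rough illustration of that third idea (a sketch only; it assumes @babel/core's loadOptionsSync and uses a hypothetical helper name), a plugin could load the resolved options for a file and hash whatever is serializable:

```js
// Hypothetical helper: fingerprint the Babel config that applies to one file.
// Best-effort only — plugin implementations are functions and won't serialize,
// so changes inside them would not be detected.
const crypto = require('crypto');
const babel = require('@babel/core');

function babelConfigHash(filename) {
  // Resolve the exact options Babel would use for this file
  // (babel.config.*, .babelrc, package.json#babel, etc.).
  const options = babel.loadOptionsSync({ filename });
  // Replace functions with a placeholder so JSON.stringify keeps the rest.
  const serialized = JSON.stringify(options, (key, value) =>
    typeof value === 'function' ? `[fn ${value.name || 'anonymous'}]` : value
  );
  return crypto.createHash('sha256').update(serialized).digest('hex');
}
```

If the hash for a file differs from the one stored alongside its cached build, that cache entry is stale.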

@joshwilsonvu (Contributor) commented Sep 30, 2020

I've only made a small contribution to Snowpack, but I'll share some thoughts anyway!

Babel discusses the caching issue here, and points out that .js config files are difficult to cache. I would recommend not caching any sort of .js config file, because a change in a dependency could change behavior without any modification to the config file itself. This problem could manifest with .json files as well through fields like "extends". As an extreme example, Babel might even be configured to use a plugin written within the source tree, which could change behavior dramatically without touching babel.config.json, and Babel itself would have no idea something has changed. Point is, "guarantee[ing] that Snowpack will detect all relevant config changes" is very hard.

However, I would expect that watching any files that look like a config file, plus any files that plugins tell Snowpack about (options 1 & 2), should cover 90% of cases. So we need a way to make the common case work, and an escape hatch for the other 10%, i.e. reverting to the present behavior. This could piggyback on the build --clean option, add a --reload option, or be a new option entirely. On top of that, we would need to let users know that this optimization might cause some obscure issues, and that the new option is the first thing to try when it does.

@FredKSchott (Owner, Author)

All contributions are appreciated, regardless of size! :D

Great points! I bet we could get over 90% confidence here by also always clearing the cache when dependencies change (we already have a check for this in Snowpack) + giving you a way to mark other files to be watched. For example, if you load some config file outside of your working directory, you could add config that tells Snowpack to watch that file as well.

This also may point to the fact that this may never be default-on. If it's opt-in (ex: --cache or --incremental) then we can make sure that you understand what the requirements are.

@alubbe commented Oct 2, 2020

On top of just rebuilding after any change to the dependencies, can't we fix the config fingerprinting/invalidation issue by not fingerprinting the config file(s) themselves, but instead building up a complete configuration object in memory from all files/settings/relevant environment variables/etc. and then fingerprinting that?

@joshwilsonvu (Contributor)

Even if we do build up the complete configuration objects in memory, it's no silver bullet. There may still be pathological cases that prevent it from working 100% of the time (ex. Babel can't detect changes in plugin implementations). I've got nothing against fingerprinting the complete loaded config, but the extra dev time and run time it takes to do that may or may not be worth dropping a 5% failure rate to 2%. It's impossible to know what those rates actually are without profiling all real-world usage.

We could leave that decision to the plugins: return an array of extra paths for Snowpack to watch, or return a string hash of the configuration object. Snowpack itself could support both without too much complexity.
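
A minimal sketch of "support both" (hypothetical hook names; neither exists in Snowpack today):

```js
// Hypothetical: gather cache-invalidation inputs from every plugin.
async function collectCacheInputs(plugins) {
  const watchGlobs = [];
  const configHashes = [];
  for (const plugin of plugins) {
    if (plugin.watchConfig) {
      // Option A: the plugin lists extra config files/globs for Snowpack to watch.
      watchGlobs.push(...plugin.watchConfig);
    }
    if (plugin.configHash) {
      // Option B: the plugin fingerprints its own fully-loaded configuration.
      configHashes.push(await plugin.configHash());
    }
  }
  // A change to any watched file, or to any returned hash, clears the build cache.
  return { watchGlobs, configHashes };
}
```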

The opt-in idea would ensure that Snowpack continues to Just Work, and hopefully most users will take advantage of it. You could always make it default-on later.

@FredKSchott (Owner, Author)

Yea, I even looked at @babel/core and it doesn't give you a clear API to find out which config files you loaded from. You're right that we could load the options object ourselves, and do some serialization of the object, but it's slightly less explicit when that would or wouldn't work.

I think the design that's appearing from this conversation is something like:

  • If your node_modules directory changes at all, we clear the cache.
  • Each plugin is able to provide a list of globs to watch in the current working directory (ex: babel would provide `"watchConfig": ["babel.config.*"]`).
  • The user is able to provide their own "watchConfig" (ex: "watchConfig": ["./config/*"], or ["../../babel.config.json"] to watch files outside the project directory).
  • "Incremental" mode is designed to be opt-in. When you enable it, you understand that any non-standard config that you use must be added to a watchConfig array.

This would be explicit, reasonably easy to debug (with good --verbose logs), and adding new config files to watch would be a straightforward manual step for custom/weird setups.
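
On the plugin side, that could look something like the sketch below (hedged: it assumes the existing config() hook would be allowed to append to the proposed watchConfig array):

```js
// Hypothetical change to @snowpack/plugin-babel: register its config files
// so that editing them clears the build cache.
module.exports = function plugin(snowpackConfig, pluginOptions) {
  return {
    name: '@snowpack/plugin-babel',
    config(config) {
      // `watchConfig` is the array proposed in this issue, not a shipped option.
      config.watchConfig = config.watchConfig || [];
      config.watchConfig.push('**/babel.config.*');
    },
    // ...existing load()/transform() behavior unchanged
  };
};
```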

@FredKSchott (Owner, Author) commented Oct 2, 2020

Also, I'm starting to like the idea of going full Elon Musk and calling this --hyperspeed mode, or something equally silly 😄

@alubbe commented Oct 2, 2020

I like --hyperspeed - it also implies that users have to pay attention or things might go wrong ;)

One more thing about our codebase: we have 1000s of JSX files, but only 100s of dependencies. That makes it pretty slow to load in build mode, even over HTTP/2. Would it be possible to bundle all of our JSX files into one (without any optimizations whatsoever), but keep the dependencies separate? That way the browser only has to assemble 100-200 files instead of 1000-1500 per page/route, and with this new --hyperspeed mode it would also build extremely fast on most changes (those that don't modify/add/remove dependencies).

@FredKSchott (Owner, Author)

100s of dependencies??? 🙀 I hope that's a typo!

When you say "load in build mode", do you mean that the initial build itself takes a long time, or that loading the site in the browser after the initial build takes a long time? Is this a problem in the final build, or during development? And lastly, are you seeing this slow load time when you load files from your own machine, or are you loading them over the network?

If you're using your own server, ETag support would allow for better caching in the browser, even during development.

@FredKSchott (Owner, Author)

Alright, I've updated the main issue with a revised plan. Would love any help on any of these broken-down steps!

@alubbe commented Dec 3, 2020

Hey - just wanted to check in on this issue. Is there some way to help get started on this, or is the current focus for new Snowpack features elsewhere?

@FredKSchott (Owner, Author)

Hey! I'd still love to see this, but it's advanced enough and nice-to-have enough that we need to focus elsewhere for now. I'll reopen this once v3 is out the door and we can revisit. Would still love help in other issues though, if you're interested in contributing!
