Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build git-recent-hours #2908

Closed
wants to merge 2 commits into from
Closed

Conversation

peterbe
Copy link
Contributor

@peterbe peterbe commented Feb 12, 2021

Part of #2642

This is pretty neat. It makes is possible to build only the documents that were modified recently.
Example:

BUILD_GIT_RECENT_HOURS=24 yarn build

But you need to set CONTENT_TRANSLATED_ROOT if you want it to be "realistic", so

export CONTENT_TRANSLATED_ROOT=~/dev/MOZILLA/MDN/translated-content-rendered/files
# alternative syntax
yarn build --git-recent-hours 6

Now, the (final) files it generates are built in a way that each document is aware of the translations. These files will (almost) be ready to upload to S3.

Bugs and caveats:

  • If you were to run the deployer and upload what got built, the /$locale/search-index.json would be only the built files. But whatever we did for not building all the sitemaps, we can do that here too.
  • The way it figures out what all the translations are is a bit hacky. Once all translations' slugs are Englishified, we'd be able to figure out all the English document's translations by simply "guessing".
  • If we were to run this in prod-build.yml every 6h we need to make sure the Deployer doesn't delete what WASN'T built (when it does it's "cleanup stuff" (hi @escattone ))
  • I've only tested this with the en-US content. But I think it should be possible (before this is an open PR) to make it work for translated content too.

I'm going to keep it as a draft for the moment. But I like the start.

build git-recent-hours
@peterbe
Copy link
Contributor Author

peterbe commented Feb 12, 2021

@escattone I have an idea! A really good one if I may.
You know how we have the very explicit prod-build.yml (and stage-build.yml). My first idea was to explode that and make it more advanced/complicated. But instead, I think we can make a fresh new file: partial-prod-build.yml which is very very similar. But...

  • To avoid uploading static assets (that are more a product of Yari rather than the content), we point the Deployer to client/build/en-us/docs instead of client/build. Or, perhaps a new flag, like poetry run deployer $BUILD_ROOT --only-documents
  • We don't trigger the Speedcurve ping
  • We use --update when uploading to Elasticsearch. Hardcoded so it can't accidentally do a fresh complete build.
  • No Deployer whatsdeployed for the code, only the latest content.
  • No upload of lambda functions
  • (once the deployer has the functionality) no deletion from S3 based on what wasn't built

So it'll be some redundant repetition in prod-build.yml compared to partial-prod-build.yml (and double replicated for stage) but it's pleasant to be able to know with confidence what it does without having to rely on clever if-statements or any logic like that.

@peterbe
Copy link
Contributor Author

peterbe commented Feb 12, 2021

What would be ideal, but not critical, is if we could use a "tagged" version of Yari in these builds. It should ideally be the version that was used in the last 24h complete build.

@peterbe
Copy link
Contributor Author

peterbe commented Feb 12, 2021

@fiji-flo If you can share some feelings on this too, much appreciated.

In terms of work left to do, I think we could ignore that it's only en-US for now. And we can hack around any shortcomings in Deployer.
I rushed the code so I would like to go over it one extra time before turning this into a real PR.
But the plan is feeling very realistic and it would reduce the time between merging content PRs until it gets published.

@fiji-flo
Copy link
Contributor

Looks like it works. It's still adding complexity, but much less than obvious alternatives. One thing to watch out for, do we enforce squash and merge? Using a normal merge last modified is the time of authoring isn't it?

@ghost
Copy link

ghost commented Feb 13, 2021

Couldn't we go back through a Commit instead of using time? I imagine myself marking the commit or tag where a deploy is done somewhere. Then go back to the files modified or touched by the current commit back to the committed commit and build those?
Comment to be taken with a grain of salt but I imagine it is more accurate.
I think all this has to be done with Git commands and somehow Node JS command line streams (I don't know I imagine a Git command whose result is parsed / split / to get the data you need)

Base automatically changed from master to main February 16, 2021 16:38
@peterbe
Copy link
Contributor Author

peterbe commented Feb 16, 2021

Looks like it works. It's still adding complexity, but much less than obvious alternatives. One thing to watch out for, do we enforce squash and merge? Using a normal merge last modified is the time of authoring isn't it?

Yes, it's more moving parts. (same with the new interactive buildDocments hehe)

One solution to the problem of a modified being older than the time window is that we run it like this:

  • Run cron every 2h
  • Filter on the last 3h

There'll be some overlap where the filtering picks up files that got built in the last 2h window, but we're still talking about very minor build times. The time it takes to build, say 10 files vs. 15 files is still in the single-digit seconds range.
After all, the rate our Yari builder builds at is about 25 documents per second.

@peterbe
Copy link
Contributor Author

peterbe commented Feb 16, 2021

@fiji-flo I chatted to @escattone about this draft PR and I've decided to keep playing with this draft. My plan is to hack on it when time allows. The plan:

  1. Add a thing to the Deployer that tells it to NOT bother doing a "diff" to see which S3 keys can be deleted. Similar to the Elasticsearch indexing CLI that has a --update option which "cakes on" rather than worrying about a fresh build.
  2. Change it to NOT generate the seach-index.json or any sitemaps.
  3. Create a partial-stage-build.yml that first doesn't actually do anything, but just see what it would have uploaded.

I wanna wait till the uplifting settles down and until Ryan is done with #2224

@github-actions github-actions bot added the 🐌 idle Issues and PRs without recent activity. Flagged for maintainer follow-up. label Nov 25, 2021
@schalkneethling schalkneethling added needs decision from engineering and removed 🐌 idle Issues and PRs without recent activity. Flagged for maintainer follow-up. labels Jan 12, 2022
@github-actions github-actions bot added the 🐌 idle Issues and PRs without recent activity. Flagged for maintainer follow-up. label Feb 18, 2022
@github-actions github-actions bot added the merge conflicts 🚧 Please rebase onto or merge the latest main. label May 16, 2022
@github-actions
Copy link
Contributor

This pull request has merge conflicts that must be resolved before we can merge this.

@github-actions github-actions bot closed this May 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🐌 idle Issues and PRs without recent activity. Flagged for maintainer follow-up. merge conflicts 🚧 Please rebase onto or merge the latest main. needs decision from engineering
Projects
Development

Successfully merging this pull request may close these issues.

3 participants