Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use single CPU for building site preview #41142

Merged
merged 1 commit into from
May 15, 2023

Conversation

tengqm
Copy link
Contributor

@tengqm tengqm commented May 15, 2023

Based on some experiments, I am suspecting that there are some concurrency issues when building the site for preview. In this PR, I'm proposing we limit the CPUs to 1 (which is actually what Netlify provides us today). The site preview build should normally complete within 4 minutes. The chance of a successful build is much higher than the default setting.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. sig/docs Categorizes an issue or PR as relevant to SIG Docs. labels May 15, 2023
@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label May 15, 2023
Based on some experiments, I am suspecting that there are some concurrency
issues when building the site for preview. In this PR, I'm proposing we
limit the CPUs to 1 (which is actually what Netlify provides us today).
The site preview build should normally complete within 4 minutes.
The chance of a successful build is much higher than the default setting.
@netlify
Copy link

netlify bot commented May 15, 2023

Pull request preview available for checking

Name Link
🔨 Latest commit 68a4556
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-io-main-staging/deploys/64619fe4478406000824ca8a
😎 Deploy Preview https://deploy-preview-41142--kubernetes-io-main-staging.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

@@ -40,7 +40,7 @@ build-preview: module-check ## Build site with drafts and future posts enabled
hugo --cleanDestinationDir --buildDrafts --buildFuture --environment preview

deploy-preview: ## Deploy preview site via netlify
hugo --cleanDestinationDir --enableGitInfo --buildFuture --environment preview -b $(DEPLOY_PRIME_URL)
GOMAXPROCS=1 hugo --cleanDestinationDir --enableGitInfo --buildFuture --environment preview -b $(DEPLOY_PRIME_URL)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we apply this limit to main branch builds as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
GOMAXPROCS=1 hugo --cleanDestinationDir --enableGitInfo --buildFuture --environment preview -b $(DEPLOY_PRIME_URL)
GOMAXPROCS=2 hugo --cleanDestinationDir --enableGitInfo --buildFuture --environment preview -b $(DEPLOY_PRIME_URL)

could this work OK too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. The container we got from Netlify is 1C1G. I don't think relaxing this to 2 processors will buy us anything. I'm more inclined to start with 1 processor so that any concurrency bugs can be ruled out. Later on, if this proved to be working as expected, we can try relax the constraint.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we apply this limit to main branch builds as well?

I'm more concerned about the netlify preview at this stage. Surely we can apply this to main branch if a similar timeout issue is there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, OK. I am less worried about the previews and more worried about the main branch builds (which produce the live site).

Anyway, if this change is helpful, I support it. Do we know why it helps?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

golang/go#33803 again 🙃

we should consider PR-ing hugo to have https://github.com/uber-go/automaxprocs in the meantime

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With throttling to 1 CPU, should setting GOMAXPROCS to 32 explicitly lead to build times going from 4 to over 60 minutes? We get timeouts when Hugo uses the automatic setting which we assume is 32.

I can picture how that might happen, but those scenarios feel more like bugs in something (maybe an old kernel?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If netlify is hosting on a 32 core host but running in a container throttled to 1 core then Go will attempt to use 32 cores and will quickly exhaust the process's quota leading to aggressive throttling in the kernel. It's hard to say exactly what slowdown will result but it can be large.

If you control the execution environment it's often recommended to only set CPU requests (CFS shares -- proportional time ~guaranteed to schedule) and not limits (CFS quota -- time bound above which process is throttled) unless you're benchmarking. Linux cfs quota throttling is known to be heavy handed.

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/resource_management_guide/sec-cpu

I took a quick peek through hugo yesterday and AFAICT it's using go's default runtime.GOMAXPROCS() which would match the number of cores on the underlying host (roughly speaking) rather than the cfs quotas (see linked go issue as wel).

I'm not sure where best to insert it into hugo, I'd usually say in main() but they have a very tiny main, I'd guess this affects many other projects and hugo would generally benefit from defaulting GOMAXPROCS in a CFS aware way until go standard runtime solves this (as java did already).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filing an upstream issue / PR

Copy link
Member

@BenTheElder BenTheElder May 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sftim
Copy link
Contributor

sftim commented May 15, 2023

FWIW it looks like the deploy preview took 63.6s to build using Hugo.

@sftim
Copy link
Contributor

sftim commented May 15, 2023

Thanks!

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mimani68, sftim

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 15, 2023
@windsonsea
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 15, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 1a240bc13e0d454eaf6a2d1f8dcab75cfc25d9a9

@sftim
Copy link
Contributor

sftim commented May 15, 2023

This may be a fix for #35780 and #40930

@tengqm tengqm deleted the serial-build branch May 15, 2023 14:15
@sftim
Copy link
Contributor

sftim commented May 15, 2023

This has really helped!

@BenTheElder
Copy link
Member

The next hugo release should be CPU quota aware 🤞 gohugoio/hugo#10952

@sftim
Copy link
Contributor

sftim commented May 24, 2023

PR #41301 copies this approach for live deploys.

Once we adopt Hugo 0.112 we should be able to drop the workaround.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/docs Categorizes an issue or PR as relevant to SIG Docs. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants