Bug 1925061: disable using MADV_FREE to release unused memory #73

Closed
wants to merge 1 commit into from

paulfantom

As in title. This is for testing a possible solution to OOMKilling prometheus during upgrades.

/cc @s-urbaniak @sdodson @bparees

@openshift-ci-robot

@paulfantom: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

*: disable using MADV_FREE to release unused memory

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot added the approved label Feb 12, 2021
@bparees added the staff-eng-approved label Feb 12, 2021
@bparees

bparees commented Feb 12, 2021

pre-emptively group-lead-approving so you can merge it if it helps.

@paulfantom changed the title from "*: disable using MADV_FREE to release unused memory" to "Bug 1925061: disable using MADV_FREE to release unused memory" Feb 12, 2021
@openshift-ci-robot added the bugzilla/severity-medium and bugzilla/invalid-bug labels Feb 12, 2021
@openshift-ci-robot

@paulfantom: This pull request references Bugzilla bug 1925061, which is invalid:

  • expected the bug to target the "4.7.0" release, but it targets "---" instead
  • expected Bugzilla bug 1925061 to depend on a bug targeting a release in 4.8.0 and in one of the following states: MODIFIED, VERIFIED, but no dependents were found

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1925061: disable using MADV_FREE to release unused memory

@paulfantom
Author

/bugzilla refresh

@openshift-ci-robot

@paulfantom: This pull request references Bugzilla bug 1925061, which is invalid:

  • expected the bug to target the "4.7.0" release, but it targets "---" instead
  • expected Bugzilla bug 1925061 to depend on a bug targeting a release in 4.8.0 and in one of the following states: MODIFIED, VERIFIED, but no dependents were found

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

@rphillips

/lgtm

@openshift-ci-robot

@rphillips: changing LGTM is restricted to collaborators

In response to this:

/lgtm

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: paulfantom, rphillips

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rphillips

/hold until ready

@openshift-ci-robot added the do-not-merge/hold label Feb 12, 2021
@smarterclayton

I don't see how MADV_FREE fixes the issue. This is real memory that Prometheus has to track. I still see a doubling of Prometheus memory use over the entire run.

@rphillips

MADV_FREE may decrease the memory usage on compaction... it is worth the test to see if it helps.

@smarterclayton

[image: screenshot not preserved]

The upgrade is in the middle; I still see 2x growth during the upgrade.

@simonpasquier

IIUC MADV_FREE should already be set by the base image in 4.7 thanks to openshift/images#61?

@rphillips

@simonpasquier I just checked my 4.7 cluster and the madvdontneed option is not enabled.

It looks like enabling the option might be recommended from upstream now: prometheus#8357 (comment)
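
A minimal Go sketch of one way to check for the option mentioned above. It assumes the option is supplied through the Go runtime's `GODEBUG` environment variable as `madvdontneed=1`; the thread does not show how the cluster was actually checked, and the helper name `madvDontNeedEnabled` is hypothetical. On Linux, Go 1.12 through 1.15 release memory with MADV_FREE unless this setting is present.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// madvDontNeedEnabled reports whether a GODEBUG-style string
// (comma-separated key=value pairs) sets madvdontneed=1.
// If the key appears more than once, the last occurrence wins.
// Hypothetical helper for illustration only.
func madvDontNeedEnabled(godebug string) bool {
	enabled := false
	for _, field := range strings.Split(godebug, ",") {
		parts := strings.SplitN(strings.TrimSpace(field), "=", 2)
		if len(parts) == 2 && parts[0] == "madvdontneed" {
			enabled = parts[1] == "1"
		}
	}
	return enabled
}

func main() {
	// Inspect the environment of the current process only.
	fmt.Println(madvDontNeedEnabled(os.Getenv("GODEBUG")))
}
```

This only inspects the environment of the process it runs in; checking a running Prometheus container would mean reading that container's environment instead.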

@rphillips

rphillips commented Feb 16, 2021

I tweaked the registry.ci.openshift.org/ocp/4.7:base image which is not being used here.

We should get this PR into 4.7 and 4.8.

@simonpasquier

I tweaked the registry.ci.openshift.org/ocp/4.7:base image which is not being used here.

Right, I was confused about which base image is used for Prometheus. It shouldn't hurt to use MADV_DONTNEED, but I'm afraid it won't help much with bug 1925061, since we're not seeing a memory increase due to compaction (or any other transient task that should release memory as quickly as possible once it completes).

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci bot added the lifecycle/stale label May 18, 2021
@paulfantom
Author

/close

@openshift-ci

openshift-ci bot commented May 20, 2021

@paulfantom: Closed this PR.

In response to this:

/close

@openshift-ci bot closed this May 20, 2021
@paulfantom
Author

/reopen

@openshift-ci bot reopened this May 20, 2021
@openshift-ci

openshift-ci bot commented May 20, 2021

@paulfantom: Reopened this PR.

In response to this:

/reopen

@openshift-ci

openshift-ci bot commented May 20, 2021

@paulfantom: This pull request references Bugzilla bug 1925061, which is invalid:

  • expected the bug to target the "4.7.z" release, but it targets "4.8.0" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1925061: disable using MADV_FREE to release unused memory

@openshift-ci

openshift-ci bot commented May 20, 2021

@paulfantom: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-aws-upgrade | 132567c | link | /test e2e-aws-upgrade |
| ci/prow/e2e-aws | 132567c | link | /test e2e-aws |

@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci bot added the lifecycle/rotten label and removed the lifecycle/stale label Jun 19, 2021
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci bot closed this Jul 19, 2021
@openshift-ci

openshift-ci bot commented Jul 19, 2021

@openshift-bot: Closed this PR.

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • bugzilla/invalid-bug: Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting.
  • bugzilla/severity-medium: Referenced Bugzilla bug's severity is medium for the branch this PR is targeting.
  • do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
  • lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
  • staff-eng-approved: Indicates a release branch PR has been approved by a staff engineer (formerly group/pillar lead).

7 participants