
Data awareness and Data/Pod affinity #139

Closed
23 tasks
Moinheart opened this issue Nov 7, 2016 · 21 comments
Assignees
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@Moinheart

Moinheart commented Nov 7, 2016

Description

Make Kubernetes aware of the state of the data held in nodes' local volumes so that the scheduler can make better decisions for pods. A pod's definition could require local volumes and specific data at creation time; the scheduler would then preferentially place the pod on a node that already has the needed data/local volumes, and the data would be transmitted to the nodes that the pods end up running on.
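To make the idea concrete: until such a feature exists, the closest approximation with today's primitives is for some out-of-band agent to label the nodes that already hold a dataset and for the pod to express a scheduling preference for those nodes. The sketch below assumes exactly that; the label key, image, and host path are invented for illustration and are not part of any proposal.

```yaml
# Illustrative only: dataset.example.com/product-index is a hypothetical label
# that an external agent would have to keep up to date on nodes holding the data.
apiVersion: v1
kind: Pod
metadata:
  name: recommender
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: dataset.example.com/product-index
            operator: In
            values:
            - ready
  containers:
  - name: app
    image: example.com/recommender:latest   # placeholder image
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    hostPath:
      path: /mnt/datasets/product-index     # node-local path, assumed to exist
```

This only prefers nodes that already have the data; it does not verify the data's state or trigger any transfer, which is exactly what this proposal asks the scheduler and a data layer to handle.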

Progress Tracker

  • Alpha
    • Write and maintain draft quality doc
      • During development keep a doc up-to-date about the desired experience of the feature and how someone can try the feature in its current state. Think of it as the README of your new feature and a skeleton for the docs to be written before the Kubernetes release. Paste link to Google Doc: DOC-LINK
    • Design Approval
      • Design Proposal. This goes under docs/proposals. Doing a proposal as a PR allows line-by-line commenting from community, and creates the basis for later design documentation. Paste link to merged design proposal here: PROPOSAL-NUMBER
      • Decide which repo this feature's code will be checked into. Not everything needs to land in the core kubernetes repo. REPO-NAME
      • Initial API review (if API). Maybe same PR as design doc. PR-NUMBER
        • Any code that changes an API (/pkg/apis/...)
        • cc @kubernetes/api
      • Identify shepherd (your SIG lead and/or kubernetes-pm@googlegroups.com will be able to help you). My Shepherd is: replace.me@replaceme.com (and/or GH Handle)
        • A shepherd is an individual who will help acquaint you with the process of getting your feature into the repo, identify reviewers and provide feedback on the feature. They are not (necessarily) the code reviewer of the feature, or tech lead for the area.
        • The shepherd is not responsible for showing up to Kubernetes-PM meetings and/or communicating if the feature is on-track to make the release goals. That is still your responsibility.
      • Identify secondary/backup contact point. My Secondary Contact Point is: replace.me@replaceme.com (and/or GH Handle)
    • Write (code + tests + docs) then get them merged. ALL-PR-NUMBERS
      • Code needs to be disabled by default. Verified by code OWNERS
      • Minimal testing
      • Minimal docs
        • cc @kubernetes/docs on docs PR
        • cc @kubernetes/feature-reviewers on this issue to get approval before checking this off
        • New apis: Glossary Section Item in the docs repo: kubernetes/kubernetes.github.io
      • Update release notes
  • Beta
    • Testing is sufficient for beta
    • User docs with tutorials
      - Updated walkthrough / tutorial in the docs repo: kubernetes/kubernetes.github.io
      - cc @kubernetes/docs on docs PR
      - cc @kubernetes/feature-reviewers on this issue to get approval before checking this off
    • Thorough API review
      • cc @kubernetes/api
  • Stable
    • docs/proposals/foo.md moved to docs/design/foo.md
      - cc @kubernetes/feature-reviewers on this issue to get approval before checking this off
    • Soak, load testing
    • detailed user docs and examples
      • cc @kubernetes/docs
      • cc @kubernetes/feature-reviewers on this issue to get approval before checking this off

FEATURE_STATUS is used for feature tracking and to be updated by @kubernetes/feature-reviewers.
FEATURE_STATUS: IN_DEVELOPMENT

More advice:

Design

  • Once you get LGTM from a @kubernetes/feature-reviewers member, you can check this checkbox, and the reviewer will apply the "design-complete" label.

Coding

  • Use as many PRs as you need. Write tests in the same or different PRs, as is convenient for you.
  • As each PR is merged, add a comment to this issue referencing the PRs. Code goes in the http://github.com/kubernetes/kubernetes repository,
    and sometimes http://github.com/kubernetes/contrib, or other repos.
  • When you are done with the code, apply the "code-complete" label.
  • When the feature has user docs, please add a comment mentioning @kubernetes/feature-reviewers and they will
    check that the code matches the proposed feature and design, and that everything is done, and that there is adequate
    testing. They won't do detailed code review: that already happened when your PRs were reviewed.
    When that is done, you can check this box and the reviewer will apply the "code-complete" label.

Docs

  • Write user docs and get them merged in.
  • User docs go into http://github.com/kubernetes/kubernetes.github.io.
  • When the feature has user docs, please add a comment mentioning @kubernetes/docs.
  • When you get LGTM, you can check this checkbox, and the reviewer will apply the "docs-complete" label.
@Moinheart
Author

@davidopp @erictune

@erictune
Member

erictune commented Nov 7, 2016

Except for the transmission part, this is a duplicate of kubernetes/kubernetes#7562

I think @bprashanth has more state on this topic.

@idvoretskyi
Member

@Moinheart which SIG is responsible for this feature?

@davidopp
Member

davidopp commented Nov 7, 2016

Besides #7562 the following are relevant:
#121
kubernetes/kubernetes#30044 (sticky emptydir)

I agree with @erictune that this sounds a lot like sticky emptydir. @Moinheart can you read the proposal in kubernetes/kubernetes#30044 (other than the part you described about moving data) and describe if/how it is different from what you want?

@Moinheart
Author

@erictune @davidopp
#30044 (sticky emptydir) is more about pod/volume affinity: it assumes that if a local volume is ready on a node, the needed data is ready too. However, we cannot be sure of that. In many use cases data is as important as the software (the pod); if k8s does not know the exact state of the data, pods may sometimes serve traffic without the data they require. Volumes are only the carrier, while the data is what the pods actually need, and that is part of the difference from #30044. If k8s is aware of the data itself, it can orchestrate both data and pods more precisely for high availability.

@idvoretskyi
I don't know yet which SIG(s) would be responsible for this feature.

@Moinheart
Author

As for the data-moving part, we could lean on an existing open-source DFS or start a new project in the k8s incubator for this data awareness. One option would be to collect the local volumes on nodes (by label) to build a special DFS, though I haven't thought that through in depth yet. k8s could learn the state of the data through the new DFS's APIs, and the DFS could handle the data transmission.
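Purely as an illustration of how such a DFS/agent could surface data state without new Kubernetes APIs, it could maintain labels on each Node that advertise which datasets are present and current. The label keys and values below are invented; nothing in Kubernetes sets them today.

```yaml
# Excerpt of a Node object with labels a hypothetical data-distribution agent
# might maintain; the dataset.example.com/* keys are made up for this sketch.
apiVersion: v1
kind: Node
metadata:
  name: node-42
  labels:
    dataset.example.com/product-index: ready                  # replica fully synced
    dataset.example.com/product-index-version: "2016-11-07"   # last refresh
```

The scheduler (or a pod's node affinity, as sketched earlier in this issue) could then treat those labels as the "state of data" signal, while the agent handles the actual transmission.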

@Moinheart
Author

If anything I've posted is confusing, please let me know.

@erictune
Member

erictune commented Nov 8, 2016

A suggested approach, which does not require Kubernetes changes, is to use an init container that does this: "if the data directory is empty, copy from the DFS; otherwise do nothing."

http://kubernetes.io/docs/user-guide/production-pods/#handling-initialization
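A rough sketch of that pattern, assuming a node-local hostPath directory for the data and an NFS export standing in for "the DFS"; it is written with the spec.initContainers field rather than the beta annotation, and the image, paths, and server address are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-seeded-data
spec:
  initContainers:
  - name: seed-data
    image: busybox:1.36
    command:
    - sh
    - -c
    - |
      # copy only if the node-local data directory is still empty
      if [ -z "$(ls -A /data 2>/dev/null)" ]; then
        cp -a /dfs/. /data/
      fi
    volumeMounts:
    - name: data
      mountPath: /data
    - name: dfs
      mountPath: /dfs
      readOnly: true
  containers:
  - name: app
    image: example.com/app:latest          # placeholder application image
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    hostPath:
      path: /mnt/datasets/product-index    # persists on the node across pods
  - name: dfs
    nfs:                                   # any shared filesystem would do
      server: dfs.example.com
      path: /exports/product-index
```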


@idvoretskyi
Member

@Moinheart please follow the process for submitting a feature request, described here: https://github.com/kubernetes/features/blob/master/README.md.

A feature may be filed once there is consensus in at least one Kubernetes SIG.

@Moinheart
Author

@idvoretskyi
sig-scheduling

@Moinheart
Author

@erictune
Init containers are a useful feature for solving pods' data dependencies, but in my use case there are two problems with relying on this beta feature. The first is performance: we have about 2k pods and each pod requires 100 GB of data, so our data repository would be overloaded if every pod fetched its data from the repo. The second is the lack of runtime orchestration: as I mentioned in a previous post (https://groups.google.com/forum/#!topic/kubernetes-dev/rWSmWpDr6JU), the data is replaced with a brand-new copy every x hours (which is another reason I need a DFS to handle the data transmission, in addition to data awareness).

@Moinheart
Author

update: currently we have about 20k pods

@idvoretskyi idvoretskyi added this to the next-milestone milestone Nov 9, 2016
@idvoretskyi idvoretskyi added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Nov 9, 2016
@idvoretskyi idvoretskyi modified the milestones: v1.6, next-milestone Dec 13, 2016
@krmayankk

Is it agreed that this is not the same as the sticky emptyDir proposal and is indeed a different proposal, mainly because it copies data from one node's local storage to another? @smarterclayton @bprashanth

@erictune
Member

erictune commented Jan 2, 2017

Another thing different about this proposal, IIUC, is the request for multiple pods to share a single local Volume (deduplication of data, implied by pictures in https://groups.google.com/forum/#!topic/kubernetes-dev/rWSmWpDr6JU). This seems like it might not be compatible with stickyEmptyDir.

If people desire this feature, it would be good to bring it up on kubernetes/kubernetes#30044
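For reference, the sharing pattern being discussed can already be spelled out by hand: two pods mounting the same hostPath read-only and pinned to one node with a nodeSelector. Nothing below deduplicates or schedules the data itself, which is precisely the gap this issue is about; the names, paths, and node name are made up.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: reader-a
spec:
  nodeSelector:
    kubernetes.io/hostname: node-42        # pin both pods to the same node
  containers:
  - name: app
    image: example.com/reader:latest       # placeholder image
    volumeMounts:
    - name: shared-data
      mountPath: /data
      readOnly: true
  volumes:
  - name: shared-data
    hostPath:
      path: /mnt/datasets/product-index
---
apiVersion: v1
kind: Pod
metadata:
  name: reader-b
spec:
  nodeSelector:
    kubernetes.io/hostname: node-42
  containers:
  - name: app
    image: example.com/reader:latest
    volumeMounts:
    - name: shared-data
      mountPath: /data
      readOnly: true
  volumes:
  - name: shared-data
    hostPath:
      path: /mnt/datasets/product-index
```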

@davidopp
Member

davidopp commented Jan 4, 2017

My interpretation of the sticky emptyDir proposal was that it would allow multiple pods to share a local volume. In particular this part

The scheduler will schedule the pod paying attention to any referenced PVs. If an empty dir PV attached to an existing node is found, that trumps all scheduler predicates. Otherwise the scheduler makes a scheduling decision taking into account pod cpu/memory requests (and in the future, requested storage), attaches it to a node, and all subsequent scheduling decisions for any pod referencing the same PV evaluate to the same node.

seems to be handling the case where Pod 2 wants to use the same local volume as Pod 1 (regardless of whether Pod 1 is still running).

@idvoretskyi
Member

@Moinheart please update the feature request with the design proposal.
Also, which release does this feature target?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 20, 2017
@Moinheart
Author

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Jan 10, 2018
@justaugustus
Member

@Moinheart Is there any work planned for this feature in the 1.11 release?
If so, can you please update the feature description according to the ISSUE_TEMPLATE?

In general, this feature issue needs to be actively maintained by someone or we need to make the determination that it is truly stale.

/remove-lifecycle frozen
cc @idvoretskyi

@k8s-ci-robot k8s-ci-robot removed the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Apr 16, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 16, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
