ETCD as a kubernetes SIG #15875
Comments
Thanks @logicalhan, is it possible to share this doc with a wider audience in the etcd community?
Done
Thanks for sharing this @logicalhan. I'll put my thoughts here; I can replicate them to the document if that would be helpful, though it's already quite cluttered, and I'm conscious that I'm writing this as someone who has only been making an effort to be involved with etcd for five months: Feedback
Agree in terms of activity levels from some maintainers and the departure of some maintainers; however, this statement and the broader document don't acknowledge recent improvements to the project which have, in my view, made the project more healthy. The challenge is that it takes a long time for newer contributors to learn the etcd codebase and get to the point of becoming a new maintainer. Refer to later comments in cncf/toc#898
This needs to be corrected in the document: AWS EKS runs 3.5.x. Additionally, OpenShift, which has a significant user base, runs 3.5.x.
I believe this should be updated to reflect the work fixing known data consistency issues and, in my view, the growing confidence we can have in later releases of 3.5.x.

Overall, for what it is worth, I support the closer integration of etcd with the Kubernetes project and the creation of a SIG. However, with that said, I don't think an all-at-once style change right now is a good idea. Adopting all Kubernetes SIG processes straight away, or trying a shift or fork in the codebase, would create an overwhelming amount of work for the two primary maintainers, who are already stretched. If the proposal does go ahead, I would hope it would take a phased approach, where things start with a SIG or WG in Kubernetes that is focused on resolving issues from the perspective of Kubernetes and working out what further process / community integration makes sense gradually.
Great 😄
I'm definitely not suggesting a fork in the codebase.
I believe this is quite reasonable, but again, I'm deferring execution as well as the decision to become a SIG to the community, to the current etcd maintainers and etcd leadership. You guys are best equipped to make the decisions which impact your community.
For me the most important thing is stopping etcd being treated as a second-class citizen, just one of over a hundred CNCF projects. My escalation about the declining community was only one of many escalations made by previous maintainers. Things have gotten better; however, I think we would see another downturn, as getting the project to a self-sustainable level would be a huge effort. I see SIG-ification as the best way for the project to reach self-sustainability.

The other problem is the immense toil of being an independent project. Managing our own community, documentation, infrastructure, test and release processes is a huge effort that falls solely on maintainers. We are almost a 10-year-old project, yet we still haven't figured those things out. I have escalated multiple times that etcd binaries are still being compiled on maintainers' workstations, even to the Kubernetes release team, and still there was no answer. Why should they? etcd is not part of Kubernetes.

I think etcd is critical for Kubernetes' success and deserves to be a first-class citizen. I want the project and its community to be treated as such.
I'm +1 on tighter involvement / official representation of the etcd project within Kubernetes. However, the overhead of SIG administration and the need for ~anything done here to involve sig-api-machinery would make me recommend this be a subproject of sig-api-machinery instead of a standalone SIG.
Having discussed this with api-machinery leads (@jpbetz and @deads2k by proxy), that path is actually not on the table.
My view (not necessarily representative of others):
The size of api-machinery already makes it difficult to have over-arching decision making. While subprojects can help, at a certain size those subprojects need to become self-governing to avoid api-machinery becoming a smaller sig-arch. etcd is an entire product today and it is neatly separated already. From the perspective of the etcd leads, a self-governing subproject doesn't reduce workload, but does increase the depth of the hierarchy. It's not clear (to me) what benefit the etcd project or the kube project would gain from that depth of hierarchy.
It's less a question of size, and more a question of coordination / overhead. I would advocate for whatever structure leads to the most clear and efficient driving of the actual work we want this group to accomplish.
Within Kubernetes, etcd is exclusive to apimachinery, right? It seems weird to me for a Kubernetes SIG to be the place to make decisions about non-k8s use cases. (My comments are non-blocking; I mostly just wanted to point out there are lighter-weight ways to represent / involve etcd in the Kubernetes project.)
As someone who has worked on etcd, I understand how complex the codebase is and I have deep respect for the active maintainers of the project. etcd has worked closely with SIG Scalability in the past and has a lot of advanced testing needs. I wouldn't be surprised if the etcd project's efforts around testing, downgrades, performance, and scale lead to as much (maybe more) coordination with other SIGs as with api-machinery. For that reason, I favor etcd being a top-level entity. I agree with @deads2k that etcd as an api-machinery subproject is likely to lead to a hierarchy of mostly independent projects, which is organizationally different from the other subprojects.
+1
As co-chair of SIG API Machinery I want to echo Joe and David's position:
Plus, full SIG status would make it easier for etcd to manage its own lifecycle, its relationships with other SIGs, and the K8s release process. SIG API Machinery is already too large, and that doesn't come for free; I would be worried about the level of "service" (meaning attention and leadership) that we could provide if we keep making it bigger.
The size of sig-apimachinery on its own doesn't convince me (although I agree with that assessment), but the existing separation (mentioned by David) and the interactions with other SIGs (mentioned by Joe) make sense to me. I certainly didn't want to block the proposal either (like Jordan); I was just trying to understand it better. Given the discussion above, I'm more on board with it now - thanks!
Thanks everyone for the suggestions. Looks like there is no strong opposition. The next step will be sending the user survey proposed by @jberkus to understand non-K8s usage of etcd better. Please take a look: https://docs.google.com/document/d/122pDnsoWxnGsvCgqGtgRN2i56hjgSpNubCbQ2KkMquU/edit?usp=sharing Feedback is appreciated.
Note: etcd has been a standalone project for almost 10 years. There are definitely non-Kubernetes use cases (#projects-using-etcd), and there are also use cases which use bbolt (#other-projects-using-bolt) or raft (#notable-users) directly. The motivation for SIG-ifying etcd is (1) to prevent the etcd project from declining, and (2) not to break anything in Kubernetes. It makes sense, but it doesn't mean it will break the statement "
Currently, I see https://github.com/apache/apisix and https://github.com/api7/etcd-adapter are using etcd for auth. ping @tao12345666333
On having more active reviewers/maintainers: I believe the "Move to community membership model closer to kubernetes one" effort is making etcd better right now. The org has had more active members over the past several months. But the fact is that there are only 4 active maintainers in the etcd repo, and the new SIG might bring more burden on the current maintainers. I suggest having multiple phases before introducing the SIG, for instance, starting with a new reviewer team.
Gratitude for the reminder from @fuweid. etcd is an integral component of Apache APISIX; our team has addressed a great deal of etcd-related feedback from the community and contributed bug fixes upstream. Indeed, the development of the etcd project holds great significance for us. Back to this proposal: I'm +1 on it. We can start by adding reviewers, and we are willing to participate.
FWIW we depend on etcd inside of https://github.com/purpleidea/mgmt/ and while we're not a big giant project like kube, we're working hard to build something cool! I think it would be valuable to keep etcd separate from kube as much as possible, but at the same time, I'd love to see more investment in developer hours from the various kube companies that consume etcd. I'm not sure a SIG is needed for that.
FWIW - becoming a K8s SIG won't stop etcd from being usable by other projects. K8s brings a lot of resources to the table that can help reduce some of the burdens on the current maintainers. e.g., we have the Security Response Committee and a well-defined process for assessing risk and triaging vulnerabilities. There are also things like our Kubernetes Enhancement Proposal process and Production Readiness Reviews, which bring a big pool of engineers with a very good grasp of API design, concurrency, and scalability. Nothing is stopping etcd from implementing something similar, or those engineers from getting involved and working here. Still, there's something to be said for being able to just 'drop in' to an already existing system that is well staffed and well supported.

We haven't discussed much of how K8s would adopt etcd from a GitHub administrative side of things - I don't want to discuss that too much at this point, cart before horse etc. - but etcd wouldn't HAVE to be fully migrated under one of the K8s orgs; we have 7 orgs that fall under the Kubernetes umbrella already. The only true hard requirements I can think of right now are that it would have to use the CNCF CLA instead of the DCO, be added to our GitHub Enterprise account, and have org membership follow our guidelines.
FYI: This is a deal breaker for some Free Software communities who are against CLAs.
The etcd DCO is a CLA; it's just not the "official" CNCF CLA.
The standard git/linux DCO is generally much more widely accepted than third-party CLAs. In fact, I don't really know why CNCF needs anything stronger than even the DCO for all these non-copyleft projects. @mrbobbytables pointed out an important point about the CLA (thank you!), and I just wanted to explicitly spell it out for those who might not know the legal implications and how some in the community feel about it.
I mean, the actual license agreements (from what I can tell) are basically the same; the only difference is that a DCO is a passive agreement, while a CLA is an active and explicit agreement.
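For readers unfamiliar with the mechanics being contrasted here: the DCO is enforced as a `Signed-off-by` trailer on each commit, which `git commit -s` appends automatically - that is why it feels "passive" next to signing a standalone CLA document. A quick demonstration in a throwaway repository (the name and email are placeholders):

```shell
# create a throwaway repository
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.name "Jane Dev"
git config user.email "jane@example.com"

echo "hello" > file.txt
git add file.txt

# -s / --signoff appends the DCO trailer to the commit message
git commit -q -s -m "Add file"

# show the full commit message; it ends with the Signed-off-by trailer
git log -1 --format=%B
```

The trailer certifies the contribution under the DCO terms without any separate signing step, which is the "passive agreement" described above.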
Anyway, this is derailing this issue.
KEP is a great idea; documentation around architecture/performance decisions is missing in etcd. PRR seems like a good idea, but there are only 3 approvers on the list. The Security Response Committee is also a good idea, but are we going to get a dedicated person, or are we only getting access to private discussion groups? Overall I support the spirit of this proposal -
Yeah, we just need to add stuff here (if I understand @mrbobbytables correctly), which would allow us to get the automation without any actual changes to etcd repos (except the inclusion of OWNERS files and such). It's actually quite nice; we'd end up getting things like directory-level ACLs, whereas etcd ACLs today are all root.
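For context on the mechanism: Kubernetes repositories gate merges with per-directory OWNERS files consumed by the Prow automation, which is what makes directory-level ACLs possible. A minimal sketch of what one such file looks like (the usernames and label are placeholders, not actual etcd reviewers):

```yaml
# OWNERS - placed in a subdirectory to scope review rights to it.
# "approvers" can /approve PRs touching files under this directory;
# "reviewers" are auto-suggested and can /lgtm.
approvers:
  - alice        # placeholder username
reviewers:
  - alice
  - bob          # placeholder username
labels:
  - area/server  # placeholder label auto-applied to matching PRs
```

A repo-root OWNERS file plays the "root ACL" role described above, while nested ones delegate narrower areas to additional people.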
I appreciate the sentiment. I was personally leaving the future path vague so as not to inject my own opinions into the discussion. I believe that the etcd community should determine what integration looks like, what kind of timeline this process should take, and how deep the integration goes. I just believe that a tighter integration with Kubernetes is a net positive, which is why I suggested it in the first place.
In the current set, yes, but there's an ongoing program to increase that again. There's a group of us shadows working towards becoming the next set of approvers, who also do the initial PRR reviews before the full approvers do final approval. Referring to the PRR review dashboard, I believe in the Kubernetes 1.27 timeframe there were 7 PRR shadows in addition to the 3 active full approvers.
SRC is a group that collectively triages incoming security bug reports for the entire Kubernetes project and works with subproject owners to handle relevant incoming issues and the bug bounty program. You don't join their groups; users reporting issues reach out to their private groups, and SRC reaches out to maintainers by other means. They staff a sort of clearing-house for incoming vulnerabilities in all subprojects. I recently fielded a report from the bug bounty to SRC for https://registry.k8s.io and I'm very thankful for the support from SRC in handling the bulk of the process and leaving the project to focus only on implementing the fix. I personally think having SRC support is very valuable.

Kubernetes has a number of groups, tools, and procedures like this to handle project management boilerplate (e.g. creating and securing new GitHub repos, managing groups and permissions, Slack moderation and tooling, release hosting / content distribution, CI and cloud resources management, etc.). While these don't magically scale up infinitely, I think there's room for etcd, and I personally find these helpful for maintaining subprojects without having to handle all of this from scratch. There's a lot of boilerplate in supporting a project like this.
I would further add that Kubernetes projects are usually meant to be reusable by other projects, though sometimes, when we've custom built something like the slack-infra tools or the image hosting infrastructure, the project's own needs have of course come first. Prioritizing Kubernetes use cases for etcd vs other projects would remain up to the etcd owners / maintainers.

Kubernetes's governance is such that a SIG generally has full say over the lifecycle of its subprojects and is the ultimate technical authority, so a "SIG Etcd" made of etcd maintainers would remain in control of etcd's project lifecycle and would continue to have etcd project-level owners making technical decisions. The Steering Committee does exist above SIGs, but only as a non-technical escalation path when things don't fall clearly to a SIG; Steering delegates all technical ownership to SIGs. Steering does things like establishing the governance ground rules, approving scope changes to deconflict between SIGs, and approving new SIGs. SIGs have their own governance to handle things like electing new SIG leadership, though there are established baseline recommendations and minimal requirements (e.g. SIGs must submit an annual report to Steering on the health of the SIG).

What the priorities are for etcd remains a distinct question from being a top-level CNCF project vs operating under Kubernetes's umbrella.
This reminds me: I think we should create a project charter similar to the K8s charter guidelines. It should define the project scope and priorities so the whole community can focus and make etcd serve its users best. This is also a perfect opportunity to ensure that non-k8s users are heard and included. The expectation is that it will not change when etcd joins K8s and will later be used as the SIG charter.
Why isn't the CLA something we're allowed to discuss here? Am I understanding correctly that if etcd were to become a SIG, the project would be required to adopt the CNCF CLA? Requiring individuals to sign random documents they aren't necessarily comfortable with, nor necessarily completely understand, all to reduce risk for a giant organization, is not friendly to communities, nor is it good for encouraging non-corporate contributions.
A few points:
You are free to discuss whatever you want. But I'm not obligated to agree with you on its relevance.
@logicalhan a DCO is not the same as a CLA. In particular, one required step for etcd to join Kubernetes is that all current etcd contributors would need to sign the Kubernetes CLA. That's not prohibitive, but it needs to be considered as part of the migration process.
That's why I said this earlier:
@logicalhan @liggitt @wojtek-t Having representatives from the etcd project in Kubernetes is one thing; making etcd a subproject of Kubernetes is a totally different thing. If we want to discuss the second one, we need to understand the benefit of doing that for the etcd project, not vice versa. Moreover, would sig-api-machinery have the bandwidth to provide additional technical resources to the etcd project? Do the engineers have enough technical expertise to dive into the etcd project? As someone who worked extensively on both etcd and Kube API machinery (as an internal "fork"), I would say the two projects are pretty different and do not share much common knowledge and context. I do not really see the benefits of making etcd a subproject.
This entire thread is full of discussion of potential benefits to etcd, not the other way around. The only benefit to Kubernetes, if accepted, is helping a critical dependency; none other have been mentioned. It's still an open question whether this makes sense, but nobody is talking about anything other than benefiting etcd.
If you read the prior comments, API machinery leaders do not want to make it a subproject. The discussion is about being a SIG on the same level as SIG API Machinery. |
I fail to see how. The things (process, docs, etc.) you mentioned are important but not the most critical aspects of the etcd project, based on my experience. As I mentioned above, we need dedicated engineers on this project to make it better; that has been an issue for a while. I want to understand how we can make this better, and how we can maintain and attract long-term, dedicated contributors. I did read the thread, which is why I cc'ed Jordan and Wojciech about the subproject thing.
Staffed efforts are mentioned, not just processes and docs. Things like handling incoming security reports and running bug bounties take staffing.
In my opinion, attracting contributors generally requires, but does not end with, many of these other things mentioned above. EDIT: which is not to say etcd has no community or support. I mean that I wouldn't be so dismissive of things that are not "core technical contributors".
Well ... etcd as an API Machinery subproject was already pretty much rejected by both current Technical Leads of API Machinery (@jpbetz + @deads2k) and both of the Chairs (@deads2k and @fedebongio), so I really don't think that needs retreading ... #15875 (comment) @jpbetz also speaks as a former maintainer of etcd.
@xiang90 the processes, docs etc. are meant to take some of the burden off the current maintainers. Re: dedicated engineers - like it or not, a pattern we have repeatedly seen is that it's easier to justify FTE time for the umbrella project of "Kubernetes" than it is for other things like etcd, CoreDNS etc. =/ It's not fair by any means, but it just tends to play out that way. There have been numerous pleas for more dedicated engineers - it's gone up to the CNCF GB (I believe twice now) to try to escalate the risk with a wider audience, and it hasn't had a lot of success. The K8s route is something that hasn't been tried yet, and even if it doesn't net dedicated FTEs, it does at least offload some of the other maintainer responsibilities.
just my two cents as a former maintainer: as long as we are not trying to solve the core issue, I will not expect to see things get significantly better. Try what you think makes sense otherwise. :P
It makes sense, and it's exactly one of the reasons why we (and I) agreed to SIG-ifying etcd. But as we already discussed, SIG-ifying doesn't mean that we have to change (e.g. fork or copy) the codebase.
etcd is really a complicated project, let alone adding bbolt and raft; it's not easy for any new contributor to dig into the core quickly. So personally, if any former maintainers wanna come back, that would be great. You just need to raise a PR requesting to rejoin, and of course show sustained contributions.
With agreement from the etcd maintainers, the community, and the SIG API Machinery leads, I'm closing this issue as accepted. As promised, I created a first draft of the SIG etcd charter that should solidify etcd's interests. I'm opening it to the community for feedback.
Also linking the Etcd usage survey. |
I need to add this, closed or not: currently, as a graduated project, etcd is entitled to certain things from the CNCF, particularly around marketing and KubeCon. For example, we're entitled to a kiosk in the project pavilion if we want it. As a Kubernetes SIG, the resources we're entitled to are fewer; for example, SIGs do NOT get kiosks. This is probably an acceptable sacrifice, but we want to be aware of it before we make the switch. I bring this up because just today we were taking advantage of some of the benefits we would lose.
To get a better understanding of what we would like etcd SIG-ification to look like, please read the Do's and don'ts of etcd SIG-ification.
Link to the next: |
What would you like to be added?
I would like etcd to formally be a part of the Kubernetes leadership/team structure (i.e. as a SIG, or Special Interest Group), since it is a hard dependency of Kubernetes and is deeply coupled with the Kubernetes codebase.
I further propose that Benjamin Wang (@ahrtr) and Marek Siarkowicz (@serathius) assume SIG positions as both TLs/Chairs (they would each assume both the TL and Chair positions). I further propose that Benjamin and Marek find additional contributors that can eventually assume the chair position (since SIGs have been moving to a formalized separation of roles).
For more information on why this may be a good idea, please refer to: The Case for SIG-ifying etcd.
Why is this needed?
etcd is a hard dependency of Kubernetes. Therefore, it makes sense for the two communities to be somewhat entwined, so that we can adhere to the implicit Kubernetes-etcd contract.