Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC 0172] Mergebot #172

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from
Draft
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
87 changes: 87 additions & 0 deletions rfcs/0172-mergebot.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
**Feature:** Enhance package meta fields to dictate mergebot behavior
**Start Date:** `[Today's Date in YYYY-MM-DD format]`
**Author:** Lassulus
**Co-authors:** Scriptkiddi (additional co-author to be determined)
**Shepherd Team:** `[To be nominated and accepted by the RFC Steering Committee]`
**Shepherd Leader:** `[To be appointed by the RFC Steering Committee]`
Comment on lines +5 to +6
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
**Shepherd Team:** `[To be nominated and accepted by the RFC Steering Committee]`
**Shepherd Leader:** `[To be appointed by the RFC Steering Committee]`
**Shepherd Team:** 0x4A6F, Mic92, JulienMalka, infinisil
**Shepherd Leader:** Mic92

Thank you for your nominations!

**Related Issues:** `[Links to implementation PRs will be added here]`

## Summary

This RFC proposes the introduction of four new merge strategies for `nixpkgs`, aiming to provide more nuanced merge permissions.
One of the main goals of this RFC is to extend the capabilities of the mergebot beyond PRs of just r-ryantm.
These strategies facilitate automatic updates and backports from trusted sources and allow maintainers the choice between requiring full consensus or just a single approval for merges.

## Motivation

![](https://pad.lassul.us/uploads/f24e0ff3-117a-4167-8f60-79ca508b8c1c.png)

The current `nixpkgs` contribution model relies heavily on a group of *committers* (approximately 200 at the time of writing), who possess exclusive rights to merge PRs.
This model has remained unchanged since its inception and poses limitations on community contributions and maintainer workload.
Implementing granular merge permissions will enable a broader contribution base and lessen the burden on existing committers by empowering package maintainers with more control over their respective packages.

## Detailed Design

The current design of the mergebot is limited in functionality. To tackle this issue we propose to implement per package strategies. These would be set inside a meta attribute in the package.

We propose augmenting package meta fields with a new attribute set that specifies a list of merge strategies.
The mergebot will execute all listed strategies in parallel and will proceed with the merge upon one successful strategy, contingent on passing ofborg checks as well.

![](https://pad.lassul.us/uploads/965545d8-8575-41a8-9746-6bcc66f8bb0e.png)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like for this section/diagram to discuss/handle the situation of the PR contents changing, e.g. if the owner of the PR force-pushes.

Currently the diagram is written under the assumption that the PR contents don't change at all after PR is created.

  • If there's no way that this can happen, e.g. because all PRs must be made by r-ryantm, then that assumption should at least be stated explicitly (note also that might change in the future, maybe r-ryantm gets a feature to update PRs automatically to the latest upstream version).
  • But even better would be if there was explicit transitions in the diagram for what happens if the PR contents change.
  • It should also be discussed somehwere whether it's even technically possible to notice PR content changs in a race-free fashion.

In general, the potential threat should be acknowledged that the merge bot makes a decision on a PR state, and then a malicious PR author force-pushes a malicious change to trick the bot to merge something else.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation takes the current revision of the PR at the point in time when the merge is triggered.

So for example: someone creates a PR, someones triggers the merge action, then after the merge action is triggered but before the merge is done, the author force pushes to the PR again with some malicious changes. In that case the merge would fail and buildbot would report that (github has this feature, that you can specify a revision in a merge action)


### Proposed `mergeBot` Attribute Structure

```nix
meta = {
mergeBot.strategies = ["automerge-updates", "full_consensus", "..."];
}
```
### Merge Strategies
Currently supported strategies by nixpkgs-merge-bot as of the time of writing:
- **maintainer_update:** Currently, this strategy allows package maintainers to merge updates from r-ryantm, triggered by mention of the bot.

Proposed strategies are:

- **automerge_updates:** Merges PRs from r-ryantm automatically if ofBorg checks pass, triggered by PR creation or mention. We strongly recommend to have at least one passthrough test.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to fail the automerge_updates strategy by design if no passthrough tests exist? In which case would automerge + no passthrough test be ok?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is something we can decide by a case by case basis? like if there is no passthrough test defined, why was the strategy enabled for this package?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like if there is no passthrough test defined, why was the strategy enabled for this package?

That's exactly my point - is there any case where this would be ok? If not, then making that strategy fail by design if no passthrough test is defined seems like a good choice to avoid this kind of mistake happening.

Given that this only ever automerges r-ryantm updates, I'm ok with the current design, but I think being stricter here helps avoid mistakes and gives an incentive for package maintainers to make Nixpkgs more automated by contributing passthrough tests (this is a good thing IMO).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are certain packages where a delayed update would be more of a liability than an automerge. Like maybe the package is just some icon theme where there really is no testing needed?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes the tests of the package itself are exhaustive enough too. So I think it makes sense to not require anything extra per say.

- **automerge_backport:** Automatically merges backport PRs if ofBorg checks pass, triggered by PR creation or mention.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think backports could often use an extra pair of eyes. However, they're also often pretty neglected...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They should still follow the criteria for what is acceptable to backport. I think that requires a person to check.

https://github.com/NixOS/nixpkgs/blob/master/CONTRIBUTING.md#changes-acceptable-for-releases

Copy link
Member Author

@Lassulus Lassulus Mar 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for certain packages it seems to be fine? like security relevant leave packages could always auto backport

- **full_consensus:** Requires approval from all maintainers before merging. The bot will notify which maintainers' approvals are pending, triggered by PR approval.
- **single_maintainer:** Allows a single maintainer's approval to trigger a merge, facilitated by a mention of the bot.
Comment on lines +41 to +48
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These need to be specified more clearly. From just these it sounds like only maintainer_update and automerge_updates are limited to @r-ryantm as author, while all the others don't restrict the author.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes the idea is to not restrict the author for the other strategies.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that is very easy to miss, you have to read between the lines to even notice it. I don't think it should be done like this and would definitely need to be justified. Honestly this RFC should primarily be just about the lifting of that restriction, because it's the only truly controversial part.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, okay, I thought that was pretty clear. But yeah, that's the part worth discussing the most.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added now a sentence to the summary to make the goal clearer 4727de4 (#172). Is that good enough? or should be emphasize it more?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how to feel about this. Wouldn't single_maintainer make it incredibly easy to introduce less-than-desirable code into that maintainer's packages? Currently, committer access is very hard to obtain and requires a good track record that inspires trust (and rightfully so).

Comparatively, to become a maintainer, you have to "just" get a package accepted into nixpkgs. Once that's done, what's to stop anyone from slipping in a bad change unnoticed, due to the huge number of PRs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well the maintainer has to introduce that code into his own package. So not sure what the attack vector would be here. like the maintainer probably maintains that package alone (otherwise there would be probably a different merge strategy) and probably introduced it to nixpkgs, so slipping in code is possible through other a bit more obfuscated means. In my opinion the attack vector from outdated packages is far bigger than the ones from maintainers backdooring the packages they maintain.

Copy link

@Naxdy Naxdy May 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worst case scenario I could think of from the top off my head is someone submits a package for a new crypto wallet, then once it's adopted by a number of users, "secretly" (in a way that no one notices, anyway) introduces code that exposes private keys or siphons off funds. Something similar has happened before in the snap store, so it's not a very far-fetched scenario.

Copy link
Member

@Mic92 Mic92 May 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed the requirements to become a package maintainer as well. They will probably also need to prove some track record. But their scope will be also more limited than full committers, so they can actual less damage than us having to give out commit rights.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we currently have no policy in place to prevent people from adding themselves to maintainer entries, even if they make no other meaningful changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. This policies we need is what this RFC is about.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed the requirements to become a package maintainer as well. They will probably also need to prove some track record.

I think such "vetted maintainers" should be in addition to the type of maintainers we have now, not in replacement.

It should stay possible to be a random driveby contributor, as those are also very useful


### Limitation to `pkgs/by-name` Directory

To ensure accuracy in processing PRs, only those within the `pkgs/by-name` directory will be considered, due to the challenge of verifying if a PR solely affects the intended files.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be stated somewhere:

Does the nix language guarantee that a limitation to a certain file (or subtree) prevents me from overriding packages somewhere else in the evaluated package attribute tree?

Or could a malicious users pull tricks to write magic trickery ala

{ pkgs, lib, ...}:

(some Nix builtin that allows injection into the final global attrset here, or some nixpkgs magic attributes that nixpkgs interprets)
(mkDerivation
  # ...
)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

afaik pkgs/by-name limits exactly that. so there are no side effects on other packages.


### Future extensions

For now we propose that new merge strategies or significant changes to current ones go through their own RFCs. The mergebot team is deciding what is deemed significat in that case.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add that even minor changes of interpretations of the existing strategies should give some amount of notice (repository issues/thread on the dominant discussion platform) before going live?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, probably new releases on the mergebot repository would be a good idea for this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mention Discourse because it reduces the need to consciously subscribe to yet another thing elsewhere to get notifications.

Another thing is — I think a tracking issue+heads-up thread when the development is starting is better because you can promise to give ten days for feedback with less slowdown (development and testing and preparations for deployment are not just an instant).

Also if a change is not universally perceived as small, when the discussion is known to have time to happen we get a better chance to avoid high-tension discussions.


## Examples and Interactions

1. **Single Maintainer Mode:** A maintainer of the `nano` package, set with `strategies = ["single_maintainer"]`, can initiate a merge by commenting with a specific bot command. The bot merges the PR if the ofBorg checks successed.

2. **AutoMerge Mode:** An update PR for `ttyplot` created by r-ryantm, with `meta.mergeBot.strategies` set to `[ "automerge_updates" ]`, gets automatically merged after successful ofBorg checks.

## Drawbacks

- **Security Risks:** There's a potential for malicious actors to manipulate the merge bot to authorize merges improperly. A mitigation for these pull requests could be running validity checks (e.g file size limits, valid nix files, ..)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this risk should require more careful consideration of how merges get done from authors other than r-ryantm. My initial proposal is:

  • Add some kind of "hold" period (at least 24h) where the mergebot keeps the PR in "will merge" state to let other people look at the changes before they get pushed.
  • Make it easy for people to browse through PRs in this state.
  • In case potentially malicious changes are detected (or in case any other action is needed to stop the bot from merging), closing a PR, marking it as draft or something similar should make the bot forget about the "will merge" state.

The main concern is "this will make merges take longer!". My view is that the mergebot helps us increase throughput of changes merged, which also has the effect of decreasing latency (they're somewhat connected), and it's ok to have the latency at a minimum of 24h (due to the hold period) because the throughput is still increased and changes are still automatically getting merged.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can create a label which the bot tags the PR with and then would do a merge after the holding period. But I think that also conflicts with other peoples workflow. Maybe this is something we want to enable on a per package basis? but not sure how to expose the interface to configure something like that without making it complex

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on how this conflicts with existing workflows? From my perspective, this mergebot only creates entirely new workflows (non-committers being able to merge changes), so there's no way this can create a conflict.

If a committer wants to get the change merged straight away without waiting for the holding period, they can just merge themselves without the help of the bot, which is the same as the current workflow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I didn't mean existing workflows, but what people told me they would expect by the mergebot. So it wouldn't disrupt current operations but would be a point where opinions will differ :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security will almost always be a trade-off with convenience. I think adding this holding period is a good approach to mitigate the risk that the RFC points out, while not being too complicated to implement. The suggestion of file size limits and valid nix files doesn't seem to be as flexible and cover as many cases. I'd like to get more opinions on this, hoping more people end up commenting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, depending on how the bot is manipulated the attacker could just merge the PR directly? So it would be better to have a review log of all the actions the bot has taken instead of relying on a waiting period which could have no effect in an attack?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on what attack vector you have in mind, and what bot you're talking about (mergebot or r-ryantm)? I think if there's some case where the mergebot can be manipulated to merge the PR directly when it shouldn't, it should definitely be addressed!

I had the following assumption during all my other comments: r-ryantm's PRs are "safe". There are ways that code upstream can be malicious, but this is a problem outside r-ryantm. r-ryantm itself introducing malicious changes that don't come from upstream is possible, but not something this RFC should worry about.

As for other people introducing malicious changes:

  • Currently there's no way those changes get merged automatically or if a maintainer approves them, only when a committer merges them.
    • Since there is an existing process to vet new committers, I see this as the baseline for vetting new changes that get merged into Nixpkgs.
  • With the new strategies, those changes may be merged by maintainers who aren't committers. This is the issue called out in the RFC.
    • So we should find a way to vet those changes. My opinion is that the current proposal (checking file sizes and valid nix files) is not enough vetting: very inflexible, doesn't cover as many cases.
      • A holding period of at least 24h ensures that people can vet those changes in a way roughly similar to how new committers are vetted, thus roughly maintaining the baseline.

So it would be better to have a review log of all the actions the bot has taken

I don't think that is better, but a complementary approach, since one concerns changes before they're merged and the other concerns them after they're merged, and the actual merging is what's the issue here.

instead of relying on a waiting period which could have no effect in an attack

I think most people would agree that delaying a possible attack by at least 24h seems like a pretty good mitigation. Unless you have something to back your claim, I think it is weak.


Having said all of that, the strongest argument that I can see against the holding period is this:
I think in practice it's unlikely that a recently-committed malicious change will be used right away, since it usually takes at least some hours for the change to propagate to a channel that most people will be using. Therefore this would already serve as a "holding period". However...

In case something malicious gets committed, I see two approaches to "fix" it:

  • Rewrite git history to remove the commit completely: better solution to ensure the malicious change doesn't propagate further and is never stored anywhere in other people's machines. But practically impossible to carry this out.
  • Revert the commit: less preferable solution, since the malicious changes are still in git history.

Given this, I'd say anything we can do to prevent this from even happening is good.
Not to mention that relying on hydra taking some time to propagate changes to a channel is definitely a very weak mitigation to a potential security issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not against the holding period, it just doesn't fix the security issue addressed in this part of the RFC. What you describe is another issue, that of maintainers merging malicious changes with this bot. What the issue describes is, for example, someone faking the PR in a way that the bot thinks it's a r-ryantm PR (with only pkgs/by-name changes) and merging it therefore directly.

If maintainers merge malicious code, a 24h period could help against that, but then someone would have to check those PRs in the 24h period. The impact also shouldn't be that in big in that case. since the strategy has to be explicitly enabled for a package and if we see malicious action, we either would disable that strategy again on that package, remove the malicious actor from maintainers (and probably all of nixpkgs?) or both. Also we probably would enable maintainer strategies only on low impact leaf packages.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've scraped multiple attempts at a comment on this because none actually described what feels off about holding period.

but then someone would have to check those PRs in the 24h period

I think this is the crux of the issue. If 99% PRs are totally benign and get merged with maintainer approval, that already means people are going to learn to ignore those, because there's nothing to see, really.

In case of malicious maintainer, they'll have to make it look benign, possibly collude with upstream, at which point a curious person actively has to go look in the package changelog within the holding period. Even that is sure enough to fail because of the sheer number of times I've seen commit message with 'fix'.

In short, I do not think a holding period is going to stop a malicious maintainer. It will be de-facto annoyance to the rest of the folks who are just waiting for the period to be over.

What the mergebot is intended to do in this RFC, IMO does not get any help from a holding period, whether 1h or 1w.

Just my 2c

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not against the holding period, it just doesn't fix the security issue addressed in this part of the RFC. [...] What the issue describes is, for example, someone faking the PR in a way that the bot thinks it's a r-ryantm PR (with only pkgs/by-name changes) and merging it therefore directly.

Yes, my interpretation of that part of the RFC wasn't the interpretation you intended. I think calling out the issue you describe doesn't make sense. It's essentially saying "there could be a bug in the code". Obviously the code will try to be as robust as possible :)

Regarding the holding period: I still think it has value, but my motivation to argue for it is gone. My last breath: everyone who will benefit from the non-r-ryantm merging strategies already waits a long time in their PRs. I don't have the data (and would love if someone else had), but from anecdotes all over, it already takes over 24h for people's PRs to get merged (which is what I argued before - reducing the >24h wait to ~24h will still make it fast enough and increase throughput). This bot isn't replacing committers, it's increasing merging throughput. Saying that this will be an annoyance given that this won't make any existing PR workflow slower is silly.

- **Reliability Concerns:** A malfunction in the merge bot could disrupt numerous contributors' workflows.
- **Code Quality** As no review is done by seasoned commiters, the code quality could drop. We want to face that by holding monthly meetings and review the merged PRs to see if the code quality drops and improving on the bot in that case.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe some public audit log could be useful for this purpose.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can use the github api to show every invocation of the mergebot. Or maybe we should collect all of them in the mergebot directly and show them on a webpage?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Showing them on a webpage or so could be handy, I think. It makes it easy to quickly review some of the actions taken.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be very useful to do that in a way that it does not query the Github API live, but stores the audit log in some way independent of the nixpkgs repo.

In the xz backdoor situation, Github took down the xz repo, which had the effect that people were hindered investigating the compromise.

If the merge bot is abused, it would be great to be able to have those logs even if the Github live API is not available for this (just saving the audit log somewhere else in some reasonably frequent interval would be enough to do that).


## Alternatives

- **Maintain Current Model:** This approach may not be sustainable long-term due to scalability issues.
- **External Tools:** Utilize third-party GitHub actions or apps, like Mergify, and adapt them to our needs.

## Prior Art

https://github.com/NixOS/rfcs/pull/50
https://github.com/NixOS/rfcs/pull/128

## Unresolved Questions

*To be completed*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a rev-dep maintainer thinks that the dep's mergebot policy is too lax — do we recommend having a pin-like copy with a different policy, or tightening the policy for the main package?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest tightening it for the main package? like all people depending on a package should be fine with the strategy enabled? and duplicating a package seems like overhead

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this is probably the best trade-off, at least in the beginning, but then it needs to be spelled out in the written policy, as it basically says potentially many people have the right to unilaterally deprive a maintainer of a useful tool.


## Future Work

- **Strategy Expansion:** Developing additional merge strategies and drafting RFCs to introduce them.
- **Enhanced Documentation:** Creating comprehensive documentation for the new merge strategies.