Skip to content
This repository has been archived by the owner on Mar 9, 2020. It is now read-only.

TODO list to make this happen #1

Closed
joyeecheung opened this issue Jun 6, 2018 · 36 comments
Closed

TODO list to make this happen #1

joyeecheung opened this issue Jun 6, 2018 · 36 comments

Comments

@joyeecheung
Copy link
Member

joyeecheung commented Jun 6, 2018

Off the top of my head:

  • Something that can land the PR automatically: git-node land
  • A protocol for squashing commits
    • Option 1: Use the --autosquash feature of git and ask everybody to use --fixup or --squash when they create commits that should be squashed
    • Option 2: Just squash everything in the same PR. AFAIK the commit queue of V8 does this. I like it because it's simple and we should enforce that commits that can be submitted as separate PRs must be, then it'll be easier to backport PRs since we no longer mix semver-major changes in refactor/clean-ups. And this makes it easier to review PRs, revert commits, etc.
  • Master/worker server applications to run the landing tool -> the queue. I think it's pretty obvious that we will implement them in Node.js?
  • A bot or a Jenkins CI job for triggering the landing process
  • Modification to git-node to store more metadata tracking PRs and queued jobs

For reference:

https://dev.chromium.org/developers/testing/commit-queue
https://dev.chromium.org/developers/testing/commit-queue/design

Added:

  • Estimate the workload by looking into the velocity of the project
  • Figure out how to isolate the master/worker and orchestrate and where they should be deployed
@maclover7
Copy link

Master/worker server applications to run the landing tool -> the queue. I think it's pretty obvious that we will implement them in Node.js?

We might be able to utilize the github-bot's machine for this, I believe it still has a bunch of capacity still, so we won't need new infrastructure donations

@joyeecheung
Copy link
Member Author

@maclover7 I'd be more comfortable if we use some sort of container on the machine and have something to do the orchestration. I am not sure how powerful the machine is so it may just be just a bunch of overhead to use orchestration, we should also try estimating the workload based on the velocity of the project.

@bnoordhuis
Copy link
Member

Option 2: Just squash everything in the same PR.

I'd be okay with this.

@benjamingr
Copy link
Member

Option 2 works, the only issue is that PRs by several authors would not count towards contribution of other authors.

@tniessen
Copy link
Member

tniessen commented Jun 6, 2018

I already hinted to @Trott and @joyeecheung that I am working on a prototype.

Design

The current design works like this:

  1. The user submits a task description to the CQ server:
    • Which PR to land.
    • Which tests to perform before landing (e.g. lite CI).
    • How to map PR commits to commits on the master branch.
  2. The CQ performs pre-defined checks to make sure that the PR can be landed.
  3. The CQ adds a new task with the given task description to the task queue.

The task queue is processed as a regular queue, so in FIFO mode. For each task:

  1. Produce the commits on a staging branch according to the commit mapping defined in the task description. (These are the final commits and will not be altered after this step!) This is a mixture of cherry-picking, squashing and editing of commit messages.
  2. The CQ then runs user-defined tests according to the task description.
  3. If all tests pass, update the master branch to the last commit produced by this task and push it to the upstream remote.

If one or more tests failed, the staging branch is reset to the "last known good revision", so the first commit that was not created as part of the task.

On commit mapping

In the current prototype, the commit mapping describes

  • an array of output commits (title, description, metadata such as Refs:)
  • a mapping from PR commits to output commits

The final commits are generated by iterating over the output commits and for each output commit, cherry-pick the input commits that are mapped to the respective output commit in their original order. Squashing and commit-message editing are performed at the same time, see implementation notes.

There should be reasonable defaults for the commit mapping property, e.g. "squash all" or "keep all".

This is the structure I am using to describe tasks:

  cq.post({
    pr: {
      repo: remoteRepo,
      number: 21075
    },
    tests: ['ci-lite'],
    commits: {
      mapping: [
        {
          from: 'b81729eb011900527b4c6121f4524a7d76dc1378',
          to: 0 // Index of output commit
        },
      ],
      commits: [
        {
          title: 'test: make handling of noWarnCode stricter',
          body: 'This change requires all expected warnings to be specified along with\n' +
            'their respective code and will raise an error if the code does not\n' +
            'match. This also kind of fixes the behavior when the expected warning\n' +
            'code was noWarnCode and there is an actual warning code.',
          metadata: [] // Refs, Fixes, etc.
        }
      ]
    }
  });

Implementation notes

This design allows some cool things:

  • Multiple tasks can be processed in parallel: The staging branch allows to push arbitrarily many commits to it from different PRs. As long as all tests for all PRs work, everything is great and the parallelism speeds up the process by a lot. Note that all commits on the staging branch are final, so the CQ won't introduce any problems when pushing them to master. If a test fails, the staging branch is reset to the last commit before that task, but not further, so previously submitted tasks can still finish. The failing task is removed from the queue and subsequent tasks are restarted. This minimizes the overhead of failures:

    Example:
    Tasks A, B, C are pushed to the queue.
    
    Task A is started. Commits 1, 2 are produced.
    Task queue: (A*, B, C)
    Staging branch: (→ master, 1, 2)
    
    Task B is started. Commits 3, 4 are produced:
    Task queue: (A*, B*, C)
    Staging branch: (→ master, 1, 2, 3, 4)
    
    Task C is started. Commit 5 is produced:
    Task queue: (A*, B*, C*)
    Staging branch: (→ master, 1, 2, 3, 4, 5)
    
    Note that the last known good revision differs between tasks.
    For task A, it is → master.
    For task B, it is commit 2.
    For task C, it is commit 4.
    
    Assume a test for task B failed. B is removed from the queue and the staging branch
    is reset to the last known good revision:
    Task queue: (A*, C)
    Staging branch: (→ master, 1, 2)
    
    Task C is restarted and produces commit 6:
    Task queue: (A*, C*)
    Staging branch: (→ master, 1, 2, 6)
    
    Task A finishes successfully and commits 1, 2 are pushed to master:
    Task queue: (C*)
    Staging branch: (→ master, 6)
    
    Task C finishes successfully and commit 6 is pushed to master:
    Task queue: Empty
    Staging branch: (→ master)
    
  • All Git operations are performed natively, without having to fork into git processes. The entire configuration can be done from within our software and we don't even need a working copy, a bare Git repository is enough. Using native Git commands also allows more efficient operations, such as changing the commit message and metadata while cherry-picking in a single operation (which - by the way - isn't even possible in bare Git repositories by default). I am in touch with one of the authors of nodegit and we are trying to fix some known problems to make this work as smoothly as possible.

I am trying to keep this design as modular as possible, e.g. checks (before starting the task) and tests (after starting the task) should be customizable. The same applies to processing metadata and formatting the commit message. It will definitely take me some time to even have a presentable prototype, but I will try.

@tniessen
Copy link
Member

tniessen commented Jun 6, 2018

Also, I think it would be cool to have both a web interface and a CLI.

@joyeecheung
Copy link
Member Author

Option 2 works, the only issue is that PRs by several authors would not count towards contribution of other authors.

@benjamingr If the commits can be submitted in separate PRs, then they should. If they can't, then since we require each commit must be able to pass the test, they have to be squashed anyway. Also we can use the Co-Authored-by field for attribution.

But personally I prefer to force everybody split commits and file separate PRs if possible, even if I sometimes don't want to do that myself when I am feeling lazy. It helps the backporters to minimize the changes between master and release branches as well, especially when it comes to refactoring/cleanup PRs.

@maclover7
Copy link

Just noting for posterity that there is prior work from the build WG on this issue: nodejs/build#705

@refack
Copy link

refack commented Jun 10, 2018

Just noting for posterity that there is prior work from the build WG on this issue: nodejs/build#705

And that was not the first attempt to...

I want to point out that information fragmentation is a huge problem with the org. I'm not sure why we need a new repo for this task. IMHO it only exacerbates the problem.

I move to close this repo and return the conversation to the build repo where it will at the least stay traceable if this task concludes or peters out.

@refack
Copy link

refack commented Jun 10, 2018

P.S. There is substantial work already done in the ghprb-plugin.
IMHO building on that will increase the chances that this effort will conclude, and not peter out like it had before.

@tniessen
Copy link
Member

We should probably come to an agreement before I or anyone else puts a lot of effort into this.

@joyeecheung
Copy link
Member Author

@refack I think with the current status of CI building the queue on top of successful CI runs would not be very viable for now? That was also what blocked the original issue.

I am fine with moving this back to build or continuing the discussion here, but IMO using a separate repo would reduce the noise when we are still figuring out the protocol e.g. regarding rebasing.

@maclover7 ‘s Jenkins job mentioned in that issue seems to be a good alternative as well, but we may need to be careful with conflicts. Do you still have that around?

@joyeecheung
Copy link
Member Author

@refack @maclover7 @tniessen Or, if you prefer, we can put this back on the build meeting agenda and look at the prior efforts and the current status of things before moving forward?

@joyeecheung
Copy link
Member Author

..or, since the agenda of the build meeting this week seems pretty packed, we can set up a doodle for a separate meeting, just do an overview of existing efforts, summarize the viable solutions so far, and make a concrete plan with assignees to make it happen?

@maclover7
Copy link

nodejs/build#1330

@refack
Copy link

refack commented Jun 11, 2018

(I'm using this as a meta issue ;)

Any way I see it a commit queue will require a test run, so the longest (and most complex) part of the work will be done on Jenkins, so ...

IMHO the fastest way to achieve a CQ is to use a hybrid solution. There are part of the process that are node specific (commit-mapping, meta-data, tagging, handling of PR), that could be harness independant (/CC @tniessen). The standard I/O (git checkout, CI test, git commit & rebase) could be done using Jenkins as a harness.

Some added benefit of building this on top of Jenkins is that there is a lot of prior art, and even some outsourcers that develop for Jenkins. For example I just found this missing piece, a "wait for user input" stage. It seems like this could be used to take a "commit mapping" declaration.

@Trott
Copy link
Member

Trott commented Sep 13, 2018

I'd like to get this moving. Any ideas/suggestions for how to do that? If I tried to set up a meeting for this, who would be interested in participating?

@joyeecheung
Copy link
Member Author

The last time we discussed about this in a build sub-meeting, we agreed the easiest way to get this working is to set up a Jenkins job that squash everything and land it, then explore the other more sophisticated options. minutes is in https://github.com/nodejs/build/blob/master/doc/meetings/project-meeting-2018-06-19.md

@joyeecheung
Copy link
Member Author

If I tried to set up a meeting for this, who would be interested in participating?

I am :)

@tniessen
Copy link
Member

Me too.

@orangemocha
Copy link

FYI, a Jenkins job to merge commits after testing, with meta-data tagging of each commit, had already been implemented: https://ci.nodejs.org/view/All/job/node-accept-pull-request/. There was to be some documentation in the wiki, but that seems to have been moved or deleted.

@Trott
Copy link
Member

Trott commented Sep 13, 2018

Wiki archive: https://github.com/nodejs/wiki-archive

@Trott
Copy link
Member

Trott commented Sep 13, 2018

@Trott
Copy link
Member

Trott commented Feb 27, 2020

I'd like to get this moving again.

  1. Who wants to take part?
    a. I'm hoping @joyeecheung and @tniessen want to work on this.
    b. I imagine @maclover7, @refack, and @orangemocha do not.
    c. I'm guessing @bnoordhuis and @benjamingr either don't want to be part of building the thing or don't have time to spare, but that's just a guess.
  2. I'm thinking that a way to get something valuable done and not have this stall again is to dial back the scope a bit.
    a. My thinking is let's focus on making this a GitHub Action that can be run when Jenkins is green/yellow. It can re-run some things as GitHub Actions to make sure tests still pass on the most popular platforms before merging, but it won't re-run the tests on Raspberry Pi devices, AIX, or SmartOS.
  3. Where should this effort live?
    a. I think we should close this repo and move this to the Build WG. Even if no one else besides me and Joyee from @nodejs/build get involved, that seems to be where this effort belongs. It will be more visible there too. People might notice it and join the effort, or at least provide feedback, ideas, etc.
  4. Should we have a (separate from Build WG) meeting about this to get it back on track?
    a. I vote yes and will happily set up a Doodle poll or whatever if others agree.

@bnoordhuis
Copy link
Member

I'm on the "not much time to spare" side of things. I'd really like to see this happen (I was thinking about it this week actually) and I can try to help out but regular meetings aren't an option for me right now.

Using GH Actions and moving it to the Build WG are both good ideas, IMO.

@joyeecheung
Copy link
Member Author

joyeecheung commented Feb 27, 2020

Thanks for picking this up again, @Trott . I think a GitHub action sounds like a good way to go now, considering the CI is in a relatively good shape these days. Moving it to the build WG & have a call about this sounds good as well. On the other hand the top of the list in the OP should already be addressed by nodejs/node-core-utils#366

@orangemocha
Copy link

Unfortunately I don't have time to work on this, but if someone is interested in revamping the old Jenkins job and has questions I'd be happy to answer what I can.

@cjihrig
Copy link

cjihrig commented Feb 27, 2020

I forgot that this repo exists until I woke up to notifications this morning. I wanted to add that I've been taking part in the GitHub maintainer feedback group, and have asked for a more official solution to a commit queue from GitHub. I have no idea if it will end up being implemented, or if the implementation will match our needs exactly, but it is at least on their radar.

cc: @bdougie

@mhdawson
Copy link
Member

@Trott, @sam-github is AIX still a problematic platform or is it something like the length of time the run takes? Just want to understand why it would be excluded from re-runs?

@Trott
Copy link
Member

Trott commented Feb 27, 2020

@Trott, @sam-github is AIX still a problematic platform or is it something like the length of time the run takes? Just want to understand why it would be excluded from re-runs?

If we're doing this as a GitHub Action and not as a Jenkins thing, AIX isn't available.

My idea is that a first (and maybe last) step would be to not bother to try to automate anything with Jenkins. So a manually-started passing Jenkins run is still a prerequisite. But the last-minute check-before-landing-that-the-ground-hasn't-shifted-under-us-in-the-master-branch to see that the tests are still passing...that would be only on platforms that GitHub Actions can easily automate. So, we'd skip AIX (and Raspberry Pi and SmartOS and a lot of others) in that last-minute step. We don't do the last-minute test on anything right now so doing it on only a few things as a smoke test is still an improvement.

@benjamingr
Copy link
Member

I am on the same page as @bnoordhuis

I'm on the "not much time to spare" side of things. I'd really like to see this happen (I was thinking about it this week actually) and I can try to help out but regular meetings aren't an option for me right now.

@mmarchini
Copy link

I'm interested in helping as well and I agree a mvp with Actions is likely the best approach. One thing to be aware though: Actions have limited scope when triggered in a pull request from a fork. Not sure if this is true only for pushes, or for any events. If it's only pushes, we could start with an action that parses comments from collaborators and merge based on that. Or trigger when a label is added.

@mhdawson
Copy link
Member

@Trott got it, just wanted to make sure there was not something we should look into.

@tniessen
Copy link
Member

I'm hoping @joyeecheung and @tniessen want to work on this.

@Trott I can certainly try to find some time to help with the github action approach.

@Trott
Copy link
Member

Trott commented Feb 29, 2020

@codebytere

@Trott
Copy link
Member

Trott commented Mar 4, 2020

Talked about this in Build WG meeting today and there was consensus to move this to the build repo for visibility. I'm going to close this, open an issue in the build repo, and also open an issue in the admin repo to archive this repository.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests