
Have prow inform child jobs about their parent #3966

Conversation

@BenTheElder (Member)

Right now, run_after_success jobs know nothing about their parent job. This PR lets child jobs know the name and build ID of their parent job.

Why do we need this?

  • This will allow child jobs to find their parent job's output. We can use this to have one job do builds and have the child jobs use the parent job's build (see the sketch below).
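
To make the proposal concrete, here is a minimal sketch of the injection on the prow side, assuming plank appends the parent's name and build ID to the child pod's environment. The variable names follow this PR; corev1.EnvVar is the standard Kubernetes type, and the job name and build number below are placeholders.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// parentEnv builds the extra environment a child pod would receive,
// mimicking the variables supplied to the job itself, but for the parent.
func parentEnv(parentJob, parentBuildNumber string) []corev1.EnvVar {
	return []corev1.EnvVar{
		{Name: "PARENT_JOB_NAME", Value: parentJob},
		{Name: "PARENT_JOB_BUILD_NUMBER", Value: parentBuildNumber},
	}
}

func main() {
	// Placeholder values for illustration only.
	for _, e := range parentEnv("parent-build-job", "1234") {
		fmt.Printf("%s=%s\n", e.Name, e.Value)
	}
}
```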

@BenTheElder BenTheElder requested a review from spxtr as a code owner August 8, 2017 00:02
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 8, 2017
@k8s-ci-robot (Contributor)

Hi @BenTheElder. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@BenTheElder (Member, Author)

I'm not sure this is the right solution, but I think it's at least somewhat reasonable. Jenkins jobs may need to be informed as well (?)

/cc @fejta @ixdy @pipejakob @spxtr

@BenTheElder (Member, Author)

After this is worked out, we should be able to update jobs to use PARENT_JOB_NAME and PARENT_JOB_BUILD_NUMBER to find the results of the parent job. This is slightly hacky, but I haven't heard/found a better plan yet.

Please commence yak shaving but do not merge this yet.

@krzyzacy (Member)

krzyzacy commented Aug 8, 2017

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 8, 2017
Inline review on the new ProwJob fields:

	Spec         ProwJobSpec   `json:"spec,omitempty"`
	Status       ProwJobStatus `json:"status,omitempty"`
	ParentStatus ProwJobStatus `json:"parent_status,omitempty"`
	ParentJob    string        `json:"parent_job,omitempty"`
Contributor

API-wise it would be more correct to use OwnerReferences in the object metadata.
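
For reference, a sketch of what that suggestion could look like, assuming the child ProwJob's ObjectMeta carried a standard OwnerReference to its parent; the apiVersion and names here are illustrative, not the PR's actual implementation.

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
)

// ownerRefFor records the parent as a standard Kubernetes owner of the
// child object, instead of adding custom parent fields to the spec.
func ownerRefFor(parentName string, parentUID types.UID) metav1.OwnerReference {
	return metav1.OwnerReference{
		APIVersion: "prow.k8s.io/v1", // illustrative group/version
		Kind:       "ProwJob",
		Name:       parentName,
		UID:        parentUID,
	}
}

func main() {
	ref := ownerRefFor("parent-build-job", types.UID("1b4e28ba-2fa1-11d2-883f-0016d3cca427"))
	fmt.Printf("%+v\n", ref)
}
```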

Member Author

I think we don't want to support any form of ownership; I specifically intend to inject those environment variables, mimicking the variables supplied to the current job but for the parent. I think we may also need to support this for Jenkins jobs.

I don't currently have other use cases besides a build job being followed by test jobs, but I think you could pass anything you wanted through the output buckets.

Contributor

So the flow is: the build job pushes the artifacts to a bucket, and plank injects the location of the bucket into the test jobs?

Member Author

That's an option, but I wanted to avoid restricting it to build jobs, so the flow is:

  1. The job writes to the logs bucket, alongside started.json, finished.json, etc., a file with data for the child job to consume (say child_data.json or something).
  2. Prow injects the parent job name / build ID into the child when it starts.
  3. The child goes to logs/$PARENT_JOB_NAME/$PARENT_BUILD_ID/ and uses child_data.json to do whatever it needs (in this case, get the build location; see the sketch below).
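
A minimal sketch of the child's side of that flow, assuming the injected PARENT_JOB_NAME / PARENT_BUILD_ID variables and a parent-written child_data.json in the kubernetes-jenkins logs bucket (the file name is the hypothetical one from this discussion), using the Go GCS client:

```go
package main

import (
	"context"
	"fmt"
	"io"
	"log"
	"os"

	"cloud.google.com/go/storage"
)

func main() {
	ctx := context.Background()
	client, err := storage.NewClient(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// logs/$PARENT_JOB_NAME/$PARENT_BUILD_ID/child_data.json, per the flow above.
	path := fmt.Sprintf("logs/%s/%s/child_data.json",
		os.Getenv("PARENT_JOB_NAME"), os.Getenv("PARENT_BUILD_ID"))
	r, err := client.Bucket("kubernetes-jenkins").Object(path).NewReader(ctx)
	if err != nil {
		log.Fatalf("reading %s: %v", path, err)
	}
	defer r.Close()

	data, err := io.ReadAll(r)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s\n", data) // e.g. {"build_location": "gs://..."}
}
```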

@fejta (Contributor)

fejta commented Aug 8, 2017

Will you clarify how and why this is necessary?

The output of the build job should be a function of the inputs to that job, which we can also send to the second job. Is this to avoid having the second job check out the repos and merge in the PRs to determine the hash?

@BenTheElder (Member, Author)

BenTheElder commented Aug 8, 2017 via email

@fejta (Contributor)

fejta commented Aug 8, 2017

It sounds good; my only question/concern would be ensuring that it is really easier to do it this way than to just recalculate it in the other jobs :)

@ixdy (Member)

ixdy commented Aug 8, 2017

Is this to avoid having the second job check out the repos and merge in the PRs to determine the hash?

Each time you create a merge, the commit hash (and thus the "kubernetes version") will be different, since the merge was created at a different time.

If we use the kubernetes version in the GCS path (as we do with most of the jobs), we need to have some way to pass that between jobs. The bazel/kubeadm PR jobs are getting around this right now by using the full PULL_REFS value in the GCS path (e.g. gs://blah/bazel/master:commit,PR#:commit/), which I guess works but is pretty ugly.

@ixdy (Member)

ixdy commented Aug 8, 2017

I guess a different approach would be to do something like this and explicitly set the date/author/email on the merge commit in bootstrap, so that the version is always the same for a given PULL_REFS. (I'm not sure what date we'd use; GitHub does something similar to this for their PR merge branch, so maybe we could see what they do.)
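
A sketch of what pinning the merge metadata could look like, assuming bootstrap recreated the merge commit with fixed author/committer values via git's standard environment variables; the identity and the epoch date below are arbitrary placeholders, and choosing the right date is exactly the open question.

```go
package main

import (
	"log"
	"os"
	"os/exec"
)

// deterministicMerge merges pullSHA with pinned author/committer
// metadata, so the same parents always produce the same merge SHA.
func deterministicMerge(pullSHA string) error {
	cmd := exec.Command("git", "merge", "--no-ff", "-m", "merge "+pullSHA, pullSHA)
	cmd.Env = append(os.Environ(),
		"GIT_AUTHOR_NAME=ci",
		"GIT_AUTHOR_EMAIL=ci@localhost",
		"GIT_AUTHOR_DATE=1970-01-01T00:00:00Z",
		"GIT_COMMITTER_NAME=ci",
		"GIT_COMMITTER_EMAIL=ci@localhost",
		"GIT_COMMITTER_DATE=1970-01-01T00:00:00Z",
	)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	return cmd.Run()
}

func main() {
	if len(os.Args) < 2 {
		log.Fatal("usage: deterministic-merge <sha>")
	}
	if err := deterministicMerge(os.Args[1]); err != nil {
		log.Fatal(err)
	}
}
```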

@ixdy (Member)

ixdy commented Aug 8, 2017

e.g. what GitHub does:

$ git fetch upstream pull/3966/merge; git log --pretty=fuller FETCH_HEAD
From https://github.com/kubernetes/test-infra
 * branch              refs/pull/3966/merge -> FETCH_HEAD

commit 6835fd65cef523aefa9e4d8f2d3379fa07ff5b50
Merge: fa6daadb 7e65fedd
Author:     Benjamin Elder <[elided]>
AuthorDate: Tue Aug 8 12:08:22 2017
Commit:     GitHub <noreply@github.com>
CommitDate: Tue Aug 8 12:08:22 2017

    Merge 7e65fedd01fc3547bc5ee7039e68d55eab07d4c3 into fa6daadbd23f4bf46333c918e52805b3840d0d7e
...

I have no idea how it sets that date - maybe lazily, whenever someone requests a mergeability check?

@pipejakob (Contributor)

My biggest concern is that this follows the same pattern that is currently causing trouble for the bazel/kubeadm split: it sets more variables, but it still expects both halves of the pipe (parent and child job) to construct the same consistent path from those variables in order to actually find the data. This is what caused problems every time we tried to change the path for the bazel artifacts; we're just moving the problem up a layer. You could argue that this path to the job's output is likely more stable than the bazel ci-artifacts path has been, but it still has the same inflexibility if we ever do want to change it.

Instead, if the general idea is for the child job to be able to find the parent job's output file, I would rather it be told explicitly where that file is (via a full URL) rather than requiring any knowledge about how the parent job might have used its name and build number to construct the gs:// path.

@0xmichalis (Contributor)

It's not strictly necessary but I wanted jobs to be able to put out a file, along with finished.json, started.json and their logs, meant to be consumed by child jobs with whatever data they wanted to pass (like the build bucket gs://...). I first mentioned it here: kubernetes/kubernetes#49884 (comment). I think @ixdy originally had the idea for plumbing it that way.

Or make plank pass the name of the parent as an ENV in the child job?

Instead, if the general idea is for the child job to be able to find the parent job's output file, I would rather it be told explicitly where that file is (via a full URL)

+1

@BenTheElder (Member, Author)

BenTheElder commented Aug 8, 2017 via email

@BenTheElder (Member, Author)

BenTheElder commented Aug 8, 2017

Or make plank pass the name of the parent as an ENV in the child job?

That's what this does (?)

If the child job has the parent job name and build ID in its env, it can fetch a file from the parent's logs bucket.

I definitely do not want child jobs magically constructing URLs, but bootstrap already constructs the logs dir URL, so we can just generate the parent's bucket path and get whatever we need passed in from there. There are other ways we could do this, but I think this way has lower coupling in prow, at least.

@0xmichalis (Contributor)

0xmichalis commented Aug 8, 2017

That's what this does (?)

Sorry, incomplete thought. I wanted to suggest doing this by default, without the API change.

@BenTheElder (Member, Author)

BenTheElder commented Aug 8, 2017 via email

@BenTheElder BenTheElder changed the title from "Have prow inform child jobs about their parent" to "[WIP] Have prow inform child jobs about their parent" Aug 8, 2017
@BenTheElder (Member, Author)

@pipejakob further discussion with @ixdy brings up the idea of specifically using a file in kubernetes-jenkins/logs/$PARENT_JOB_NAME/$PARENT_BUILD_NUMBER/artifacts/ to log the full build path. I am working on a dependent PR to produce these.

@pipejakob (Contributor)

I'll reiterate my main objection to this approach, and I apologize if there's a disconnect in our wordings or mental models about what problem this is solving. Here are the goals as I see them:

  1. The parent job is outputting important information to a file with a specific path. The information itself may contain more paths, but here I'm referring to the path of the output file itself. The parent job must know this path, or it wouldn't be able to store this information.
  2. The child job wants this important information (so it can read the file and find even more important information from it), so it needs to know the exact path where the file was stored.

To meet these goals, the proposed workflow, as I understand it, is:

  1. Have the parent job output important information to a file in a specific path.
  2. Give the child job attributes about the parent job (name and build id).
  3. Have the child job determine the specific path it should use, given these attributes.

I'm curious if (and why) there's any opposition to just telling the child job exactly what this path is, directly. It doesn't need to know the parent job's name or build ID; all it actually cares about is this path.

prow injects the parent job name / build ID into the child when it starts, and the child goes to logs/$PARENT_JOB_NAME/$PARENT_BUILD_ID/ and uses child_data.json to do whatever it needs (in this case, get the build location)

I think this just repeats the exact pattern and problem we had before: the bazel job (the parent) was outputting artifacts into a specific path, the child job (kubeadm e2e) needed to use the same path, and rather than being told the exact path to use, both halves tried to keep in sync by independently constructing the path from shared attributes (the build SemVer, or the PULL_REFS, etc.). This solution is functionally the same, just using different attributes. Once you're playing the game of fmt.Sprintf("%s/%s/%s", some, random, attributes), it becomes really hard to ever change the format of that path without breaking everyone who depends on it. The path structure becomes a contract that's impossible to evolve without convoluted migrations or breaking jobs.

If the contract is "I'll just tell you exactly where to find the file," then our path structure is infinitely flexible and requires no coordination between parent/child jobs to update, if we find it necessary to change gcs buckets, add more parameters to the path structure, etc.
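
As a sketch of that contract, the glue would hand the child one opaque URL; PARENT_OUTPUT_URL is a hypothetical variable name, and the point is that the child never constructs any part of the path itself:

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// Whatever layout the infrastructure uses, the child only sees this.
	url := os.Getenv("PARENT_OUTPUT_URL") // e.g. gs://any-bucket/any/layout/child_data.json
	if url == "" {
		fmt.Fprintln(os.Stderr, "no parent output provided; nothing to consume")
		os.Exit(1)
	}
	fmt.Println("fetching", url)
}
```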

@BenTheElder (Member, Author)

I agree, with the caveat that I don't think the path structure should be baked into prow at all, and I'm not sure of a cleaner solution that avoids this. With the log paths we're using a structure that doesn't depend on pull refs etc., so providing the parent job name / ID was sort of a middle ground: simple paths, but without prow knowing about them.

@BenTheElder (Member, Author)

What if we do:

  • Prow configuration sets a base path for a bucket for passing data between jobs
  • Prow gives each job a UUID subdir and informs each job (through ENV) of its dir and the parent's dir

So you'd have something like:
PARENT_METADATA_PATH="gs://kubernetes-jenkins/metadata/<Parent UUID>/" METADATA_PATH="gs://kubernetes-jenkins/metadata/<Current UUID>/"
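
A sketch of that scheme, assuming prow generated one UUID per job (github.com/google/uuid here, as an example), recorded the parent's UUID on the child, and derived both variables from a single configured base path:

```go
package main

import (
	"fmt"

	"github.com/google/uuid"
)

// metadataBase would come from prow configuration.
const metadataBase = "gs://kubernetes-jenkins/metadata"

// metadataEnv derives both paths from the configured base,
// so jobs never construct them.
func metadataEnv(parentID, selfID string) map[string]string {
	return map[string]string{
		"PARENT_METADATA_PATH": fmt.Sprintf("%s/%s/", metadataBase, parentID),
		"METADATA_PATH":        fmt.Sprintf("%s/%s/", metadataBase, selfID),
	}
}

func main() {
	parent, child := uuid.New().String(), uuid.New().String()
	for k, v := range metadataEnv(parent, child) {
		fmt.Printf("%s=%s\n", k, v)
	}
}
```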

@pipejakob (Contributor)

Oh, one quick but important clarification: I didn't mean to prescribe that prow should be populating this environment variable, especially if/since it has no context about these paths. If step 1 was that prow set these variables, and step 2 was that bootstrap.py or some other layer in the stack used them to figure out the exact path to be used and put that in another environment variable, then that's fine. As a job owner, I just don't want it to be my responsibility to construct that path myself (either in the job runner, the scenario, or kubetest, etc.).

Our current solution with the bazel and kubeadm e2e jobs fell apart because the bazel job was owned by EngProd and I was owning the kubeadm e2es, and we both had to keep our implementations of this path-construction logic in sync in different places. If all of the test-infra glue between the jobs handles these paths, and the kubeadm e2e is ultimately told "here is the path to use," then I don't actually care which parts of that glue made that magic happen.

When we've been saying "the parent job does this" and "the child job does that," those were likely very overloaded in our minds to mean different things, and we should start being a lot more specific about the exact responsibilities and proposed changes to the different layers of the test stack.

@pipejakob (Contributor)

pipejakob commented Aug 9, 2017

Prow configuration sets a base path for a bucket for passing data between jobs
Prow gives each job a UUID subdir and informs each job (through ENV) of its dir and the parent's dir
So you'd have something like:
PARENT_METADATA_PATH="gs://kubernetes-jenkins/metadata/<Parent UUID>/" METADATA_PATH="gs://kubernetes-jenkins/metadata/<Current UUID>/"

This sounds great to me. Slight caveat that I'd expect them to be paths to a specific file and not just a directory.

@BenTheElder (Member, Author)

Closed in favor of #4047

@BenTheElder BenTheElder deleted the run_after_success_parent_info branch August 10, 2017 19:43