-
Notifications
You must be signed in to change notification settings - Fork 706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SDK] Get Kubernetes Events for Job #1975
[SDK] Get Kubernetes Events for Job #1975
Conversation
@andreyvelich: GitHub didn't allow me to assign the following users: droctothorpe, deepanker13. Note that only kubeflow members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andreyvelich The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Pull Request Test Coverage Report for Build 7491318987
💛 - Coveralls |
events = self.core_api.list_namespaced_event(namespace=namespace) | ||
|
||
# Get events for the Job and Job's pods. | ||
for event in events.items: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we distinguish between job and pod events? Should we add a prefix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now, users can use names to get events for Job or Pods.
E.g. for PyTorchJob with name train-mnist
and 2 workers:
{"train-mnist": "Events", "train-mnist-worker-0": "Events", "train-mnist-worker-1": "Events"}
Users can understand the all pod names by running TrainingClient().get_pod_names()
Do you think we should add the prefix that identify type of the object ?
e.g.
{"pytorchjob-train-mnist": "Events", "pod-train-mnist-worker-0": "Events", "pod-train-mnist-worker-1": "Events"}
and pod.status.phase != constants.POD_PHASE_PENDING | ||
): | ||
log_streams.append( | ||
watch.Watch().stream( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat! Didn't know that the Python client supported this.
@@ -861,27 +924,58 @@ def get_job_logs( | |||
break | |||
|
|||
# Print logs to the StdOut | |||
print(f"[Pod {pods[index]}]: {logline}") | |||
print(f"[Pod {pods[index].metadata.name}]: {logline}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on colorizing by colorizing by pod name? Probably overkill / can be a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@droctothorpe Can you elaborate here ? What are your ideas on printing logs for pods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is an example of output from https://github.com/stern/stern.
Each pod gets a distinct color, which helps parse a giant wall of interleaved logs. Happy to tackle it in a discrete PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks for sharing.
Can you create an issue to discuss this ?
We might need to discuss how various output will produce the results while running this (e.g. Jupyter Notebook, VSCode, local terminal).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e35a25b
to
54b6269
Compare
@droctothorpe @johnugeorge I added object kind to the event message for now, so it is easier to parse data from the event messaged:
|
/lgtm |
Fixes: #1863.
This will allow users to get Kubernetes Events for Job and Job's pods via
get_job_logs
API andverbose
parameter.Initially, I didn't watch for the events similar to logs with
follow=True
parameter. This can be done later in the following PRs.The events will be returned in this format:
I introduce a new
get_job_pods
API to get Job's pods data.Also, I removed support for Python3 from our SDK. I believe, various Python
typing
capabilities wasn't there in Python 3./assign @droctothorpe @deepanker13 @johnugeorge @tenzen-y