-
Notifications
You must be signed in to change notification settings - Fork 443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Backend for getting logs of a trial #2039
Conversation
b6dacbd
to
b8654eb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for implementing this @d-gol!
I left few comments.
Hey @andreyvelich, thanks a lot for checking! I will modify the PR according to your suggestions. I left some comments for clarification. |
bc8200e
to
4b3ecd1
Compare
Thank you @tenzen-y and @andreyvelich, rebased it now. |
Thanks Dejan /lgtm |
pkg/new-ui/v1beta1/backend.go
Outdated
trialName := r.URL.Query()["trialName"][0] | ||
namespace := r.URL.Query()["namespace"][0] | ||
|
||
user, err := IsAuthorized(consts.ActionTypeGet, namespace, consts.PluralTrial, "", trialName, trialsv1beta1.SchemeGroupVersion, k.katibClient.GetClient(), r) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it enough to check if user can get the Trial ?
Should we also verify that user can view logs from pods @d-gol @kimwnasptd @apo-ger ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree. Added authorization checks for listing the pods and getting the logs. Had to reorganize the code a bit to fit in the additional checks, but the logic is the same. Adding as a separate commit, in the end we can squash the commits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@d-gol Thanks for driving this!
/lgtm
/assign @andreyvelich
ae7ba7c
to
365154b
Compare
5e80c4e
to
4afe976
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@d-gol @andreyvelich apologies for the late reply. Mostly have some questions around the return type and the structure of the response data.
trialName := trialNames[0] | ||
namespace := namespaces[0] | ||
|
||
user, err := IsAuthorized(consts.ActionTypeGet, namespace, consts.PluralTrial, "", trialName, trialsv1beta1.SchemeGroupVersion, k.katibClient.GetClient(), r) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slightly orthogonal to the PR, but I think the function signature user, err = IsAuthorized(...)
is not ideal. We end up getting the same information and do duplicate checks on errors depending on the user value.
Why not have a distinct function for getting the current user and do this check once? Then IsAuthorized(user, ...)
will only be responsible for the SubjectAccessReviews check.
Or at least for this PR, we could only check if the returned user is not "" only once, the first time we call this function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, we are checking twice the user. I think we can proceed fixing the auth after we merge this PR? Then we can have a separate PR to improve the authentication in the entire file. Or we can do the other way around?
if err != nil { | ||
log.Printf("Marshal logs failed: %v", err) | ||
http.Error(w, err.Error(), http.StatusInternalServerError) | ||
return | ||
} | ||
if _, err = w.Write(response); err != nil { | ||
log.Printf("Write logs failed: %v", err) | ||
http.Error(w, err.Error(), http.StatusInternalServerError) | ||
return | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question 2: Would we expect in the future to return logs from other worker pods?
If that's the case I'd propose that the backend actually returns a JSON type response like
logs: {
master: "..."
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good idea, if we want to change in the future. @johnugeorge @andreyvelich what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like @kimwnasptd idea, let's add Primary Pod Label to the JSON response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @andreyvelich did you mean the result to be in the form:
logs: {
master: "..."
}
or something else?
We can have multiple primary pod labels, did you mean to also add them as key value pairs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
master is bit confusing term. Eg: There can be job with just workers where worker0 acts as the master. If we really need to add pod info, pod name might be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@d-gol It's not always "master" label for the pod that we get labels.
For example, for Argo we get pod with katib.kubeflow.org/model-training: true
label.
Maybe pod name to include in the response make sense as @johnugeorge mentioned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am ok with creating separate issue to track improvements for this API response (e.g. add trial name to the response).
So we can merge this PR and unblock UI team to start working on the UI changes to have this feature in the next release.
What do you think @d-gol @kimwnasptd @johnugeorge ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. +1 to merge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree to merge it, and later we can improve the API response with more information if needed. So again, to clarify, we want to merge this PR with a simple string response (current implementation)? Or in the form of json, like below?
{
"pod_name": logs
}
@d-gol We can merge this. Can you create an issue to track the Json response discussion ? Also, please do a rebase |
@andreyvelich great, thank you! |
@d-gol Can you try a failing e2e test locally? |
@johnugeorge sure, checking it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@d-gol Thanks for implementing this powerful feature!
LGTM
Although I wonder why our E2E failed.
@johnugeorge @andreyvelich @d-gol Same errors seem to occur for |
It seems that errors are caused by mxnet-mnist image. You can reproduce by |
ASAP, I will create a PR to fix this issue. |
Blocked by: #2070 |
@d-gol Can you rebase since we fixed CI? |
d15165e
to
55ba2ce
Compare
@tenzen-y done, thank you for fixing the CI! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@d-gol Thank you for the update!
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: d-gol, tenzen-y The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@tenzen-y great, thanks a lot! |
Implementation of the backend for fetching logs, as a part of #1764.
Providing a route
/katib/fetch_trial_logs/
to obtain logs for a specific trial.Logs are obtained from a master pod.