Backend for getting logs of a trial #2039

d-gol · 2022-11-25T13:01:00Z

Implementation of the backend for fetching logs, as a part of #1764.

Providing a route /katib/fetch_trial_logs/ to obtain logs for a specific trial.
Logs are obtained from a master pod.

andreyvelich

Thank you for implementing this @d-gol!
I left a few comments.

d-gol · 2022-11-30T13:58:19Z

Thank you for implementing this @d-gol! I left a few comments.

Hey @andreyvelich, thanks a lot for checking! I will modify the PR according to your suggestions. I left some comments for clarification.

tenzen-y · 2022-12-02T19:35:58Z

@d-gol Thanks for your effort.
We merged #2047 into the master branch, which adds the trial name label to pods during mutation. Can you rebase this PR?

d-gol · 2022-12-03T15:27:02Z

Thank you @tenzen-y and @andreyvelich, rebased it now.

johnugeorge · 2022-12-05T17:48:01Z

Thanks Dejan

/lgtm

andreyvelich · 2022-12-05T17:53:18Z

pkg/new-ui/v1beta1/backend.go

+	trialName := r.URL.Query()["trialName"][0]
+	namespace := r.URL.Query()["namespace"][0]
+
+	user, err := IsAuthorized(consts.ActionTypeGet, namespace, consts.PluralTrial, "", trialName, trialsv1beta1.SchemeGroupVersion, k.katibClient.GetClient(), r)


Is it enough to check whether the user can get the Trial?
Should we also verify that the user can view logs from pods, @d-gol @kimwnasptd @apo-ger?

Yes, I agree. I added authorization checks for listing the pods and getting the logs. I had to reorganize the code a bit to fit in the additional checks, but the logic is the same. I am adding this as a separate commit; we can squash the commits at the end.

tenzen-y

@d-gol Thanks for driving this!

/lgtm
/assign @andreyvelich

kimwnasptd

@d-gol @andreyvelich apologies for the late reply. Mostly have some questions around the return type and the structure of the response data.

kimwnasptd · 2022-12-14T10:17:48Z

pkg/new-ui/v1beta1/backend.go

+	trialName := trialNames[0]
+	namespace := namespaces[0]
+
+	user, err := IsAuthorized(consts.ActionTypeGet, namespace, consts.PluralTrial, "", trialName, trialsv1beta1.SchemeGroupVersion, k.katibClient.GetClient(), r)


Slightly orthogonal to the PR, but I think the function signature user, err = IsAuthorized(...) is not ideal. We end up getting the same information and do duplicate checks on errors depending on the user value.

Why not have a distinct function for getting the current user and do this check once? Then IsAuthorized(user, ...) will only be responsible for the SubjectAccessReviews check.

Or, at least for this PR, we could check that the returned user is not "" only once, the first time we call this function.

Agreed, we are checking the user twice. Can we fix the auth after we merge this PR, in a separate PR that improves authentication across the entire file? Or should we do it the other way around?

kimwnasptd · 2022-12-14T10:33:48Z

pkg/new-ui/v1beta1/backend.go

+	if err != nil {
+		log.Printf("Marshal logs failed: %v", err)
+		http.Error(w, err.Error(), http.StatusInternalServerError)
+		return
+	}
+	if _, err = w.Write(response); err != nil {
+		log.Printf("Write logs failed: %v", err)
+		http.Error(w, err.Error(), http.StatusInternalServerError)
+		return
+	}


Question 2: Would we expect in the future to return logs from other worker pods?

If that's the case I'd propose that the backend actually returns a JSON type response like

logs: { master: "..." }

This is a good idea if we want to change this in the future. @johnugeorge @andreyvelich what do you think?

I like @kimwnasptd's idea, let's add the Primary Pod Label to the JSON response.

Hey @andreyvelich did you mean the result to be in the form:

logs: { master: "..." }

or something else?
We can have multiple primary pod labels. Did you mean to add them as key-value pairs as well?

"master" is a bit of a confusing term. E.g., there can be a job with just workers, where worker0 acts as the master. If we really need to add pod info, the pod name might be better.

@d-gol The pod we get the logs from does not always have the "master" label.
For example, for Argo we get the pod with the katib.kubeflow.org/model-training: true label.
Maybe including the pod name in the response makes sense, as @johnugeorge mentioned.
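
To make the point about labels concrete, here is a small Go sketch of selecting the primary pod by a set of labels and reporting its name. The pod type is a simplified, hypothetical stand-in for a Kubernetes Pod object, and the pod names in main are made up; only the Argo label comes from this thread.

```go
package main

import "fmt"

// pod is a simplified, hypothetical stand-in for a Kubernetes Pod object.
type pod struct {
	Name   string
	Labels map[string]string
}

// hasLabels reports whether every wanted label is present with the same value.
func hasLabels(p pod, want map[string]string) bool {
	for k, v := range want {
		got, ok := p.Labels[k]
		if !ok || got != v {
			return false
		}
	}
	return true
}

// primaryPodName returns the name of the first pod carrying all primary labels.
func primaryPodName(pods []pod, primary map[string]string) (string, bool) {
	for _, p := range pods {
		if hasLabels(p, primary) {
			return p.Name, true
		}
	}
	return "", false
}

func main() {
	pods := []pod{
		{Name: "trial-step-a", Labels: map[string]string{}},
		{Name: "trial-step-b", Labels: map[string]string{"katib.kubeflow.org/model-training": "true"}},
	}
	primary := map[string]string{"katib.kubeflow.org/model-training": "true"}
	name, ok := primaryPodName(pods, primary)
	fmt.Println(name, ok)
}
```

Because the selecting labels differ per job kind, the pod name is the only identifier that is meaningful across all of them.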

I am OK with creating a separate issue to track improvements to this API response (e.g. adding the trial name to the response).
That way we can merge this PR and unblock the UI team to start working on the UI changes, so we have this feature in the next release.
What do you think, @d-gol @kimwnasptd @johnugeorge?

I agree. +1 to merge

I agree to merge it; later we can improve the API response with more information if needed. To clarify: do we want to merge this PR with a simple string response (the current implementation), or with a JSON response like the one below?

{ "pod_name": logs }

tenzen-y · 2022-12-14T16:31:07Z

@d-gol Can you rebase this since we have merged #2064 to fix CI into the master branch?

johnugeorge · 2022-12-23T13:53:48Z

@d-gol We can merge this. Can you create an issue to track the JSON response discussion? Also, please rebase.

d-gol · 2022-12-23T14:47:23Z

@andreyvelich great, thank you!

johnugeorge · 2022-12-23T15:25:58Z

@d-gol Can you try running the failing e2e test locally?

d-gol · 2022-12-23T15:45:13Z

@johnugeorge sure, checking it.

tenzen-y

@d-gol Thanks for implementing this powerful feature!

LGTM
Although I wonder why our E2E failed.

tenzen-y · 2022-12-23T16:46:50Z

@johnugeorge @andreyvelich @d-gol The same errors seem to occur for "E2E Test with Katib UI, random search, and postgres / e2e" and "E2E Test with mxnet-mnist / e2e" in #2060 and #2067.

tenzen-y · 2022-12-23T17:35:29Z

It seems that the errors are caused by the mxnet-mnist image. You can reproduce them with ytenzen/mxnet-mnist:debug-error.

tenzen-y · 2022-12-23T17:36:37Z

I will create a PR to fix this issue ASAP.

tenzen-y · 2022-12-23T18:06:34Z

Blocked by: #2070

tenzen-y · 2022-12-24T05:00:31Z

@d-gol Can you rebase since we fixed CI?

d-gol · 2022-12-24T12:47:24Z

@tenzen-y done, thank you for fixing the CI!

tenzen-y

@d-gol Thank you for the update!
/lgtm
/approve

google-oss-prow · 2022-12-24T12:56:36Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: d-gol, tenzen-y

Needs approval from an approver in each of these files:

~~OWNERS~~ [tenzen-y]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

d-gol · 2022-12-24T13:55:38Z

@tenzen-y great, thanks a lot!
And thanks everyone for all the help with this @andreyvelich @johnugeorge @kimwnasptd @apo-ger

google-oss-prow bot requested review from sperlingxx and tenzen-y November 25, 2022 13:01

google-oss-prow bot added the size/L label Nov 25, 2022

d-gol force-pushed the ui-logs-backend branch from b6dacbd to b8654eb November 25, 2022 13:04

johnugeorge reviewed Nov 25, 2022

pkg/new-ui/v1beta1/backend.go (outdated)

google-oss-prow bot requested a review from andreyvelich November 25, 2022 13:36

andreyvelich reviewed Nov 29, 2022

andreyvelich mentioned this pull request Dec 1, 2022

Add Trial Labels During Pod Mutation #2047

Merged

d-gol force-pushed the ui-logs-backend branch 2 times, most recently from bc8200e to 4b3ecd1 December 3, 2022 14:00

google-oss-prow bot assigned johnugeorge Dec 5, 2022

google-oss-prow bot added the lgtm label Dec 5, 2022

andreyvelich reviewed Dec 5, 2022

tenzen-y reviewed Dec 5, 2022

google-oss-prow bot assigned andreyvelich and tenzen-y Dec 5, 2022

d-gol force-pushed the ui-logs-backend branch from ae7ba7c to 365154b December 12, 2022 12:53

google-oss-prow bot removed the lgtm label Dec 12, 2022

andreyvelich reviewed Dec 12, 2022

pkg/new-ui/v1beta1/backend.go (outdated)

andreyvelich reviewed Dec 12, 2022

manifests/v1beta1/components/ui/rbac.yaml

andreyvelich reviewed Dec 12, 2022

pkg/new-ui/v1beta1/backend.go (outdated)

d-gol force-pushed the ui-logs-backend branch from 5e80c4e to 4afe976 December 12, 2022 18:30

apo-ger reviewed Dec 13, 2022

pkg/new-ui/v1beta1/backend.go (outdated)

kimwnasptd reviewed Dec 14, 2022

tenzen-y reviewed Dec 23, 2022

pkg/new-ui/v1beta1/backend.go (outdated)

google-oss-prow bot removed the lgtm label Dec 23, 2022

tenzen-y mentioned this pull request Dec 23, 2022

Pin the NumPy version with v1.23.5 in some images #2070

Merged

d-gol added 9 commits December 24, 2022 12:21

Backend for getting logs of a trial

64c5e64

Check Write return + use PrimaryPodLabels

e9b4daa

Add auth + use constants for labels + cleanup

77cbbb0

TODO comment for using controller-runtime client for logs

174acb8

Authorization for list pods and get logs, reduce RBAC

ef89c9b

Use corev1 for specifying resources, edit kf install RBAC

b934379

Check namespace and trialName from request

0c017ff

Remove auth checks for listing the pods

3d1460a

Use context.Background()

55ba2ce

d-gol force-pushed the ui-logs-backend branch from d15165e to 55ba2ce December 24, 2022 11:22

tenzen-y approved these changes Dec 24, 2022

google-oss-prow bot added the lgtm label Dec 24, 2022

google-oss-prow bot added the approved label Dec 24, 2022

google-oss-prow bot merged commit c9dd1b4 into kubeflow:master Dec 24, 2022

andreyvelich mentioned this pull request Jan 6, 2023

Dedicated logs tab for Trials #1764

Closed

elenzio9 mentioned this pull request Jan 25, 2023

[kwa-trials-logs] Create the LOGS tab of Trial's details page in KWA #2101

Merged
