[Proposal] Support JSON format for file-metrics-collector
#1744
Thank you for submitting this proposal @tenzen-y!
Do you mean it will be easier for users to create such Experiments, e.g. they won't need to manually add a regex to the Metrics Collector spec? How will we change the parsing process in the File Metrics Collector? Will we have a special regex for JSON that is applied automatically when the user selects the JSON format? What do you think, @gaocegege @johnugeorge?
Sorry for the late reply. @andreyvelich
Yes, users would not need to specify a regexp in the metrics collector spec.
I think we can parse JSON files in the format shown in the proposal, such as the following log (one JSON object per line), with a script like the one below.

```
{"checkpoint_path": "", "global_step": "0", "loss": "0.22082142531871796", "timestamp": 1638422847.28721, "trial": "0"}
{"acc": "0.9349666833877563", "checkpoint_path": "", "global_step": "0", "timestamp": 1638422847.287801, "trial": "0"}
{"checkpoint_path": "", "global_step": "1", "loss": "0.1414974331855774", "timestamp": 1638422870.035161, "trial": "0"}
{"acc": "0.9586416482925415", "checkpoint_path": "", "global_step": "1", "timestamp": 1638422870.037459, "trial": "0"}
{"checkpoint_path": "", "global_step": "2", "loss": "0.10683439671993256", "timestamp": 1638422900.152162, "trial": "0"}
{"acc": "0.9688166379928589", "checkpoint_path": "", "global_step": "2", "timestamp": 1638422900.1529338, "trial": "0"}
{"checkpoint_path": "", "global_step": "3", "loss": "0.08619903773069382", "timestamp": 1638422927.9910269, "trial": "0"}
{"acc": "0.9747458100318909", "checkpoint_path": "", "global_step": "3", "timestamp": 1638422927.991675, "trial": "0"}
{"checkpoint_path": "", "global_step": "4", "loss": "0.07176543772220612", "timestamp": 1638422952.8473432, "trial": "0"}
{"acc": "0.9790566563606262", "checkpoint_path": "", "global_step": "4", "timestamp": 1638422952.848325, "trial": "0"}
```
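```go
package main

import (
	"encoding/json"
	"fmt"
	"log"

	"github.com/nxadm/tail"
)

func main() {
	// Follow the metrics file while the training process appends to it.
	t, err := tail.TailFile("./log.json", tail.Config{Follow: true})
	if err != nil {
		log.Fatal(err)
	}
	// Keys to collect (objectiveMetricName and additionalMetricNames).
	metrics := []string{"loss"}
	for line := range t.Lines {
		// Each line is an independent JSON object.
		var jsonObj map[string]interface{}
		if err := json.Unmarshal([]byte(line.Text), &jsonObj); err != nil {
			continue // skip lines that are not valid JSON
		}
		for _, metric := range metrics {
			if value, exist := jsonObj[metric]; exist {
				fmt.Printf("%v\n", value)
			}
		}
	}
}
```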
Sounds good. WDYT about this proposal, @gaocegege @johnugeorge?
It makes sense. @andreyvelich
I'll wait for feedback from the community until December 10 (UTC+9).
@gaocegege @johnugeorge Please give your feedback on this proposal.
I'll postpone starting the implementation of this feature until after December 18 (UTC+9) and wait for feedback from @gaocegege and @johnugeorge.
Sorry for the late reply. I am wondering how users would use the JSON feature. Could you please give us an example, e.g. an Experiment YAML?
Thanks for your reply! @gaocegege

```diff
 apiVersion: kubeflow.org/v1beta1
 kind: Experiment
 metadata:
   namespace: kubeflow
   name: file-metrics-collector
 spec:
   objective:
     type: maximize
     goal: 0.99
     objectiveMetricName: accuracy
     additionalMetricNames:
       - loss
   metricsCollectorSpec:
     source:
-      filter:
-        metricsFormat:
-          - "{metricName: ([\\w|-]+), metricValue: ((-?\\d+)(\\.\\d+)?)}"
       fileSystemPath:
         path: "/katib/mnist.log"
         kind: File
         # omitempty; default=Text
+        fileFormat: Json
     collector:
       kind: File
 ...
```
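The `# omitempty; default=Text` comment in the YAML implies that the webhook fills in a default and rejects unknown values. The following is a minimal sketch of that behaviour, not Katib's implementation. The type name `FileSystemFileFormat` comes from the proposal. The `FileSystemPath` subset, the constant names, `setDefaultFileFormat`, `validateFileFormat` and the exact string values (`TEXT`, and `JSON` as suggested in the review) are all hypothetical.

```go
package main

import "fmt"

// FileSystemFileFormat is the proposed type of the fileFormat field.
// The constant values below are assumptions for illustration.
type FileSystemFileFormat string

const (
	TextFormat FileSystemFileFormat = "TEXT"
	JSONFormat FileSystemFileFormat = "JSON"
)

// FileSystemPath is a hypothetical subset of the fileSystemPath spec.
type FileSystemPath struct {
	Path   string
	Kind   string
	Format FileSystemFileFormat
}

// setDefaultFileFormat applies the "omitempty; default=Text" behaviour.
func setDefaultFileFormat(p *FileSystemPath) {
	if p.Format == "" {
		p.Format = TextFormat
	}
}

// validateFileFormat rejects any value other than the two known formats.
func validateFileFormat(p FileSystemPath) error {
	switch p.Format {
	case TextFormat, JSONFormat:
		return nil
	default:
		return fmt.Errorf("invalid fileFormat %q: must be %q or %q", p.Format, TextFormat, JSONFormat)
	}
}

func main() {
	p := FileSystemPath{Path: "/katib/mnist.log", Kind: "File"}
	setDefaultFileFormat(&p)
	fmt.Println(p.Format, validateFileFormat(p))

	// A misspelled value is rejected by the validation step.
	p.Format = "Json"
	fmt.Println(validateFileFormat(p))
}
```

Defaulting before validating means an omitted field never reaches the error branch, so existing Experiments without `fileFormat` keep their current text-based behaviour.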
Gotcha, LGTM. BTW, the value should be JSON instead of Json. I think we can go ahead! 🎉
Thank you for giving me feedback! @gaocegege
It makes sense.
I'd like to start the implementation of this feature! /assign
Please assign me when the PR is ready for review. Thanks for your contribution! 🎉 👍
Sure, thanks for your review! @gaocegege
/kind feature
Describe the solution you'd like
Motivation
Currently, it is difficult to parse JSON format files with `file-metrics-collector` using a regexp filter, since `file-metrics-collector` is designed for TEXT format files. I believe that if `file-metrics-collector` supports JSON format files, Katib becomes more powerful, because users can make use of JSON format metrics files easily, without writing a regexp. Therefore, I would like to add support to `file-metrics-collector` for JSON format files in which each line is a separate JSON object. This JSON format is also used by cloudml-hypertune, which is recommended for use on GCP AI Platform and Vertex AI:
https://cloud.google.com/ai-platform/training/docs/using-hyperparameter-tuning#other_machine_learning_frameworks_or_custom_containers
Design
I'm thinking of the following Kubernetes API and webhook. In addition, if `FileSystemFileFormat` is set to `Json`, `file-metrics-collector` collects the values whose keys match `spec.objective.objectiveMetricName` and `spec.objective.additionalMetricNames` from the metrics file. Does this sound good to you? @kubeflow/wg-automl-leads