
The file metric collector example docker image does not sync with the code #945

Closed
yeya24 opened this issue Dec 2, 2019 · 6 comments

@yeya24
Contributor

yeya24 commented Dec 2, 2019

/kind bug

What steps did you take and what happened:
The trial image docker.io/liuhougangxa/pytorch-mnist:1.0 used in https://github.com/kubeflow/katib/blob/master/examples/v1alpha3/file-metricscollector-example.yaml is out of date relative to https://github.com/kubeflow/katib/blob/master/examples/v1alpha3/file-metrics-collector/mnist.py.

The mnist.py in the Docker image contains:

def test(args, model, device, test_loader, epoch):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss
            pred = output.max(1, keepdim=True)[1] # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    logging.info('\n{{metricName: accuracy, metricValue: {:.4f}}};{{metricName: loss, metricValue: {:.4f}}}\n'.format(float(correct) / len(test_loader.dataset), test_loss))

Here the logging format is {{metricName: accuracy, metricValue: {:.4f}}}, which the file metrics collector cannot parse correctly.
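To make the mismatch concrete, here is a minimal sketch of what that format string actually writes to the log. The format string is copied from the snippet above; the metric values are made up, and the `name=value` regex is a hypothetical pattern used only for illustration, not taken from the Katib collector.

```python
import re

# Format string copied from the mnist.py in the image.
# In str.format, doubled braces "{{" and "}}" render as literal "{" and "}".
LOG_FORMAT = (
    "\n{{metricName: accuracy, metricValue: {:.4f}}};"
    "{{metricName: loss, metricValue: {:.4f}}}\n"
)

# Hypothetical "name=value" pattern (an assumption for illustration only;
# it is not the collector's real parsing rule).
NAME_VALUE = re.compile(r"([\w-]+)\s*=\s*(-?\d+(?:\.\d+)?)")


def render(accuracy, loss):
    """Return the log line that the image's test() would emit."""
    return LOG_FORMAT.format(accuracy, loss)


line = render(0.9876, 0.0421)  # made-up metric values
print(repr(line))

# A parser looking for name=value pairs finds nothing in the rendered line,
# while it does match a line written in name=value form.
print(NAME_VALUE.findall(line))
print(NAME_VALUE.findall("accuracy=0.9876 loss=0.0421"))
```

Whichever format the collector expects, the point is that the log line and the parsing rule must agree, and the image and the repository had drifted apart on this.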

@hougangliu

@andreyvelich
Member

I ran into the same problem.
After building a Docker image with the latest mnist.py code, file-metricscollector-example works.

This issue may also be why we can't pass the CI tests, since the e2e test uses the YAML from the examples folder:
https://github.com/kubeflow/katib/blob/master/test/scripts/v1alpha3/run-file-metricscollector.sh#L59

@hougangliu
Member

hougangliu commented Dec 4, 2019

Sorry for blocking you. I updated the image in #947.

@johnugeorge
Member

@hougangliu Can you move it to a common repo instead of your private registry? For now, did you retag your image with the latest changes?

@johnugeorge
Member

Closing this issue, as #949 keeps the images in the kubeflowkatib repo.

@johnugeorge
Member

/close

@k8s-ci-robot

@johnugeorge: Closing this issue.

