Fixes #261 partially #426

Oseltamivir · 2024-10-28T07:48:33Z

add version detect commands and version check regex
added description in README

The changes so far have been tested on an Ubuntu 22.04 Docker image.

Some utilities that are not yet tested:
Packages:
libmkl-dev
linux-tools

github-actions · 2024-10-28T07:48:47Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

arjunsuresh · 2024-10-28T21:47:58Z

Thank you @Oseltamivir for your changes. Can you please tell us whether you are from an MLCommons member organization? Can you please sign the CLA?

Oseltamivir · 2024-10-29T08:00:16Z

@arjunsuresh Sorry for the late reply, I just signed the CLA.

Not sure how to get the bot to recheck.

arjunsuresh · 2024-10-29T08:58:14Z

Thank you @Oseltamivir. The CLA test now passes. There is a problem with the md5sum version detection, though, because the version output is different on macOS. Should we change the regex to r"\b\d+\.\d+(?:\.\d+)?\b", or do you have a better idea here?

md5sum --version
Microbrew md5sum/sha1sum/ripemd160sum 0.9.5 (Wed Dec  6 12:48:56 EST 2006)
  Compiled Dec  6 2006 at 17:48:56
Written by Bulent Yilmaz

Copyright (C) 2004,2006 Microbrew Software
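
As a quick check of the proposed pattern, here is a minimal Python sketch that runs it against the macOS output quoted above. The GNU-style sample line and the helper name `first_version` are assumptions made for illustration; they are not taken from the PR.

```python
import re

# Pattern proposed in this thread: two or three dot-separated numbers.
VERSION_RE = re.compile(r"\b\d+\.\d+(?:\.\d+)?\b")

# Output quoted in the comment above (macOS, Microbrew md5sum).
MACOS_MD5SUM = """Microbrew md5sum/sha1sum/ripemd160sum 0.9.5 (Wed Dec  6 12:48:56 EST 2006)
  Compiled Dec  6 2006 at 17:48:56
Written by Bulent Yilmaz
"""

# Assumed sample of a GNU coreutils style version line, for comparison only.
GNU_MD5SUM = "md5sum (GNU coreutils) 8.32\n"


def first_version(text):
    """Return the first version-like token in text, or None (hypothetical helper)."""
    match = VERSION_RE.search(text)
    # The pattern has no capturing group, so the version is group 0.
    return match.group(0) if match else None


print(first_version(MACOS_MD5SUM))
print(first_version(GNU_MD5SUM))
```

The digits inside `md5sum`, `sha1sum` and `ripemd160sum` are skipped because none of them is followed by a dot and more digits, and the timestamps use colons, so the first match is the real version.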

Oseltamivir · 2024-10-29T09:53:09Z

Thanks @arjunsuresh. Yeah, I think your regex will be more suitable.

Also, I had some other problems finding versions for:
libmkl-dev
linux-tools

Some other packages (those detected with dpkg) have been tested on Ubuntu only. I see that the package manager commands also cover yum and dnf, so I am not sure whether version detect commands should be added for RHEL/Fedora.

Also, to add to the documentation if needed: I used https://regexr.com/ for regex testing and https://regex-generator.olafneumann.org for generating complex regexes.

arjunsuresh · 2024-10-29T10:56:57Z

@Oseltamivir yes, that's correct. Most of the time the version detection shouldn't change depending on the package manager, but we still need to test. Would you like to join the weekly 30-minute CM sync to discuss this? It is at 6:05pm GMT on Tuesdays.

https://meet.google.com/jtf-crbz-ezz?

Oseltamivir · 2024-10-29T11:38:35Z

I apologise, but I don't think I can make it. I'm in GMT+8, which makes it around 3am... maybe I'll try next week.

arjunsuresh · 2024-10-29T14:29:13Z

No worries @Oseltamivir Let me know a good time and we can sync separately. Usually GMT 9-12 works best for me.

Oseltamivir · 2024-10-29T15:49:48Z

I think next week's CM sync should be fine. I just need to plan ahead. Sleeping nowadays at 3am anyway ¯_(ツ)_/¯

arjunsuresh · 2024-10-29T16:22:20Z

No worries. Good night! :)

arjunsuresh · 2024-10-31T13:35:30Z

@Oseltamivir Your PR was a big motivation for us to add the cm test automation, which we now use to test any CM script. For get-generic-sys-util the tests are run for all variations on ubuntu-20.04, ubuntu-22.04, ubuntu-24.04 and rhel9. You can see the results here

Oseltamivir · 2024-10-31T13:54:59Z

@arjunsuresh Awesome! Also, if you celebrate it, Happy Diwali!

Let me know if there are any regexes/tasks you need me to do regarding this aspect. Meanwhile, I will try to resolve issues and see if I can contribute to this repo and mlcommons/inference. My current focus is attempting to make the default reference implementation use all GPUs on a single node.

arjunsuresh · 2024-10-31T14:43:04Z

Thank you @Oseltamivir It'll be great if this PR is fixed and merged as it adds a very useful feature.

"reference implementation use all GPUs on a single node"

This is very useful. Some of the reference implementations, like the one for llama2-70b, already use all available GPUs on a node, but most of them don't.

Oseltamivir · 2024-10-31T14:57:00Z

Sure. No prob. I'll try to slowly go thru the failing tests

Oseltamivir · 2024-10-31T17:37:20Z

Got very frustrated over a specific library

pstree --version prints to stderr instead of stdout!!!! Spent the last 2 days ripping my hair out over why the regex wasn't working.

Added a specific clause in the .sh that detects whether the command is pstree --version and, if so, writes stderr instead of stdout to tmp-ver.out. It is a messy approach, but I think it is sufficient for now unless other utilities also write to stderr.

Got a feeling the errors for bzcat --version in the build tests are also caused by printing to stderr...

I will move on to checking the other regexes soon. Currently wondering why dpkg -l | grep libnuma-dev failed on ResNet50... I think it might not be installed... @arjunsuresh Should we run the install command in _cm.json if tmp-ver.out is empty?

arjunsuresh · 2024-10-31T18:47:16Z

Oh. That's an interesting find. Thank you for digging into it.

I think it's better to set the variable below in _cm.json for the packages that need the error stream, so that the bash script becomes cleaner.

env['CM_SYS_UTIL_VERSION_CMD_USE_ERROR_STREAM'] = yes

If tmp-ver.out is empty, but no error happens, I think we'll need to handle case by case. One option can be to say "version-undetected" but for this we need to make sure that the installation is successful.
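
To make the suggestion concrete, here is a minimal Python sketch of choosing the output stream from that variable and falling back to "version-undetected". It only illustrates the idea and is not the script's actual code: the function name `detect_version`, the "yes" string comparison, and the fake pstree command are assumptions.

```python
import re
import subprocess
import sys

VERSION_RE = re.compile(r"\b\d+\.\d+(?:\.\d+)?\b")


def detect_version(cmd, env):
    """Run cmd and extract a version from stdout, or from stderr when the flag is set.

    Hypothetical helper: the real logic lives in the bash script and _cm.json.
    """
    use_stderr = env.get("CM_SYS_UTIL_VERSION_CMD_USE_ERROR_STREAM", "") == "yes"
    proc = subprocess.run(cmd, capture_output=True, text=True)
    text = proc.stderr if use_stderr else proc.stdout
    match = VERSION_RE.search(text)
    # Empty or unparsable output is reported instead of raising.
    return match.group(0) if match else "version-undetected"


# Stand-in for a tool that prints its version to stderr (sample text is assumed).
fake_pstree = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('pstree (PSmisc) 23.4')",
]

print(detect_version(fake_pstree, {"CM_SYS_UTIL_VERSION_CMD_USE_ERROR_STREAM": "yes"}))
print(detect_version(fake_pstree, {}))
```

Keeping the stream choice in per-package metadata means the detection code itself needs no special case for any single command.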

Oseltamivir · 2024-11-01T09:01:08Z

Current issues:

For macOS build:
md5sum: command not found

I don't really understand why this error occurs, unless the installation wasn't successful. With the correct software installed, it should work. Tested on a MacBook.

For ubuntu:
dpkg -l | grep libnuma-dev
no packages found matching libnuma-dev

Not sure why this errors... on a NUMA-supported device it should work. Tested to be working on Docker from the runner and on Ubuntu bare metal.

arjunsuresh · 2024-11-01T13:07:43Z

@Oseltamivir I guess I have fixed the errors on macOS. The problem is that md5sum --version works fine but still returns 255 on macOS, so I added a grep suffix to the version detect command. Also, for the newly added version regular expression the group number is 0 and not 1.

Add Version RE for g++-11

Use berkeley link for imagenet-aux by default

Code cleanup
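
The exit-status problem described in this comment (a version command that prints correctly but returns 255) can be reproduced with a small Python sketch. This shows a different technique from the grep suffix used in the PR: it simply reads the output regardless of the exit status. The helper name and the stand-in command are assumptions.

```python
import re
import subprocess
import sys

VERSION_RE = re.compile(r"\b\d+\.\d+(?:\.\d+)?\b")


def version_ignoring_exit_status(cmd):
    """Return (exit status, version or None) without treating non-zero as failure."""
    # check=False: a non-zero exit status does not raise CalledProcessError.
    proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
    match = VERSION_RE.search(proc.stdout)
    # Group 0 is the whole match; the pattern has no capturing group 1.
    return proc.returncode, (match.group(0) if match else None)


# Stand-in for a tool that prints a version and then exits with status 255.
noisy_tool = [
    sys.executable,
    "-c",
    "import sys; print('md5sum 0.9.5'); sys.exit(255)",
]

print(version_ignoring_exit_status(noisy_tool))
```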

arjunsuresh · 2024-11-04T12:00:32Z

Hi @Oseltamivir, I removed the version detection entries that are failing and those that rely on dpkg or pkg_config, because we currently assume that when version detection of a package succeeds, the package is installed and available. We can fix these later. Currently all the tests are passing.

automation/script/module.py

arjunsuresh

Thank you for the changes

Oseltamivir requested a review from a team as a code owner October 28, 2024 07:48

Updated fix for tests

cc3d251

Oseltamivir force-pushed the main branch from 9b4bc61 to cc3d251 Compare October 28, 2024 12:23

arjunsuresh added 3 commits October 29, 2024 16:22

Merge branch 'main' into main

5e9c03c

Merge branch 'main' into main

7cf27c6

Merge branch 'main' into main

8686d81

- fixed regex for md5sum, pstree, and bzcat which caused failed tests

23d692b

Oseltamivir force-pushed the main branch from b37f582 to 23d692b Compare October 31, 2024 15:15

- fixed an annoying psmisc/pstree bug

6723d2c

- changed to use env var to detect stderr, fixed some regexes

737024b

arjunsuresh and others added 2 commits November 1, 2024 14:55

Use raw string in build-docker-image

ffa3966

Merge remote-tracking branch 'upstream/main'

8f971e7

arjunsuresh and others added 24 commits November 3, 2024 23:05

Fix tmp-run-env.out name

000a799

Skip g++11 version detection for ubuntu 20.04

4795f0b

Merge branch 'mlperf-inference' into mlperf-inference

c0823fa

Skip g++11 version detection for ubuntu 20.04

dd2c715

Skip g++11 version detection for ubuntu 20.04

f1c9e31

Merge pull request mlcommons#472 from GATEOverflow/mlperf-inference

4f7cfce

Add Version RE for g++-11

Merge branch 'mlperf-inference' into main

185a88e

Update _cm.json

9990d98

Remove version detect for libnuma-dev

432a6ed

Use berkeley link for imagenet-aux by default

4478fde

Merge branch 'mlperf-inference' into mlperf-inference

16578be

Merge pull request mlcommons#473 from GATEOverflow/mlperf-inference

30ca2bc

Use berkeley link for imagenet-aux by default

Merge branch 'mlperf-inference' into main

eb45fce

Remove ntpdate version detect

c54eaaa

Update _cm.json

9739faf

Update _cm.json

3c7904b

Update _cm.json | Remove failing version detects

9501b18

Removed pkg detection from pkg_config

c9edada

Update _cm.json

6385998

Generalised the code

cb0c19a

split exclude condition

b206baf

Merge pull request mlcommons#475 from mlcommons/anandhu-eng-patch-4

a8082e1

Code cleanup

Rename README.md to README-extra.md

16a88af

Merge branch 'mlperf-inference' into main

961c6fe

arjunsuresh reviewed Nov 4, 2024

automation/script/module.py (outdated, resolved)

Update module.py

8470530

arjunsuresh approved these changes Nov 6, 2024

arjunsuresh merged commit 49e1cda into mlcommons:main Nov 6, 2024
54 checks passed

github-actions bot locked and limited conversation to collaborators Nov 6, 2024
