Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #261 partially #426

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open

Fixes #261 partially #426

wants to merge 7 commits into from

Conversation

Oseltamivir
Copy link

  • add version detect commands and version check regex
  • added description in README

The changes so far has been tested on a ubuntu 22.04 docker image.

Some utilities that are not yet tested:
Packages:
libmkl-dev
linux-tools

@Oseltamivir Oseltamivir requested a review from a team as a code owner October 28, 2024 07:48
Copy link

github-actions bot commented Oct 28, 2024

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@arjunsuresh
Copy link
Contributor

Thank you @Oseltamivir for your changes. Can you please tell if you are from an MLCommons member organization? Can you please sign the CLA?

@Oseltamivir
Copy link
Author

@arjunsuresh Sorry for late reply, I just signed the CLA

Not sure how to get bot to recheck

@arjunsuresh
Copy link
Contributor

Thank you @Oseltamivir The CLA test now passes. There is a problem for md5sum version detect though as the version output is different on macos. Should we change the re to r"\b\d+\.\d+(?:\.\d+)?\b" or do you have some better idea here?

md5sum --version
Microbrew md5sum/sha1sum/ripemd160sum 0.9.5 (Wed Dec  6 12:48:56 EST 2006)
  Compiled Dec  6 2006 at 17:48:56
Written by Bulent Yilmaz

Copyright (C) 2004,2006 Microbrew Software

@Oseltamivir
Copy link
Author

Oseltamivir commented Oct 29, 2024

Thanks @arjunsuresh Yea, I think your regex will be more suitable.

Also, I had some other problems finding versions for:
libmkl-dev
linux-tools

And some other packages have been tested on ubuntu only (Those with dpkg) But I see the package manager commands yum and dnf, so I am not sure if there should be added commands for RHEL/fedora.

Also to add onto documentation if needed, I used https://regexr.com/ for regex testing and https://regex-generator.olafneumann.org for generating complex regexes

@arjunsuresh
Copy link
Contributor

@Oseltamivir yes, thats correct. Most times the version detection shouldn't change depending on the package manager but we still need to test. Would you like to join the weekly 30 minutes CM sync to discuss on this? It is at GMT 6:05pm Tuesdays.

https://meet.google.com/jtf-crbz-ezz?

@Oseltamivir
Copy link
Author

I apologise but I dont think I can make it. I'm in GMT+8 which is around 3am... maybe I'll try next week.

@arjunsuresh
Copy link
Contributor

No worries @Oseltamivir Let me know a good time and we can sync separately. Usually GMT 9-12 works best for me.

@Oseltamivir
Copy link
Author

I think next week's CM sync should be fine. I just need to plan ahead. Sleeping nowadays at 3am anyway ¯_(ツ)_/¯

@arjunsuresh
Copy link
Contributor

No worries. Good night! :)

@arjunsuresh
Copy link
Contributor

@Oseltamivir Your PR was a big motivation for us to add the cm test automation which we now use to test any CM script. For get-generic-sys-util the tests are run for all variations on ubuntu-20.04, ubuntu-22.04, ubuntu24.04 and rhel9. You can see the results here

@Oseltamivir
Copy link
Author

@arjunsuresh Awesome! Also, if you celebrate it, Happy Diwali!

Let me know if there are any regexes/tasks you need me to do regarding this aspect. Meanwhile, I will try to resolve issues and see if i can contribute to this repo and mlcommons/inference - my current focus is attempting to make the default reference implementation use all GPUs on a single node.

@arjunsuresh
Copy link
Contributor

Thank you @Oseltamivir It'll be great if this PR is fixed and merged as it adds a very useful feature.

"reference implementation use all GPUs on a single node"

This is very useful as and some of the reference implementations like for llama2-70b already use all available GPUs on a node. But most of them don't.

@Oseltamivir
Copy link
Author

Sure. No prob. I'll try to slowly go thru the failing tests

@Oseltamivir
Copy link
Author

Oseltamivir commented Oct 31, 2024

Got very frustrated over a specific library

pstree --version prints to stderr instead of stdout!!!! Spent the last 2 days ripping my hair out over why the regex isn't working.

Added a specific clause in the .sh that detects if command is pstree --version which will write to tmp-ver.out from stderr instead of stdout. It is a messy way but I think it is sufficient for now unless other libraries write to stderr.

Got a feeling the errors for bzcat --version in the build tests are also caused by printing to stderr...

I will move onto check the other regexes soon. Currently wondering why dpkg -l | grep libnuma-dev failed on ResNet50... I think it might not be installed... @arjunsuresh Should we run the install command in _cm.json if tmp-ver.out is empty?

@arjunsuresh
Copy link
Contributor

Oh. That's an interesting find. Thank you for digging into it.

I think it's better to set the below variable in _cm.json for the packages that needs error stream so that the bash script becomes cleaner.

env['CM_SYS_UTIL_VERSION_CMD_USE_ERROR_STREAM'] = yes

If tmp-ver.out is empty, but no error happens, I think we'll need to handle case by case. One option can be to say "version-undetected" but for this we need to make sure that the installation is successful.

@Oseltamivir
Copy link
Author

Current issues:

For macOS build:
md5sum: command not found

  • I don't really understand why this error occurs except that installation wasn't successful With the correct software, it should work. Tested on a macbook.

For ubuntu:
dpkg -l | grep libnuma-dev
no packages found matching libnuma-dev

Not sure why this errors... on a numa supported device it shld work. Tested to be working on docker from runner and ubuntu bare metal.

@arjunsuresh
Copy link
Contributor

@Oseltamivir I guess I have fixed the errors on macos - the problem is md5sum --version works fine but it still returns 255 on macos. So, added a grep suffix to the version detect command. Also, for the newly added version regular expression the group number is 0 and not 1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants