MLPerf ResNet inference workflow failing on Windows #311
Comments
Thank you @arjunsuresh, but I now got another error at the same place with the same command (after cleaning cache):
Oh, it looks like the script is broken on Windows. @anandhu-eng, can you please check? We actually don't have a GitHub test for R50 on Windows, and I believe the workflow has other issues on Windows too.
Oh, I didn't see that it was excluded from the tests ... It would be nice to fix it and add it to the tests, since I fixed and used it this spring and it worked ...
Hi @arjunsuresh @gfursin , the issue occurs because |
Actually, we do have md5sum for Windows. I added it to the script get-sys-utils-cm a while ago (https://zenodo.org/record/6501550/files/cm-artifact-os-windows-32.zip). That's where I keep the minimal set of auxiliary tools for Windows (including wget). I remember that I made our download scripts compatible with Windows, and md5sum was working last year using this dependency ... @anandhu-eng - I suggest using it. I will try to update it with the latest wget.exe today too ...
@anandhu-eng, in fact, we use get-sys-utils-min on Windows instead of get-sys-utils-cm. It has the md5sum that I use on Windows.
Hi @gfursin I think the problem started when we added md5sum check for the downloads via |
Sorry @gfursin , you are right. I am testing the script after passing the values of |
Oh, yes, sure! That's the reason why I was not using a subprocess call for such tests but ran md5sum via run.bat - in that case we could reuse dependencies from other CM scripts... I remember that I managed to make md5sum work on Windows and Linux like that. That's one of the reasons to have Windows tests: to avoid breaking such cross-platform functionality ;) ...
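For reference, the checksum comparison discussed above can also be done without an external md5sum binary. The following is only a minimal sketch using Python's standard hashlib module. It is not the code used by the CM download scripts, and the function names md5_of_file and checksum_matches are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected_md5):
    """Return True only if the file exists and its MD5 equals expected_md5.

    Hypothetical helper: the comparison ignores case and surrounding
    whitespace in the expected value.
    """
    p = Path(path)
    if not p.is_file():
        return False
    return md5_of_file(p).lower() == expected_md5.strip().lower()
```

Because this runs inside the Python process, it behaves the same on Windows and Linux, at the cost of not reusing the md5sum dependency that other scripts already provide.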
Thank you @anandhu-eng . |
I just noticed that with the latest updates, all CM workflows for Windows are failing, even a basic test with image-classification:
cmr "python image-classification onnx"
cmr "python image-classification torch"
...
INFO:root:* cm run script "python image-classification onnx"
INFO:root: * cm run script "detect os"
INFO:root: ! cd D:\Work1\CM\repos\fgg\fgg.work
INFO:root: ! call D:\Work1\CM\repos\mlcommons@cm4mlops\script\detect-os\run.bat from tmp-run.bat
INFO:root: ! call "postprocess" from D:\Work1\CM\repos\mlcommons@cm4mlops\script\detect-os\customize.py
INFO:root: * cm run script "get sys-utils-min"
INFO:root: ! load D:\Work1\CM\repos\local\cache\937028e67dc34760\cm-cached-state.json
INFO:root: * cm run script "get sys-utils-cm"
INFO:root: ! load D:\Work1\CM\repos\local\cache\1ec8fce112e84778\cm-cached-state.json
INFO:root: * cm run script "get python3"
INFO:root: ! load D:\Work1\CM\repos\local\cache\b27f55b438e94d81\cm-cached-state.json
INFO:root:Path to Python: C:\!Progs\Python310\python.exe
INFO:root:Python version: 3.10.11
INFO:root: * cm run script "get dataset imagenet image-classification original _run-during-docker-build"
INFO:root: * cm run script "detect os"
INFO:root: ! cd D:\Work1\CM\repos\local\cache\ca90e727f1414812
INFO:root: ! call D:\Work1\CM\repos\mlcommons@cm4mlops\script\detect-os\run.bat from tmp-run.bat
INFO:root: ! call "postprocess" from D:\Work1\CM\repos\mlcommons@cm4mlops\script\detect-os\customize.py
INFO:root: * cm run script "get sys-utils-min"
INFO:root: ! load D:\Work1\CM\repos\local\cache\937028e67dc34760\cm-cached-state.json
INFO:root: * cm run script "download-and-extract file _extract _url.http://cKnowledge.org/ai/data/ILSVRC2012_img_val_500.tar"
INFO:root: * cm run script "download file _cmutil _url.http://cKnowledge.org/ai/data/ILSVRC2012_img_val_500.tar"
INFO:root: * cm run script "detect os"
INFO:root: ! cd D:\Work1\CM\repos\local\cache\ca90e727f1414812
INFO:root: ! call D:\Work1\CM\repos\mlcommons@cm4mlops\script\detect-os\run.bat from tmp-run.bat
INFO:root: ! call "postprocess" from D:\Work1\CM\repos\mlcommons@cm4mlops\script\detect-os\customize.py
INFO:root: * cm run script "get sys-utils-min"
INFO:root: ! load D:\Work1\CM\repos\local\cache\937028e67dc34760\cm-cached-state.json
INFO:root: ! cd D:\Work1\CM\repos\local\cache\ca90e727f1414812
INFO:root: ! call D:\Work1\CM\repos\mlcommons@cm4mlops\script\download-file\run.bat from tmp-run.bat
md5sum: standard input: no properly formatted MD5 checksum lines found
INFO:root: ! call "postprocess" from D:\Work1\CM\repos\mlcommons@cm4mlops\script\download-file\customize.py
CM error: Downloaded path D:\Work1\CM\repos\local\cache\ca90e727f1414812\ILSVRC2012_img_val_500.tar does not exist. Probably CM_DOWNLOAD_FILENAME is not set and CM_DOWNLOAD_URL given is not pointing to a file!
Downloading from http://cKnowledge.org/ai/data/ILSVRC2012_img_val_500.tar
File ILSVRC2012_img_val_500.tar already present, original checksum and computed checksum matches! Skipping Download..
It's a top priority to fix the core CM functionality on Windows and add tests to avoid breaking it. Thanks a lot!
Hi @gfursin, I have made updates in PR #318. The download error is fixed, but I got another error at a later stage of the run.
run command:
Thank you very much @anandhu-eng for the quick response - much appreciated. By the way, the download is fixed on Windows and I can run image-classification without an issue. I also tried the above workflow and it failed with a slightly different error:
It is weird because MSVC 18+ is detected - let me dig into it ...
Oh, I think it's because we have to call vcvars64.bat from Visual Studio to set up all the variables for MSVC, but it's not called anymore ... Such scripts were picked up in run.bat to build C/C++ programs (I actually developed the ability to run native bat scripts in CM4MLOps specifically for this reason, to support complex sub-dependencies that require extra bat files to set up environment variables). It worked fine when I tested it a few months ago, but it seems that the functionality to build and run programs has changed since then and run.bat is not called? Is it possible to check how to bring such support back, please? The way I designed CM and CM4MLOps was to always keep CM scripts backwards compatible across all platforms and gradually enhance them (Linux, Windows, MacOS, etc.) - this is an important feature of CM and I suggest keeping this concept ... Thanks a lot!
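To illustrate the point about vcvars64.bat: a batch file only changes the environment of the cmd session that runs it, so a caller has to run the compiler in that same session (as run.bat does) or copy the resulting variables out. The sketch below shows the second option under stated assumptions. It is not how CM4MLOps handles this, the names parse_set_output and env_after_batch are hypothetical, and the Windows-only part has not been verified against a real Visual Studio installation.

```python
import os
import subprocess


def parse_set_output(text):
    """Parse the NAME=value lines printed by the cmd builtin `set` into a dict."""
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        # Skip lines without '=' (banner text) and lines with an empty name.
        if sep and name:
            env[name] = value
    return env


def env_after_batch(bat_path):
    """Run a batch file in cmd and return the environment it leaves behind.

    Assumption: bat_path points to a script such as vcvars64.bat; the
    actual location depends on the local Visual Studio installation.
    """
    if os.name != "nt":
        raise RuntimeError("batch files can only be run on Windows")
    # `call` runs the script in the current cmd session, then `set` dumps
    # the resulting variables so they can be captured from stdout.
    command = f'call "{bat_path}" >nul && set'
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=True
    )
    return parse_set_output(result.stdout)
```

The returned dictionary could then be passed as the env argument of a later subprocess.run call that invokes the compiler.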
Hi @gfursin I don't think we ever touched this feature. MLPerf inference R50 is actually working fine on Windows as seen in this runlog and it uses |
Closing this issue, as the Visual Studio compilation issue is a separate problem.
Hi @anandhu-eng and @arjunsuresh,
I just tried the CM-MLPerf workflow for ResNet benchmark with the latest CM4MLOps mlperf-inference branch on my Windows machine and it fails when downloading a model.
It's weird because we have a GitHub test for this benchmark on Windows...
Would you mind checking whether it works on your side, please? If not, I am curious why our tests didn't catch that.
Here is the command I used on Windows 11 (copy/paste from our GitHub test):
and here is the error:
Thanks!