-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CI] Refactor MIGraphX model testing with Jenkins credential access. #1671
base: develop
Are you sure you want to change the base?
[CI] Refactor MIGraphX model testing with Jenkins credential access. #1671
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall note: we want this to not be a shell script that's manually managing Docker containers
""" | ||
sh """ #!/bin/bash -x | ||
|
||
if [ \$(docker ps -a -q -f name=migraphx) ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... Jenkins should be managing the Docker starts/stops, not us
echo "sshfs is already installed." | ||
fi | ||
|
||
if ! command -v sshpass &> /dev/null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be in the Dockerfile
sshfs_port="22" | ||
|
||
known_hosts_file="/tmp/known_hosts" | ||
ssh-keyscan -p "$sshfs_port" "$sshfs_host" > "$known_hosts_file" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... I'd argue having the public key in by Jenkins credential is a better idea than trusting a keyscan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Tuna folks like to use Jenkins environment variables for these things, so they can set them up in the job and keep them out of github. IT may also complain, even just about visible hostnames.
Our codecov stage uses withCredentials
to fetch a Jenkins credential into an environment varlable.
My unpublished efforts to use an ssh tunnel for a Tuna database connection settled on sshagent (credentials: ['rocmlir-ixt-rack-15']) { ... ssh commands ... }
But note that ssh is weird on our CI nodes -- lockharts are different from 10.216.64.100 hosts are different from the rest.
echo "int8:$int8" | ||
echo "checkFor:$checkFor" | ||
|
||
docker pull rocm/mlir-migraphx-ci:latest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a stage{}
in Jenkins etc.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #1671 +/- ##
===========================================
- Coverage 77.84% 77.79% -0.06%
===========================================
Files 100 100
Lines 27720 27732 +12
Branches 4027 4028 +1
===========================================
- Hits 21578 21573 -5
- Misses 4498 4522 +24
+ Partials 1644 1637 -7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
13eab37
to
2661e23
Compare
2661e23
to
d31ca83
Compare
This commit implements server access via Jenkins credentials, replacing the need for the previously used model-testing.sh script. The new testing framework now supports five architectures: MI200, MI300, Navi2x, Navi3x, and Navi4x.
It allows for performance (perf) and verification (verify) testing, with performance set as the default option. For each architecture, dedicated log files are generated, containing detailed information about model performances.
The testing framework is configured to execute batch size tests of 1, 32, and 64 for each model, ensuring comprehensive coverage. Additionally, tests are scheduled to run once a week, providing regular updates on model performance metrics.
This update enhances the efficiency and maintainability of our model testing process, streamlining access and reporting for different architectures.
Currently, there are issues with mounting models on the Navi4x machine, and efforts are underway to resolve this. All other architectures are functioning correctly.