Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] Refactor MIGraphX model testing with Jenkins credential access. #1671

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

stefankoncarevic
Copy link
Contributor

This commit implements server access via Jenkins credentials, replacing the need for the previously used model-testing.sh script. The new testing framework now supports five architectures: MI200, MI300, Navi2x, Navi3x, and Navi4x.

It allows for performance (perf) and verification (verify) testing, with performance set as the default option. For each architecture, dedicated log files are generated, containing detailed information about model performances.

The testing framework is configured to execute batch size tests of 1, 32, and 64 for each model, ensuring comprehensive coverage. Additionally, tests are scheduled to run once a week, providing regular updates on model performance metrics.

This update enhances the efficiency and maintainability of our model testing process, streamlining access and reporting for different architectures.

Currently, there are issues with mounting models on the Navi4x machine, and efforts are underway to resolve this. All other architectures are functioning correctly.

Copy link
Collaborator

@krzysz00 krzysz00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall note: we want this to not be a shell script that's manually managing Docker containers

"""
sh """ #!/bin/bash -x

if [ \$(docker ps -a -q -f name=migraphx) ]; then
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... Jenkins should be managing the Docker starts/stops, not us

echo "sshfs is already installed."
fi

if ! command -v sshpass &> /dev/null
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be in the Dockerfile

sshfs_port="22"

known_hosts_file="/tmp/known_hosts"
ssh-keyscan -p "$sshfs_port" "$sshfs_host" > "$known_hosts_file"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... I'd argue having the public key in by Jenkins credential is a better idea than trusting a keyscan

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Tuna folks like to use Jenkins environment variables for these things, so they can set them up in the job and keep them out of github. IT may also complain, even just about visible hostnames.

Our codecov stage uses withCredentials to fetch a Jenkins credential into an environment varlable.

My unpublished efforts to use an ssh tunnel for a Tuna database connection settled on sshagent (credentials: ['rocmlir-ixt-rack-15']) { ... ssh commands ... }

But note that ssh is weird on our CI nodes -- lockharts are different from 10.216.64.100 hosts are different from the rest.

echo "int8:$int8"
echo "checkFor:$checkFor"

docker pull rocm/mlir-migraphx-ci:latest
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a stage{} in Jenkins etc.

Copy link

codecov bot commented Oct 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.79%. Comparing base (e454b5d) to head (d31ca83).
Report is 6 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1671      +/-   ##
===========================================
- Coverage    77.84%   77.79%   -0.06%     
===========================================
  Files          100      100              
  Lines        27720    27732      +12     
  Branches      4027     4028       +1     
===========================================
- Hits         21578    21573       -5     
- Misses        4498     4522      +24     
+ Partials      1644     1637       -7     
Flag Coverage Δ
mfma 77.79% <ø> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants