Add nightly build workflow #250
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/250
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure)
As of commit 8048bc7 with merge base e9e5fae:
BROKEN TRUNK - The following job failed but was also present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
No ciflow labels are configured for this repo.
Thanks Huy! Am I reading this correctly that this is still not a manylinux binary? https://github.com/pytorch/ao/actions/runs/9122140883/job/25082459762?pr=250#step:14:538 Also, regarding the upload: where will this upload to? It seems like the pytorch s3 bucket (s3://pytorch/whl/nightly/cu124/torchao_nightly-2024.5.17-cp310-cp310-linux_x86_64.whl). Does this get copied in a cron job to https://download.pytorch.org/whl/nightly/torchao/?
I'm asking myself the same question. I think the workflow does its job as expected, because I can see PyTorch nightly binaries in the same format: https://download.pytorch.org/whl/nightly/torch/ or https://download.pytorch.org/whl/nightly/torchvision. PyTorch wheels go through this script before being uploaded to pypi, https://github.com/pytorch/builder/blob/main/release/promote/wheel_to_pypi.sh#L45-L59, so I'm pretty sure that this is where the file gets prepped and renamed so that it becomes manylinux1 on PyPI, i.e. https://pypi.org/project/torch/2.3.0/#files
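For illustration, the repair/retag step that turns a plain linux_x86_64 wheel into a manylinux wheel generally looks like the shell sketch below. The commit history of this PR mentions auditwheel and manylinux2014_x86_64, but the exact commands in wheel_to_pypi.sh and post_build_script.sh are not reproduced in this thread, so the paths and platform tag here are assumptions:

```sh
# Illustrative only: inspect and repair/retag a built wheel as manylinux2014_x86_64.
# The scripts referenced in this thread may simply rename the platform tag instead.
pip install auditwheel
auditwheel show dist/torchao_nightly-*-linux_x86_64.whl
auditwheel repair --plat manylinux2014_x86_64 \
    dist/torchao_nightly-*-linux_x86_64.whl -w wheelhouse/
```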
Yeah, the file will be uploaded to https://download.pytorch.org/whl/nightly under a different CUDA arch, for example https://download.pytorch.org/whl/nightly/cu121. Their installation instruction will be:
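(The exact command was cut off above; presumably it follows the usual PyTorch nightly install pattern, so the package name and index URL below are assumptions, not confirmed instructions.)

```sh
# Assumed install command for the nightly wheel hosted on download.pytorch.org,
# modeled on how torch/torchvision nightlies are installed.
pip install --pre torchao --index-url https://download.pytorch.org/whl/nightly/cu121
```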
About the script https://github.com/pytorch/builder/blob/main/release/promote/wheel_to_pypi.sh#L45-L59, I will check with @atalman and follow up on this tomorrow; we have several choices:
@msaroufim Could you help confirm that the nightly binaries are correct? https://pypi.org/project/torchao-nightly/2024.5.17/#files
Awesome!
The main, more general questions I have are:
It seems like this script already also handles official releases by switching this variable to anything except a value containing the word nightly:
ref:
  description: Reference to checkout
  default: nightly
And the second question: I just did a manual validation of the nightly binaries by running
pip install torchao-nightly==2024.5.17
python
import torchao.ops
dir(torch.ops)
And then
pytest test/test_ops.py .......
So it seems like I did successfully compile and use CUDA binaries that were built on a different machine than mine, which means that the CUDA manylinux support is working, and that's awesome! Does this give us enough confidence that the manylinux support will continue working, or should we also have binary validation where we use test machines that are different from the build machines?
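For reference, here is the same manual validation as one copy-pasteable sequence. This is only a sketch: it assumes a CUDA machine with a torchao checkout for the pytest step, and it adds the explicit torch import that dir(torch.ops) needs.

```sh
# Sketch of the manual validation described above.
pip install torchao-nightly==2024.5.17
# `import torch` is needed explicitly, otherwise dir(torch.ops) raises a NameError.
python -c "import torch, torchao.ops; print(dir(torch.ops))"
pytest test/test_ops.py
```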
packaging/pre_build_script.sh
Outdated
pip install setuptools wheel twine auditwheel
# NB: This will make the nightly wheel compatible with the latest stable torch,
Yeah, I think the correct release strategy might be:
- Nightly ao with nightly torch
- Release ao with release torch
So, at least to merge this, let's use the nightly torch versions.
Then the fix is very simple: we just need to remove the step that installs torch here. The build workflow has already installed the correct torch version, i.e. nightly/release torch for nightly/release ao respectively.
Yes, that's the benefit of using the Nova build workflow. Nightly and release builds are both supported by the same workflows. When it comes to release, we could follow the same practice as a PyTorch release, https://github.com/pytorch/pytorch/blob/main/RELEASE.md, of doing a branch cut and building the release candidates.
You are spot on. We do have 2 layers of validation for release. What you see here is the first layer, where we have the post-build script and the smoke-test Python script to perform any validations we want on the same runner. If this fails, the binaries would not even be published. The second validation is indeed performed on a separate set of ephemeral "clean" runners. This is done from https://github.com/pytorch/builder/actions/workflows/validate-binaries.yml, and we run it manually when building new release candidates for PyTorch and several other packages (vision, audio, text). This can come later. We also want to answer the question of whether torchao wants to follow the same release cadence as PyTorch or be released independently. The former means that we can hook torchao into the same validate-binaries workflow, while the latter means that a new validate-binaries workflow is needed in the torchao repo itself.
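As a rough illustration of that first layer, the check run on the build runner can be as small as an import check plus the op tests; packaging/smoke_test.py itself is not shown in this thread, so the following is only a hypothetical sketch, not its actual contents.

```sh
# Hypothetical first-layer check on the build runner; if this exits non-zero,
# the binaries would not be published.
python -c "import torch, torchao; print('torch', torch.__version__, '- torchao imported OK')"
pytest test/test_ops.py
```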
I think it generally looks good aside from a matrix question for the upload pypi bits.
test-infra-repository: pytorch/test-infra
test-infra-ref: main
These are set by default so no need to specify them explicitly
strategy:
  fail-fast: false
  matrix:
    include:
      - repository: pytorch/ao
        pre-script: packaging/pre_build_script.sh
        post-script: packaging/post_build_script.sh
        smoke-test-script: packaging/smoke_test.py
        package-name: torchao
name: ${{ matrix.repository }}
uses: pytorch/test-infra/.github/workflows/build_wheels_linux.yml@main
with:
  repository: ${{ matrix.repository }}
  ref: ""
  test-infra-repository: pytorch/test-infra
  test-infra-ref: main
  build-matrix: ${{ needs.generate-matrix.outputs.matrix }}
  env-var-script: packaging/env_var_script_linux.sh
  pre-script: ${{ matrix.pre-script }}
  post-script: ${{ matrix.post-script }}
  package-name: ${{ matrix.package-name }}
  smoke-test-script: ${{ matrix.smoke-test-script }}
  trigger-event: ${{ github.event_name }}
Suggested change:
name: build-wheel
uses: pytorch/test-infra/.github/workflows/build_wheels_linux.yml@main
with:
  build-matrix: ${{ needs.generate-matrix.outputs.matrix }}
  pre-script: packaging/pre_build_script.sh
  post-script: packaging/post_build_script.sh
  package-name: torchao
  smoke-test-script: packaging/smoke_test.py
  trigger-event: ${{ github.event_name }}
This can be simplified to use the proper defaults; also, a matrix is unnecessary here since you're only doing this for one repository.
repository: pytorch/ao
ref: ""
test-infra-repository: pytorch/test-infra
test-infra-ref: main
These are already set by default so you don't actually need them
.github/workflows/pypi_upload.yml
Outdated
@@ -0,0 +1,78 @@
name: upload-pypi |
This actually might be good to break out to pytorch/test-infra eventually; it seems generally useful.
Yeah, let me do that. Moving this to test-infra would actually solve the other comments you have about the complexity of the input matrix. I only need the matrix to feed it into ./test-infra/.github/actions/setup-binary-upload, the one that sets env.ARTIFACT_NAME. For context, this setup allows the upload to go on even when some build configs fail.
Here it goes: pytorch/test-infra#5217
.github/workflows/pypi_upload.yml
Outdated
# TODO (huydhn): This is a work around to upload only CUDA 12.1 to pypi.
# Other binaries for different CUDA versions will be available on
# download.pytorch.org
if: ${{ inputs.trigger-event == 'schedule' && matrix.desired_cuda == 'cu121' }}
Wait, so does this mean that we spawn a bunch of jobs here to upload to pypi but only actually do the upload for cu121? Why not just make it so it only does the upload for the specific one we want, instead of spawning a bunch of jobs that are basically no-ops?
Well, I tried to skip those no-op uploads, but thought that checking for cu121 was a hack anyway. So, I plan to make a change further upstream in pytorch/test-infra/.github/workflows/generate_binary_build_matrix.yml@main later to limit the CUDA version there instead.
Anyway, I think those redundant jobs will be dealt with after I update the test-infra upload step to handle the pypi upload. A small refactor is on the way.
So I do want us to mimic what torch does. For example, our official releases use CUDA 11.8 and 12.1. Not sure if the policy for CUDA versions is different between nightlies and official releases.
In PyTorch, both nightly and official releases can be installed from https://download.pytorch.org for all CUDA versions. We then upload to pypi only one blessed CUDA version, which is 12.1 atm: https://pypi.org/project/torch/2.3.0/#files. So, what we have here indeed mimics what PyTorch has.
with:
  package-type: wheel
  os: linux
  with-cpu: disable
We would still want to upload CPU-only binaries.
This is to support the torchao use case where it uploads the CUDA 12.1 nightly build to https://pypi.org/project/torchao-nightly

### Testing
Do some testing on torchao pytorch/ao#250:
* CUDA 12.1 is uploaded https://github.com/pytorch/ao/actions/runs/9144442527/job/25142401608?pr=250 to https://pypi.org/project/torchao-nightly/2024.5.19/#files
* Other CUDA versions are skipped https://github.com/pytorch/ao/actions/runs/9144442527/job/25142401944
Amazing work, thank you!
@pytorchbot drci
A follow-up after #5217 and pytorch/ao#250: repos like torch/ao don't have a nightly branch and build nightlies directly out of main on a daily schedule. This was missed in #5222. It's a bit tricky to get this right off the bat. Also need to land D57657633 to allow nightly upload on the AWS side.
* Add nightly build workflow
* Fix wrong env script path
* Install pip deps
* Try to install gxx_linux-64 and gcc_linux-64
* Another attempt
* Set TORCH_CUDA_ARCH_LIST
* Does this work?
* Attempt to upload to pypi
* Pass the arch
* Try pypa/gh-action-pypi-publish@release/v1
* Wrong workflow syntax
* Try auditwheel
* Repair wheel manylinux2014_x86_64
* Debug
* Debug
* Almost there
* Debug
* Almost there
* Pass -y to uninstall
* Final testing
* Upload other python versions
* Remove debug
* Remove build.yml
* Add more validations
* Run all unit tests
* Fix typo
* Run only test_ops
* Passing the secrets
* Move pypi upload to test-infra
* Enable CPU build
* Switch back to main after landing pytorch/test-infra#5217
* Remove unrelated copy/paste comments
Another attempt to properly use the Nova wheel build workflow, because #249 doesn't seem to build the correct wheel.
This requires pytorch/test-infra#5217 to handle the pypi upload.