Wheels debug #4

Merged
merged 65 commits into from
Dec 2, 2024
Commits
84ce4b2
initial commit, corrected docker image, rhe8
nickjbrowning Nov 26, 2024
c991e0a
initial wheel commit
nickjbrowning Nov 27, 2024
ed3e456
x
nickjbrowning Nov 27, 2024
612f6dc
updates
nickjbrowning Nov 27, 2024
48035f4
x
nickjbrowning Nov 27, 2024
3c27d40
debug
nickjbrowning Nov 27, 2024
5f68ee2
update sys path
nickjbrowning Nov 27, 2024
d01a10a
x
nickjbrowning Nov 27, 2024
75cd8ab
x
nickjbrowning Nov 27, 2024
1fad105
debug
nickjbrowning Nov 27, 2024
9385807
debug
nickjbrowning Nov 27, 2024
ba71ae9
x
nickjbrowning Nov 27, 2024
33d06f6
undo
nickjbrowning Nov 27, 2024
062aa90
x
nickjbrowning Nov 27, 2024
0cea7c3
updated linear with architecture wrappers.
nickjbrowning Nov 27, 2024
b983b39
x
nickjbrowning Nov 27, 2024
222fa75
remove unecessary ifdef
nickjbrowning Nov 27, 2024
be53140
sed
nickjbrowning Nov 27, 2024
20dbdc6
status
nickjbrowning Nov 27, 2024
5299685
status
nickjbrowning Nov 27, 2024
5cb23bd
update
nickjbrowning Nov 27, 2024
600337d
update mckaelist.
nickjbrowning Nov 27, 2024
9bc9db9
separated findcuda fixes.
nickjbrowning Nov 27, 2024
58c9393
try again
nickjbrowning Nov 27, 2024
46166f3
update
nickjbrowning Nov 27, 2024
5e47157
update with conditional.
nickjbrowning Nov 27, 2024
9255427
dockerfile update.
nickjbrowning Nov 27, 2024
5130e0b
updates
nickjbrowning Nov 27, 2024
0f19345
some updates.
nickjbrowning Nov 27, 2024
b732fc2
update
nickjbrowning Nov 28, 2024
0c54126
Merge branch 'master' into wheels
nickjbrowning Dec 1, 2024
c458b26
updates.
nickjbrowning Dec 1, 2024
f7e3f40
x
nickjbrowning Dec 1, 2024
8e04dba
test
nickjbrowning Dec 1, 2024
f6611d4
try no isolation.
nickjbrowning Dec 1, 2024
230e82c
added a print.
nickjbrowning Dec 1, 2024
5091ce2
added torch find
nickjbrowning Dec 1, 2024
1fe3be0
moved find_package
nickjbrowning Dec 1, 2024
435b118
x
nickjbrowning Dec 1, 2024
906eb13
cmake
nickjbrowning Dec 1, 2024
522163c
updates
nickjbrowning Dec 1, 2024
bc704e7
fixed upload.
nickjbrowning Dec 1, 2024
9108875
multiple wheel configs...
nickjbrowning Dec 1, 2024
17102bf
typo.
nickjbrowning Dec 1, 2024
bad52b1
offset.
nickjbrowning Dec 1, 2024
ed024c6
fixes.
nickjbrowning Dec 1, 2024
08c64ed
updates.
nickjbrowning Dec 1, 2024
32ca0d6
try again.
nickjbrowning Dec 1, 2024
acbfc72
try again.
nickjbrowning Dec 1, 2024
39e5a87
fix.
nickjbrowning Dec 1, 2024
1200d89
oops...
nickjbrowning Dec 1, 2024
1ed20eb
possibly remove unuseful python version.
nickjbrowning Dec 1, 2024
af6998c
echo pythonversion.
nickjbrowning Dec 1, 2024
16bd2f1
x
nickjbrowning Dec 1, 2024
0a6ce00
variable expansion
nickjbrowning Dec 1, 2024
1301b50
fixed more substitutions.
nickjbrowning Dec 1, 2024
ae50c1d
typo.
nickjbrowning Dec 1, 2024
d49d590
not sure if I need this.
nickjbrowning Dec 1, 2024
ffffddc
debug messages.
nickjbrowning Dec 1, 2024
2b303de
potential fix.
nickjbrowning Dec 2, 2024
32de4b5
removed unecessary env.
nickjbrowning Dec 2, 2024
20e2eed
lets try aarch64...
nickjbrowning Dec 2, 2024
6e4a034
update.
nickjbrowning Dec 2, 2024
c97d64f
updates.
nickjbrowning Dec 2, 2024
d85dc5d
x
nickjbrowning Dec 2, 2024
67 changes: 0 additions & 67 deletions .github/workflows/main.disabled

This file was deleted.

78 changes: 78 additions & 0 deletions .github/workflows/main.yaml
@@ -0,0 +1,78 @@
name: Build PyTorch Wheels

on:
  push:
    branches:
      - main
  pull_request:

jobs:
  build-wheels:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        cibw-arch: ["x86_64"]
        python-version: ["3.10", "3.11", "3.12"]
        pytorch-version: ["2.4.0", "2.5.0"]
        cuda-version: ["12.1", "12.4"]
    env:
      CIBW_SKIP: cp36-* cp37-* cp38-* cp39-*

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Github Actions Envs Setup
        run: |
          CUVERSION="${{ matrix.cuda-version }}"
          PYTHONVERSION="${{ matrix.python-version }}"

          CU_VERSION_NO_DOT=${CUVERSION//./}
          echo CU_VERSION_NO_DOT=${CU_VERSION_NO_DOT} >> $GITHUB_ENV

          CU_VERSION_DASH=${CUVERSION//./-}
          echo CU_VERSION_DASH=${CU_VERSION_DASH} >> $GITHUB_ENV

          PYTHON_VER_NO_DOT=${PYTHONVERSION//./}
          echo PYTHON_VER_NO_DOT=${PYTHON_VER_NO_DOT} >> $GITHUB_ENV

      # Build the custom Manylinux Docker image
      - name: Build Manylinux Docker Image
        run: |
          docker build --no-cache \
            -t manylinux2014_"${{ matrix.cibw-arch }}" \
            --build-arg PYTHON_VER="${{ matrix.python-version }}" \
            --build-arg PYTHON_VER_NO_DOT="${{ env.PYTHON_VER_NO_DOT }}" \
            --build-arg CUDA_VER="${{ matrix.cuda-version }}" \
            --build-arg CUDA_VER_NO_DOT="${{ env.CU_VERSION_NO_DOT }}" \
            --build-arg CUDA_VER_DASH="${{ env.CU_VERSION_DASH }}" \
            --build-arg PYTORCH_VERSION="${{ matrix.pytorch-version }}" \
            scripts/manylinux2014_"${{ matrix.cibw-arch }}"

      # Set up Python environment
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "${{ matrix.python-version }}"

      - name: Build wheels
        uses: pypa/cibuildwheel@v2.22.0
        env:
          CUDA_HOME: /usr/local/cuda
          # CUVERSION is a shell-local variable of the setup step and is never written
          # to GITHUB_ENV, so the exported dot-free tag (e.g. 124) is used here instead.
          PIP_EXTRA_INDEX_URL: "https://download.pytorch.org/whl/cu${{ env.CU_VERSION_NO_DOT }}"
          CIBW_BUILD_VERBOSITY: 3
          CIBW_BUILD: "cp${{ env.PYTHON_VER_NO_DOT }}-*"
          CIBW_BUILD_FRONTEND: "pip; args: --no-build-isolation"
          CIBW_SKIP: "*-musllinux* *-win32 *-manylinux_i686"
          CIBW_ARCHS: "${{ matrix.cibw-arch }}"
          CIBW_MANYLINUX_X86_64_IMAGE: "manylinux2014_${{ matrix.cibw-arch }}"
          CIBW_ENVIRONMENT: >
            CUDA_HOME=/usr/local/cuda
            PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cu${{ env.CU_VERSION_NO_DOT }}"
          CIBW_REPAIR_WHEEL_COMMAND_LINUX: |
            auditwheel repair --exclude libcuda.so --exclude libcuda.so.1 --exclude libc10.so --exclude libtorch.so --exclude libtorch_cpu.so --exclude libtorch_cuda.so --exclude libc10_cuda.so --exclude libcudart.so --exclude libnvToolsExt.so --exclude libnvrtc.so --exclude libnvrtc.so.12 -w {dest_dir} {wheel}

      - uses: actions/upload-artifact@v4
        with:
          name: "cuda_mace-py-${{ env.PYTHON_VER_NO_DOT }}-torch-${{matrix.pytorch-version}}+cu${{ env.CU_VERSION_NO_DOT }}-${{ matrix.cibw-arch }}"
          path: ./wheelhouse/*.whl
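
As a rough illustration of what the matrix and the env-setup step in this workflow produce, the following Python sketch expands the four matrix axes and derives the same dot-free and dashed version strings. The function name `derive_names` and the dictionary layout are made up for this example, and the index URL assumes the dot-free CUDA tag (for example cu124), which is the form the Dockerfile's pip install uses.

```python
from itertools import product

# Matrix values copied from the workflow.
ARCHS = ["x86_64"]
PYTHON_VERSIONS = ["3.10", "3.11", "3.12"]
PYTORCH_VERSIONS = ["2.4.0", "2.5.0"]
CUDA_VERSIONS = ["12.1", "12.4"]


def derive_names(arch, python_version, pytorch_version, cuda_version):
    """Mirror the shell substitutions ${VAR//./} and ${VAR//./-} (hypothetical helper)."""
    cu_no_dot = cuda_version.replace(".", "")   # "12.4" -> "124"
    cu_dash = cuda_version.replace(".", "-")    # "12.4" -> "12-4"
    py_no_dot = python_version.replace(".", "")  # "3.11" -> "311"
    return {
        "CU_VERSION_NO_DOT": cu_no_dot,
        "CU_VERSION_DASH": cu_dash,
        "PYTHON_VER_NO_DOT": py_no_dot,
        "CIBW_BUILD": f"cp{py_no_dot}-*",
        # Assumption: the index is keyed by the dot-free CUDA tag.
        "PIP_EXTRA_INDEX_URL": f"https://download.pytorch.org/whl/cu{cu_no_dot}",
        "artifact": f"cuda_mace-py-{py_no_dot}-torch-{pytorch_version}+cu{cu_no_dot}-{arch}",
    }


# One entry per job that GitHub Actions would schedule for this matrix.
jobs = [
    derive_names(*combo)
    for combo in product(ARCHS, PYTHON_VERSIONS, PYTORCH_VERSIONS, CUDA_VERSIONS)
]

if __name__ == "__main__":
    print(len(jobs))
    for job in jobs:
        print(job["artifact"])
```

With the matrix as written this gives 12 jobs (1 arch x 3 Python x 2 PyTorch x 2 CUDA), each uploading one uniquely named artifact.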
3 changes: 2 additions & 1 deletion .gitignore
@@ -1,8 +1,9 @@
build/*
local/*
tests/*
dist/*
*.pyc
*.pt
.vscode
cuda_mace.egg-info/
*.model
*.model
6 changes: 5 additions & 1 deletion cuda_mace/CMakeLists.txt
@@ -9,6 +9,8 @@ set(BIN_INSTALL_DIR "bin" CACHE PATH "Path relative to CMAKE_INSTALL_PREFIX wher
set(INCLUDE_INSTALL_DIR "include" CACHE PATH "Path relative to CMAKE_INSTALL_PREFIX where to install headers")

find_package(Python COMPONENTS Interpreter REQUIRED)
+message(STATUS "Python Version: ${Python_VERSION}")
+message (STATUS "Python Path: ${Python_EXECUTABLE}")

include(CheckLanguage)
check_language(CUDA)
@@ -33,7 +35,9 @@ endif()
string(STRIP ${TORCH_CMAKE_PATH_OUTPUT} TORCH_CMAKE_PATH_OUTPUT)
set(CMAKE_PREFIX_PATH "${CMAKE_PREFIX_PATH};${TORCH_CMAKE_PATH_OUTPUT}")

-find_package(Torch 1.13 REQUIRED)
+message(STATUS "TORCH_CMAKE_PATH_OUTPUT: ${TORCH_CMAKE_PATH_OUTPUT}")
+
+find_package(Torch 2.0 REQUIRED)

add_library(cuda_mace SHARED
"jit_wrappers/src/cubic_spline_wrapper.cpp"
@@ -100,7 +100,7 @@ jit_forward_message_passing(torch::Tensor X, torch::Tensor Y, torch::Tensor radi
dim3 bdim(NWARPS_PER_BLOCK * WARP_SIZE, 1, 1);

AT_DISPATCH_FLOATING_TYPES(
-X.type(), "forward_gpu",
+X.scalar_type(), "forward_gpu",
([&] {
unsigned int space = 0;
void *sptr;
@@ -187,7 +187,7 @@ jit_backward_message_passing(torch::Tensor X, torch::Tensor Y, torch::Tensor rad
Y, torch::TensorOptions().dtype(Y.dtype()).device(Y.device()));

AT_DISPATCH_FLOATING_TYPES(
-X.type(), "backward_gpu", ([&] {
+X.scalar_type(), "backward_gpu", ([&] {
dim3 bdim(NWARPS_PER_BLOCK * WARP_SIZE, 1, 1);
dim3 gdim(nnodes, 1);

4 changes: 2 additions & 2 deletions cuda_mace/jit_wrappers/src/symmetric_contraction_wrapper.cpp
@@ -66,7 +66,7 @@ std::vector<torch::Tensor> jit_symmetric_contraction_forward(
dim3 bdim(WARP_SIZE, NWARPS_PER_BLOCK, 1);

AT_DISPATCH_FLOATING_TYPES(
-X.type(), "symmetric_contraction_forwards", ([&] {
+X.scalar_type(), "symmetric_contraction_forwards", ([&] {
unsigned int shared_size = 0;

void *sptr = nullptr;
@@ -173,7 +173,7 @@ torch::Tensor jit_symmetric_contraction_backward(torch::Tensor gradX,
dim3 bdim(WARP_SIZE, 4, 1);

AT_DISPATCH_FLOATING_TYPES(
-gradX.type(), "symm_contraction_backward", ([&] {
+gradX.scalar_type(), "symm_contraction_backward", ([&] {
unsigned int space =
WARP_SIZE * 16 * sizeof(scalar_t); // buffer_grad storage

71 changes: 49 additions & 22 deletions scripts/manylinux2014_x86_64/Dockerfile
@@ -1,36 +1,63 @@
# Use manylinux docker image as a base
FROM quay.io/pypa/manylinux2014_x86_64

# ------------
# Install cuda
# ------------
# Set environment variables for Python and CUDA versions
ARG PYTHON_VER="3.11"
ARG PYTHON_VER_NO_DOT="311"
ARG CUDA_VER="12.4"
ARG CUDA_VER_NO_DOT="124"
ARG CUDA_VER_DASH="12-4"
ARG PYTORCH_VERSION="2.4.1"

ARG VER="12-4"
ARG ARCH="x86_64"
RUN echo "PYTHON_VERSION: ${PYTHON_VER} NO-DOT: ${PYTHON_VER_NO_DOT}"
# Install system dependencies and CUDA
RUN yum install -y yum-utils gcc gcc-c++ make zlib-devel bzip2-devel libffi-devel \
&& yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo \
&& yum install -y \
cuda-toolkit-${CUDA_VER_DASH} \
&& yum clean all \
&& rm -rf /var/cache/yum/* \
&& echo "/usr/local/cuda/lib64" >> /etc/ld.so.conf.d/999_nvidia_cuda.conf

RUN yum install -y yum-utils
RUN yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
RUN yum -y install cuda-compiler-${VER}.${ARCH} \
cuda-libraries-${VER}.${ARCH} \
cuda-libraries-devel-${VER}.${ARCH}
RUN yum clean all
RUN rm -rf /var/cache/yum/*
RUN echo "/usr/local/cuda/lib64" >> /etc/ld.so.conf.d/999_nvidia_cuda.conf
# Remove all other Python versions
#ENV PATH="/opt/python/cp311-cp311/bin:${PATH}"
#RUN ln -sf /opt/python/cp311-cp311/bin/python3.11 /opt/python/cp311-cp311/bin/python

# -------------------------
# Set environment variables
# -------------------------
# Set Python version environment variables dynamically
ENV PATH="/opt/python/cp${PYTHON_VER_NO_DOT}-cp${PYTHON_VER_NO_DOT}/bin:${PATH}"
RUN ln -sf /opt/python/cp${PYTHON_VER_NO_DOT}-cp${PYTHON_VER_NO_DOT}/bin/python${PYTHON_VER} /opt/python/cp${PYTHON_VER_NO_DOT}-cp${PYTHON_VER_NO_DOT}/bin/python

RUN python -m ensurepip --upgrade

# Set environment variables for CUDA
ENV PATH="/usr/local/cuda/bin:${PATH}"
ENV LD_LIBRARY_PATH="/usr/local/cuda/lib64:${LD_LIBRARY_PATH}"
ENV CUDA_HOME=/usr/local/cuda
ENV CUDA_ROOT=/usr/local/cuda
ENV CUDA_PATH=/usr/local/cuda
ENV CUDADIR=/usr/local/cuda

RUN echo "CUDA_HOME: ${CUDA_HOME}"
# --------
# Commands
# --------
RUN yum install -y git-all

# Verify the CUDA installation
RUN echo "CUDA_HOME: ${CUDA_HOME}" && \
nvcc --version

#RUN pip install torch==2.4.1+cu124 --extra-index-url https://download.pytorch.org/whl/cu124
#RUN pip install numpy cmake
RUN pip install torch==${PYTORCH_VERSION}+cu${CUDA_VER_NO_DOT} --extra-index-url https://download.pytorch.org/whl/cu${CUDA_VER_NO_DOT}
RUN pip install numpy cmake

RUN mkdir /workspace

# Add the remove_unused_python.sh script to the container
COPY remove_unused_python.sh /scripts/remove_unused_python.sh
# Make the script executable
RUN chmod +x /scripts/remove_unused_python.sh
# Run the script to remove all Python versions except the one selected by PYTHON_VER
RUN /scripts/remove_unused_python.sh python${PYTHON_VER}

# Clean up to reduce image size
#RUN yum clean all && rm -rf /var/cache/yum/*

CMD ["/bin/bash"]
# Default command (bash)
CMD ["/bin/bash"]
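
The build arguments in this Dockerfile feed a few derived strings: the interpreter directory inside the manylinux image, the pinned torch requirement, and the matching wheel index. A minimal Python sketch of that substitution follows; the helper names are hypothetical and only the string layout is taken from the Dockerfile.

```python
def interpreter_bin_dir(python_ver_no_dot: str) -> str:
    """Directory of the CPython build selected by PYTHON_VER_NO_DOT, e.g. '311'."""
    tag = f"cp{python_ver_no_dot}"
    return f"/opt/python/{tag}-{tag}/bin"


def torch_pip_args(pytorch_version: str, cuda_ver_no_dot: str) -> list[str]:
    """Arguments equivalent to the `pip install torch==...` line in the Dockerfile."""
    return [
        "install",
        f"torch=={pytorch_version}+cu{cuda_ver_no_dot}",
        "--extra-index-url",
        f"https://download.pytorch.org/whl/cu{cuda_ver_no_dot}",
    ]


if __name__ == "__main__":
    # Defaults taken from the ARG lines: Python 3.11, CUDA 12.4, PyTorch 2.4.1.
    print(interpreter_bin_dir("311"))
    print(torch_pip_args("2.4.1", "124"))
```

Because every value is derived from the same two tags, a mismatch between the workflow matrix and the Dockerfile defaults only shows up when the build arguments are not passed through.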
9 changes: 9 additions & 0 deletions scripts/manylinux2014_x86_64/remove_unused_python.sh
@@ -0,0 +1,9 @@
#!/bin/bash

# FindPython is weird and doesn't respect $PATH or even CMAKE $Python_Exec variables.
for python in /usr/local/bin/python*; do
    if [[ "$python" != *"$1"* ]]; then
        echo "Removing $python"
        rm -f "$python"
    fi
done
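
The filter in this script is a plain substring test: anything under /usr/local/bin/python* whose path does not contain the first argument is deleted. The sketch below restates that selection as a pure Python function so the behaviour can be checked without touching a filesystem. The function name and the candidate paths are illustrative assumptions, not the actual contents of the manylinux image.

```python
def select_for_removal(paths, keep):
    """Return the paths that do NOT contain `keep` as a substring.

    Mirrors the bash test [[ "$python" != *"$1"* ]].
    """
    return [path for path in paths if keep not in path]


# Hypothetical directory listing, for illustration only.
candidates = [
    "/usr/local/bin/python3.10",
    "/usr/local/bin/python3.11",
    "/usr/local/bin/python3.12",
    "/usr/local/bin/python3",
]

if __name__ == "__main__":
    for path in select_for_removal(candidates, "python3.11"):
        print(f"Removing {path}")
```

Two consequences of the substring match are worth keeping in mind: an empty argument matches every path, so nothing is removed, and a short argument such as "python3.1" would keep 3.10, 3.11 and 3.12 alike.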
11 changes: 9 additions & 2 deletions setup.py
@@ -17,7 +17,7 @@

__author__ = "Nicholas J. Browning"
__credits__ = "Nicholas J. Browning (2023), https://github.com/nickjbrowning"
-__license__ = "MIT"
+__license__ = "Academic Software License"
__version__ = "0.1"
__maintainer__ = "Nicholas J. Browning"
__email__ = "nickjbrowning@gmail.com"
@@ -67,7 +67,7 @@ def run(self):
os.makedirs(build_dir, exist_ok=True)

cmake_options = [
-f"-DCMAKE_INSTALL_PREFIX={install_dir}"
+f"-DCMAKE_INSTALL_PREFIX={install_dir}",
+#f"-DPYTHON_EXECUTABLE={sys.executable}",
]

@@ -82,6 +82,13 @@ def run(self):
cmake_options.append(f"-DCMAKE_C_FLAGS={ARCHFLAGS}")
cmake_options.append(f"-DCMAKE_CXX_FLAGS={ARCHFLAGS}")

+subprocess.run(
+["cmake", "--version"],
+cwd=build_dir,
+check=True,
+)
+
+print (["cmake", source_dir, *cmake_options])
subprocess.run(
["cmake", source_dir, *cmake_options],
cwd=build_dir,
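
The comma after the CMAKE_INSTALL_PREFIX entry matters as soon as a second option follows it, because Python joins adjacent string literals into a single string. A small sketch of the pitfall, where `install_dir` and the second option (-DCMAKE_BUILD_TYPE=Release) are illustrative assumptions rather than values from this PR:

```python
install_dir = "/tmp/install"  # hypothetical path, for illustration only

# Without a comma, the two adjacent literals are concatenated into ONE list element,
# so cmake would receive a single malformed argument.
broken = [
    f"-DCMAKE_INSTALL_PREFIX={install_dir}"
    "-DCMAKE_BUILD_TYPE=Release",
]

# With the comma, each option is its own element, as subprocess.run expects.
fixed = [
    f"-DCMAKE_INSTALL_PREFIX={install_dir}",
    "-DCMAKE_BUILD_TYPE=Release",
]

if __name__ == "__main__":
    print(len(broken), broken)
    print(len(fixed), fixed)
```

Printing the final command list before invoking cmake makes this kind of mistake visible in the build log.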