Rebuild for windows_cuda #19

Conversation

regro-cf-autotick-bot
Contributor

This PR has been triggered in an effort to update windows_cuda.

Notes and instructions for merging this PR:

  1. Please merge the PR only after the tests have passed.
  2. Feel free to push to the bot's branch to update this PR if needed.

Please note that if you close this PR we presume that the feedstock has been rebuilt, so if you are going to perform the rebuild yourself, don't close this PR until your rebuild has been merged.

This package has the following downstream children:

And potentially more.

If this PR was opened in error or needs to be updated please add the bot-rerun label to this PR. The bot will close this PR and schedule another one. If you do not have permissions to add this label, you can use the phrase @conda-forge-admin, please rerun bot in a PR comment to have the conda-forge-admin add it for you.

This PR was created by the regro-cf-autotick-bot.
The regro-cf-autotick-bot is a service to automatically track the dependency graph, migrate packages, and propose package version updates for conda-forge. If you would like a local version of this bot, you might consider using rever. Rever is a tool for automating software releases and forms the backbone of the bot's conda-forge PRing capability. Rever is both conda (conda install -c conda-forge rever) and pip (pip install re-ver) installable.
Finally, feel free to drop us a line if there are any issues!
This PR was generated by https://github.com/regro/autotick-bot/actions/runs/379456304, please use this URL for debugging

@conda-forge-linter

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@h-vetinari
Member

Requires windows builds to make any sense, therefore blocked on #17.

@h-vetinari h-vetinari changed the title Rebuild for windows_cuda [Pending #17] Rebuild for windows_cuda Dec 9, 2020
@h-vetinari h-vetinari force-pushed the rebuild-windows_cuda-0-1_h69cec1 branch from 353c8b7 to 7ce24cc Compare December 9, 2020 21:35
@h-vetinari
Member

@jaimergp I saw you added the windows packaging for nvcc: conda-forge/nvcc-feedstock#49

Not sure if I need to change something else, but it's not working out of the box here... Even though I see

"Exporting and adding $CUDA_PATH ('C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0') to $PATH"

in the conda-forge build setup, this variable is then apparently empty by the time we get around to activate.bat:

(D:\bld\faiss-split_1607560645796\_h_env) (base) D:\bld\faiss-split_1607560645796\work>call "C:\Miniconda\Scripts\..\condabin\conda.bat" activate --stack "D:\bld\faiss-split_1607560645796\_build_env" 

(base) D:\bld\faiss-split_1607560645796\work>if defined CUDA_HOME (set "CUDA_HOME_CONDA_NVCC_BACKUP=" ) 

(base) D:\bld\faiss-split_1607560645796\work>if defined CUDA_PATH (
set "CUDA_PATH_CONDA_NVCC_BACKUP="  
 if not defined CUDA_HOME (set "CUDA_HOME=" ) 
) 

(base) D:\bld\faiss-split_1607560645796\work>if defined CFLAGS (set "CFLAGS_CONDA_NVCC_BACKUP=" ) 

(base) D:\bld\faiss-split_1607560645796\work>if defined CPPFLAGS (set "CPPFLAGS_CONDA_NVCC_BACKUP=" ) 

(base) D:\bld\faiss-split_1607560645796\work>if defined CXXFLAGS (set "CXXFLAGS_CONDA_NVCC_BACKUP=" ) 

(base) D:\bld\faiss-split_1607560645796\work>if not defined CUDA_PATH (
for /F "usebackq tokens=*" %a in (`where nvcc.exe`) do set "CUDA_NVCC_EXECUTABLE=%a"   || goto :error  
 if "" == "" (
echo "Cannot determine CUDA_PATH: nvcc.exe not in PATH"  
 exit /b 1 
)  else (for /F "usebackq tokens=*" %a in (`python -c "from pathlib import Path; print(Path('').parents[1])"`) do set "CUDA_PATH=%a"   || goto :error ) 
) 
INFO: Could not find files for the given pattern(s).
"Cannot determine CUDA_PATH: nvcc.exe not in PATH"
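For readers following the log above: the activation script appears to locate nvcc.exe with `where` and then take the directory two levels up as CUDA_PATH. Below is a minimal Python sketch of that derivation, assuming nvcc.exe sits in a `bin` folder directly under the toolkit root. The function name `cuda_path_from_nvcc` is hypothetical and not part of nvcc-feedstock.

```python
from pathlib import PureWindowsPath


def cuda_path_from_nvcc(nvcc_executable: str) -> str:
    """Derive the toolkit root from the full path to nvcc.exe.

    Mirrors the guard seen in the log: an empty lookup result means
    nvcc.exe was not found on PATH, so there is nothing to derive from.
    """
    if not nvcc_executable:
        raise RuntimeError("Cannot determine CUDA_PATH: nvcc.exe not in PATH")
    # parents[0] is the bin directory, parents[1] is the toolkit root.
    return str(PureWindowsPath(nvcc_executable).parents[1])


if __name__ == "__main__":
    print(cuda_path_from_nvcc(
        r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\nvcc.exe"
    ))
```

With an empty lookup result, as in this build, the guard fires before any path arithmetic happens, which matches the error message at the end of the log.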

@jaimergp
Member

@h-vetinari Yes, I saw this happen in another recipe too. Tracking here: conda-forge/nvcc-feedstock#53. Please add a comment there pointing to this PR so I have more data points. I should have some free time in the coming weeks and will look into this.

I have tried a different variable export mechanism in this PR conda-forge/conda-forge-ci-setup-feedstock#131, using the activation scripts of conda-forge-ci-setup, but it didn't make a difference for prismatic.

I am not sure why this doesn't work here either, but one thing to note is that both this repo and prismatic are multi-output recipes. It does work in single-output recipes like OpenMM, though 🤷 I wonder if this is an issue at the conda-build level, where some environment variables do not survive long enough to reach the activation script.

The workaround for now is, unfortunately, setting %CUDA_PATH% manually to:

set "CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v%cuda_compiler_version%"
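As an illustration of the workaround above, here is a small Python sketch that builds the same default install location from a CUDA version string. It assumes the toolkit was installed to NVIDIA's default directory, as in the path quoted in this thread. The helper name `default_cuda_path` is made up for this example.

```python
from pathlib import PureWindowsPath

# Default install root, taken from the path quoted in this thread.
DEFAULT_CUDA_ROOT = PureWindowsPath(
    r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA"
)


def default_cuda_path(cuda_compiler_version: str) -> str:
    """Return the default toolkit directory for a version such as '10.0'."""
    return str(DEFAULT_CUDA_ROOT / f"v{cuda_compiler_version}")


if __name__ == "__main__":
    print(default_cuda_path("10.0"))
```

This only reconstructs the string. It does not check that the directory exists on the build machine.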

@h-vetinari
Member

> The workaround for now is, unfortunately, setting %CUDA_PATH% manually to:
>
> set "CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v%cuda_compiler_version%"

@jaimergp Thanks for the info & workaround, but I have no idea how to provide it. The build scripts run too late (at least for the nvcc activation script), and setting it in azure-pipelines-win.yml didn't work either...

@jaimergp
Member

Since nvcc activation is failing in this recipe, we might need to bypass it for now.

What's the manual way of telling CMake where to find CUDA in this project? I normally need to set up CUDA_TOOLKIT_ROOT_DIR (with forward slashes): https://github.com/conda-forge/openmm-feedstock/blob/master/recipe/bld.bat#L10

You'll also need these lines: https://github.com/conda-forge/nvcc-feedstock/blob/master/recipe/windows/activate.bat#L59-L61

and make nvcc use -ccbin: https://github.com/conda-forge/nvcc-feedstock/blob/master/recipe/windows/nvcc_windows.bat

These are a lot of "workarounds" needed so you can also choose to wait a bit until I debug the multi-output problem with the activation script (hopefully before the year ends!).
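On the forward-slash requirement for CUDA_TOOLKIT_ROOT_DIR mentioned above, a minimal Python sketch of the conversion follows. The helper name `to_cmake_path` is hypothetical. The only assumption is that CMake should receive the same Windows path with `/` separators.

```python
from pathlib import PureWindowsPath


def to_cmake_path(windows_path: str) -> str:
    """Rewrite a Windows path with forward slashes, leaving spaces intact."""
    return PureWindowsPath(windows_path).as_posix()


if __name__ == "__main__":
    print(to_cmake_path(
        r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0"
    ))
```

The result still contains spaces, so it would need to be quoted when passed on a command line.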

@h-vetinari
Member

> These are a lot of "workarounds" needed so you can also choose to wait a bit until I debug the multi-output problem with the activation script (hopefully before the year ends!).

Thanks for the quick reply! I don't mind adding workarounds (if we're talking about a handful of lines) and removing them once fixed. I actually just realized that I had not successfully committed the changes in azure-pipelines-win.yml (due to doing git add . in /recipe 🤦‍♂️), but anyway, that would not have survived a re-render, so I prefer your approach.

I'm hoping that CMake is smart enough to add -ccbin by itself; at least it is on Linux.

@jaimergp
Member

I have included a fallback mechanism that will try to look for CUDA in the default location, in conda-forge/nvcc-feedstock#52

In the meantime, let's try adding these lines first thing in your bld.bat:

set "CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v%cuda_compiler_version%"
set "ORIG_PKG_VERSION=%PKG_VERSION%"
set "PKG_VERSION=%cuda_compiler_version%"
call %BUILD_PREFIX%\etc\conda\activate.d\nvcc_activate.bat
set "PKG_VERSION=%ORIG_PKG_VERSION%"
set "ORIG_PKG_VERSION="

Maybe it's PREFIX instead of BUILD_PREFIX but you'll see :)
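The batch lines above follow a save, override, restore pattern around the call to the activation script, presumably because that script reads PKG_VERSION. The same pattern can be sketched in Python as a context manager. This is an illustration only, and the name `overridden_env` is hypothetical.

```python
import os
from contextlib import contextmanager


@contextmanager
def overridden_env(name: str, value: str):
    """Temporarily set an environment variable, then restore the old state."""
    original = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        if original is None:
            # The variable did not exist before, so remove it again.
            os.environ.pop(name, None)
        else:
            os.environ[name] = original


if __name__ == "__main__":
    os.environ["PKG_VERSION"] = "1.6.5"  # placeholder package version
    with overridden_env("PKG_VERSION", "10.0"):
        # The activation script would be invoked at this point.
        print(os.environ["PKG_VERSION"])
    print(os.environ["PKG_VERSION"])
```

Restoring in a `finally` block means the original value comes back even if the wrapped step fails, which the plain batch version does not guarantee.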

@h-vetinari h-vetinari force-pushed the rebuild-windows_cuda-0-1_h69cec1 branch from fb391a4 to 12281d2 Compare December 10, 2020 14:47
@jaimergp
Member

A new nvcc wrapper was released so you might be lucky now. Otherwise... well, I'll try to help you next week by debugging on my Windows machine. Sorry for all the trouble!

@h-vetinari
Member

> A new nvcc wrapper was released so you might be lucky now. Otherwise... well, I'll try to help you next week by debugging on my Windows machine. Sorry for all the trouble!

To the contrary, thank you for your help on this! :)

@h-vetinari
Member

@jaimergp
Looks like we made a step forward, but some quotes are missing around the include path, so it gets split at the spaces into pieces the compiler doesn't recognize:

NVIDIA
      cl /c /IC:\Program /Zi /W1 /WX- /diagnostics:classic /Od /Ob0 /D WIN32 /D _WINDOWS /D "CMAKE_INTDIR=\"Debug\"" /D _MBCS /Gm- /EHsc /RTC1 /MDd /GS /fp:precise /Qspectre /Zc:wchar_t /Zc:forScope /Zc:inline /GR /Fo"cmTC_b6410.dir\Debug\\" /Fd"cmTC_b6410.dir\Debug\vc141.pdb" /Gd /TP /errorReport:queue  Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\include "D:\bld\faiss-split_1607680204242\work\_build\CMakeFiles\CMakeTmp\testCXXCompiler.cxx"
    c1xx : fatal error C1083: Cannot open source file: 'Files\NVIDIA': No such file or directory [D:\bld\faiss-split_1607680204242\work\_build\CMakeFiles\CMakeTmp\cmTC_b6410.vcxproj]
      GPU
      
    c1xx : fatal error C1083: Cannot open source file: 'GPU': No such file or directory [D:\bld\faiss-split_1607680204242\work\_build\CMakeFiles\CMakeTmp\cmTC_b6410.vcxproj]
      Computing
    c1xx : fatal error C1083: Cannot open source file: 'Computing': No such file or directory [D:\bld\faiss-split_1607680204242\work\_build\CMakeFiles\CMakeTmp\cmTC_b6410.vcxproj]
      include
    c1xx : fatal error C1083: Cannot open source file: 'Toolkit\CUDA\v10.0\include': No such file or directory [D:\bld\faiss-split_1607680204242\work\_build\CMakeFiles\CMakeTmp\cmTC_b6410.vcxproj]
      testCXXCompiler.cxx
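The errors above are what whitespace splitting does to an unquoted `/I` flag. The following Python sketch reproduces the splitting and shows one possible way to quote the flag. It illustrates the failure mode only and is not the fix that was applied in nvcc-feedstock. The helper name `include_flag` is hypothetical.

```python
INCLUDE_DIR = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\include"


def include_flag(path: str) -> str:
    """Build an /I flag, wrapping it in double quotes if the path has spaces."""
    flag = f"/I{path}"
    return f'"{flag}"' if " " in path else flag


if __name__ == "__main__":
    # Unquoted: splitting on whitespace yields the fragments that cl.exe
    # then tries to open as source files.
    for token in f"/I{INCLUDE_DIR}".split():
        print(token)
    # Quoted: the flag is meant to reach the compiler as one argument.
    print(include_flag(INCLUDE_DIR))
```

The fragments printed by the first loop line up with the names in the C1083 errors (`Files\NVIDIA`, `GPU`, `Computing`, `Toolkit\CUDA\v10.0\include`).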

@jaimergp
Member

Maybe we need some kind of quotes here, but this should have been caught by the tests 🤔

I'll need to open a new PR for this. If you want you can try to define CFLAGS and friends manually, using the example output in cuda_compiler_versionNone, and see where that gets you.

@h-vetinari h-vetinari force-pushed the rebuild-windows_cuda-0-1_h69cec1 branch from 30e0c5f to 28a77f4 Compare December 12, 2020 08:29
@h-vetinari h-vetinari changed the title [Pending #17] Rebuild for windows_cuda Rebuild for windows_cuda Dec 12, 2020
@h-vetinari h-vetinari force-pushed the rebuild-windows_cuda-0-1_h69cec1 branch from 28a77f4 to ad7b695 Compare December 12, 2020 15:16
@h-vetinari h-vetinari closed this Dec 14, 2020
@h-vetinari h-vetinari reopened this Dec 14, 2020
@h-vetinari
Member

Seems I started this too closely on the heels of conda-forge/nvcc-feedstock#57 being merged: the builds here still picked up the old nvcc build (9 instead of 10). Let's try again...

@h-vetinari h-vetinari closed this Dec 14, 2020
@h-vetinari h-vetinari reopened this Dec 14, 2020
@h-vetinari h-vetinari force-pushed the rebuild-windows_cuda-0-1_h69cec1 branch from 7bfb6d4 to d7d1e47 Compare December 15, 2020 08:05
@h-vetinari
Member

It appears that this traces back to cudaGetDeviceCount from cuda_profiler_api.h, which seems to segfault on windows (but not linux) when there are no actual GPUs. @kkraus14, would you be able to comment?

@jaimergp
Member

Take into account that the CUDA installation on Windows lacks the drivers, since the Nvidia installer refuses to install them due to the missing GPU.

@jaimergp
Member

Check how we handle this at OpenMM. We can create packages with the full test suite (which does not run on the CI) and then run them locally.

@h-vetinari h-vetinari force-pushed the rebuild-windows_cuda-0-1_h69cec1 branch 4 times, most recently from c5bb9cc to 08e3732 Compare February 17, 2021 22:39
@h-vetinari
Member

@jaimergp
This just built successfully for the first time. 🥳

Thanks so much for your help debugging this 🙏 🙃

@h-vetinari h-vetinari added the automerge Merge the PR when CI passes label Feb 17, 2021
@h-vetinari h-vetinari force-pushed the rebuild-windows_cuda-0-1_h69cec1 branch from 08e3732 to 28466cc Compare February 17, 2021 23:55
@h-vetinari h-vetinari force-pushed the rebuild-windows_cuda-0-1_h69cec1 branch from 28466cc to e2f408d Compare February 18, 2021 00:04
@github-actions github-actions bot merged commit c56a351 into conda-forge:master Feb 18, 2021
@github-actions
Contributor

Hi! This is the friendly conda-forge automerge bot!

I considered the following status checks when analyzing this PR:

  • linter: passed
  • azure: passed

Thus the PR was passing and merged! Have a great day!

@regro-cf-autotick-bot regro-cf-autotick-bot deleted the rebuild-windows_cuda-0-1_h69cec1 branch February 18, 2021 05:35
@jaimergp
Member

Wow, this makes me so glad! Congratulations, this was quite the journey! 🎉

@h-vetinari
Member

h-vetinari commented Feb 18, 2021

Thanks for the kind words!

Journey isn't over yet, unfortunately - ran the gpu-specific test suite locally and it has a bunch of segfaults 😒
Opened a PR to mark them broken, though I really should have tested the builds from artefacts of the PR directly... 🤦‍♂️

@jaimergp
Member

Soon...

@h-vetinari h-vetinari mentioned this pull request Feb 18, 2021
Labels
automerge Merge the PR when CI passes
5 participants