Kosmos-2.5 - Deployment Challenges on a Windows 11 + RTX 3090 PC #1596
After the above, I'm able to run Kosmos-2.5 on a single RTX3090 Windows 11 PC via WSL-Ubuntu. I created (and am closing) this issue in case anyone is facing the same errors and thus may benefit from my experience above. If you're facing any Python package issues, Python v3.10.12 worked for me and my entire requirements.txt looks as below. Feel free to reach out if you think I can help in any way:
Hi,
I've spent a greater part of the last ten days trying to get the Kosmos-2.5 model working on my Windows 11 PC, relevant specs below:
This proved ridiculously impossible despite following the elaborate (/s) steps mentioned in the Kosmos-2.5 repo, and I ran around in circles trying to fix this. Turns out this model is at the moment EXTREMELY temperamental to the software environment and Python v3.11 causes many, many issues, and one must stick to v3.10.x.
Devs, I REALLY wish you'd mentioned this in the Kosmos repo! Since PyTorch & FlashAttention2 have no issues with v3.11, I didn't think Kosmos would either given it's not mentioned anywhere!
Turns out, sticking to the default v3.10.12 of WSL-Ubuntu works, but figuring this out was quite the journey. Sharing it below as well as all the steps that worked in case it may help someone facing the same issues.
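Since the whole problem came down to the interpreter version, a small guard at the top of a launch script can fail fast before any of the confusing errors below appear. This is only a sketch: the function names are hypothetical, and the 3.10 pin is taken from this report rather than from any official Kosmos-2.5 requirement.

```python
import sys

# Assumption from this issue: only Python 3.10.x worked for Kosmos-2.5.
SUPPORTED_MINOR = (3, 10)


def is_supported_python(version_info=None):
    """Return True when the given (or running) interpreter is a 3.10.x release."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == SUPPORTED_MINOR


def require_supported_python(version_info=None):
    """Raise a readable error instead of letting later imports fail obscurely."""
    if version_info is None:
        version_info = sys.version_info
    if not is_supported_python(version_info):
        found = ".".join(str(part) for part in version_info[:3])
        raise RuntimeError(
            f"Python {found} detected; this setup was only verified on 3.10.x"
        )
```

Calling `require_supported_python()` before importing anything else would stop a 3.11 environment immediately with a clear message.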
Amongst the many errors I faced were (DO NOT TRY ANY OF THE RESOLUTIONS IN THIS SECTION, THEY'RE SHARED FOR REFERENCE ONLY. THE SOLUTION IS IN THE SECTION THAT FOLLOWS THIS ONE):
ImportError: cannot import name II from omegaconf
Resolutions tried :
ValueError: mutable default <class 'fairseq.dataclass.configs.CommonConfig'> for field common is not allowed: use default_factory
Resolutions tried:
omegaconf.errors.ConfigAttributeError: Missing key seed full_key: common.seed object_type=dict
Resolution tried:
'seed': 42
to the init args of inference.py as per advice
ValueError: Default process group has not been initialized, please make sure to call init_process_group.
Resolution tried:
a. To inference.py, added
import torch.distributed as dist
to imports and to
init()
before use_cuda = True
b. To gpt.py, added the below to the
build_model
method of the GPTmodel
class:
c. Ran with environment variables:
...which then led to:
RuntimeError: use_libuv was requested but PyTorch was build without libuv support
Resolution tried:
$env:USE_LIBUV = "0"
TypeError: embedding(): argument 'indices' (position 2) must be Tensor, not NoneType
This then led to a host of modifications to the .py files which led to messes best forgotten. So anyways...
TURNS OUT THE ISSUE WAS THE PYTHON VERSION 3.11.x ALL ALONG! PLEASE STICK TO 3.10.x!
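For anyone wondering why the version matters: the "mutable default ... use default_factory" ValueError above is the message Python's own dataclasses module raises. Python 3.11 tightened the check so that any unhashable default (including an instance of another regular dataclass) is rejected, whereas 3.10 only rejected list, dict and set defaults. The sketch below reproduces that difference with hypothetical stand-in classes; it is not fairseq's actual code and is meant as background, not as a fix to apply.

```python
import sys
from dataclasses import dataclass, field


@dataclass
class CommonConfig:
    """Hypothetical stand-in for a nested config dataclass."""
    seed: int = 1


def instance_default_rejected():
    """Return True if a dataclass instance used as a field default raises ValueError."""
    try:
        @dataclass
        class BrokenConfig:
            # Accepted on Python 3.10, rejected on 3.11+ (unhashable default).
            common: CommonConfig = CommonConfig()
    except ValueError:
        return True
    return False


@dataclass
class FixedConfig:
    # default_factory builds a fresh CommonConfig per instance on every version.
    common: CommonConfig = field(default_factory=CommonConfig)


if __name__ == "__main__":
    print(sys.version_info[:2], "rejects instance default:", instance_default_rejected())
```

On 3.10 the check reports that the instance default is accepted, which matches the experience here that the same code only breaks on 3.11.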
SHARING MY WORKING WINDOWS 11 WSL SETUP BELOW:
Make sure Nvidia GPU drivers & CUDA (I used v12.4) are installed in the host Windows 11 system
Update if not
Now open a WSL-terminal by typing
wsl
in the Start Menu or a Command Prompt, or by searching for Ubuntu
in the Start Menu
Install CUDA Toolkit v12.4.1:
git clone
and cd
repo:
2.5.9.post1
):