GitHub Actions build error with torch 1.12 on macOS #300

Closed
davidmezzetti opened this issue Jul 1, 2022 · 12 comments

Comments

@davidmezzetti
Member

Segmentation faults are occurring with torch 1.12 and at least macOS. See this PR on transformers for more info - huggingface/transformers#17925

@davidmezzetti davidmezzetti added this to the v4.6.0 milestone Jul 1, 2022
@davidmezzetti davidmezzetti self-assigned this Jul 1, 2022
@davidmezzetti davidmezzetti added the bug Something isn't working label Jul 13, 2022
@davidmezzetti
Member Author

Update build.yml to unpin torch==1.11.0 when this is complete.

@davidmezzetti
Member Author

davidmezzetti commented Jul 29, 2022

PyTorch just tagged 1.12.1-rc3. Hopefully 1.12.1 is released soon.

Link to milestone - https://github.com/pytorch/pytorch/milestone/30

@davidmezzetti
Member Author

davidmezzetti commented Aug 15, 2022

macOS builds are getting a Segmentation fault: 11 error code with torch==1.12.1. This issue will remain open but this appears to be a GitHub Actions specific issue just for macOS. For now, macOS builds will be pinned to torch==1.11.0.

@davidmezzetti davidmezzetti removed the bug Something isn't working label Aug 15, 2022
@davidmezzetti davidmezzetti changed the title Runtime errors occuring with torch 1.12 GitHub Actions build errors occuring with torch 1.12 on macOS Aug 15, 2022
@davidmezzetti davidmezzetti changed the title GitHub Actions build errors occuring with torch 1.12 on macOS GitHub Actions build error with torch 1.12 on macOS Aug 15, 2022
@davidmezzetti davidmezzetti added this to the v5.1.0 milestone Oct 10, 2022
@davidmezzetti
Member Author

This issue is localized to macOS Python 3.7. Python 3.8+ works fine on macOS. Added troubleshooting note here: https://neuml.github.io/txtai/install/#environment-specific-prerequisites
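The pin described above could be written as a small helper. This is only a sketch: `torch_requirement` is a hypothetical name, not part of txtai or its build.yml, and it assumes (per this comment) that only macOS with Python 3.7 needs the torch==1.11.0 pin.

```python
import platform
import sys


def torch_requirement(system=None, version=None):
    """Return a pip requirement string for torch (hypothetical helper)."""
    # Fall back to the running interpreter when no values are passed in
    system = system or platform.system()
    version = tuple(version or sys.version_info[:2])

    # Assumption taken from this thread: macOS ("Darwin") with Python 3.7
    # segfaults with torch 1.12, so that combination stays pinned to 1.11.0
    if system == "Darwin" and version == (3, 7):
        return "torch==1.11.0"

    # Every other combination is left unpinned
    return "torch"


print(torch_requirement())
```

The result could be passed to pip in a build step. Later comments report a crash outside this combination, so the condition may need widening.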

@mhgrove

mhgrove commented Jan 18, 2023

fwiw, I hit this error w/ Python 3.9.5 on OSX 12.5 using torch==1.13.1. I changed to 1.11.0 and got past the initial issue, which occurred when executing embeddings = Embeddings({"path": "sentence-transformers/nli-mpnet-base-v2"})

The first time the seg fault happened, it looked as if it was in the middle of some (2) downloads, presumably of those embeddings. After the torch downgrade, 4 more downloads completed successfully. But I hit a different error.

I am trying to run the first code snippet in this example.

When I get to searching the embeddings, it fails and I see this on the console:

OMP: Error #15: Initializing libomp.a, but found libiomp5.dylib already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://openmp.llvm.org/
Abort trap: 6

@davidmezzetti
Member Author

I appreciate you sharing this info as it's the most I've seen to date. In doing a few searches, a lot of links point to mkl. What does pip show mkl return? Did you try running Python with the env variable KMP_DUPLICATE_LIB_OK=TRUE? Curious if that is the only problem or another pops up.

I've assumed the issue, at least with the GitHub Actions build, is a package conflict but I'm not a mac user, so I've been limited in what I can debug.
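For anyone wanting to try the KMP_DUPLICATE_LIB_OK workaround from inside Python rather than the shell, a minimal sketch follows. `allow_duplicate_openmp` is a hypothetical helper name. The assumption here is that the variable has to be set before any library that loads an OpenMP runtime is imported. The OMP hint itself calls this workaround unsafe and unsupported, so it is a diagnostic step rather than a fix.

```python
import os


def allow_duplicate_openmp(env=None):
    """Set KMP_DUPLICATE_LIB_OK=TRUE in a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    env["KMP_DUPLICATE_LIB_OK"] = "TRUE"
    return env


# Assumption: call this before importing anything that loads OpenMP
allow_duplicate_openmp()

# Imports of the libraries under test would follow here, e.g.
# from txtai.embeddings import Embeddings
```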

@mhgrove

mhgrove commented Jan 20, 2023

No problem, I'm glad it's helpful.

$ pip3 show mkl
WARNING: Package(s) not found: mkl

So, nothing. And yeah, I did try with that env variable set and it did not change what happened; it still fails with that Abort trap: 6 thing.

I am behind on both Python and OSX versions. I think I can rustle up another laptop, update macOS, and try with that to see if it helps.
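The `pip3 show mkl` check above can also be done from Python, which makes it easy to report several packages at once. This is a sketch using the standard library's `importlib.metadata` (Python 3.8+). `installed_version` is a hypothetical helper, and the package names other than mkl are assumptions about what might be worth checking.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# mkl comes from the thread; torch and faiss-cpu are assumed to be relevant
for name in ("mkl", "torch", "faiss-cpu"):
    print(name, installed_version(name))
```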

@davidmezzetti
Member Author

Appreciate it, anything you can share would be beneficial.

How did you install txtai? Was it a simple pip install or were optional dependencies installed? fasttext has caused issues on macOS in the past.

@mhgrove

mhgrove commented Jan 23, 2023

It was just the pip install. I'm new to Python, so I was going for easy :)

@mhgrove

mhgrove commented Jan 26, 2023

To close the loop on this, I tried the same code on a different machine with macOS 12.6.2 and that ran the example just fine. I also used a simple pip install in that case. There were the following warnings, but everything worked in the simple case.

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
spacy-transformers 1.1.7 requires transformers<4.21.0,>=3.4.0, but you have transformers 4.26.0 which is incompatible.
cached-path 1.1.2 requires huggingface-hub<0.6.0,>=0.0.12, but you have huggingface-hub 0.12.0 which is incompatible.
allennlp 2.9.3 requires spacy<3.3,>=2.1.0, but you have spacy 3.4.4 which is incompatible.
allennlp 2.9.3 requires transformers<4.19,>=4.1, but you have transformers 4.26.0 which is incompatible.

When I moved that exact code over to my program, I started getting a different segfault the first time I called Embeddings#index. Stripping down my code eventually led me to realize that something coming in with spacy was causing the problem. Once I moved all of the spacy-related code to a different file, everything worked.

I switched back over to the standalone example and added a simple import spacy, but that didn't trigger the segfault.

Long story short, works fine on a new OSX and with a minimal program, but there might be some deeper conflict w/ some library shared w/ spacy.

@davidmezzetti
Member Author

davidmezzetti commented Jan 27, 2023

The message below you provided previously is the best hint yet, thank you for that!

OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://openmp.llvm.org/

This issue discusses a conflict between Faiss and PyTorch that appears relevant. This PR also seems relevant.

Also tried the fix in this issue with no success.

#377 is the current open txtai issue tracking this.
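One way to look for the duplicate OpenMP runtimes named in the OMP message is to scan the installed packages for shared libraries whose file names look like OpenMP runtimes. The sketch below is a rough diagnostic under stated assumptions: `find_openmp_libraries` is a hypothetical helper, the file name prefixes are a guess at common runtime names (libomp and libiomp5 appear in the error above; libgomp is the GNU runtime), and statically linked copies will not show up as separate files.

```python
import sys
from pathlib import Path

# File name prefixes assumed to indicate an OpenMP runtime
OPENMP_PREFIXES = ("libomp", "libiomp", "libgomp")


def find_openmp_libraries(roots):
    """Return sorted paths under roots whose names start with an OpenMP prefix."""
    found = []
    for root in roots:
        root = Path(root)
        if not root.is_dir():
            continue
        # Walk the whole tree; packages bundle libraries in nested folders
        for path in root.rglob("*"):
            if path.is_file() and path.name.startswith(OPENMP_PREFIXES):
                found.append(path)
    return sorted(found)


# Scan every site-packages directory on the current import path
for lib in find_openmp_libraries(p for p in sys.path if p.endswith("site-packages")):
    print(lib)
```

If more than one package ships its own copy, that would be consistent with the "multiple copies of the OpenMP runtime" hint, though it does not by itself show which copy triggers the crash.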

@mhgrove

mhgrove commented Jan 27, 2023

Oh, glad it was helpful! I've had a lot of fun playing with the software since I got things running, so thanks for that :)
