sentencepiece==0.1.92 seems breaking something #505

Closed
boy2000-007man opened this issue Jun 9, 2020 · 16 comments

Comments

@boy2000-007man

With the newly released sentencepiece==0.1.92, importing transformers and torch and then calling torch.rand(3) segfaults:

Python 3.6.9 (default, Nov  7 2019, 10:44:02)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import transformers, torch
>>> transformers.__version__
'2.9.1'
>>> torch.__version__
'1.4.0'
>>> torch.rand(3)
Segmentation fault (core dumped)

However, downgrading to sentencepiece==0.1.91 solves the issue.

@taku910
Collaborator

taku910 commented Jun 9, 2020

Thank you. I cannot reproduce this error. Can you provide a stack trace or the output of strace?

I will temporarily disable the 0.1.92 release just in case. Thank you.

@taku910
Collaborator

taku910 commented Jun 9, 2020

Downgraded.

@boy2000-007man
Author

Thanks for your prompt action.
I can still reproduce it in a fresh virtualenv with only transformers==2.9.1 and torch==1.4.0 installed:

(env) xug@XUG:~$ python -c "import transformers,torch; torch.rand(3)"
(env) xug@XUG:~$ pip install sentencepiece==0.1.92
WARNING: The candidate selected for download or install is a yanked version: 'sentencepiece' candidate (version 0.1.92 at https://files.pythonhosted.org/packages/68/e5/0366f50a00db181f4b7f3bdc408fc7c4177657f5bf45cb799b79fb4ce15c/sentencepiece-0.1.92-cp36-cp36m-manylinux1_x86_64.whl#sha256=7fd16c761339f593596b63e50810a2d2eff964d428ab79a49674c7371c055561 (from https://pypi.org/simple/sentencepiece/))
Reason for being yanked: Crash bug is reported (confirming)
Collecting sentencepiece==0.1.92
  Using cached sentencepiece-0.1.92-cp36-cp36m-manylinux1_x86_64.whl (1.2 MB)
Installing collected packages: sentencepiece
  Attempting uninstall: sentencepiece
    Found existing installation: sentencepiece 0.1.91
    Uninstalling sentencepiece-0.1.91:
      Successfully uninstalled sentencepiece-0.1.91
Successfully installed sentencepiece-0.1.92
(env) xug@XUG:~$ python -c "import transformers,torch; torch.rand(3)"
Segmentation fault (core dumped)

I last saw a similar issue in scipy/scipy#11237.
I'm not sure what additional information you need to debug this.
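
For anyone who wants to check an environment without crashing their own shell session, here is a minimal sketch that runs the reproduction in a child interpreter and inspects the exit status. The helper names run_snippet and died_from_segfault are hypothetical, and the negative return code convention applies to POSIX systems only.

```python
import signal
import subprocess
import sys


def run_snippet(code):
    """Run `code` in a fresh Python interpreter and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


def died_from_segfault(returncode):
    """Return True if the child process was killed by SIGSEGV (POSIX only)."""
    # On POSIX, subprocess reports death by signal N as a return code of -N.
    return returncode == -signal.SIGSEGV
```

Calling died_from_segfault(run_snippet("import transformers, torch; torch.rand(3)")) should then report whether the one-liner above crashes in the current environment.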

@corynezin

0.1.92 is also causing segfaults for me during training. (I haven't had time to track down the exact line.)

@starplanet

+1

@hugoperrin

I can confirm: I have the exact same issue with version 0.1.92, and it does not occur with 0.1.91.
In my case, the segmentation fault occurs in torch.nn.Module's __init__, with pytorch (1.3.1) and transformers (2.11.0). Using faulthandler, I found that the segfault is raised at the following location:

Current thread 0x00007fa347061280 (most recent call first):
  File ".../lib/python3.7/site-packages/torch/nn/modules/module.py", line 74 in __init__
  File ".../lib/python3.7/site-packages/torch/nn/modules/loss.py", line 10 in __init__
  File ".../lib/python3.7/site-packages/torch/nn/modules/loss.py", line 593 in __init__

That line happens to be:

torch._C._log_api_usage_once("python.nn_module")

I suspect something in the new sentencepiece version prevents the torch binaries from registering their resources normally.

Downgrading to version 0.1.91 fixes the issue for me as well.
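
For others who want to capture a similar trace, the following is a minimal sketch of enabling the standard-library faulthandler module before the crashing code runs. The helper name enable_crash_traceback and the log path are hypothetical; running the script with python -X faulthandler is an alternative that needs no code change.

```python
import faulthandler


def enable_crash_traceback(log_path):
    """Dump the Python traceback to `log_path` if the process gets a fatal signal."""
    log_file = open(log_path, "w")
    # faulthandler writes through the file descriptor at crash time,
    # so the file object must stay open for the lifetime of the process.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Call it at the very top of the entry point, for example log = enable_crash_traceback("crash.log"), and only then import torch and the other libraries under suspicion.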

@BillDorn

I'm also seeing a segfault. Here is the gdb output, if that helps: gist.

@taku910
Collaborator

taku910 commented Jun 16, 2020

Thank you for the investigation.

We upgraded the internal protobuf version in 0.1.92. I'm not sure of the root cause, but I believe this upgrade causes the crash. I will roll back the protobuf version and release 0.1.93 soon.

@summacod

I hit this issue while building tensorflow-text from source. I tried the build mainly to see what was breaking with sentencepiece; the protocol buffer versions seem to be mismatched.

ERROR: /home/XXXX/Downloads/text/tensorflow_text/core/kernels/BUILD:310:18: C++ compilation of rule '//tensorflow_text/core/kernels:sentencepiece_kernels' failed (Exit 1)
In file included from tensorflow_text/core/kernels/sentencepiece_kernels.cc:23:
bazel-out/k8-opt/bin/external/com_google_sentencepiece/src/sentencepiece.pb.h:17:2: error: #error This file was generated by an older version of protoc which is
17 | #error This file was generated by an older version of protoc which is
| ^~~~~
bazel-out/k8-opt/bin/external/com_google_sentencepiece/src/sentencepiece.pb.h:18:2: error: #error incompatible with your Protocol Buffer headers. Please
18 | #error incompatible with your Protocol Buffer headers. Please
| ^~~~~
bazel-out/k8-opt/bin/external/com_google_sentencepiece/src/sentencepiece.pb.h:19:2: error: #error regenerate this file with a newer version of protoc.
19 | #error regenerate this file with a newer version of protoc.
| ^~~~~
Target //oss_scripts/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 4.338s, Critical Path: 4.10s
INFO: 5 processes: 5 local.
FAILED: Build did NOT complete successfully

@taku910
Collaborator

taku910 commented Jun 18, 2020

It seems that protobuf is not the root cause. The error persists after downgrading protobuf.
Here's the stack trace:

Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
0x00007fff00000000 in ?? ()
(gdb) where
#0  0x00007fff00000000 in ?? ()
#1  0x00007fff8a4b9b59 in c10::detail::LogAPIUsageFakeReturn(std::string const&) () from /usr/local/google/home/taku/.local/lib/python3.7/site-packages/torch/lib/libc10.so
#2  0x00007fff8a4ab715 in c10::TensorImpl::TensorImpl(c10::Storage&&, c10::TensorTypeSet, caffe2::TypeMeta const&, c10::optional<c10::Device>) ()
   from /usr/local/google/home/taku/.local/lib/python3.7/site-packages/torch/lib/libc10.so
#3  0x00007fff8a4ac1be in c10::TensorImpl::TensorImpl(c10::Storage&&, c10::TensorTypeSet) () from /usr/local/google/home/taku/.local/lib/python3.7/site-packages/torch/lib/libc10.so
#4  0x00007fff8c5de204 in at::Tensor at::detail::make_tensor<c10::TensorImpl, c10::intrusive_ptr<c10::StorageImpl, c10::detail::intrusive_target_default_null_type<c10::StorageImpl> >, c10::TensorTypeId>(c10::intrusive_ptr<c10::StorageImpl, c10::detail::intrusive_target_default_null_type<c10::StorageImpl> >&&, c10::TensorTypeId&&) ()
   from /usr/local/google/home/taku/.local/lib/python3.7/site-packages/torch/lib/libtorch.so
#5  0x00007fff8c5cc85f in at::native::empty_cpu(c10::ArrayRef<long>, c10::TensorOptions const&, c10::optional<c10::MemoryFormat>) ()
   from /usr/local/google/home/taku/.local/lib/python3.7/site-packages/torch/lib/libtorch.so
#6  0x00007fff8c81530b in at::CPUType::(anonymous namespace)::empty(c10::ArrayRef<long>, c10::TensorOptions const&, c10::optional<c10::MemoryFormat>) ()
   from /usr/local/google/home/taku/.local/lib/python3.7/site-packages/torch/lib/libtorch.so
#7  0x00007fff8c85cb37 in c10::detail::wrap_kernel_functor_unboxed_<c10::detail::WrapRuntimeKernelFunctor_<at::Tensor (*)(c10::ArrayRef<long>, c10::TensorOptions const&, c10::optional<c10::MemoryFormat>), at::Tensor, c10::guts::typelist::typelist<c10::ArrayRef<long>, c10::TensorOptions const&, c10::optional<c10::MemoryFormat> > >, at::Tensor (c10::ArrayRef<long>, c10::TensorOptions const&, c10::optional<c10::MemoryFormat>)>::call(c10::OperatorKernel*, c10::ArrayRef<long>, c10::TensorOptions const&, c10::optional<c10::MemoryFormat>) ()
   from /usr/local/google/home/taku/.local/lib/python3.7/site-packages/torch/lib/libtorch.so
#8  0x00007fff8c5cd0b0 in at::empty(c10::ArrayRef<long>, c10::TensorOptions const&, c10::optional<c10::MemoryFormat>) ()
   from /usr/local/google/home/taku/.local/lib/python3.7/site-packages/torch/lib/libtorch.so
#9  0x00007fff8c5d6f0d in at::native::rand(c10::ArrayRef<long>, at::Generator*, c10::TensorOptions const&) ()
   from /usr/local/google/home/taku/.local/lib/python3.7/site-packages/torch/lib/libtorch.so
#10 0x00007fff8c5d6fbe in at::native::rand(c10::ArrayRef<long>, c10::TensorOptions const&) () from /usr/local/google/home/taku/.local/lib/python3.7/site-packages/torch/lib/libtorch.so
#11 0x00007fff8c90f283 in at::TypeDefault::rand(c10::ArrayRef<long>, c10::TensorOptions const&) () from /usr/local/google/home/taku/.local/lib/python3.7/site-packages/torch/lib/libtorch.so
#12 0x00007fff8c954709 in c10::detail::wrap_kernel_functor_unboxed_<c10::detail::WrapRuntimeKernelFunctor_<at::Tensor (*)(c10::ArrayRef<long>, c10::TensorOptions const&), at::Tensor, c10::guts::typelist::typelist<c10::ArrayRef<long>, c10::TensorOptions const&> >, at::Tensor (c10::ArrayRef<long>, c10::TensorOptions const&)>::call(c10::OperatorKernel*, c10::ArrayRef<long>, c10::TensorOptions const&) () from /usr/local/google/home/taku/.local/lib/python3.7/site-packages/torch/lib/libtorch.so
#13 0x00007fffd55473e3 in torch::rand(c10::ArrayRef<long>, c10::TensorOptions const&) () from /usr/local/google/home/taku/.local/lib/python3.7/site-packages/torch/lib/libtorch_python.so
#14 0x00007fffd53810d4 in torch::autograd::THPVariable_rand(_object*, _object*, _object*) () from /usr/local/google/home/taku/.local/lib/python3.7/site-packages/torch/lib/libtorch_python.so
#15 0x00000000005cbe1b in _PyMethodDef_RawFastCallKeywords ()
#16 0x0000000000538838 in ?? ()
#17 0x000000000053f4d7 in _PyEval_EvalFrameDefault ()
#18 0x00000000005393b6 in _PyEval_EvalCodeWithName ()
#19 0x000000000064ead3 in PyEval_EvalCode ()
#20 0x00000000006420d3 in ?? ()
#21 0x00000000006422db in PyRun_StringFlags ()
#22 0x0000000000642bfb in PyRun_SimpleStringFlags ()
#23 0x000000000067af07 in ?? ()
#24 0x000000000067afee in _Py_UnixMain ()
#25 0x00007ffff7e08e0b in __libc_start_main (main=0x4d1210 <main>, argc=3, argv=0x7fffffffdf48, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7fffffffdf38) at ../csu/libc-start.c:308
#26 0x00000000005d1dda in _start ()

@guillaumekln
Contributor

Is this the same as pytorch/pytorch#8358? This looks like another case of incompatibility with PyTorch, which releases non-standard manylinux1 packages, rather than a fault in SentencePiece.

Sometimes changing the import order works around the issue:

$ python
>>> import torch
>>> import sentencepiece
>>> torch.nn.Module()
Segmentation fault
$ python
>>> import sentencepiece
>>> import torch
>>> torch.nn.Module()
Module()
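
To pin that order in a project, one option is a small helper called at the very top of the entry point. This is only a sketch: import_in_order and SAFE_IMPORT_ORDER are hypothetical names, and the helper has no effect on a module that something else has already imported.

```python
import importlib

# Order reported above as avoiding the crash: sentencepiece first, then torch.
SAFE_IMPORT_ORDER = ("sentencepiece", "torch")


def import_in_order(names=SAFE_IMPORT_ORDER):
    """Import the named modules one by one, in order, and return them by name."""
    # A dict comprehension evaluates its items in iteration order,
    # so the modules are imported in exactly the order given.
    return {name: importlib.import_module(name) for name in names}
```

Since the workaround is described as helping only sometimes, this is a stopgap rather than a fix.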

@summacod

summacod commented Jun 19, 2020 via email

@summacod

summacod commented Jun 21, 2020 via email

@taku910
Collaborator

taku910 commented Oct 14, 2020

It seems we can't reproduce this bug with torch>=1.5.0. Let me close this issue. If you can reproduce the errors, please let me know.
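
As a rough guard based only on the versions reported in this thread (sentencepiece 0.1.92 crashing with torch 1.3.1 and 1.4.0, and not reproducing with torch>=1.5.0), a project could flag the suspicious combination from version strings. The function names below are hypothetical, and the rule is an assumption drawn from these reports, not a confirmed compatibility matrix.

```python
import re


def version_tuple(version):
    """Turn a string such as '1.4.0+cu101' into a comparable tuple (1, 4, 0)."""
    public_part = version.split("+")[0]  # drop local build tags like '+cu101'
    return tuple(int(part) for part in re.findall(r"\d+", public_part)[:3])


def matches_reported_crash(sentencepiece_version, torch_version):
    """True for the combination reported here: sentencepiece 0.1.92 with torch < 1.5.0."""
    return (
        version_tuple(sentencepiece_version) == (0, 1, 92)
        and version_tuple(torch_version) < (1, 5, 0)
    )
```

The version strings can come from sentencepiece.__version__ and torch.__version__, or from pip's metadata if importing the packages is itself the risk.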

@wassimseif

I'm not sure how related this is, but it might help.
Installing and importing transformers caused a segmentation fault. I fixed it with pip install sentencepiece==0.1.91.

Here's my stack trace:

[Attached screenshots: "Screen Shot 2020-11-08 at 11 42 52 AM" and "Screen Shot 2020-11-08 at 11 44 54 AM"]

taku910 reopened this on Nov 13, 2020
@taku910
Collaborator

taku910 commented Jan 8, 2021

Not reproduced with sentencepiece v0.1.95.
