Simulation with ResNet fails #103

Open
daecheolyou opened this issue Oct 20, 2021 · 7 comments

@daecheolyou

During simulation with ResNet, gem5 crashes with a segmentation fault.
I created the ResNet pb and pbtxt files by running smaug/experiments/models/imagenet-resnet/resnet_network.py.
All configuration files are the same as in the minerva example; only model_files was modified so that it points to the generated pb and pbtxt files.
The input trace was generated by running trace.sh.

Below is the end of the stdout log.

Scheduling data (Data).
Scheduling data_1 (Data).
Scheduling data_10 (Data).
Scheduling data_100 (Data).
Scheduling data_101 (Data).
Scheduling data_102 (Data).
Scheduling data_103 (Data).
Scheduling data_104 (Data).
Scheduling data_105 (Data).
Scheduling data_106 (Data).
Scheduling data_107 (Data).
Scheduling data_108 (Data).
Scheduling data_109 (Data).

Before the backtrace, the stderr log shows the following message.

gem5 has encountered a segmentation fault!

Please let me know if I configured something wrong.
Thanks.

@xyzsam (Member) commented Oct 21, 2021

Yuan, can you take a look at this?

@yaoyuannnn (Member) commented Oct 21, 2021

Yes, will take a look this week.

@yaoyuannnn (Member)

Just a guess: did you update trace_file_name in gem5.cfg to point to the correct trace file?
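A quick way to rule this out is to read gem5.cfg and confirm that every trace_file_name it lists actually exists on disk. The sketch below is illustrative only: it assumes gem5.cfg is an INI-style file with one section per accelerator (e.g. acc0), each carrying a trace_file_name option, and it assumes relative paths resolve against the config file's directory (the simulator may instead resolve them against its working directory). The helper name check_trace_files is hypothetical.

```python
import configparser
import os


def check_trace_files(cfg_path):
    """Hypothetical helper: map each section of an INI-style gem5.cfg that
    has a trace_file_name option to (trace_file_name, file_exists).

    Assumption: relative trace paths are resolved against the directory
    containing cfg_path.
    """
    # interpolation=None so '%' characters in paths are not interpreted
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(cfg_path):
        raise FileNotFoundError(cfg_path)
    base_dir = os.path.dirname(os.path.abspath(cfg_path))
    results = {}
    for section in parser.sections():
        # Skip sections that do not describe a traced accelerator
        if not parser.has_option(section, "trace_file_name"):
            continue
        trace = parser.get(section, "trace_file_name")
        path = trace if os.path.isabs(trace) else os.path.join(base_dir, trace)
        results[section] = (trace, os.path.isfile(path))
    return results
```

Running check_trace_files("gem5.cfg") from the simulation directory and looking for any entry whose second element is False would flag a stale or missing trace before launching gem5.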

@daecheolyou (Author)

It didn't need to be modified; only model_files was changed so that it points to the pb and pbtxt files under imagenet-resnet. The trace file was generated with trace.sh, which takes model_files as input and always names its output dynamic_trace_acc0.gz.

@yaoyuannnn (Member)

I just tried running resnet50. It is still running, but it has already started running the accelerator for the first convolution layer (conv0), which is well past the point where your simulation crashed. To reduce the trace size for this relatively large network, the only change I made was using --sample-level=very_high in trace.sh (and the same in run.sh). Other than updating the protobuf inputs, the rest of the configuration files are the same as the ones in sims/smv/tests/minerva.

@xyzsam (Member) commented Oct 26, 2021

Did the simulator leave any stacktraces indicating where the segfault occurred?

@daecheolyou (Author) commented Oct 26, 2021

Below is the stack trace for the simulation failure.
I ran the simulation several times with ResNet, and sometimes it got further than the log I originally posted; for example, one run reached Scheduling relu2_b (ReLU).
However, it eventually hit a segmentation fault with the same kind of stack trace as below.

/workspace/gem5-aladdin/src/aladdin/../../build/X86/gem5.opt(_Z15print_backtracev+0x2c)[0x55a3fb5e722c]
/workspace/gem5-aladdin/src/aladdin/../../build/X86/gem5.opt(+0x6e92ff)[0x55a3fb5f92ff]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f8073fc9890]
/lib/x86_64-linux-gnu/libgcc_s.so.1(_Unwind_Resume+0xcf)[0x7f80725f6d9f]
/workspace/gem5-aladdin/src/aladdin/../../build/X86/gem5.opt(_ZN6X86ISA7Decoder10decodeInstENS_11ExtMachInstE+0x2e6c1)[0x55a3fc00f141]
/workspace/gem5-aladdin/src/aladdin/../../build/X86/gem5.opt(_ZN6X86ISA7Decoder6decodeENS_11ExtMachInstEm+0x244)[0x55a3fbfa88f4]
/workspace/gem5-aladdin/src/aladdin/../../build/X86/gem5.opt(_ZN6X86ISA7Decoder6decodeERNS_7PCStateE+0x22b)[0x55a3fbfa8beb]
/workspace/gem5-aladdin/src/aladdin/../../build/X86/gem5.opt(_ZN12DefaultFetchI9O3CPUImplE5fetchERb+0x979)[0x55a3fbb0eb69]
/workspace/gem5-aladdin/src/aladdin/../../build/X86/gem5.opt(_ZN12DefaultFetchI9O3CPUImplE4tickEv+0xd3)[0x55a3fbb0fe23]
/workspace/gem5-aladdin/src/aladdin/../../build/X86/gem5.opt(_ZN9FullO3CPUI9O3CPUImplE4tickEv+0x12b)[0x55a3fbaedb3b]
/workspace/gem5-aladdin/src/aladdin/../../build/X86/gem5.opt(_ZN10EventQueue10serviceOneEv+0xd9)[0x55a3fb5ef709]
/workspace/gem5-aladdin/src/aladdin/../../build/X86/gem5.opt(_Z9doSimLoopP10EventQueue+0x148)[0x55a3fb610e28]
/workspace/gem5-aladdin/src/aladdin/../../build/X86/gem5.opt(_Z8simulatem+0xcba)[0x55a3fb611dda]
/workspace/gem5-aladdin/src/aladdin/../../build/X86/gem5.opt(+0x7bf6d1)[0x55a3fb6cf6d1]
/workspace/gem5-aladdin/src/aladdin/../../build/X86/gem5.opt(+0x5e8754)[0x55a3fb4f8754]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x64d7)[0x7f8074276c47]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7d8)[0x7f80743b5908]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5bf6)[0x7f8074276366]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7d8)[0x7f80743b5908]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5bf6)[0x7f8074276366]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7d8)[0x7f80743b5908]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5bf6)[0x7f8074276366]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7d8)[0x7f80743b5908]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCode+0x19)[0x7f80742705d9]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x6ac0)[0x7f8074277230]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7d8)[0x7f80743b5908]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5bf6)[0x7f8074276366]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7d8)[0x7f80743b5908]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCode+0x19)[0x7f80742705d9]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyRun_StringFlags+0x76)[0x7f80743206f6]
/workspace/gem5-aladdin/src/aladdin/../../build/X86/gem5.opt(_Z6m5MainiPPc+0x83)[0x55a3fb5f8013]
/workspace/gem5-aladdin/src/aladdin/../../build/X86/gem5.opt(main+0x38)[0x55a3fb448e08]
