Panic: invalid pc-encoded table #310

Closed
OlegElizarov opened this issue Sep 29, 2022 · 12 comments
Labels
known-issue This issue is known to us, we are working on it

Comments

@OlegElizarov

Hello guys.
We have a problem in our service, and sonic appears in the stack trace. Unfortunately, it happens only in our production environment, so I don't know how to reproduce it.
Go version in our project: 1.18
Sonic version: 1.5.0
Can you help us analyze the problem? If you need more information, just ask.
If sonic turns out not to be the cause, then sorry. Your project is really amazing and we love it! ❤️

Stack trace:

runtime: invalid pc-encoded table f=google.golang.org/protobuf/internal/impl.(*MessageInfo).checkInitialized-fm pc=0x7d78cf targetpc=0x7d78d3 tab=[0/0]0x0
	value=0 until pc=0x7d786a
	value=32 until pc=0x7d78a2
	value=0 until pc=0x7d78be
	value=32 until pc=0x7d78cf
fatal error: invalid runtime symbol table

goroutine 0 [idle]:
runtime: unexpected return pc for runtime.sigreturn called from 0x7
stack: frame={sp:0xc00010b380, fp:0xc00010b388} stack=[0xc000104000,0xc00010c000)
0x000000c00010b280:  0x000000c00010b2f8  0x000000000044dd05 <runtime.sigtrampgo+0x00000000000001a5> 
0x000000c00010b290:  0x000000c00000001b  0x000000c000100000 
0x000000c00010b2a0:  0x000000c00010b2b8  0x000000c156c09ba0 
0x000000c00010b2b0:  0x0000000000000000  0x0000000000000000 
0x000000c00010b2c0:  0x0000000000000000  0x0000000000000000 
0x000000c00010b2d0:  0x0000000000000000  0x0000000000000000 
0x000000c00010b2e0:  0x000000c04a7b5a00  0x000000c00010b4b0 
0x000000c00010b2f0:  0x000000c00010b380  0x000000c00010b320 
0x000000c00010b300:  0x000000000046e7ee <runtime.sigtrampgo+0x000000000000002e>  0x000000000000001b 
0x000000c00010b310:  0x000000c00010b4b0  0x000000c00010b380 
0x000000c00010b320:  0x000000c00010b370  0x000000000046dcfd <runtime.sigtramp+0x000000000000003d> 
0x000000c00010b330:  0x000000000000001b  0x000000c00010b4b0 
0x000000c00010b340:  0x000000c00010b380  0x000000c1d13948db 
0x000000c00010b350:  0x0000000000000000  0x0000000000000000 
0x000000c00010b360:  0x00000000000000ef  0x000000c00213b6d8 
0x000000c00010b370:  0x000000c00213b658  0x000000000046de00 <runtime.sigreturn+0x0000000000000000> 
0x000000c00010b380: <0x0000000000000007 >0x0000000000000000 
0x000000c00010b390:  0x000000c000104000  0x0000000000000000 
0x000000c00010b3a0:  0x0000000000008000  0x00000000011a7150 <github.com/bytedance/sonic/internal/native/avx2.__native_entry__+0x000000000000e290> 
0x000000c00010b3b0:  0x00000000011a9150 <github.com/bytedance/sonic/internal/native/avx2.__native_entry__+0x0000000000010290>  0x00000000000ff77d 
0x000000c00010b3c0:  0x000000c00213b6e8  0x0000000000000000 
0x000000c00010b3d0:  0x0000000000000000  0x00000000000000ef 
0x000000c00010b3e0:  0x000000c00213b6d8  0x000000c169519d5b 
0x000000c00010b3f0:  0x00000000000000a4  0x000000c00213b658 
0x000000c00010b400:  0x000000c1d13948db  0x000000c1d1394883 
0x000000c00010b410:  0x0000000000000076  0x00000000000000a4 
0x000000c00010b420:  0x000000c00213b618  0x000000000119afdf <github.com/bytedance/sonic/internal/native/avx2.__native_entry__+0x000000000000211f> 
0x000000c00010b430:  0x0000000000000206  0x002b000000000033 
0x000000c00010b440:  0x0000000000000000  0x0000000000000000 
0x000000c00010b450:  0x0000000000000000  0x0000000000000000 
0x000000c00010b460:  0x000000c00010b540  0x0000000000000000 
0x000000c00010b470:  0x0000000000000000  0x0000000000000000 
0x000000c00010b480:  0x0000000000000000 
runtime.throw({0x1b16e2e?, 0x0?})
	/usr/local/go/src/runtime/panic.go:992 +0x71
runtime.pcvalue({0x283a320?, 0x2ba9320?}, 0x9da15, 0x7d78d3, 0xc00010ad68, 0x1)
	/usr/local/go/src/runtime/symtab.go:963 +0x57a
runtime.funcspdelta({0x283a320?, 0x2ba9320?}, 0x0?, 0x0?)
	/usr/local/go/src/runtime/symtab.go:1038 +0x34
runtime.gentraceback(0x0?, 0xc00010afd0?, 0x3a?, 0x40?, 0x0, 0xc00010afd0, 0x40, 0x0, 0x0?, 0x6)
	/usr/local/go/src/runtime/traceback.go:191 +0x67c
runtime.sigprof(0x119afdf, 0xc00010b270?, 0x0?, 0xc04a7b5a00, 0xc000100000)
	/usr/local/go/src/runtime/proc.go:4507 +0x108
runtime.sighandler(0x1b?, 0xc000100000?, 0xc00010b2b8?, 0xc156c09ba0?)
	/usr/local/go/src/runtime/signal_unix.go:613 +0x5e6
runtime.sigtrampgo(0x1b, 0xc00010b4b0, 0xc00010b380)
	/usr/local/go/src/runtime/signal_unix.go:477 +0x1a5
runtime.sigtrampgo(0x1b, 0xc00010b4b0, 0xc00010b380)
	<autogenerated>:1 +0x2e
runtime.sigtramp()
	/usr/local/go/src/runtime/sys_linux_amd64.s:361 +0x3d
runtime: unexpected return pc for runtime.sigreturn called from 0x7
stack: frame={sp:0xc00010b380, fp:0xc00010b388} stack=[0xc000104000,0xc00010c000)
0x000000c00010b280:  0x000000c00010b2f8  0x000000000044dd05 <runtime.sigtrampgo+0x00000000000001a5> 
0x000000c00010b290:  0x000000c00000001b  0x000000c000100000 
0x000000c00010b2a0:  0x000000c00010b2b8  0x000000c156c09ba0 
0x000000c00010b2b0:  0x0000000000000000  0x0000000000000000 
0x000000c00010b2c0:  0x0000000000000000  0x0000000000000000 
0x000000c00010b2d0:  0x0000000000000000  0x0000000000000000 
0x000000c00010b2e0:  0x000000c04a7b5a00  0x000000c00010b4b0 
0x000000c00010b2f0:  0x000000c00010b380  0x000000c00010b320 
0x000000c00010b300:  0x000000000046e7ee <runtime.sigtrampgo+0x000000000000002e>  0x000000000000001b 
0x000000c00010b310:  0x000000c00010b4b0  0x000000c00010b380 
0x000000c00010b320:  0x000000c00010b370  0x000000000046dcfd <runtime.sigtramp+0x000000000000003d> 
0x000000c00010b330:  0x000000000000001b  0x000000c00010b4b0 
0x000000c00010b340:  0x000000c00010b380  0x000000c1d13948db 
0x000000c00010b350:  0x0000000000000000  0x0000000000000000 
0x000000c00010b360:  0x00000000000000ef  0x000000c00213b6d8 
0x000000c00010b370:  0x000000c00213b658  0x000000000046de00 <runtime.sigreturn+0x0000000000000000> 
0x000000c00010b380: <0x0000000000000007 >0x0000000000000000 
0x000000c00010b390:  0x000000c000104000  0x0000000000000000 
0x000000c00010b3a0:  0x0000000000008000  0x00000000011a7150 <github.com/bytedance/sonic/internal/native/avx2.__native_entry__+0x000000000000e290> 
0x000000c00010b3b0:  0x00000000011a9150 <github.com/bytedance/sonic/internal/native/avx2.__native_entry__+0x0000000000010290>  0x00000000000ff77d 
0x000000c00010b3c0:  0x000000c00213b6e8  0x0000000000000000 
0x000000c00010b3d0:  0x0000000000000000  0x00000000000000ef 
0x000000c00010b3e0:  0x000000c00213b6d8  0x000000c169519d5b 
0x000000c00010b3f0:  0x00000000000000a4  0x000000c00213b658 
0x000000c00010b400:  0x000000c1d13948db  0x000000c1d1394883 
0x000000c00010b410:  0x0000000000000076  0x00000000000000a4 
0x000000c00010b420:  0x000000c00213b618  0x000000000119afdf <github.com/bytedance/sonic/internal/native/avx2.__native_entry__+0x000000000000211f> 
0x000000c00010b430:  0x0000000000000206  0x002b000000000033 
0x000000c00010b440:  0x0000000000000000  0x0000000000000000 
0x000000c00010b450:  0x0000000000000000  0x0000000000000000 
0x000000c00010b460:  0x000000c00010b540  0x0000000000000000 
0x000000c00010b470:  0x0000000000000000  0x0000000000000000 
0x000000c00010b480:  0x0000000000000000 
runtime.sigreturn()
	/usr/local/go/src/runtime/sys_linux_amd64.s:466
@AsterDY
Collaborator

AsterDY commented Oct 13, 2022

Can you share some code showing how you use sonic? This looks like it was caused by a corrupted memory pointer. Are there any unsafe pointers in the objects you decode or encode?

@OlegElizarov
Author

Sonic is used widely across our project, so that may be a bit hard, but let's try.

First, the unsafe code in our project. There is only this one:

// BytesToString ...
func BytesToString(item []byte) string {
	return *(*string)(unsafe.Pointer(&item))
}

As for sonic usage: we use it to marshal large proto structures and also map[string]interface{} values (some of which are proto structures).

@AsterDY
Collaborator

AsterDY commented Oct 13, 2022

Actually, BytesToString is quite unsafe: the item byte slice may be used concurrently somewhere else, or may have been freed by the GC. Once you pass the result to sonic, you need to make sure the underlying data stays valid and unmodified for as long as sonic uses it.
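To make the aliasing hazard concrete, here is a minimal sketch that reuses the BytesToString helper quoted above. The aliasDemo function is hypothetical and only for illustration; it shows that the returned string shares memory with the byte slice, so a later write to the slice changes the "immutable" string. It does not attempt to reproduce the crash from this issue.

```go
package main

import (
	"fmt"
	"strings"
	"unsafe"
)

// BytesToString is the zero-copy conversion quoted earlier in this thread.
// The returned string points at the same backing array as item.
func BytesToString(item []byte) string {
	return *(*string)(unsafe.Pointer(&item))
}

// aliasDemo (hypothetical helper) snapshots the string contents before and
// after the backing buffer is overwritten. strings.Clone makes real copies.
func aliasDemo() (before, after string) {
	buf := []byte("hello")
	s := BytesToString(buf)

	before = strings.Clone(s)
	copy(buf, "HELLO") // e.g. a pooled buffer being reused by other code
	after = strings.Clone(s)
	return before, after
}

func main() {
	before, after := aliasDemo()
	fmt.Println(before, after) // hello HELLO
}
```

If another goroutine performs that write while an encoder is reading the string, the result is a data race, which is the scenario the comment above warns about.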

@AsterDY added the known-issue label Oct 28, 2022
@OlegElizarov
Author

OlegElizarov commented Oct 28, 2022

Hello again 👋🏻
It still panics :(

So we removed the unsafe code from our project. The only unsafe code that remains is inside the protobuf internals.
I can also add that this issue has appeared in another of our projects (where unsafe is likewise only in protobuf).

So could you at least try to investigate this problem?
I can add more stack traces if you need them. Unfortunately, I still have no idea how to reproduce it.

@AsterDY
Collaborator

AsterDY commented Oct 31, 2022

Can you try wrapping your real-world code in a unit test and running go test -race -gcflags=-d=checkptr to check whether anything panics?
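One possible shape for such a test is sketched below. Everything here is an assumption for illustration: marshalFunc, hammer, and demo are hypothetical names, and encoding/json stands in for the real encoder so the sketch runs without extra dependencies. In a real project you would pass your own marshal function and payloads, move the body into a _test.go file, and run it with the flags above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// marshalFunc is a hypothetical seam so the encoder under test can be swapped.
type marshalFunc func(v interface{}) ([]byte, error)

// hammer calls marshal on v from several goroutines at once and returns the
// first error seen, or nil. Under -race this also flags concurrent writes to v.
func hammer(marshal marshalFunc, v interface{}, workers, iters int) error {
	var wg sync.WaitGroup
	errs := make(chan error, workers) // each worker sends at most one error
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < iters; i++ {
				if _, err := marshal(v); err != nil {
					errs <- err
					return
				}
			}
		}()
	}
	wg.Wait()
	close(errs)
	return <-errs // nil when the channel is empty
}

// demo uses encoding/json as a stand-in for the encoder under test.
func demo() error {
	payload := map[string]interface{}{"id": 1, "tags": []string{"a", "b"}}
	return hammer(json.Marshal, payload, 8, 1000)
}

func main() {
	fmt.Println("error:", demo())
}
```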

@chenzhuoyu
Member

@AsterDY It's SIGPROF again, as shown in the stack backtrace.

@chenzhuoyu
Member

@OlegElizarov Do you have pprof or some other kind of profiler attached to your program? It's a known issue that Sonic is not very compatible with SIGPROF. We have received a few reports and are still working on this.
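For context, the standard library's CPU profiler is one common source of SIGPROF on Linux: while a CPU profile is active, the runtime receives SIGPROF periodically and walks the interrupted goroutine's stack, which matches the sigprof and gentraceback frames in the trace above. The sketch below only demonstrates turning that profiler on and off; profileForMillis is a hypothetical helper name, and it is not a reproduction of the crash.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
	"time"
)

// profileForMillis (hypothetical helper) enables CPU profiling, spins for
// about ms milliseconds so there is something to sample, then stops the
// profiler and returns the size of the profile that was written.
func profileForMillis(ms int) (int, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return 0, err // fails if a CPU profile is already running
	}

	deadline := time.Now().Add(time.Duration(ms) * time.Millisecond)
	spins := 0
	for time.Now().Before(deadline) {
		spins++ // busy work that the profiler can interrupt
	}
	_ = spins

	pprof.StopCPUProfile() // stops sampling and flushes the profile to buf
	return buf.Len(), nil
}

func main() {
	n, err := profileForMillis(100)
	fmt.Println("profile bytes:", n, "err:", err)
}
```

The net/http/pprof endpoint `/debug/pprof/profile` starts the same profiler on demand, so a service with pprof "always on" is only exposed to SIGPROF while a CPU profile is actually being collected.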

@OlegElizarov
Author

OlegElizarov commented Nov 7, 2022

Hello again. Thanks for your help, guys.

@AsterDY, we tried adding -race -gcflags=-d=checkptr, but no problems with sonic showed up in this mode.
Now we are trying to set GOTRACEBACK=crash to get a core dump, but after the pc-encoded table panic our pod restarts and we don't yet know how to save the dump. We are looking into it.

@chenzhuoyu, yep, pprof is always on in our service (it is something of a rule in our company). Is there anything that can be done, such as a workaround?
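As a side note on the core dump attempt: the GOTRACEBACK=crash setting can also be requested from inside the program with runtime/debug.SetTraceback, which may be easier than changing a pod's environment. This is a minimal sketch; enableCrashTraceback is a hypothetical helper name. The operating system must still permit core files (for example a non-zero `ulimit -c`), and the dump has to land on storage that survives the pod restart.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// enableCrashTraceback (hypothetical helper) asks the runtime to behave as if
// GOTRACEBACK=crash were set: on a fatal error it prints all goroutines and
// then aborts in an OS-specific way, which on Linux can produce a core dump.
func enableCrashTraceback() {
	debug.SetTraceback("crash")
}

func main() {
	enableCrashTraceback()
	fmt.Println("traceback level raised to crash")
}
```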

@AsterDY
Collaborator

AsterDY commented Nov 15, 2022

Go pprof won't make sonic crash, as far as we have tested... Maybe it is introduced by some other kind of profiling.

@AsterDY
Collaborator

AsterDY commented Feb 9, 2023

At present, because Go assembly doesn't support generating PCSP pcdata for stack-unstable functions, the program will crash if the runtime calls gentraceback() while one of sonic's native functions is running. We are trying to resolve this, but a lot of work remains.

@tianxiaogu

tianxiaogu commented Feb 9, 2023

The root cause may be that the output of asm2asm.py contains many pushq/popq instructions, but these instructions are encoded via .byte/.word/.long directives. The Go assembler uses a trivial algorithm to compute PCSP. In my understanding, it does a linear scan of the instructions, which implicitly requires all push/pop pairs to be in the same basic block.

  1. https://github.com/golang/go/blob/a4d5fbc3a48b63f19fcd2a4d040a85c75a2709b5/src/cmd/internal/obj/x86/obj6.go#L843
pushq // +8
if <>
  pop   // -8 
else 
  pop   // -8
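The push/if/pop shape above can be modeled with a toy program to show why a linear scan and the real control flow disagree. This is only an illustration under stated assumptions: op, inst, linearScan, flowScan, and sample are hypothetical names, and the code is not taken from the Go assembler. It compares the SP offset a layout-order scan would record before each instruction with the offset seen when following branches.

```go
package main

import "fmt"

type op int

const (
	push op = iota // SP offset +8
	pop            // SP offset -8
	jcc            // conditional jump: either goes to target or falls through
	jmp            // unconditional jump to target
	ret
)

type inst struct {
	op     op
	target int // index of the jump target, used by jcc and jmp
}

func delta(o op) int {
	switch o {
	case push:
		return 8
	case pop:
		return -8
	}
	return 0
}

// linearScan records the offset before each instruction by summing deltas in
// layout order, ignoring branches.
func linearScan(prog []inst) []int {
	out := make([]int, len(prog))
	sp := 0
	for i, in := range prog {
		out[i] = sp
		sp += delta(in.op)
	}
	return out
}

// flowScan records the offset before each instruction by following the
// control-flow edges from the entry point.
func flowScan(prog []inst) []int {
	out := make([]int, len(prog))
	seen := make([]bool, len(prog))
	var walk func(i, sp int)
	walk = func(i, sp int) {
		for i < len(prog) && !seen[i] {
			seen[i] = true
			out[i] = sp
			in := prog[i]
			sp += delta(in.op)
			switch in.op {
			case jcc:
				walk(in.target, sp) // taken branch
				i++                 // fall-through
			case jmp:
				i = in.target
			case ret:
				return
			default:
				i++
			}
		}
	}
	walk(0, 0)
	return out
}

// sample encodes: push; if cond goto 4; pop; goto 5; pop; ret
func sample() []inst {
	return []inst{
		{op: push}, {op: jcc, target: 4}, {op: pop},
		{op: jmp, target: 5}, {op: pop}, {op: ret},
	}
}

func main() {
	fmt.Println("linear:", linearScan(sample())) // linear: [0 8 8 0 0 -8]
	fmt.Println("flow:  ", flowScan(sample()))   // flow:   [0 8 8 0 8 0]
}
```

The two results differ at the second pop and at ret, which is the kind of mismatch that would give a stack unwinder the wrong frame size at those program counters.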

Since it may be difficult to ask the Go team to improve the assembler, a possible solution is to use the naked attribute for C functions (perhaps by modifying the emitted LLVM IR) and to insert the C prologue/epilogue manually in asm2asm.py without using push and pop.

  1. https://llvm.org/docs/LangRef.html#id1695

Note that this still cannot prevent LLVM from generating instructions that adjust the stack pointer.

  1. LONG $0xe0e48348 // andq $-32, %rsp
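To see how that LONG directive hides the instruction from the assembler, the sketch below expands the 32-bit literal into the little-endian bytes that end up in the binary and sign-extends the trailing immediate. longBytes and imm8 are hypothetical helper names; the byte-level reading (REX.W prefix 0x48, opcode 0x83, ModRM 0xe4 selecting AND on %rsp, immediate 0xe0) follows the x86-64 encoding of `andq $-32, %rsp`.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// longBytes returns the four bytes that a `LONG $v` data directive emits on
// amd64, where multi-byte values are stored little-endian.
func longBytes(v uint32) []byte {
	b := make([]byte, 4)
	binary.LittleEndian.PutUint32(b, v)
	return b
}

// imm8 interprets the last byte as a sign-extended 8-bit immediate.
func imm8(b []byte) int8 {
	return int8(b[len(b)-1])
}

func main() {
	b := longBytes(0xe0e48348)
	fmt.Printf("% x\n", b) // 48 83 e4 e0
	fmt.Println(imm8(b))   // -32
}
```

Because the assembler only sees four opaque data bytes, it has no way to account for the stack-pointer change they perform.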

Another potential issue is stack alignment. On amd64 Linux, the stack is aligned to 16 bytes so that movaps can operate on stack memory operands without stack realignment.

  1. https://leria-info.univ-angers.fr/~jeanmichel.richer/assembly/doc/movaps.pdf
  2. https://gitlab.com/x86-psABIs/x86-64-ABI
  3. https://github.com/llvm/llvm-project/blob/e5906f64a63d064e6a9ea2c46ebc9c285ca02bd1/llvm/lib/Target/X86/X86Subtarget.cpp#L288

The weird thing is that Go aligns the stack to only 8 bytes on amd64.

  1. https://github.com/golang/go/blob/master/src/cmd/compile/abi-internal.md#stack-layout

Go realigns the stack when calling into C via cgo.

  1. https://github.com/golang/go/blob/a4d5fbc3a48b63f19fcd2a4d040a85c75a2709b5/src/runtime/asm_amd64.s#L797

Fortunately, there are only a few potentially dangerous instructions in native_amd64.s, and they are also guarded by stack realignment.

  1. LONG $0x4c297cc5; WORD $0x4024 // vmovaps %ymm9, $64(%rsp)
  2. LONG $0xe0e48348 // andq $-32, %rsp
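The alignment arithmetic behind these points can be checked with a few lines of Go. This is a sketch with hypothetical helper names (isAligned, alignDown); the sample address is the frame pointer value printed in the stack dump at the top of this issue, used here only as a convenient 8-byte-aligned number. ANDing with -32 is the same as clearing the low five bits, which rounds the address down to a 32-byte boundary.

```go
package main

import "fmt"

// isAligned reports whether addr is a multiple of n (n must be a power of two).
func isAligned(addr, n uint64) bool {
	return addr&(n-1) == 0
}

// alignDown rounds addr down to a multiple of n (n must be a power of two).
// For n = 32 this is the effect of `andq $-32, %rsp`.
func alignDown(addr, n uint64) uint64 {
	return addr &^ (n - 1)
}

func main() {
	sp := uint64(0xc00010b388) // 8-byte aligned, as Go guarantees on amd64

	fmt.Println(isAligned(sp, 8), isAligned(sp, 16), isAligned(sp, 32)) // true false false
	fmt.Printf("%#x\n", alignDown(sp, 32))                              // 0xc00010b380
}
```

An aligned 32-byte store such as vmovaps with a %ymm register would fault on the original address, while the realigned one is safe, which is why the realignment guards the instructions listed above.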

@liuq19
Collaborator

liuq19 commented Jun 19, 2024

fixed in #644

@liuq19 closed this as completed Jun 19, 2024