Meaningful error message required rather than core dump #109

Closed
tjolliffe opened this issue Nov 9, 2017 · 9 comments

@tjolliffe

Hi Richard,
I tried to use siegfried on a zip file which was too large to process (46gb in size).
Siegfried attempted to load the whole zip into memory and failed, displaying the message below.
An out-of-memory error message would be better than a core dump in this instance.

Y:\XXXX\Converted videos, film\Consignment AV>sf -z -csv "Consignment AV.zip"

\XXXX\Working\XXXX_ConsignmentAV_sf.csv
Exception 0xc0000006 0x0 0x440629dfc 0x4581f5
PC=0x4581f5

github.com/richardlehane/siegfried/internal/siegreader.(*Reader).ReadAt(0xc04253
68c0, 0xc0426345e6, 0x7010, 0x7a1a, 0x3c0632e6c, 0xa36330, 0x3c655e8, 0xd0)
c:/gopath/src/github.com/richardlehane/siegfried/internal/siegreader/rea
der.go:123 +0xd6
io.(*SectionReader).Read(0xc045cfa2a0, 0xc0426345e6, 0x7010, 0x7a1a, 0xc045c9f26
0, 0xc04201b080, 0xc045c9f260)
C:/go/src/io/io.go:465 +0x83
bufio.(*Reader).Read(0xc045b8d4a0, 0xc0426345e6, 0x7010, 0x7a1a, 0x5e6, 0x0, 0x0
)
C:/go/src/bufio/bufio.go:199 +0x1aa
io.ReadAtLeast(0x9c49a0, 0xc045b8d4a0, 0xc042634000, 0x75f6, 0x8000, 0x75f6, 0x7
d3b80, 0xc045c9f300, 0x9c49a0)
C:/go/src/io/io.go:309 +0x8d
io.ReadFull(0x9c49a0, 0xc045b8d4a0, 0xc042634000, 0x75f6, 0x8000, 0xc045c9f3b0,
0x411a6d, 0xc042046018)
C:/go/src/io/io.go:327 +0x5f
compress/flate.(*decompressor).copyData(0xc04257d300)
C:/go/src/compress/flate/inflate.go:663 +0xf5
compress/flate.(*decompressor).Read(0xc04257d300, 0xc1c5cd6000, 0x1000, 0x800000
00, 0x0, 0x100000000, 0xc145cd6000)
C:/go/src/compress/flate/inflate.go:347 +0x79
archive/zip.(*pooledFlateReader).Read(0xc045cf66a0, 0xc1c5cd6000, 0x1000, 0x8000
0000, 0x0, 0x0, 0x0)
C:/go/src/archive/zip/register.go:90 +0x139
archive/zip.(*checksumReader).Read(0xc045bbee10, 0xc1c5cd6000, 0x1000, 0x8000000
0, 0x100000000, 0x100000000, 0x0)
C:/go/src/archive/zip/reader.go:194 +0x7f
github.com/richardlehane/siegfried/internal/siegreader.(*stream).fill(0xc0422316
80, 0x80000000, 0x0, 0x0)
c:/gopath/src/github.com/richardlehane/siegfried/internal/siegreader/str
eam.go:76 +0xd3
github.com/richardlehane/siegfried/internal/siegreader.(*stream).CanSeek(0xc0422
31680, 0x0, 0xc042221001, 0xc045d00400, 0x0, 0x0)
c:/gopath/src/github.com/richardlehane/siegfried/internal/siegreader/str
eam.go:155 +0x229
github.com/richardlehane/siegfried/internal/bytematcher.(*Matcher).identify(0xc0
4207d8c0, 0xc045cf66c0, 0xc045cf2c00, 0xc045cf2c60, 0xc044701b48, 0x0, 0x1)
c:/gopath/src/github.com/richardlehane/siegfried/internal/bytematcher/id
entify.go:96 +0x15d6
created by github.com/richardlehane/siegfried/internal/bytematcher.(*Matcher).Id
entify
c:/gopath/src/github.com/richardlehane/siegfried/internal/bytematcher/by
tematcher.go:173 +0xd3

goroutine 1 [chan receive, 3 minutes]:
github.com/richardlehane/siegfried.(*Siegfried).IdentifyBuffer(0xc04207d970, 0xc
045cf66c0, 0x0, 0x0, 0xc045bc0ee0, 0x70, 0x0, 0x0, 0xc04202d150, 0x47d227, ...)
c:/gopath/src/github.com/richardlehane/siegfried/siegfried.go:385 +0x103
4
main.identifyRdr(0x3c30030, 0xc045bbee10, 0xc045be0000, 0xc042034fc0, 0x809918)
c:/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:205 +0x127

main.identifyRdr(0x9c5960, 0xc0424386c8, 0xc04239e580, 0xc042034fc0, 0x809918)
c:/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:266 +0x59f

main.readFile(0xc04239e580, 0xc042034fc0, 0x809918)
c:/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:183 +0xa9
main.identifyFile(0xc04239e580, 0xc042034fc0, 0x809918)
c:/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:191 +0x97
main.identify.func1(0xc0420480a0, 0x12, 0x9cc400, 0xc042035020, 0x0, 0x0, 0xc042
5edbf0, 0xc0420e61a0)
c:/gopath/src/github.com/richardlehane/siegfried/cmd/sf/longpath_windows
.go:113 +0x4fb
path/filepath.walk(0xc0420480a0, 0x12, 0x9cc400, 0xc042035020, 0xc042231590, 0x0
, 0x50)
C:/go/src/path/filepath/path.go:356 +0x88
path/filepath.Walk(0xc0420480a0, 0x12, 0xc042231590, 0x7, 0x0)
C:/go/src/path/filepath/path.go:403 +0x124
main.identify(0xc042034fc0, 0xc0420480a0, 0x12, 0x0, 0x0, 0x0, 0x809918, 0xed194
6cf2, 0xa16ca0)
c:/gopath/src/github.com/richardlehane/siegfried/cmd/sf/longpath_windows
.go:116 +0xe3
main.main()
c:/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:466 +0xad4

goroutine 4 [chan receive, 3 minutes]:
main.printer(0xc042034fc0, 0xc042231540)
c:/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:153 +0xba
created by main.main
c:/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:388 +0x8a1

goroutine 287 [chan receive, 3 minutes]:
github.com/richardlehane/siegfried/internal/bytematcher.(*Matcher).scorer.func6(
0xc045cf2cc0, 0xc044701b60, 0xc044701b58, 0xc044701b50, 0xc045cfa300, 0xc04207d8
c0, 0xc045cfa330, 0xc045cfa390, 0xc045cf6780, 0xc045cfa360, ...)
c:/gopath/src/github.com/richardlehane/siegfried/internal/bytematcher/sc
orer.go:390 +0x57
created by github.com/richardlehane/siegfried/internal/bytematcher.(*Matcher).sc
orer
c:/gopath/src/github.com/richardlehane/siegfried/internal/bytematcher/sc
orer.go:389 +0x3cf

goroutine 321 [semacquire, 3 minutes]:
sync.runtime_SemacquireMutex(0xc0422316bc, 0xc045b33d00)
C:/go/src/runtime/sema.go:71 +0x44
sync.(*Mutex).Lock(0xc0422316b8)
C:/go/src/sync/mutex.go:134 +0xf5
github.com/richardlehane/siegfried/internal/siegreader.(*stream).Slice(0xc042231
680, 0x41000, 0x1000, 0x0, 0x0, 0x0, 0x0, 0x0)
c:/gopath/src/github.com/richardlehane/siegfried/internal/siegreader/str
eam.go:100 +0x74
github.com/richardlehane/siegfried/internal/siegreader.(*Reader).setBuf(0xc045ce
5480, 0x41000, 0x0, 0x0)
c:/gopath/src/github.com/richardlehane/siegfried/internal/siegreader/rea
der.go:50 +0x56
github.com/richardlehane/siegfried/internal/siegreader.(*Reader).ReadByte(0xc045
ce5480, 0xc045b33f05, 0x0, 0x0)
c:/gopath/src/github.com/richardlehane/siegfried/internal/siegreader/rea
der.go:70 +0x86
github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac.(*
fwac).match(0xc04256bb20, 0x9c4e20, 0xc045ce5480, 0xc045cf2d80)
c:/gopath/src/github.com/richardlehane/siegfried/vendor/github.com/richa
rdlehane/match/fwac/fwac.go:448 +0x2be
created by github.com/richardlehane/siegfried/vendor/github.com/richardlehane/ma
tch/fwac.(*fwac).Index
c:/gopath/src/github.com/richardlehane/siegfried/vendor/github.com/richa
rdlehane/match/fwac/fwac.go:439 +0x86

goroutine 299 [select, 3 minutes]:
github.com/richardlehane/siegfried/internal/siegreader.(*stream).EofSlice(0xc042
231680, 0x0, 0x1000, 0x0, 0x0, 0x0, 0x0, 0x0)
c:/gopath/src/github.com/richardlehane/siegfried/internal/siegreader/str
eam.go:132 +0x12a
github.com/richardlehane/siegfried/internal/siegreader.(*ReverseReader).setBuf(0
xc045d12140, 0x0, 0x0, 0x0)
c:/gopath/src/github.com/richardlehane/siegfried/internal/siegreader/rea
der.go:169 +0x56
github.com/richardlehane/siegfried/internal/siegreader.(*ReverseReader).ReadByte
(0xc045d12140, 0xc04256b920, 0x7706c0, 0xc045c7ee60)
c:/gopath/src/github.com/richardlehane/siegfried/internal/siegreader/rea
der.go:212 +0x86
github.com/richardlehane/siegfried/internal/siegreader.(*LimitReverseReader).Rea
dByte(0xc042221040, 0xc045770a80, 0x65, 0x65)
c:/gopath/src/github.com/richardlehane/siegfried/internal/siegreader/rea
der.go:274 +0x68
github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac.(*
fwac).match(0xc04256b940, 0x9c4de0, 0xc042221040, 0xc045d00480)
c:/gopath/src/github.com/richardlehane/siegfried/vendor/github.com/richa
rdlehane/match/fwac/fwac.go:448 +0xd0
created by github.com/richardlehane/siegfried/vendor/github.com/richardlehane/ma
tch/fwac.(*fwac).Index
c:/gopath/src/github.com/richardlehane/siegfried/vendor/github.com/richa
rdlehane/match/fwac/fwac.go:439 +0x86
rax 0x440622e6c
rbx 0x7010
rcx 0x440629e7c
rdi 0xc0426345e6
rsi 0x440622e6c
rbp 0xc045c9f1a0
rsp 0xc045c9f138
r8 0x1
r9 0x0
r10 0xc0426345e6
r11 0x20
r12 0x0
r13 0x0
r14 0x456320
r15 0x0
rip 0x4581f5
rflags 0x10283
cs 0x33
fs 0x53
gs 0x2b

Y:\XXXX\Converted videos, film\Consignment AV>

@richardlehane richardlehane self-assigned this Nov 9, 2017
@richardlehane
Owner

Hey Terry - I've reproduced this locally with a synthetic file (a cool tool on Windows for generating files of arbitrary size full of random bytes: RDFC).

My computer heroically survived unzipping a 2GB file and a 5GB file, but finally choked on a 15GB file, giving a runtime error:

runtime: out of memory: cannot allocate 17179869184-byte block (17242652672 in use)
fatal error: out of memory

runtime stack:
runtime.throw(0x7f1b7a, 0xd)
C:/tools/go/src/runtime/panic.go:605 +0x9c
runtime.largeAlloc(0x400000000, 0x1b0101, 0xc04e53e305)
C:/tools/go/src/runtime/malloc.go:829 +0x11b
runtime.mallocgc.func1()
C:/tools/go/src/runtime/malloc.go:722 +0x4d
runtime.systemstack(0xc042018600)
C:/tools/go/src/runtime/asm_amd64.s:344 +0x7e
runtime.mstart()
C:/tools/go/src/runtime/proc.go:1125

goroutine 11 [running]:
runtime.systemstack_switch()
C:/tools/go/src/runtime/asm_amd64.s:298 fp=0xc0420293f8 sp=0xc0420293f0 pc=0x4549e0
runtime.mallocgc(0x400000000, 0x763dc0, 0x1, 0xc042539628)
C:/tools/go/src/runtime/malloc.go:721 +0x7f7 fp=0xc0420294a0 sp=0xc0420293f8 pc=0x410df7
runtime.makeslice(0x763dc0, 0x400000000, 0x400000000, 0x1000, 0x1000, 0x0)
C:/tools/go/src/runtime/slice.go:54 +0x7e fp=0xc0420294d0 sp=0xc0420294a0 pc=0x44006e
github.com/richardlehane/siegfried/internal/siegreader.(*stream).grow(...)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/stream.go:56
github.com/richardlehane/siegfried/internal/siegreader.(*stream).fill(0xc042539630, 0x200000000, 0x0, 0x0)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/stream.go:70 +0x259 fp=0xc042029558 sp=0xc0420294d0 pc=0x6556e9
github.com/richardlehane/siegfried/internal/siegreader.(*stream).CanSeek(0xc042539630, 0x0, 0xc04202f201, 0xc04586e000, 0x0, 0x0)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/stream.go:155 +0x229 fp=0xc0420295b0 sp=0xc042029558 pc=0x656179
github.com/richardlehane/siegfried/internal/bytematcher.(*Matcher).identify(0xc0420698c0, 0xc04255f7c0, 0xc04203aa80, 0xc04203aae0, 0xc042607920, 0x0, 0x1)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/bytematcher/identify.go:96 +0x15d6 fp=0xc042029fa8 sp=0xc0420295b0 pc=0x68e086
runtime.goexit()
C:/tools/go/src/runtime/asm_amd64.s:2337 +0x1 fp=0xc042029fb0 sp=0xc042029fa8 pc=0x457681
created by github.com/richardlehane/siegfried/internal/bytematcher.(*Matcher).Identify
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/bytematcher/bytematcher.go:173 +0xd3

goroutine 1 [chan receive, 3 minutes]:
github.com/richardlehane/siegfried.(*Siegfried).IdentifyBuffer(0xc042069970, 0xc04255f7c0, 0x0, 0x0, 0xc0425e34c0, 0x1f, 0x0, 0x0, 0xc04202d150, 0x47d227, ...)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/siegfried.go:385 +0x1034
main.identifyRdr(0x1080820, 0xc0425395e0, 0xc042380600, 0xc04254b680, 0x809fc0)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:205 +0x127
main.identifyRdr(0x9c7960, 0xc0420047b0, 0xc042380580, 0xc04254b680, 0x809fc0)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:266 +0x59f
main.readFile(0xc042380580, 0xc04254b680, 0x809fc0)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:183 +0xa9
main.identifyFile(0xc042380580, 0xc04254b680, 0x809fc0)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:191 +0x97
main.identify.func1(0xc04200c100, 0xf, 0x9ce440, 0xc04254b6e0, 0x0, 0x0, 0xc0425e3380, 0xc0420ce1a0)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/longpath_windows.go:113 +0x4fb
path/filepath.walk(0xc04200c100, 0xf, 0x9ce440, 0xc04254b6e0, 0xc042539590, 0x0, 0x50)
C:/tools/go/src/path/filepath/path.go:356 +0x88
path/filepath.Walk(0xc04200c100, 0xf, 0xc042539590, 0x763c00, 0xc04202edf0)
C:/tools/go/src/path/filepath/path.go:403 +0x124
main.identify(0xc04254b680, 0xc04200c100, 0xf, 0x0, 0x0, 0x0, 0x809fc0, 0xed171eefe, 0xa18d00)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/longpath_windows.go:116 +0xe3
main.main()
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:466 +0xad4

goroutine 9 [chan receive, 3 minutes]:
main.printer(0xc04254b680, 0xc0425394f0)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:153 +0xba
created by main.main
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:388 +0x8a1

goroutine 12 [chan receive, 3 minutes]:
github.com/richardlehane/siegfried/internal/bytematcher.(*Matcher).scorer.func6(0xc04203ab40, 0xc042607938, 0xc042607930, 0xc042607928, 0xc0421e2900, 0xc0420698c0, 0xc0421e2930, 0xc0421e2990, 0xc04255f860, 0xc0421e2960, ...)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/bytematcher/scorer.go:390 +0x57
created by github.com/richardlehane/siegfried/internal/bytematcher.(*Matcher).scorer
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/bytematcher/scorer.go:389 +0x3cf

goroutine 14 [semacquire, 3 minutes]:
sync.runtime_SemacquireMutex(0xc04253966c, 0x411900)
C:/tools/go/src/runtime/sema.go:71 +0x44
sync.(*Mutex).Lock(0xc042539668)
C:/tools/go/src/sync/mutex.go:134 +0xf5
github.com/richardlehane/siegfried/internal/siegreader.(*stream).Slice(0xc042539630, 0x59000, 0x1000, 0x0, 0x0, 0x0, 0x0, 0x0)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/stream.go:100 +0x74
github.com/richardlehane/siegfried/internal/siegreader.(*Reader).setBuf(0xc042511780, 0x59000, 0x0, 0x0)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/reader.go:50 +0x56
github.com/richardlehane/siegfried/internal/siegreader.(*Reader).ReadByte(0xc042511780, 0xc04260bf00, 0x0, 0x0)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/reader.go:70 +0x86
github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac.(*fwac).match(0xc04255fb80, 0x9c6e20, 0xc042511780, 0xc04203a960)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac/fwac.go:448 +0x2be
created by github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac.(*fwac).Index
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac/fwac.go:439 +0x86

goroutine 18 [select, 3 minutes]:
github.com/richardlehane/siegfried/internal/siegreader.(*stream).EofSlice(0xc042539630, 0x0, 0x1000, 0x0, 0x0, 0x0, 0x0, 0x0)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/stream.go:132 +0x12a
github.com/richardlehane/siegfried/internal/siegreader.(*ReverseReader).setBuf(0xc0421fd4c0, 0x0, 0x0, 0x0)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/reader.go:169 +0x56
github.com/richardlehane/siegfried/internal/siegreader.(*ReverseReader).ReadByte(0xc0421fd4c0, 0xc045ac21e0, 0x770c00, 0xc04255fc20)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/reader.go:212 +0x86
github.com/richardlehane/siegfried/internal/siegreader.(*LimitReverseReader).ReadByte(0xc04202f2d0, 0xc045758a80, 0x65, 0x65)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/reader.go:274 +0x68
github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac.(*fwac).match(0xc045ac2200, 0x9c6de0, 0xc04202f2d0, 0xc04586e060)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac/fwac.go:448 +0xd0
created by github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac.(*fwac).Index
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac/fwac.go:439 +0x86

It may be a little hard to guard against this error, given that computers all differ in their RAM capacity, so I can't just set an arbitrary limit of, say, 3GB of zipped content. But I'll see what can be done.

@richardlehane
Owner

p.s. interesting that I got a golang runtime panic where you got a Windows OS exception... maybe because you were accessing the file over the network?

@richardlehane
Owner

hmmm, I tried it on the same zip that broke for you TJ, but I didn't get your error; I got one that looks a lot like my synthetic file error:

runtime: out of memory: cannot allocate 17179869184-byte block (17244651520 in use)
fatal error: out of memory

runtime stack:
runtime.throw(0x7f1b7a, 0xd)
C:/tools/go/src/runtime/panic.go:605 +0x9c
runtime.largeAlloc(0x400000000, 0x1b0101, 0x434006)
C:/tools/go/src/runtime/malloc.go:829 +0x11b
runtime.mallocgc.func1()
C:/tools/go/src/runtime/malloc.go:722 +0x4d
runtime.systemstack(0xa193d8)
C:/tools/go/src/runtime/asm_amd64.s:344 +0x7e
runtime.mstart()
C:/tools/go/src/runtime/proc.go:1125

goroutine 277 [running]:
runtime.systemstack_switch()
C:/tools/go/src/runtime/asm_amd64.s:298 fp=0xc04202d3f8 sp=0xc04202d3f0 pc=0x4549e0
runtime.mallocgc(0x400000000, 0x763dc0, 0x1, 0xc042075e48)
C:/tools/go/src/runtime/malloc.go:721 +0x7f7 fp=0xc04202d4a0 sp=0xc04202d3f8 pc=0x410df7
runtime.makeslice(0x763dc0, 0x400000000, 0x400000000, 0x1000, 0x1000, 0x0)
C:/tools/go/src/runtime/slice.go:54 +0x7e fp=0xc04202d4d0 sp=0xc04202d4a0 pc=0x44006e
github.com/richardlehane/siegfried/internal/siegreader.(*stream).grow(...)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/stream.go:56
github.com/richardlehane/siegfried/internal/siegreader.(*stream).fill(0xc042447bd0, 0x200000000, 0x0, 0x0)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/stream.go:70 +0x259 fp=0xc04202d558 sp=0xc04202d4d0 pc=0x6556e9
github.com/richardlehane/siegfried/internal/siegreader.(*stream).CanSeek(0xc042447bd0, 0x0, 0xc045bf0b01, 0xc045dbc600, 0x0, 0x0)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/stream.go:155 +0x229 fp=0xc04202d5b0 sp=0xc04202d558 pc=0x656179
github.com/richardlehane/siegfried/internal/bytematcher.(*Matcher).identify(0xc0420638c0, 0xc045da0280, 0xc045dca0c0, 0xc045dca120, 0xc045d62b78, 0x0, 0x1)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/bytematcher/identify.go:96 +0x15d6 fp=0xc04202dfa8 sp=0xc04202d5b0 pc=0x68e086
runtime.goexit()
C:/tools/go/src/runtime/asm_amd64.s:2337 +0x1 fp=0xc04202dfb0 sp=0xc04202dfa8 pc=0x457681
created by github.com/richardlehane/siegfried/internal/bytematcher.(*Matcher).Identify
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/bytematcher/bytematcher.go:173 +0xd3

goroutine 1 [chan receive, 9 minutes]:
github.com/richardlehane/siegfried.(*Siegfried).IdentifyBuffer(0xc042063970, 0xc045da0280, 0x0, 0x0, 0xc042015e30, 0x70, 0x0, 0x0, 0xc042029150, 0x47d227, ...)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/siegfried.go:385 +0x1034
main.identifyRdr(0x16c0130, 0xc042075e00, 0xc04237a600, 0xc04254f020, 0x809fc0)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:205 +0x127
main.identifyRdr(0x9c7960, 0xc04243a140, 0xc04237a580, 0xc04254f020, 0x809fc0)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:266 +0x59f
main.readFile(0xc04237a580, 0xc04254f020, 0x809fc0)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:183 +0xa9
main.identifyFile(0xc04237a580, 0xc04254f020, 0x809fc0)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:191 +0x97
main.identify.func1(0xc04200a300, 0x12, 0x9ce440, 0xc04254f080, 0x0, 0x0, 0xc042009c20, 0xc0420c81a0)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/longpath_windows.go:113 +0x4fb
path/filepath.walk(0xc04200a300, 0x12, 0x9ce440, 0xc04254f080, 0xc042447ae0, 0x0, 0x50)
C:/tools/go/src/path/filepath/path.go:356 +0x88
path/filepath.Walk(0xc04200a300, 0x12, 0xc042447ae0, 0x763c00, 0xc04246c480)
C:/tools/go/src/path/filepath/path.go:403 +0x124
main.identify(0xc04254f020, 0xc04200a300, 0x12, 0x0, 0x0, 0x0, 0x809fc0, 0xed171eefe, 0xa18d00)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/longpath_windows.go:116 +0xe3
main.main()
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:466 +0xad4

goroutine 33 [chan receive, 9 minutes]:
main.printer(0xc04254f020, 0xc042447a40)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:153 +0xba
created by main.main
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:388 +0x8a1

goroutine 278 [chan receive, 9 minutes]:
github.com/richardlehane/siegfried/internal/bytematcher.(*Matcher).scorer.func6(0xc045dca180, 0xc045d62bb0, 0xc045d62ba8, 0xc045d62ba0, 0xc0457bfec0, 0xc0420638c0, 0xc0457bfef0, 0xc0457bff50, 0xc045da0340, 0xc0457bff20, ...)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/bytematcher/scorer.go:390 +0x57
created by github.com/richardlehane/siegfried/internal/bytematcher.(*Matcher).scorer
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/bytematcher/scorer.go:389 +0x3cf

goroutine 280 [semacquire, 9 minutes]:
sync.runtime_SemacquireMutex(0xc042447c0c, 0x80a700)
C:/tools/go/src/runtime/sema.go:71 +0x44
sync.(*Mutex).Lock(0xc042447c08)
C:/tools/go/src/sync/mutex.go:134 +0xf5
github.com/richardlehane/siegfried/internal/siegreader.(*stream).Slice(0xc042447bd0, 0x42000, 0x1000, 0x0, 0x0, 0x0, 0x0, 0x0)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/stream.go:100 +0x74
github.com/richardlehane/siegfried/internal/siegreader.(*Reader).setBuf(0xc045dbe280, 0x42000, 0x0, 0x0)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/reader.go:50 +0x56
github.com/richardlehane/siegfried/internal/siegreader.(*Reader).ReadByte(0xc045dbe280, 0xc045d1ff38, 0x0, 0x0)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/reader.go:70 +0x86
github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac.(*fwac).match(0xc0457ae040, 0x9c6e20, 0xc045dbe280, 0xc045dca240)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac/fwac.go:448 +0x2be
created by github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac.(*fwac).Index
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac/fwac.go:439 +0x86

goroutine 327 [select, 9 minutes]:
github.com/richardlehane/siegfried/internal/siegreader.(*stream).EofSlice(0xc042447bd0, 0x0, 0x1000, 0x0, 0x0, 0x0, 0x0, 0x0)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/stream.go:132 +0x12a
github.com/richardlehane/siegfried/internal/siegreader.(*ReverseReader).setBuf(0xc045d8f840, 0x0, 0x0, 0x0)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/reader.go:169 +0x56
github.com/richardlehane/siegfried/internal/siegreader.(*ReverseReader).ReadByte(0xc045d8f840, 0xc0457c0200, 0x770c00, 0xc045cb5d40)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/reader.go:212 +0x86
github.com/richardlehane/siegfried/internal/siegreader.(*LimitReverseReader).ReadByte(0xc045bf0bd0, 0xc045e8c000, 0x65, 0x65)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/reader.go:274 +0x68
github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac.(*fwac).match(0xc0457c0220, 0x9c6de0, 0xc045bf0bd0, 0xc045dbc6c0)
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac/fwac.go:448 +0xd0
created by github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac.(*fwac).Index
c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac/fwac.go:439 +0x86

@richardlehane
Owner

richardlehane commented Nov 17, 2017

TJ: I did some research into this. Detecting and preventing out of memory errors is evidently a hard problem! But the next release of golang (1.10) has something promising: they are working on "APIs for memory and CPU resource control". This will hopefully allow me to detect available memory before attempting to allocate a big slice.

So likely any fix to this won't land before golang 1.10 which is due early 2018.

In the meantime, if you are using the "-z" flag, be aware that if your compressed file contains really big files, you can hit these out of memory errors. A temporary solution is to unzip before scanning with siegfried.

@tjolliffe
Author

tjolliffe commented Nov 17, 2017

Thanks Richard. My default approach will be to unzip pre-SF scan from now on anyway.

@richardlehane
Owner

A possible alternate approach is to back-up stream contents to a temp file on disk. That way I won't need to reserve such a large chunk of memory. It is a little less tidy and may mean a significant slowdown in some scenarios but it will at least avoid things blowing up like this.

@richardlehane
Owner

I fixed it with the temp file approach... see no panic...
[screenshot]

... but it took 41 mins :(

Behaviour now is: if sf is reading from a stream (which it does for the contents of compressed files and when something is piped to sf -), then it will use up to ARBITRARY_LIMIT of RAM to copy the stream for scanning. Once ARBITRARY_LIMIT is hit, the remainder of the stream is copied to a temp file on disk. The latter is of course a lot slower because of the really heavy IO. But it puts a cap on memory use and avoids out of memory panics.

Picking the right ARBITRARY_LIMIT is a challenge: it really depends on how much RAM different users have to spend. Also consider that you can have streams within streams within streams (e.g. a zip file that contains another zip file that itself contains a zip file), so you might need multiples of the ARBITRARY_LIMIT. With the promised Golang 1.10 features for assessing available memory, I may be able to make this smarter in future.
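
To make the "multiples of the ARBITRARY_LIMIT" point concrete, here is a small Go sketch of the arithmetic. It assumes the worst case, where every level of a nested archive fills its in-memory buffer at the same time; whether sf actually holds all levels in RAM at once is not stated in this thread. The function names and the numbers in `main` are made up for illustration.

```go
package main

import "fmt"

// worstCaseRAM is the memory needed if every level of a nested archive
// fills its in-memory buffer (limit bytes each) at the same time.
func worstCaseRAM(limit int64, depth int) int64 {
	return limit * int64(depth)
}

// perStreamLimit divides a total RAM budget evenly across the expected
// nesting depth. A depth below 1 is treated as 1.
func perStreamLimit(budget int64, depth int) int64 {
	if depth < 1 {
		depth = 1
	}
	return budget / int64(depth)
}

func main() {
	const mib = int64(1) << 20
	// Hypothetical numbers: a 65 MiB limit and a zip in a zip in a zip.
	fmt.Println(worstCaseRAM(65*mib, 3)/mib, "MiB in the worst case")
	// Hypothetical budget: 3 GiB of RAM shared across three levels.
	fmt.Println(perStreamLimit(3072*mib, 3)/mib, "MiB per stream")
}
```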

Currently ARBITRARY_LIMIT is set to ~65MB. I'm open to suggestions for changes to this setting. It could also be made configurable with a flag (e.g. -zlimit) if anyone would use that. E.g. if you have a lot of warc.gz files that are 1GB in size (a common size I think for web harvesting), you'd probably want a 1GB ARBITRARY_LIMIT so you could unload these into RAM.

@tjolliffe
Author

I think an adjustable limit would be a good idea due to the wide variety in specs for user machines. Perhaps a short description in the help page could help users guesstimate their optimal ARBITRARY_LIMIT. Regarding the default limit size, it would be interesting to see how much faster it would be to process the same "Consignment AV.zip" test file if the ARBITRARY_LIMIT is set to 10 times the size (~650MB).

@richardlehane
Owner

[screenshot]

Did a couple of tests and the difference between full RAM and temp disk use is actually pretty marginal for me.
For a 3GB file, zipped: ~1m using full RAM, or ~1m 3secs using temp disk after 65MB. More than anything this shows how great SSDs are!
