Can't open a given nffile which nfdump does #11

Closed
gabrielmocan opened this issue Apr 9, 2024 · 12 comments

@gabrielmocan

Hi Pete,

I have an nffile that is decoded properly by classic nfdump 1.7.4-a16f86f but causes a panic when read with go-nfdump v0.0.4.

Below are some logs; I'm also attaching the sample.

root# nfdump -V
nfdump: Version: 1.7.4-a16f86f Options: NSEL-NEL ZSTD BZIP2 Date: 2024-04-07 15:15:55 +0200
root# nfdump -r nfcapd.202404090803 | head 
Date first seen            Event  XEvent Proto      Src IP Addr:Port          Dst IP Addr:Port     X-Src IP Addr:Port        X-Src IP Addr:Port   In Byte Out Byte
2024-04-09 08:08:07.000 <no-evt> <no-evt> UDP     170.245.223.82:56023 ->   104.237.172.31:443            0.0.0.0:0     ->          0.0.0.0:0        86000        0
2024-04-09 08:08:07.000 <no-evt> <no-evt> UDP      45.164.10.165:35258 ->   157.240.216.16:443            0.0.0.0:0     ->          0.0.0.0:0        1.3 M        0
2024-04-09 08:08:07.000 <no-evt> <no-evt> UDP      170.79.169.43:50342 ->    128.14.119.70:28571          0.0.0.0:0     ->          0.0.0.0:0        78000        0
2024-04-09 08:07:21.000 <no-evt> <no-evt> TCP     128.201.197.87:48740 ->  168.196.119.196:443            0.0.0.0:0     ->          0.0.0.0:0        1.2 M        0
2024-04-09 08:08:07.000 <no-evt> <no-evt> TCP      13.107.213.33:443   ->   190.89.233.250:53696          0.0.0.0:0     ->          0.0.0.0:0        1.5 M        0
2024-04-09 08:08:07.000 <no-evt> <no-evt> TCP     169.150.250.39:443   ->    170.245.67.26:52192          0.0.0.0:0     ->          0.0.0.0:0        1.5 M        0
2024-04-09 08:08:07.000 <no-evt> <no-evt> TCP     104.237.189.22:443   ->     168.194.15.8:60156          0.0.0.0:0     ->          0.0.0.0:0        1.5 M        0
2024-04-09 08:08:07.000 <no-evt> <no-evt> TCP     157.240.216.60:443   ->     45.182.170.9:7122           0.0.0.0:0     ->          0.0.0.0:0        4.3 M        0
2024-04-09 08:08:07.000 <no-evt> <no-evt> UDP    148.153.194.118:8953  ->   170.79.169.242:27538          0.0.0.0:0     ->          0.0.0.0:0        1.4 M        0

root# nfdumpNative nfcapd.202404090803
panic: runtime error: slice bounds out of range [:290] with capacity 172

goroutine 11 [running]:
github.com/phaag/go-nfdump.NewRecord(...)
	/go/pkg/mod/github.com/phaag/go-nfdump@v0.0.4/record.go:67
github.com/phaag/go-nfdump.(*NfFile).AllRecords.func1()
	/go/pkg/mod/github.com/phaag/go-nfdump@v0.0.4/nffile.go:280 +0x1655
created by github.com/phaag/go-nfdump.(*NfFile).AllRecords in goroutine 1
	/go/pkg/mod/github.com/phaag/go-nfdump@v0.0.4/nffile.go:268 +0x7b

broken.sample.zip

@phaag
Owner

phaag commented Apr 10, 2024

I'll have a look. Thanks for the sample! This always helps!

@phaag phaag closed this as completed in 36154a6 Apr 16, 2024
@phaag phaag added the bug Something isn't working label Apr 16, 2024
@phaag phaag self-assigned this Apr 16, 2024
@phaag
Owner

phaag commented Apr 16, 2024

It was indeed a boundary check error in the Go decoding code. I fixed that in master; a new release will follow.
nfdump already has a boundary check built in, and I also improved it in nfdump master. The boundary check skips bad records. For some reason, a single record in your file is corrupt.

@gabrielmocan
Author

@phaag I have yet another file that fails the boundary check and then panics. The sample is attached.

Record 64357: decoding error: Record body boundary check error
Record 64406: decoding error: Record body boundary check error
panic: runtime error: slice bounds out of range [:1635282] with capacity 1635280

goroutine 6 [running]:
github.com/phaag/go-nfdump.(*NfFile).AllRecords.func1()
        /Users/gabemocan-mw/go/pkg/mod/github.com/phaag/go-nfdump@v0.0.5/nffile.go:277 +0x14a0
created by github.com/phaag/go-nfdump.(*NfFile).AllRecords in goroutine 1
        /Users/gabemocan-mw/go/pkg/mod/github.com/phaag/go-nfdump@v0.0.5/nffile.go:270 +0x80
exit status 2

broken2.sample.zip

@phaag
Owner

phaag commented May 6, 2024

That sample is really corrupt! However, the code needs to exit gracefully or skip the bad data.

@phaag phaag reopened this May 6, 2024
@phaag
Owner

phaag commented May 6, 2024

A datablock is missing records. Do you have multiple processes writing to the same file?

% nfdump -v broken2.sample
File       : broken2.sample
Version    : 2 - not compressed
Created    : 2024-05-05 13:52:00
Created by : nfcapd
nfdump     : f1070400
encryption : no
Appdx blks : 1
Data blks  : 6
Checking data blocks
Block 5 num records 9255 != counted records: 9250

@gabrielmocan
Author

> A datablock is missing records. Do you have multiple processes writing to the same file?

It's a single nfcapd -n ... -n ... -n ... process with multiple directories, one for each exporter, so no, there are not multiple processes writing to the same file.

But I suspect something is wrong with the VM hosting this collector. I'm having segfaults I can't explain in my processing code, although there are no errors from the nfcapd process. Maybe a physical memory fault or faulty storage; I'm still not sure.

> That sample is really corrupt! However, the code needs to exit gracefully or skip the bad data.

For now that would do the trick, just to avoid the panics.

@phaag
Owner

phaag commented May 6, 2024

I added another data block boundary check! It reports an error, but no longer crashes!

@phaag
Owner

phaag commented May 6, 2024

Have you checked the syslog file? Are there any specific error messages from the collector?

@gabrielmocan
Author

> I added another data block boundary check! It reports an error, but no longer crashes!

Thanks! Will try right away.

> Have you checked the syslog file? Are there any specific error messages from the collector?

Apparently there are no errors on the collector side. I run it in a dedicated container, and the logs are clean.

@gabrielmocan
Author

> I added another data block boundary check! It reports an error, but no longer crashes!

I think the output could be less verbose: the "Next block ..." lines aren't needed; the error log is what matters.

go run . nfdumpNative ../tests/samples/broken2.sample
Next block - type: 3, records: 11892, size: 2097072  
Next block - type: 3, records: 11883, size: 2097060
Next block - type: 3, records: 11874, size: 2097048
Next block - type: 3, records: 11881, size: 2097100
Next block - type: 3, records: 11857, size: 2097004
Next block - type: 3, records: 9255, size: 1635280
Record 64357: decoding error: Record body boundary check error
Record 64406: decoding error: Record body boundary check error
DataBlock error: count: 9255, size: 1635280. Found: 9250, size: 1635280

@phaag
Owner

phaag commented May 6, 2024

Sorry - fixed.

@phaag phaag closed this as completed May 7, 2024
@gabrielmocan
Author

@phaag just to follow up: this VM had faulty memory, which is why the files were so corrupted. Still, we made the code more resilient, which is good anyway.

> But I suspect something is wrong with the VM hosting this collector. I'm having segfaults I can't explain in my processing code, although there are no errors from the nfcapd process. Maybe a physical memory fault or faulty storage; I'm still not sure.
