Fix processing of PCAP files with trimmed packets #657

Open: wants to merge 2 commits into master

Phikimon

Network traffic datasets often omit the actual packet contents to reduce the dataset volume and, probably, for the sake of privacy. They do so with tcpdump's --snapshot-length option or with scripts that preserve only the headers up to the transport layer. One example is the MAWI Lab datasets, where packets are trimmed to lengths between 34 and 96 bytes depending on the packet type.
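To see why trimming needs a separate length field, consider what a snapshot length does: it keeps only the first bytes of a packet, while the headers inside those bytes still describe the full packet. Here is a minimal sketch with a synthetic IPv4/TCP packet. The `snapshot` helper and all values are hypothetical and are not taken from a MAWI trace:

```python
import struct
from typing import Tuple

def snapshot(packet: bytes, snaplen: int) -> Tuple[bytes, int, int]:
    """Keep at most `snaplen` bytes, as a snapshot length does.

    Returns (stored, caplen, orig_len), using pcap's field names."""
    stored = packet[:snaplen]
    return stored, len(stored), len(packet)

# Hypothetical IPv4 + TCP packet carrying 1420 payload bytes. All header
# fields are zero except version/IHL, total length, TTL and protocol,
# so this is not a valid packet on the wire.
PAYLOAD_LEN = 1420
total_len = 20 + 20 + PAYLOAD_LEN
ip_hdr = struct.pack('!BBHHHBBH4s4s', 0x45, 0, total_len, 0, 0, 64, 6, 0,
                     bytes(4), bytes(4))
packet = ip_hdr + bytes(20) + bytes(PAYLOAD_LEN)

stored, caplen, orig_len = snapshot(packet, 96)
# The trimmed bytes still advertise the full IP datagram length.
(ip_total_len,) = struct.unpack('!H', stored[2:4])
print(caplen, orig_len, ip_total_len)  # 96 1460 1460
```

The stored bytes alone still claim a 1460-byte datagram, so the capture file has to remember the original length separately from the number of bytes it kept.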

```console
$ capinfos -s mawi.pcap | grep "size limit"
Packet size limit:   inferred: 34 bytes - 96 bytes (range)
$ editcap -F pcap -r mawi.pcap singlepacket.pcap 1
$ tcpdump -Atnnr ./singlepacket.pcap 2>/dev/null
IP 203.189.86.188.443 > 207.141.234.127.20627: Flags [.], seq 2465589976:2465591396, ack 326528497, win 31088, length 1420
E....X@.;..d..V.......P......vm.P.yp|...
```

We see that TCP reports a payload length of 1420 bytes, while the contents printed in ASCII do not exceed 100 bytes, which indicates that the packet is trimmed. Running a very simple dpkt script (see below) that copies all packets from singlepacket.pcap to copied.pcap gives the following result, with tcpdump reporting an error:

```console
$ python3 dpktcopy.py singlepacket.pcap copied.pcap
$ tcpdump -Atnnr ./copied.pcap 2>/dev/null
IP truncated-ip - 1420 bytes missing! 203.189.86.188.443 > 207.141.234.127.20627: Flags [.], seq 2465589976:2465591396, ack 326528497, win 31088, length 1420
E....X@.;..d..V.......P......vm.P.yp|...
```

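Wireshark exhibits similar problems, such as failing to associate packets that belong to the same flow. This happens because each packet in a pcap file has a header with two length fields: 'caplen', the number of bytes actually stored in the file, and 'len', the original length of the packet on the wire. Comparing the two tells tcpdump whether the packet was trimmed. Currently, dpkt ignores the 'len' field and uses only 'caplen': the Reader does not expose 'len', and the Writer sets it equal to 'caplen'.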
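For reference, each record in a classic pcap file starts with a 16-byte header that holds the timestamp (seconds and microseconds), then 'caplen' (incl_len in the libpcap format description), then 'len' (orig_len). The sketch below checks whether a record is trimmed. It assumes a little-endian file, whereas real readers pick the byte order from the file's magic number. The helper `is_trimmed` and the header values are hypothetical:

```python
import struct

# Classic pcap record header: ts_sec, ts_usec, caplen (incl_len), len (orig_len).
# Assumes a little-endian file; a real reader picks '<' or '>' from the magic number.
RECORD_HDR = struct.Struct('<IIII')  # 16 bytes

def is_trimmed(record_header: bytes) -> bool:
    """True if fewer bytes were stored than the packet had on the wire."""
    _ts_sec, _ts_usec, caplen, length = RECORD_HDR.unpack(record_header)
    return caplen < length

# Hypothetical headers: one as written by a capture with a snapshot length,
# and one where 'len' was overwritten with 'caplen', as in copied.pcap above.
original = RECORD_HDR.pack(0, 0, 96, 1500)
copied = RECORD_HDR.pack(0, 0, 96, 96)
print(is_trimmed(original), is_trimmed(copied))  # True False
```

With only the second header, nothing in the record says the packet was deliberately trimmed. This is consistent with the "truncated-ip" error tcpdump reports above.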

To fix this, I provide two commits: one for the Writer side and one for the Reader side. The former allows a 'len' value to be supplied and written to the pcap packet header. The latter reads this value from the pcap file and exposes it to the user.

With the proposed API, the code changes needed to preserve the 'len' field are minimal:

```diff
 import dpkt
 import sys
 
 in_file = open(sys.argv[1], 'rb')
 out_file = open(sys.argv[2], 'wb')
 
-pcap = dpkt.pcap.Reader(in_file)
+pcap = dpkt.pcap.PktlenReader(in_file)
 writer = dpkt.pcap.Writer(out_file)
 
-def callback(ts, buf):
+def callback(ts, pktlen, buf):
     eth = dpkt.ethernet.Ethernet(buf)
-    writer.writepkt(eth, ts)
+    writer.writepkt(eth, ts, pktlen)
 
 pcap.dispatch(0, callback)
```
```console
$ python3 newdpktcopy.py singlepacket.pcap newcopied.pcap
$ tcpdump -Atnnr ./newcopied.pcap  2>/dev/null
IP 203.189.86.188.443 > 207.141.234.127.20627: Flags [.], seq 2465589976:2465591396, ack 326528497, win 31088, length 1420
E....X@.;..d..V.......P......vm.P.yp|...
```

Filipp Mikoian added 2 commits September 21, 2023 06:47
Adds an optional pktlen argument to Writer.writepkt(). When it is omitted,
the default behavior is retained: the packet header field 'len' equals
'caplen'. Fixes issues with files generated with tcpdump's
--snapshot-length and with other files containing trimmed packets. This
prevents errors like "truncated-ip" in tcpdump and fixes Wireshark
misinterpretations that, for example, prevent it from associating TCP
packets within a single flow.

Authored-by: Filipp Mikoian <filipp@u.nus.edu>
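The Writer-side change can be illustrated without dpkt. The sketch below writes the classic pcap format and assumes little-endian output with microsecond timestamps. `write_global_header` and `write_record` are hypothetical helpers, not dpkt's Writer. The check that `pktlen` is not smaller than the captured length is an extra safeguard added here:

```python
import io
import struct

PCAP_MAGIC = 0xA1B2C3D4   # classic pcap, microsecond timestamps
LINKTYPE_ETHERNET = 1

def write_global_header(f, snaplen=65535, linktype=LINKTYPE_ETHERNET):
    # magic, version 2.4, thiszone, sigfigs, snaplen, link-layer type (24 bytes)
    f.write(struct.pack('<IHHiIII', PCAP_MAGIC, 2, 4, 0, 0, snaplen, linktype))

def write_record(f, ts, buf, pktlen=None):
    """Write one record; `pktlen` is the packet's original length on the wire.

    If pktlen is None, 'len' is set equal to 'caplen' (the default behavior)."""
    caplen = len(buf)
    if pktlen is None:
        pktlen = caplen
    elif pktlen < caplen:
        raise ValueError('pktlen cannot be smaller than the captured length')
    # Split the float timestamp into whole seconds and microseconds.
    sec, usec = divmod(int(round(ts * 1_000_000)), 1_000_000)
    f.write(struct.pack('<IIII', sec, usec, caplen, pktlen))
    f.write(buf)

# Usage: keep the original 1460-byte length for a 96-byte trimmed packet.
out = io.BytesIO()
write_global_header(out)
write_record(out, 1.5, bytes(96), pktlen=1460)
print(len(out.getvalue()))  # 136 (24 + 16 + 96)
```

Defaulting `pktlen` to `caplen` keeps the output of existing callers unchanged, which matches the default behavior the commit message describes.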
This class extends the pcap Reader by exposing the 'len' field of the
packet header to the user. This gives the original size of the packet as
it was transmitted. For trimmed packets with corrupted length fields,
there is no other way to know the original packet size.

The main purpose of PktlenReader, however, is to be used in conjunction
with Writer.writepkt()'s pktlen argument to preserve this pcap packet
header field when working with files generated with tcpdump's
--snapshot-length.

Authored-by: Filipp Mikoian <filipp@u.nus.edu>
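On the Reader side, what PktlenReader adds can likewise be sketched with the standard library. The generator below is a hypothetical stand-in, not dpkt code. It yields (ts, pktlen, buf) in the same order as the callback arguments in the diff above. It handles both byte orders and the nanosecond variant of the classic pcap magic number, but it does not support pcapng:

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple

# Magic number as read little-endian -> (byte order, timestamp fraction divisor)
_MAGICS = {
    0xA1B2C3D4: ('<', 1e6),  # little-endian, microseconds
    0xD4C3B2A1: ('>', 1e6),  # big-endian, microseconds
    0xA1B23C4D: ('<', 1e9),  # little-endian, nanoseconds
    0x4D3CB2A1: ('>', 1e9),  # big-endian, nanoseconds
}

def iter_records(f: BinaryIO) -> Iterator[Tuple[float, int, bytes]]:
    """Yield (ts, pktlen, buf) for each record of a classic pcap file.

    buf holds the caplen stored bytes; pktlen is the record's 'len' field."""
    ghdr = f.read(24)
    if len(ghdr) < 24:
        raise ValueError('truncated pcap global header')
    (magic,) = struct.unpack('<I', ghdr[:4])
    if magic not in _MAGICS:
        raise ValueError('not a classic pcap file')
    order, frac_div = _MAGICS[magic]
    rec = struct.Struct(order + 'IIII')
    while True:
        hdr = f.read(rec.size)
        if len(hdr) < rec.size:
            return  # end of file; a partial trailing header is ignored
        sec, frac, caplen, pktlen = rec.unpack(hdr)
        buf = f.read(caplen)
        if len(buf) < caplen:
            raise ValueError('truncated packet data')
        yield sec + frac / frac_div, pktlen, buf

# Usage: one record storing 4 bytes of a packet whose original length was 1460.
pcap = (struct.pack('<IHHiIII', 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
        + struct.pack('<IIII', 1, 500000, 4, 1460) + b'abcd')
for ts, pktlen, buf in iter_records(io.BytesIO(pcap)):
    print(ts, pktlen, len(buf))  # 1.5 1460 4
```

Passing `pktlen` on to a writer that stores it in the 'len' field gives the round trip that the PR proposes with PktlenReader and Writer.writepkt().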