
Supported Data Formats

Shane Alcock edited this page Apr 22, 2021 · 4 revisions

Flowtuple 3

For more details on the flowtuple3 format, see https://github.com/CAIDA/corsaro3/wiki/Flowtuple-Formats

Module: pyavro_stardust.flowtuple3

Reader Class: AvroFlowtuple3Reader

Record Class: AvroFlowtuple3

Attributes

Numeric Attribute Enum Type: Flowtuple3AttributeNum

String Attribute Enum Type: Flowtuple3AttributeStr

| Attribute ID | Dict Key | Type | Notes |
|---|---|---|---|
| ATTR_FT3_TIMESTAMP | timestamp | Numeric | Timestamp of the interval this flowtuple was observed in |
| ATTR_FT3_SRC_IP | src_ip | Numeric | Source IP address, as a 32 bit integer |
| ATTR_FT3_DST_IP | dst_ip | Numeric | Destination IP address, as a 32 bit integer |
| ATTR_FT3_SRC_PORT | src_port | Numeric | Source port for TCP/UDP flowtuples, ICMP Type for ICMP flowtuples |
| ATTR_FT3_DST_PORT | dst_port | Numeric | Destination port for TCP/UDP flowtuples, ICMP Code for ICMP flowtuples |
| ATTR_FT3_PROTOCOL | protocol | Numeric | Transport protocol |
| ATTR_FT3_TTL | ttl | Numeric | IP TTL |
| ATTR_FT3_TCP_FLAGS | tcpflags | Numeric | Only applies to TCP flows; the 8 bits of TCP flags as an integer (ignores the NS flag) |
| ATTR_FT3_IP_LEN | ip_len | Numeric | Length of the packet, starting from the IP header |
| ATTR_FT3_SYN_LEN | tcp_synlen | Numeric | Size of the TCP header; only applies to TCP SYN flows |
| ATTR_FT3_SYNWIN_LEN | tcp_synwinlen | Numeric | Announced receive window; only applies to TCP SYN flows |
| ATTR_FT3_PKT_COUNT | packets | Numeric | Number of packets observed matching this flowtuple |
| ATTR_FT3_ISSPOOFED | is_spoofed | Numeric | Set to 1 if source address was inferred to be spoofed, 0 otherwise |
| ATTR_FT3_ISMASSCAN | is_masscan | Numeric | Set to 1 if the packet was likely created by the masscan tool, 0 otherwise |
| ATTR_FT3_ASN | asn | Numeric | ASN that the source IP address corresponded to, according to prefix2asn data |
| ATTR_FT3_MAXMIND_CONTINENT | maxmind_continent | String | Geo-location of the source IP address, according to Maxmind (continent level) |
| ATTR_FT3_MAXMIND_COUNTRY | maxmind_country | String | Geo-location of the source IP address, according to Maxmind (country level) |
| ATTR_FT3_NETACQ_CONTINENT | netacq_continent | String | Geo-location of the source IP address, according to Netacuity (continent level) |
| ATTR_FT3_NETACQ_COUNTRY | netacq_country | String | Geo-location of the source IP address, according to Netacuity (country level) |
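
Several flowtuple3 fields are packed integers that usually need decoding before they are readable. The following is a minimal sketch, not part of pyavro_stardust itself: it works on a hand-built dictionary that uses the Dict Key names listed above, and the helper names (`ip_to_str`, `tcp_flag_names`) and sample values are made up for illustration. It assumes the most significant byte of the 32 bit integer is the first octet of the address, and that the flag bits follow the standard TCP header layout.

```python
import ipaddress

# Standard TCP flag bit positions (the NS flag is not part of these 8 bits).
TCP_FLAG_BITS = [
    (0x01, "FIN"), (0x02, "SYN"), (0x04, "RST"), (0x08, "PSH"),
    (0x10, "ACK"), (0x20, "URG"), (0x40, "ECE"), (0x80, "CWR"),
]


def ip_to_str(value):
    """Render a 32 bit integer as a dotted-quad IPv4 string.

    Assumption: the most significant byte is the first octet.
    """
    return str(ipaddress.IPv4Address(value))


def tcp_flag_names(flags):
    """Return the names of the TCP flags set in an 8 bit flags integer."""
    return [name for bit, name in TCP_FLAG_BITS if flags & bit]


# Hypothetical record: keys match the Dict Key column, values are invented.
record = {"src_ip": 3232235777, "dst_ip": 167772161,
          "protocol": 6, "tcpflags": 0x12}

print(ip_to_str(record["src_ip"]), "->", ip_to_str(record["dst_ip"]))
if record["protocol"] == 6:  # tcpflags only applies to TCP flows
    print(tcp_flag_names(record["tcpflags"]))
```

If the addresses come out reversed for your data, the integers are stored in the opposite byte order and need to be byte-swapped before conversion.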

Flowtuple 4

For more details on the flowtuple4 format, see https://github.com/CAIDA/corsaro3/wiki/Flowtuple-Formats

Module: pyavro_stardust.flowtuple4

Reader Class: AvroFlowtuple4Reader

Record Class: AvroFlowtuple4

Attributes

Numeric Attribute Enum Type: Flowtuple4AttributeNum

String Attribute Enum Type: Flowtuple4AttributeStr

| Attribute ID | Dict Key | Type | Notes |
|---|---|---|---|
| ATTR_FT4_TIMESTAMP | timestamp | Numeric | Timestamp of the interval this flowtuple was observed in |
| ATTR_FT4_SRC_IP | src_ip | Numeric | Source IP address, as a 32 bit integer |
| ATTR_FT4_DST_NET | dst_net | Numeric | Destination IP network, as a 32 bit integer |
| ATTR_FT4_DST_PORT | dst_port | Numeric | Destination port for TCP/UDP flows, Type and Code for ICMP flows (as a 16 bit integer) |
| ATTR_FT4_PROTOCOL | protocol | Numeric | Transport protocol |
| ATTR_FT4_PKT_COUNT | packets | Numeric | Number of packets that matched this flowtuple |
| ATTR_FT4_UNIQ_DST_IPS | uniq_dst_ips | Numeric | Number of unique destination IP addresses seen for this flowtuple |
| ATTR_FT4_UNIQ_PKT_SIZES | uniq_pkt_sizes | Numeric | Number of unique packet sizes seen for this flowtuple |
| ATTR_FT4_UNIQ_TTLS | uniq_ttls | Numeric | Number of unique IP TTLs seen for this flowtuple |
| ATTR_FT4_UNIQ_SRC_PORTS | uniq_src_ports | Numeric | Number of unique TCP/UDP source ports seen for this flowtuple |
| ATTR_FT4_UNIQ_TCP_FLAGS | uniq_tcp_flags | Numeric | Number of unique TCP flag combinations seen for this flowtuple |
| ATTR_FT4_FIRST_SYN_LEN | first_syn_len | Numeric | Size of the TCP header of the first observed SYN packet for this flowtuple |
| ATTR_FT4_FIRST_TCP_RWIN | first_tcp_rwin | Numeric | Announced receive window size in the first observed SYN packet for this flowtuple |
| ATTR_FT4_ASN | asn | Numeric | ASN that the source IP address corresponded to, according to prefix2asn data |
| ATTR_FT4_MAXMIND_CONTINENT | maxmind_continent | String | Geo-location of the source IP address, according to Maxmind (continent level) |
| ATTR_FT4_MAXMIND_COUNTRY | maxmind_country | String | Geo-location of the source IP address, according to Maxmind (country level) |
| ATTR_FT4_NETACQ_CONTINENT | netacq_continent | String | Geo-location of the source IP address, according to Netacuity (continent level) |
| ATTR_FT4_NETACQ_COUNTRY | netacq_country | String | Geo-location of the source IP address, according to Netacuity (country level) |
| ATTR_FT4_COMMON_PKT_SIZES | common_pkt_sizes | Numeric Array | Commonly observed IP packet sizes for this flowtuple |
| ATTR_FT4_COMMON_PKT_SIZE_FREQS | N/A | Numeric Array | Frequencies of each of the common IP packet sizes |
| ATTR_FT4_COMMON_TTLS | common_ttls | Numeric Array | Commonly observed IP TTLs for this flowtuple |
| ATTR_FT4_COMMON_TTL_FREQS | N/A | Numeric Array | Frequencies of each of the common IP TTLs |
| ATTR_FT4_COMMON_SRC_PORTS | common_src_ports | Numeric Array | Commonly observed TCP/UDP source ports for this flowtuple |
| ATTR_FT4_COMMON_SRC_PORT_FREQS | N/A | Numeric Array | Frequencies of each of the common TCP/UDP source ports |
| ATTR_FT4_COMMON_TCP_FLAGS | common_tcp_flags | Numeric Array | Commonly observed TCP flags combinations for this flowtuple, expressed as an 8-bit integer |
| ATTR_FT4_COMMON_TCP_FLAG_FREQS | N/A | Numeric Array | Frequencies of each of the commonly observed TCP flag combinations |
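
Because flowtuple4 folds the ICMP Type and Code into the single 16 bit dst_port field, code that handles ICMP flows has to split that value again. The sketch below shows one way to do so. It assumes the Type sits in the high byte and the Code in the low byte, which this page does not state, so check the layout against the format documentation before relying on it. The function name `describe_dst_port` is hypothetical.

```python
IPPROTO_ICMP = 1  # IANA protocol number for ICMP


def describe_dst_port(protocol, dst_port):
    """Interpret a flowtuple4 dst_port value according to the protocol.

    Assumption: for ICMP, the high byte is the Type and the low byte
    is the Code. Any other protocol is treated as a plain port number.
    """
    if protocol == IPPROTO_ICMP:
        return {
            "icmp_type": (dst_port >> 8) & 0xFF,
            "icmp_code": dst_port & 0xFF,
        }
    return {"port": dst_port}


# Invented sample values: an ICMP flow and a TCP flow.
print(describe_dst_port(1, 0x0800))
print(describe_dst_port(6, 443))
```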

Notes

  • The asDict() method takes an additional optional parameter (needarrays). If set to 0, the resulting dictionary will NOT include the numeric arrays for the various "common" fields. This can greatly improve performance if you call asDict() but have no use for the contents of these arrays. The default is 1, which includes the arrays.
  • The asDict() method combines the common values with their frequencies when constructing the dictionary. For instance, each item in the common_ttls list is itself a dictionary with two keys: value, which is the common value itself, and freq, which is the frequency with which that value was seen. This is why the frequency arrays have no dictionary key of their own.
  • If you instead use getNumericArray() to access the "common" values from the flowtuple record, note that the arrays containing values and the arrays containing their corresponding frequencies share the same indexes. For instance, the frequency at index 0 of ATTR_FT4_COMMON_TTL_FREQS is the frequency for the TTL at index 0 in the ATTR_FT4_COMMON_TTLS array.
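
The index correlation between a value array and its frequency array can be illustrated with plain Python lists. In this sketch the two lists stand in for the arrays that getNumericArray() would return for ATTR_FT4_COMMON_TTLS and ATTR_FT4_COMMON_TTL_FREQS; the numbers are invented and the helper name `pair_common` is hypothetical. The result has the same value/freq shape that asDict() is described as producing.

```python
def pair_common(values, freqs):
    """Pair each common value with the frequency at the same index."""
    if len(values) != len(freqs):
        raise ValueError("value and frequency arrays must be the same length")
    return [{"value": v, "freq": f} for v, f in zip(values, freqs)]


# Invented stand-ins for the two correlated numeric arrays.
common_ttls = [64, 128, 255]
common_ttl_freqs = [10, 4, 1]

paired = pair_common(common_ttls, common_ttl_freqs)
most_common = max(paired, key=lambda item: item["freq"])
print(paired)
print("most common TTL:", most_common["value"])
```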

RSDOS

For more details on the DoS data format, see https://github.com/CAIDA/corsaro3/wiki/DoS-Plugin

Module: pyavro_stardust.rsdos

Reader Class: AvroRsdosReader

Record Class: AvroRsdos

Attributes

Numeric Attribute Enum Type: RsdosAttribute

| Attribute ID | Dict Key | Type | Notes |
|---|---|---|---|
| ATTR_RSDOS_TIMESTAMP | timestamp | Numeric | The timestamp of the interval where this attack was observed |
| ATTR_RSDOS_PACKET_LEN | packet_len | Numeric | The IP length of the first packet observed for this attack |
| ATTR_RSDOS_TARGET_IP | target_ip | Numeric | The IP address of the assumed target of the attack, as a 32 bit integer |
| ATTR_RSDOS_TARGET_PROTOCOL | target_protocol | Numeric | The transport protocol used to conduct the attack |
| ATTR_RSDOS_ATTACKER_IP_CNT | attacker_count | Numeric | The number of unique IP addresses that "sent" attack traffic |
| ATTR_RSDOS_ATTACK_PORT_CNT | attack_port_count | Numeric | The number of unique source ports used by the attackers |
| ATTR_RSDOS_TARGET_PORT_CNT | target_port_count | Numeric | The number of unique ports on the target that observed this attack |
| ATTR_RSDOS_ICMP_MISMATCHES | icmp_mismatches | Numeric | Number of ICMP address mismatches observed for this attack |
| ATTR_RSDOS_BYTE_CNT | byte_count | Numeric | Number of bytes seen by the telescope as a result of this attack |
| ATTR_RSDOS_PACKET_CNT | packet_count | Numeric | Number of packets seen by the telescope as a result of this attack |
| ATTR_MAX_PPM_INTERVAL | max_ppm_interval | Numeric | Peak packets-per-minute observed for this attack + interval |
| ATTR_RSDOS_START_TIME_SEC | start_time_sec | Numeric | Timestamp when the first attack packet was observed (in seconds since the epoch) |
| ATTR_RSDOS_START_TIME_USEC | start_time_usec | Numeric | Timestamp when the first attack packet was observed (in microseconds since the start of the second) |
| ATTR_RSDOS_LATEST_TIME_SEC | latest_time_sec | Numeric | Timestamp when the most recent attack packet was observed (in seconds since the epoch) |
| ATTR_RSDOS_LATEST_TIME_USEC | latest_time_usec | Numeric | Timestamp when the most recent attack packet was observed (in microseconds since the start of the second) |
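
The start and latest timestamps are each split across a seconds field and a microseconds field, so they need to be recombined before computing how long an attack has been observed. A minimal sketch, using a hand-built dictionary keyed by the Dict Key names above; the values are invented and the helper names (`to_datetime`, `attack_duration`) are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_datetime(sec, usec):
    """Combine seconds since the epoch and a microsecond offset (UTC)."""
    return datetime.fromtimestamp(sec, tz=timezone.utc) + timedelta(microseconds=usec)


def attack_duration(record):
    """Time between the first and most recent observed attack packets."""
    start = to_datetime(record["start_time_sec"], record["start_time_usec"])
    latest = to_datetime(record["latest_time_sec"], record["latest_time_usec"])
    return latest - start


# Hypothetical RSDOS record with invented timestamps.
record = {
    "start_time_sec": 1619049600, "start_time_usec": 250000,
    "latest_time_sec": 1619049660, "latest_time_usec": 750000,
}

print(to_datetime(record["start_time_sec"], record["start_time_usec"]).isoformat())
print(attack_duration(record).total_seconds(), "seconds")
```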

Notes

  • RSDOS records also include a raw byte capture of the first packet observed for each attack. There is a method called getRsdosPacketString() in the AvroRsdos class that will return this packet as a Python bytes object. Also note that, due to an oversight on our part, the capture currently includes the corsaro-tag headers from our internal processing pipeline, which will need to be skipped to reach the original packet contents that were seen on the wire.