This is an experimental format for representing DNS information in CBOR with the goals to:
- Be able to stream the information
- Support incomplete, broken and/or invalid DNS
- Have close to no data quality and signature degradation
- Support additional non-DNS metadata (such as ICMP/TCP attributes)
In CBOR you are expected to have one root element, most likely an array or a map. This format does not have a root element; instead you are expected to read one CBOR array at a time, as a stream of CBOR elements, where the first array is the stream initiator object.
[stream_init]
[message]
...
[message]
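Here are some numbers on the compression rate compared to PCAP. Factor is the CDS size divided by the PCAP size with the same compression, and F/Uncompressed is the compressed CDS size divided by the uncompressed PCAP size: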
Uncompressed | PCAP | CDS | Factor |
---|---|---|---|
client | 458373 | 133640 | 0,2915 |
zonalizer | 51769844 | 9450475 | 0,1825 |
large ditl | 1003931674 | 298167709 | 0,2970 |
small ditl | 1651252 | 603314 | 0,3653 |
Gzipped | PCAP | CDS | Factor | F/Uncompressed |
---|---|---|---|---|
client | 108136 | 45944 | 0,4248 | 0,1002 |
zonalizer | 12468329 | 2485620 | 0,1993 | 0,0480 |
large ditl | 327227203 | 117569598 | 0,3592 | 0,1171 |
small ditl | 539323 | 253402 | 0,4698 | 0,1534 |
Xzipped | PCAP | CDS | Factor | F/Uncompressed |
---|---|---|---|---|
client | 76248 | 36308 | 0,4761 | 0,0792 |
zonalizer | 7894356 | 1695920 | 0,2148 | 0,0327 |
large ditl | 267031412 | 86747604 | 0,3248 | 0,0864 |
small ditl | 442260 | 206596 | 0,4671 | 0,1251 |
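The Factor columns can be reproduced from the sizes. The sketch below uses the client rows; the names `PCAP`, `CDS`, `truncate4` and `factors` are hypothetical. The published values appear to be truncated rather than rounded to four decimals (for example 45944 / 108136 is 0.42487...), so the sketch truncates as well.

```python
import math

# Sizes copied from the "client" rows of the three tables above.
PCAP = {"uncompressed": 458373, "gzipped": 108136, "xzipped": 76248}
CDS = {"uncompressed": 133640, "gzipped": 45944, "xzipped": 36308}


def truncate4(x: float) -> float:
    """Truncate (not round) to four decimals, which appears to match the tables."""
    return math.floor(x * 10_000) / 10_000


def factors(method: str) -> tuple[float, float]:
    """Return (Factor, F/Uncompressed) for one compression method."""
    factor = CDS[method] / PCAP[method]                  # CDS vs PCAP, same compression
    f_uncompressed = CDS[method] / PCAP["uncompressed"]  # compressed CDS vs raw PCAP
    return truncate4(factor), truncate4(f_uncompressed)


for method in ("uncompressed", "gzipped", "xzipped"):
    print(method, factors(method))
```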
client
: is a couple of hours of DNS from my workstation
zonalizer
: is half a day from Zonalizer, which continuously tests gTLDs
large ditl, small ditl
: are captures from DITL
int
: A CBOR integer (major type 0x00 or 0x20)
uint
: A CBOR unsigned integer (value >= 0, major type 0x00)
nint
: A CBOR negative integer (value < 0, major type 0x20); this type has a special meaning, see Negative Integers
simple
: A CBOR simple value (major type 0xe0)
bytes
: A CBOR byte string (major type 0x40)
string
: A CBOR UTF-8 string (major type 0x60)
any
: Any CBOR value
bool
: A CBOR boolean
rindex
: A CBOR negative integer that is a reverse index, see Deduplication
union
: Can be used to merge the given array or map into the current object
optional
: The attribute or object reference is optional
CBOR encodes negative integers in a special way, and this format uses that to carry non-negative numbers in the negative integer type so they can be told apart from plain unsigned integers.
Because of that, all negative integers need special decoding:
value = -value - 1
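A minimal sketch of that rule, assuming a CBOR library has already turned the item into a Python `int` (for example the single byte 0x20 decodes to -1). The helper names `decode_nint` and `encode_nint` are hypothetical:

```python
def decode_nint(value: int) -> int:
    """Recover the non-negative number carried by a decoded CBOR negative integer."""
    if value >= 0:
        raise ValueError("expected a CBOR negative integer (value < 0)")
    return -value - 1


def encode_nint(number: int) -> int:
    """Inverse mapping: the negative integer to hand to a CBOR encoder."""
    if number < 0:
        raise ValueError("expected a non-negative number")
    return -number - 1
```

Because the mapping is its own inverse, 0 travels as -1, 53 as -54, and so on, which keeps every carried value distinguishable from a plain `uint`.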
The object code below uses:
[ and ]
: to indicate the start and end of an array
type name
: per object attribute
name
: per object reference
...
: to indicate a list of the previous definition
(, | and )
: to indicate a list of various types that the attribute can be
The initial object in the stream.
[
string version,
union stream_option option,
...
]
version
: The version of the format
option
: A list of stream option objects
A stream option that can specify critical information about the stream and
how it should be decoded, see Stream Options
for more information.
[
uint option_type,
optional any option_value
]
option_type
: The type of option represented as a number
option_value
: The option value
A message object that describes various DNS packets or other information.
[
optional bool is_complete,
union timestamp timestamp,
simple message_bits,
union ip_header ip_header,
union ( icmp_message | udp_message | tcp_message | dns_message ) content
]
is_complete
: Will exist and be false if the message is not complete, in which case the following attributes may not exist
timestamp
: A timestamp object
message_bits
: Bitmap indicating message content:
- Bit 0: 0=Not DNS 1=DNS
- Bit 1: if DNS: 0=UDP 1=TCP else: 0=ICMP/ICMPv6 1=TCP
- Bit 2: Fragmented (0=no 1=yes)
- Bit 3: Malformed (0=no 1=yes)
ip_header
: An IP header object
content
: The message content, which may be an ICMP, UDP, TCP or DNS message object
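A sketch of decoding `message_bits`, assuming the CBOR decoder exposes the simple value as its integer number. The names `MessageBits` and `decode_message_bits` are hypothetical; note how bit 1 is only meaningful together with bit 0:

```python
from dataclasses import dataclass


@dataclass
class MessageBits:
    is_dns: bool
    transport: str    # "udp", "tcp" or "icmp" (ICMP or ICMPv6)
    fragmented: bool
    malformed: bool


def decode_message_bits(bits: int) -> MessageBits:
    is_dns = bool(bits & 0x1)  # Bit 0: 0=Not DNS 1=DNS
    bit1 = bool(bits & 0x2)    # Bit 1: meaning depends on bit 0
    if is_dns:
        transport = "tcp" if bit1 else "udp"
    else:
        transport = "tcp" if bit1 else "icmp"
    return MessageBits(
        is_dns=is_dns,
        transport=transport,
        fragmented=bool(bits & 0x4),  # Bit 2
        malformed=bool(bits & 0x8),   # Bit 3
    )
```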
The timestamp object of a message.
[
( uint seconds | nint diff_from_last ),
optional uint useconds,
optional uint nseconds
]
seconds
: The seconds of a UNIX timestamp
diff_from_last
: The difference from the last timestamp.seconds
useconds
: The microseconds of a UNIX timestamp or, if diff_from_last is used, the difference from the last timestamp.useconds
nseconds
: The nanoseconds of a UNIX timestamp or, if diff_from_last is used, the difference from the last timestamp.nseconds
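A sketch of resolving a timestamp array against the previous one. It rests on assumptions the text does not spell out: `diff_from_last` is decoded with `-value - 1` as described under Negative Integers, missing sub-second fields count as 0, and sub-second differences are added as-is without carrying into the seconds. `Timestamp` and `decode_timestamp` are hypothetical names:

```python
from typing import NamedTuple, Optional


class Timestamp(NamedTuple):
    seconds: int
    useconds: int
    nseconds: int


def decode_timestamp(fields: list, last: Optional[Timestamp]) -> Timestamp:
    first = fields[0]
    # Assumption: absent optional fields are treated as 0 (or a 0 difference).
    useconds = fields[1] if len(fields) > 1 else 0
    nseconds = fields[2] if len(fields) > 2 else 0
    if first >= 0:
        return Timestamp(first, useconds, nseconds)  # absolute UNIX time
    if last is None:
        raise ValueError("diff_from_last used without a previous timestamp")
    diff = -first - 1  # nint decoding, see Negative Integers
    # Assumption: differences are simply added, with no carry between fields.
    return Timestamp(last.seconds + diff,
                     last.useconds + useconds,
                     last.nseconds + nseconds)
```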
The IP header of a message.
[
( uint | nint ) ip_bits,
optional bytes src_addr,
optional bytes dest_addr,
optional ( uint | nint ) src_dest_port
]
ip_bits
: Bitmap indicating IP header content; if the type is nint it also indicates that it is reversed from the last one, see Deduplication for more information:
- Bit 0: address family (0=AF_INET, 1=AF_INET6)
- Bit 1: src_addr present
- Bit 2: dest_addr present
- Bit 3: port present
src_addr
: The source address, with the length specifying the address family: 4 bytes is IPv4 and 16 is IPv6
dest_addr
: The destination address, with the length specifying the address family: 4 bytes is IPv4 and 16 is IPv6
src_dest_port
: A combined source and destination port, see Source And Destination Port
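A sketch of walking an `ip_header` array: `ip_bits` says which optional fields follow, in order. It assumes that a negative `ip_bits` is decoded with `-value - 1` before the bits are read, and it returns `None` for absent fields instead of applying Deduplication. `decode_ip_header` and the dictionary keys are hypothetical; `ipaddress.ip_address` accepts packed 4- or 16-byte values:

```python
import ipaddress


def decode_ip_header(fields: list) -> dict:
    ip_bits = fields[0]
    reversed_from_last = ip_bits < 0
    if reversed_from_last:
        ip_bits = -ip_bits - 1  # nint decoding (assumed), see Negative Integers
    rest = iter(fields[1:])     # optional fields appear only when their bit is set
    header = {
        "ipv6": bool(ip_bits & 0x1),  # Bit 0: address family
        "reversed_from_last": reversed_from_last,
        "src_addr": None,
        "dest_addr": None,
        "src_dest_port": None,
    }
    if ip_bits & 0x2:  # Bit 1: src_addr present
        header["src_addr"] = ipaddress.ip_address(next(rest))
    if ip_bits & 0x4:  # Bit 2: dest_addr present
        header["dest_addr"] = ipaddress.ip_address(next(rest))
    if ip_bits & 0x8:  # Bit 3: port present
        header["src_dest_port"] = next(rest)
    return header
```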
The source and destination port are combined into one value. If both the source and destination port exist, the value is larger than 65535, with the destination in the high 16 bits and the source in the low 16 bits; otherwise it is only the source port. If the value is negative, only the destination port exists.
if value > 0xffff then
src_port = value & 0xffff
dest_port = value >> 16
else if value < 0 then
dest_port = -value - 1
else
src_port = value
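The pseudocode above translates directly to Python, assuming the CBOR decoder returns a negative integer as a negative Python `int`. `decode_src_dest_port` is a hypothetical name:

```python
from typing import Optional, Tuple


def decode_src_dest_port(value: int) -> Tuple[Optional[int], Optional[int]]:
    """Return (src_port, dest_port); None means that port is not present."""
    if value > 0xFFFF:
        # Both ports: destination in the high 16 bits, source in the low 16 bits.
        return value & 0xFFFF, value >> 16
    if value < 0:
        # Only the destination port, carried as a negative integer.
        return None, -value - 1
    # Only the source port.
    return value, None
```

For example `(53 << 16) | 40000` decodes to source port 40000 and destination port 53. A side effect of the layout is that a destination port of 0 combined with a source port cannot exceed 0xffff, so it would read back as source-only.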
if ip_header.ip_bits.1=0 && ip_header.ip_bits.2=0
[
uint type,
uint code
]
type
: TODO
code
: TODO
if ip_header.ip_bits.1=1 && ip_header.ip_bits.2=0
TODO
if ip_header.ip_bits.2=1
[
uint seq_nr,
uint ack_nr,
uint tcp_bits,
uint window
]
seq_nr
: TODO
ack_nr
: TODO
tcp_bits
: TODO
- 0: URG
- 1: ACK
- 2: PSH
- 3: RST
- 4: SYN
- 5: FIN
window
: TODO
A DNS packet.
[
optional bool is_complete,
uint id,
uint raw_dns_header, # TODO
optional nint count_bits,
optional uint qdcount,
optional uint ancount,
optional uint nscount,
optional uint arcount,
optional simple rr_bits,
optional [
dns_question question,
...
],
optional [
resource_record answer,
...
],
optional [
resource_record authority,
...
],
optional [
resource_record additional,
...
],
optional bytes malformed
]
is_complete
: Will exist and be false if the message is not complete, in which case the following attributes may not exist
id
: DNS identifier
raw_dns_header
: TODO
count_bits
: Bitmap indicating which counts are present, see Negative Integers and Deduplication:
- Bit 0: qdcount present
- Bit 1: ancount present
- Bit 2: nscount present
- Bit 3: arcount present
qdcount
: Number of question records, if different from the number of entries in question
ancount
: Number of answer resource records, if different from the number of entries in answer
nscount
: Number of authority resource records, if different from the number of entries in authority
arcount
: Number of additional resource records, if different from the number of entries in additional
question
: The question records
answer
: The answer resource records
authority
: The authority resource records
additional
: The additional resource records
malformed
: Holds the bytes of the message that were not parsed
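A sketch of recovering the four header counts. It assumes that the explicit counts follow `count_bits` in the order qdcount, ancount, nscount, arcount, that a missing `count_bits` means no explicit counts, and that an absent count equals the number of entries in its section. `effective_counts` is a hypothetical name:

```python
from typing import Optional, Sequence

COUNT_NAMES = ("qdcount", "ancount", "nscount", "arcount")


def effective_counts(count_bits: Optional[int],
                     explicit_counts: Sequence[int],
                     sections: Sequence[Sequence]) -> dict:
    """sections holds question, answer, authority and additional, in that order."""
    bits = 0 if count_bits is None else -count_bits - 1  # nint decoding
    given = iter(explicit_counts)
    counts = {}
    for bit, (name, section) in enumerate(zip(COUNT_NAMES, sections)):
        if bits & (1 << bit):
            counts[name] = next(given)    # count differs from the section length
        else:
            counts[name] = len(section)   # count equals the number of entries
    return counts
```

Storing a count only when it disagrees with the parsed section keeps well-formed messages compact while still preserving broken ones faithfully.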
A DNS question record.
[
optional bool is_complete,
( bytes | compressed_name | rindex ) qname,
optional uint qtype,
optional nint qclass
]
is_complete
: Will exist and be false if the message is not complete, in which case the following attributes may not exist
qname
: The QNAME as a byte string, a name compression object or a reverse index, see Deduplication
qtype
: The QTYPE, see Deduplication
qclass
: The QCLASS, see Negative Integers and Deduplication
A compressed name which has references to other labels within the same message.
[
( bytes label | uint label_index | nint offset | simple extension_bits ),
...
]
label
: A byte string with a label part
label_index
: An index to the Nth byte string label in the message
offset
: The offset specified in the DNS message which could not be translated into a label index
extension_bits
: The extension bits if not 0b00 or 0b11 # TODO: add the extension bits
A DNS resource record.
[
optional bool is_complete,
( bytes | compressed_name | rindex ) name,
optional simple rr_bits,
optional uint type,
optional uint class,
optional uint ttl,
optional uint rdlength,
( bytes | mixed_rdata ) rdata
]
is_complete
: Will exist and be false if the message is not complete, in which case the following attributes may not exist
name
: The resource record name as a byte string, a name compression object or a reverse index, see Deduplication
rr_bits
: Bitmap indicating what is present, see Deduplication:
- Bit 0: type
- Bit 1: class
- Bit 2: ttl
- Bit 3: rdlength # TODO: reverse index for TTL?
type
: The resource record type
class
: The resource record class
ttl
: The resource record TTL
rdlength
: The resource record rdata length
rdata
: The resource record data
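A sketch of how `rr_bits` could be applied, under the assumption (drawn from the Deduplication description that left-out data means "same as the previous value") that a cleared bit means the field is copied from the previous resource record. `RR_FIELDS` and `apply_rr_bits` are hypothetical names:

```python
RR_FIELDS = ("type", "class", "ttl", "rdlength")  # bits 0..3 of rr_bits


def apply_rr_bits(rr_bits: int, present_values: list, previous: dict) -> dict:
    """Merge the fields present in this record with those of the previous one."""
    values = iter(present_values)
    fields = {}
    for bit, name in enumerate(RR_FIELDS):
        if rr_bits & (1 << bit):
            fields[name] = next(values)     # field is present in this record
        elif name in previous:
            fields[name] = previous[name]   # assumed: same as the previous record
        else:
            raise ValueError(f"{name} omitted but there is no previous value")
    return fields
```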
An array mixing resource data and compressed names.
[
( bytes | compressed_name ) rdata_part,
...
]
rdata_part
: The parts of the resource records data
Each option is specified here as OptionName(OptionNumber), followed by the OptionValue type if it has one.
RLABELS(0) uint
: Indicates how many labels should be stored in the reverse label index before discarding them
RLABEL_MIN_SIZE(1) uint
: The minimum size a label must be to be put in the reverse label index
RDATA_RINDEX_SIZE(2) uint
: Indicates how many rdata should be stored in the reverse rdata index before discarding them
RDATA_RINDEX_MIN_SIZE(3) uint
: The minimum size an rdata must be to be put in the reverse rdata index
USE_RDATA_INDEX(4)
: If present, the stream uses rdata indexing
RDATA_INDEX_MIN_SIZE(5) uint
: The minimum size an rdata must be to be put in the rdata index
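A sketch of turning decoded stream options into a lookup table. It assumes each option has already been decoded into a list `[option_type]` or `[option_type, option_value]`; `StreamOption` and `parse_stream_options` are hypothetical names. A valueless option such as USE_RDATA_INDEX is recorded as `True`:

```python
from enum import IntEnum


class StreamOption(IntEnum):
    RLABELS = 0
    RLABEL_MIN_SIZE = 1
    RDATA_RINDEX_SIZE = 2
    RDATA_RINDEX_MIN_SIZE = 3
    USE_RDATA_INDEX = 4
    RDATA_INDEX_MIN_SIZE = 5


def parse_stream_options(options: list) -> dict:
    parsed = {}
    for option in options:
        # StreamOption(...) raises ValueError for an unknown option number.
        option_type = StreamOption(option[0])
        parsed[option_type] = option[1] if len(option) > 1 else True
    return parsed
```

Rejecting unknown option numbers is a deliberately conservative choice: since stream options can carry critical information about how the stream must be decoded, guessing past an unknown one could silently corrupt everything that follows.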
Deduplication is done in a few different ways: data may be left out to indicate that it is the same as the previous value, an index may be used to indicate that it is the same as the Nth previous value, and a reverse index may be used to indicate that it is the Nth previous value looking backwards across the stream.
In other words, to use index deduplication you need to build a table of the values you come across while decoding the stream, and this table can grow very large.
As a smaller alternative, a reverse index can refer to frequently used data by its position N looking back over the stream. This type of index also reorders itself to try to keep the most used data in the index.
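A sketch of a bounded, self-reordering reverse index. The exact reordering policy is not specified here, so this uses a plain move-to-front list as a stand-in: values sent in full are added at position 0, the oldest entries fall off beyond the size limit, and every hit moves back to the front. Mapping `size` and `min_size` to RLABELS and RLABEL_MIN_SIZE is an assumption, and `ReverseIndex` is a hypothetical name:

```python
from typing import List


class ReverseIndex:
    """Bounded move-to-front table of recently seen values (assumed policy)."""

    def __init__(self, size: int, min_size: int = 0) -> None:
        self.size = size          # e.g. the RLABELS stream option (assumed)
        self.min_size = min_size  # e.g. RLABEL_MIN_SIZE (assumed)
        self.entries: List[bytes] = []

    def add(self, value: bytes) -> None:
        """Record a value that was sent in full; the newest goes to position 0."""
        if len(value) < self.min_size:
            return                       # too small to be worth indexing
        self.entries.insert(0, value)
        del self.entries[self.size:]     # discard the oldest beyond the limit

    def lookup(self, rindex: int) -> bytes:
        """Resolve an rindex (a CBOR negative integer) and move the hit to the front."""
        n = -rindex - 1                  # nint decoding, see Negative Integers
        value = self.entries.pop(n)
        self.entries.insert(0, value)
        return value
```

Encoder and decoder must apply exactly the same `add` and `lookup` sequence, otherwise their tables drift apart and every later rindex resolves to the wrong value.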
TODO: details of each attribute and its deduplication