Package protoscan is a low-level reader for protocol buffer encoded data in Go. The main feature is support for lazy/conditional decoding of fields.

This library can help decoding performance in two ways:

- fields can be conditionally decoded, skipping over fields that are not needed for a specific use case,
- data can be decoded directly into application-specific types, or otherwise transformed, skipping the intermediate state of the generated structs.
Please be aware that decoding an entire message is still faster with gogoprotobuf. After much testing I believe this is because the generated code inlines almost everything, eliminating function call overhead.

Warning: writing code with this library is like hand-writing the auto-generated protobuf decoder, and it is very time-consuming. It should only be used for specific use cases and for stable protobuf definitions.
First, the encoded protobuf data is used to initialize a new Message. Then you iterate over the fields, reading or skipping them.
msg := protoscan.New(encodedData)
for msg.Next() {
	switch msg.FieldNumber() {
	case 1: // an int64 type
		v, err := msg.Int64()
		if err != nil {
			// handle
		}
		_ = v // do something with the value
	case 3: // repeated number types can be returned as a slice
		ids, err := msg.RepeatedInt64(nil)
		if err != nil {
			// handle
		}
		_ = ids // do something with the values
	case 2: // for more control repeated+packed fields can be read using an iterator
		iter, err := msg.Iterator(nil)
		if err != nil {
			// handle
		}

		userIDs := make([]UserID, 0, iter.Count(protoscan.WireTypeVarint))
		for iter.HasNext() {
			v, err := iter.Int64()
			if err != nil {
				// handle
			}
			userIDs = append(userIDs, UserID(v))
		}
		_ = userIDs // do something with the ids
	default:
		msg.Skip() // required if value not needed.
	}
}

if msg.Err() != nil {
	// handle
}
After calling Next() you MUST call an accessor function (Int64(), RepeatedInt64(), Iterator(), etc.) or Skip() to ignore the field. All these functions, including Next() and Skip(), must not be called twice in a row.
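For example, here is a minimal sketch of conditional decoding that follows both rules, assuming a caller-provided includeUsername flag and a string in field number 2 (both are hypothetical, not part of the library): each field is consumed exactly once, either by an accessor or by Skip().

var includeUsername bool // hypothetical flag decided by the caller

msg := protoscan.New(encodedData)
for msg.Next() {
	if msg.FieldNumber() == 2 && includeUsername {
		username, err := msg.String()
		if err != nil {
			// handle
		}
		_ = username // do something with the value

		continue // the field was consumed, do not also call Skip()
	}

	msg.Skip() // every field not read above must be skipped
}

if msg.Err() != nil {
	// handle
}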
There is an accessor for each of the protobuf scalar value types. For repeated fields there is a corresponding set of functions like RepeatedInt64(buf []int64) ([]int64, error). Repeated fields may or may not be packed, so you should pass in a pre-created buffer variable when calling. For example
var ids []int64

msg := protoscan.New(encodedData)
for msg.Next() {
	switch msg.FieldNumber() {
	case 1: // repeated int64 field
		var err error
		ids, err = msg.RepeatedInt64(ids)
		if err != nil {
			// handle
		}
	default:
		msg.Skip()
	}
}

if msg.Err() != nil {
	// handle
}
If the ids are 'packed', RepeatedInt64() will be called once. If the ids are simply repeated, RepeatedInt64() will be called N times, but the resulting slice of ids will be the same. For more control over the values in a packed, repeated field, use an Iterator. See above for an example.
Embedded messages can be handled recursively, or the raw data can be returned and decoded using a standard/auto-generated proto.Unmarshal function.
msg := protoscan.New(encodedData)
for msg.Next() {
	fn := msg.FieldNumber()

	// use protoscan recursively
	if fn == 1 && needFieldNumber1 {
		embeddedMsg, err := msg.Message()
		if err != nil {
			// handle
		}

		for embeddedMsg.Next() {
			switch embeddedMsg.FieldNumber() {
			case 1:
				// do something
			default:
				embeddedMsg.Skip()
			}
		}
	}

	// if you need the whole message, decode it in the standard way.
	if fn == 2 && needFieldNumber2 {
		data, err := msg.MessageData()
		if err != nil {
			// handle
		}

		v := &ProtoBufThing{}
		err = proto.Unmarshal(data, v)
		if err != nil {
			// handle
		}
	}
}
Errors can occur for two reasons:
- The field is being read as the incorrect type.
- The data is corrupted or somehow invalid.
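For instance, a minimal sketch of surfacing both kinds of errors, assuming field 1 is an int64 (as in the first example) and that this code runs inside a function returning an error:

msg := protoscan.New(encodedData)
for msg.Next() {
	switch msg.FieldNumber() {
	case 1:
		v, err := msg.Int64()
		if err != nil {
			// the field was read as the incorrect type, or its data is invalid.
			return fmt.Errorf("field 1: %w", err)
		}
		_ = v // do something with the value
	default:
		msg.Skip()
	}
}

if err := msg.Err(); err != nil {
	// the data is corrupted or somehow invalid.
	return fmt.Errorf("scan: %w", err)
}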
Consider a customer message with embedded orders and items, where you only want to count the number of items in open orders.
message Customer {
	required int64 id = 1;
	optional string username = 2;
	repeated Order orders = 3;
	repeated int64 favorite_ids = 4 [packed=true];
}

message Order {
	required int64 id = 1;
	required bool open = 2;
	repeated Item items = 3;
}

message Item {
	// a big object
}
Sample Code:
openCount := 0
itemCount := 0
favoritesCount := 0

customer := protoscan.New(data)
for customer.Next() {
	switch customer.FieldNumber() {
	case 1: // id
		id, err := customer.Int64()
		if err != nil {
			panic(err)
		}
		_ = id // do something or skip this case if not needed
	case 2: // username
		username, err := customer.String()
		if err != nil {
			panic(err)
		}
		_ = username // do something or skip this case if not needed
	case 3: // orders
		open := false
		count := 0

		orderData, _ := customer.MessageData()

		order := protoscan.New(orderData)
		for order.Next() {
			switch order.FieldNumber() {
			case 2: // open
				v, _ := order.Bool()
				open = v
			case 3: // item
				count++

				// we're not reading the data but we still need to skip it.
				order.Skip()
			default:
				// required to move past unneeded fields
				order.Skip()
			}
		}

		if open {
			openCount++
			itemCount += count
		}
	case 4: // favorite ids
		iter, err := customer.Iterator(nil)
		if err != nil {
			panic(err)
		}

		// Typically this section would only be run once but it is valid
		// protobuf to contain multiple sections of repeated fields that should
		// be concatenated together.
		favoritesCount += iter.Count(protoscan.WireTypeVarint)
	default:
		// unread fields must be skipped
		customer.Skip()
	}
}

fmt.Printf("Open Orders: %d\n", openCount)
fmt.Printf("Items: %d\n", itemCount)
fmt.Printf("Favorites: %d\n", favoritesCount)

// Output:
// Open Orders: 2
// Items: 4
// Favorites: 8
Groups are an old protobuf wire type that has been deprecated for a long time. They function as parentheses around a set of fields, but carry no "data length" information, so their content cannot be efficiently skipped. Only the start-group and end-group markers can be read and skipped like any other field, which effectively reads the contained fields as if the parentheses were not there, whatever that may mean in practice. To get the raw protobuf data inside a group, try something like:
var (
	groupFieldNum = 123
	groupData     []byte
)

msg := New(data)
for msg.Next() {
	if msg.FieldNumber() == groupFieldNum && msg.WireType() == WireTypeStartGroup {
		start, end := msg.Index, msg.Index
		for msg.Next() {
			msg.Skip()
			if msg.FieldNumber() == groupFieldNum && msg.WireType() == WireTypeEndGroup {
				break
			}
			end = msg.Index
		}

		// groupData would be the raw protobuf encoded bytes of the fields in the group.
		groupData = msg.Data[start:end]
	}
}

_ = groupData // do something with the group data