Skip to content

Type-safe, automatic, asynchronous batch processing.

License

Notifications You must be signed in to change notification settings

mikestefanello/batcher

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Batcher

Go Report Card Test License: MIT Go Reference GoT

Overview

Batcher is a Go library that provides a type-safe, easy way to batch together arbitrary groups of items to be automatically and asynchronously processed. An item can be of any type that you want to pass to your processor. Items are queued within groups (using a string as a name). If you do not have a need to separately group items, you can specify the same, or empty, group name for all items. Each group of items is separately sent to the processor callback that you specify.

The queue can be configured to automatically execute the batch operation under any of the following circumstances:

  1. A specified duration has elapsed since the last process executed
  2. The amount of total items queued (across all groups) has exceeded a given threshold
  3. The amount of groups of items has exceeded a given threshold

Installation

go get github.com/mikestefanello/batcher

Usage

As an example, we'll create a batcher to process items of type Log which will be grouped together via their Level then written in bulk.

b, err := batcher.NewBatcher[Log](Config[Log]{
    // Process the batch if we've queued 5 different Level values
    GroupCountThreshold: 5,
    // Process the batch if we've queued 100 Log items
    ItemCountThreshold:  100,
    // Process the queue every 30 seconds
    DelayThreshold:      30 * time.Second,
    // Use 3 Goroutines to process the queue groups
    NumGoroutines:       3,
    // Execute this func for each queued group
    Processor:           func(group string, log []Log) {
        writeLogs(logs)
    },
})

Add a Log to the batch queue.

b.Add(log.Level, log)

Origin

This concept was originally devised to handle some challenges faced with PubSub messages but was made entirely generic for this library so it could be used for any purpose.

  1. Group incoming messages by a unique identifier, ie, a user ID, to deduplicate requests and ensure a given pod was only ever executing operations for a given ID in isolation.
  2. Group all incoming messages to bulk-export to persistent storage for archiving.

To roughly illustrate an example in code:

func main() {
    b, err := batcher.NewBatcher[*pubsub.Message](Config[*pubsub.Message]{
        GroupCountThreshold: 10,
        ItemCountThreshold:  100,
        DelayThreshold:      10 * time.Second,
        NumGoroutines:       3,
        Processor:           func(group string, items []*pubsub.Message) {
            err := doUserOperation(group)

            for _, m := range items {
                if err != nil {
                    m.Nack()
                } else {
                    m.Ack()
                }
            }
        },
    })


    // Consume the messages from PubSub
    err = subscription.Receive(ctx, func(ctx context.Context, message *pubsub.Message) {
        // For this example, Data is the user ID
        b.Add(string(message.Data), message)
    })
}