Skip to content

Commit

Permalink
feat: Block scheduler scaffolding (#15198)
Browse files Browse the repository at this point in the history
  • Loading branch information
owen-d authored Dec 2, 2024
1 parent be4f17e commit a10140d
Show file tree
Hide file tree
Showing 19 changed files with 730 additions and 37 deletions.
159 changes: 159 additions & 0 deletions pkg/blockbuilder/architecture.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,159 @@
# Block Builder Architecture

## Overview

The Block Builder and Block Scheduler are separate components designed to build storage formats from ingested Kafka data. The Block Scheduler coordinates job distribution to multiple Block Builder instances, implementing a pull-based architecture that decouples read and write operations, allowing for independent scaling and simpler operational management. This document describes the architecture and interaction between components.

## Package Structure

The Block Builder system is organized into three main packages:

### pkg/blockbuilder/types
- Contains shared type definitions and interfaces
- Defines core data structures like `Job` and `Offsets`
- Provides interface definitions for:
- `Worker`: Interface for processing jobs and reporting status
- `Scheduler`: Interface for job scheduling and worker management
- `Transport`: Interface for communication between components

### pkg/blockbuilder/scheduler
- Implements the job queue and scheduling logic
- Manages job distribution to block builders
- Tracks job progress and ensures exactly-once processing
- Handles job state management and offset tracking

### pkg/blockbuilder/builder
- Implements the block builder worker functionality
- Processes assigned jobs and builds storage formats
- Manages transport layer communication
- Handles data processing and object storage interactions

## Component Diagram

```mermaid
graph TB
subgraph Kafka
KP[Kafka Partitions]
end
subgraph Block Scheduler
S[Scheduler]
Q[Job Queue]
PC[Partition Controller]
subgraph Transport Layer
T[gRPC/Transport Interface]
end
end
subgraph Block Builders
BB1[Block Builder 1]
BB2[Block Builder 2]
BB3[Block Builder N]
end
subgraph Storage
OS[Object Storage]
end
KP --> PC
PC --> S
S <--> Q
S <--> T
T <--> BB1
T <--> BB2
T <--> BB3
BB1 --> OS
BB2 --> OS
BB3 --> OS
```

## Job Processing Sequence

```mermaid
sequenceDiagram
participant PC as Partition Controller
participant S as Block Scheduler
participant Q as Queue
participant T as Transport
participant BB as Block Builder
participant OS as Object Storage
loop Monitor Partitions
PC->>PC: Check for new offsets
PC->>S: Create Job (partition, offset range)
S->>Q: Enqueue Job
end
BB->>T: Request Job
T->>S: Forward Request
S->>Q: Dequeue Job
Q-->>S: Return Job (or empty)
alt Has Job
S->>T: Send Job
T->>BB: Forward Job
BB->>OS: Process & Write Data
BB->>T: Report Success
T->>S: Forward Status
S->>PC: Commit Offset
else No Job
S->>T: Send No Job Available
T->>BB: Forward Response
end
```

## Core Components

### Job and Offsets
- `Job`: Represents a unit of work for processing Kafka data
- Contains a partition ID and an offset range
- Immutable data structure that can be safely passed between components
- `Offsets`: Defines a half-open range [min,max) of Kafka offsets to process
- Used to track progress and ensure exactly-once processing

### Block Scheduler
- Central component responsible for:
- Managing the job queue
- Coordinating Block Builder assignments
- Tracking job progress
- Implements a pull-based model where Block Builders request jobs
- Decoupled from specific transport mechanisms through the Transport interface

### Block Builder
- Processes jobs assigned by the Block Scheduler
- Responsible for:
- Building storage formats from Kafka data
- Writing completed blocks to object storage
- Reporting job status back to scheduler
- Implements the Worker interface for job processing

### Transport Layer
- Provides communication between Block Builders and Scheduler
- Abstracts transport mechanism (currently in-memory & gRPC)
- Defines message types for:
- Job requests
- Job completion notifications
- Job synchronization

## Design Principles

### Decoupled I/O
- Business logic is separated from I/O operations
- Transport interface allows for different communication mechanisms
- Enables easier testing through mock implementations

### Stateless Design
- Block Builders are stateless workers
- All state is managed by the Scheduler
- Allows for easy scaling and failover

### Pull-Based Architecture
- Block Builders pull jobs when ready
- Natural load balancing
- Prevents overloading of workers


### Interface-Driven Development
- Core components defined by interfaces
- Allows for multiple implementations
- Facilitates testing and modularity
16 changes: 16 additions & 0 deletions pkg/blockbuilder/builder/builder_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
package builder

import (
"github.com/grafana/loki/v3/pkg/blockbuilder/types"
)

// TestBuilder implements Worker interface for testing
type TestBuilder struct {
*Worker
}

func NewTestBuilder(builderID string, transport types.Transport) *TestBuilder {
return &TestBuilder{
Worker: NewWorker(builderID, transport),
}
}
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
package blockbuilder
package builder

import (
"context"
Expand All @@ -7,26 +7,16 @@ import (

"github.com/go-kit/log"
"github.com/go-kit/log/level"
"github.com/prometheus/prometheus/model/labels"

"github.com/grafana/dskit/backoff"
"github.com/prometheus/prometheus/model/labels"

"github.com/grafana/loki/v3/pkg/blockbuilder/types"
"github.com/grafana/loki/v3/pkg/kafka"
"github.com/grafana/loki/v3/pkg/kafka/partition"

"github.com/grafana/loki/pkg/push"
)

// [min,max)
type Offsets struct {
Min, Max int64
}

type Job struct {
Partition int32
Offsets Offsets
}

// Interface required for interacting with queue partitions.
type PartitionController interface {
Topic() string
Expand All @@ -43,7 +33,7 @@ type PartitionController interface {
// so it's advised to not buffer the channel for natural backpressure.
// As a convenience, it returns the last seen offset, which matches
// the final record sent on the channel.
Process(context.Context, Offsets, chan<- []AppendInput) (int64, error)
Process(context.Context, types.Offsets, chan<- []AppendInput) (int64, error)

Close() error
}
Expand Down Expand Up @@ -125,7 +115,7 @@ func (l *PartitionJobController) EarliestPartitionOffset(ctx context.Context) (i
)
}

func (l *PartitionJobController) Process(ctx context.Context, offsets Offsets, ch chan<- []AppendInput) (int64, error) {
func (l *PartitionJobController) Process(ctx context.Context, offsets types.Offsets, ch chan<- []AppendInput) (int64, error) {
l.part.SetOffsetForConsumption(offsets.Min)

var (
Expand Down Expand Up @@ -188,16 +178,16 @@ func (l *PartitionJobController) Process(ctx context.Context, offsets Offsets, c

// LoadJob(ctx) returns the next job by finding the most recent unconsumed offset in the partition
// Returns whether an applicable job exists, the job, and an error
func (l *PartitionJobController) LoadJob(ctx context.Context) (bool, Job, error) {
func (l *PartitionJobController) LoadJob(ctx context.Context) (bool, *types.Job, error) {
// Read the most recent committed offset
committedOffset, err := l.HighestCommittedOffset(ctx)
if err != nil {
return false, Job{}, err
return false, nil, err
}

earliestOffset, err := l.EarliestPartitionOffset(ctx)
if err != nil {
return false, Job{}, err
return false, nil, err
}

startOffset := committedOffset + 1
Expand All @@ -207,28 +197,27 @@ func (l *PartitionJobController) LoadJob(ctx context.Context) (bool, Job, error)

highestOffset, err := l.HighestPartitionOffset(ctx)
if err != nil {
return false, Job{}, err
return false, nil, err
}

if highestOffset < committedOffset {
level.Error(l.logger).Log("msg", "partition highest offset is less than committed offset", "highest", highestOffset, "committed", committedOffset)
return false, Job{}, fmt.Errorf("partition highest offset is less than committed offset")
return false, nil, fmt.Errorf("partition highest offset is less than committed offset")
}

if highestOffset == committedOffset {
level.Info(l.logger).Log("msg", "no pending records to process")
return false, Job{}, nil
return false, nil, nil
}

// Create the job with the calculated offsets
job := Job{
Partition: l.part.Partition(),
Offsets: Offsets{
Min: startOffset,
Max: min(startOffset+l.stepLen, highestOffset),
},
offsets := types.Offsets{
Min: startOffset,
Max: min(startOffset+l.stepLen, highestOffset),
}

// Convert partition from int32 to int
job := types.NewJob(int(l.part.Partition()), offsets)
return true, job, nil
}

Expand Down Expand Up @@ -279,7 +268,7 @@ func (d *dummyPartitionController) Commit(_ context.Context, offset int64) error
return nil
}

func (d *dummyPartitionController) Process(ctx context.Context, offsets Offsets, ch chan<- []AppendInput) (int64, error) {
func (d *dummyPartitionController) Process(ctx context.Context, offsets types.Offsets, ch chan<- []AppendInput) (int64, error) {
for i := int(offsets.Min); i < int(offsets.Max); i++ {
batch := d.createBatch(i)
select {
Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
package blockbuilder
package builder

import (
"github.com/prometheus/client_golang/prometheus"
Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
package blockbuilder
package builder

import (
"context"
Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
package blockbuilder
package builder

import (
"context"
Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
package blockbuilder
package builder

import (
"bytes"
Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
package blockbuilder
package builder

import (
"context"
Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
package blockbuilder
package builder

import (
"os"
Expand Down
58 changes: 58 additions & 0 deletions pkg/blockbuilder/builder/transport.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
package builder

import (
"context"

"github.com/grafana/loki/v3/pkg/blockbuilder/types"
)

var (
_ types.Transport = unimplementedTransport{}
_ types.Transport = &MemoryTransport{}
)

// unimplementedTransport provides default implementations that panic
type unimplementedTransport struct{}

func (t unimplementedTransport) SendGetJobRequest(_ context.Context, _ *types.GetJobRequest) (*types.GetJobResponse, error) {
panic("unimplemented")
}

func (t unimplementedTransport) SendCompleteJob(_ context.Context, _ *types.CompleteJobRequest) error {
panic("unimplemented")
}

func (t unimplementedTransport) SendSyncJob(_ context.Context, _ *types.SyncJobRequest) error {
panic("unimplemented")
}

// MemoryTransport implements Transport interface for in-memory communication
type MemoryTransport struct {
scheduler types.Scheduler
}

// NewMemoryTransport creates a new in-memory transport instance
func NewMemoryTransport(scheduler types.Scheduler) *MemoryTransport {
return &MemoryTransport{
scheduler: scheduler,
}
}

func (t *MemoryTransport) SendGetJobRequest(ctx context.Context, req *types.GetJobRequest) (*types.GetJobResponse, error) {
job, ok, err := t.scheduler.HandleGetJob(ctx, req.BuilderID)
if err != nil {
return nil, err
}
return &types.GetJobResponse{
Job: job,
OK: ok,
}, nil
}

func (t *MemoryTransport) SendCompleteJob(ctx context.Context, req *types.CompleteJobRequest) error {
return t.scheduler.HandleCompleteJob(ctx, req.BuilderID, req.Job)
}

func (t *MemoryTransport) SendSyncJob(ctx context.Context, req *types.SyncJobRequest) error {
return t.scheduler.HandleSyncJob(ctx, req.BuilderID, req.Job)
}
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
package blockbuilder
package builder

import (
"bytes"
Expand Down
Loading

0 comments on commit a10140d

Please sign in to comment.