
Control block buffer size for performance #776

Merged
6 commits merged into main on Oct 5, 2022

Conversation

gingerwizard
Collaborator

The block buffer size has historically been fixed at 2, which limits how far block scanning and decoding can be parallelized. This change introduces the setting BlockBufferSize (defaults to 2). Users can increase it, trading memory for faster reads, e.g.

import (
	"context"
	"log"
	"testing"
	"time"

	"github.com/ClickHouse/clickhouse-go/v2"
)

func TestRead(t *testing.T) {
	conn := getConnection() // test helper returning an open native connection
	start := time.Now()
	// Request a block buffer of 2 (the default) for this query via the query context.
	ctx := clickhouse.Context(context.Background(), clickhouse.WithBlockBufferSize(2))
	rows, err := conn.Query(ctx, `SELECT number FROM system.numbers_mt LIMIT 500000000`)
	if err != nil {
		t.Fatal(err)
	}
	defer rows.Close()
	var col1 uint64
	for rows.Next() {
		if err := rows.Scan(&col1); err != nil {
			t.Fatal(err)
		}
	}
	elapsed := time.Since(start)
	log.Printf("Read took %s", elapsed)
}

With a block buffer size of 2, this takes:

=== RUN   TestRead
2022/10/04 11:10:12 Read took 45.440940766s

Increasing the buffer size to clickhouse.WithBlockBufferSize(100) gives:

=== RUN   TestRead
2022/10/04 11:25:15 Read took 22.623354852s

We leave users to tune this value themselves, since blocks can vary in size, for example when they contain strings.

The buffer size can be set at the connection level with BlockBufferSize, via the DSN with block_buffer_size, or per query as shown above.
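
A minimal sketch of the connection-level and DSN forms, for reference. The address and DSN below are placeholders; the option and parameter names are the ones introduced by this change:

package main

import (
	"database/sql"
	"log"

	"github.com/ClickHouse/clickhouse-go/v2"
)

func main() {
	// Connection level: BlockBufferSize applies to all queries on this connection.
	conn, err := clickhouse.Open(&clickhouse.Options{
		Addr:            []string{"127.0.0.1:9000"},
		BlockBufferSize: 100, // number of decoded blocks buffered ahead of Scan
	})
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// DSN: the same setting via the block_buffer_size parameter (database/sql interface).
	db, err := sql.Open("clickhouse", "clickhouse://127.0.0.1:9000?block_buffer_size=100")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}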

@gingerwizard gingerwizard merged commit 0fdf4d8 into main Oct 5, 2022
@gingerwizard gingerwizard deleted the control_buffer branch October 19, 2022 11:12