use a more cache-friendly way to make join rows #7420
Comments
Sounds reasonable. Is there any quantitative evaluation of this issue?
@shenli Here is a simple benchmark. The result is:

```
➜ go test -bench Benchmark -count 3 --benchmem
goos: darwin
goarch: amd64
BenchmarkCopyFieldByField-8        20000    61823 ns/op    0 B/op    0 allocs/op
BenchmarkCopyFieldByField-8        20000    64356 ns/op    0 B/op    0 allocs/op
BenchmarkCopyFieldByField-8        20000    65751 ns/op    0 B/op    0 allocs/op
BenchmarkCopyColumnByColumn-8     100000    19014 ns/op    0 B/op    0 allocs/op
BenchmarkCopyColumnByColumn-8     100000    19174 ns/op    0 B/op    0 allocs/op
BenchmarkCopyColumnByColumn-8     100000    18989 ns/op    0 B/op    0 allocs/op
PASS
ok      _/tmp   12.195s
```

The benchmark code is:

```go
package main

import "testing"

var (
	numRows = 1024
	numCols = 16
)

// genData allocates a column-oriented [][]int64: numCols columns,
// each holding numRows int64 values in a contiguous slice.
func genData() [][]int64 {
	columns := make([][]int64, numCols)
	for i := 0; i < numCols; i++ {
		columns[i] = make([]int64, numRows)
	}
	return columns
}

// BenchmarkCopyFieldByField copies the data row by row: the inner loop
// jumps between the backing arrays of different columns, so consecutive
// accesses never share a cache line.
func BenchmarkCopyFieldByField(b *testing.B) {
	src := genData()
	dst := genData()
	b.ResetTimer()
	for counter := 0; counter < b.N; counter++ {
		for i := 0; i < numRows; i++ {
			for j := 0; j < numCols; j++ {
				dst[j][i] = src[j][i]
			}
		}
	}
}

// BenchmarkCopyColumnByColumn copies the data column by column: the
// inner loop walks a single contiguous slice sequentially.
func BenchmarkCopyColumnByColumn(b *testing.B) {
	src := genData()
	dst := genData()
	b.ResetTimer()
	for counter := 0; counter < b.N; counter++ {
		for j := 0; j < numCols; j++ {
			for i := 0; i < numRows; i++ {
				dst[j][i] = src[j][i]
			}
		}
	}
}
```
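A rough account of why the row-major loop loses, assuming 8-byte int64 values and 64-byte cache lines (typical for amd64): each column's backing array is 1024 × 8 B = 8 KB, so in BenchmarkCopyFieldByField two consecutive accesses always land in different columns' arrays and therefore on different cache lines, with 2 × 16 = 32 lines live at once and no spatial locality between neighboring accesses. In BenchmarkCopyColumnByColumn, eight consecutive int64 accesses share one cache line before moving to the next, and the whole 2 × 16 × 8 KB = 256 KB working set is streamed sequentially, a pattern hardware prefetchers handle well. At least, that is a plausible reading of the numbers above.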
Great! Much faster!
And with the "copy column by column" method, we can copy a continuously memory, which also speeds up the execution time: ➜ go test -bench Benchmark -count 3 --benchmem
goos: darwin
goarch: amd64
BenchmarkCopyFieldByField-8 30000 59081 ns/op 0 B/op 0 allocs/op
BenchmarkCopyFieldByField-8 20000 59146 ns/op 0 B/op 0 allocs/op
BenchmarkCopyFieldByField-8 30000 58260 ns/op 0 B/op 0 allocs/op
BenchmarkCopyColumnByColumn-8 100000 17915 ns/op 0 B/op 0 allocs/op
BenchmarkCopyColumnByColumn-8 100000 18199 ns/op 0 B/op 0 allocs/op
BenchmarkCopyColumnByColumn-8 100000 18134 ns/op 0 B/op 0 allocs/op
BenchmarkCopyColumnByColumnContinuously-8 300000 3712 ns/op 0 B/op 0 allocs/op
BenchmarkCopyColumnByColumnContinuously-8 300000 3635 ns/op 0 B/op 0 allocs/op
BenchmarkCopyColumnByColumnContinuously-8 300000 3628 ns/op 0 B/op 0 allocs/op
PASS
ok _/tmp 15.956s package main
import "testing"
var (
numRows = 1024
numCols = 16
)
func genData() [][]int64 {
columns := make([][]int64, numCols)
for i := 0; i < numCols; i++ {
columns[i] = make([]int64, numRows)
}
return columns
}
func BenchmarkCopyFieldByField(b *testing.B) {
src := genData()
dst := genData()
b.ResetTimer()
for counter := 0; counter < b.N; counter++ {
for i := 0; i < numRows; i++ {
for j := 0; j < numCols; j++ {
dst[j][i] = src[j][i]
}
}
}
}
func BenchmarkCopyColumnByColumn(b *testing.B) {
src := genData()
dst := genData()
b.ResetTimer()
for counter := 0; counter < b.N; counter++ {
for j := 0; j < numCols; j++ {
for i := 0; i < numRows; i++ {
dst[j][i] = src[j][i]
}
}
}
}
func BenchmarkCopyColumnByColumnContinuously(b *testing.B) {
src := genData()
dst := genData()
b.ResetTimer()
for counter := 0; counter < b.N; counter++ {
for j := 0; j < numCols; j++ {
copy(dst[j], src[j])
}
}
} |
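The further roughly 5x speedup from copy is plausible: for slices of pointer-free element types such as int64, Go's built-in copy is compiled down to a bulk memory move, which transfers data in wide chunks and avoids the per-element loop and bounds-check overhead of the explicit inner loop.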
Just found that the cache utilization of the function filter is not efficient either:

```go
func (j *baseJoiner) filter(input, output *chunk.Chunk) (matched bool, err error) {
	j.selected, err = expression.VectorizedFilter(j.ctx, j.conditions, chunk.NewIterator4Chunk(input), j.selected)
	if err != nil {
		return false, errors.Trace(err)
	}
	for i := 0; i < len(j.selected); i++ {
		if !j.selected[i] {
			continue
		}
		matched = true
		output.AppendRow(input.GetRow(i))
	}
	return matched, nil
}
```

We should copy the selected rows column by column as well.
From my perspective, the first argument of tryToMatch in the joiner interface should be changed to take a chunk.Column rather than a chunk.Row. Am I right? If so, many places would have to be changed.
@laidahe No, it should be
I'm working on it~ |
@supernan1994 Oh, Thanks! |
fixed by #7493 |
For left outer join, right outer join, and inner join, we use the function makeJoinRowToChunk (in executor/joiner.go) to concatenate the left and right records; take the left outer join as an example.

This method is far from efficient. Chunk has a column-oriented memory layout: data within the same column are stored contiguously in memory. [Figure: the relationship between rows and columns in a Chunk.]

The function makeJoinRowToChunk copies data from the left and right rows into the result chunk field by field. [Figure: the field-by-field copy pattern.]

As you can see, the cache utilization of this method is poor, which may introduce many cache misses and a lot of data movement. The L1 data cache is usually 32 KB, and the chunk size we use is 1 K (1024 rows), so if the joined tables have many columns, the data being processed can easily exceed the L1 data cache.

If we copy the data column by column instead, the L1 data cache miss ratio during execution can be greatly reduced, and performance improves as well.
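To make the contrast concrete, here is a minimal sketch of building join rows column by column on a simplified column-oriented chunk. The chunk type and makeJoinRowsColumnwise below are illustrative stand-ins invented for this sketch, not TiDB's actual chunk.Chunk or joiner API, and they assume fixed-width int64 columns with no nulls:

```go
// chunk is a simplified column-oriented container: one []int64 slice per
// column. TiDB's real chunk.Chunk also tracks nulls and var-length data.
type chunk struct {
	cols [][]int64
}

// makeJoinRowsColumnwise concatenates one left row with every row of
// right, producing the joined rows in out. out must already have
// len(left.cols)+len(right.cols) columns. All writes happen column by
// column: the left row's values are broadcast down the left output
// columns, and each right column is appended as one contiguous block.
func makeJoinRowsColumnwise(out, left *chunk, leftRow int, right *chunk) {
	numRight := len(right.cols[0]) // assumes right has at least one column
	// Broadcast each left field sequentially down its output column.
	for j, col := range left.cols {
		v := col[leftRow]
		for i := 0; i < numRight; i++ {
			out.cols[j] = append(out.cols[j], v)
		}
	}
	// Bulk-copy each right column into the corresponding output column.
	for j, col := range right.cols {
		k := len(left.cols) + j
		out.cols[k] = append(out.cols[k], col...)
	}
}
```

The key property is that every inner loop walks one contiguous slice: the left value is written sequentially, and each right column moves with a single bulk append, matching the access pattern that won in the benchmarks above.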