mapreduce

Why we have this repo

mapreduce is part of go-zero, but a few people asked whether it can be used on its own, so it is published here as a separate package. I still recommend using go-zero if you want the many other features it offers.

Why MapReduce is needed

In real business scenarios we often need to fetch attributes from several different rpc services and assemble them into one complex object.

For example, to query product details we call:

product service - query product attributes
inventory service - query inventory attributes
price service - query price attributes
marketing service - query marketing attributes

If the calls are made serially, the response time grows linearly with the number of rpc calls, so we usually make them in parallel to reduce the response time.
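To make the serial-versus-parallel point concrete, here is a minimal sketch that fans out the four calls with sync.WaitGroup from the standard library. The fetchAttr function is a hypothetical stand-in for an rpc call (it only sleeps to simulate latency), and the service names are just the ones listed above.

```go
package main

import (
    "fmt"
    "sync"
    "time"
)

// fetchAttr is a hypothetical stand-in for an rpc call.
// It sleeps to simulate network latency.
func fetchAttr(service string) string {
    time.Sleep(50 * time.Millisecond)
    return service + "-attrs"
}

// fetchAll calls every service concurrently and returns the
// results in the same order as the input slice.
func fetchAll(services []string) []string {
    results := make([]string, len(services))
    var wg sync.WaitGroup
    for i, svc := range services {
        wg.Add(1)
        go func(i int, svc string) {
            defer wg.Done()
            // Each goroutine writes to its own slot, so no lock is needed.
            results[i] = fetchAttr(svc)
        }(i, svc)
    }
    wg.Wait()
    return results
}

func main() {
    start := time.Now()
    attrs := fetchAll([]string{"product", "inventory", "price", "marketing"})
    // Elapsed time should be close to one call, not four.
    fmt.Println(attrs, time.Since(start))
}
```

With four services the total time should stay close to the latency of the slowest call instead of the sum of all four.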

For simple scenarios WaitGroup is enough, but what if we also need to validate the data returned by the rpc calls, process it, and aggregate it? The Go standard library does not provide such a tool (Java provides CompletableFuture), so we implemented an in-process, concurrent batch-processing tool based on the MapReduce model.

Design ideas

Let's sort out the business scenarios the concurrency tool has to support:

querying product details: call multiple services concurrently to combine product attributes, and end the whole call immediately if any one of them returns an error.
automatically recommending card coupons to the user on the product details page: verify the coupons concurrently, drop the ones that fail verification, and return all the ones that pass.
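The coupon scenario can be sketched with the standard library alone. This is only an illustration of "verify concurrently, drop failures, return the rest"; verifyCoupon and its even-ID rule are hypothetical placeholders for a real verification call.

```go
package main

import (
    "fmt"
    "sort"
    "sync"
)

// verifyCoupon is a hypothetical check; in this sketch
// coupons with an even ID are treated as valid.
func verifyCoupon(id int) bool {
    return id%2 == 0
}

// validCoupons verifies all coupons concurrently and returns
// only the ones that pass, sorted by ID.
func validCoupons(ids []int) []int {
    var (
        mu    sync.Mutex
        wg    sync.WaitGroup
        valid []int
    )
    for _, id := range ids {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            if !verifyCoupon(id) {
                return // failed coupons are dropped silently
            }
            mu.Lock()
            valid = append(valid, id)
            mu.Unlock()
        }(id)
    }
    wg.Wait()
    // Goroutines finish in any order, so sort for a stable result.
    sort.Ints(valid)
    return valid
}

func main() {
    fmt.Println(validCoupons([]int{1, 2, 3, 4, 5, 6})) // prints "[2 4 6]"
}
```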

Both scenarios take input data, process it, and output the cleaned result. There is a classic asynchronous pattern for this kind of data processing: the producer-consumer pattern. So we can abstract the life cycle of batch data processing into roughly three phases:

data production: generate
data processing: mapper
data aggregation: reducer

Data production is mandatory, while data processing and data aggregation are optional. Data production and processing can run concurrently. Data aggregation is basically a pure in-memory operation, so a single goroutine is enough for it.

Since the different stages are run by different goroutines, it is natural to use channels for communication between them.
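The three phases and the channels between them can be sketched with the standard library as follows. This is a simplified illustration of the idea, not the package's actual implementation; the function name sumOfSquares and the workers parameter are assumptions made for the example.

```go
package main

import (
    "fmt"
    "sync"
)

// sumOfSquares wires generate -> mapper -> reducer with channels.
// workers is the number of mapper goroutines and must be at least 1.
func sumOfSquares(n, workers int) int {
    source := make(chan int) // generate -> mapper
    mapped := make(chan int) // mapper -> reducer

    // generate: produce the input items, then close the channel.
    go func() {
        defer close(source)
        for i := 0; i < n; i++ {
            source <- i
        }
    }()

    // mapper: several goroutines process items concurrently.
    var wg sync.WaitGroup
    for w := 0; w < workers; w++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for v := range source {
                mapped <- v * v
            }
        }()
    }

    // Close mapped once every mapper has finished.
    go func() {
        wg.Wait()
        close(mapped)
    }()

    // reducer: a single goroutine (the caller) aggregates in memory.
    sum := 0
    for v := range mapped {
        sum += v
    }
    return sum
}

func main() {
    fmt.Println("result:", sumOfSquares(10, 4)) // prints "result: 285"
}
```

Closing each channel from the producing side is what lets the next stage finish its range loop without extra signaling.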

How can I terminate the process at any time?

It's simple: each goroutine just receives from a channel or from the given context, and stops when it is signaled.
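A minimal sketch of that idea using context from the standard library is shown below. The names sumSquares, run, and errNegative are hypothetical, and treating a negative input as the failure case is only an assumption to give the workers something to cancel on.

```go
package main

import (
    "context"
    "errors"
    "fmt"
    "sync"
)

// errNegative is a hypothetical error used to trigger cancellation.
var errNegative = errors.New("negative input")

// sumSquares squares each input in its own goroutine and
// stops everything as soon as one worker reports an error.
func sumSquares(ctx context.Context, inputs []int) (int, error) {
    ctx, cancel := context.WithCancel(ctx)
    defer cancel()

    results := make(chan int)
    var (
        wg       sync.WaitGroup
        once     sync.Once
        firstErr error
    )
    for _, v := range inputs {
        wg.Add(1)
        go func(v int) {
            defer wg.Done()
            if v < 0 {
                // Record the first error and signal every other worker.
                once.Do(func() {
                    firstErr = errNegative
                    cancel()
                })
                return
            }
            select {
            case results <- v * v:
            case <-ctx.Done(): // terminated: drop the result and exit
            }
        }(v)
    }
    go func() {
        wg.Wait()
        close(results)
    }()

    sum := 0
    for r := range results {
        sum += r
    }
    if firstErr != nil {
        return 0, firstErr
    }
    // The caller's context may also have been cancelled.
    if err := ctx.Err(); err != nil {
        return 0, err
    }
    return sum, nil
}

// run is a small helper that uses a background context.
func run(inputs []int) (int, error) {
    return sumSquares(context.Background(), inputs)
}

func main() {
    fmt.Println(run([]int{1, 2, 3}))  // prints "14 <nil>"
    fmt.Println(run([]int{1, -2, 3})) // prints "0 negative input"
}
```

Every blocking send sits in a select next to ctx.Done(), so no worker is left stuck after the process is terminated.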

Choose the right version

v1 (default) - non-generic version
v2 (generics) - generic version, requires Go 1.18 or later

A simple example

Calculate the sum of squares, with the mapper squaring the numbers concurrently.

package main

import (
    "fmt"
    "log"

    "github.com/kevwan/mapreduce/v2"
)

func main() {
    val, err := mapreduce.MapReduce(func(source chan<- int) {
        // generator
        for i := 0; i < 10; i++ {
            source <- i
        }
    }, func(i int, writer mapreduce.Writer[int], cancel func(error)) {
        // mapper
        writer.Write(i * i)
    }, func(pipe <-chan int, writer mapreduce.Writer[int], cancel func(error)) {
        // reducer
        var sum int
        for i := range pipe {
            sum += i
        }
        writer.Write(sum)
    })
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("result:", val)
}

More examples: https://github.com/zeromicro/zero-examples/tree/main/mapreduce

References

go-zero: https://github.com/zeromicro/go-zero

Give a Star! ⭐

If you like this project, or are using it to learn or to start your own solution, please give it a star. Thanks!
