[{"content":"","date":null,"permalink":"/posts/","section":"Blog","summary":"","title":"Blog"},{"content":"","date":null,"permalink":"/categories/","section":"Categories","summary":"","title":"Categories"},{"content":"","date":null,"permalink":"/categories/concurrency/","section":"Categories","summary":"","title":"Concurrency"},{"content":"","date":null,"permalink":"/tags/go/","section":"Tags","summary":"","title":"Go"},{"content":"","date":null,"permalink":"/series/go-concurrency/","section":"Series","summary":"","title":"Go Concurrency"},{"content":"","date":null,"permalink":"/tags/goroutines/","section":"Tags","summary":"","title":"Goroutines"},{"content":"More Real-world Usage #Datadog has excellent software engineers. Nevertheless, it\u0026rsquo;s easy to find code with race conditions. Let\u0026rsquo;s examine a simplified version of waitForConfigsFromAD:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 package main import ( \u0026#34;fmt\u0026#34; \u0026#34;runtime\u0026#34; \u0026#34;time\u0026#34; \u0026#34;go.uber.org/atomic\u0026#34; ) type Config struct{} type Component interface { AddScheduler(s func([]Config)) } func waitForConfigsFromAD(discoveryMinInstances int, ac Component) (configs []Config, returnErr error) { configChan := make(chan Config) // signal to the scheduler when we are no longer waiting, so we do not continue // to push items to configChan waiting := atomic.NewBool(true) defer func() { waiting.Store(false) // ..and drain any message currently pending in the channel select { case \u0026lt;-configChan: default: } }() // add the scheduler in a goroutine, since it will schedule any \u0026#34;catch-up\u0026#34; immediately, // placing items in configChan go ac.AddScheduler(func(configs []Config) { for _, cfg := range configs { if waiting.Load() { runtime.Gosched() configChan \u0026lt;- cfg } } }) for len(configs) \u0026lt; discoveryMinInstances { cfg := \u0026lt;-configChan configs = append(configs, cfg) } return } Side note:1 This code is used in the Agent Check Status CLI. You will not notice resource leaks there.\nWhat Are the Issues? #Utilizing an atomic Boolean as a “finished” indicator, coupled with a deferred goroutine to set it and subsequently draining a channel seems clever2 to me.\nUnfortunately, it has a race condition. When we first test the atomic waiting (seeing it be true), and then in parallel exit waitForConfigsFromAD, spawning the deferred goroutine which tries to drain the channel, we leak the goroutine at line 38 because no one will ever read from configChan.\nTry it on the Go Playground.\nAn Alternative Implementation #Let us try a synchronous approach. The original is a little more complicated, but let\u0026rsquo;s simply assume we want to collect configurations until we either:\ncollected a fixed number. encountered a configuration error. canceled the passed context, for example in a time out. Also, we are interested in the list of encountered errors. On cancelation we just return what we have so far, without signaling an error.\nSimply put, we need a collector that allows us to wait until it is finished collecting according to the criteria above and ask what it has collected so far. Something like:\ntype Collector struct { // ... } func NewCollector(discoveryMinInstances int) *Collector { // ... } func (c *Collector) Schedule(configs []Config) { // ... } func (c *Collector) Done() \u0026lt;-chan struct{} { // ... 
} func (c *Collector) Result() ([]Config, error) { // ... } Given that, we can reimplement waitForConfigsFromAD:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 func waitForConfigsFromAD(ctx context.Context, discoveryMinInstances int, ac Component) ([]Config, error) { c := NewCollector(discoveryMinInstances) ac.AddScheduler(c.Schedule) select { case \u0026lt;-ctx.Done(): case \u0026lt;-c.Done(): } // ac.RemoveScheduler(c.Schedule) return c.Result() } Simple, synchronous code - we could even think about removing the scheduler from the component after it is done.\nNow everything else falls into place:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 type Collector struct { discoveryMinInstances int mu sync.Mutex // protects configs, errors configs []Config errors []error done chan struct{} setDone func() } func NewCollector(discoveryMinInstances int) *Collector { done := make(chan struct{}) setDone := sync.OnceFunc(func() { close(done) }) return \u0026amp;Collector{ discoveryMinInstances: discoveryMinInstances, done: done, setDone: setDone, } } func (c *Collector) Schedule(configs []Config) { for _, cfg := range configs { if filterErrors := filterInstances(cfg); len(filterErrors) \u0026gt; 0 { c.addErrors(filterErrors) c.setDone() continue } if !c.addConfig(cfg) { c.setDone() } } } func (c *Collector) Done() \u0026lt;-chan struct{} { return c.done } func (c *Collector) Result() ([]Config, error) { c.mu.Lock() defer c.mu.Unlock() configs := c.configs c.configs = nil err := errors.Join(c.errors...) c.errors = nil return configs, err } func (c *Collector) addConfig(cfg Config) bool { c.mu.Lock() defer c.mu.Unlock() if len(c.configs) \u0026lt; c.discoveryMinInstances { c.configs = append(c.configs, cfg) } return len(c.configs) \u0026lt; c.discoveryMinInstances } func (c *Collector) addErrors(errs []error) { c.mu.Lock() defer c.mu.Unlock() if len(errs) \u0026gt; 0 { c.errors = append(c.errors, errs...) } } There\u0026rsquo;s definite room for improvement here, but the key takeaway is that it can just be written down and it is easily testable.\nSummary #We replaced asynchronous code with a race condition with a synchronous, thread safe implementation. Like in previous posts, refactoring and separation of concerns helps structuring our tasks and avoid errors.\nThis blog is not intended to assign blame or irresponsibly publish bugs. We are here to learn.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n“Clear is better than clever”. Rob Pike. 2015. Go Proverbs with Rob Pike — Gopherfest 2015 — \u0026lt;golang.org/doc/effective_go.html#sharing\u0026gt;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":"16 April 2024","permalink":"/posts/concurrency-bugs-2/","section":"Blog","summary":"More Real-world Usage #Datadog has excellent software engineers.","title":"More Concurrency Bugs"},{"content":"","date":null,"permalink":"/series/","section":"Series","summary":"","title":"Series"},{"content":"","date":null,"permalink":"/tags/","section":"Tags","summary":"","title":"Tags"},{"content":"Errors Observed in Real-world Usage #In “ Problems With Concurrency” I mentioned that I see concurrency issues a lot. 
Let\u0026rsquo;s look at something I recently found in production:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 package main func task() ([]int, error) { // We want to calculate all results in parallel. So we create a wait group with 3 elements. We will call wg.Done() // after each goroutine is finished. We also create a mutex to lock the result while writing to it. We also create // an error channel to collect errors from the goroutines. var wg sync.WaitGroup var mu sync.Mutex errs := make(chan error) // Create result result := make([]int ,3) // We create a wait group with 3 elements. We will call wg.Done() after each goroutine is finished. wg.Add(3) // We start all goroutines. go setResult(result, 0, \u0026amp;wg, \u0026amp;mu, errs) go setResult(result, 1, \u0026amp;wg, \u0026amp;mu, errs) go setResult(result, 2, \u0026amp;wg, \u0026amp;mu, errs) wg.Wait() // Handle errors close(errs) for err := range errs { return nil, err } return result, nil } func setResult(result []int, i int, wg *sync.WaitGroup, mu *sync.Mutex, errs chan\u0026lt;- error) { defer wg.Done() r, err := calculate(i) if err != nil { errs \u0026lt;- err return } mu.Lock() result[i] = r mu.Unlock() } func calculate(i int) (int, error) { switch i { case 0: return 1, nil case 11: return 2, nil case 2: return 3, nil } return 0, errTest } var errTest = errors.New(\u0026#34;test\u0026#34;) Try it on the Go Playground.\nThis code is pretty similar to the one in “Problems With Concurrency” before it was modified in the blog. It makes nearly the same mistake and deadlocks on error conditions. Obviously1, the fix is not to make this function more complex and use errs := make(chan error, 3).\nWhat Are the Issues? #This code very nicely demonstrates what you get when you start with concurrent code without thinking about the design.\nLet\u0026rsquo;s first fix it, then analyze its problems. The task is divided into three independent calculations, where failure in any one of them results in the failure of the entire task. The synchronous version would be:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 func task() ([]int, error) { const results = 3 result := make([]int, results) var err error for i := 0; i \u0026lt; results; i++ { result[i], err = calculate(i) if err != nil { break } } return result, err } Where we see immediately that we don\u0026rsquo;t need the mutex, since the results don\u0026rsquo;t overlap. Transforming into a parallel version gives us:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 import \u0026#34;golang.org/x/sync/errgroup\u0026#34; func task() ([]int, error) { const results = 3 result := make([]int, results) var g errgroup.Group for i := 0; i \u0026lt; results; i++ { r := \u0026amp;result[i] // Added for clarity g.Go(func() (err error) { *r, err = calculate(i) return err }) } return result, g.Wait() } Given that this task appears straightforward, why are there bugs in the production code?\nThe comments provide valuable insights, beginning with the directive to \u0026ldquo;calculate \u0026hellip; in parallel\u0026rdquo;, but without considering the task\u0026rsquo;s purpose and communication methods. Consequently, it establishes \u0026hellip; as synchronization points, along with \u0026hellip; and \u0026hellip;\nThe code wasn\u0026rsquo;t initially designed with correctness as the primary concern.
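For completeness: had the subtasks taken a context, the same errgroup shape would also cancel the remaining calculations as soon as one fails. A minimal sketch, assuming a hypothetical context-aware calculateCtx (not part of the original code):

```go
package main

import (
	"context"
	"fmt"

	"golang.org/x/sync/errgroup"
)

// taskCtx mirrors the errgroup version above, but additionally cancels
// the remaining calculations once one of them fails.
func taskCtx(ctx context.Context) ([]int, error) {
	const results = 3
	result := make([]int, results)

	g, ctx := errgroup.WithContext(ctx)
	for i := 0; i < results; i++ {
		g.Go(func() (err error) {
			result[i], err = calculateCtx(ctx, i)
			return err
		})
	}

	return result, g.Wait()
}

// calculateCtx is a hypothetical context-aware variant of calculate.
func calculateCtx(ctx context.Context, i int) (int, error) {
	if err := ctx.Err(); err != nil {
		return 0, err // another subtask already failed; skip the work
	}
	return i + 1, nil // stand-in for the real calculation
}

func main() {
	fmt.Println(taskCtx(context.Background()))
}
```

Nothing in the production code suggests it grew out of such a synchronous core.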
Instead, it began as an asynchronous version, employing go as a substitute for subroutine calls, with fixes being implemented reactively as problems arose. Regrettably, this pattern is frequently observed among junior Go developers.\nSummary #In designing asynchronous programs, it is often better to begin with a synchronous, correct version. This initial approach might even suffice in terms of performance, with parallelism introduced by a calling function, such as operating within one of multiple web requests. Additionally, it\u0026rsquo;s worth noting that concurrency doesn\u0026rsquo;t always need to be that extremely fine-grained, especially considering that the number of CPUs in a machine is limited.\nMoreover, channels and synchronization points should be purposefully integrated, not employed as mere necessities. The need for error checking shouldn\u0026rsquo;t be retrospectively fixed by a channel with slapped-on synchronization primitives.\nAs mentioned before, addition of magic numbers reduces maintainability.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":"15 April 2024","permalink":"/posts/concurrency-bugs-1/","section":"Blog","summary":"Errors Observed in Real-world Usage #In “ Problems With Concurrency” I mentioned that I see concurrency issues a lot.","title":"Concurrency Bugs"},{"content":"","date":null,"permalink":"/series/structured-concurrency/","section":"Series","summary":"","title":"Structured Concurrency"},{"content":"","date":null,"permalink":"/tags/java/","section":"Tags","summary":"","title":"Java"},{"content":"\u0026hellip; continued from the previous post.\nStructured Concurrency Preview #I’ve written about structured concurrency, and Java has a preview API1 StructuredTaskScope2:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 import com.fillmore_labs.blog.jvt.Slow2; import java.util.concurrent.StructuredTaskScope; void main() { var deadline = Instant.now().plusMillis(100L); try (var scope = new StructuredTaskScope.ShutdownOnFailure()) { for (int i = 0; i \u0026lt; 1_000; i++) { // var queryStart = Instant.now(); scope.fork( () -\u0026gt; { Slow2.fibonacci(27); // var duration = Duration.between(queryStart, Instant.now()); return null; }); } scope.joinUntil(deadline); } } In Java structured concurrency includes cancelation via thread interruption, aborting the unfinished calculations. We use our old recursive Fibonacci calculation as Slow2, made cancelable with:\nif (Thread.interrupted()) { throw new InterruptedException(); } When we run this, it exits after around 100 Milliseconds:\n\u0026gt; bazel run //:try6 INFO: Running command line: bazel-bin/try6 *** Finished 129 runs (871 canceled) in 113.267ms - avg 50.995ms, stddev 16.769ms Which shows us that all virtual threads are started, even though we could only finish 129. 
Extending the deadline to run to completion gives:\n*** Finished 1000 runs (0 canceled) in 373.313ms - avg 172.825ms, stddev 92.234ms So, Thread.interrupted() is not free (it\u0026rsquo;s the blue areas on top), but performant enough to call it often.\nAnother Example #Mirroring our Go experiments we define a task and a function calling it:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 import java.time.Duration; import java.time.Instant; import java.util.concurrent.StructuredTaskScope; final Duration processingTime = Duration.ofSeconds(1); void main() throws Exception { try (var scope = new StructuredTaskScope.ShutdownOnFailure()) { var start = Instant.now(); scope.fork( () -\u0026gt; { task(\u0026#34;task1\u0026#34;, processingTime.dividedBy(3), null); return null; }); scope.fork( () -\u0026gt; { task(\u0026#34;task2\u0026#34;, processingTime.dividedBy(2), new TestException(\u0026#34;task2 failed\u0026#34;)); return null; }); scope.fork( () -\u0026gt; { task(\u0026#34;task3\u0026#34;, processingTime, null); return null; }); scope.join(); var result = scope.exception(); var duration = Duration.between(start, Instant.now()); System.out.println(STR.\u0026#34;*** Got \\\u0026#34;\\{result}\\\u0026#34; in \\{duration}\u0026#34;); } } void task(String name, Duration processingTime, Exception result) throws Exception { Thread.sleep(processingTime); if (result != null) { throw result; } } static class TestException extends Exception { TestException(String message) { super(message); } } Running this, we see similar results as in our previous experiment:\n\u0026gt; bazel run //:try7 INFO: Running command line: bazel-bin/try7 *** Got \u0026#34;com.fillmore_labs.blog.jvt.TestException: task2 failed\u0026#34; in 520,398ms So ShutdownOnFailure closely mimics Go\u0026rsquo;s errgroup.\nSummary #Java seems to bet on structured concurrency, at least for virtual threads in non-library code. It uses thread interruption as a means of cancelation, which requires having a handle to the running thread. We might eventually see support for context propagation, e.g. from OpenTelemetry, for the new constructs. This is conceptually very different for Go\u0026rsquo;s context, which is just hierarchically passed down and cancels tasks, including subtasks, regardless of whether the canceler is aware of them.\nRon Pressler, Alan Bateman. 2023. Structured Concurrency (Second Preview). 
In JDK Enhancement Proposals — September 2023 — JEP 462 — \u0026lt;openjdk.org/jeps/462\u0026gt;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nThe code is available on GitHub at github.com/fillmore-labs/blog-javavirtualthreads.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":"10 April 2024","permalink":"/posts/javavirtualthreads-2/","section":"Blog","summary":"\u0026hellip; continued from the previous post.","title":"Java Structured Concurrency"},{"content":"","date":null,"permalink":"/tags/jvm/","section":"Tags","summary":"","title":"Jvm"},{"content":"","date":null,"permalink":"/tags/virtual-threads/","section":"Tags","summary":"","title":"Virtual-Threads"},{"content":"Reading my articles about Go concurrency, a friend asked me whether one could do something similar in Java.\nProject Loom #Since the release of JDK 21, Java has virtual threads1:\nThread.startVirtualThread(() -\u0026gt; { System.out.println(\u0026#34;Hello, world\u0026#34;); }); As an equivalent to Go’s goroutines:\ngo func() { fmt.Println(\u0026#34;Hello, world\u0026#34;) }() A Simple Example #Like our experiments in Go, we implement2 a simple recursive calculation of the Fibonacci sequence:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 package com.fillmore_labs.blog.jvt; public final class Slow { public static int fibonacci(int n) { if (n \u0026lt; 2) { return n; } var fn1 = fibonacci(n - 1); var fn2 = fibonacci(n - 2); return fn1 + fn2; } } Then call it 1,000 times:\n1 2 3 4 5 6 7 8 9 import com.fillmore_labs.blog.jvt.Slow; void main() { for (int i = 0; i \u0026lt; 1_000; i++) { // var queryStart = Instant.now(); Slow.fibonacci(27); // var duration = Duration.between(queryStart, Instant.now()); } } Running this on our good old N5105 CPU gives us:\n\u0026gt; bazel run //:try1 INFO: Running command line: bazel-bin/try1 *** Finished 1000 runs in 1.219s - avg 1.214ms, stddev 48.555µs Which is even a little faster3 than our Go version. Nice.\nSo, let’s try a naïve approach to parallelize things:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 package com.fillmore_labs.blog.jvt; public final class Parallel1 { public static int fibonacci(int n) { if (n \u0026lt; 2) { return n; } var ff1 = new FutureTask\u0026lt;\u0026gt;(() -\u0026gt; fibonacci(n - 1)); Thread.startVirtualThread(ff1); var ff2 = new FutureTask\u0026lt;\u0026gt;(() -\u0026gt; fibonacci(n - 2)); Thread.startVirtualThread(ff2); return ff1.get() + ff2.get(); } } Resulting in:\n\u0026gt; bazel run //:try2 INFO: Running command line: bazel-bin/try2 *** Finished 1000 runs in 279.364s - avg 279.346ms, stddev 54.647ms 4 minutes and 39 seconds is a little better than what Go did, but still much slower than our single-threaded solution.\nAnalyzing Flame Graphs #If we look at the flame graph of the single-threaded run:\n\u0026gt; bazel run //:bench1 -- -prof \u0026#34;async:output=flamegraph;direction=forward\u0026#34; Iteration 1: 1220.789 ms/op Benchmark Mode Cnt Score Error Units Bench1.measure ss 1220.789 ms/op We see a little time spent interpreting/compiling the program and mostly working on our Fibonacci implementation.
Our naïve implementation looks like this:\nWe spend a lot of time blocked on a Mutex in the JVM Tool Interface, maybe the global JvmtiThreadState_lock?\nOther Approaches #Anyway, we are not here to debug the JVM, let’s try some other approaches.\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 package com.fillmore_labs.blog.jvt; import java.util.concurrent.ExecutorService; public record Parallel3(ExecutorService e) { public int fibonacci(int n) { if (n \u0026lt; 2) { return n; } var ff1 = e.submit(() -\u0026gt; fibonacci(n - 1)); var fn2 = fibonacci(n - 2); return ff1.get() + fn2; } } Sharing an ExecutorService and using the ‘original’ thread to do some work improves things:\n\u0026gt; bazel run //:try3 INFO: Running command line: bazel-bin/try3 *** Finished 1000 runs in 179.452s - avg 179.426ms, stddev 41.363ms 3 minutes is faster (interestingly enough, we lose to Go here) - but still slower than the single-threaded version.\nSo, let’s move parallelization to the calling function:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 import com.fillmore_labs.blog.jvt.Slow; import java.util.concurrent.Executors; void main() { try (var executor = Executors.newVirtualThreadPerTaskExecutor()) { for (int i = 0; i \u0026lt; 1_000; i++) { // var queryStart = Instant.now(); executor.execute(() -\u0026gt; { Slow.fibonacci(27); // var duration = Duration.between(queryStart, Instant.now()); }); } } } \u0026gt; bazel run //:try4 INFO: Running command line: bazel-bin/try4 *** Finished 1000 runs in 349.151ms - avg 164.952ms, stddev 88.675ms This has a flame graph similar to that of the single-threaded version and is approximately 3.5 times faster.\nImprove Latency #Now let us limit the number of queued calls:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 import com.fillmore_labs.blog.jvt.Slow; import java.util.concurrent.Executors; import java.util.concurrent.Semaphore; void main() throws InterruptedException { try (var executor = Executors.newVirtualThreadPerTaskExecutor()) { var numCPU = Runtime.getRuntime().availableProcessors(); var pool = new Semaphore(numCPU); for (int i = 0; i \u0026lt; 1_000; i++) { // var queryStart = Instant.now(); pool.acquire(); executor.execute( () -\u0026gt; { Slow.fibonacci(27); // var duration = Duration.between(queryStart, Instant.now()); pool.release(); }); } } } \u0026gt; bazel run //:try5 INFO: Running command line: bazel-bin/try5 *** Finished 1000 runs in 359.420ms - avg 1.697ms, stddev 665.871µs Which improves our latency from 165ms to 1.7ms.\nSummary #Exercises on how many threads can be started on a certain machine are mostly boring - this metric primarily showcases the small initial stack size of virtual threads.\nSeeing Java adopt virtual threads is exciting. However, it\u0026rsquo;s unlikely that Java code will resemble Go or Erlang soon. Developing correct, efficient concurrent code is much more than just replacing one threading model with another4; there are also fundamental differences in existing (standard) libraries.\n\u0026hellip; continued in part two.\nRon Pressler, Alan Bateman. 2023. Virtual Threads. In JDK Enhancement Proposals — March 2023 — JEP 444 — \u0026lt;openjdk.org/jeps/444\u0026gt;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nThe code is available on GitHub at github.com/fillmore-labs/blog-javavirtualthreads.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nThis isn\u0026rsquo;t a comparison of Go and Java, at least not in terms of performance. Java excels in benchmarks and repetitive tasks.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nAlan Bateman. 2023.
The Challenges of Introducing Virtual Threads to the Java Platform - Project Loom — August 2023 — JVM Language Summit 2023 — \u0026lt;youtu.be/WsCJYQDPrrE?t=667\u0026gt;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":"9 April 2024","permalink":"/posts/javavirtualthreads-1/","section":"Blog","summary":"Reading my articles about Go concurrency, a friend asked me whether one could do something similar in Java.","title":"Java Virtual Threads"},{"content":"\u0026hellip; continued from the previous post.\nShuffling through Stack Overflow questions, I realized that there is one point I tried to make clear, but didn’t emphasize enough:\nWrite Synchronous Code First #Many programs work perfectly fine without concurrency. It\u0026rsquo;s better to prioritize creating a functional and thoroughly tested program initially and introduce concurrency when its benefits become apparent through observation of runtime behavior. Rushing into concurrent implementations riddled with bugs invariably means more time spent on subsequent fixes of a bad design and less on real improvement; it\u0026rsquo;s more efficient to start with a straightforward approach and iterate, rather than to wrestle with a buggy program and invest significant time in bug fixes afterward.\nProblems in the Wild #Let me give some examples. It’s so much better to have:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 func task() int { x := subX() y := subY(x) return y } func subX() int { return 1 } func subY(i int) int { return i + 1 } And realize - well, you can’t make it concurrent, because one function depends on the result of the other, and besides - it is fast enough - instead of being stuck with Frankenstein\u0026rsquo;s monster:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 func task() int { xy := make(chan int) result := make(chan int) go subX(xy) go subY(xy, result) return \u0026lt;-result } func subX(out chan\u0026lt;- int) { out \u0026lt;- 1 } func subY(in \u0026lt;-chan int, out chan\u0026lt;- int) { out \u0026lt;- \u0026lt;-in + 1 } which is slower than the first example.\nAnd instead of guessing up front what should be concurrent, it is also so much easier to start with synchronous code:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 func task() (int, error) { s1, err := sub1() if err != nil { return 0, err } s2, err := sub2() if err != nil { return 0, err } return s1 + s2, nil } func sub1() (int, error) { return 1, nil } func sub2() (int, error) { return 1, nil } and should you find out that executing sub1 and sub2 concurrently could speed things up, transform it to:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 func task() (int, error) { var s1, s2 int var g errgroup.Group g.Go(func() (err error) { s1, err = sub1() return err }) g.Go(func() (err error) { s2, err = sub2() return err }) err := g.Wait() if err != nil { return 0, err } return s1 + s2, nil } func sub1() (int, error) { return 1, nil } func sub2() (int, error) { return 1, nil } What makes things much better here is that sub1 and sub2 still have unchanged, synchronous APIs, which means that all tests you’ve written are still valid and things are much easier to test, since you don’t have to deal with concurrency.\nThe transformation only happens in task, and at that scope concurrency is easier to understand.
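One concrete payoff, sketched below with an illustrative test name (this is not code from the post): a test written against the synchronous task keeps passing unchanged after the errgroup transformation, because the signature never changed.

```go
package main

import "testing"

// TestTask pins the observable contract of task(). With sub1 and sub2
// each returning 1, task must return 2. The identical test passes for
// both the synchronous and the errgroup-based implementation.
func TestTask(t *testing.T) {
	got, err := task()
	if err != nil {
		t.Fatalf("task() returned error: %v", err)
	}
	if want := 2; got != want {
		t.Errorf("task() = %d, want %d", got, want)
	}
}
```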
Be Considerate #I do not believe everything should be written using structured concurrency. But seeing common Go bugs, I think a lot of the code being written would benefit.\nSummary #Most of the code should be written synchronously first and should keep synchronous APIs as much as possible. Function literals are a great way to separate concurrency from subtasks.\n\u0026hellip; to be continued.\n","date":"28 March 2024","permalink":"/posts/structured-5/","section":"Blog","summary":"\u0026hellip; continued from the previous post.","title":"How to Write Concurrent Go Code"},{"content":"\u0026hellip; continued from the previous post.\nTwo Popular Choices #Perhaps the most popular existing library is golang.org/x/sync/errgroup by Bryan C. Mills:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 package main import ( \u0026#34;context\u0026#34; \u0026#34;fillmore-labs.com/blog/structured/pkg/task\u0026#34; \u0026#34;golang.org/x/sync/errgroup\u0026#34; ) func doWork(ctx context.Context) error { g, ctx := errgroup.WithContext(ctx) g.Go(func() error { return task.Task(ctx, \u0026#34;task1\u0026#34;, processingTime/3, nil) }) g.Go(func() error { return task.Task(ctx, \u0026#34;task2\u0026#34;, processingTime/2, errFail) }) g.Go(func() error { return task.Task(ctx, \u0026#34;task3\u0026#34;, processingTime, nil) }) return g.Wait() } and the older gopkg.in/tomb.v2 by Gustavo Niemeyer:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 package main import ( \u0026#34;context\u0026#34; \u0026#34;fillmore-labs.com/blog/structured/pkg/task\u0026#34; \u0026#34;gopkg.in/tomb.v2\u0026#34; ) func doWork(ctx context.Context) error { g, ctx := tomb.WithContext(ctx) g.Go(func() error { return task.Task(ctx, \u0026#34;task1\u0026#34;, processingTime/3, nil) }) g.Go(func() error { return task.Task(ctx, \u0026#34;task2\u0026#34;, processingTime/2, errFail) }) g.Go(func() error { return task.Task(ctx, \u0026#34;task3\u0026#34;, processingTime, nil) }) return g.Wait() }
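Both libraries share the same Go/Wait shape. One extra worth knowing: errgroup can also bound the number of concurrently running goroutines via SetLimit, which connects back to the semaphore experiments earlier in this series. A small sketch (the limit of 4 and the work function are illustrative):

```go
package main

import (
	"context"
	"fmt"

	"golang.org/x/sync/errgroup"
)

func main() {
	g, ctx := errgroup.WithContext(context.Background())
	g.SetLimit(4) // Go blocks until a slot is free; at most 4 run at once

	for i := 0; i < 16; i++ {
		g.Go(func() error { return work(ctx, i) })
	}

	if err := g.Wait(); err != nil {
		fmt.Println("failed:", err)
	}
}

// work stands in for a context-aware subtask.
func work(ctx context.Context, i int) error {
	if err := ctx.Err(); err != nil {
		return err // an earlier subtask failed; abort
	}
	fmt.Println("task", i)
	return nil
}
```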
Summary #In practical scenarios, where numerous goroutines are initiated, structured concurrency ensures their proper management without leaking resources. The managing goroutine must not exit until all its child goroutines complete. Additionally, structured concurrency guarantees thorough error reporting, preventing any errors from being overlooked or disregarded.\n\u0026hellip; continued in the next post.\n","date":"27 March 2024","permalink":"/posts/structured-4/","section":"Blog","summary":"\u0026hellip; continued from the previous post.","title":"Existing Libraries"},{"content":"\u0026hellip; continued from the previous post.\nRefactor Our Original Approach #What we did previously can also be done to the approach using an error channel:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 package main import \u0026#34;context\u0026#34; type Group struct { errc chan error cancel context.CancelCauseFunc count int } func NewGroup(cancel context.CancelCauseFunc) *Group { return \u0026amp;Group{errc: make(chan error, 1), cancel: cancel} } func (g *Group) Go(f func() error) { g.count++ go func() { g.errc \u0026lt;- f() }() } func (g *Group) Wait() error { var err error for range g.count { if e := \u0026lt;-g.errc; e != nil \u0026amp;\u0026amp; err == nil { err = e g.cancel(e) } } return err } Making our function:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 package main import ( \u0026#34;context\u0026#34; \u0026#34;fillmore-labs.com/blog/structured/pkg/task\u0026#34; ) func doWork(ctx context.Context) error { ctx, cancel := context.WithCancelCause(ctx) defer cancel(nil) g := NewGroup(cancel) g.Go(func() error { return task.Task(ctx, \u0026#34;task1\u0026#34;, processingTime/3, nil) }) g.Go(func() error { return task.Task(ctx, \u0026#34;task2\u0026#34;, processingTime/2, errFail) }) g.Go(func() error { return task.Task(ctx, \u0026#34;task3\u0026#34;, processingTime, nil) }) return g.Wait() } As we see, we get a nearly identical result for the main function, with the API neatly abstracting our solution. One difference is that we have to call all subtasks asynchronously, since we need Group.Wait working on the error channel.\nSummary #We have seen two approaches to structured concurrency with nearly identical APIs.\n\u0026hellip; continued in the next post.\n","date":"26 March 2024","permalink":"/posts/structured-3/","section":"Blog","summary":"\u0026hellip; continued from the previous post.","title":"Comparison to Our Original Approach"},{"content":"\u0026hellip; continued from the previous post.\nDo Not Communicate by Sharing Memory; Instead, Share Memory by Communicating1 # This approach can be taken too far.
[\u0026hellip;] But as a high-level approach, using channels to control access makes it easier to write clear, correct programs.1\nLet\u0026rsquo;s just for comparison reformulate the scheme in the last post with shared variables and a sync.WaitGroup:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 package main import ( \u0026#34;context\u0026#34; \u0026#34;sync\u0026#34; \u0026#34;fillmore-labs.com/blog/structured/pkg/task\u0026#34; ) func doWork(ctx context.Context) error { ctx, cancel := context.WithCancelCause(ctx) defer cancel(nil) var firstErr error var once sync.Once setErr := func(err error) { if err == nil { return } once.Do(func() { firstErr = err cancel(err) }) } var wg sync.WaitGroup wg.Add(1) go func() { defer wg.Done() err := task.Task(ctx, \u0026#34;task1\u0026#34;, processingTime/3, nil) setErr(err) }() wg.Add(1) go func() { defer wg.Done() err := task.Task(ctx, \u0026#34;task2\u0026#34;, processingTime/2, errFail) setErr(err) }() err := task.Task(ctx, \u0026#34;task3\u0026#34;, processingTime, nil) setErr(err) wg.Wait() return firstErr } Here we replace the error channel with a function storing the first error and canceling the context. This works fine:\n\u0026gt; go run fillmore-labs.com/blog/structured/cmd/structured2 task1 \u0026lt;nil\u0026gt; task2 failed task3 context canceled Got \u0026#34;failed\u0026#34; error in 501ms But it is a lot of boilerplate.\nRefactor and Separate Concerns #We can easily extract the orchestration part:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 package main import ( \u0026#34;context\u0026#34; \u0026#34;sync\u0026#34; ) type Group struct { err error cancel context.CancelCauseFunc once sync.Once wg sync.WaitGroup } func NewGroup(cancel context.CancelCauseFunc) *Group { return \u0026amp;Group{cancel: cancel} } func (g *Group) Do(fn func() error) { err := fn() if err == nil { return } g.once.Do(func() { g.err = err if g.cancel != nil { g.cancel(err) } }) } func (g *Group) Go(fn func() error) { g.wg.Add(1) go func() { defer g.wg.Done() g.Do(fn) }() } func (g *Group) Wait() error { g.wg.Wait() return g.err } Making our function only:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 package main import ( \u0026#34;context\u0026#34; \u0026#34;fillmore-labs.com/blog/structured/pkg/task\u0026#34; ) func doWork(ctx context.Context) error { ctx, cancel := context.WithCancelCause(ctx) defer cancel(nil) g := NewGroup(cancel) g.Go(func() error { return task.Task(ctx, \u0026#34;task1\u0026#34;, processingTime/3, nil) }) g.Go(func() error { return task.Task(ctx, \u0026#34;task2\u0026#34;, processingTime/2, errFail) }) g.Do(func() error { return task.Task(ctx, \u0026#34;task3\u0026#34;, processingTime, nil) }) return g.Wait() } This separates processing and orchestration, which is nice and makes our code much more readable and improves testability.\nSummary #Separating orchestration from the processing code, we can reach simplified structured concurrency with improved readability while eliminating some sources of resource leaks.\n\u0026hellip; continued in the next post.\nEffective Go — 2009 — \u0026lt;golang.org/doc/effective_go.html#sharing\u0026gt;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":"25 March 2024","permalink":"/posts/structured-2/","section":"Blog","summary":"\u0026hellip; continued from the previous 
post.","title":"An Alternative Approach"},{"content":"\u0026hellip; continued from the previous post.\nDifferent Categories of Concurrency #When recalling the previous article, a task was subdivided into three subtasks, all working towards a common objective (specifically, merging contributors).\nThe subtasks are started in the main task and reach completion, yielding results, within the lifespan of that overarching task.\nThis pattern was named “Structured Concurrency” by Martin Sústrik1 and further examined in Nathaniel J. Smith\u0026rsquo;s “Notes on structured concurrency, or: Go statement considered harmful”2.\nWhile I do not subscribe to every viewpoint expressed in these articles, I believe that this is at least a valid concurrency pattern. Also, it’s well-suited to Go\u0026rsquo;s hierarchical contexts and cancelation mechanisms.\nGroundwork #Let’s start with a typical subtask\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 package task import ( \u0026#34;context\u0026#34; \u0026#34;fmt\u0026#34; \u0026#34;time\u0026#34; ) func Task(ctx context.Context, name string, processingTime time.Duration, result error) error { ready := time.NewTimer(processingTime) select { case \u0026lt;-ctx.Done(): ready.Stop() fmt.Println(name, ctx.Err()) return fmt.Errorf(\u0026#34;%s canceled: %w\u0026#34;, name, ctx.Err()) case \u0026lt;-ready.C: fmt.Println(name, result) } return result } We define a task.Task as a dummy workload, having a name as an identity and a processingTime after which it finishes.\nThe task has two properties that are important:\nIt has a synchronous API and returns an error It takes a context.Context parameter and exits early when the context is canceled. Having a context is especially important so that we don’t perform a lot of work which is irrelevant in case the overarching task already failed.\nConsider this scenario: You make a query to a service that fails to respond. Eventually, a higher-level context times out, and it is important for your function to terminate to prevent any potential resource leakage.\nWhen a request is canceled or times out, all the goroutines working on that request should exit quickly so the system can reclaim any resources they are using.3\nContext #We define an doWork function which takes a higher-level context and distributes the work over three subtasks.\nFirst, we create a sub-context of the passed context which will be canceled when we leave the scope. 
We pass this context to all created goroutines, ensuring we don’t leak resources.\nAdditionally, when we can’t complete our task (because a subtask failed) we cancel the remaining work so we can return early.\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 package main import ( \u0026#34;context\u0026#34; \u0026#34;fillmore-labs.com/blog/structured/pkg/task\u0026#34; ) func doWork(ctx context.Context) error { ctx, cancel := context.WithCancelCause(ctx) defer cancel(nil) var g int errc := make(chan error) g++ go func() { errc \u0026lt;- task.Task(ctx, \u0026#34;task1\u0026#34;, processingTime/3, nil) }() g++ go func() { errc \u0026lt;- task.Task(ctx, \u0026#34;task2\u0026#34;, processingTime/2, errFail) }() g++ go func() { errc \u0026lt;- task.Task(ctx, \u0026#34;task3\u0026#34;, processingTime, nil) }() var err error for range g { if e := \u0026lt;-errc; e != nil \u0026amp;\u0026amp; err == nil { err = e cancel(err) } } return err } Since we fail task 2 we expect the following call sequence:\nsequenceDiagram participant Main create participant Work Main-\u003e\u003eWork: go Work(main ctx) Note right of Work: Create work ctx create participant Task1 Work-\u003e\u003eTask1: go Task1(work ctx) create participant Task2 Work-\u003e\u003eTask2: go Task2(work ctx) create participant Task3 Work-\u003e\u003eTask3: go Task3(work ctx) Note over Task1: Task1 completes destroy Task1 Task1-\u003e\u003eWork: no error (“nil”) Note over Task2: Task2 completes destroy Task2 Task2-\u003e\u003eWork: failed Note over Task3: Task3 processing Note right of Work: First error cancels work ctx Work--)Task3: cancel work ctx Note over Task3: Task3 interrupted destroy Task3 Task3-\u003e\u003eWork: canceled Note right of Work: All subtasks complete destroy Work Work-\u003e\u003eMain: failed Running Multiple Subtasks Concurrently #So, we run a number of subtasks in goroutines, simply counting them with g++ - which is easier to track than having a magic 3 at the top - and collect all results in the end. This is pretty simple code, and when we run it we get the expected result:\n\u0026gt; go run fillmore-labs.com/blog/structured/cmd/structured1 task1 \u0026lt;nil\u0026gt; task2 failed task3 context canceled Got \u0026#34;failed\u0026#34; error in 501ms Try it on the Go Playground.\nAlso, we see that the task returns nearly immediately after the first failure (500 milliseconds) and doesn\u0026rsquo;t let the third subtask unnecessarily consume resources.\nConsidering the possibility of “optimization”, where we could return the error immediately without awaiting the completion of canceled subtasks: Since we expect canceled subtasks to return quickly (they can just abort), we might save very little time in error scenarios, without any gain in normal processing. Compared to the risk of a resource leak this appears to be a bad tradeoff.\nSummary #Structured Concurrency is a pattern that is useful in writing correct Go programs. A context parameter and creating sub-contexts are helpful for avoiding resource leaks.\n\u0026hellip; continued in the next post.\nMartin Sústrik. 2016. Structured Concurrency. In 250bpm Blog — February 2016 — \u0026lt;250bpm.com/blog:71/\u0026gt;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nNathaniel J. Smith. 2018. Notes on structured concurrency, or: Go statement considered harmful.
In njs blog — April 2018 — \u0026lt;vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/\u0026gt;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nSameer Ajmani. 2014. Go Concurrency Patterns: Context. In The Go Blog — July 2014 — \u0026lt;go.dev/blog/context\u0026gt;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":"22 March 2024","permalink":"/posts/structured-1/","section":"Blog","summary":"\u0026hellip; continued from the previous post.","title":"Structured Concurrency"},{"content":"\u0026hellip; continued from the previous post.\nProblems With Concurrency #In the first post of this series I cited two papers that examined common bugs in the handling of Go concurrency.\nWe took a long time to arrive here, but I wanted to show that goroutines are not the panacea especially beginners take them for. Understanding concurrent code is sometimes hard, even when it’s easy to write.\nOne thing I see often is leaking goroutines when handling errors:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 package main import ( \u0026#34;errors\u0026#34; \u0026#34;fmt\u0026#34; \u0026#34;sync\u0026#34; ) func MergeContributors(primaryAccount, secondaryAccount Account) error { // Create a WaitGroup to manage the goroutines. var waitGroup sync.WaitGroup c := make(chan error) // Perform 3 concurrent transactions against the database. waitGroup.Add(3) go func() { waitGroup.Wait() close(c) }() // Transaction #1, merge \u0026#34;commit\u0026#34; records go func() { defer waitGroup.Done() err := mergeCommits(primaryAccount, secondaryAccount) if err != nil { c \u0026lt;- err } }() // Transaction #2, merge \u0026#34;pull request\u0026#34; records go func() { defer waitGroup.Done() err := mergePullRequests(primaryAccount, secondaryAccount) if err != nil { c \u0026lt;- err } }() // Transaction #3, merge \u0026#34;merge\u0026#34; records go func() { defer waitGroup.Done() err := mergePullRequestMerges(primaryAccount, secondaryAccount) if err != nil { c \u0026lt;- err } }() // This line is bad! Get rid of it! // waitGroup.Wait() for err := range c { if err != nil { return err } } return markMerged(primaryAccount, secondaryAccount) } Can you spot the leak? Try it on the Go playground.\nThis code is adapted from “Synchronizing Go Routines with Channels and WaitGroups”. It was not originally written by Sophie DeBenedetto, and I believe she is a capable developer. It was chosen because it is pretty typical for goroutine leaks in practice, not to blame anyone personally.\nThe original code didn\u0026rsquo;t test error cases and was deadlocking if one process failed. A deadlock is probably easy to spot and will be fixed.\nWhen two or more requests fail (which I assume is realistic, because the reason for the first failure might affect the other requests too), the first error will exit MergeContributors, the second will hang on c \u0026lt;- err (because no one will ever read the error channel) and the first started goroutine will hang on waitGroup.Wait().\nThe resulting bug is a memory and goroutine leak and is much more intricate and easily overlooked.
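Leaks of this kind are straightforward to catch in tests with a goroutine-leak detector such as go.uber.org/goleak. A sketch, not part of the original post — the Account values and the failing transactions are assumed:

```go
package main

import (
	"testing"

	"go.uber.org/goleak"
)

// TestMergeContributors assumes two of the three transactions fail.
// MergeContributors returns the first error, but the goroutine stuck
// on `c <- err` survives the return - and goleak reports it.
func TestMergeContributors(t *testing.T) {
	defer goleak.VerifyNone(t) // fails the test if goroutines leak

	_ = MergeContributors(Account{}, Account{})
}
```

How common is this class of bug? The study cited in the first post puts numbers on it: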
Overall, we found that there are around 42% blocking bugs caused by errors in protecting shared memory, and 58% are caused by errors in message passing. Considering that shared memory primitives are used more frequently than message passing ones, message passing operations are even more likely to cause blocking bugs.1\nObservation 3. Contrary to the common belief that message passing is less error-prone, more blocking bugs in our studied Go applications are caused by wrong message passing than by wrong shared memory protection.1\nAnalysis #What could be considered a potential solution? One approach might use a buffered channel for errors, such as c := make(chan error, 3) (although 2 would suffice), which resolves the leak and therefore could be called a ‘fix’.\nHowever, this introduces additional magic numbers into the code, complicating its readability and maintainability. It assumes that we know the maximal number of spawned goroutines in advance, so it is more of a shotgun debugging approach rather than a comprehensive understanding and resolution of the underlying issue.\nWhat Would Be a Proper Fix? #Analyzing the code, we can identify five goroutines communicating:\nThe three worker threads, doing transactions The first goroutine, waiting for the workers to finish The main goroutine (running MergeContributors) waiting for all “child” goroutines to finish and either succeed or return the first error that happened How do they communicate? The three worker goroutines communicate with the first one, the “waiter”, with a sync.WaitGroup and all four (waiter and workers) communicate with the main goroutine via the error channel.\nThis gives us an idea of why the original approach of using the WaitGroup in the main goroutine is overly complicated and led to a deadlock, resolved in the blog post.\nThe problem we can easily identify is that the main goroutine does not do what we assume, “waiting for all child goroutines to finish and either returning the first error that happened or success”. When an error occurs it abandons its duties, leaking the remaining goroutines:\n53 54 55 56 57 58 59 // waitGroup.Wait() for err := range c { if err != nil { return err } } So, let’s fix that:\n53 54 55 56 57 58 59 60 61 var err error for e := range c { if err == nil { err = e } } if err != nil { return err } This does what we expect: Using the channel close as the signal for termination and returning the first error when all goroutines are finished.\nBut This Is Bad Code #Maybe. It is not tricky and does what the probable intent of the original code was. Let’s make it a little bit more readable by grouping the relevant code:\n17 18 19 20 go func() { waitGroup.Wait() close(c) }() 53 54 55 56 57 58 59 60 61 var err error for e := range c { if err == nil { err = e } } if err != nil { return err } What might make the code bad is that it spawns a goroutine to do nothing but wait - it translates a wait group into a channel close.\nThe backpressure of the result channel is fine, we just don’t like it because we don’t plan to process the remaining data (errors).\nSummary #Naïve use of goroutines and synchronization primitives can introduce bugs, while not necessarily improving execution efficiency.\n\u0026hellip; continued in the next post.\nTengfei Tu, Xiaoyu Liu, Linhai Song, Yiying Zhang. 2019. Understanding Real-World Concurrency Bugs in Go.
In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems — April 2019 — Pages 865–878 — \u0026lt;doi.org/10.1145/3297858.3304069\u0026gt;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":"21 March 2024","permalink":"/posts/goroutines-4/","section":"Blog","summary":"\u0026hellip; continued from the previous post.","title":"Resource Leaks"},{"content":"\u0026hellip; continued from the previous post.\nCancelation #Assume we have only a limited amount of time and want to use the data we have up to this point. We could build our own solution, but Go has context.WithTimeout since version 1.7.\nLet us modify our Fibonacci function to use a context:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 package fibonacci import ( \u0026#34;context\u0026#34; \u0026#34;fmt\u0026#34; ) func SlowCtx(ctx context.Context, i int) (int, error) { select { case \u0026lt;-ctx.Done(): return 0, fmt.Errorf(\u0026#34;fibonacci canceled: %w\u0026#34;, context.Cause(ctx)) default: } if i \u0026lt; 2 { return i, nil } fn1, err1 := SlowCtx(ctx, i-1) if err1 != nil { return 0, err1 } fn2, err2 := SlowCtx(ctx, i-2) if err2 != nil { return 0, err2 } return fn1 + fn2, nil } We see some of the elements we were missing from the list of concurrency building blocks: Cancelation and error handling. Also note that checking for cancelation often will have a noticeable performance impact; we accept that for clarity and demonstration purposes.\nNow we must adapt main, too:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 package main import ( \u0026#34;context\u0026#34; \u0026#34;runtime\u0026#34; \u0026#34;time\u0026#34; \u0026#34;fillmore-labs.com/blog/goroutines/pkg/fibonacci\u0026#34; \u0026#34;golang.org/x/sync/semaphore\u0026#34; ) func main() { ctx := context.Background() tctx, cancel := context.WithTimeout(ctx, 100*time.Millisecond) defer cancel() numCPU := int64(runtime.NumCPU()) pool := semaphore.NewWeighted(numCPU) for range 1_000 { // queryStart := time.Now() if err := pool.Acquire(tctx, 1); err != nil { break } go func() { defer pool.Release(1) _, err := fibonacci.SlowCtx(tctx, 27) if err == nil { // duration := time.Since(queryStart) // done } else { // failed } }() } _ = pool.Acquire(ctx, numCPU) } Running this gives:\n\u0026gt; go run fillmore-labs.com/blog/goroutines/cmd/try6 *** Finished 45 runs (4 failed) in 107ms - avg 11.1ms, stddev 3.55ms While we see a performance hit due to checking for cancelation too often and a not overly precise timer, the result is pretty satisfactory.\n\u0026gt; go test -trace trace6.out fillmore-labs.com/blog/goroutines/cmd/try6 ok fillmore-labs.com/blog/goroutines/cmd/try6 1.403s \u0026gt; go tool trace trace6.out Also, most goroutines are busy processing:\nand we are not blocked long waiting for other parts of the program (our runtime measurement):\nIf we modify the semaphore pool size to be 1,000 instead of runtime.NumCPU() we get results like:\n*** Finished 48 runs (952 failed) in 108ms - avg 57.8ms, stddev 30.6ms We build up lots of unnecessary goroutines in the beginning which just hang around until canceled:\nThis is also visible in the goroutine analysis:\nAnd some blocking for the few routines that manage to finish:\nSummary #We introduced the concept of cancelation and error handling, so that we can limit the amount of work done when we are no longer interested in the result, for example because another part of the task failed or some deadline timed out.
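As noted above, polling the context on every recursive call costs throughput. One possible refinement — a sketch, not code from this series — amortizes the check by polling only every few thousand calls:

```go
package fibonacci

import (
	"context"
	"fmt"
)

// checkEvery controls how often the context is polled; the value is
// illustrative, and a power of two keeps the modulo cheap.
const checkEvery = 1 << 12

// SlowCtxAmortized behaves like SlowCtx, but polls ctx.Done() only
// every checkEvery calls, trading cancelation latency for throughput.
func SlowCtxAmortized(ctx context.Context, i int) (int, error) {
	var calls int
	return slowAmortized(ctx, i, &calls)
}

func slowAmortized(ctx context.Context, i int, calls *int) (int, error) {
	*calls++
	if *calls%checkEvery == 0 {
		select {
		case <-ctx.Done():
			return 0, fmt.Errorf("fibonacci canceled: %w", context.Cause(ctx))
		default:
		}
	}

	if i < 2 {
		return i, nil
	}

	fn1, err := slowAmortized(ctx, i-1, calls)
	if err != nil {
		return 0, err
	}
	fn2, err := slowAmortized(ctx, i-2, calls)
	if err != nil {
		return 0, err
	}

	return fn1 + fn2, nil
}
```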
\u0026hellip; continued in the next post.\n","date":"20 March 2024","permalink":"/posts/goroutines-3/","section":"Blog","summary":"\u0026hellip; continued from the previous post.","title":"Avoiding Unnecessary Work"},{"content":"\u0026hellip; continued from the previous post.\nBack to the Drawing Board #Let us keep the original Slow implementation and move parallelization to main:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 package main import ( \u0026#34;sync\u0026#34; \u0026#34;fillmore-labs.com/blog/goroutines/pkg/fibonacci\u0026#34; ) func main() { var wg sync.WaitGroup for range 1_000 { // queryStart := time.Now() wg.Add(1) go func() { defer wg.Done() _ = fibonacci.Slow(27) // duration := time.Since(queryStart) }() } wg.Wait() } Now check if things improved:\n\u0026gt; go run fillmore-labs.com/blog/goroutines/cmd/try4 *** Finished 1000 runs in 371ms - avg 185ms, stddev 106ms This is approximately four times faster than the single-core solution, which is what we would expect on a four-core machine.\nWe could stop here, but some numbers immediately jump out. We measure the time between the beginning of the request (the start of the goroutine) and when the calculation is done.\nWhile the single-core solution needed 1.47 milliseconds for one calculation, our latest program makes us wait on average 185 milliseconds for the result. Also, the response times vary wildly, with over 100 milliseconds of standard deviation.\nLet us diagnose why this is:\n\u0026gt; go test -trace trace4.out fillmore-labs.com/blog/goroutines/cmd/try4 ok fillmore-labs.com/blog/goroutines/cmd/try4 0.374s \u0026gt; go tool trace trace4.out Examining the goroutine analysis of fillmore-labs.com/blog/goroutines/cmd/try4.Run4.func1:\nWe can see that we spawn a lot of goroutines that are mostly waiting to be scheduled and it takes the scheduler a while to finish all of them:\nThe Go scheduler is good and has implemented some interesting prioritization tricks, so there is little penalty for this build-up, but we make scheduling bad for us and any other part of our application.\nIf we look at the block times:\nWe see the runtime calculations blocking, since we use a channel to send the duration to another goroutine, so placing unnecessary load on the scheduler affects the whole program. Also, we are just wasting RAM by creating workloads that cannot be executed.\nDo Not Commission More Work Than Can Be Done #Let us modify the way we schedule our calculations. Instead of submitting them all at once, we use a semaphore to only submit as many goroutines as can be executed:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 package main import ( \u0026#34;context\u0026#34; \u0026#34;runtime\u0026#34; \u0026#34;fillmore-labs.com/blog/goroutines/pkg/fibonacci\u0026#34; \u0026#34;golang.org/x/sync/semaphore\u0026#34; ) func main() { ctx := context.Background() numCPU := int64(runtime.GOMAXPROCS(0)) pool := semaphore.NewWeighted(numCPU) for range 1_000 { // queryStart := time.Now() _ = pool.Acquire(ctx, 1) go func() { defer pool.Release(1) _ = fibonacci.Slow(27) // duration := time.Since(queryStart) }() } _ = pool.Acquire(ctx, numCPU) } Code-wise it looks remarkably like the solution using wait groups.
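As an aside, the same bounded-concurrency shape can be expressed with a plain buffered channel as a counting semaphore — a sketch, not the code used in this series:

```go
package main

import (
	"runtime"

	"fillmore-labs.com/blog/goroutines/pkg/fibonacci"
)

func main() {
	numCPU := runtime.GOMAXPROCS(0)
	sem := make(chan struct{}, numCPU) // counting semaphore

	for range 1_000 {
		sem <- struct{}{} // acquire: blocks while numCPU goroutines run
		go func() {
			defer func() { <-sem }() // release
			_ = fibonacci.Slow(27)
		}()
	}

	for range numCPU { // drain: wait for the remaining goroutines
		sem <- struct{}{}
	}
}
```

The golang.org/x/sync/semaphore version above keeps the explicit acquire/release vocabulary, so we stay with it.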
Now try this out:\n\u0026gt; go run fillmore-labs.com/blog/goroutines/cmd/try5 *** Finished 1000 runs in 372ms - avg 1.85ms, stddev 445µs It has nearly the same total runtime as our last attempt, but the requests are much more responsive (a query is calculated after 1.85 Milliseconds, which is close to our single-core result of 1.45 Milliseconds) and much less variance than before. Let us look at the trace of that:\n\u0026gt; go test -trace trace5.out fillmore-labs.com/blog/goroutines/cmd/try5 ok fillmore-labs.com/blog/goroutines/cmd/try5 0.374s \u0026gt; go tool trace trace5.out We see that the goroutines spend time executing code, instead of just waiting to be scheduled. Also, we have only four goroutines running at a time:\nSo, this surely is an improvement.\nSummary #We studied some ways to parallelize a CPU bound algorithm so that it efficiently uses all CPU cores without swamping the Go scheduler. We also saw that randomly using goroutines has a good chance to make a program run slower than before.\nInterestingly enough the synchronous calculation of a single task is faster that the parallel one, and moving concurrency out of the calculation sped things up tremendously.\n\u0026hellip; continued in the next post.\n","date":"19 March 2024","permalink":"/posts/goroutines-2/","section":"Blog","summary":"\u0026hellip; continued from the previous post.","title":"Using Goroutines Will Not Grant You Another CPU Core"},{"content":"I embarked on my journey in programming on a humble 6502 Machine, equipped with just 1K of RAM. Eventually, I delved into the intricate task of reverse engineering and enhancing the firmware of a 5¼\u0026quot; floppy disk drive.\nAfter writing a lot of C++ and Java and studying Mathematics and Computer Science I worked in mobile development in an architect role for a while but moved to backend and systems architecture since then.\nBesides software architecture and archaeology my expertise is in event-based systems, domain-driven design (DDD) and streaming data processing. I am also leading project teams, mentoring developers and care a lot about testing and quality assurance.\nLast but not least I recognize the human aspect of software development and believe that software architecture is sometimes more of an art than a craft.\n","date":null,"permalink":"/about/","section":"Fillmore Labs","summary":"I embarked on my journey in programming on a humble 6502 Machine, equipped with just 1K of RAM.","title":"About Me"},{"content":"","date":null,"permalink":"/","section":"Fillmore Labs","summary":"","title":"Fillmore Labs"},{"content":"Concurrency Is Not Parallelism1 #Recently two interesting observations in “A Study of Real-World Data Races in Golang”2 caught my eye:\nObservation 1. Developers using Go employ significantly more concurrency and synchronization constructs than in Java.\nObservation 2. 
Developers using Go for programming microservices expose significantly more runtime concurrency than other languages such as Java, Python, and NodeJS used for the same purpose.\nThese observations agree with my experiences, and while easy concurrency is great it comes with its problems.3\nIf you are familiar with the intricacies of the Go scheduler jump to part four, otherwise let\u0026rsquo;s just take a step back:\nBuilding Blocks #Concurrency in Go builds upon\nGoroutines Channels Cancelation (in package context) Function literals (closures, anonymous functions) Synchronization primitives (in package sync) Result and error handling (or more generally communication between goroutines) And while it can be argued that some of these points are not exactly part of Go’s program execution model, all of these elements are identifiable in code dealing with concurrency.\nA Simple Example #To start our journey, let us take a simple example: We want to calculate the 27. Fibonacci number (196,418), using a simple approach:\n1 2 3 4 5 6 7 8 9 10 11 12 package fibonacci func Slow(i int) int { if i \u0026lt; 2 { return i } fn1 := Slow(i - 1) fn2 := Slow(i - 2) return fn1 + fn2 } And call this function 1,000 times:\n1 2 3 4 5 6 7 8 9 10 11 package main import \u0026#34;fillmore-labs.com/blog/goroutines/pkg/fibonacci\u0026#34; func main() { for range 1_000 { // queryStart := time.Now() _ = fibonacci.Slow(27) // duration := time.Since(queryStart) } } This is an easily understandable stand-in for a CPU-bound algorithm, so please do not send me better implementations - it is meant to burn CPU cycles.\nRunning this on a N5105 CPU gives us:\n\u0026gt; go run fillmore-labs.com/blog/goroutines/cmd/try1 *** Finished 1000 runs in 1.47s - avg 1.47ms, stddev 18.4µs So, our whole program takes 1.47 seconds on a single core. This is okay, but since the N5105 has four cores we can do better.\nLet\u0026rsquo;s parallelize (yes, I know) this:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 package fibonacci func Parallel1(i int) int { if i \u0026lt; 2 { return i } fc1 := make(chan int) go func() { fc1 \u0026lt;- Parallel1(i - 1) }() fc2 := make(chan int) go func() { fc2 \u0026lt;- Parallel1(i - 2) }() return \u0026lt;-fc1 + \u0026lt;-fc2 } Ok, great. Off we go:\n\u0026gt; go run fillmore-labs.com/blog/goroutines/cmd/try2 *** Finished 1000 runs in 5m25s - avg 325ms, stddev 6.49ms This is pretty terrible. It takes more than 200 times as long as before while using all available cores.\nOne problem is easy to spot: We are creating two new goroutines and use the original one just to wait rather than do any meaningful work. Let us fix that:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 package fibonacci func Parallel2(i int) int { if i \u0026lt; 2 { return i } fc1 := make(chan int) go func() { fc1 \u0026lt;- Parallel2(i - 1) }() fn2 := Parallel2(i - 2) return \u0026lt;-fc1 + fn2 } Another try, then:\n\u0026gt; go run fillmore-labs.com/blog/goroutines/cmd/try3 *** Finished 1000 runs in 1m51s - avg 111ms, stddev 3.10ms Ok, that was an easy three times speed up. Much better, but we are still much slower than the single-core solution.\nNo One Writes Code Like That #Oh? 
containerd has a ‘concurrent’ garbage collector that spawns exponentially many goroutines:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 package gc import ( \u0026#34;sync\u0026#34; ) func ConcurrentMark(root Node) map[Node]struct{} { grays := make(chan Node) seen := map[Node]struct{}{} var wg sync.WaitGroup go func() { for gray := range grays { if _, ok := seen[gray]; ok { wg.Done() continue } seen[gray] = struct{}{} go func() { defer wg.Done() var children []Node = gray.Children() for _, n := range children { wg.Add(1) grays \u0026lt;- n } }() } }() wg.Add(1) grays \u0026lt;- root wg.Wait() close(grays) return seen } This is some clever piece of code. The access to seen is serialized through the grays channel and every Node posted to grays increments the wait group and spawns a new goroutine.\nSimply constructing a complete four-ary tree of height nine and comparing with a simple non-concurrent solution gives us the following result:\n\u0026gt; go run fillmore-labs.com/blog/goroutines/cmd/gc Concurrent: Found 349525 reachable nodes in 462ms Non-Concurrent: Found 349525 reachable nodes in 166ms Should containerd change its solution? No. This code is seven years old and seems to be unused. The constructed test tree is pathologic and the results will be different in practice due to many already seen nodes. Also calculating children might be CPU intensive.\nThe point I am making is that the presented kind of code exists in practice and while the go scheduler is good and forgives many mistakes, it is still often not used properly.\nSummary #Use of goroutines can make execution slower.\nEspecially interesting is the comparison of Parallel1 and Parallel2. While both are bad solutions, I have often seen the construct of having a goroutine exclusively for waiting instead of doing actual work, and it makes a dramatic difference in this case.\n\u0026hellip; continued in part two.\nRob Pike. 2012. Concurrency is not Parallelism. Talk at Heroku Waza conference — January 2012 — \u0026lt;vimeo.com/49718712\u0026gt; — \u0026lt;go.dev/talks/2012/waza.slide\u0026gt;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nMilind Chabbi and Murali Krishna Ramanathan. 2022. A Study of Real-World Data Races in Golang. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation — June 2022 — Pages 474–489 — \u0026lt;doi.org/10.1145/3519939.3523720\u0026gt;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nTengfei Tu, Xiaoyu Liu, Linhai Song, Yiying Zhang. 2019. Understanding Real-World Concurrency Bugs in Go. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems — April 2019 — Pages 865–878 — \u0026lt;doi.org/10.1145/3297858.3304069\u0026gt;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":"18 March 2024","permalink":"/posts/goroutines-1/","section":"Blog","summary":"Concurrency Is Not Parallelism1 #Recently two interesting observations in “A Study of Real-World Data Races in Golang”2 caught my eye:","title":"Goroutines Are Cheap, but Not Free"}]