Balancing #107
Conversation
balancing/balance_breaker.go
Outdated
}
}

type lenLimitCounter struct {
lenDelimitedCounter
return tracker.state, true
}

changed := false
We don't need this, we can just return false/true everywhere below.
balancing/balance_breaker.go
Outdated
return exceeded
}

func (breaker *NodeBreaker) checkStateHalfOpen(exceeded bool) bool {
isHalfOpen
}
if !meter.IsActive() && active {
meter.histogram.shiftData(meter.now().Sub(meter.inActiveSince))
meter.inActiveSince = time.Time{}
Maybe nil instead of time.Time{}, when switching from inactive to active?
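If it went that way, a minimal sketch (assuming inActiveSince is changed to a *time.Time; the surrounding names are illustrative, not akubra's actual API) could look like:

package sketch

import "time"

// nil marks the active state instead of the time.Time{} zero value.
type meterState struct {
	inActiveSince *time.Time
}

func (m *meterState) markInactive(now time.Time) { m.inActiveSince = &now }

func (m *meterState) markActive() { m.inActiveSince = nil }

func (m *meterState) isActive() bool { return m.inActiveSince == nil }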
balancing/balance_breaker.go
Outdated
state *openStateTracker
}

// Record collects call data and returns bool if breaker should be open
open -> opened
balancing/balance_breaker.go
Outdated
return breaker.ShouldOpen()
}

// ShouldOpen checks if breaker should be open
as above
}

func (tracker *openStateTracker) currentDelay() time.Duration {
multiplier := int(math.Pow(2, tracker.closeIteration))
closeIteration is never decremented/reset, even if a backend is healthy for a long time. An interesting case popped up in my head:
- initially the backend is not doing so well and its closeIteration gets incremented until it reaches some high value
- the backend becomes healthy and stays healthy for a long time, but its closeIteration is already big
- the backend misbehaves for a brief second and gets taken out of the pool of healthy nodes for a long time, due to its closeIteration
- we wait a long time until we plug the backend back in, even though it's already healthy and might provide us with the best timings
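One way to soften that scenario, sketched below under the assumption that the tracker records health transitions (only closeIteration and currentDelay come from the diff; everything else is made up for illustration): reset closeIteration after the backend has stayed healthy for a configurable period, so a brief hiccup no longer triggers the maximum backoff.

package sketch

import (
	"math"
	"time"
)

type openStateTracker struct {
	closeIteration float64
	healthySince   time.Time
	resetAfter     time.Duration
}

// recordHealthy clears the accumulated backoff once the backend has been
// healthy for resetAfter without interruption.
func (t *openStateTracker) recordHealthy(now time.Time) {
	if t.healthySince.IsZero() {
		t.healthySince = now
	}
	if now.Sub(t.healthySince) >= t.resetAfter {
		t.closeIteration = 0
	}
}

// recordUnhealthy restarts the healthy streak and bumps the backoff exponent.
func (t *openStateTracker) recordUnhealthy() {
	t.healthySince = time.Time{}
	t.closeIteration++
}

// currentDelay keeps the existing exponential backoff formula.
func (t *openStateTracker) currentDelay(base time.Duration) time.Duration {
	return time.Duration(math.Pow(2, t.closeIteration)) * base
}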
func (h *histogram) cellsCount() int {
return int(math.Ceil(float64(h.retention)/float64(h.resolution))) + 1
}
Some of the methods of histogram are protected by a mutex, but some are not. shiftData mutates the state heavily, yet it doesn't lock the data it mutates. If (or when) shiftData and unshiftData run concurrently, bad things may happen.
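A sketch of the locking being asked for, with the struct layout assumed from the diff context (the real histogram has more fields and methods): give histogram a mutex and hold it for the whole mutation in shiftData, as well as in every other method that touches the series data.

package sketch

import (
	"sync"
	"time"
)

type timeValue struct {
	date time.Time
}

type dataSeries struct {
	data []*timeValue
}

type histogram struct {
	mutex sync.Mutex
	t0    time.Time
	data  []*dataSeries
}

// shiftData holds the mutex for the entire mutation, so a concurrent
// unshiftData (or any reader) cannot observe a half-shifted histogram.
func (h *histogram) shiftData(delta time.Duration) {
	h.mutex.Lock()
	defer h.mutex.Unlock()
	h.t0 = h.t0.Add(delta)
	for _, series := range h.data {
		for _, value := range series.data {
			value.date = value.date.Add(delta)
		}
	}
}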
balancing/balance_breaker.go
Outdated
}

func (series *dataSeries) ValueRangeFun(timeStart, timeEnd time.Time, fun func(*timeValue)) []float64 {
dataRange := []float64{}
dataRange is unused
}

// TimeSpent returns float64 repesentation of time spent in execution
func (meter *CallMeter) TimeSpent() float64 {
Won't this be too expensive to calculate? It happens every time we want to know the node's weight, which is basically every call. If there are a lot of rps, then on each call we have to traverse each series in a histogram for all of the active nodes to elect a node. We could simply aggregate this instead of calculating it each time.
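A sketch of that aggregation idea (the fields below are assumptions, not the real CallMeter): keep a running total that is updated when samples enter and leave the retention window, so TimeSpent becomes an O(1) read instead of a per-request traversal of every series.

package sketch

import (
	"sync"
	"time"
)

type CallMeter struct {
	mutex     sync.Mutex
	totalTime float64 // seconds spent in calls within the retention window
}

// record adds a finished call's duration to the running total.
func (m *CallMeter) record(duration time.Duration) {
	m.mutex.Lock()
	defer m.mutex.Unlock()
	m.totalTime += duration.Seconds()
}

// expire subtracts durations that fell out of the retention window.
func (m *CallMeter) expire(duration time.Duration) {
	m.mutex.Lock()
	defer m.mutex.Unlock()
	m.totalTime -= duration.Seconds()
}

// TimeSpent no longer walks the histogram on every weight calculation.
func (m *CallMeter) TimeSpent() float64 {
	m.mutex.Lock()
	defer m.mutex.Unlock()
	return m.totalTime
}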
balancing/balance_breaker.go
Outdated
return ms.Node.IsActive()
}

func raportMetrics(rt http.RoundTripper, since time.Time, open bool) {
report
}

// IsActive checks Breaker status propagates it to Node compound
func (ms *MeasuredStorage) IsActive() bool {
I don't like the fact that the method's name hides what it's actually doing and you have to take a look at the doc to know the whole truth.
@@ -16,7 +16,7 @@ type AccessMessageData struct {
Path string `json:"path"`
UserAgent string `json:"useragent"`
StatusCode int `json:"status"`
- Duration float64 `json:"duration"`
+ Duration float64 `json:"duration_ms"`
Isn't the JSON camelCase? (duration_ms)
@@ -25,7 +25,7 @@ func (lrt *loggingRoundTripper) RoundTrip(req *http.Request) (resp *http.Respons
timeStart := time.Now()
resp, err = lrt.roundTripper.RoundTrip(req)

- duration := time.Since(timeStart).Seconds()
+ duration := time.Since(timeStart).Seconds() * 1000
Why multiply by 1000?
}

func (c *ShardClient) balancerRoundTrip(req *http.Request) (resp *http.Response, err error) {
notFoundNodes := []balancing.Node{}
Maybe it's worth it to add entries to the sync log for each of the nodes in notFoundNodes after we've found a node that was able to process the request?
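A rough, entirely hypothetical sketch of that suggestion (logMissingObjects and the node type are made up for illustration; the real sync-log and balancing APIs differ):

package sketch

import "log"

type node struct {
	name string
}

// logMissingObjects records, for every node that returned "not found", that the
// object should later be synced there from the node that served the request.
func logMissingObjects(servedBy node, notFoundNodes []node, objectKey string) {
	for _, missing := range notFoundNodes {
		log.Printf("synclog: %s missing on %s, available on %s", objectKey, missing.name, servedBy.name)
	}
}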
@@ -37,6 +37,11 @@ It also backtracks to older cluster when requested for not existing object on
target cluster. This kind of events are logged, so it's possible to rebalance
clusters in background.

### Multi cloud cost optimization
While all objects has to be written in every storage from shard, not all storages
has to be read. With load balancing and storage prioritization akubra will peak
pick the
h.t0 = newT0
for _, series := range h.data {
for _, value := range series.data {
value.date = value.date.Add(delta)
This loop seems costly. We could instead accumulate the delta and later, when someone calls ValueRangeFun, just apply the delta to timeStart and timeEnd once at the beginning of the method.
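A sketch of that optimization, with the series layout assumed from the diff context: shiftData only accumulates the delta, and ValueRangeFun translates the query bounds once up front.

package sketch

import "time"

type timeValue struct {
	date  time.Time
	value float64
}

type dataSeries struct {
	data  []*timeValue
	shift time.Duration // accumulated delta instead of rewriting every sample's date
}

// shiftData becomes O(1): it no longer touches the stored timestamps.
func (s *dataSeries) shiftData(delta time.Duration) {
	s.shift += delta
}

// ValueRangeFun translates the query bounds by the accumulated shift once,
// which is equivalent to having shifted every stored timestamp forward.
func (s *dataSeries) ValueRangeFun(timeStart, timeEnd time.Time, fun func(*timeValue)) {
	timeStart = timeStart.Add(-s.shift)
	timeEnd = timeEnd.Add(-s.shift)
	for _, v := range s.data {
		if v.date.After(timeStart) && v.date.Before(timeEnd) {
			fun(v)
		}
	}
}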
This reverts commit 73be064.
open := ms.Breaker.Record(duration, success)
- log.Debugf("MeasuredStorage %s: Request %s took %s was successful: %t, opened breaker %t\n", ms.Name, reqID, duration, success, open)
+ log.Debugf("s %s: Request %s took %s was successful: %t, opened breaker %t\n", ms.Name, reqID, duration, success, open)
You replaced MeasuredStorage with s in the log message. Is this intentional?