
Balancing #107

Merged
merged 28 commits into from
Jan 9, 2019

Conversation


@mjarco mjarco commented Oct 12, 2018

No description provided.

}
}

type lenLimitCounter struct {

lenDelimitedCounter

return tracker.state, true
}

changed := false

we don't need this, we can just return false/true everywhere below

return exceeded
}

func (breaker *NodeBreaker) checkStateHalfOpen(exceeded bool) bool {

isHalfOpen

}
if !meter.IsActive() && active {
meter.histogram.shiftData(meter.now().Sub(meter.inActiveSince))
meter.inActiveSince = time.Time{}

Maybe nil instead of time.Time{}, when switching from inactive to active?
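A small sketch of what that might look like if inActiveSince were a *time.Time (just an illustration; the field is a plain time.Time in this PR):

```go
// sketch: with inActiveSince as *time.Time, "not inactive" becomes a nil
// check instead of a comparison against the zero time.Time
if !meter.IsActive() && active && meter.inActiveSince != nil {
	meter.histogram.shiftData(meter.now().Sub(*meter.inActiveSince))
	meter.inActiveSince = nil
}
```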

state *openStateTracker
}

// Record collects call data and returns bool if breaker should be open

open -> opened

return breaker.ShouldOpen()
}

// ShouldOpen checks if breaker should be open

as above

}

func (tracker *openStateTracker) currentDelay() time.Duration {
multiplier := int(math.Pow(2, tracker.closeIteration))
@wookie41 wookie41 Nov 2, 2018

closeIteration is never decremented/reset, even if a backend is healthy for a long time.
An interesting case popped up in my head:

  1. initially the backend is not doing so well and its closeIteration gets incremented until it reaches some high value

  2. the backend becomes healthy and stays healthy for a long time, but its closeIteration is already big

  3. the backend misbehaves for a brief second and gets taken out of the pool of healthy nodes for a long time, due to its closeIteration

  4. we wait a long time until we plug the backend back in, even though it's already healthy and might provide us with the best timings

One option, sketched below, would be to decay or reset closeIteration once the backend has stayed healthy for a while.
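A rough sketch of that idea (nothing below exists in this PR; healthySince and healthyResetAfter are invented names, shown only to illustrate the approach):

```go
// reset the backoff once the backend has been healthy long enough, so an
// old, large closeIteration doesn't punish a single brief hiccup
func (tracker *openStateTracker) noteHealthy(now time.Time) {
	if tracker.healthySince.IsZero() {
		tracker.healthySince = now
		return
	}
	if now.Sub(tracker.healthySince) > tracker.healthyResetAfter {
		tracker.closeIteration = 0 // forget the accumulated penalty
	}
}

func (tracker *openStateTracker) noteUnhealthy() {
	tracker.healthySince = time.Time{} // healthy streak is broken
	tracker.closeIteration++
}
```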

func (h *histogram) cellsCount() int {
return int(math.Ceil(float64(h.retention)/float64(h.resolution))) + 1
}


Some of the methods of histogram are protected by a mutex, but some are not. shiftData mutates the state heavily, yet it doesn't lock the data it mutates. If (or when) shiftData and unshiftData run concurrently, bad things may happen.
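For illustration, a sketch of shiftData taking the histogram's lock first (the mutex field name is an assumption, and the body is paraphrased from the hunk commented on further down):

```go
// sketch: same mutation as the PR's shiftData, but guarded by the
// histogram's lock so it can't interleave with unshiftData
func (h *histogram) shiftData(delta time.Duration) {
	h.mutex.Lock()
	defer h.mutex.Unlock()

	h.t0 = h.t0.Add(delta)
	for _, series := range h.data {
		for _, value := range series.data {
			value.date = value.date.Add(delta)
		}
	}
}
```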

}

func (series *dataSeries) ValueRangeFun(timeStart, timeEnd time.Time, fun func(*timeValue)) []float64 {
dataRange := []float64{}

dataRange is unused

}

// TimeSpent returns float64 representation of time spent in execution
func (meter *CallMeter) TimeSpent() float64 {
@wookie41 wookie41 Nov 2, 2018

Won't this be too expensive to calculate?
It happens every time we want to know the node's weight, which is basically every call. If there are a lot of rps, then on each call we have to traverse every series in the histogram for all of the active nodes just to elect a node.
We could simply aggregate this instead of recalculating it each time.
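Something along these lines, for example (the field and method names below are made up, not taken from this PR):

```go
// sketch: keep a running aggregate so TimeSpent is O(1) instead of a full
// histogram scan on every weight calculation
type CallMeter struct {
	// ...existing fields...
	timeSpentSum float64
}

func (meter *CallMeter) record(duration float64) {
	meter.timeSpentSum += duration
	// ...store the sample in the histogram as before...
}

// called whenever a sample falls out of the retention window
func (meter *CallMeter) expire(duration float64) {
	meter.timeSpentSum -= duration
}

func (meter *CallMeter) TimeSpent() float64 {
	return meter.timeSpentSum
}
```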

return ms.Node.IsActive()
}

func raportMetrics(rt http.RoundTripper, since time.Time, open bool) {

report

}

// IsActive checks Breaker status and propagates it to Node compound
func (ms *MeasuredStorage) IsActive() bool {

I don't like the fact that the method's name hides what it's actually doing, and you have to take a look at the doc to know the whole truth.

@@ -16,7 +16,7 @@ type AccessMessageData struct {
 Path string `json:"path"`
 UserAgent string `json:"useragent"`
 StatusCode int `json:"status"`
-Duration float64 `json:"duration"`
+Duration float64 `json:"duration_ms"`

isn't json camel case? (duration_ms)

@@ -25,7 +25,7 @@ func (lrt *loggingRoundTripper) RoundTrip(req *http.Request) (resp *http.Respons
 timeStart := time.Now()
 resp, err = lrt.roundTripper.RoundTrip(req)

-duration := time.Since(timeStart).Seconds()
+duration := time.Since(timeStart).Seconds() * 1000

why multiply by 1000?

}

func (c *ShardClient) balancerRoundTrip(req *http.Request) (resp *http.Response, err error) {
notFoundNodes := []balancing.Node{}

maybe it's worth it to add entries to the sync log for each of the nodes in notFoundNodes after we've found a node that was able to process the request?

@@ -37,6 +37,11 @@ It also backtracks to older cluster when requested for not existing object on
 target cluster. This kind of events are logged, so it's possible to rebalance
 clusters in background.

+### Multi cloud cost optimization
+While all objects has to be written in every storage from shard, not all storages
+has to be read. With load balancing and storage prioritization akubra will peak
@wookie41 wookie41 Nov 2, 2018

pick the

h.t0 = newT0
for _, series := range h.data {
for _, value := range series.data {
value.date = value.date.Add(delta)
@wookie41 wookie41 Nov 2, 2018

This loop seems costly. We could instead accumulate the delta and later, when someone calls ValueRangeFun, just apply the delta to timeStart and timeEnd once at the beginning of the method.
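Roughly like this (shiftDelta and the dataSeries back-reference to its histogram are assumptions made for the sketch, not fields from this PR):

```go
// sketch: shiftData becomes O(1) by only accumulating the offset
func (h *histogram) shiftData(delta time.Duration) {
	h.shiftDelta += delta
}

// ValueRangeFun then applies the accumulated offset to the query bounds
// once, which is equivalent to having shifted every sample's date
func (series *dataSeries) ValueRangeFun(timeStart, timeEnd time.Time, fun func(*timeValue)) []float64 {
	timeStart = timeStart.Add(-series.histogram.shiftDelta)
	timeEnd = timeEnd.Add(-series.histogram.shiftDelta)
	values := []float64{}
	// ...iterate over series.data between the adjusted bounds as before,
	// calling fun and appending to values...
	return values
}
```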

 open := ms.Breaker.Record(duration, success)
-log.Debugf("MeasuredStorage %s: Request %s took %s was successful: %t, opened breaker %t\n", ms.Name, reqID, duration, success, open)
+log.Debugf("s %s: Request %s took %s was successful: %t, opened breaker %t\n", ms.Name, reqID, duration, success, open)

You replaced MeasuredStorage with s in the log message. Is this intentional?

@mjarco mjarco merged commit 2e6aaa9 into v1.0 Jan 9, 2019