Optionally disable cassidx update #565

replay · 2017-03-15T18:49:48Z

That's one part of #547

Dieterbe · 2017-03-16T11:25:26Z

If !updateCassIdx, there's no need to initialize the writeQueues and start the processWriteQueue routines. (Watch out: Stop() needs to be adjusted too, because closing a nil channel would cause a panic.)
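A minimal sketch of that suggestion, assuming simplified stand-in types rather than the real CasIdx: `index`, `newIndex`, and the `update` field are hypothetical names used only for illustration. The queue and workers are created only when updates are enabled, and Stop() closes the channel only if it was created.

```go
package main

import (
	"fmt"
	"sync"
)

// writeReq and index are simplified stand-ins, not the real metrictank types.
type writeReq struct{ name string }

type index struct {
	update     bool          // stand-in for the updateCassIdx setting
	writeQueue chan writeReq // stays nil when update is false
	wg         sync.WaitGroup
}

// newIndex allocates the queue and starts workers only when updates are enabled.
func newIndex(update bool, workers, queueSize int) *index {
	c := &index{update: update}
	if !update {
		return c
	}
	c.writeQueue = make(chan writeReq, queueSize)
	c.wg.Add(workers)
	for i := 0; i < workers; i++ {
		go c.processWriteQueue()
	}
	return c
}

// processWriteQueue drains the queue until it is closed.
func (c *index) processWriteQueue() {
	defer c.wg.Done()
	for range c.writeQueue {
		// a real worker would write the definition to Cassandra here
	}
}

// Stop closes the queue only if it was created; close() on a nil channel panics.
func (c *index) Stop() {
	if c.update {
		close(c.writeQueue)
	}
	c.wg.Wait()
}

func main() {
	for _, update := range []bool{true, false} {
		c := newIndex(update, 2, 10)
		c.Stop()
		fmt.Println("stopped cleanly, update =", update)
	}
}
```

With the guard in Stop(), both configurations shut down without a panic.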

Dieterbe · 2017-03-16T11:40:55Z

idx/cassandra/cassandra.go

@@ -390,6 +393,9 @@ func (c *CasIdx) Delete(orgId int, pattern string) ([]schema.MetricDefinition, e
 }

 func (c *CasIdx) deleteDef(def *schema.MetricDefinition) error {
+	if !updateCassIdx {
+		return nil
+	}


I think checks like these should be executed in the caller, e.g. in the public functions, not in the internal "worker" functions. This is also more efficient: in Prune, for example, we can simply skip the entire loop of many deleteDef calls.
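As a rough illustration of the point above, here is a sketch with a hypothetical `idx` type and a lowercase `prune` method (neither is the real metrictank code). The flag is checked once in the caller, so the worker stays free of configuration checks and the loop is skipped entirely when updates are disabled.

```go
package main

import "fmt"

// idx is a simplified stand-in for the index type.
type idx struct {
	update  bool // stand-in for updateCassIdx
	deleted int  // counts worker calls, for illustration only
}

// deleteDef is the internal worker; it does not check the flag itself.
func (c *idx) deleteDef(name string) {
	c.deleted++
}

// prune is the public entry point; a single check skips the whole loop.
func (c *idx) prune(stale []string) {
	if !c.update {
		return
	}
	for _, name := range stale {
		c.deleteDef(name)
	}
}

func main() {
	stale := []string{"a", "b", "c"}
	for _, update := range []bool{true, false} {
		c := &idx{update: update}
		c.prune(stale)
		fmt.Printf("update=%v deleteDef calls=%d\n", update, c.deleted)
	}
}
```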

Dieterbe

My comments aside, I think this looks good. This will be very useful to deploy. I'm not sure whether the second "remove duplication" commit is still worth it once this is rebased on top of the changes that already simplified the code a bit. Note that you also need to update metrictank-sample.ini and then run scripts/sync-configs.sh.

replay · 2017-03-16T16:50:16Z

Cool, I'll update that. A general question: if timing stats are collected about how long it takes to update the index, and Cassandra index updates are disabled, should those stats still be collected even though they now only cover the memory index updates? For example: https://github.com/raintank/metrictank/blob/a4e8ff546c1a0c74c09cd938d72523ae0f4306ba/idx/cassandra/cassandra.go#L391

I think they should. Obviously the values will drop a lot once Cassandra doesn't need to be updated, but the stats are separated per instance anyway, so they shouldn't skew the other results.

replay · 2017-03-16T18:26:30Z

This is ready for review again.

Dieterbe · 2017-03-16T19:37:54Z

docker/docker-cluster/metrictank.ini

@@ -271,6 +271,8 @@ write-queue-size = 100000
 max-stale = 0
 #Interval at which the index should be checked for stale series.
 prune-interval = 3h
+# disable cassandra index updates (for read nodes)


this comment is the inverse of what the option actually does.

suggestion:
# synchronize index changes to cassandra. not all your nodes need to do this.

Dieterbe · 2017-03-16T19:57:12Z

idx/cassandra/cassandra.go

-	c.writeQueue <- writeReq{recvTime: time.Now(), def: def}
-	statAddDuration.Value(time.Since(pre))
+
+	return


these two lines seem pointless

Dieterbe · 2017-03-16T19:59:03Z

idx/cassandra/cassandra.go

-				c.MemoryIdx.AddOrUpdateDef(def)
-				c.writeQueue <- writeReq{recvTime: time.Now(), def: def}
-				statUpdateDuration.Value(time.Since(pre))
+				updateIdx = true


we can replace 3 lines with one, if we do updateIdx = (existing.LastUpdate < oldest.Unix())
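To make the suggested simplification concrete, here is a small sketch comparing the two shapes. The function names are hypothetical, and timestamps are passed as plain Unix seconds to keep the example self-contained; both forms return the same value.

```go
package main

import (
	"fmt"
	"time"
)

// needsUpdateVerbose mirrors the "declare, test, assign" shape.
func needsUpdateVerbose(lastUpdate, oldest int64) bool {
	updateIdx := false
	if lastUpdate < oldest {
		updateIdx = true
	}
	return updateIdx
}

// needsUpdate assigns the comparison result directly.
func needsUpdate(lastUpdate, oldest int64) bool {
	updateIdx := (lastUpdate < oldest)
	return updateIdx
}

func main() {
	now := time.Now()
	oldest := now.Add(-3 * time.Hour).Unix()
	stale := now.Add(-4 * time.Hour).Unix()
	fresh := now.Unix()
	fmt.Println(needsUpdate(stale, oldest), needsUpdate(fresh, oldest))
}
```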

Dieterbe · 2017-03-16T20:01:41Z

idx/cassandra/cassandra.go

-		go c.processWriteQueue()
+	if updateCassIdx {
+		for i := 0; i < numConns; i++ {
+			c.wg.Add(1)


we can move this to before the loop and make it a single call: c.wg.Add(numConns)
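A short sketch of that change, using a hypothetical `runWorkers` helper rather than the real Init code: the WaitGroup counter is raised once by `n` before the loop, and each goroutine still calls Done() exactly once.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runWorkers starts n goroutines and returns how many ran to completion.
func runWorkers(n int) int64 {
	var wg sync.WaitGroup
	var done int64

	wg.Add(n) // one call before the loop instead of wg.Add(1) per iteration
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			atomic.AddInt64(&done, 1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&done)
}

func main() {
	fmt.Println(runWorkers(10))
}
```

Calling Add before any goroutine starts also keeps the counter from reaching zero while workers are still being launched.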

replay · 2017-03-16T20:41:59Z

Ok, updated everything as requested

Dieterbe · 2017-03-16T20:46:31Z

idx/cassandra/cassandra.go

@@ -92,7 +92,7 @@ func ConfigSetup() *flag.FlagSet {
 	casIdx.DurationVar(&timeout, "timeout", time.Second, "cassandra request timeout")
 	casIdx.IntVar(&numConns, "num-conns", 10, "number of concurrent connections to cassandra")
 	casIdx.IntVar(&writeQueueSize, "write-queue-size", 100000, "Max number of metricDefs allowed to be unwritten to cassandra")
-	casIdx.BoolVar(&updateCassIdx, "update-cassandra-index", true, "disable cassandra index updates (for read nodes)")
+	casIdx.BoolVar(&updateCassIdx, "update-cassandra-index", true, "synchronize index changes to cassandra. not all your nodes need to do this.")
 	casIdx.DurationVar(&updateInterval, "update-interval", time.Hour*3, "frequency at which we should update the metricDef lastUpdate field.")


I think we should clarify that you can also get instant updates. here and in the config comments and docs.
in that case, what value should the user specify? 0s ?

yes, just specify 0s or any other unit. ok, i'll add that

updated the description there and in the example configs.

Dieterbe · 2017-03-16T21:37:16Z

ok if you can confirm '0s' works, then this LGTM .
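As a quick check of the parsing side only, the sketch below registers a duration flag with Go's standard flag package and parses a zero value. It assumes a plain FlagSet; `parseUpdateInterval` is a hypothetical helper, and the way metrictank loads its ini file is not modeled here.

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// parseUpdateInterval parses args with a FlagSet that declares an
// update-interval duration flag, similar in shape to the one in ConfigSetup.
func parseUpdateInterval(args []string) (time.Duration, error) {
	var interval time.Duration
	fs := flag.NewFlagSet("cassandra-idx", flag.ContinueOnError)
	fs.DurationVar(&interval, "update-interval", 3*time.Hour, "frequency of lastUpdate refreshes, 0s for instant updates")
	err := fs.Parse(args)
	return interval, err
}

func main() {
	for _, v := range []string{"0s", "0h", "4h"} {
		d, err := parseUpdateInterval([]string{"-update-interval=" + v})
		fmt.Println(v, d, err)
	}
}
```

A zero duration parses the same way regardless of the unit suffix, which matches the "0s or any other unit" remark.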

replay · 2017-03-16T22:18:48Z

Looking good:

I modified idx/cassandra/cassandra.go as follows:

mst@ubuntu:~/go/src/github.com/raintank/metrictank$ git diff idx
diff --git a/idx/cassandra/cassandra.go b/idx/cassandra/cassandra.go
index 8c58537..1a63d1d 100644
--- a/idx/cassandra/cassandra.go
+++ b/idx/cassandra/cassandra.go
@@ -242,8 +242,10 @@ func (c *CasIdx) AddOrUpdate(data *schema.MetricData, partition int32) {
                if existing.Partition == partition {
                        var oldest time.Time
                        if updateInterval > 0 {
+                               fmt.Println(fmt.Sprintf("updating if update interval has been exceeded %s", data.Name))
                                oldest = time.Now().Add(-1 * updateInterval).Add(-1 * time.Duration(rand.Int63n(updateInterval.Nanoseconds()*int64(updateFuzzyness*100)/100)))
                        } else {
+                               fmt.Println(fmt.Sprintf("updating because update interval is 0 %s", data.Name))
                                oldest = time.Now()
                        }
                        updateIdx = (existing.LastUpdate < oldest.Unix())

Then I set update-interval = 0s:

mst@ubuntu:~/go/src/github.com/raintank/metrictank$ git diff scripts/config/metrictank-docker.ini
diff --git a/scripts/config/metrictank-docker.ini b/scripts/config/metrictank-docker.ini
index 9599279..6b8cee7 100644
--- a/scripts/config/metrictank-docker.ini
+++ b/scripts/config/metrictank-docker.ini
@@ -274,7 +274,7 @@ prune-interval = 3h
 # synchronize index changes to cassandra. not all your nodes need to do this.
 update-cassandra-index = true
 #frequency at which we should update the metricDef lastUpdate field, use 0s for instant updates
-update-interval = 4h
+update-interval = 0s
 #fuzzyness factor for update-interval. should be in the range 0 > fuzzyness <= 1. With an updateInterval of 4hours and fuzzyness of 0.5, metricDefs will be updated every 4-6hours.
 update-fuzzyness = 0.5
 # enable SSL connection to cassandra

Then I started MT and grepped for "some":

build/metrictank -config scripts/config/metrictank-docker.ini | grep some

Then I used fakemetrics to produce metrics for 10 keys:

~$ fakemetrics -flushPeriod 100 -orgs 1 -keys-per-org 10   -carbon-tcp-address 127.0.0.1:2003

And I'm continuously getting the following output from MT through grep:

mst@ubuntu:~/go/src/github.com/raintank/metrictank$ build/metrictank -config scripts/config/metrictank-docker.ini | grep some
2017/03/16 15:11:57 [I] Metrictank starting. Built from 0.7.0-114-gb3541aa - Go version go1.7
2017/03/16 15:11:57 [I] CLU Start: Starting cluster on 0.0.0.0:7946
2017/03/16 15:11:57 [I] CLU manager: Node ubuntu with address 127.0.0.1 has joined the cluster
2017/03/16 15:11:57 [I] initializing cassandra-idx. Hosts=127.0.0.1:9042
2017/03/16 15:11:57 [I] API Listening on: http://:6060/
2017/03/16 15:11:57 [I] cassandra-idx Rebuilding Memory Index from metricDefinitions in Cassandra
2017/03/16 15:11:57 [I] cassandra-idx writeQueue handler started.
2017/03/16 15:11:57 [I] cassandra-idx writeQueue handler started.
2017/03/16 15:11:57 [I] cassandra-idx writeQueue handler started.
2017/03/16 15:11:57 [I] cassandra-idx writeQueue handler started.
2017/03/16 15:11:57 [I] cassandra-idx writeQueue handler started.
2017/03/16 15:11:57 [I] cassandra-idx writeQueue handler started.
2017/03/16 15:11:57 [I] cassandra-idx writeQueue handler started.
2017/03/16 15:11:57 [I] cassandra-idx writeQueue handler started.
2017/03/16 15:11:57 [I] cassandra-idx writeQueue handler started.
2017/03/16 15:11:57 [I] cassandra-idx writeQueue handler started.
2017/03/16 15:11:57 [I] cassandra-idx Rebuilding Memory Index Complete. Imported 539. Took 9.469262ms
2017/03/16 15:11:57 [I] metricIndex initialized in 56.268525ms. starting data consumption
2017/03/16 15:11:57 [I] carbon-in: listening on :2003/tcp
2017/03/16 15:11:57 [I] CLU manager: Node ubuntu at 127.0.0.1 has been updated - {"name":"default","version":"0.7.0-114-gb3541aa","primary":true,"primaryChange":"2017-03-16T15:11:57.589988972-07:00","state":1,"priority":0,"started":"2017-03-16T15:11:57.588685675-07:00","stateChange":"2017-03-16T15:11:57.589989053-07:00","partitions":[0],"apiPort":6060,"apiScheme":"http","updated":"2017-03-16T15:11:57.700520943-07:00","remoteAddr":""}
2017/03/16 15:11:59 [I] stats now connected to localhost:2003
updating because update interval is 0 some.id.of.a.metric.1
updating because update interval is 0 some.id.of.a.metric.2
updating because update interval is 0 some.id.of.a.metric.3
updating because update interval is 0 some.id.of.a.metric.4
updating because update interval is 0 some.id.of.a.metric.5
updating because update interval is 0 some.id.of.a.metric.6
updating because update interval is 0 some.id.of.a.metric.7
updating because update interval is 0 some.id.of.a.metric.8
updating because update interval is 0 some.id.of.a.metric.9
updating because update interval is 0 some.id.of.a.metric.10
updating because update interval is 0 some.id.of.a.metric.1
updating because update interval is 0 some.id.of.a.metric.2
updating because update interval is 0 some.id.of.a.metric.3
updating because update interval is 0 some.id.of.a.metric.4
updating because update interval is 0 some.id.of.a.metric.5
updating because update interval is 0 some.id.of.a.metric.6
updating because update interval is 0 some.id.of.a.metric.7
updating because update interval is 0 some.id.of.a.metric.8
updating because update interval is 0 some.id.of.a.metric.9
updating because update interval is 0 some.id.of.a.metric.10
updating because update interval is 0 some.id.of.a.metric.1
updating because update interval is 0 some.id.of.a.metric.2
updating because update interval is 0 some.id.of.a.metric.3
updating because update interval is 0 some.id.of.a.metric.4
updating because update interval is 0 some.id.of.a.metric.5
updating because update interval is 0 some.id.of.a.metric.6
updating because update interval is 0 some.id.of.a.metric.7
updating because update interval is 0 some.id.of.a.metric.8
updating because update interval is 0 some.id.of.a.metric.9
updating because update interval is 0 some.id.of.a.metric.10
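For context on why an interval of 0 triggers an update on every data point, here is a rough sketch of the cutoff logic from the diff above, expressed as "how old must lastUpdate be before it is refreshed". `minAge` and `sampleHours` are hypothetical helpers, and the guard against a non-positive jitter bound is an addition for this example, since rand.Int63n panics when its argument is not positive.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// minAge returns how old an entry's lastUpdate must be before it is refreshed.
// An interval of 0 means every incoming point can trigger an update.
func minAge(interval time.Duration, fuzzyness float64, r *rand.Rand) time.Duration {
	if interval <= 0 {
		return 0
	}
	maxJitter := interval.Nanoseconds() * int64(fuzzyness*100) / 100
	if maxJitter <= 0 {
		return interval // skip the jitter; Int63n requires a positive bound
	}
	return interval + time.Duration(r.Int63n(maxJitter))
}

// sampleHours draws n values and returns the smallest and largest, in hours.
func sampleHours(intervalHours int, fuzzyness float64, n int) (lo, hi float64) {
	r := rand.New(rand.NewSource(1))
	interval := time.Duration(intervalHours) * time.Hour
	for i := 0; i < n; i++ {
		h := minAge(interval, fuzzyness, r).Hours()
		if i == 0 || h < lo {
			lo = h
		}
		if i == 0 || h > hi {
			hi = h
		}
	}
	return lo, hi
}

func main() {
	lo, hi := sampleHours(4, 0.5, 1000)
	fmt.Printf("4h interval, fuzzyness 0.5: between %.2fh and %.2fh\n", lo, hi)
	lo, hi = sampleHours(0, 0.5, 10)
	fmt.Printf("0s interval: between %.2fh and %.2fh\n", lo, hi)
}
```

With a 4h interval and fuzzyness 0.5 the sampled ages stay in the 4 to 6 hour range described in the config comment, while a zero interval always yields zero.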

replay mentioned this pull request Mar 16, 2017

Idx adds updates can only succeed #566

Merged

Dieterbe reviewed Mar 16, 2017

Dieterbe suggested changes Mar 16, 2017

replay force-pushed the optionally_disable_cassidx_update branch 2 times, most recently from fa12a97 to 3a49582 Compare March 16, 2017 17:41

replay added 3 commits March 16, 2017 10:48

make it possible to disable cass idx updates

7cd9329

update docs to add update-cassandra-index

b459a68

refactor cassandra-idx AddOrUpdate to remove duplication

972cfd8

replay force-pushed the optionally_disable_cassidx_update branch from 3a49582 to 972cfd8 Compare March 16, 2017 18:12

Dieterbe reviewed Mar 16, 2017

bunch of minor fixes according to comments

b3541aa

replay force-pushed the optionally_disable_cassidx_update branch from 7f9ac15 to b3541aa Compare March 16, 2017 21:14

Dieterbe approved these changes Mar 16, 2017

replay changed the title ~~[WIP] Optionally disable cassidx update~~ Optionally disable cassidx update Mar 16, 2017

Dieterbe approved these changes Mar 16, 2017

replay merged commit 7ebbd51 into master Mar 16, 2017

Dieterbe deleted the optionally_disable_cassidx_update branch September 18, 2018 09:00
