Skip to content
This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

allow metrictank to run as graphite 1.0 cluster node #611

Merged
merged 2 commits into from
Apr 25, 2017
Merged

Conversation

woodsaj
Copy link
Member

@woodsaj woodsaj commented Apr 20, 2017

Allow metrictank to sit behind a stock graphite-web 1.0.0 deployment
as a cluster_server. This allows metrictank to be used without the
need of using a custom finder plugin.

To support this feature the following changes have been made

  • support pickle format for metrics/find calls
  • support pickle format for render calls

@woodsaj woodsaj requested review from Dieterbe and replay April 20, 2017 17:50
@woodsaj woodsaj force-pushed the pickle branch 2 times, most recently from 2b50d32 to 123d42e Compare April 21, 2017 17:41
@DanCech
Copy link
Contributor

DanCech commented Apr 24, 2017

Copy link
Contributor

@Dieterbe Dieterbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some initial feedback. I want to dig a little deeper still

@@ -35,6 +35,9 @@ services:
- "8080:80"
environment:
GRAPHITE_CLUSTER_SERVERS: metrictank:6060
WSGI_PROCESSES: 4
WSGI_THREADS: 25
GRAPHITE_STATSD_HOST: statsdaemon
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does this environment have specific wsgi settings and the others don't?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GRAPHITE_* are graphite settings, defined in graphite's local_settings.py
WSGI_* are wsgi settings, defined in the WSGIDaemonProcess directive in Apache.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what i mean is, the reason behind the different docker environments, is that they all have something unique to them, and usually relating to MT. eg:

  • docker-standard : standard docker env. 1 graphite, 1 MT, official binaries and stock configs
  • docker-dev: same, but custom MT binary and MT config files mounted in (though the defaults are what is in git, so equivalent to docker-standard by default, which makes it easy to use git diff to see the difference)
  • docker-cluster : a cluster of MT's

So I don't think we should commit any changes that make the graphite in docker-dev behave any different from any other of the docker environments. maybe those settings be set so that they apply to all our docker environments

},
{
"refId": "C",
"target": "alias(maxSeries(stats.docker-env.timers.view.graphite.render.views.renderView.*.upper), 'upper')"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to use this dashboard elsewhere? then it would be better to create an environment selector instead of hardcoding it. (as a general principle, at least for the MT dashboard and ideally others in this repo too, we should keep the dashboard in git in sync with what we use on our monitoring server, so that we leverage updates everywhere)

Copy link
Member Author

@woodsaj woodsaj Apr 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this dashboard cant be used anywhere else.

@@ -15,7 +15,8 @@ type GraphiteRender struct {
From string `json:"from" form:"from"`
Until string `json:"until" form:"until"`
To string `json:"to" form:"to"`
Format string `json:"format" form:"format" binding:"In(,json,msgp)"`
Format string `json:"format" form:"format" binding:"In(,json,msgp,pickle)"`
LocalOnly bool `json:"local" form:"local"` //this is set to true by graphite-web when it passes request to cluster servers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"local" is ambiguous here. it's easy to mistake this for an option that controls whether MT talks to clustering peers. (especially in local non-saas deployments where it may seem to make sense to have graphite talk to all MT cluster peers directly, which it seems like we should be discouraging since MT should do a more efficient job at talking to peers)
here you're using it to control proxying back to graphite, but to the reader this is not clear. btw does graphite issue requests to MT containing all processing calls as well? I thought graphite queries for raw data and then does processing itself.

Copy link
Member Author

@woodsaj woodsaj Apr 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

graphite should only send queries with all functions stripped from the target param. (ie, it just sends target=<pathExpr>&local=true)

"local" is ambiguous here.

Nothing we can do about that. "local" is the query param that graphite sends.

The only reason we handle this query param is to prevent an infinite loop if either graphite or metrictank incorrectly parse the target. if graphite sends a target with a pathExpr wrapped in a function, or MT thinks the target cant be processed locally and needs to be proxied, bad things will happen.

We can change the struct field to "MetrictankOnly" if that helps.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah MTOnly or NoProxy or something

b.ResetTimer()
var resp *Pickle
for n := 0; n < b.N; n++ {
resp = NewPickle(200, models.SeriesPickleFormat(data))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm curious what the cost of models.SeriesPickleFormat itself is, that would be very interesting to also have as a separate benchmark.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i added the benchmark.

@@ -147,4 +146,4 @@ services:
- "443:443"
volumes:
- "./carbon.conf:/opt/graphite/conf/carbon.conf"
- "./storage-schemas.conf:/opt/graphite/conf/storage-schemas.conf"
- "./storage-schemas.conf:/opt/graphite/conf/storage-schemas.conf"
Copy link
Contributor

@Dieterbe Dieterbe Apr 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your editor does something weird where it often removes newlines at the end of the files. can you make it stop doing this as sometimes this leads to chaos

"url":"http://localhost:6060",
"access":"direct",
"isDefault":false
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove this datasource? the point of this environment is to make it easy to test MT directly .

@@ -102,7 +101,6 @@ In the metrics tab you should see a bunch of metrics already in the root hierarc
* `metrictank.usage`: usage metrics reported by metrictank. See
[Usage reporting](https://github.com/raintank/metrictank/blob/master/docs/usage-reporting.md)
It may take a few minutes for the usage metrics to show up.
* `stats`: these are metrics coming from graphite-api, aggregated by statsdaemon and sent back to metrictank every second.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove this line instead of just s/graphite-api/graphite/ ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should add a note here that this won't show up until statsdaemon receives some stats, there's only a few non-obvious cases in which this happens (user adds graphite-direct datasource, user sends to statsdaemon manually, or user does requests to MT which MT needs graphite for)

* Url: `http://localhost:8080`
* Access: `direct` (not `proxy`)
* Url: `http://graphite`
* Access: `proxy`

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should keep using the direct address by default, as that makes it trivial for people to just interact with the endpoint from their host system as well. e.g. they might want to do curl requests or http benchmark tools on it without having to worry about the docker dns stuff. or i guess we could leave it proxied but just mention "you can also connect directly to localhost:8080 from the host"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

graphite-web does not support CORS. so direct does not work.

users can still connect directly to the datasource on localhost:8080.

"url":"http://localhost:8080",
"access":"direct",
"isDefault":true
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we remove the mt-graphite-api DS but why don't we add a mt-graphite DS?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i removed docker/docker-dev-direct-single-tenant completely, as it is no different to docker-dev.
All docker envs that use direct carbon input are configured as single-tenant (as all metrics ingested use orgId=1). the cluster envs, use multiTenant metrictank, but have graphite configured in singleTenant mode so it sends the x-org-id:1 header, preventing the need for running tsdb-gw.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok that sounds good. but I still think it's useful to have 2 datasources (one to talk to graphite, and one to talk to MT). maybe not in the standard one, but definitely in docker-dev.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed. I ensured all envs have a graphite and metrictank datasource, with the metrictank one set as default.

@@ -1,4 +1,4 @@
# Graphite-api
# Graphite
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please run a grep -Ri graphite-api in the root of the MT repo. you'll find plenty of places that need to be updated still. like the documentation index in the readme,

@@ -25,7 +23,7 @@ instance or a cluster. Metrictank will also respond to queries: if the data is
RAM, and older data is fetched from cassandra. This happens transparantly.
Metrictank maintains an index of metrics metadata, for all series it sees.
You can use an index entirely in memory, or backed by Cassandra for persistence.
You'll typically query metrictank by querying graphite-api which uses the graphite-metrictank plugin to talk
You'll typically query metrictank by querying graphite which uses the clustering capabilities of graphite to talk
to metrictank. You can also query metrictank directly but this is experimental and too early for anything useful.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"You can also query metrictank directly but this is experimental and too early for anything useful." this is not true anymore. in fact querying MT directly is now the preferred way. Let's replace this sentence, along with the "You'll typically..." sentence by something like: "You can query metrictank directly (it has fast, but limited built-in processing and will fallback to graphite when needed) or you can also just query graphite which will always use graphite's processing but use metrictank as a datastore" except I'm not sure if the latter statement is true.

@@ -31,7 +31,7 @@ The stack will listen on the following ports:
* 2003 tcp (metrictank's carbon input)
* 3000 tcp (grafana's http port)
* 6060 tcp (metrictank's internal endpoint)
* 8080 tcp (the graphite api query endpoint)
* 8080 tcp (the graphite query endpoint)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you changed it to port 80

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the container listens on 80, but it is bound to port 8080 on localhost.

@@ -4,11 +4,9 @@

We'll go over these in more detail below.

* Cassandra. We run and recommend 3.0.8 .
* Cassandra. We run and recommend 3.X .
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we be more specific? 3.x includes old versions that don't have some things we need (like TWCS)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep you are correct. I wrongly assumed all 3.X versions had twcs, but it was not added till 3.8


Configure graphite with the following settings in local_settings.py
```
CLUSTER_SERVERS = ['localhost:6060'] # update to match the host:port metrictank is running on.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at this point in the guide, MT is not running yet, and the user has no idea yet about how and on which port he will run MT, as that's only at the end of the guide

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is still a big improvement on the previous instructions, which didnt even mention configuring graphite-api.

refactoring this install guide is outside the scope of this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all i'm saying is you're introducing a sentence that is confusing. in fact, the rest of the guide assumes mt will run on localhost:6060 so let's remove the comment, as it instructs people to do something that they don't know about, and they don't need to do it anyway.

@@ -115,7 +117,7 @@ The log - should you need it - is at /var/log/cassandra/system.log

## Set up statsd

You can optionally statsd or a statsd-compatible agent for instrumentation of graphite-api.
You can optionally statsd or a statsd-compatible agent for instrumentation of your applications.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

too vague. maybe make it "of graphite and optionally any of your other applications"


When you hit save, Grafana should succeed in talking to the data source.

![Add data source screenshot](https://raw.githubusercontent.com/raintank/metrictank/master/docs/assets/add-datasource-docker.png)

Note: it also works with `proxy` mode but then you have to enter `http://graphite-api:8080` as uri.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would be a good place to mention that users can still connect directly to the datasource on localhost:8080.

@Dieterbe
Copy link
Contributor

sweet, getting much better perf out of graphite.
just one minor thing left, #611 (comment), otherwise, this LGTM

@@ -102,8 +103,7 @@ In the metrics tab you should see a bunch of metrics already in the root hierarc
* `metrictank.usage`: usage metrics reported by metrictank. See
[Usage reporting](https://github.com/raintank/metrictank/blob/master/docs/usage-reporting.md)
It may take a few minutes for the usage metrics to show up.
* `stats`: these are metrics coming from graphite-api, aggregated by statsdaemon and sent back to metrictank every second.

* `stats`: these are metrics coming from gaphite, aggregated by statsdaemon and sent back to metrictank every second.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/gaphite/graphite/

@woodsaj
Copy link
Member Author

woodsaj commented Apr 25, 2017

thanks @Dieterbe all issues raised have been addressed. please approve an LGTM

woodsaj and others added 2 commits April 25, 2017 11:08
Allow metrictank to sit behind a stock graphite-web 1.0.1 deployment
as a cluster_server.  This allows metrictank to be used without the
need of using a custom finder plugin.
To support this feature the following changes have been made
- support pickle format for metrics/find calls
- support pickle format for render calls

Docs and docker Envs have also been updated to reflect the change
from using graphite-api to using graphite-web
@Dieterbe Dieterbe merged commit da6c0c2 into master Apr 25, 2017
@woodsaj woodsaj deleted the pickle branch April 25, 2017 15:31
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants