Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor process mappers into taggers. #178

Merged
merged 1 commit into from
Jun 8, 2015
Merged

Conversation

tomwilkie
Copy link
Contributor

  • Removed the cgroup mapper
  • Adds topology tagger

if err != nil {
log.Fatal(err)
log.Printf("warning: failed to start docker tagger: %v", err)

This comment was marked as abuse.

@tomwilkie tomwilkie force-pushed the taggers-over-mappers branch from dfb1fe4 to 3c32944 Compare June 4, 2015 14:23
@peterbourgon
Copy link
Contributor

LGTM

@peterbourgon peterbourgon removed their assignment Jun 4, 2015
@@ -24,7 +24,7 @@ var (
readFile = ioutil.ReadFile
)

func newPIDTree(procRoot string) (*pidTree, error) {
func newRealPIDTree(procRoot string) (*pidTree, error) {

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

@tomwilkie tomwilkie force-pushed the taggers-over-mappers branch from 3c32944 to 45017f0 Compare June 5, 2015 09:41
@peterbourgon
Copy link
Contributor

Address is an e.g. IP address-to-address topology. It's not new, it's a rename of the currently (badly) named Network.

I agree with your sentiment about the error handling... but remember that tagging is and will always be best effort — we can't assume that all Endpoint nodes will always have processes attached to them, for example.

We've uncovered something here, though. Take the docker tagger as an example. If we stick a docker_container_id key/val into every endpoint node, and then merge multiple nodes together in report.Render, as it stands now we're going to lose information, the last writer to the merged metadata will win. (In the current version this works by having an appendable list of Origins, no information is lost there.)

And I think this is what you were hinting at — so what about this —

  • Eliminate tagger concept altogether
  • Docker tagger becomes Container Topology generator (as you said)
  • Every unique docker container ID, name, etc. is represented in just one node in that topology
  • All cross-topology links are encoded as foreign keys, e.g. endpoint node has process_node_id FK, process node has container_node_id FK
  • report.Render then resolves those FKs to produce Origins list in a RenderableNode, kind of like we have now

Make sense?

edit: none of that is done in this PR

@tomwilkie
Copy link
Contributor Author

Yes this makes good sense. I though this was the plan all along. Yay code review!

I'm still not comfortable with this tagging interface, but I can't come up with anything nicer. We'll probably have to come back to it at some later point.

@peterbourgon
Copy link
Contributor

Well if you agree with above we can rm the interface...

@tomwilkie
Copy link
Contributor Author

Lets not introduce the containers topology in the this PR; it will make the rest of #123 unmergable.

But I think what you're saying is that, given above, tagger will not need to do foreign key lookups? And therefore, we can massively simplify this interface?

@peterbourgon
Copy link
Contributor

Yes I think that's correct.

@tomwilkie
Copy link
Contributor Author

Do you want to have a crack at that (if you're not busy on marketing)? I'm trying to break out some other changes right now.

@peterbourgon
Copy link
Contributor

If you can merge this and we do that other work in a separate PR — LGTM

I guess we're doing that above-mentioned work in this PR.

@peterbourgon
Copy link
Contributor

Motivation

Right now, the component that reads from Docker attaches metadata to existing nodes in the topologies. For example, if one container is responsible for 10 connections, the same set of Docker metadata is duplicated across those 10 nodes in the process topology — inefficient. And, if we try to merge those nodes, it's last-writer-wins on which information ends up in the merged node — buggy.

What we want instead is for the Docker component to populate its own topology, with each node representing a Docker container, and the metadata existing in that single place. Then, using the same example, if one container is responsible for 10 connections, those 10 nodes in the process topology get a single container_node_id key/val, pointing to the relevant node in the container topology. When we merge the nodes, we'll perform special handling on foreign-key-style key/vals, merging them via set-union.

This will also require changes to report rendering, so that it understands how to follow those links.

Gameplan

This change will occur in 4 stages:

  1. Introduce package spy, describing things that can spy on the host machine and produce topologies. Port current func spy to spy.Proc, producing process and address topologies
  2. Add spy.Docker, producing the (new) container topology, and populating the foreign keys to the other topologies as appropriate
  3. Update Render to perform foreign-key resolution
  4. Remove tagging interface and components

@peterbourgon
Copy link
Contributor

Step 3 presents a speed bump. If Render will be able to perform foreign-key resolution, it will need to have access to all topologies. That means it needs to be defined on the full Report, rather than a single Topology. Starting that work, it became apparent that it would involve porting a significant portion of the work already done in #123, specifically implementing something roughly equivalent to its Render function but with lots of small changes to keep the diff only to the immediately relevant components. Some of those are quite tricky, for example reverting the definition of the pseudo func, and therefore the behavior of pseudo node generation, to the previous broken/nondeterminstic mode (previously described here). And which would eventually be undone, anyway, as we continue this path.

This is enough of a problem that I'd like to ask for your opinion, @tomwilkie, on how to proceed. I'm afraid that it will be a waste of time. Perhaps there is a better strategy.

@tomwilkie
Copy link
Contributor Author

Some (friendly) comments:

  • Stop renaming things! At least, stop renaming them in the same PR as you introduce new features. Its just bloats the PRs and makes them harder to review.
  • Stop doing more than one thing per PR! We don't need to introduce the container topology in this PR. Lets get the taggers -> mappers refactoring in, then work on the next bit.

This change has already been open for too long; lets do the minimal thing to get it ready and then merge it. I think that minimal thing is to simplify / remove the tagger interface.

@peterbourgon
Copy link
Contributor

The mapper creates new topologies. I don't understand how we can make this PR without introducing a container topology?

@tomwilkie
Copy link
Contributor Author

Don't make the mapper create new topologies? Preserve the existing behaviour?

@peterbourgon
Copy link
Contributor

The end result of this PR is that Tagger operates on just a Report. @tomwilkie PTAL.

func (t *dockerTagger) Tag(r report.Report) report.Report {
for _, topology := range []*report.Topology{
&(r.Process),
&(r.Network),

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

@peterbourgon peterbourgon force-pushed the taggers-over-mappers branch from 762b98d to 07d9525 Compare June 8, 2015 12:48
@tomwilkie
Copy link
Contributor Author

Please squash this down to one change and rebase on top of master.

- Reduce tagger interface to operate on reports
- Remove cgroup tagger
@peterbourgon peterbourgon force-pushed the taggers-over-mappers branch from 07d9525 to d486d10 Compare June 8, 2015 13:05
@tomwilkie
Copy link
Contributor Author

\o/ LGTM \o/

peterbourgon added a commit that referenced this pull request Jun 8, 2015
Refactor process mappers into taggers.
@peterbourgon peterbourgon merged commit fdbb371 into master Jun 8, 2015
@peterbourgon peterbourgon deleted the taggers-over-mappers branch June 8, 2015 13:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants