Exposes metrics ports on pods in order to enable GCP Managed Prometheus #2712

Merged
merged 9 commits into from
Sep 1, 2022

austin-space
Contributor

What type of PR is this?
/kind feature

What this PR does / Why we need it: GKE Managed Prometheus (GMP) makes some interesting decisions in how it implements a Prometheus Operator-like system. The most notable change is that GMP has no concept of a ServiceMonitor, only a PodMonitoring custom resource. As the name implies, this monitor only has visibility into pods, not services. Fortunately, I can set up PodMonitoring resources that closely imitate the ServiceMonitor resources in the included Helm charts. The only issue is that the metrics port for the allocator service is not defined on the pod itself, just on the metrics service.

Exposing the port on the pod is the least intrusive way of supporting this. Alternatively, a GMP flag could be included in the values, and I could add all of the pod monitors as well.
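
For illustration, a PodMonitoring resource that imitates the allocator's ServiceMonitor might look roughly like the sketch below. The resource name, the pod label in the selector, and the `http` port name are assumptions made for this example and should be checked against the rendered chart; the namespace matches the `agones-system` namespace used by the install commands in this thread.

```yaml
# Sketch only: a PodMonitoring resource for the allocator pods.
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: agones-allocator-monitoring   # hypothetical name
  namespace: agones-system
spec:
  selector:
    matchLabels:
      # Assumed pod label; verify it against the allocator Deployment.
      multicluster.agones.dev/role: allocator
  endpoints:
    # Assumed name of the container port that serves metrics.
    # PodMonitoring resolves this against ports declared on the pod,
    # which is why the port has to be exposed there and not only on a Service.
    - port: http
      path: /metrics
      interval: 30s
```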

Which issue(s) this PR fixes: none that I'm aware of

Special notes for your reviewer:

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: d752667a-1771-401b-a365-14d6f794b995

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/2712/head:pr_2712 && git checkout pr_2712
  • helm install agones ./install/helm/agones --namespace agones-system --set agones.image.tag=1.26.0-7d743f8-amd64

@roberthbailey roberthbailey added the kind/feature New features for Agones label Aug 19, 2022
@markmandel
Member

Thanks for digging into this! I'm now also learning how Google Cloud Managed Prometheus works!

I think in general this looks like the correct approach (exposing the container ports on the relevant pods).

It's worth noting that we have a metrics endpoint on the controller as well as the allocation system (controller's service monitor) - so this will need to cover both.

🤔 another thought I had, not sure if it sticks, but does it make any sense to allow someone to add arbitrary container ports to either the controller or the allocation Pods? Much in the same way we do agones.controller.annotations for example.

Just trying to think more generically, rather than a specific solution for this specific problem.

(Also docs are good 😄 )

WDYT?

@austin-space
Contributor Author

Hey Mark, the controller metrics port is actually already exposed, since it's the same port that is used for its other HTTP traffic. There's probably some work that can be done to unify the way these ports are set up and exposed, since some have names and port numbers coming from the values.yaml file, and others are hardcoded in the templates.

I can update the docs in this PR later today or tomorrow when I get some time. For docs, are you thinking it would be better to have a quick blurb on GMP (something like "if you are using Google Managed Prometheus, you will need to set ... and set up your own PodMonitoring resources (link to Google docs)") or something more?

Also, should the PodMonitoring resources required for Managed Prometheus be included in the Helm chart (behind a usingGoogleManagedPrometheus flag or something)? I don't know how much you do or don't want to avoid provider-specific stuff sneaking in there.

Thinking about the more generic solution: outside of this port, I'm not sure what other ports even have something interesting running on them and aren't exposed on the pod by default (as opposed to a service), so I don't know if that would be of much value at this point in time (but it might be if another one of these crops up).
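
Since the controller serves metrics on the same port as its other HTTP traffic, a PodMonitoring resource for it would not depend on any new port being exposed. A minimal sketch follows; the resource name, the pod label, and the `http` port name are assumptions for illustration, not values confirmed in this thread.

```yaml
# Sketch only: a PodMonitoring resource for the controller pods.
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: agones-controller-monitoring   # hypothetical name
  namespace: agones-system
spec:
  selector:
    matchLabels:
      # Assumed pod label; verify it against the controller Deployment.
      agones.dev/role: controller
  endpoints:
    # Assumed name of the controller's existing HTTP port.
    - port: http
      interval: 30s
```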

@markmandel
Member

That 100% makes sense.

My thought here then, would be to just leave the container port always open (i.e. don't bother putting in a helm configuration variable). It's inside the cluster anyway, so I don't think it really matters. @roberthbailey WDYT?

Regarding documentation - good point re: platform specific.

I'm thinking something generic like "if your metric collection agent needs to scrape container ports directly (such as with Google Cloud Managed Prometheus), the ports you would need to scrape can be found at {insert details}"

How does that sound?

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: a2f03f1f-27fe-4020-900b-2138ede895d4

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/2712/head:pr_2712 && git checkout pr_2712
  • helm install agones ./install/helm/agones --namespace agones-system --set agones.image.tag=1.26.0-ba90653-amd64

@roberthbailey
Member

> My thought here then, would be to just leave the container port always open (i.e. don't bother putting in a helm configuration variable). It's inside the cluster anyway, so I don't think it really matters. @roberthbailey WDYT?

I agree that it should be fine to just add the port all the time. If nothing scrapes it, then it doesn't do any harm, but it's there if someone wants to scrape it (using GMP or a different Prometheus scraper that uses PodMonitoring) without needing to re-install or re-configure Agones later.
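
As a rough picture of what "always adding the port" means in the pod template, the fragment below declares a named container port unconditionally. The `http` name and port 8080 match the lines that appear in the install.yaml diff later in this thread; the container name and the surrounding structure are assumptions for illustration.

```yaml
# Sketch only: fragment of a Deployment pod template.
spec:
  template:
    spec:
      containers:
        - name: agones-allocator   # assumed container name
          ports:
            # Declared unconditionally, with no Helm value gating it.
            # Declaring a port here is informational: it does not open
            # anything outside the cluster, but it gives pod-level
            # scrapers a named port to target.
            - name: http
              containerPort: 8080
```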

@markmandel
Member

Sounds like we have consensus!

If we could add some docs with a feature tag around it for the next version, that would be perfect 👍🏻

@markmandel
Member

Just a heads up, we are one week away from our release candidate, so if you have time to implement the above comments, it would be awesome to get that in for that release.

@google-oss-prow google-oss-prow bot added size/S and removed size/XS labels Aug 31, 2022
@austin-space
Contributor Author

Sorry, the week got away from me. I think everything should be good now. Let me know if you want any changes with the little blurb I added in the docs.

@agones-bot
Collaborator

Build Failed 😱

Build Id: 797931f1-ddf6-4050-bfd0-96bdc665d58c

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: 9e75dc67-aacc-4e7b-aeda-2bf2ec7f35df

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/2712/head:pr_2712 && git checkout pr_2712
  • helm install agones ./install/helm/agones --namespace agones-system --set agones.image.tag=1.26.0-eaba510-amd64

Co-authored-by: Mark Mandel <markmandel@google.com>
@agones-bot
Collaborator

Build Failed 😱

Build Id: 286bc54c-9343-429c-926b-3974ee2a86f1

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@markmandel
Member

make[1]: Leaving directory '/workspace/build'
sort /workspace/install/yaml/install.yaml > /tmp/agones-install/install.current.yaml.sorted
diff /tmp/agones-install/install.yaml.sorted /tmp/agones-install/install.current.yaml.sorted
13943a13944
>           containerPort:  8080
14156a14158
>         - name: http

oooh, I see what it is.

If you could run make gen-install in the ./build directory (assuming you have Make and Docker installed) that will refresh the install.yaml to have your changes, and this should be good to go 👍🏻

@markmandel
Member

Lemme know if you run into any issues doing that, and I can do it on my end, and submit a PR to your PR 😄

@google-oss-prow

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: austin-space, markmandel

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot removed the lgtm label Sep 1, 2022
@google-oss-prow

New changes are detected. LGTM label has been removed.

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: 0af444b2-ccef-4959-8cd2-5759e104261c

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/2712/head:pr_2712 && git checkout pr_2712
  • helm install agones ./install/helm/agones --namespace agones-system --set agones.image.tag=1.26.0-0dc7715-amd64

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: 4915143f-ae06-4744-a3a1-9b76c5c9b1fe

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/2712/head:pr_2712 && git checkout pr_2712
  • helm install agones ./install/helm/agones --namespace agones-system --set agones.image.tag=1.26.0-3a9841b-amd64

@markmandel markmandel merged commit 2e9a43e into googleforgames:main Sep 1, 2022
@SaitejaTamma SaitejaTamma added this to the 1.26.0 milestone Sep 6, 2022
Labels
approved kind/feature New features for Agones size/S