Timezone problem with kube-state-metrics #279

Closed
Reamer opened this issue Mar 1, 2019 · 18 comments

Comments

@Reamer
Contributor

Reamer commented Mar 1, 2019

Hi,
I updated my cluster yesterday using openshift-ansible at this commit: openshift/openshift-ansible@8c77207
This commit changed the timezone in the api, controller and etcd pods.
The kube-state-metrics pod is still in the UTC timezone, and I am hitting exactly this issue: kubernetes/kube-state-metrics#500

What can I do? Is it possible to set the timezone in the kube-state-metrics pod as well?

OpenShift version:

oc version
oc v3.11.0+b6db8e6-107
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://s-cp-lb-01.cloud.mycompany.de:443
openshift v3.11.0+d0c29df-98
kubernetes v1.11.0+d4cacc0

If you need more information, let me know.

Reamer referenced this issue in openshift/openshift-ansible Mar 1, 2019
Master static pods always run in the UTC timezone, which may differ from the timezone of the worker nodes. This can cause problems for time-dependent work.
- Fix: https://bugzilla.redhat.com/show_bug.cgi?id=1674170
@bysnupy
Member

bysnupy commented Mar 15, 2019

Hi @Reamer, sorry for the late response.

Could you elaborate on your issue?
And if you can provide steps to reproduce it, please let me know.

@Reamer
Contributor Author

Reamer commented Mar 15, 2019

Hi @bysnupy,

Steps to reproduce:

  • Install OpenShift 3.11 with ansible-playbook on machines that are not in the UTC time zone. My machines are in the Europe/Berlin time zone.
    • The version of the playbook must include your time zone change. Your changes are in the release-3.11 branch.
  • The openshift-monitoring project should be created by default.
    • cluster-monitoring-operator will set up prometheus (with configuration and rules), grafana, node-exporter and kube-state-metrics.
  • Also install the logging components with the ansible playbook.
    • This will create the curator cron job.

With a time zone set in the kube-api pod, the time of the next cron job run is no longer reported in UTC. kube-state-metrics, however, still calculates in UTC. Now I have the same issue that is described here: kubernetes/kube-state-metrics#500

@bysnupy
Member

bysnupy commented Mar 16, 2019

That's a good point, @Reamer.

I don't think this is a problem with the master control plane's timezone. Each pod should have a way to set a specific timezone if its behavior is affected by the timezone, but kube-state-metrics does not currently provide one.
As a workaround, I suggest re-creating the curator CronJob: this issue depends on its last scheduled time, so re-creating it clears that history.

Personally, I think the kube-state-metrics team should take a look at this, as in the discussion you linked. I'm not familiar with kube-state-metrics, so I'm sorry I can't be of more help.

@sdodson
Member

sdodson commented Mar 20, 2019

@brancz Can I get your opinion whether you think we should revert the changes that were made in the linked pull requests for openshift-ansible? We did this because people complained when the api/controllers/etcd processes moved from host services to static pods without access to /etc/localtime which meant their log timestamps were different from the rest of the system.

@bysnupy
Member

bysnupy commented Mar 21, 2019

FYI @Reamer @brancz @sdodson

I've verified how the timezone influences CronJobs, as follows.

My conclusion is that the CronJob start time depends on the control plane (controller) timezone, not on the kube-state-metrics timezone.
But the kube_cronjob_next_schedule_time value depends on the kube-state-metrics timezone.
See test2 below, which shows the buggy behavior.

  • test1>
    • api, controller, etcd timezone: UTC
    • kube-state-metrics: UTC
    • CronJob Schedule: 5 9 * * *
    • kube_cronjob_next_schedule_time:
    # TZ=UTC date -d @1553245500
    Fri Mar 22 09:05:00 UTC 2019

    # date -d @1553245500
    Fri Mar 22 18:05:00 JST 2019
  • test2> This pattern is buggy: the next schedule time is returned in the UTC timezone, even though the CronJob is scheduled in JST. The wall-clock time is the same, but the timezone is different.
    • api, controller, etcd timezone: JST (UTC+9)
    • kube-state-metrics: UTC
    • CronJob Schedule: 50 18 * * *
    • kube_cronjob_next_schedule_time:
    # TZ=UTC date -d @1553194200
    Thu Mar 21 18:50:00 UTC 2019

    # date -d @1553194200
    Fri Mar 22 03:50:00 JST 2019
  • test3>
    • api, controller, etcd timezone: JST (UTC+9)
    • kube-state-metrics: JST (UTC+9)
    • CronJob Schedule: 0 19 * * *
    • kube_cronjob_next_schedule_time:
    # TZ=UTC date -d @1553248800
    Fri Mar 22 10:00:00 UTC 2019

    # date -d @1553248800
    Fri Mar 22 19:00:00 JST 2019
  • See the test evidence below.

test1>

  # for ctr in $(oc get pod -o name -n kube-system); do echo "$ctr : $(oc rsh -n kube-system $ctr date)"; done
  pod/master-api-all.ocp311.example.com : Thu Mar 21 08:57:01 UTC 2019
  pod/master-controllers-all.ocp311.example.com : Thu Mar 21 08:57:02 UTC 2019
  pod/master-etcd-all.ocp311.example.com : Thu Mar 21 08:57:03 UTC 2019

  # oc rsh -n openshift-monitoring -c kube-state-metrics deployment/kube-state-metrics date
  Thu Mar 21 08:57:17 UTC 2019

  # oc create -f - <<EOF
  apiVersion: batch/v1beta1
  kind: CronJob
  metadata:
    name: testcronjob
  spec:
    jobTemplate:
      spec:
        template:
          spec:
            containers:
            - command:
              - date
              image: busybox
              imagePullPolicy: Always
              name: test
            restartPolicy: OnFailure
    schedule: '5 9 * * *'
    successfulJobsHistoryLimit: 3
    suspend: false
  EOF

  # date
  Thu Mar 21 18:03:40 JST 2019
  # TZ=UTC date
  Thu Mar 21 09:04:33 UTC 2019

  # oc describe cj testcronjob 
  Name:                       testcronjob
  Namespace:                  test
  Labels:                     <none>
  Annotations:                <none>
  Schedule:                   5 9 * * *
  ...
  Last Schedule Time:  Thu, 21 Mar 2019 18:05:00 +0900
  Active Jobs:         <none>
  Events:
    Type    Reason            Age   From                Message
    ----    ------            ----  ----                -------
    Normal  SuccessfulCreate  23s   cronjob-controller  Created job testcronjob-1553159100
    Normal  SawCompletedJob   3s    cronjob-controller  Saw completed job: testcronjob-1553159100


  # oc exec -n openshift-monitoring -c prometheus prometheus-k8s-0 -- curl -s \
            'http://localhost:9090/api/v1/query?query=kube_cronjob_next_schedule_time' | python -m json.tool
  {
      "data": {
          "result": [
              {
                  "metric": {
                      "__name__": "kube_cronjob_next_schedule_time",
                      "cronjob": "testcronjob",
                      "endpoint": "https-main",
                      "instance": "10.128.1.88:8443",
                      "job": "kube-state-metrics",
                      "namespace": "test",
                      "pod": "kube-state-metrics-75b9b8dcc4-wmkrm",
                      "service": "kube-state-metrics"
                  },
                  "value": [
                      1553159458.714,
                      "1553245500"
                  ]
              }
          ],
          "resultType": "vector"
      },
      "status": "success"
  }

  # TZ=UTC date -d @1553245500
  Fri Mar 22 09:05:00 UTC 2019

After changing the timezone from UTC to JST for the control plane only:

test2>

  # for ctr in $(oc get pod -o name -n kube-system); do echo "$ctr : $(oc rsh -n kube-system $ctr date)"; done
  pod/master-api-all.ocp311.example.com : Thu Mar 21 18:42:43 JST 2019
  pod/master-controllers-all.ocp311.example.com : Thu Mar 21 18:42:47 JST 2019
  pod/master-etcd-all.ocp311.example.com : Thu Mar 21 18:42:49 JST 2019

  # oc rsh -n openshift-monitoring -c kube-state-metrics deployment/kube-state-metrics date
  Thu Mar 21 09:43:39 UTC 2019

  # date
  Thu Mar 21 18:44:13 JST 2019
  # TZ=UTC date
  Thu Mar 21 09:44:18 UTC 2019

  # oc edit cj/testcronjob
  ...
    schedule: 50 18 * * *
  ...

  # oc describe cj/testcronjob
  Name:                       testcronjob
  Namespace:                  test
  Labels:                     <none>
  Annotations:                <none>
  Schedule:                   50 18 * * *
  ...
  Last Schedule Time:  Thu, 21 Mar 2019 18:50:00 +0900
  Active Jobs:         <none>
  Events:
    Type    Reason            Age   From                Message
    ----    ------            ----  ----                -------
    Normal  SuccessfulCreate  45m   cronjob-controller  Created job testcronjob-1553159100
    Normal  SawCompletedJob   45m   cronjob-controller  Saw completed job: testcronjob-1553159100
    Normal  SuccessfulCreate  25s   cronjob-controller  Created job testcronjob-1553161800
    Normal  SawCompletedJob   5s    cronjob-controller  Saw completed job: testcronjob-1553161800

  # oc exec -n openshift-monitoring -c prometheus prometheus-k8s-0 -- curl -s \
             'http://localhost:9090/api/v1/query?query=kube_cronjob_next_schedule_time' | python -m json.tool
  {
      "data": {
          "result": [
              {
                  "metric": {
                      "__name__": "kube_cronjob_next_schedule_time",
                      "cronjob": "testcronjob",
                      "endpoint": "https-main",
                      "instance": "10.128.1.91:8443",
                      "job": "kube-state-metrics",
                      "namespace": "test",
                      "pod": "kube-state-metrics-75b9b8dcc4-wmkrm",
                      "service": "kube-state-metrics"
                  },
                  "value": [
                      1553161897.961,
                      "1553194200"
                  ]
              }
          ],
          "resultType": "vector"
      },
      "status": "success"
  }

  # TZ=UTC date -d @1553194200
  Thu Mar 21 18:50:00 UTC 2019

  # date -d @1553194200
  Fri Mar 22 03:50:00 JST 2019

After stopping cluster-monitoring-operator and prometheus-operator, I changed the timezone of kube-state-metrics to JST (UTC+9):

test3>

  # oc set env deployment/kube-state-metrics TZ=Asia/Tokyo -n openshift-monitoring
  deployment.extensions/kube-state-metrics updated

  # oc rsh -n openshift-monitoring -c kube-state-metrics deployment/kube-state-metrics date
  Thu Mar 21 18:57:44 JST 2019

  # oc edit cj/testcronjob
  ...
    schedule: 0 19 * * *
  ...

  # date
  Thu Mar 21 18:59:28 JST 2019
  # TZ=UTC date
  Thu Mar 21 09:59:34 UTC 2019

  # oc describe cj/testcronjob
  Name:                       testcronjob
  Namespace:                  test
  Labels:                     <none>
  Annotations:                <none>
  Schedule:                   0 19 * * *
  ...
  Last Schedule Time:  Thu, 21 Mar 2019 19:00:00 +0900
  Active Jobs:         <none>
  Events:
    Type    Reason            Age   From                Message
    ----    ------            ----  ----                -------
    Normal  SuccessfulCreate  55m   cronjob-controller  Created job testcronjob-1553159100
    Normal  SawCompletedJob   55m   cronjob-controller  Saw completed job: testcronjob-1553159100
    Normal  SuccessfulCreate  10m   cronjob-controller  Created job testcronjob-1553161800
    Normal  SawCompletedJob   10m   cronjob-controller  Saw completed job: testcronjob-1553161800
    Normal  SuccessfulCreate  23s   cronjob-controller  Created job testcronjob-1553162400
    Normal  SawCompletedJob   3s    cronjob-controller  Saw completed job: testcronjob-1553162400

  # oc exec -n openshift-monitoring -c prometheus prometheus-k8s-0 -- curl -s \
             'http://localhost:9090/api/v1/query?query=kube_cronjob_next_schedule_time' | python -m json.tool
  {
      "data": {
          "result": [
              {
                  "metric": {
                      "__name__": "kube_cronjob_next_schedule_time",
                      "cronjob": "testcronjob",
                      "endpoint": "https-main",
                      "instance": "10.128.1.120:8443",
                      "job": "kube-state-metrics",
                      "namespace": "test",
                      "pod": "kube-state-metrics-6484658f69-576sd",
                      "service": "kube-state-metrics"
                  },
                  "value": [
                      1553162486.08,
                      "1553248800"
                  ]
              }
          ],
          "resultType": "vector"
      },
      "status": "success"
  }

  # TZ=UTC date -d @1553248800
  Fri Mar 22 10:00:00 UTC 2019

  # date -d @1553248800
  Fri Mar 22 19:00:00 JST 2019
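
The three results above are consistent with the metric being computed as the next matching time after the last schedule time, with the cron expression read in the exporter's own timezone. A minimal Python sketch of that arithmetic follows. It is not kube-state-metrics code: the helper `next_daily_run` is hypothetical, it only handles daily `minute hour * * *` schedules, and JST is modeled as a fixed UTC+9 offset.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
JST = timezone(timedelta(hours=9), "JST")  # fixed offset; Japan has no DST


def next_daily_run(last_run_epoch, hour, minute, tz):
    """Hypothetical helper: next 'minute hour * * *' occurrence after
    last_run_epoch, with hour and minute read as wall-clock time in tz."""
    last_run = datetime.fromtimestamp(last_run_epoch, tz)
    candidate = last_run.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= last_run:
        # Today's slot has already passed (or just ran), so use tomorrow's.
        candidate += timedelta(days=1)
    return int(candidate.timestamp())


# test2: the job last ran at 18:50 JST (epoch 1553161800), schedule '50 18 * * *'.
last_run = 1553161800
as_exporter_sees_it = next_daily_run(last_run, 18, 50, UTC)    # exporter in UTC
as_controller_runs_it = next_daily_run(last_run, 18, 50, JST)  # controller in JST

print(as_exporter_sees_it)    # 1553194200, the value reported in test2
print(as_controller_runs_it)  # 1553248200, 18:50 JST on the following day
print((as_controller_runs_it - as_exporter_sees_it) // 3600)  # 15 (hours)
```

With both zones set to JST, as in test3, the same helper returns 1553248800 for a last run at 1553162400 and the schedule `0 19 * * *`, which matches the reported value.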

@ThoTischner

ThoTischner commented Mar 21, 2019

We could implement a new ansible-playbook variable:
openshift_logging_kube_state_metrics_timezone: "Europe/Paris"
The default value would be generated from facts on the master.

This value would be set as an env var or config parameter on the cluster-monitoring-operator deployment.

If the cluster-monitoring-operator detects this variable or config value, it would add a TZ env var to the kube-state-metrics deployment.

kube-state-metrics could then translate the Kubernetes metrics times to UTC, or export the time values with a +x offset for its timezone so that Prometheus can convert them to UTC.
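
As a rough illustration of the last point: a timestamp that carries its UTC offset can be converted to an unambiguous epoch value no matter where the conversion runs. The sketch below uses only Python's standard library on the `Last Schedule Time` string format printed by `oc describe` earlier in this thread. The function name `to_epoch` is hypothetical, and this is not how kube-state-metrics or Prometheus actually process the value.

```python
from email.utils import parsedate_to_datetime


def to_epoch(timestamp_with_offset):
    """Hypothetical helper: convert an RFC 2822 style timestamp with a
    numeric UTC offset to Unix epoch seconds."""
    return int(parsedate_to_datetime(timestamp_with_offset).timestamp())


# The same instant written with two different offsets.
print(to_epoch("Thu, 21 Mar 2019 18:50:00 +0900"))  # 1553161800
print(to_epoch("Thu, 21 Mar 2019 09:50:00 +0000"))  # 1553161800
```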

@sdodson
Member

sdodson commented Mar 21, 2019

I think it'd be much better if everything were UTC than having components respecting different timezones.

@bysnupy
Member

bysnupy commented Mar 21, 2019

Personally, I think a single timezone across the whole system is the ideal. But the real world consists of many timezones, and most people take their regional timezone for granted. If the system runs as host processes, there is no problem, because a process always runs in the host timezone. If the system is container-based, it is isolated from the host configuration. First of all, we should set a policy for running in containers: for instance, that UTC is the only timezone available across the whole system, or that we provide a way to control the timezone across the whole system. We should gather the opinions of specialists from the various areas, which can lead to the best result.

@ThoTischner

I think it'd be much better if everything were UTC than having components respecting different timezones.

Then we cannot configure the CronJob schedule time in our timezone?

@Reamer
Contributor Author

Reamer commented Mar 21, 2019

I think it'd be much better if everything were UTC than having components respecting different timezones.

Then we cannot configure the CronJob schedule time in our timezone?

No, you can't. Think global.

@ThoTischner

Anyway, how do we proceed with this issue? The kube-state-metrics times and all dependent alarms are off.

@brancz
Contributor

brancz commented Apr 1, 2019

(sorry I was on vacation until just now)

In general, monitoring is always done against UTC only, for all the reasons already laid out in this thread. I'm also for UTC always and everywhere, it's a widely used best practice in SRE.

@bysnupy
Member

bysnupy commented Apr 3, 2019

Based on the opinions in this thread, UTC is preferable to local timezones. If UTC becomes the standard on OpenShift clusters, then personally I think we should state this clearly in the documentation. I want to reduce the confusion around timezones, such as CronJob schedule times, pod log timestamps and so on. Could I get your thoughts? @sdodson @brancz

@brancz
Contributor

brancz commented Apr 3, 2019

I feel like this should be brought up on a broader level (probably at least on aos-devel), but yes I agree with this.

@jkroepke

jkroepke commented Apr 4, 2019

In 2019, a timezone should not be an issue. What is the problem with setting the TZ environment variable? It could be done with an Ansible fact, as described above.

From my side, it should be documented that the timezone must be the same across the whole cluster. But the timezone should be managed by the user.

You might get an ideal solution, but it is not a real-world solution.

OpenShift is the enterprise version of Kubernetes. It is mainly used inside on-premise datacenters. Supporting only UTC is bogus and breaks a lot of IT processes in (German) datacenters.

The worst case would be that Red Hat officially supports UTC only.

@eparis
Member

eparis commented Apr 18, 2019

My opinion: for 3.x we should host-mount /etc/localtime into the kube-state-metrics container, just like we do with the api and etcd containers.

For 4.x we should use UTC everywhere. We should not continue down this path.
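
One way to picture the 3.x suggestion is a patch that adds a read-only hostPath volume for /etc/localtime to the kube-state-metrics Deployment. The Python sketch below only builds and prints such a patch as JSON. The volume name `host-localtime` and the function name `localtime_mount_patch` are made up for illustration, and the change that was eventually made may look different.

```python
import json


def localtime_mount_patch(container_name="kube-state-metrics"):
    """Hypothetical helper: build a Deployment patch that mounts the
    host's /etc/localtime read-only into the named container."""
    volume_name = "host-localtime"  # illustrative name, not from the thread
    return {
        "spec": {
            "template": {
                "spec": {
                    "volumes": [
                        {"name": volume_name, "hostPath": {"path": "/etc/localtime"}}
                    ],
                    "containers": [
                        {
                            "name": container_name,
                            "volumeMounts": [
                                {
                                    "name": volume_name,
                                    "mountPath": "/etc/localtime",
                                    "readOnly": True,
                                }
                            ],
                        }
                    ],
                }
            }
        }
    }


print(json.dumps(localtime_mount_patch(), indent=2))
```

Since the Deployment is managed by cluster-monitoring-operator, a manual patch like this would presumably be reverted unless the operator is stopped, as was done for test3.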

@brancz
Contributor

brancz commented May 10, 2019

Opened #353 with the approach outlined by Eric.

@Reamer
Contributor Author

Reamer commented May 13, 2019

Hi @brancz,
I just updated cluster-monitoring-operator. Your change works. Thank you.
[Screenshot: Prometheus Time Series Collection and Processing Server, 2019-05-13]

Reamer closed this as completed May 13, 2019