Resource Utilization - Memory not showing / 0 bytes #13023

Closed
EtienneBruines opened this issue May 16, 2022 · 14 comments · Fixed by #13670

@EtienneBruines (Contributor) commented May 16, 2022

Nomad version

Nomad v1.3.0 (52e95d6)

Operating system and Environment details

Ubuntu 22.04 LTS running cgroups v2, using the Docker driver for the jobs.

Issue

Memory usage is not shown for any allocation or task. This seems to be a UI bug.

Reproduction steps

Run any task with the Docker driver and look at the resource utilization graphs.

Expected Result

To see memory usage.

Actual Result

Screenshot_20220516_083142

Job file (if appropriate)

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)

Requests / Responses

There was an HTTP request to /v1/client/allocation/968a938f-e05e-dbe3-216f-e56988f26ab4/stats, which returned the following:

{
	"ResourceUsage": {
		"MemoryStats": {
			"RSS": 0,
			"Cache": 0,
			"Swap": 0,
			"MappedFile": 0,
			"Usage": 265932800,
			"MaxUsage": 0,
			"KernelUsage": 0,
			"KernelMaxUsage": 0,
			"Measured": [
				"Cache",
				"Swap",
				"Usage"
			]
		},
		"CpuStats": {
			"SystemMode": 360.39809863339275,
			"UserMode": 1214.3493761140821,
			"TotalTicks": 1.2197442310137645,
			"ThrottledPeriods": 0,
			"ThrottledTime": 0,
			"Percent": 0.04288833442383138,
			"Measured": [
				"Throttled Periods",
				"Throttled Time",
				"Percent"
			]
		},
		"DeviceStats": []
	},
	"Tasks": {
		"filebeat": {
			"ResourceUsage": {
				"MemoryStats": {
					"RSS": 0,
					"Cache": 0,
					"Swap": 0,
					"MappedFile": 0,
					"Usage": 72101888,
					"MaxUsage": 0,
					"KernelUsage": 0,
					"KernelMaxUsage": 0,
					"Measured": [
						"Cache",
						"Swap",
						"Usage"
					]
				},
				"CpuStats": {
					"SystemMode": 109.09090909090908,
					"UserMode": 254.54545454545453,
					"TotalTicks": 0.030670588235294116,
					"ThrottledPeriods": 0,
					"ThrottledTime": 0,
					"Percent": 0.0010784313725490195,
					"Measured": [
						"Throttled Periods",
						"Throttled Time",
						"Percent"
					]
				},
				"DeviceStats": null
			},
			"Timestamp": 1652682809079671600,
			"Pids": null
		},
		"metricbeat": {
			"ResourceUsage": {
				"MemoryStats": {
					"RSS": 0,
					"Cache": 0,
					"Swap": 0,
					"MappedFile": 0,
					"Usage": 72835072,
					"MaxUsage": 0,
					"KernelUsage": 0,
					"KernelMaxUsage": 0,
					"Measured": [
						"Cache",
						"Swap",
						"Usage"
					]
				},
				"CpuStats": {
					"SystemMode": 0,
					"UserMode": 433.3333333333333,
					"TotalTicks": 0.033541031941031946,
					"ThrottledPeriods": 0,
					"ThrottledTime": 0,
					"Percent": 0.0011793611793611794,
					"Measured": [
						"Throttled Periods",
						"Throttled Time",
						"Percent"
					]
				},
				"DeviceStats": null
			},
			"Timestamp": 1652682809070195200,
			"Pids": null
		},
		"connect-proxy-mossaino-http-static-production": {
			"ResourceUsage": {
				"MemoryStats": {
					"RSS": 0,
					"Cache": 0,
					"Swap": 0,
					"MappedFile": 0,
					"Usage": 22339584,
					"MaxUsage": 0,
					"KernelUsage": 0,
					"KernelMaxUsage": 0,
					"Measured": [
						"Cache",
						"Swap",
						"Usage"
					]
				},
				"CpuStats": {
					"SystemMode": 206.86274509803923,
					"UserMode": 193.13725490196077,
					"TotalTicks": 1.1050971428571428,
					"ThrottledPeriods": 0,
					"ThrottledTime": 0,
					"Percent": 0.038857142857142854,
					"Measured": [
						"Throttled Periods",
						"Throttled Time",
						"Percent"
					]
				},
				"DeviceStats": null
			},
			"Timestamp": 1652682808052281300,
			"Pids": null
		},
		"webserver": {
			"ResourceUsage": {
				"MemoryStats": {
					"RSS": 0,
					"Cache": 0,
					"Swap": 0,
					"MappedFile": 0,
					"Usage": 98656256,
					"MaxUsage": 0,
					"KernelUsage": 0,
					"KernelMaxUsage": 0,
					"Measured": [
						"Cache",
						"Swap",
						"Usage"
					]
				},
				"CpuStats": {
					"SystemMode": 44.44444444444444,
					"UserMode": 333.33333333333337,
					"TotalTicks": 0.050435467980295565,
					"ThrottledPeriods": 0,
					"ThrottledTime": 0,
					"Percent": 0.001773399014778325,
					"Measured": [
						"Throttled Periods",
						"Throttled Time",
						"Percent"
					]
				},
				"DeviceStats": null
			},
			"Timestamp": 1652682809059644000,
			"Pids": null
		}
	},
       "Timestamp": 1652682809079671600
}

This is the same for all such requests: they all return 0 for pretty much everything except Usage (which does have a value).

It seems like the API response is correct (it has a Usage value). I'm unsure whether MaxUsage is supposed to be 0.
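
For anyone who wants to inspect those raw values outside the browser, here is a minimal Go sketch that hits the same endpoint. The API address is the default one and the allocation ID is the one from above; the struct is a hand-rolled stand-in that only covers the fields quoted in this issue:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// Trimmed-down view of the /v1/client/allocation/<id>/stats response;
// only the fields relevant to this issue are included.
type allocStats struct {
	ResourceUsage struct {
		MemoryStats struct {
			RSS      uint64
			Usage    uint64
			MaxUsage uint64
			Measured []string
		}
	}
}

func main() {
	// Default API address; replace the allocation ID with one of your own.
	url := "http://127.0.0.1:4646/v1/client/allocation/968a938f-e05e-dbe3-216f-e56988f26ab4/stats"

	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var stats allocStats
	if err := json.NewDecoder(resp.Body).Decode(&stats); err != nil {
		panic(err)
	}

	m := stats.ResourceUsage.MemoryStats
	fmt.Printf("RSS=%d Usage=%d MaxUsage=%d Measured=%v\n", m.RSS, m.Usage, m.MaxUsage, m.Measured)
}
```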

The docker stats command also returns correct values:

CONTAINER ID   NAME                                             CPU %     MEM USAGE / LIMIT   MEM %     NET I/O   BLOCK I/O         PIDS
14bafbce6e20   webserver-0d566119-7216-037f-e930-47995efd8036   0.84%     115.2MiB / 1GiB     11.25%    0B / 0B   3.05MB / 3.84MB   34

Gathering the information through nomad alloc status returns correct values as well:

Task "webserver" is "running"
Task Resources
CPU        Memory           Disk     Addresses
0/500 MHz  116 MiB/1.0 GiB  300 MiB  

When including -stats, it even shows:


Memory Stats
Cache  Swap  Usage
0 B    0 B   116 MiB

CPU Stats
Percent  Throttled Periods  Throttled Time
1.39%    0                  0
@shoenig (Member) commented May 16, 2022

Hi @EtienneBruines, thanks for reporting. It does look like there may be a UI bug here.

I do want to point out that, starting with cgroups v2, the memory metrics reported by the kernel are more limited than they were with cgroups v1. This response object shows the difference:

"MemoryStats": {
	"RSS": 0,
	"Cache": 0,
	"Swap": 0,
	"MappedFile": 0,
	"Usage": 98656256,
	"MaxUsage": 0,
	"KernelUsage": 0,
	"KernelMaxUsage": 0,
	"Measured": [
		"Cache",
		"Swap",
		"Usage"
	]
},

That Measured list indicates what information is available. There's a bit more context in #10251.
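
To make the consequence for API consumers concrete: whatever renders this object needs to choose the value to display based on the Measured list, rather than assuming RSS is populated. The sketch below is only an illustration of that idea (it is not the change in #13670), and the MemoryStats struct and displayedMemory helper are hand-rolled stand-ins for the response shown above:

```go
package main

import "fmt"

// Minimal stand-in for the MemoryStats object in the stats response.
type MemoryStats struct {
	RSS      uint64
	Usage    uint64
	Measured []string
}

// displayedMemory picks which value to show based on what the kernel
// actually measured: RSS under cgroups v1, total Usage under cgroups v2.
func displayedMemory(m MemoryStats) uint64 {
	measured := make(map[string]bool)
	for _, name := range m.Measured {
		measured[name] = true
	}
	if measured["RSS"] && m.RSS > 0 {
		return m.RSS
	}
	if measured["Usage"] {
		return m.Usage
	}
	return 0
}

func main() {
	// Values taken from the cgroups v2 response above: RSS is 0, Usage is not.
	v2 := MemoryStats{Usage: 98656256, Measured: []string{"Cache", "Swap", "Usage"}}
	fmt.Println(displayedMemory(v2)) // prints 98656256 instead of a misleading 0
}
```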

@shoenig shoenig added this to Needs Triage in Nomad - Community Issues Triage via automation May 16, 2022
@Lord-Y commented Jun 1, 2022

Any news about this issue?

@tgross tgross moved this from Needs Triage to Needs Roadmapping in Nomad - Community Issues Triage Jun 6, 2022
@denogio commented Jun 7, 2022

Experiencing this issue as well on Ubuntu 22.04

@jeremy-ke

Also running into this issue with nomad v1.3.1 on Debian 11 amd64

@philrenaud philrenaud self-assigned this Jun 23, 2022
@jhillyerd

There is some additional info in the now-closed #12088 (and folks are still pinging that bug). I'll just add that I am seeing this on NixOS 21.11 and 22.05, both of which use cgroups v2.

@Rumbles commented Jul 27, 2022

I have just been testing whether the latest version of Nomad (1.3.2) fixes this issue; we first discovered it when Flatcar upgraded to cgroups v2 late last year. Currently we are using Flatcar 3227.2.0 with cgroups v2 enabled, and we get no memory usage reported for tasks that run as a container.

Switching back to cgroups v1, memory is reported correctly again. I have seen release notes (for 1.3.0) that added cgroups v2 support for the Nomad client, but if the UI doesn't show memory usage I would rather stick with cgroups v1 for the time being.

When we initially rolled out the Flatcar release using cgroups v2, we found that some of our applications would crash with OOM issues. I am not certain whether that is still happening, but the only other way we could tell there was likely to be a memory issue was by checking the memory usage reported in the UI.

I would question whether this is definitely just a UI bug. Or is Nomad still not able to track memory usage on hosts running cgroups v2?

I can see with nomad alloc status that memory does report correctly.
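
One way to separate a collection problem from a display problem, without Nomad in the loop, is to read the cgroup v2 interface file for the task's cgroup directly: if memory.current holds a sensible value and the /stats API echoes it in Usage, the remaining gap is in the UI. A minimal sketch; the cgroup path below is a placeholder and should be replaced with the cgroup of the actual container:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	// Placeholder path: on a cgroups v2 host, container cgroups live somewhere
	// under /sys/fs/cgroup; point this at the cgroup of the task in question.
	cgroup := "/sys/fs/cgroup/system.slice"

	// memory.current reports the cgroup's total memory usage in bytes.
	data, err := os.ReadFile(filepath.Join(cgroup, "memory.current"))
	if err != nil {
		fmt.Fprintln(os.Stderr, "read failed:", err)
		os.Exit(1)
	}
	fmt.Printf("memory.current = %s bytes\n", strings.TrimSpace(string(data)))
}
```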

@shoenig shoenig added this to the 1.3.3 milestone Aug 2, 2022
@shoenig (Member) commented Aug 3, 2022

@philrenaud just FYI, I took a second look at this and updated #13670 with a slightly better fix, but I don't know why the test is failing or how to dig into it.

@Rumbles commented Aug 8, 2022

Hi @shoenig, I don't think this is resolved. I just deployed a machine with 1.3.3 installed as a client (the server is also updated to 1.3.3) and updated my OS config to use cgroups v2, and while I can see memory usage in the Nomad CLI:

core@ip-10-54-2-59 ~ $ nomad alloc status 786b2983-0490-e7ee-dc1f-fa4ab2d312e0
ID                  = 786b2983-0490-e7ee-dc1f-fa4ab2d312e0
Eval ID             = 4939dad3
Name                = vector-logs.vector-logs[0]
Node ID             = e4bfb624
Node Name           = ip-10-54-2-141
Job ID              = vector-logs
Job Version         = 376
Client Status       = running
Client Description  = Tasks are running
Desired Status      = run
Desired Description = <none>
Created             = 20m7s ago
Modified            = 20m ago

Allocation Addresses
Label           Dynamic  Address
*vector_export  yes      10.54.2.141:9598 -> 9598

Task "vector-logs" is "running"
Task Resources
CPU         Memory          Disk     Addresses
42/100 MHz  82 MiB/600 MiB  300 MiB  

Task Events:
Started At     = 2022-08-08T09:37:04Z
Finished At    = N/A
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                  Type        Description
2022-08-08T09:37:04Z  Started     Task started by client
2022-08-08T09:36:58Z  Driver      Downloading image
2022-08-08T09:36:58Z  Task Setup  Building Task Directory
2022-08-08T09:36:58Z  Received    Task received by client

This is still not reflected in the UI:

Screenshot 2022-08-08 at 11 59 51

@EtienneBruines (Contributor, Author)

@Rumbles could you check the DevTools in the browser to get the response to the call to /v1/client/allocation/...?

Upgrading the Nomad server to 1.3.3 did fix the issue for me, so I'd be curious about the values returned in that API call. (That should help determine whether it's a display issue or whether the problem is elsewhere.)

@Rumbles commented Aug 8, 2022

Sorry @EtienneBruines, I don't see where to check this in the Web Developer Tools:

could you check the Devtools in the browser, to get the response to the call to /v1/client/allocation/... ?

@EtienneBruines (Contributor, Author)

  • Right-click the webpage and click Inspect (sometimes F12 also works)
  • Open the Network tab
  • Find a request called stats and click it
  • Then go to the Response tab - that's the JSON that would be interesting.

Screenshot_20220808_121511
Screenshot_20220808_121645

@Rumbles commented Aug 8, 2022

Never mind. I tried in another browser and it was working there; then I went back to my original browser and it was working too... I guess I wasn't patient enough :/

Thanks for fixing this!

@mongrelion

Sorry for hijacking the thread, but I'm having the same issue. Here's a sample response from the server:

{
  "ResourceUsage": {
    "CpuStats": {
      "Measured": [
        "Throttled Periods",
        "Throttled Time",
        "Percent"
      ],
      "Percent": 0.10362745098039215,
      "SystemMode": 4.541154210028382,
      "ThrottledPeriods": 0,
      "ThrottledTime": 0,
      "TotalTicks": 5.025931372549019,
      "UserMode": 1596.9725638599812
    },
    "DeviceStats": [],
    "MemoryStats": {
      "Cache": 0,
      "KernelMaxUsage": 0,
      "KernelUsage": 0,
      "MappedFile": 0,
      "MaxUsage": 0,
      "Measured": [
        "Cache",
        "Swap",
        "Usage"
      ],
      "RSS": 0,
      "Swap": 0,
      "Usage": 292315136
    }
  },
  "Tasks": {
    "foo": {
      "Pids": null,
      "ResourceUsage": {
        "CpuStats": {
          "Measured": [
            "Throttled Periods",
            "Throttled Time",
            "Percent"
          ],
          "Percent": 0.10362745098039215,
          "SystemMode": 4.541154210028382,
          "ThrottledPeriods": 0,
          "ThrottledTime": 0,
          "TotalTicks": 5.025931372549019,
          "UserMode": 1596.9725638599812
        },
        "DeviceStats": null,
        "MemoryStats": {
          "Cache": 0,
          "KernelMaxUsage": 0,
          "KernelUsage": 0,
          "MappedFile": 0,
          "MaxUsage": 0,
          "Measured": [
            "Cache",
            "Swap",
            "Usage"
          ],
          "RSS": 0,
          "Swap": 0,
          "Usage": 292315136
        }
      },
      "Timestamp": 1660560913426673700
    }
  },
  "Timestamp": 1660560913426673700
}

Nomad server version: 1.3.3
Nomad client version: 1.3.3

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 14, 2022