CSI volume allocation count not updated properly in UI #9495
Comments
Hi @apollo13! Thanks for opening this and we'll take a look.
I don’t know how to run CSI locally 🙃 but my guess would be that the API is returning different values from ETA: specifically for |
Using the test rig I describe below, I can reproduce the exact behavior @apollo13 is reporting on current master. On 1.0.0-beta2 or 0.12.x, the volume list page always shows a count of 0. So it looks like the fix for #9215 (PR #9377) dealt with the volume detail page correctly but didn't fix the list page. The confusing thing to me is that there's some stateful behavior here:
Because my test has two volumes, I see some additional symptoms:
So the volume list page is only showing a count when accessed via the breadcrumb from the volume detail page.
The API responses reflect what we'd expect. The list page API doesn't include a total count and neither the
So the confusing bit for me is not that we have a total count of 0 in the list view (which is easy enough to fix by adding
Results of
[
{
"ID": "test-volume0",
"Namespace": "default",
"Name": "test-volume0",
"ExternalID": "631ef179-356d-11eb-a328-0242ac110002",
"Topologies": [],
"AccessMode": "single-node-writer",
"AttachmentMode": "file-system",
"CurrentReaders": 2,
"CurrentWriters": 0,
"Schedulable": true,
"PluginID": "hostpath-plugin0",
"Provider": "csi-hostpath",
"ControllersHealthy": 0,
"ControllersExpected": 0,
"NodesHealthy": 1,
"NodesExpected": 1,
"CreateIndex": 16,
"ModifyIndex": 23
},
{
"ID": "test-volume1",
"Namespace": "default",
"Name": "test-volume1",
"ExternalID": "632709a7-356d-11eb-a328-0242ac110002",
"Topologies": [],
"AccessMode": "single-node-writer",
"AttachmentMode": "file-system",
"CurrentReaders": 2,
"CurrentWriters": 0,
"Schedulable": true,
"PluginID": "hostpath-plugin0",
"Provider": "csi-hostpath",
"ControllersHealthy": 0,
"ControllersExpected": 0,
"NodesHealthy": 1,
"NodesExpected": 1,
"CreateIndex": 17,
"ModifyIndex": 24
}
]
Results of
{
"AccessMode": "single-node-writer",
"Allocations": [
{
"ClientDescription": "Tasks are running",
"ClientStatus": "running",
"CreateIndex": 19,
"CreateTime": 1607002949687635500,
"DeploymentStatus": {
"Canary": false,
"Healthy": true,
"ModifyIndex": 32,
"Timestamp": "2020-12-03T13:42:40.465063115Z"
},
"DesiredDescription": "",
"DesiredStatus": "run",
"EvalID": "cdfbd5f5-0907-aca6-da49-19a50a7402b4",
"FollowupEvalID": "",
"ID": "79b30d92-0055-2086-835f-955ecfd7abc1",
"JobID": "example",
"JobType": "service",
"JobVersion": 0,
"ModifyIndex": 32,
"ModifyTime": 1607002960587864800,
"Name": "example.cache[0]",
"Namespace": "default",
"NodeID": "eb894f13-d782-6d1e-fdb4-1088db394fba",
"NodeName": "devmode",
"PreemptedAllocations": null,
"PreemptedByAllocation": "",
"RescheduleTracker": null,
"TaskGroup": "cache",
"TaskStates": {
"redis": {
"Events": [
{
"Details": {},
"DiskLimit": 0,
"DiskSize": 0,
"DisplayMessage": "Task received by client",
"DownloadError": "",
"DriverError": "",
"DriverMessage": "",
"ExitCode": 0,
"FailedSibling": "",
"FailsTask": false,
"GenericSource": "",
"KillError": "",
"KillReason": "",
"KillTimeout": 0,
"Message": "",
"RestartReason": "",
"SetupError": "",
"Signal": 0,
"StartDelay": 0,
"TaskSignal": "",
"TaskSignalReason": "",
"Time": 1607002949690282500,
"Type": "Received",
"ValidationError": "",
"VaultError": ""
},
{
"Details": {
"message": "Building Task Directory"
},
"DiskLimit": 0,
"DiskSize": 0,
"DisplayMessage": "Building Task Directory",
"DownloadError": "",
"DriverError": "",
"DriverMessage": "",
"ExitCode": 0,
"FailedSibling": "",
"FailsTask": false,
"GenericSource": "",
"KillError": "",
"KillReason": "",
"KillTimeout": 0,
"Message": "Building Task Directory",
"RestartReason": "",
"SetupError": "",
"Signal": 0,
"StartDelay": 0,
"TaskSignal": "",
"TaskSignalReason": "",
"Time": 1607002949711526100,
"Type": "Task Setup",
"ValidationError": "",
"VaultError": ""
},
{
"Details": {},
"DiskLimit": 0,
"DiskSize": 0,
"DisplayMessage": "Task started by client",
"DownloadError": "",
"DriverError": "",
"DriverMessage": "",
"ExitCode": 0,
"FailedSibling": "",
"FailsTask": false,
"GenericSource": "",
"KillError": "",
"KillReason": "",
"KillTimeout": 0,
"Message": "",
"RestartReason": "",
"SetupError": "",
"Signal": 0,
"StartDelay": 0,
"TaskSignal": "",
"TaskSignalReason": "",
"Time": 1607002950464377300,
"Type": "Started",
"ValidationError": "",
"VaultError": ""
}
],
"Failed": false,
"FinishedAt": null,
"LastRestart": null,
"Restarts": 0,
"StartedAt": "2020-12-03T13:42:30.464381504Z",
"State": "running"
}
}
},
{
"ClientDescription": "Tasks are running",
"ClientStatus": "running",
"CreateIndex": 19,
"CreateTime": 1607002949687635500,
"DeploymentStatus": {
"Canary": false,
"Healthy": true,
"ModifyIndex": 32,
"Timestamp": "2020-12-03T13:42:40.436082297Z"
},
"DesiredDescription": "",
"DesiredStatus": "run",
"EvalID": "cdfbd5f5-0907-aca6-da49-19a50a7402b4",
"FollowupEvalID": "",
"ID": "9f7932c3-4193-f6a5-bb81-c1150c39c3d5",
"JobID": "example",
"JobType": "service",
"JobVersion": 0,
"ModifyIndex": 32,
"ModifyTime": 1607002960587864800,
"Name": "example.cache[1]",
"Namespace": "default",
"NodeID": "eb894f13-d782-6d1e-fdb4-1088db394fba",
"NodeName": "devmode",
"PreemptedAllocations": null,
"PreemptedByAllocation": "",
"RescheduleTracker": null,
"TaskGroup": "cache",
"TaskStates": {
"redis": {
"Events": [
{
"Details": {},
"DiskLimit": 0,
"DiskSize": 0,
"DisplayMessage": "Task received by client",
"DownloadError": "",
"DriverError": "",
"DriverMessage": "",
"ExitCode": 0,
"FailedSibling": "",
"FailsTask": false,
"GenericSource": "",
"KillError": "",
"KillReason": "",
"KillTimeout": 0,
"Message": "",
"RestartReason": "",
"SetupError": "",
"Signal": 0,
"StartDelay": 0,
"TaskSignal": "",
"TaskSignalReason": "",
"Time": 1607002949689950500,
"Type": "Received",
"ValidationError": "",
"VaultError": ""
},
{
"Details": {
"message": "Building Task Directory"
},
"DiskLimit": 0,
"DiskSize": 0,
"DisplayMessage": "Building Task Directory",
"DownloadError": "",
"DriverError": "",
"DriverMessage": "",
"ExitCode": 0,
"FailedSibling": "",
"FailsTask": false,
"GenericSource": "",
"KillError": "",
"KillReason": "",
"KillTimeout": 0,
"Message": "Building Task Directory",
"RestartReason": "",
"SetupError": "",
"Signal": 0,
"StartDelay": 0,
"TaskSignal": "",
"TaskSignalReason": "",
"Time": 1607002949712533000,
"Type": "Task Setup",
"ValidationError": "",
"VaultError": ""
},
{
"Details": {},
"DiskLimit": 0,
"DiskSize": 0,
"DisplayMessage": "Task started by client",
"DownloadError": "",
"DriverError": "",
"DriverMessage": "",
"ExitCode": 0,
"FailedSibling": "",
"FailsTask": false,
"GenericSource": "",
"KillError": "",
"KillReason": "",
"KillTimeout": 0,
"Message": "",
"RestartReason": "",
"SetupError": "",
"Signal": 0,
"StartDelay": 0,
"TaskSignal": "",
"TaskSignalReason": "",
"Time": 1607002950435578600,
"Type": "Started",
"ValidationError": "",
"VaultError": ""
}
],
"Failed": false,
"FinishedAt": null,
"LastRestart": null,
"Restarts": 0,
"StartedAt": "2020-12-03T13:42:30.435582395Z",
"State": "running"
}
}
}
],
"AttachmentMode": "file-system",
"Context": {},
"ControllerRequired": false,
"ControllersExpected": 0,
"ControllersHealthy": 0,
"CreateIndex": 16,
"ExternalID": "631ef179-356d-11eb-a328-0242ac110002",
"ID": "test-volume0",
"ModifyIndex": 23,
"MountOptions": null,
"Name": "test-volume0",
"Namespace": "default",
"NodesExpected": 1,
"NodesHealthy": 1,
"Parameters": {},
"PluginID": "hostpath-plugin0",
"Provider": "csi-hostpath",
"ProviderVersion": "v1.2.0-0-g83590990",
"ReadAllocs": {
"79b30d92-0055-2086-835f-955ecfd7abc1": null,
"9f7932c3-4193-f6a5-bb81-c1150c39c3d5": null
},
"ResourceExhausted": null,
"Schedulable": true,
"Secrets": null,
"Topologies": [],
"WriteAllocs": {}
}
My test environment for this is Nomad running in dev mode on Linux, with
test.sh
#!/bin/bash
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
nomad job run "${DIR}/plugin.nomad"
while :
do
    nomad plugin status hostpath | grep "Nodes Healthy = 1" && break
    sleep 2
done
nomad plugin status hostpath
# create the volume in the "external provider"
VOLUME_NAME=test-volume
# non-dev mode
#CSI_ENDPOINT="/var/nomad/client/csi/monolith/hostpath-plugin0/csi.sock"
# dev mode path is going to be in a tempdir
PLUGIN_DOCKER_ID=$(docker ps | grep hostpath | awk -F' +' '{print $1}')
CSI_ENDPOINT=$(docker inspect $PLUGIN_DOCKER_ID | jq -r '.[0].Mounts[] | select(.Destination == "/csi") | .Source')/csi.sock
echo "creating volume..."
UUID=$(sudo csc --endpoint "$CSI_ENDPOINT" controller create-volume "${VOLUME_NAME}0" --cap 1,2,ext4 | grep -o '".*"' | tr -d '"')
echo "registering volume $UUID..."
sed -e "s/EXTERNAL_ID/$UUID/" \
    -e "s/VOLUME_NAME/${VOLUME_NAME}0/" \
    "${DIR}/volume.hcl" | nomad volume register -
echo "creating second volume..."
UUID=$(sudo csc --endpoint "$CSI_ENDPOINT" controller create-volume "${VOLUME_NAME}1" --cap 1,2,ext4 | grep -o '".*"' | tr -d '"')
echo "registering volume $UUID..."
sed -e "s/EXTERNAL_ID/$UUID/" \
    -e "s/VOLUME_NAME/${VOLUME_NAME}1/" \
    "${DIR}/volume.hcl" | nomad volume register -
echo "claiming volume..."
nomad job run "${DIR}/example.nomad"
plugin.nomad
job "csi-plugin" {
  datacenters = ["dc1"]
  spread {
    attribute = "${node.unique.id}"
  }
  group "csi" {
    constraint {
      operator = "distinct_hosts"
      value = true
    }
    task "plugin" {
      driver = "docker"
      config {
        image = "quay.io/k8scsi/hostpathplugin:v1.2.0"
        args = [
          "--drivername=csi-hostpath",
          "--v=5",
          "--endpoint=unix://csi/csi.sock",
          "--nodeid=node-${NOMAD_ALLOC_INDEX}",
        ]
        privileged = true
      }
      meta {
        vers = "1"
      }
      csi_plugin {
        id = "hostpath-plugin0"
        #type = "monolith"
        type = "node"
        mount_dir = "/csi"
      }
      resources {
        cpu = 256
        memory = 128
      }
    }
  }
}
example.nomad
variables {
  volume_ids = ["volume0", "volume1"]
}
job "example" {
  datacenters = ["dc1"]
  group "cache" {
    count = 2
    dynamic "volume" {
      for_each = var.volume_ids
      labels = [volume.value]
      content {
        type = "csi"
        source = "test-${volume.value}"
        read_only = true
      }
    }
    network {
      port "db" {
        to = 6379
      }
    }
    task "redis" {
      driver = "docker"
      config {
        image = "redis:3.2"
        ports = ["db"]
      }
      dynamic "volume_mount" {
        for_each = var.volume_ids
        content {
          volume = "${volume_mount.value}"
          destination = "${NOMAD_ALLOC_DIR}/${volume_mount.value}"
        }
      }
      resources {
        cpu = 500
        memory = 256
      }
    }
  }
}
volume.hcl
id = "VOLUME_NAME"
name = "VOLUME_NAME"
type = "csi"
external_id = "EXTERNAL_ID"
plugin_id = "hostpath-plugin0"
access_mode = "single-node-writer"
attachment_mode = "file-system"
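The two payloads in this comment already account for the zero: the detail response carries `ReadAllocs` and `WriteAllocs`, while the list stubs carry neither. Here is a minimal Python sketch of that difference, assuming the count is derived from those two maps. The helper name `alloc_count_from_maps` is hypothetical, and the dictionaries are trimmed copies of the responses above.

```python
# Trimmed copies of the two API responses shown in this comment.
list_stub = {
    "ID": "test-volume0",
    "CurrentReaders": 2,
    "CurrentWriters": 0,
}
detail = {
    "ID": "test-volume0",
    "ReadAllocs": {
        "79b30d92-0055-2086-835f-955ecfd7abc1": None,
        "9f7932c3-4193-f6a5-bb81-c1150c39c3d5": None,
    },
    "WriteAllocs": {},
}


def alloc_count_from_maps(volume: dict) -> int:
    """Count allocations from the ReadAllocs/WriteAllocs maps.

    Hypothetical helper: a missing or null map is treated as empty, which is
    the assumption that makes a list stub count as zero.
    """
    reads = volume.get("ReadAllocs") or {}
    writes = volume.get("WriteAllocs") or {}
    return len(reads) + len(writes)


print(alloc_count_from_maps(detail))     # 2
print(alloc_count_from_maps(list_stub))  # 0
```

Under that assumption the detail page reports two allocations and any view fed only by the list endpoint reports none, which matches the symptom on the list page.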
I’ll look more into this but there is a volumes serialiser
To accurately reflect the API, I’m not sure if the tests actually make assertions on this, I’ll look into that. |
There is indeed an assertion of the allocation count.
As for why fetching the collection after fetching an item from it clears the loaded allocations, I believe it’s these lines in the serialiser. Ember Data is a “single source of truth”, so when the collection is loaded and no allocations are included, the relationships are deleted. It might be possible to avoid this by bypassing relationship extraction when the
Is it prohibitive to include
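The clearing behaviour described in this comment can be modelled without Ember at all. The sketch below is a toy Python store, not Ember Data's API: `ToyStore`, `push`, and the `keep_missing` flag are all hypothetical names. It assumes that a payload without an `Allocations` key is normalised to an empty relationship, which then overwrites whatever the detail request loaded earlier.

```python
class ToyStore:
    """Toy single-source-of-truth store keyed by volume ID (not Ember Data)."""

    def __init__(self):
        self.records = {}

    def push(self, payload, keep_missing=False):
        """Merge one volume payload into the store.

        With keep_missing=False, a payload lacking "Allocations" is treated as
        an empty relationship and overwrites what was loaded before.
        With keep_missing=True, the relationship is only touched when the key
        is actually present in the payload.
        """
        record = self.records.setdefault(payload["ID"], {"allocations": []})
        if "Allocations" in payload:
            record["allocations"] = [a["ID"] for a in payload["Allocations"]]
        elif not keep_missing:
            record["allocations"] = []
        return record


detail = {
    "ID": "test-volume0",
    "Allocations": [{"ID": "79b30d92"}, {"ID": "9f7932c3"}],
}
stub = {"ID": "test-volume0"}  # list payload: no Allocations key

# Detail first, then the collection: the relationship is wiped.
store = ToyStore()
store.push(detail)
count_after_detail = len(store.records["test-volume0"]["allocations"])
store.push(stub)
count_after_list = len(store.records["test-volume0"]["allocations"])

# Same sequence, but skipping extraction when the key is absent.
guarded = ToyStore()
guarded.push(detail)
guarded.push(stub, keep_missing=True)
guarded_count = len(guarded.records["test-volume0"]["allocations"])

print(count_after_detail, count_after_list, guarded_count)  # 2 0 2
```

The guarded variant keeps the count stable once a detail response has been seen, but a volume that has only ever been loaded through the list would still show zero, so it does not replace having a count in the list payload.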
We already have the fields
For example:
[
{
"ID": "test-volume0",
"Namespace": "default",
"Name": "test-volume0",
"ExternalID": "631ef179-356d-11eb-a328-0242ac110002",
"Topologies": [],
"AccessMode": "single-node-writer",
"AttachmentMode": "file-system",
"CurrentReaders": 2,
"CurrentWriters": 0,
"Schedulable": true,
"PluginID": "hostpath-plugin0",
"Provider": "csi-hostpath",
"ControllersHealthy": 0,
"ControllersExpected": 0,
"NodesHealthy": 1,
"NodesExpected": 1,
"CreateIndex": 16,
"ModifyIndex": 23
}
]
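The field names are elided in this comment. Assuming they are `CurrentReaders` and `CurrentWriters`, the two claim counts present in the stub above, a list-page count could be derived without `ReadAllocs` or `WriteAllocs`. A minimal sketch in Python, where `alloc_count_from_stub` is a hypothetical helper:

```python
import json

# One stub from the list response above, trimmed to the relevant fields.
LIST_RESPONSE = """
[
  {
    "ID": "test-volume0",
    "CurrentReaders": 2,
    "CurrentWriters": 0
  }
]
"""


def alloc_count_from_stub(stub: dict) -> int:
    """Sum the reader and writer claim counts carried by a list stub."""
    return stub.get("CurrentReaders", 0) + stub.get("CurrentWriters", 0)


volumes = json.loads(LIST_RESPONSE)
counts = {v["ID"]: alloc_count_from_stub(v) for v in volumes}
print(counts)  # {'test-volume0': 2}
```

For this volume the sum matches the two entries in `ReadAllocs` from the detail response earlier in the thread.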
oh, interesting, so |
Yup! Sorry, I could have been more clear about that! 😀 ( |
The API doesn’t return the ReadAllocs and WriteAllocs for the index query, as discussed here: #9495 (comment) So this removes that, which will cause the allocation count in the table to be 0, as in the bug report.
Looks like we missed the merge window for rc2 on this one so if there's no rc3 this will land in 1.0.1.
Correction: this did land in the merge window for 1.0 GA! Changelog entry added in #9563
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
This is a follow up to #9215. The volume tab after starting a task looks like this (don't click anywhere before), just open the UI and click Storage:
Now when I click on the wiki link it takes me to the volume details where it properly shows the write allocation:
And if I then click on Volumes (in the breadcrumbs) to get back to the initial page the count is magically updated:
But if I now click on Storage in the left sidebar the alloc count drops back to zero.
Nomad version is: 4d6a166