CSI volume created using nomad_external_volume resource seems to be missing mount_flags #260

Closed
CarbonCollins opened this issue Jan 29, 2022 · 11 comments · Fixed by #266


@CarbonCollins

Hi there,

Thank you for opening an issue. Please note that we try to keep the Terraform issue tracker reserved for bug reports and feature requests. For general usage questions, please see: https://www.terraform.io/community.html.

Terraform Version

Terraform v1.1.3
on linux_amd64
+ provider registry.terraform.io/hashicorp/nomad v1.4.15
+ provider registry.terraform.io/hashicorp/vault v2.24.1

Nomad Version

1.2.4

Provider Configuration

Which values are you setting in the provider configuration?

provider "nomad" {}
provider "vault" {}

Environment Variables

Do you have any Nomad-specific environment variables set on the machine running Terraform?

NOMAD_REGION=global
NOMAD_TOKEN=[redacted]
NOMAD_ADDR=https://[redacted]

Affected Resource(s)

Please list the resources as a list, for example:

  • nomad_external_volume

If this issue appears to affect multiple resources, it may be an issue with Terraform's core, so please mention this.

Terraform Configuration Files

locals {
  plugin_id = "axion-proxima"
  volume_id = "plex-data"
}

data "nomad_plugin" "axion_proxima" {
  plugin_id        = local.plugin_id
  wait_for_healthy = false
}

data "vault_generic_secret" "axion_proxima_auth" {
  path = var.axion_proxima_auth_secret_path
}

resource "nomad_external_volume" "plex_data" {
  depends_on  = [data.nomad_plugin.axion_proxima]

  type        = "csi"
  namespace   = "default"
  plugin_id   = local.plugin_id

  name        = local.volume_id
  volume_id   = local.volume_id

  capacity_min = "1 GiB"
  capacity_max = "20 GiB"

  capability {
    access_mode     = "single-node-reader-only"
    attachment_mode = "file-system"
  }

  capability {
    access_mode     = "single-node-writer"
    attachment_mode = "file-system"
  }

  mount_options {
    fs_type = "cifs"
    mount_flags = [
        "vers=3",
        format("uid=%s", var.user_id),
        format("gid=%s", var.group_id),
        "nolock",
        format("username=%s", data.vault_generic_secret.axion_proxima_auth.data["USER"]),
        format("password=%s", data.vault_generic_secret.axion_proxima_auth.data["PASS"])
    ]
  }
}

Debug Output

Please provide a link to a GitHub Gist containing the complete debug output: https://www.terraform.io/docs/internals/debugging.html. Please do NOT paste the debug output in the issue; just paste a link to the Gist.

Panic Output

none

Expected Behavior

What should have happened?
The CSI volume is created in Nomad, and when an allocation starts it should use the mount_flags specified in the resource to mount the CIFS share.

Actual Behavior

What actually happened?
The CSI volume is created in Nomad, but when an allocation starts it uses a default set of mount_flags from the CSI driver (democratic-csi) when mounting the CIFS share.

Steps to Reproduce

Please list the steps required to reproduce the issue, for example:

  1. terraform apply
  2. nomad job run nomad.hcl (with a job file that uses the CSI volume; in this case I am using linuxserver/plex:latest)

Important Factoids

  • If I create the same CSI volume using the Nomad CLI or API, the allocation starts up correctly and the correct mount_flags are passed to the CIFS mount. I only observe this issue when creating the CSI volume through Terraform.
  • I do have ACLs enabled in Nomad; however, given the previous point I have ruled out an ACL issue, as I tested with the same token for all three methods (Nomad CLI, Nomad API, Terraform resource).

References

None

tgross (Member) commented Jan 31, 2022

Hi @CarbonCollins!

CSI volume is created in nomad and when an allocation starts it should be using the mount_flags specified in the resource to mount the cifs share.

The mount_flags set by volume registration are only used to validate the volume. To get the mount flags on the allocation, you'll want to set them on the volume block of the jobspec.
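For example, a sketch of a jobspec volume block carrying the flags (group, task, and volume names here are illustrative, not taken from the reporter's config):

```hcl
group "plex" {
  # The mount_options here override whatever was set at volume registration.
  volume "media" {
    type   = "csi"
    source = "plex-data"

    mount_options {
      fs_type     = "cifs"
      mount_flags = ["vers=3", "nolock"]
    }
  }

  task "plex" {
    volume_mount {
      volume      = "media"
      destination = "/data"
    }
    # ...
  }
}
```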

CarbonCollins (Author) commented Jan 31, 2022

If this is the case, how does creating the volume directly via the Nomad CLI/API differ? When I use those methods I do not need to provide the mount_flags in the job spec, as they are already defined (the job starts correctly, and looking at the CSI controller logs I can see the mount_flags present; I also don't provide any mount_flags or mount_options in the spec file).

The docs for the Nomad volume stanza also say: "Options for mounting CSI volumes that have the file-system attachment mode. These options override the mount_options field from volume registration." (https://www.nomadproject.io/docs/job-specification/volume#mount_options)

Though I'm not sure how up to date/in sync the docs are, as the Terraform provider docs say "Options for mounting block-device volumes without a pre-formatted file system." (https://registry.terraform.io/providers/hashicorp/nomad/latest/docs/resources/external_volume), where one says file-system and the other says block-device.

Though to be fair, I might be assuming that the Terraform provider uses the Nomad API...

tgross (Member) commented Feb 2, 2022

If this is the case, how does creating the volume directly via the Nomad CLI/API differ? When I use those methods I do not need to provide the mount_flags in the job spec, as they are already defined (the job starts correctly, and looking at the CSI controller logs I can see the mount_flags present; I also don't provide any mount_flags or mount_options in the spec file).

You can only register a volume with Terraform's nomad_volume resource, not create it. But see hashicorp/nomad#11899 for how the API might change in a way that'd improve that story.

You can include the mount_flags in the nomad_volume resource and they'll be used for validating registration, just as they would with the nomad volume register command. But the mount_flags on the jobspec's volume block will determine how the volume is actually mounted. It might be that the mount_flags are getting set to defaults when we run the jobspec via TF. I can take a look at that.

Though to be fair I might be assuming that the terraform provider would be using the Nomad API...

It does, but the CSI API for Nomad is still in beta and has changed fairly quickly. I think the Terraform docs might be stale here.

@tgross tgross self-assigned this Feb 2, 2022
@tgross tgross added this to Needs Triage in Nomad - Community Issues Triage via automation Feb 2, 2022
@tgross tgross moved this from Needs Triage to In Progress in Nomad - Community Issues Triage Feb 2, 2022
@tgross tgross added the type/bug label Feb 2, 2022
CarbonCollins (Author) commented:

I'm using the nomad_external_volume resource, not nomad_volume.

Though it is good to see the potential API changes with the nomad_volume one :D

tgross (Member) commented Feb 2, 2022

Oh that's embarrassing... I didn't realize there was a second resource for that (can you tell it's been a while since I've looked at this?) 😊 The rest of my comments should apply though, so I'll verify that.

CarbonCollins (Author) commented Feb 5, 2022

To be honest, I had to look at the provider source code to figure out the difference between nomad_external_volume and nomad_volume when I was initially setting this all up, as it's not super clear from the docs themselves, so I can understand the confusion 😅

As far as I understood it at the time, nomad_volume just does a volume register, whereas nomad_external_volume seemed to use the volume create endpoint. This is why I initially raised the issue: if I use the volume create endpoint directly through a curl/Postman request, the mount_flags are set correctly and my workloads start correctly (as they require CIFS credentials to work), whereas they seemed to be omitted if I used this provider.
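For comparison, the non-Terraform flow would be a volume spec file passed to `nomad volume create`. A sketch, loosely based on the config at the top of this issue (credential flags elided; flag values illustrative):

```hcl
# volume.hcl -- used as: nomad volume create volume.hcl
id        = "plex-data"
name      = "plex-data"
type      = "csi"
plugin_id = "axion-proxima"

capacity_min = "1GiB"
capacity_max = "20GiB"

capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}

mount_options {
  fs_type     = "cifs"
  mount_flags = ["vers=3", "nolock"]
}
```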

I am mostly setting these mount_flags at volume creation time because I could not figure out how to define them in the Nomad job file from a Vault template; templates seem to be limited to the task stanza, whereas the volume mount_flags sit at the group stanza level.

Ideally I would set these credentials in the volume secrets, but those don't seem to be passed along either 😅 Regardless, the CSI controller I am using just converts those secrets into mount_flags and appends them at the end anyway, so for simplicity I want to at least get the mount_flags working first before complicating things further :)

tgross (Member) commented Mar 1, 2022

@CarbonCollins I'm realizing, after having just wrapped up hashicorp/nomad#12150, that we redact the mount options coming back from the API, precisely because they can be secrets. Having data redacted in the API definitely seems like it would break Terraform's model of how resources are supposed to work. I'm certain there's a canonical way for TF providers to handle that, but I'll admit I'm not an expert on it.

Let me tag-in my colleague @lgfa29 to see if he has any suggestions. 👋

lgfa29 (Contributor) commented Mar 2, 2022

As far as I understood it at the time, the nomad_volume just does a volume register whereas the nomad_external_volume seemed to be using the volume create endpoint.

Ah yes, that was an unfortunate order of events that caused the resource names to be extra confusing, but we have to stick with it because renaming resources is a pretty big breaking change. Sorry about that.

Having data redacted in the API definitely seems like it would break Terraform's model for how resources are supposed to work.

Yup, but we already got bitten by that, so it shouldn't be a problem anymore:
https://github.com/hashicorp/terraform-provider-nomad/blob/main/nomad/resource_volume.go#L439-L440

Using the config you provided and intercepting the request, I see that MountFlags is null for some reason:

{
    "Volumes": [
        {
            "ID": "plex-data",
            "Name": "plex-data",
            "ExternalID": "",
            "Namespace": "",
            "Topologies": null,
            "AccessMode": "",
            "AttachmentMode": "",
            "MountOptions": {
                "FSType": "cifs",
                "MountFlags": null
            },
...

We're probably not parsing that structure correctly.

I will need some time to investigate this further, but I will let you know once I find the root cause.

tgross (Member) commented Mar 29, 2022

Possibly fixed by hashicorp/nomad#12150, which will ship in Nomad 1.3.0

tgross (Member) commented Apr 25, 2022

Hey, I wanted to follow up on this because, unfortunately, it doesn't look like the Nomad 1.3.0 changes actually fixed anything here, and it doesn't look like it's related to redaction.

I ran the following on my local environment with the current main of Nomad:

terraform config
provider "nomad" {
  address = "http://localhost:4646"
  region  = "global"
}

resource "nomad_external_volume" "volume0" {
  type         = "csi"
  plugin_id    = "org.democratic-csi.nfs"
  volume_id    = "csi-volume-nfs"
  name         = "csi-volume-nfs"
  capacity_min = "1GiB"
  capacity_max = "1GiB"

  capability {
    access_mode     = "single-node-writer"
    attachment_mode = "file-system"
  }

  mount_options {
    mount_flags = ["noatime"]
  }
}
terraform apply
$ terraform apply

Terraform used the selected providers to generate the following execution plan. Resource actions are
indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # nomad_external_volume.volume0 will be created
  + resource "nomad_external_volume" "volume0" {
      + capacity_max            = "1GiB"
      + capacity_min            = "1GiB"
      + controller_required     = (known after apply)
      + controllers_expected    = (known after apply)
      + controllers_healthy     = (known after apply)
      + id                      = (known after apply)
      + name                    = "csi-volume-nfs"
      + namespace               = "default"
      + nodes_expected          = (known after apply)
      + nodes_healthy           = (known after apply)
      + plugin_id               = "org.democratic-csi.nfs"
      + plugin_provider         = (known after apply)
      + plugin_provider_version = (known after apply)
      + schedulable             = (known after apply)
      + type                    = "csi"
      + volume_id               = "csi-volume-nfs"

      + capability {
          + access_mode     = "single-node-writer"
          + attachment_mode = "file-system"
        }

      + mount_options {
          + mount_flags = [
              + "noatime",
            ]
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

nomad_external_volume.volume0: Creating...
nomad_external_volume.volume0: Creation complete after 0s [id=csi-volume-nfs]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

The results in the CLI:

$ nomad volume status csi-volume-nfs
ID                   = csi-volume-nfs
Name                 = csi-volume-nfs
External ID          = csi-volume-nfs
Plugin ID            = org.democratic-csi.nfs
Provider             = org.democratic-csi.nfs
Version              = 1.6.1
Schedulable          = false
Controllers Healthy  = 1
Controllers Expected = 1
Nodes Healthy        = 0
Nodes Expected       = 0
Access Mode          = <none>
Attachment Mode      = <none>
Mount Options        = <none>
Namespace            = default

Allocations
No allocations placed

Ok, so maybe it's a problem with the CLI. Let's check the API:

$ nomad operator api '/v1/volume/csi/csi-volume-nfs' | jq .MountOptions
{
  "MountOptions": {
    "FSType": "",
    "MountFlags": null
  }
}

Nope, that's not it. Just to make sure it's not an issue with the Nomad API's redaction of the outputs, I went directly to the state store:

$ sudo nomad operator raft state . | jq '.CSIVolumes[0].MountOptions'
{
  "FSType": "",
  "MountFlags": null
}

Finally, I dumped the API request we get from Terraform, and it's nil there too. So the data just isn't there. But if I create the same volume via nomad volume create, I can see the mount flags (redacted) as expected:

$ nomad operator api '/v1/volume/csi/csi-volume-nfs' | jq '.MountOptions'
{
  "FSType": "",
  "MountFlags": [
    "[REDACTED]"
  ]
}

I'll dig into the provider code a bit to see if I can figure out what's going on there.

tgross (Member) commented Apr 25, 2022

Should be fixed in #266
