
K8SSAND-1596 ⁃ Medusa Restore is downloading data to /tmp folder that is ephemeral. #584

Closed
grassiale opened this issue Jun 23, 2022 · 2 comments · Fixed by #593


grassiale commented Jun 23, 2022

Hello k8ssandra community!

What did you do?
We are performing a Medusa backup of a Cassandra datacenter with more than 200 GB of data per node, using AWS EBS volumes, and then trying to restore it either on a different cluster or on the same one. We are not able to complete the restore because the medusa-restore init container downloads the backup data into the /tmp folder, which is an emptyDir; this causes disk pressure on the node (which only has 30 GB of local storage) and pod eviction.

Did you expect to see something different?
I would expect Medusa to download the data onto the paths mounted from the EBS volume, so that the restore uses storage that was sized according to the data.
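
For context, the restore is triggered with a manifest along these lines. This is a minimal sketch assuming the MedusaRestoreJob API (medusa.k8ssandra.io/v1alpha1); the object name, namespace, and backup name are placeholders, and the exact kind and fields may differ between operator versions:

apiVersion: medusa.k8ssandra.io/v1alpha1
kind: MedusaRestoreJob
metadata:
  name: restore-k8ssandra-dev         # placeholder
  namespace: k8ssandra-dev             # placeholder
spec:
  cassandraDatacenter: k8ssandra-dev   # the datacenter defined in the manifest below
  backup: medusa-backup-1              # placeholder: name of an existing Medusa backup

Once the restore job is created, the medusa-restore init container runs on each Cassandra pod, and that init container is where the backup data currently gets downloaded under /tmp.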

Environment

AWS EKS 1.21
AWS EBS GP3 via EBS CSI Driver

  • K8ssandra Operator version: v1.1.1
  • Kubernetes version information:
    Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.12-eks-a64ea69", GitCommit:"d4336843ba36120e9ed1491fddff5f2fec33eb77", GitTreeState:"clean", BuildDate:"2022-05-12T18:29:27Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster kind: EKS
  • Manifests:
apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: k8ssandra-dev
spec:
  auth: false
  cassandra: 
    serverVersion: "4.0.4"
    serverImage: "k8ssandra/cass-management-api:4.0.4-v0.1.40"
    telemetry: 
      prometheus:
        enabled: true
    datacenters:
    - metadata:
        name: k8ssandra-dev
      size: 12
      config:
        jvmOptions:
          heapSize: "15G"
          additionalOptions:
            - "-Dcassandra.system_distributed_replication_dc_names=k8ssandra-dev"
            - "-Dcassandra.system_distributed_replication_per_dc=3"
        cassandraYaml:
          num_tokens: 16
          allocate_tokens_for_local_replication_factor: 3
          authenticator: AllowAllAuthenticator
          authorizer: AllowAllAuthorizer
          role_manager: CassandraRoleManager
          read_request_timeout_in_ms: 8000
          write_request_timeout_in_ms: 6000
          request_timeout_in_ms: 12000
          compaction_throughput_mb_per_sec: 200
          concurrent_compactors: 32 
          stream_entire_sstables: true
          stream_throughput_outbound_megabits_per_sec: 61440
          streaming_connections_per_host: 6
      storageConfig:
          cassandraDataVolumeClaimSpec:
            storageClassName: ebs-gp3
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 1600Gi
            persistentVolumeReclaimPolicy: Retain
      networking:
        nodePort:
          native: 30001
          internode: 30002
      resources:
        requests:
          memory: 60Gi
          cpu: 7000m
      racks:
        - name: rack-a
          nodeAffinityLabels:
            topology.kubernetes.io/zone: eu-west-1a
        - name: rack-b
          nodeAffinityLabels:
            topology.kubernetes.io/zone: eu-west-1b
        - name: rack-c
          nodeAffinityLabels:
            topology.kubernetes.io/zone: eu-west-1c
      tolerations:
      - key: "datanode"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
  medusa:
    storageProperties:
      storageProvider: s3
      storageSecretRef:
        name: medusa-bucket-key
      bucketName: bucket-name
      prefix: cassandra
      region: eu-west-1
      transferMaxBandwidth: 500MB/s
      concurrentTransfers: 4
  • K8ssandra Operator Logs:
insert K8ssandra Operator logs relevant to the issue here

Anything else we need to know?:

┆Issue is synchronized with this Jira Task by Unito
┆friendlyId: K8SSAND-1596
┆priority: Medium

@sync-by-unito sync-by-unito bot changed the title Medusa Restore is downloading data to /tmp folder that is ephemeral. K8SSAND-1596 ⁃ Medusa Restore is downloading data to /tmp folder that is ephemeral. Jun 23, 2022
@adejanovski adejanovski added the zh:Assess/Investigate Issues in the ZenHub pipeline 'Assess/Investigate' label Jun 23, 2022
adejanovski (Contributor) commented:

You're totally right: we didn't change the default, which performs the restore in /tmp, nor did we provide a way to change the location the files get downloaded to.

We should download the files straight onto the volume that holds the Cassandra data, which is already mounted anyway.
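
For illustration, here is a rough sketch of the relevant part of the Cassandra pod spec as we understand it; the volume and mount names are assumptions rather than output copied from the operator, but they show that the data PVC is already available to the init container while the current download target is ephemeral:

initContainers:
  - name: medusa-restore
    volumeMounts:
      - name: server-data   # Cassandra data PVC (ebs-gp3, 1600Gi), already mounted
        mountPath: /var/lib/cassandra
      - name: tmp           # assumed emptyDir backing /tmp, where the restore downloads today
        mountPath: /tmp

Downloading into a directory under the data volume's mountPath (for example a temporary subdirectory of /var/lib/cassandra) would keep the restore on the PVC and avoid the emptyDir entirely.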

adejanovski (Contributor) commented:

Please add your planning poker estimate with ZenHub @jsanda

@adejanovski adejanovski added zh:Ready and removed zh:Assess/Investigate Issues in the ZenHub pipeline 'Assess/Investigate' labels Jun 27, 2022
@adejanovski adejanovski self-assigned this Jun 27, 2022
@adejanovski adejanovski added zh:In-Progress Issues in the ZenHub pipeline 'In-Progress' and removed zh:Ready zh:In-Progress Issues in the ZenHub pipeline 'In-Progress' labels Jun 27, 2022
@adejanovski adejanovski added zh:Ready-For-Review Issues in the ZenHub pipeline 'Ready-For-Review' zh:Review Issues in the ZenHub pipeline 'Review' zh:Done Issues in the ZenHub pipeline 'Done' and removed zh:In-Progress Issues in the ZenHub pipeline 'In-Progress' zh:Ready-For-Review Issues in the ZenHub pipeline 'Ready-For-Review' zh:Review Issues in the ZenHub pipeline 'Review' labels Jun 30, 2022
@adejanovski adejanovski added this to the v1.2 milestone Jul 5, 2022