
K8SSAND-1596 ⁃ Medusa Restore is downloading data to /tmp folder that is ephemeral. #584

Closed
grassiale opened this issue Jun 23, 2022 · 2 comments · Fixed by #593


grassiale commented Jun 23, 2022

Hello k8ssandra community!

What did you do?
We are performing a Medusa backup of a Cassandra datacenter with more than 200 GB of data per node, using AWS EBS volumes, and then trying to restore it either on a different cluster or on the same one. We are not able to complete the restore because the medusa-restore init container downloads the backup data into the /tmp folder, which is an emptyDir; this causes disk pressure on the node (which only has 30 GB of local storage) and pod eviction.

Did you expect to see something different?
I would expect Medusa to download the data onto the paths mounted from the EBS volume, so that the restore uses storage that was sized according to the data.
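
For context, the restore is triggered with a manifest along these lines. This is a minimal sketch assuming the MedusaRestoreJob API (medusa.k8ssandra.io/v1alpha1); the object name, namespace, and backup name are placeholders, and the exact kind and fields may differ between operator versions:

apiVersion: medusa.k8ssandra.io/v1alpha1
kind: MedusaRestoreJob
metadata:
  name: restore-k8ssandra-dev         # placeholder
  namespace: k8ssandra-dev             # placeholder
spec:
  cassandraDatacenter: k8ssandra-dev   # the datacenter defined in the manifest below
  backup: medusa-backup-1              # placeholder: name of an existing Medusa backup

Once the restore job is created, the medusa-restore init container runs on each Cassandra pod, and that init container is where the backup data currently gets downloaded under /tmp.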

Environment

AWS EKS 1.21
AWS EBS GP3 via EBS CSI Driver

  • K8ssandra Operator version: v1.1.1
  • Kubernetes version information:
    Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.12-eks-a64ea69", GitCommit:"d4336843ba36120e9ed1491fddff5f2fec33eb77", GitTreeState:"clean", BuildDate:"2022-05-12T18:29:27Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster kind: EKS
  • Manifests:
apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: k8ssandra-dev
spec:
  auth: false
  cassandra: 
    serverVersion: "4.0.4"
    serverImage: "k8ssandra/cass-management-api:4.0.4-v0.1.40"
    telemetry: 
      prometheus:
        enabled: true
    datacenters:
    - metadata:
        name: k8ssandra-dev
      size: 12
      config:
        jvmOptions:
          heapSize: "15G"
          additionalOptions:
            - "-Dcassandra.system_distributed_replication_dc_names=k8ssandra-dev"
            - "-Dcassandra.system_distributed_replication_per_dc=3"
        cassandraYaml:
          num_tokens: 16
          allocate_tokens_for_local_replication_factor: 3
          authenticator: AllowAllAuthenticator
          authorizer: AllowAllAuthorizer
          role_manager: CassandraRoleManager
          read_request_timeout_in_ms: 8000
          write_request_timeout_in_ms: 6000
          request_timeout_in_ms: 12000
          compaction_throughput_mb_per_sec: 200
          concurrent_compactors: 32 
          stream_entire_sstables: true
          stream_throughput_outbound_megabits_per_sec: 61440
          streaming_connections_per_host: 6
      storageConfig:
          cassandraDataVolumeClaimSpec:
            storageClassName: ebs-gp3
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 1600Gi
            persistentVolumeReclaimPolicy: Retain
      networking:
        nodePort:
          native: 30001
          internode: 30002
      resources:
        requests:
          memory: 60Gi
          cpu: 7000m
      racks:
        - name: rack-a
          nodeAffinityLabels:
            topology.kubernetes.io/zone: eu-west-1a
        - name: rack-b
          nodeAffinityLabels:
            topology.kubernetes.io/zone: eu-west-1b
        - name: rack-c
          nodeAffinityLabels:
            topology.kubernetes.io/zone: eu-west-1c
      tolerations:
      - key: "datanode"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
  medusa:
    storageProperties:
      storageProvider: s3
      storageSecretRef:
        name: medusa-bucket-key
      bucketName: bucket-name
      prefix: cassandra
      region: eu-west-1
      transferMaxBandwidth: 500MB/s
      concurrentTransfers: 4
  • K8ssandra Operator Logs:
insert K8ssandra Operator logs relevant to the issue here

Anything else we need to know?:

┆Issue is synchronized with this Jira Task by Unito
┆friendlyId: K8SSAND-1596
┆priority: Medium

@sync-by-unito sync-by-unito bot changed the title Medusa Restore is downloading data to /tmp folder that is ephemeral. K8SSAND-1596 ⁃ Medusa Restore is downloading data to /tmp folder that is ephemeral. Jun 23, 2022
@adejanovski adejanovski added the zh:Assess/Investigate Issues in the ZenHub pipeline 'Assess/Investigate' label Jun 23, 2022
adejanovski (Contributor) commented:

You're totally right: we didn't change the default, which performs the restore in /tmp, nor did we provide a way to change the location the files get downloaded to.

We should download the files straight onto the volume that holds the Cassandra data, which is already mounted anyway.
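
For illustration, here is a rough sketch of the relevant part of the Cassandra pod spec as we understand it; the volume and mount names are assumptions rather than output copied from the operator, but they show that the data PVC is already available to the init container while the current download target is ephemeral:

initContainers:
  - name: medusa-restore
    volumeMounts:
      - name: server-data   # Cassandra data PVC (ebs-gp3, 1600Gi), already mounted
        mountPath: /var/lib/cassandra
      - name: tmp           # assumed emptyDir backing /tmp, where the restore downloads today
        mountPath: /tmp

Downloading into a directory under the data volume's mountPath (for example a temporary subdirectory of /var/lib/cassandra) would keep the restore on the PVC and avoid the emptyDir entirely.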

adejanovski (Contributor) commented:

Please add your planning poker estimate with ZenHub @jsanda

@adejanovski adejanovski added zh:Ready and removed zh:Assess/Investigate Issues in the ZenHub pipeline 'Assess/Investigate' labels Jun 27, 2022
@adejanovski adejanovski self-assigned this Jun 27, 2022
@adejanovski adejanovski added zh:In-Progress Issues in the ZenHub pipeline 'In-Progress' and removed zh:Ready zh:In-Progress Issues in the ZenHub pipeline 'In-Progress' labels Jun 27, 2022
@adejanovski adejanovski added zh:Ready-For-Review Issues in the ZenHub pipeline 'Ready-For-Review' zh:Review Issues in the ZenHub pipeline 'Review' zh:Done Issues in the ZenHub pipeline 'Done' and removed zh:In-Progress Issues in the ZenHub pipeline 'In-Progress' zh:Ready-For-Review Issues in the ZenHub pipeline 'Ready-For-Review' zh:Review Issues in the ZenHub pipeline 'Review' labels Jun 30, 2022
@adejanovski adejanovski added this to the v1.2 milestone Jul 5, 2022