
Streamline deployment of GESIS stage server #3090

Open · wants to merge 89 commits into main
Conversation

rgaiacs (Collaborator) commented Sep 6, 2024

This is related to #2797

The configuration in the ansible folder is working, and the GitLab CI pipeline defined in .gitlab-ci.yml is also working.

I'm trying to complete the Kubernetes cluster configuration in the Helm chart.

rgaiacs (Collaborator, Author) commented Sep 10, 2024

Thanks @manics for the reply and comments. I was able to disable the attempt to contact Google Cloud with:

```yaml
analyticsPublisher:
  enabled: false
```

The problem that I have is that all persistent volume claims are pending.

```
kubectl get -n gesis pvc
NAME                                        STATUS    VOLUME         CAPACITY   ACCESS MODES   STORAGECLASS   AGE
binderhub-grafana                           Pending                                                           24h
binderhub-harbor-jobservice                 Pending                                                           4d16h
binderhub-harbor-registry                   Pending                                                           4d16h
binderhub-prometheus-server                 Pending                                            standard       24h
data-binderhub-harbor-redis-0               Bound     alertmanager   5Gi        RWO                           4d16h
data-binderhub-harbor-trivy-0               Pending                                                           4d16h
database-data-binderhub-harbor-database-0   Pending                                                           4d16h
hub-db-dir                                  Pending                                                           24h
```

I know that I need to declare a correct persistent volume, but I can't find where the persistent volume is declared for OVH or CurveNote. @manics can you point me to the persistent volume declaration? Thanks!
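For a single-node or bare-metal cluster, one option (not necessarily what the OVH or CurveNote deployments use; cloud clusters typically rely on the provider's dynamic provisioner via a StorageClass) is a static hostPath PersistentVolume matching each pending claim. A minimal sketch, with the name, size, and path being assumptions:

```yaml
# Sketch only: a static PV that could satisfy the pending hub-db-dir claim.
# The capacity, accessModes, and path are placeholders and must match the
# corresponding PersistentVolumeClaim; dynamic provisioning via a
# StorageClass is usually preferable when available.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: hub-db-dir-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /srv/binderhub/hub-db-dir
```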

rgaiacs (Collaborator, Author) commented Sep 10, 2024

I have the main pods running.

```
kubectl get -n gesis pods
NAME                                                     READY   STATUS             RESTARTS   AGE
binder-7c84c576c-2689p                                   1/1     Running            0          80m
binderhub-cryptnono-c9hrj                                2/2     Running            0          128m
binderhub-cryptnono-dgr4g                                2/2     Running            0          128m
binderhub-cryptnono-hqpzf                                2/2     Running            0          128m
binderhub-cryptnono-pbqlx                                2/2     Running            0          128m
binderhub-dind-ntxvs                                     1/1     Running            0          80m
binderhub-grafana-9d48bc74-qtn4x                         1/1     Running            0          62m
binderhub-image-cleaner-6zc9v                            1/1     Running            0          80m
binderhub-ingress-nginx-controller-6fdbf98688-j29w2      1/1     Running            0          47m
binderhub-ingress-nginx-defaultbackend-5d698c868-qh5zx   1/1     Running            0          128m
binderhub-kube-state-metrics-8547b9d4dd-rr4tw            1/1     Running            0          128m
binderhub-prometheus-node-exporter-4dv2s                 1/1     Running            0          128m
binderhub-prometheus-node-exporter-c8bv7                 1/1     Running            0          128m
binderhub-prometheus-node-exporter-gkxcf                 1/1     Running            0          128m
binderhub-prometheus-node-exporter-wfk7h                 1/1     Running            0          128m
binderhub-prometheus-server-7c59dd5d85-fwbqm             2/2     Running            0          128m
hub-6564cd475f-nxltz                                     1/1     Running            0          13m
minesweeper-bf58z                                        0/1     ImagePullBackOff   0          128m
minesweeper-fkjd6                                        0/1     ImagePullBackOff   0          128m
minesweeper-t2fs8                                        0/1     ImagePullBackOff   0          128m
proxy-f5b566ddc-j7l9l                                    1/1     Running            0          80m
proxy-patches-85b5998bdb-9mjw9                           1/1     Running            0          128m
static-6f64c6bc8-ndn2t                                   1/1     Running            0          128m
user-scheduler-55df956bcf-6b4m6                          1/1     Running            0          80m
user-scheduler-55df956bcf-db79g                          1/1     Running            0          80m
```

Ingress

The ingress is not working. The goal here is to have http://notebooks-test.gesis.org answered by the NGINX Ingress pod. @manics can you help me?

```
ping -c 1 notebooks-test.gesis.org
PING notebooks-test.gesis.org (194.95.75.20) 56(84) bytes of data.
64 bytes from svko-css-backup-node.gesis.intra (194.95.75.20): icmp_seq=1 ttl=61 time=2.26 ms

--- notebooks-test.gesis.org ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 2.261/2.261/2.261/0.000 ms
```
```
kubectl -n gesis describe ingress binderhub
Name:             binderhub
Labels:           app.kubernetes.io/managed-by=Helm
Namespace:        gesis
Address:          10.100.230.222
Ingress Class:    <none>
Default backend:  <default>
TLS:
  kubelego-tls-binder-binderhub terminates notebooks-test.gesis.org
Rules:
  Host                      Path  Backends
  ----                      ----  --------
  notebooks-test.gesis.org  
                            /   binder:80 (10.244.255.21:8585)
Annotations:                kubernetes.io/ingress.class: nginx
                            kubernetes.io/tls-acme: true
                            meta.helm.sh/release-name: binderhub
                            meta.helm.sh/release-namespace: gesis
Events:
  Type    Reason  Age   From                      Message
  ----    ------  ----  ----                      -------
  Normal  Sync    54m   nginx-ingress-controller  Scheduled for sync
  Normal  Sync    54m   nginx-ingress-controller  Scheduled for sync
  Normal  Sync    53m   nginx-ingress-controller  Scheduled for sync
```
```
kubectl -n gesis describe service binderhub-ingress-nginx-controller
Name:              binderhub-ingress-nginx-controller
Namespace:         gesis
Labels:            app.kubernetes.io/component=controller
                   app.kubernetes.io/instance=binderhub
                   app.kubernetes.io/managed-by=Helm
                   app.kubernetes.io/name=ingress-nginx
                   app.kubernetes.io/part-of=ingress-nginx
                   app.kubernetes.io/version=1.11.2
                   helm.sh/chart=ingress-nginx-4.11.2
Annotations:       meta.helm.sh/release-name: binderhub
                   meta.helm.sh/release-namespace: gesis
Selector:          app.kubernetes.io/component=controller,app.kubernetes.io/instance=binderhub,app.kubernetes.io/name=ingress-nginx
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.100.230.222
IPs:               10.100.230.222
Port:              http  80/TCP
TargetPort:        http/TCP
Endpoints:         10.244.65.205:80
Port:              https  443/TCP
TargetPort:        https/TCP
Endpoints:         10.244.65.205:443
Session Affinity:  None
Events:            <none>
```

minesweeper

The image tag is wrong: it is trying to pull `jupyterhub/mybinder.org-minesweeper:set-by-chartpress`.

```
kubectl -n gesis describe pod minesweeper-bf58z
Name:             minesweeper-bf58z
Namespace:        gesis
Priority:         0
Service Account:  minesweeper
Node:             svko-css-backup-node/194.95.75.20
Start Time:       Tue, 10 Sep 2024 14:27:52 +0200
Labels:           app=binder
                  component=minesweeper
                  controller-revision-hash=767d8795cc
                  heritage=Helm
                  name=minesweeper
                  pod-template-generation=1
                  release=binderhub
Annotations:      checksum/configmap: 7a857debb16fa8bcb22a5de6418a5ff319c9e06f4cfc010705caec539b9614cc
                  cni.projectcalico.org/containerID: a3415f68c66691989387a7ea9bc5c6dd5cfa8039affee823adfd0a9b8f0b7263
                  cni.projectcalico.org/podIP: 10.244.65.206/32
                  cni.projectcalico.org/podIPs: 10.244.65.206/32
Status:           Pending
IP:               10.244.65.206
IPs:
  IP:           10.244.65.206
Controlled By:  DaemonSet/minesweeper
Containers:
  minesweeper:
    Container ID:  
    Image:         jupyterhub/mybinder.org-minesweeper:set-by-chartpress
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Args:
      python
      /srv/minesweeper/minesweeper.py
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  250Mi
    Requests:
      cpu:     100m
      memory:  100Mi
    Environment:
      NODE_NAME:   (v1:spec.nodeName)
      NAMESPACE:  gesis
    Mounts:
      /etc/minesweeper from config (ro)
      /srv/minesweeper from src (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5wbfq (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  src:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      minesweeper-src
    Optional:  false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      minesweeper-config
    Optional:  false
  kube-api-access-5wbfq:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 hub.jupyter.org/dedicated=user:NoSchedule
                             hub.jupyter.org_dedicated=user:NoSchedule
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type    Reason   Age                     From     Message
  ----    ------   ----                    ----     -------
  Normal  BackOff  4m51s (x545 over 129m)  kubelet  Back-off pulling image "jupyterhub/mybinder.org-minesweeper:set-by-chartpress"
```

manics (Member) commented Sep 10, 2024

Can you try running an ephemeral pod in the same namespace, and exec something like `curl -v http://binderhub-ingress-nginx-controller/` from the pod? That should return a 404 from the Nginx controller default backend. You might need to add the internal service port. Note the existing pods may be restricted by NetworkPolicies, so it's best to create a new pod. I often use https://gist.github.com/manics/67efaed42d25cc1f830e0d5566652b03 as netshoot includes several useful tools for troubleshooting networks.

Then try `curl -v --header 'Host: notebooks-test.gesis.org' http://binderhub-ingress-nginx-controller/` from the pod, which should fool the ingress controller into thinking you've requested notebooks-test.gesis.org.

If that works it means the controller and your internal BinderHub/JupyterHub ingress is (probably!) working, and the problem is likely in the path between the external internet and the internal ingress.
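The check above can be run as something like the following (a sketch; the `nicolaka/netshoot` image is an assumption based on the linked gist, and these commands require access to the live cluster):

```shell
# Launch a throwaway debugging pod in the gesis namespace and open a shell
# in it; --rm deletes the pod when the shell exits.
kubectl run -n gesis debug --rm -it --image=nicolaka/netshoot -- bash

# Inside the pod: this should return a 404 from the default backend,
# proving the controller Service is reachable over the cluster network.
curl -v http://binderhub-ingress-nginx-controller/

# Forging the Host header exercises the Ingress rule for the real hostname
# and should be routed to the binder service.
curl -v --header 'Host: notebooks-test.gesis.org' \
    http://binderhub-ingress-nginx-controller/
```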

manics (Member) commented Sep 10, 2024

For the chartpress tag problem you'll need to first run chartpress --skip-build to update the set-by-chartpress placeholders:

```yaml
- name: "Stage 3: Run chartpress to update values.yaml"
  run: |
    chartpress ${{ matrix.chartpress_args || '--skip-build' }}
```

The actual building and pushing of the container images is done in the staging workflow, and since chartpress deterministically generates the tag based on git commit hash it's fine to rerun it to update the tags.
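Run locally, that corresponds to something like the following (a sketch; it assumes chartpress is installed and is run from the repository root, where chartpress.yaml lives):

```shell
# Rewrite the set-by-chartpress placeholders in the chart's values.yaml
# with image tags derived from the git commit hash, without building or
# pushing any images (the staging workflow does that).
chartpress --skip-build

# Inspect the rewritten tags before deploying.
git diff
```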

rgaiacs (Collaborator, Author) commented Sep 11, 2024

Thanks @manics for the reply. I will look into chartpress. I believe the problem with traffic is the load balancer; I'm looking at MetalLB.
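Since binderhub-ingress-nginx-controller is currently a ClusterIP Service, external traffic never reaches it. With MetalLB in L2 mode, the Service could be switched to type LoadBalancer and assigned an address from a pool. A minimal sketch; the address below is a placeholder (the node IP from the ping output) and the real range has to come from the GESIS network team:

```yaml
# Sketch: MetalLB L2 configuration. The IP range is an assumption.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: gesis-pool
  namespace: metallb-system
spec:
  addresses:
    - 194.95.75.20/32
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: gesis-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - gesis-pool
```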
