This repository has been archived by the owner on Apr 29, 2020. It is now read-only.

Can't connect peer to bootstrap as in tests #32

Open

mikhail-manuilov opened this issue Oct 12, 2017 · 8 comments

@mikhail-manuilov

mikhail-manuilov commented Oct 12, 2017

Hello, I've created 4 ipfs-cluster nodes in Kubernetes using the examples from here.

apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: ipfs-cluster-bootstrapper
  labels:
    name: ipfs-cluster
    app: ipfs
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "5001"
    prometheus.io/path: "debug/metrics/prometheus"
spec:
  replicas: 1
  serviceName: ipfs-cluster-svc
  template:
    metadata:
      labels:
        name: ipfs-cluster
        role: bootstrapper
        app: ipfs
    spec:
      containers:
      - name: ipfs-cluster-bootstrapper
        image: "ipfs/ipfs-cluster:latest"
        command: ["/usr/local/bin/start-daemons.sh"]
        args:
          - --loglevel
          - debug
          - --debug
        ports:
        - containerPort: 4001
          name: "swarm"
          protocol: "TCP"
        - containerPort: 5001
          name: "api"
          protocol: "TCP"
        - containerPort: 9094
          name: "clusterapi"
          protocol: "TCP"
        - containerPort: 9095
          name: "clusterproxy"
          protocol: "TCP"
        - containerPort: 9096
          name: "cluster"
          protocol: "TCP"
        volumeMounts:
          - mountPath: /data
            name: data
  volumeClaimTemplates:
    - metadata:
        annotations:
          volume.alpha.kubernetes.io/storage-class: default
        name: data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: ipfs-cluster-peers
  labels:
    name: ipfs-cluster
#    app: ipfs 
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "5001"
    prometheus.io/path: "debug/metrics/prometheus"
spec:
  replicas: 3
  serviceName: ipfs-cluster-svc
  template:
    metadata:
      labels:
        name: ipfs-cluster
#        app: ipfs 
        role: peer
    spec:
      containers:
      - name: ipfs-cluster
        image: "ipfs/ipfs-cluster:latest"
        imagePullPolicy: IfNotPresent
        command: ["/usr/local/bin/start-daemons.sh"]
        args:
          - --loglevel
          - debug
        ports:
        - containerPort: 4001
          name: "swarm"
          protocol: "TCP"
        - containerPort: 5001
          name: "api"
          protocol: "TCP"
        - containerPort: 9094
          name: "clusterapi"
          protocol: "TCP"
        - containerPort: 9095
          name: "clusterproxy"
          protocol: "TCP"
        - containerPort: 9096
          name: "cluster"
          protocol: "TCP"
        volumeMounts:
          - mountPath: /data
            name: data
  volumeClaimTemplates:
    - metadata:
        annotations:
          volume.alpha.kubernetes.io/storage-class: default
        name: data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi

Also, I created a service to interconnect the nodes:

apiVersion: v1
kind: Service
metadata:
  name: ipfs-cluster-svc
  namespace: default
spec:
  clusterIP: None
  ports:
  - name: cluster
    targetPort: 9096
    port: 9096
    protocol: TCP
  - name: clusterapi
    targetPort: 9094
    port: 9094
    protocol: TCP
  - targetPort: 9095
    port: 9095
    name: clusterproxy
    protocol: TCP
  selector:
    name: ipfs-cluster

Then I ran a script to add the peers to the bootstrap peer (as in init.sh):

kubectl get pods -l name=ipfs-cluster,role=peer -o jsonpath={.items[*].metadata.name}
+ xargs -n1
+ pods=ipfs-cluster-peers-0
ipfs-cluster-peers-1
ipfs-cluster-peers-2
+ kubectl get pods -l name=ipfs-cluster,role=bootstrapper -o jsonpath={.items[*].metadata.name}
+ bootstrapper=ipfs-cluster-bootstrapper-0
+ kubectl get pods ipfs-cluster-peers-0 -o jsonpath={.status.podIP}
+ kubectl exec ipfs-cluster-peers-0 -- ipfs-cluster-ctl --enc json id
+ jq -r .id
+ echo /ip4/10.244.0.141/tcp/9096/ipfs/QmZwQwbAiKA6zNgmN6sJyua3gfqbxCRESKR3qCSzTLn1ww
+ addr=/ip4/10.244.0.141/tcp/9096/ipfs/QmZwQwbAiKA6zNgmN6sJyua3gfqbxCRESKR3qCSzTLn1ww
+ kubectl exec ipfs-cluster-bootstrapper-0 -- ipfs-cluster-ctl peers add /ip4/10.244.0.141/tcp/9096/ipfs/QmZwQwbAiKA6zNgmN6sJyua3gfqbxCRESKR3qCSzTLn1ww
An error occurred:
 **Code: 500
 Message: dial attempt failed: <peer.ID dYrcex> --> <peer.ID ZwQwbA> dial attempt failed: incoming message was too large**

This is the log of the bootstrap pod:

14:30:56.453 DEBUG  p2p-gorpc: makeCall: Cluster.PeerAdd client.go:106
14:30:56.453 DEBUG  p2p-gorpc: local call: Cluster.PeerAdd client.go:112
14:30:56.453 DEBUG    cluster: peerAdd called with /ip4/10.244.0.141/tcp/9096/ipfs/QmZwQwbAiKA6zNgmN6sJyua3gfqbxCRESKR3qCSzTLn1ww cluster.go:512
14:30:56.453 DEBUG    cluster: adding peer /ip4/10.244.0.141/tcp/9096/ipfs/QmZwQwbAiKA6zNgmN6sJyua3gfqbxCRESKR3qCSzTLn1ww peer_manager.go:33
14:30:56.453  INFO    cluster: new Cluster peer /ip4/10.244.0.141/tcp/9096/ipfs/QmZwQwbAiKA6zNgmN6sJyua3gfqbxCRESKR3qCSzTLn1ww peer_manager.go:41
14:30:56.453 DEBUG  p2p-gorpc: makeCall: Cluster.RemoteMultiaddrForPeer client.go:106
14:30:56.453 DEBUG  p2p-gorpc: sending remote call client.go:144
14:30:58.287 DEBUG    monitor: monitoring tick peer_monitor.go:264
14:30:58.287 DEBUG  p2p-gorpc: makeCall: Cluster.PeerManagerPeers client.go:106
14:30:58.287 DEBUG  p2p-gorpc: local call: Cluster.PeerManagerPeers client.go:112
14:30:58.287 DEBUG    monitor: check metrics ping peer_monitor.go:278
14:30:58.287 DEBUG    monitor: check metrics disk-freespace peer_monitor.go:278
14:30:59.892 DEBUG    cluster: Leader <peer.ID dYrcex> about to broadcast metric ping to [<peer.ID ZwQwbA> <peer.ID dYrcex>]. Expires: 2017-10-12T14:31:29.892463518Z cluster.go:229
14:30:59.892 DEBUG  p2p-gorpc: makeCall: Cluster.PeerMonitorLogMetric client.go:106
14:30:59.892 DEBUG  p2p-gorpc: local call: Cluster.PeerMonitorLogMetric client.go:112
14:30:59.892 DEBUG    monitor: logged 'ping' metric from '<peer.ID dYrcex>'. Expires on 2017-10-12T14:31:29.892463518Z peer_monitor.go:181
14:30:59.892 DEBUG  p2p-gorpc: makeCall: Cluster.PeerMonitorLogMetric client.go:106
14:30:59.892 DEBUG  p2p-gorpc: sending remote call client.go:144
14:31:00.870 DEBUG  p2p-gorpc: makeCall: Cluster.IPFSFreeSpace client.go:106
14:31:00.870 DEBUG  p2p-gorpc: local call: Cluster.IPFSFreeSpace client.go:112
14:31:00.870 DEBUG   ipfshttp: getting repo/stat ipfshttp.go:697
14:31:00.872 DEBUG    cluster: Leader <peer.ID dYrcex> about to broadcast metric disk-freespace to [<peer.ID dYrcex> <peer.ID ZwQwbA>]. Expires: 2017-10-12T14:31:30.872385036Z cluster.go:229
14:31:00.872 DEBUG  p2p-gorpc: makeCall: Cluster.PeerMonitorLogMetric client.go:106
14:31:00.872 DEBUG  p2p-gorpc: sending remote call client.go:144
14:31:00.872 DEBUG  p2p-gorpc: makeCall: Cluster.PeerMonitorLogMetric client.go:106
14:31:00.872 DEBUG  p2p-gorpc: local call: Cluster.PeerMonitorLogMetric client.go:112
14:31:00.872 DEBUG    monitor: logged 'disk-freespace' metric from '<peer.ID dYrcex>'. Expires on 2017-10-12T14:31:30.872385036Z peer_monitor.go:181
14:31:06.454 ERROR    cluster: dial attempt failed: context deadline exceeded cluster.go:537
14:31:06.454 DEBUG    cluster: removing peer QmZwQwbAiKA6zNgmN6sJyua3gfqbxCRESKR3qCSzTLn1ww peer_manager.go:52
14:31:06.454 ERROR    cluster: error pushing metric to QmZwQwbAiKA6zNgmN6sJyua3gfqbxCRESKR3qCSzTLn1ww: dial attempt failed: context deadline exceeded cluster.go:238
14:31:06.454 ERROR    cluster: error pushing metric to QmZwQwbAiKA6zNgmN6sJyua3gfqbxCRESKR3qCSzTLn1ww: dial attempt failed: context deadline exceeded cluster.go:238
14:31:06.454  INFO    cluster: removing Cluster peer QmZwQwbAiKA6zNgmN6sJyua3gfqbxCRESKR3qCSzTLn1ww peer_manager.go:55
14:31:06.454 DEBUG    cluster: Leader <peer.ID dYrcex> broadcasted metric disk-freespace to [<peer.ID dYrcex> <peer.ID ZwQwbA>]. Expires: 2017-10-12T14:31:30.872385036Z cluster.go:241
14:31:06.454 ERROR  p2p-gorpc: dial attempt failed: context deadline exceeded client.go:125
14:31:06.454 DEBUG    cluster: Leader <peer.ID dYrcex> broadcasted metric ping to [<peer.ID ZwQwbA> <peer.ID dYrcex>]. Expires: 2017-10-12T14:31:29.892463518Z cluster.go:241
14:31:06.454 ERROR    restapi: sending error response: 500: dial attempt failed: context deadline exceeded restapi.go:519

Tcpdump shows a normal TCP/IP flow:

# tcpdump host 10.244.4.120 and port not 4001
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
13:24:51.683723 IP ipfs-cluster-bootstrapper-0.ipfs-cluster-svc.default.svc.cluster.local.9096 > ipfs-cluster-peers-0.ipfs-cluster-svc.default.svc.cluster.local.9096: Flags [S], seq 3040182720, win 29200, options [mss 1418,sackOK,TS val 947970455 ecr 0,nop,wscale 7], length 0
13:24:51.683767 IP ipfs-cluster-peers-0.ipfs-cluster-svc.default.svc.cluster.local.9096 > ipfs-cluster-bootstrapper-0.ipfs-cluster-svc.default.svc.cluster.local.9096: Flags [S.], seq 1428266664, ack 3040182721, win 28960, options [mss 1460,sackOK,TS val 1040225904 ecr 947970455,nop,wscale 7], length 0
13:24:51.684452 IP ipfs-cluster-bootstrapper-0.ipfs-cluster-svc.default.svc.cluster.local.9096 > ipfs-cluster-peers-0.ipfs-cluster-svc.default.svc.cluster.local.9096: Flags [.], ack 1, win 229, options [nop,nop,TS val 947970455 ecr 1040225904], length 0
13:24:51.684734 IP ipfs-cluster-peers-0.ipfs-cluster-svc.default.svc.cluster.local.9096 > ipfs-cluster-bootstrapper-0.ipfs-cluster-svc.default.svc.cluster.local.9096: Flags [P.], seq 1:25, ack 1, win 227, options [nop,nop,TS val 1040225904 ecr 947970455], length 24
13:24:51.684775 IP ipfs-cluster-peers-0.ipfs-cluster-svc.default.svc.cluster.local.9096 > ipfs-cluster-bootstrapper-0.ipfs-cluster-svc.default.svc.cluster.local.9096: Flags [P.], seq 25:45, ack 1, win 227, options [nop,nop,TS val 1040225904 ecr 947970455], length 20
13:24:51.685393 IP ipfs-cluster-bootstrapper-0.ipfs-cluster-svc.default.svc.cluster.local.9096 > ipfs-cluster-peers-0.ipfs-cluster-svc.default.svc.cluster.local.9096: Flags [.], ack 25, win 229, options [nop,nop,TS val 947970455 ecr 1040225904], length 0
13:24:51.685410 IP ipfs-cluster-bootstrapper-0.ipfs-cluster-svc.default.svc.cluster.local.9096 > ipfs-cluster-peers-0.ipfs-cluster-svc.default.svc.cluster.local.9096: Flags [.], ack 45, win 229, options [nop,nop,TS val 947970455 ecr 1040225904], length 0
@hsanjuan
Member

Probably a problem with the cluster secret used by each peer.

Note that this project is solely for running a number of automated tests on ipfs/ipfs-cluster, and not for deploying any of it for real-world use within Kubernetes.
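
For reference, here is a minimal sketch of one way to give every peer the same secret in Kubernetes, assuming the ipfs/ipfs-cluster image picks up the CLUSTER_SECRET environment variable when it initializes its configuration (the Secret name and key below are hypothetical):

apiVersion: v1
kind: Secret
metadata:
  name: ipfs-cluster-secret   # hypothetical name
type: Opaque
stringData:
  # same 32-byte hex value for every peer, e.g. generated once with:
  # od -vN 32 -An -tx1 /dev/urandom | tr -d ' \n'
  cluster-secret: "<32-byte-hex-string>"

and, in the container spec of both StatefulSets:

        env:
          - name: CLUSTER_SECRET
            valueFrom:
              secretKeyRef:
                name: ipfs-cluster-secret
                key: cluster-secret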

@mikhail-manuilov
Author

mikhail-manuilov commented Oct 12, 2017

I understand Kubernetes here is purely for testing purposes, but I just want to clarify a few things if possible.

Why don't the tests require the same secret across all peers?

I thought about using the same secret, but I ran into a strange issue: the folders /data/ipfs and /data/ipfs-cluster are reset each time a pod dies (and starts again). So I can't just change the secret in service.json and restart. I looked into /usr/local/bin/start-daemons.sh; maybe it's purely an Azure problem, but files outside these two directories are not changed.

I don't have this issue in Docker with two local volumes, one for each daemon.

@hsanjuan
Member

Why don't the tests require the same secret across all peers?

They do, afaik they just run a custom container which ensures that.

Other than that, I am not sure why your /data folders are not persistent.

@mikhail-manuilov
Author

mikhail-manuilov commented Oct 12, 2017

It seems I've found the root of my problem: the VOLUME directives in the ipfs/go-ipfs and ipfs/ipfs-cluster Dockerfiles. It looks like I need multiple volumes to run the pods, which is not the expected behaviour at all.

@hsanjuan
Member

I don't fully understand how VOLUME directives affect Kubernetes, but maybe you want to open an issue and explain? We can fix the Dockerfiles if there's a way to improve them...

@mikhail-manuilov
Author

apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: ipfs-cluster-bootstrapper
  labels:
    name: ipfs-cluster
    app: ipfs
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "5001"
    prometheus.io/path: "debug/metrics/prometheus"
spec:
  replicas: 1
  serviceName: ipfs-cluster-svc
  template:
    metadata:
      labels:
        name: ipfs-cluster
        role: bootstrapper
        app: ipfs
    spec:
      containers:
      - name: ipfs-cluster-bootstrapper
        image: "ipfs/ipfs-cluster:latest"
        command: ["/usr/local/bin/start-daemons.sh"]
        args:
          - --loglevel
          - debug
          - --debug
        ports:
        - containerPort: 4001
          name: "swarm"
          protocol: "TCP"
        - containerPort: 5001
          name: "api"
          protocol: "TCP"
        - containerPort: 9094
          name: "clusterapi"
          protocol: "TCP"
        - containerPort: 9095
          name: "clusterproxy"
          protocol: "TCP"
        - containerPort: 9096
          name: "cluster"
          protocol: "TCP"
        volumeMounts:
          - mountPath: /data/ipfs
            name: data-ipfs
          - mountPath: /data/ipfs-cluster
            name: data-ipfs-cluster
  volumeClaimTemplates:
    - metadata:
        annotations:
          volume.alpha.kubernetes.io/storage-class: default
        name: data-ipfs 
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
    - metadata:
        annotations:
          volume.alpha.kubernetes.io/storage-class: default
        name: data-ipfs-cluster
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi

This is how to fix this behavior. It NEEDS two volumes.
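
Presumably the ipfs-cluster-peers StatefulSet needs the same change; here is a sketch of just the affected sections (everything else as in the manifests above):

        volumeMounts:
          - mountPath: /data/ipfs
            name: data-ipfs
          - mountPath: /data/ipfs-cluster
            name: data-ipfs-cluster
  volumeClaimTemplates:
    - metadata:
        annotations:
          volume.alpha.kubernetes.io/storage-class: default
        name: data-ipfs
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
    - metadata:
        annotations:
          volume.alpha.kubernetes.io/storage-class: default
        name: data-ipfs-cluster
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi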

@FrankPetrilli
Collaborator

@mikhail-manuilov, would you want to send in a pull request with the changes you're proposing? I'll be happy to look it over and approve it once I confirm it meets our requirements.

@mikhail-manuilov
Author

mikhail-manuilov commented Oct 13, 2017

Since these Kubernetes definition files are for testing purposes only, and the ones posted above have been tested only in the Azure cloud, I'm not sure a pull request makes sense. Also, I suppose having two volumes for one container is not ideal; maybe the Dockerfiles for ipfs/go-ipfs and ipfs/ipfs-cluster should be changed instead. Since ipfs/ipfs-cluster uses FROM ipfs/go-ipfs, I suppose creating one VOLUME for /data in ipfs/go-ipfs and deleting VOLUME $IPFS_CLUSTER_PATH from the ipfs/ipfs-cluster Dockerfile will do the job.
