[Help] default-telegraf-2ggl7 default-host-5kbs7 fail to start #21956

Closed
callme80 opened this issue Jan 11, 2025 · 3 comments
Labels
question Further information is requested

Comments

@callme80

Version: ocboot-master-v3.11.9
The host and telegraf pods fail to start.
kubectl -n onecloud get pods default-telegraf-2ggl7 default-host-5kbs7
NAME                     READY   STATUS                  RESTARTS        AGE
default-telegraf-2ggl7   0/1     Init:CrashLoopBackOff   25 (100s ago)   104m
default-host-5kbs7       2/3     Running                 1 (2m47s ago)   7m47s

`Name: default-telegraf-d2wqh
Namespace: onecloud
Priority: 0
Service Account: onecloud-operator
Node: h100/192.168.50.198
Start Time: Sat, 11 Jan 2025 07:31:08 +0000
Labels: app=telegraf
app.kubernetes.io/component=telegraf
app.kubernetes.io/instance=onecloud-cluster-2b9b
app.kubernetes.io/managed-by=onecloud-operator
app.kubernetes.io/name=onecloud-cluster
controller-revision-hash=56c4c685f8
pod-template-generation=2
Annotations: onecloud.yunion.io/last-applied-configuration:
{"volumes":[{"name":"etc-telegraf","hostPath":{"path":"/etc/telegraf","type":"DirectoryOrCreate"}},{"name":"root","hostPath":{"path":"/","...
Status: Pending
IP: 192.168.50.198
IPs:
IP: 192.168.50.198
Controlled By: DaemonSet/default-telegraf
Init Containers:
telegraf-init:
Container ID: containerd://aca77256a0f40a3ddccb375108df6786e8ff6577c24c286ae02ab735011970df
Image: registry.cn-beijing.aliyuncs.com/yunionio/telegraf-init:release-1.19.2-0
Image ID: registry.cn-beijing.aliyuncs.com/yunionio/telegraf-init@sha256:dbda0b59b2506e76fd33547de7e13bc701b6571b4134485d1d96493b269b770e
Port:
Host Port:
Command:
/bin/telegraf-init
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Sat, 11 Jan 2025 07:31:22 +0000
Finished: Sat, 11 Jan 2025 07:31:22 +0000
Ready: False
Restart Count: 2
Environment:
NODENAME: (v1:spec.nodeName)
INFLUXDB_URL: https://default-influxdb:30086
Mounts:
/etc/telegraf from etc-telegraf (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-rjdfk (ro)
Containers:
telegraf:
Container ID:
Image: registry.cn-beijing.aliyuncs.com/yunionio/telegraf:release-1.19.2-9
Image ID:
Port:
Host Port:
Args:
/usr/bin/telegraf
-config
/etc/telegraf/telegraf.conf
-config-directory
/etc/telegraf/telegraf.d
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Environment:
HOST_ETC: /hostfs/etc
HOST_PROC: /hostfs/proc
HOST_SYS: /hostfs/sys
HOST_VAR: /hostfs/var
HOST_RUN: /hostfs/run
HOST_MOUNT_PREFIX: /hostfs
Mounts:
/etc/telegraf from etc-telegraf (rw)
/hostfs from root (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-rjdfk (ro)
Conditions:
Type Status
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
etc-telegraf:
Type: HostPath (bare host directory volume)
Path: /etc/telegraf
HostPathType: DirectoryOrCreate
root:
Type: HostPath (bare host directory volume)
Path: /
HostPathType: Directory
kube-api-access-rjdfk:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors:
Tolerations: node-role.kubernetes.io/controlplane:NoSchedule
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/network-unavailable:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
Type Reason Age From Message


Normal Scheduled 36s default-scheduler Successfully assigned onecloud/default-telegraf-d2wqh to h100
Normal Pulled 23s (x3 over 36s) kubelet Container image "registry.cn-beijing.aliyuncs.com/yunionio/telegraf-init:release-1.19.2-0" already present on machine
Normal Created 23s (x3 over 36s) kubelet Created container telegraf-init
Normal Started 22s (x3 over 36s) kubelet Started container telegraf-init
Warning BackOff 8s (x3 over 34s) kubelet Back-off restarting failed container telegraf-init in pod default-telegraf-d2wqh_onecloud(0e215719-6673-4a92-872e-0c93906c371d)
`
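
The describe output above shows that the `telegraf-init` init container exited with code 1, but not why. The container's own log from the last terminated run is usually the next thing to look at. A minimal sketch that builds the corresponding `kubectl logs` invocation; the helper name `init_container_logs_cmd` is hypothetical, while the namespace, pod, and container names are taken from the output above:

```python
import shlex


def init_container_logs_cmd(namespace, pod, container, previous=True):
    """Build a kubectl command that prints the logs of one container in a pod."""
    cmd = ["kubectl", "-n", namespace, "logs", pod, "-c", container]
    if previous:
        # --previous selects the last terminated run of the container,
        # which is the run that exited with an error.
        cmd.append("--previous")
    return cmd


if __name__ == "__main__":
    cmd = init_container_logs_cmd(
        "onecloud", "default-telegraf-d2wqh", "telegraf-init"
    )
    # Print a copy-pasteable shell command instead of executing it.
    print(shlex.join(cmd))
```

The sketch only prints the command; pass the list to `subprocess.run` to execute it on a machine with cluster access.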

host log:

[warning 250111 07:30:43 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument start-host-ignore-sys-error
[error 2025-01-11 07:31:02 fileutils2.GetAllBlkdevsIoSchedulers(fileutils.go:175)] no block device avaiable
[debug 2025-01-11 07:31:03 procutils.(*Command).Output(procutils.go:98)] Execute command "cat " , error: exit status 1 , output: cat: '': No such file or directory
[error 2025-01-11 07:31:03 hostinfo.(*SHostInfo).detectOsDist(hostinfo.go:814)] exit status 1
[error 2025-01-11 07:31:03 hostinfo.(*SHostInfo).detectOsDist(hostinfo.go:826)] Failed to detect distribution info
[debug 2025-01-11 07:31:03 procutils.(*Command).Run(procutils.go:89)] Execute command "systemctl cat -- openvswitch" , error: exit status 1
[debug 2025-01-11 07:31:03 procutils.(*Command).Output(procutils.go:98)] Execute command "systemctl status yunion-host-sdnagent" , error: exit status 4 , output: Unit yunion-host-sdnagent.service could not be found.
[debug 2025-01-11 07:31:03 procutils.(*Command).Output(procutils.go:98)] Execute command "systemctl is-enabled yunion-host-sdnagent" , error: exit status 1 , output: Failed to get unit file state for yunion-host-sdnagent.service: No such file or directory
[debug 2025-01-11 07:31:03 procutils.(*Command).Output(procutils.go:98)] Execute command "systemctl status yunion-host-sdnagent" , error: exit status 4 , output: Unit yunion-host-sdnagent.service could not be found.
[debug 2025-01-11 07:31:03 procutils.(*Command).Output(procutils.go:98)] Execute command "systemctl is-enabled yunion-host-sdnagent" , error: exit status 1 , output: Failed to get unit file state for yunion-host-sdnagent.service: No such file or directory
[debug 2025-01-11 07:31:03 procutils.(*Command).Output(procutils.go:98)] Execute command "systemctl status yunion-host-deployer" , error: exit status 4 , output: Unit yunion-host-deployer.service could not be found.
[debug 2025-01-11 07:31:03 procutils.(*Command).Output(procutils.go:98)] Execute command "systemctl is-enabled yunion-host-deployer" , error: exit status 1 , output: Failed to get unit file state for yunion-host-deployer.service: No such file or directory
[debug 2025-01-11 07:31:03 procutils.(*Command).Output(procutils.go:98)] Execute command "systemctl status yunion-host-deployer" , error: exit status 4 , output: Unit yunion-host-deployer.service could not be found.
[debug 2025-01-11 07:31:03 procutils.(*Command).Output(procutils.go:98)] Execute command "systemctl is-enabled yunion-host-deployer" , error: exit status 1 , output: Failed to get unit file state for yunion-host-deployer.service: No such file or directory
[debug 2025-01-11 07:31:03 procutils.(*Command).Output(procutils.go:98)] Execute command "systemctl status telegraf" , error: exit status 4 , output: Unit telegraf.service could not be found.
[debug 2025-01-11 07:31:03 procutils.(*Command).Output(procutils.go:98)] Execute command "systemctl is-enabled telegraf" , error: exit status 1 , output: Failed to get unit file state for telegraf.service: No such file or directory
[debug 2025-01-11 07:31:03 procutils.(*Command).Output(procutils.go:98)] Execute command "systemctl status telegraf" , error: exit status 4 , output: Unit telegraf.service could not be found.
[debug 2025-01-11 07:31:03 procutils.(*Command).Output(procutils.go:98)] Execute command "systemctl is-enabled telegraf" , error: exit status 1 , output: Failed to get unit file state for telegraf.service: No such file or directory
[debug 2025-01-11 07:31:03 procutils.(*Command).Output(procutils.go:98)] Execute command "systemctl is-enabled openvswitch-switch" , error: exit status 1 , output: disabled
[debug 2025-01-11 07:31:03 procutils.(*Command).Output(procutils.go:98)] Execute command "systemctl is-enabled openvswitch-switch" , error: exit status 1 , output: disabled
[debug 2025-01-11 07:31:03 procutils.(*Command).Output(procutils.go:98)] Execute command "systemctl is-enabled openvswitch-switch" , error: exit status 1 , output: disabled
[error 2025-01-11 07:31:03 auth.(*authManager).startRefreshRevokeTokens(auth.go:193)] refreshRevokeTokens: No valid admin token credential
[debug 2025-01-11 07:31:04 procutils.(Command).Output(procutils.go:98)] Execute command "which python" , error: exit status 1 , output:
.......
processors":[174,366],"total_threads":2},{"id":15,"index":75,"logical_processors":[175,367],"total_threads":2},{"id":40,"index":76,"logical_processors":[176,368],"total_threads":2},{"id":41,"index":77,"logical_processors":[177,369],"total_threads":2},{"id":42,"index":78,"logical_processors":[178,370],"total_threads":2},{"id":43,"index":79,"logical_processors":[179,371],"total_threads":2},{"id":44,"index":80,"logical_processors":[180,372],"total_threads":2},{"id":45,"index":81,"logical_processors":[181,373],"total_threads":2},{"id":46,"index":82,"logical_processors":[182,374],"total_threads":2},{"id":47,"index":83,"logical_processors":[183,375],"total_threads":2},{"id":72,"index":84,"logical_processors":[184,376],"total_threads":2},{"id":73,"index":85,"logical_processors":[185,377],"total_threads":2},{"id":74,"index":86,"logical_processors":[186,378],"total_threads":2},{"id":75,"index":87,"logical_processors":[187,379],"total_threads":2},{"id":76,"index":88,"logical_processors":[188,380],"total_threads":2},{"id":77,"index":89,"logical_processors":[189,381],"total_threads":2},{"id":78,"index":90,"logical_processors":[190,382],"total_threads":2},{"id":79,"index":91,"logical_processors":[191,383],"total_threads":2},{"id":0,"index":92,"logical_processors":[288,96],"total_threads":2},{"id":1,"index":93,"logical_processors":[289,97],"total_threads":2},{"id":2,"index":94,"logical_processors":[290,98],"total_threads":2},{"id":3,"index":95,"logical_processors":[291,99],"total_threads":2}],"distances":[32,10],"id":1,"memory":{"supported_page_sizes":[1073741824,2097152],"total_physical_bytes":826781204480,"total_usable_bytes":811614797824}}]},"version":"03"},"version":"release/3.11.9(60778a6a7724122408)"}: {"error":{"class":"UnclassifiedError","code":500,"details":"TxExec: Error 1406: Data too long for column 'sys_info' at row 
1","request":{"body":"{"host":{"meta":{"cpu_info":"{\"processors\":[{\"capabilities\":[\"fpu\",\"vme\",\"de\",\"pse\",....9(60778a6a7724122408)"}}","headers":{"Content-Length":"169586","Content-Type":"application/json","User-Agent":"yunioncloud-go/201708","X-Auth-Token":"
","X-Yunion-Parent-Id":"","X-Yunion-Peer-Service-Name":"host","X-Yunion-Remote-Addr":"default-region:30888","X-Yunion-Span-Id":"0","X-Yunion-Span-Name":"","X-Yunion-Strace-Debug":"true","X-Yunion-Strace-Id":"c010d555"},"method":"POST","url":"https://default-region:30888/zones/21d90935-8db8-4f3a-87f0-60581c7e6052/hosts"}}}

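Most of the host log above is `debug` output from service probes, which makes the actual failures easy to miss. A minimal sketch of a filter that keeps only `error` and `warning` lines, assuming every log line follows the `[level date time location] message` shape seen above; the names `LOG_LINE`, `filter_log`, and `SAMPLE` are illustrative, and the sample lines are copied from the log:

```python
import re

# [level date time location] message
LOG_LINE = re.compile(
    r"^\[(?P<level>\w+) (?P<date>\S+) (?P<time>\S+) (?P<where>[^\]]+)\] ?(?P<msg>.*)$"
)


def filter_log(lines, levels=("error", "warning")):
    """Return (level, location, message) tuples for lines at the given levels."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        # Lines that do not match (e.g. wrapped JSON payloads) are skipped.
        if m and m.group("level") in levels:
            hits.append((m.group("level"), m.group("where"), m.group("msg")))
    return hits


SAMPLE = [
    "[warning 250111 07:30:43 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument start-host-ignore-sys-error",
    "[error 2025-01-11 07:31:02 fileutils2.GetAllBlkdevsIoSchedulers(fileutils.go:175)] no block device avaiable",
    '[debug 2025-01-11 07:31:03 procutils.(*Command).Run(procutils.go:89)] Execute command "systemctl cat -- openvswitch" , error: exit status 1',
    "[error 2025-01-11 07:31:03 auth.(*authManager).startRefreshRevokeTokens(auth.go:193)] refreshRevokeTokens: No valid admin token credential",
]

if __name__ == "__main__":
    for level, where, msg in filter_log(SAMPLE):
        print(f"{level:8} {where}: {msg}")
```

The final request failure in this log is wrapped across several lines and carries no level prefix of its own, so a filter like this narrows the log down but does not replace reading the tail of it.
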
callme80 added the question label Jan 11, 2025
@wanyaoqi
Member

TxExec: Error 1406: Data too long for column 'sys_info' at row 1

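The telegraf failure is most likely a consequence of the host service failing to start.
What is the hardware configuration of the host machine? Please post the output of lscpu.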

@callme80
Author

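> TxExec: Error 1406: Data too long for column 'sys_info' at row 1
>
> The telegraf failure is most likely a consequence of the host service failing to start. What is the hardware configuration of the host machine? Please post the output of lscpu.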

lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 57 bits virtual
CPU(s): 112
On-line CPU(s) list: 0-111
Thread(s) per core: 2
Core(s) per socket: 28
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 106
Model name: Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
Stepping: 6
CPU MHz: 800.068
BogoMIPS: 4000.00
Virtualization: VT-x
L1d cache: 2.6 MiB
L1i cache: 1.8 MiB
L2 cache: 70 MiB
L3 cache: 84 MiB
NUMA node0 CPU(s): 0-27,56-83
NUMA node1 CPU(s): 28-55,84-111
Vulnerability Gather data sampling: Mitigation; Microcode
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 invpcid_single ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local wbnoinvd dtherm ida arat pln pts hwp_epp avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq rdpid md_clear pconfig flush_l1d arch_capabilities

@wanyaoqi
Member

The sys_info column has been changed to use longtext: #22041
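
For context on why a wider column type resolves `Error 1406`: MySQL's text types have fixed maximum byte lengths, and in strict SQL mode an insert that exceeds the column's limit is rejected with this error. A minimal sketch comparing a payload size against those limits. The thread does not state the column's type before the fix, and the 169586 bytes used below is the `Content-Length` of the whole failed request, not the size of `sys_info` alone, so it serves only as a stand-in; `TEXT_LIMITS` and `fitting_types` are illustrative names:

```python
# Maximum lengths in bytes of MySQL's text column types.
TEXT_LIMITS = {
    "TINYTEXT": 2**8 - 1,     # 255
    "TEXT": 2**16 - 1,        # 65,535
    "MEDIUMTEXT": 2**24 - 1,  # 16,777,215
    "LONGTEXT": 2**32 - 1,    # 4,294,967,295
}


def fitting_types(payload: str):
    """Return the text types whose byte limit can hold the UTF-8 encoded payload."""
    size = len(payload.encode("utf-8"))
    return [name for name, limit in TEXT_LIMITS.items() if size <= limit]


if __name__ == "__main__":
    # Stand-in payload sized like the failed request body (an assumption,
    # see the paragraph above).
    payload = "x" * 169586
    print(fitting_types(payload))
```

With a payload of that size, only `MEDIUMTEXT` and `LONGTEXT` are large enough, which is consistent with the fix moving the column to `longtext`.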
