could not parse cibadmin status from XML: strconv.ParseInt: parsing \"0s\": invalid syntax #138

Closed
lotusnoir opened this issue Mar 4, 2020 · 6 comments · Fixed by #140
Labels
bug Something isn't working good first issue Good for newcomers

Comments

@lotusnoir

Hello, and thanks for your work.
I get this message in the logs when I try to get the metrics from cibadmin. Here is the XML:

<cib crm_feature_set="3.1.0" validate-with="pacemaker-3.2" epoch="32" num_updates="1" admin_epoch="0" cib-last-written="Wed Mar 4 14:55:08 2020" update-origin="test1.test.com" update-client="crm_attribute" update-user="root" have-quorum="1" dc-uuid="2"> <configuration> <crm_config> <cluster_property_set id="cib-bootstrap-options"> <nvpair id="cib-bootstrap-options-have-watchdog" name="have-watchdog" value="false"/> <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="2.0.1-9e909a5bdd"/> <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/> <nvpair id="cib-bootstrap-options-cluster-name" name="cluster-name" value="debian"/> <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/> <nvpair id="cib-bootstrap-options-start-failure-is-fatal" name="start-failure-is-fatal" value="false"/> <nvpair id="cib-bootstrap-options-pe-warn-series-max" name="pe-warn-series-max" value="1000"/> <nvpair id="cib-bootstrap-options-pe-input-series-max" name="pe-input-series-max" value="1000"/> <nvpair id="cib-bootstrap-options-pe-error-series-max" name="pe-error-series-max" value="1000"/> <nvpair id="cib-bootstrap-options-cluster-recheck-interval" name="cluster-recheck-interval" value="5min"/> </cluster_property_set> </crm_config> <nodes> <node id="1" uname="test2.test.com"> <instance_attributes id="nodes-1"> <nvpair id="nodes-1-master-postgresql_service" name="master-postgresql_service" value="-1000"/> <nvpair id="nodes-1-standby" name="standby" value="on"/> </instance_attributes> </node> <node id="2" uname="test1.test.com"> <instance_attributes id="nodes-2"> <nvpair id="nodes-2-master-postgresql_service" name="master-postgresql_service" value="1001"/> <nvpair id="nodes-2-standby" name="standby" value="off"/> </instance_attributes> </node> </nodes> <resources> <primitive class="ocf" id="postgresql_vip" provider="heartbeat" type="IPaddr2"> <instance_attributes 
id="postgresql_vip-instance_attributes"> <nvpair id="postgresql_vip-instance_attributes-cidr_netmask" name="cidr_netmask" value="24"/> <nvpair id="postgresql_vip-instance_attributes-ip" name="ip" value="10.64.37.202"/> </instance_attributes> <operations> <op id="postgresql_vip-monitor-interval-30" interval="30" name="monitor"/> <op id="postgresql_vip-start-interval-0s" interval="0s" name="start" timeout="20s"/> <op id="postgresql_vip-stop-interval-0s" interval="0s" name="stop" timeout="20s"/> </operations> </primitive> <clone id="postgresql_service-clone"> <primitive class="ocf" id="postgresql_service" provider="heartbeat" type="pgsqlms"> <instance_attributes id="postgresql_service-instance_attributes"> <nvpair id="postgresql_service-instance_attributes-bindir" name="bindir" value="/usr/lib/postgresql/11/bin"/> <nvpair id="postgresql_service-instance_attributes-datadir" name="datadir" value="/var/lib/postgresql/11/main"/> <nvpair id="postgresql_service-instance_attributes-pgdata" name="pgdata" value="/etc/postgresql/11/main"/> <nvpair id="postgresql_service-instance_attributes-pghost" name="pghost" value="/var/run/postgresql"/> <nvpair id="postgresql_service-instance_attributes-recovery_template" name="recovery_template" value="/etc/postgresql/11/main/recovery.conf.pcmk"/> </instance_attributes> <operations> <op id="postgresql_service-demote-interval-0s" interval="0s" name="demote" timeout="120s"/> <op id="postgresql_service-methods-interval-0s" interval="0s" name="methods" timeout="5"/> <op id="postgresql_service-monitor-interval-15s" interval="15s" name="monitor" role="Master" timeout="10s"/> <op id="postgresql_service-monitor-interval-16s" interval="16s" name="monitor" role="Slave" timeout="10s"/> <op id="postgresql_service-notify-interval-0s" interval="0s" name="notify" timeout="60s"/> <op id="postgresql_service-promote-interval-0s" interval="0s" name="promote" timeout="30s"/> <op id="postgresql_service-reload-interval-0s" interval="0s" name="reload" 
timeout="20"/> <op id="postgresql_service-start-interval-0s" interval="0s" name="start" timeout="60s"/> <op id="postgresql_service-stop-interval-0s" interval="0s" name="stop" timeout="60s"/> </operations> </primitive> <meta_attributes id="postgresql_service-clone-meta_attributes"> <nvpair id="postgresql_service-clone-meta_attributes-master-max" name="master-max" value="1"/> <nvpair id="postgresql_service-clone-meta_attributes-notify" name="notify" value="true"/> <nvpair id="postgresql_service-clone-meta_attributes-promotable" name="promotable" value="true"/> </meta_attributes> </clone> </resources> <constraints> <rsc_colocation id="colocation-postgresql_vip-postgresql_service-clone-INFINITY" rsc="postgresql_vip" rsc-role="Started" score="INFINITY" with-rsc="postgresql_service-clone" with-rsc-role="Master"/> <rsc_order first="postgresql_service-clone" first-action="promote" id="order-postgresql_service-clone-postgresql_vip-Mandatory" kind="Mandatory" symmetrical="false" then="postgresql_vip" then-action="start"/> <rsc_order first="postgresql_service-clone" first-action="demote" id="order-postgresql_service-clone-postgresql_vip-Mandatory-1" kind="Mandatory" symmetrical="false" then="postgresql_vip" then-action="stop"/> </constraints> <rsc_defaults> <meta_attributes id="rsc_defaults-options"> <nvpair id="rsc_defaults-options-migration-threshold" name="migration-threshold" value="5"/> <nvpair id="rsc_defaults-options-resource-stickiness" name="resource-stickiness" value="10"/> </meta_attributes> </rsc_defaults> </configuration> <status> <node_state id="1" uname="test2.test.com" in_ccm="true" crmd="online" crm-debug-origin="do_update_resource" join="member" expected="member"> <lrm id="1"> <lrm_resources> <lrm_resource id="postgresql_vip" type="IPaddr2" class="ocf" provider="heartbeat"> <lrm_rsc_op id="postgresql_vip_last_0" operation_key="postgresql_vip_stop_0" operation="stop" crm-debug-origin="do_update_resource" crm_feature_set="3.1.0" 
transition-key="4:8:0:7d67f088-8059-4710-b2cc-c1f7e7f58d4c" transition-magic="0:0;4:8:0:7d67f088-8059-4710-b2cc-c1f7e7f58d4c" exit-reason="" on_node="test2.test.com" call-id="34" rc-code="0" op-status="0" interval="0" last-run="1583330101" last-rc-change="1583330101" exec-time="69" queue-time="0" op-digest="7ac5fbe03467e5fe49c26f352cabea2a"/> <lrm_rsc_op id="postgresql_vip_monitor_30000" operation_key="postgresql_vip_monitor_30000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.1.0" transition-key="6:4:0:7d67f088-8059-4710-b2cc-c1f7e7f58d4c" transition-magic="0:0;6:4:0:7d67f088-8059-4710-b2cc-c1f7e7f58d4c" exit-reason="" on_node="test2.test.com" call-id="26" rc-code="0" op-status="0" interval="30000" last-rc-change="1583330077" exec-time="49" queue-time="0" op-digest="b89b20ea44fe278083df720fbd3b0099"/> </lrm_resource> <lrm_resource id="postgresql_service" type="pgsqlms" class="ocf" provider="heartbeat"> <lrm_rsc_op id="postgresql_service_last_0" operation_key="postgresql_service_stop_0" operation="stop" crm-debug-origin="do_update_resource" crm_feature_set="3.1.0" transition-key="8:8:0:7d67f088-8059-4710-b2cc-c1f7e7f58d4c" transition-magic="0:0;8:8:0:7d67f088-8059-4710-b2cc-c1f7e7f58d4c" exit-reason="" on_node="test2.test.com" call-id="36" rc-code="0" op-status="0" interval="0" last-run="1583330101" last-rc-change="1583330101" exec-time="5280" queue-time="0" op-digest="280f07cd272b6d3a9e31773b252b9d6c" op-force-restart=" recovery_template datadir pgdata " op-restart-digest="0b2f08977172ea6234e59a5750d2611c"/> <lrm_rsc_op id="postgresql_service_monitor_15000" operation_key="postgresql_service_monitor_15000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.1.0" transition-key="10:4:8:7d67f088-8059-4710-b2cc-c1f7e7f58d4c" transition-magic="0:8;10:4:8:7d67f088-8059-4710-b2cc-c1f7e7f58d4c" exit-reason="" on_node="test2.test.com" call-id="25" rc-code="8" op-status="0" interval="15000" last-rc-change="1583330078" 
exec-time="323" queue-time="0" op-digest="ae92b0e10f1578a9b61d24b7bff23308"/> </lrm_resource> </lrm_resources> </lrm> </node_state> <node_state id="2" uname="test1.test.com" in_ccm="true" crmd="online" crm-debug-origin="do_update_resource" join="member" expected="member"> <lrm id="2"> <lrm_resources> <lrm_resource id="postgresql_vip" type="IPaddr2" class="ocf" provider="heartbeat"> <lrm_rsc_op id="postgresql_vip_last_0" operation_key="postgresql_vip_start_0" operation="start" crm-debug-origin="do_update_resource" crm_feature_set="3.1.0" transition-key="5:8:0:7d67f088-8059-4710-b2cc-c1f7e7f58d4c" transition-magic="0:0;5:8:0:7d67f088-8059-4710-b2cc-c1f7e7f58d4c" exit-reason="" on_node="test1.test.com" call-id="38" rc-code="0" op-status="0" interval="0" last-run="1583330108" last-rc-change="1583330108" exec-time="73" queue-time="1" op-digest="7ac5fbe03467e5fe49c26f352cabea2a"/> <lrm_rsc_op id="postgresql_vip_monitor_30000" operation_key="postgresql_vip_monitor_30000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.1.0" transition-key="6:8:0:7d67f088-8059-4710-b2cc-c1f7e7f58d4c" transition-magic="0:0;6:8:0:7d67f088-8059-4710-b2cc-c1f7e7f58d4c" exit-reason="" on_node="test1.test.com" call-id="40" rc-code="0" op-status="0" interval="30000" last-rc-change="1583330108" exec-time="56" queue-time="0" op-digest="b89b20ea44fe278083df720fbd3b0099"/> </lrm_resource> <lrm_resource id="postgresql_service" type="pgsqlms" class="ocf" provider="heartbeat"> <lrm_rsc_op id="postgresql_service_last_0" operation_key="postgresql_service_promote_0" operation="promote" crm-debug-origin="do_update_resource" crm_feature_set="3.1.0" transition-key="11:8:0:7d67f088-8059-4710-b2cc-c1f7e7f58d4c" transition-magic="0:0;11:8:0:7d67f088-8059-4710-b2cc-c1f7e7f58d4c" exit-reason="" on_node="test1.test.com" call-id="36" rc-code="0" op-status="0" interval="0" last-run="1583330107" last-rc-change="1583330107" exec-time="453" queue-time="0" 
op-digest="280f07cd272b6d3a9e31773b252b9d6c" op-force-restart=" recovery_template datadir pgdata " op-restart-digest="0b2f08977172ea6234e59a5750d2611c"/> <lrm_rsc_op id="postgresql_service_monitor_15000" operation_key="postgresql_service_monitor_15000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.1.0" transition-key="12:8:8:7d67f088-8059-4710-b2cc-c1f7e7f58d4c" transition-magic="0:8;12:8:8:7d67f088-8059-4710-b2cc-c1f7e7f58d4c" exit-reason="" on_node="test1.test.com" call-id="39" rc-code="8" op-status="0" interval="15000" last-rc-change="1583330108" exec-time="327" queue-time="0" op-digest="ae92b0e10f1578a9b61d24b7bff23308"/> </lrm_resource> </lrm_resources> </lrm> </node_state> </status> </cib>

I tried to redeploy after changing the time values I set myself (for example, from 30s to 30), but there are always the default ones, which I cannot set without the "s".
How can I fix that?

@MalloZup
Contributor

MalloZup commented Mar 4, 2020

@lotusnoir hi! Thanks for the issue!

Could you be more precise about your issue? On which OS and version are you running the exporter? And which packages are running on this host (Pacemaker etc., with versions)?

So, if I understand correctly, you run the exporter and you get this message in your logs.

Which metric are you looking for, and why do you want to change the seconds? TIA, and feel free to ping me.

@stefanotorresi
Member

stefanotorresi commented Mar 4, 2020

@MalloZup I think this is a problem with these two fields:

Interval int `xml:"interval,attr"`
Timeout int `xml:"timeout,attr"`

These are the only ones we try to parse as int, but if the input is something like 30s, Go cannot unmarshal it.
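The failure can be reproduced in isolation. The following is a minimal sketch, not the exporter's actual code: the `op` struct and the `parseOp` helper are hypothetical names, and only the two field declarations quoted above are taken from the project. With `int` fields, the second call is expected to return a `strconv.ParseInt` error of the kind reported in the issue title.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// op is a hypothetical struct that mirrors only the two attributes
// discussed above; the real exporter struct has more fields.
type op struct {
	Interval int `xml:"interval,attr"`
	Timeout  int `xml:"timeout,attr"`
}

// parseOp (hypothetical helper) unmarshals a single <op> element
// into the int-typed struct and returns any decoding error.
func parseOp(data string) (op, error) {
	var o op
	err := xml.Unmarshal([]byte(data), &o)
	return o, err
}

func main() {
	// Plain integers decode without error.
	_, err := parseOp(`<op interval="30" timeout="20"/>`)
	fmt.Println("plain integers:", err)

	// A unit suffix such as "0s" cannot be converted to an int,
	// so encoding/xml reports the strconv error.
	_, err = parseOp(`<op interval="0s" timeout="20s"/>`)
	fmt.Println("with unit suffix:", err)
}
```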

@MalloZup
Contributor

MalloZup commented Mar 5, 2020

OK, but I still don't get the context of the problem and the user's issue.

@stefanotorresi
Member

stefanotorresi commented Mar 5, 2020

We just need to use string as the type for those two fields, instead of int. We don't actually use them anywhere at the moment, so no further processing is needed. In the future, we might need to convert strings like "30s" to the proper time duration type.

@lotusnoir
Author

lotusnoir commented Mar 6, 2020

Hello, sorry for the delay.
I'm running on Debian Buster:

ii corosync 3.0.1-2 amd64 cluster engine daemon and utilities
ii crmsh 4.0.0~git20190108.3d56538-3 all CRM shell for the pacemaker cluster manager
ii libcorosync-common4:amd64 3.0.1-2 amd64 cluster engine common library
ii pacemaker 2.0.1-5 amd64 cluster resource manager
ii pacemaker-cli-utils 2.0.1-5 amd64 cluster resource manager command line utilities
ii pacemaker-common 2.0.1-5 all cluster resource manager common files
ii pacemaker-resource-agents 2.0.1-5 all cluster resource manager general resource agents
ii pcs 0.10.1-2 all Pacemaker Configuration System
ii resource-agents 1:4.2.0-2 amd64 Cluster Resource Agents
ii resource-agents-paf 2.2.1-1 all PostgreSQL resource agent for Pacemaker

After starting the process, I get these warnings in syslog:

Mar 6 16:27:55 test.test.com ha_cluster_exporter[15181]: time="2020-03-06T16:27:55+01:00" level=warning msg="could not parse cibadmin status from XML: strconv.ParseInt: parsing "30s": invalid syntax"
Mar 6 16:28:10 horus-pgsql-01.prod.kosc.net ha_cluster_exporter[15181]: time="2020-03-06T16:28:10+01:00" level=warning msg="Error while retrieving drbd infos exit status 20"
Mar 6 16:28:10 test.test.com ha_cluster_exporter[15181]: time="2020-03-06T16:28:10+01:00" level=warning msg="cannot parse ring status: corosync-cfgtool returned unexpected output: Printing link status.\nLocal node ID 1\nLINK ID 0\n\taddr\t= 10.64.37.110\n\tstatus\t= OK\n"

I'm looking to get the Pacemaker metrics in order to check the status of the cluster.
I think @stefanotorresi is right: just changing the type of these two fields should resolve the issue.

When I try to make that change, the checks fail:

--- FAIL: TestParse (0.01s)
parser_test.go:37:
Error Trace: parser_test.go:37
Error: Not equal:
expected: int(0)
actual : string("0")
Test: TestParse
parser_test.go:38:
Error Trace: parser_test.go:38
Error: Not equal:
expected: int(3600)
actual : string("3600")
Test: TestParse
parser_test.go:41:
......

Regards
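The failure output above suggests the existing assertions still compare against `int` values, so once the fields become `string` the expected values need to change type too. Here is a minimal, standard-library-only sketch of that idea. The `op` struct, the `checkOp` helper, and the XML fixture are all hypothetical; only the values `0` and `3600` come from the output above, and which attributes they belong to in the real test is an assumption.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// op is a hypothetical struct with string-typed fields.
type op struct {
	Interval string `xml:"interval,attr"`
	Timeout  string `xml:"timeout,attr"`
}

// checkOp (hypothetical helper) decodes one <op> element and compares
// both attributes against string expectations, returning a descriptive
// error on the first mismatch, much like a test assertion would.
func checkOp(data, wantInterval, wantTimeout string) error {
	var o op
	if err := xml.Unmarshal([]byte(data), &o); err != nil {
		return err
	}
	if o.Interval != wantInterval {
		return fmt.Errorf("interval: expected %q, actual %q", wantInterval, o.Interval)
	}
	if o.Timeout != wantTimeout {
		return fmt.Errorf("timeout: expected %q, actual %q", wantTimeout, o.Timeout)
	}
	return nil
}

func main() {
	// Assumed fixture: expectations are now the strings "0" and "3600"
	// rather than the ints 0 and 3600.
	err := checkOp(`<op interval="0" timeout="3600"/>`, "0", "3600")
	fmt.Println("check result:", err)
}
```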

@MalloZup
Contributor

MalloZup commented Mar 6, 2020

thx @lotusnoir for info.
