When reindexing, verify all fields may not work as intended #5840

bernisys · 2024-09-30T11:35:38Z

Describe the bug
We have a JunOS query that should return the SPU utilization. (You can check the file; I gave you and Sean the whole set of our queries and scripts. It is resource/snmp_queries/juniper_spu_all.xml.)
We have several cases where this query fails on newly integrated devices, and I suspect it has to do with the device having only a single SPU installed. When spine does the re-caching checks, it somehow fails to query the only sub-OID ".0" ("0" is the only object index in the SNMP tree; see the manual query below).

I have tested multiple spine versions. It started to fail completely in version 1.2.22, and we now use 1.2.27.
Up to 1.2.21 we still get the re-cache assert failure, but the device is not dropped completely.
Starting with 1.2.22, the whole device is marked with the "ignore" flag and polling stops for all other data sources as well.

In our case we use the index OID ".1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3", which is also marked as "walk" in the fields section (does spine even check this? Shouldn't it use "walk" by default when checking an index?). We also use the OID parser to determine the index from the OID itself, even though the manual suggests that Cacti can determine this on its own.

Here's the output of a snmpwalk on the tree .1.3.6.1.4.1.2636.3.39.1.12.1.1.1

.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.2.0 = Gauge32: 0
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3.0 = Gauge32: 0
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.4.0 = Gauge32: 0
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.5.0 = Gauge32: 43
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.6.0 = Gauge32: 50
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.7.0 = Gauge32: 62914560
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.8.0 = Gauge32: 0
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.9.0 = Gauge32: 0
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.10.0 = Gauge32: 0
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.11.0 = STRING: "single"
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.12.0 = Gauge32: 50
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.13.0 = Gauge32: 0
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.14.0 = Gauge32: 0
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.15.0 = Gauge32: 0
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.16.0 = Gauge32: 24

Remember, "0" is the only instance - so it might look to cacti as if this is not a table, but a bunch of single objects.
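For illustration, here is a minimal Python sketch of deriving the index from the OID itself, which is what the OID parser mentioned above is meant to do. The regular expression, the helper names, and the three sample rows are assumptions made for this example; this is not Cacti's or spine's actual code.

```python
import re
from typing import Dict, List

# Three rows copied from the snmpwalk output above (OID -> value).
WALK_ROWS: Dict[str, str] = {
    ".1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3.0": "0",
    ".1.3.6.1.4.1.2636.3.39.1.12.1.1.1.5.0": "43",
    ".1.3.6.1.4.1.2636.3.39.1.12.1.1.1.11.0": "single",
}

# Hypothetical pattern: treat the last sub-identifier as the index.
INDEX_PATTERN = re.compile(r"\.(\d+)$")


def index_from_oid(oid: str) -> str:
    """Return the trailing sub-identifier of an OID as the row index."""
    match = INDEX_PATTERN.search(oid)
    if match is None:
        raise ValueError(f"no index found in OID {oid!r}")
    return match.group(1)


def indexes_under(column_oid: str, rows: Dict[str, str]) -> List[str]:
    """Collect the indexes of all rows that sit below one column OID."""
    prefix = column_oid.rstrip(".") + "."
    return sorted({index_from_oid(oid) for oid in rows if oid.startswith(prefix)})


INDEX_COLUMN = ".1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3"
print(indexes_under(INDEX_COLUMN, WALK_ROWS))  # prints ['0']
```

With only one SPU, the index list has the single entry "0", which is a valid table with one row rather than a set of scalar objects.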

And here is what cacti does once it reaches the "SPU" query ("interfaces" is re-cached perfectly fine!):

1727694394.408670 Total[0.4960] Device[5224] DEBUG: snmp_pdu_create(1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3)
1727694394.408704 Total[0.4960] Device[5224] DEBUG: snmp_pdu_create(1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3) [complete]
1727694394.408738 Total[0.4960] Device[5224] DEBUG: snmp_parse_oid(1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3)
1727694394.408771 Total[0.4960] Device[5224] DEBUG: snmp_parse_oid(1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3) [complete]
1727694394.408805 Total[0.4960] Device[5224] DEBUG: snmp_add_null_var(1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3)
1727694394.408838 Total[0.4960] Device[5224] DEBUG: snmp_add_null_var(1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3) [complete]
1727694394.408872 Total[0.4960] Device[5224] DEBUG: snmp_sess_sync_response(1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3)
1727694394.408905 Total[0.4966] Device[5224] DEBUG: snmp_sess_sync_response(1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3) [complete]
**1727694394.408939 Total[0.4966] ERROR: No such Instance for oid '1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3' for Device[5224] with Status[1]**
**1727694394.408972 Total[0.4966] Device[5224] HT[1] DQ[8] RECACHE OID: 1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3, (assert: 0 = output: U)**
**1727694394.409006 Total[0.4966] Device[5224] HT[1] DQ[8] RECACHE ASSERT FAILED: '0=U'**
1727694394.409040 Total[0.4966] WARNING: Skipped oid '.1.3.6.1.4.1.2636.3.39.1.12.1.4.1.1' for Device[5224] as host ignore flag is active
1727694394.409073 Total[0.4966] Device[5224] HT[1] DQ[9] RECACHE OID: .1.3.6.1.4.1.2636.3.39.1.12.1.4.1.1, (assert: 0 = output: (null))
1727694394.409106 Total[0.4966] Device[5224] HT[1] DQ[9] RECACHE ASSERT FAILED: '0=(null)'

Expected behavior

A subtree should be parsable even if the objects in the tree end in .0.
Spine should not mark the whole device as failing if one query fails; it should continue with the queries that can be run.
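The second expectation can be sketched as follows: each data query's re-cache check is evaluated on its own, and a failure is recorded for that query only. This is a hypothetical Python illustration, not spine's implementation; the class names, the `fetch` callback, and the fake responses (including the value returned for DQ 9) are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class RecacheCheck:
    query_id: int   # data query id, e.g. the 8 in DQ[8]
    oid: str        # OID whose value is compared against the cache
    expected: str   # cached (asserted) value


@dataclass
class RecacheResult:
    passed_queries: List[int] = field(default_factory=list)
    failed_queries: List[int] = field(default_factory=list)


def run_recache_checks(
    checks: List[RecacheCheck],
    fetch: Callable[[str], Optional[str]],
) -> RecacheResult:
    """Evaluate every check; a failure affects only its own data query."""
    result = RecacheResult()
    for check in checks:
        actual = fetch(check.oid)  # None means the agent had no such instance
        if actual == check.expected:
            result.passed_queries.append(check.query_id)
        else:
            result.failed_queries.append(check.query_id)
    return result


# Hypothetical responses: the first OID is missing, the second answers "0".
FAKE_RESPONSES: Dict[str, str] = {
    ".1.3.6.1.4.1.2636.3.39.1.12.1.4.1.1": "0",
}

checks = [
    RecacheCheck(8, "1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3", "0"),
    RecacheCheck(9, ".1.3.6.1.4.1.2636.3.39.1.12.1.4.1.1", "0"),
]
outcome = run_recache_checks(checks, FAKE_RESPONSES.get)
print(outcome.failed_queries, outcome.passed_queries)  # prints [8] [9]
```

In this sketch the failing query 8 does not prevent query 9 from being checked, in contrast to the log above where DQ[9] is skipped once the ignore flag is set.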

Server (please complete the following information):

OS: RHEL 7.9
Version 1.2.27

Compiling (please complete the following information):

compiler: gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39)
autoconf: GNU Autoconf 2.69.
glibc: glibc-2.17-260.el7_6.3
source: release 1.2.27 (starts from 1.2.22)

Additional context
Logs can be provided on demand.


TheWitness · 2024-09-30T13:27:41Z

Upload the resource XML file.

bernisys · 2024-09-30T14:51:08Z

As mentioned, that's in the pack I uploaded a while ago, but here we go ...
juniper_spu_all.zip

TheWitness · 2024-09-30T15:43:50Z

@bernisys, what happens when you run the command below?

snmpwalk -c blah -v blah_blah 1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3 blahhostname

bernisys · 2024-09-30T17:29:11Z

Well, see the snmpwalk output above for the OID without .3 at the end.
It returns the ...1.3.0 OID with its respective value, as one would expect.

in other words: .1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3.0 = Gauge32: 0

TheWitness · 2024-09-30T18:50:58Z

That's what I would have expected. I'll have to take a look. This could be an issue in the core of the Cacti code. But the fact that it returned "No such Instance" means that it was likely using an snmpget, which makes no sense given the XML file.
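To make the get-versus-walk distinction concrete, here is a small self-contained Python simulation. It does not talk to a real device or use any SNMP library; the in-memory `AGENT` table, the function names, and the rendering of a missing value as "U" are assumptions modelled on the snmpwalk output and the log lines earlier in this issue.

```python
from typing import Dict, List, Optional, Tuple

Oid = Tuple[int, ...]


def parse_oid(text: str) -> Oid:
    """Turn a dotted OID string (leading dot optional) into a tuple of ints."""
    return tuple(int(part) for part in text.strip(".").split("."))


# Simulated agent holding three instances from the snmpwalk output.
AGENT: Dict[Oid, str] = {
    parse_oid(".1.3.6.1.4.1.2636.3.39.1.12.1.1.1.2.0"): "0",
    parse_oid(".1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3.0"): "0",
    parse_oid(".1.3.6.1.4.1.2636.3.39.1.12.1.1.1.5.0"): "43",
}


def simulated_get(oid: str) -> Optional[str]:
    """A GET only answers for an exact instance; None stands for 'no such instance'."""
    return AGENT.get(parse_oid(oid))


def simulated_walk(oid: str) -> List[Tuple[Oid, str]]:
    """A walk returns every instance located below the given OID."""
    base = parse_oid(oid)
    return [
        (key, value)
        for key, value in sorted(AGENT.items())
        if len(key) > len(base) and key[: len(base)] == base
    ]


def assert_matches(cached: str, polled: Optional[str]) -> bool:
    """Compare the cached value with the polled one; a missing value becomes 'U'."""
    return cached == ("U" if polled is None else polled)


COLUMN = "1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3"
via_get = simulated_get(COLUMN)    # column OID without the ".0" instance
via_walk = simulated_walk(COLUMN)  # finds the single ".0" row

print(assert_matches("0", via_get))         # prints False, like '0=U' in the log
print(assert_matches("0", via_walk[0][1]))  # prints True
```

A GET on the column OID has no instance to return, so the comparison ends up as '0=U', whereas a walk of the same OID finds the `.0` row and the comparison succeeds.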

bernisys · 2024-10-01T09:49:58Z

Exactly what I thought as well. At least for the index OID it should use "walk" by default, to avoid this kind of issue.
I also thought it smelled like Cacti was running an snmpget instead of a walk, but I wasn't sure, so I hadn't spelled it out yet.

bernisys · 2024-10-01T20:50:16Z

confirming: Fixed now.

TheWitness · 2024-10-02T12:59:15Z

Thanks @bernisys !

TheWitness transferred this issue from Cacti/spine Oct 1, 2024

TheWitness changed the title ~~Spine 1.2.22-27 fails a query during re-caching, setting the "ignore device" flag, which suppresses the polling completely~~ Reindex Method 'Verify all Fields' does not work for OID/INDEX parse method Oct 1, 2024

TheWitness added a commit that referenced this issue Oct 1, 2024

Fixing #5840 - Verify All Fields Not working with OID/INDEX parse method

20e8a73

TheWitness added this to the 1.2.28 milestone Oct 1, 2024

TheWitness added the labels bug (Undesired behaviour), resolved (A fixed issue), and confirmed (Bug is confirm by dev team) Oct 1, 2024

TheWitness added a commit that referenced this issue Oct 1, 2024

Fixing #5840 - Verify All Fields Not working with OID/INDEX parse method

493400c

TheWitness closed this as completed Oct 2, 2024

netniV changed the title ~~Reindex Method 'Verify all Fields' does not work for OID/INDEX parse method~~ When reindexing, verify all fields may not work as intended Oct 6, 2024
