When reindexing, verify all fields may not work as intended #5840

Closed
bernisys opened this issue Sep 30, 2024 · 8 comments
Labels
bug Undesired behaviour confirmed Bug is confirm by dev team resolved A fixed issue
Comments

@bernisys
Contributor

Describe the bug
We have a JunOS query that should return the SPU utilization (you can check the file; I gave you and Sean the whole bunch of our queries and scripts. It is resource/snmp_queries/juniper_spu_all.xml).
We have several cases where this query fails on newly integrated devices, and I suspect it has to do with the device having only a single SPU installed. When spine does the re-caching checks, it somehow fails to query the only sub-OID ".0" ("0" is the only object index in the SNMP tree; see the manual query below).

I have tested multiple spine versions. It started to fail completely in version 1.2.22, and we now use 1.2.27.
Up to 1.2.21 we still get the re-cache assert failure, but the device is not dropped completely.
Starting with 1.2.22, the whole device is marked with the "ignore" flag and polling stops for all other data sources as well.

In our case we use the index OID ".1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3", which is also marked as "walk" below, in the fields section (does spine even check this? Shouldn't it use "walk" by default when checking an index?). We also use the OID parser to determine the index from the OID itself, even though the manual suggests that Cacti can determine this on its own.

Here's the output of a snmpwalk on the tree .1.3.6.1.4.1.2636.3.39.1.12.1.1.1

.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.2.0 = Gauge32: 0
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3.0 = Gauge32: 0
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.4.0 = Gauge32: 0
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.5.0 = Gauge32: 43
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.6.0 = Gauge32: 50
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.7.0 = Gauge32: 62914560
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.8.0 = Gauge32: 0
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.9.0 = Gauge32: 0
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.10.0 = Gauge32: 0
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.11.0 = STRING: "single"
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.12.0 = Gauge32: 50
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.13.0 = Gauge32: 0
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.14.0 = Gauge32: 0
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.15.0 = Gauge32: 0
.1.3.6.1.4.1.2636.3.39.1.12.1.1.1.16.0 = Gauge32: 24

Remember, "0" is the only instance - so it might look to cacti as if this is not a table, but a bunch of single objects.
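
To make the index parsing concrete, here is a minimal Python sketch of how an index can be derived from an OID returned by a walk of the index column. It is only an illustration: `parse_index` and the regular expression are hypothetical and are not taken from juniper_spu_all.xml or from Cacti's code. The only data used is the index OID and the walk row quoted above.

```python
import re
from typing import Optional

# Index column OID as quoted in this issue.
INDEX_OID = ".1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3"


def parse_index(full_oid: str, base_oid: str = INDEX_OID) -> Optional[str]:
    """Return the instance suffix below base_oid, or None if full_oid is not below it.

    Hypothetical helper: it mimics the idea of parsing the index out of the OID,
    not the actual pattern used in the data query XML.
    """
    pattern = re.compile(r"^" + re.escape(base_oid) + r"\.(\d+(?:\.\d+)*)$")
    match = pattern.match(full_oid)
    return match.group(1) if match else None


# The single row a walk of the index column returns on the affected device.
walk_rows = [".1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3.0"]
indexes = [parse_index(oid) for oid in walk_rows]
print(indexes)  # ['0']
```

Under this sketch a device with a single SPU yields exactly one index, "0", which is a perfectly valid one-row table as long as the column is walked rather than fetched directly.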

And here is what cacti does once it reaches the "SPU" query ("interfaces" is re-cached perfectly fine!):

1727694394.408670 Total[0.4960] Device[5224] DEBUG: snmp_pdu_create(1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3)
1727694394.408704 Total[0.4960] Device[5224] DEBUG: snmp_pdu_create(1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3) [complete]
1727694394.408738 Total[0.4960] Device[5224] DEBUG: snmp_parse_oid(1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3)
1727694394.408771 Total[0.4960] Device[5224] DEBUG: snmp_parse_oid(1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3) [complete]
1727694394.408805 Total[0.4960] Device[5224] DEBUG: snmp_add_null_var(1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3)
1727694394.408838 Total[0.4960] Device[5224] DEBUG: snmp_add_null_var(1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3) [complete]
1727694394.408872 Total[0.4960] Device[5224] DEBUG: snmp_sess_sync_response(1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3)
1727694394.408905 Total[0.4966] Device[5224] DEBUG: snmp_sess_sync_response(1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3) [complete]
**1727694394.408939 Total[0.4966] ERROR: No such Instance for oid '1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3' for Device[5224] with Status[1]**
**1727694394.408972 Total[0.4966] Device[5224] HT[1] DQ[8] RECACHE OID: 1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3, (assert: 0 = output: U)**
**1727694394.409006 Total[0.4966] Device[5224] HT[1] DQ[8] RECACHE ASSERT FAILED: '0=U'**
1727694394.409040 Total[0.4966] WARNING: Skipped oid '.1.3.6.1.4.1.2636.3.39.1.12.1.4.1.1' for Device[5224] as host ignore flag is active
1727694394.409073 Total[0.4966] Device[5224] HT[1] DQ[9] RECACHE OID: .1.3.6.1.4.1.2636.3.39.1.12.1.4.1.1, (assert: 0 = output: (null))
1727694394.409106 Total[0.4966] Device[5224] HT[1] DQ[9] RECACHE ASSERT FAILED: '0=(null)'

Expected behavior

  • A subtree should be parsable even if the objects in the tree end in .0
  • spine should not mark the whole device as failed if one query fails; it should continue with the queries that can be run
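
The second expectation can be modelled roughly as follows. This is a Python sketch of the requested behaviour only, not spine's implementation (spine is written in C); `DataQuery`, `PollResult`, `poll_device` and the `recache_ok` flag are hypothetical names, and the "cpu" query is invented for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataQuery:
    name: str
    recache_ok: bool  # stand-in for the outcome of the re-cache assert


@dataclass
class PollResult:
    polled: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)


def poll_device(queries: List[DataQuery]) -> PollResult:
    """Skip only the data queries whose re-cache assert failed; poll the rest."""
    result = PollResult()
    for query in queries:
        if not query.recache_ok:
            # Isolate the failure to this query instead of flagging the device.
            result.skipped.append(query.name)
            continue
        result.polled.append(query.name)
    return result


result = poll_device([
    DataQuery("interfaces", True),
    DataQuery("spu", False),
    DataQuery("cpu", True),
])
print(result.polled, result.skipped)  # ['interfaces', 'cpu'] ['spu']
```

The point of the sketch is that a failed assert affects one entry in the loop, whereas the log above shows the ignore flag causing every later OID on the device to be skipped.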

Server (please complete the following information):

  • OS: RHEL 7.9
  • Version 1.2.27

Compiling (please complete the following information):

  • compiler: gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39)
  • autoconf: GNU Autoconf 2.69.
  • glibc: glibc-2.17-260.el7_6.3
  • source: release 1.2.27 (starts from 1.2.22)

Additional context
Logs can be provided on demand.

@TheWitness
Member

Upload the resource XML file.

@bernisys
Contributor Author

As mentioned, that's in the pack I uploaded a while ago, but here we go ...
juniper_spu_all.zip

@TheWitness
Member

@bernisys, what happens when you run the command below?

snmpwalk -c blah -v blah_blah 1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3 blahhostname

@bernisys
Contributor Author

Well, see the snmpwalk output above for the OID without .3 at the end.
It returns the ...1.3.0 OID with its respective value, as one would expect.

in other words: .1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3.0 = Gauge32: 0

@TheWitness
Member

That's what I would have expected. I'll have to take a look. This could be an issue in the core of the Cacti code. But the fact that it returned "No such Instance" means that it was likely using an snmpget, which makes no sense looking at the XML file.
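
The difference between the two request types can be illustrated with a small in-memory model. This is a simplified Python sketch, not Net-SNMP or Cacti code: `snmp_get` and `snmp_walk` are hypothetical stand-ins, the MIB holds only three of the rows from the walk in the issue, and every missing OID is reported as "noSuchInstance" (a real agent may also answer noSuchObject).

```python
from bisect import bisect_right


def oid_key(oid: str) -> tuple:
    """Turn a dotted OID string into a tuple of ints so OIDs sort correctly."""
    return tuple(int(part) for part in oid.strip(".").split("."))


# A few rows from the walk quoted in the issue (all instances end in .0).
MIB = {
    oid_key(".1.3.6.1.4.1.2636.3.39.1.12.1.1.1.2.0"): 0,
    oid_key(".1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3.0"): 0,
    oid_key(".1.3.6.1.4.1.2636.3.39.1.12.1.1.1.4.0"): 0,
}
SORTED_KEYS = sorted(MIB)


def snmp_get(oid: str):
    """GET needs the exact instance OID; anything else has no value."""
    return MIB.get(oid_key(oid), "noSuchInstance")


def snmp_walk(oid: str) -> list:
    """Walk: repeatedly take the next OID while it is still below the root."""
    root = oid_key(oid)
    rows = []
    pos = bisect_right(SORTED_KEYS, root)  # first OID strictly after the root
    while pos < len(SORTED_KEYS) and SORTED_KEYS[pos][:len(root)] == root:
        rows.append((SORTED_KEYS[pos], MIB[SORTED_KEYS[pos]]))
        pos += 1
    return rows


COLUMN = ".1.3.6.1.4.1.2636.3.39.1.12.1.1.1.3"
print(snmp_get(COLUMN))         # noSuchInstance
print(snmp_get(COLUMN + ".0"))  # 0
print(len(snmp_walk(COLUMN)))   # 1
```

In this model a GET on the bare column OID fails while a walk of the same OID finds the single ".0" row, which matches the symptom in the log.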

@bernisys
Contributor Author

bernisys commented Oct 1, 2024

Exactly what I thought as well. At least for the index OID it should use "walk" by default, to avoid this kind of issue.
I also thought it smelled like Cacti was running an snmpget instead of a walk, but I wasn't sure, so I didn't spell it out yet.

@TheWitness TheWitness transferred this issue from Cacti/spine Oct 1, 2024
@TheWitness TheWitness changed the title from 'Spine 1.2.22-27 fails a query during re-caching, setting the "ignore device" flag, which suppresses the polling completely' to "Reindex Method 'Verify all Fields' does not work for OID/INDEX parse method" Oct 1, 2024
@TheWitness TheWitness added this to the 1.2.28 milestone Oct 1, 2024
@TheWitness TheWitness added bug Undesired behaviour resolved A fixed issue confirmed Bug is confirm by dev team labels Oct 1, 2024
@bernisys
Contributor Author

bernisys commented Oct 1, 2024

confirming: Fixed now.

@TheWitness
Member

Thanks @bernisys !

@netniV netniV changed the title from "Reindex Method 'Verify all Fields' does not work for OID/INDEX parse method" to "When reindexing, verify all fields may not work as intended" Oct 6, 2024
2 participants