Various fixes and improvements regarding support of real-world MIBs #3

dcvmoole · 2024-08-27T13:19:46Z

Summary

The mibs.pysnmp.com collection of MIBs is a highly valuable test set for checking pysmi's support for real-world MIBs. In particular the pysmi post-v0.3.4 switch to Jinja2 template rendering appears to have caused a substantial regression, with the current pysmi version being able to compile and load only 65% of the MIBs in the mibs.pysnmp.com collection.

This PR fixes a series of small pysmi issues and adds a few new forms of leniency for somewhat broken MIBs. As a result of this PR, pysmi is now able to compile and load 84% of the MIBs in the mibs.pysnmp.com collection, with zero regressions. The included fixes also resolve a number of (open and recently closed) GitHub issues.

The full story

The background of this PR is simply that I wanted to use pysmi to prepare a series of private MIBs for use with pysnmp, but I ran into various problems that had mostly also been reported as pysnmp issues already. Now that pysnmp and pysmi are thankfully being actively maintained again, I decided to spend some spare time on addressing the problems I ran into myself. I ended up fixing several other issues as well.

At some point I realized the value of the resurrected mibs.pysnmp.com MIBs collection. As of writing, that collection, in the asn1/ directory, consists of 11961 MIB files¹. Alongside, the pysnmp/ directory contains the compiled versions of those MIBs, to the extent that they could at any point be compiled at all. That directory contains 10713 compiled files right now, and out of those, 9903 can be successfully loaded by pysnmp today. Based on that, I think it is fair to state that the best that pysmi has ever done, is fully and properly support 9903 of the collection's 11961 MIB files, or just under 83%. According to their headers, the overwhelming majority of the compiled files have been generated by pysmi 0.3.4.

However, trying to compile and load the same collection of input MIBs with the current pysmi version (v1.4.4), with a few necessary base changes from this PR², results in the following statistics:

MIB results: 11961 total, 7806 successful, 1231 failed to compile, 2600 failed to load, 254 failed on dependency, 70 failed for other reasons

These results are produced by a simple program I put together for the occasion, rather than by mibdump (which, most importantly, will not try to load the produced files). In any case, these statistics are the basis for my above claim that the current version of pysmi supports merely 65% of the MIBs (7806 out of 11961). The main cause of the regression compared to pysmi 0.3.4 appears to be the conversion to the use of Jinja2 templates of git-a42d1705. It appears that in several respects, that conversion was simply not fully finished.

With the fixes in this PR, which I offer for your consideration, the same test program produces the following output:

MIB results: 11961 total, 10058 successful, 1024 failed to compile, 708 failed to load, 101 failed on dependency, 70 failed for other reasons

As can be seen, the PR enables the use of 2252 additional MIBs from the collection, now totaling 84% (10058 out of 11961). The additional successes represent a strict improvement: there are no cases where a MIB could be compiled and loaded before this PR but not after. Some MIBs have changed between "failed to compile" and "failed to load" in either direction, but of course, from the user's perspective it is not important as to whether a MIB fails to compile or its compiled form fails to load.

I have done manual inspection of a sizable subset of the changes to the MIBs' .py files as generated before and after the PR, to make sure that the improvements are not due to parts of generated code being unintentionally omitted. After all, taken to the extreme, an empty Python file will load perfectly fine too..! All changes that I have checked, are however intentional improvements. Note that such manual inspection is feasible only because of one of the commits in this PR. Verifying all the produced changes was not feasible however, as the full delta is close to 250 KLoC.

Strictly going by numbers, the PR improves the support for the collection beyond what is found in the mibs.pysnmp.com pysnmp/ directory (10058 instead of 9903). That would suggest that this PR improves pysmi to a "MIB tolerance level" never seen before, which does sound nice. However, it should be noted that the latter set is not a strict subset of the former set: there are 38 MIBs whose current mibs.pysnmp.com Python files load successfully, yet are still not part of the "success set" after this PR. Curiously, at least one case (Wellfleet-MIB) is because the ASN.1 MIB is different in contents from the MIB that was used to generate the compiled version. Many of the other cases can likely be resolved by adding a level of tolerance of capitalization differences in MIB file name versus internal MIB name; I have not looked into that yet.

For completeness: in the above results, "failed on dependency" refers to compile-time dependencies only, as load-time dependency failures are harder to filter out, so those are counted as load failures instead. The "failed for other reasons" category mainly covers cases where the compilation did not fail but also did not produce a Python file named after the input MIB file.

I have made available the test program itself as a gist. The program is overly conservative in that it basically starts with a clean slate for every single MIB. In both cases reported above, it was run against the head of the mibs.pysnmp.com repo, which is currently git-66c9072d. On my test system, a single full run takes about 3.5 hours.

The pull request

The PR itself is split up into small commits in order to facilitate reviewing. Needless to say, I would be happy to address feedback.

Test cases are added where applicable: this PR increases the number of test cases from 114 to 251. Overall, the unit test set remains woefully inadequate for catching regressions, but at least the changes of this PR are covered.

The commits have been subjected to the repository's pre-commit hooks, including Black. As a side note, I noticed that running Black on the entire source tree ends up changing a lexer regex in such a way that pysmi becomes completely non-functional. That should probably be addressed at some point.

The commits have not taken into account PyCharm's inspection facility. Even though pysmi has many annotations for that, those annotations also seem to be highly incomplete as they are. I am not a PyCharm user, but I did give its inspection a go; at the very least, it found no new errors.

The fixes in this PR fully resolve the following GitHub issues:

lextudio/pysnmp/issues/83 (open)
lextudio/pysnmp/issues/90 (closed but unaddressed)
lextudio/pysnmp/issues/91 (closed but unaddressed)
etingof/pysmi/issues/48 (open), as referenced in lextudio/pysnmp/issues/54 (open)

It is perhaps worth noting that several of the commits in the PR make adjustments to the JSON output format as well, as side effect of necessary changes at the intermediate code generation level. It is my view that development of pysmi cannot move ahead without occasionally making breaking changes to the JSON format, although in that regard the impact of this PR is fortunately very limited anyway.

Looking further ahead

This PR aims to cover much of the low-hanging fruit. With a few slightly more extensive pysmi changes, my current estimate is that it is possible to increase the number of fully supported MIBs from the mibs.pysnmp.com collection further by roughly 600 MIBs or about 5%, thereby bringing up the total to about 89%. Most of that increase would require two more future changes:

Many MIBs use DEFVAL values that violate the constraints of their corresponding SYNTAX types. In such cases, pyasn1 will fail to instantiate the corresponding object as created by the MIB's compiled pysnmp code, thus resulting in a load failure. If pysmi were to detect such cases itself, it can drop the DEFVAL value altogether (as it already does in some cases), thus generating pysnmp code that can be loaded. Doing so would require pysmi's symbol table stage to save information on all constraints, rather than just a subset as it does now. This change would likely add support for a little over 300 MIBs.
Many MIBs have broken imports. Some MIBs assume imports to be transitive (e.g., when MIB B imports symbol X from MIB A, MIB C expects to be able to import symbol X from MIB B). Some MIBs import symbols from the plain wrong MIB. Other MIBs import symbols that are not available anywhere, but are also not used. All these cases can be fixed with an extra import processing stage that modifies the import and symbol tables to redirect or remove imports. This change would add support for a little over 150 MIBs.

I'd certainly appreciate feedback as to whether such additional changes would be welcome, although I cannot make any promises myself. In any case, going substantially beyond the estimated 89% will be quite a bit harder, as that requires a next level of tolerance for problems with MIBs. Even then, getting anywhere near 100% is impossible without all processing stages (including the parser) being able to selectively discard any and all parts of the input that it cannot both validate and process. I would say that for pysmi, that is not a desirable goal.

A handful of files are actually not MIB files, but some accompanying files that were probably accidentally extracted from an archive along with actual MIBs. For the purpose of the analysis presented here, they do not have a substantial impact. ↩
This "baseline" run makes use of the first three changes from this PR. Without those changes, the output would be messed up by debug messages, the run would get stuck as soon as it gets to A3Com-products-MIB, and the resulting .py files could not easily be compared for regressions. I have confirmed in several ways that these three changes do not affect the baseline statistics in any other way. ↩

The pysmi library uses a proper logging facility to generate its debugging output in all cases, except for one location, which was changed by commit git-0b2378ec to use print(). The result of that change is that users of the library can no longer disable the output from that debugging statement, thereby requiring work-arounds whenever clean output is needed from the library. This commit changes back the debugging statement to use the logging facility as before.

A MIB may attempt to import itself, and multiple MIBs may attempt to import each other. The pysmi compiler deals well with such cases as long as the MIB *file* name matches the MIB name *within* the file. However, the latter name is used to track whether a specific MIB has already been loaded (or failed to load), which means that if that name differs from the MIB's file name, pysmi may end up trying to load the MIB with the same file name more than once, which results in an endless loop of trying to load the same MIB file(s) repeatedly. In such cases, the "mibdump" tool will simply end up hanging forever. This commit adds basic protection for such freezes by ensuring that every MIB file is loaded at most once. Depending on the specific MIBs, circular imports may result in either successful or failed compilation. That is: this commit does not aim to ensure the best possible outcome in such cases, as they are rare and not generally supported. However, this commit does ensure that such cases do not block progress.

For development, debugging, and regression testing purposes, it is useful for code generation to be deterministic, in that when the same (unchanged) MIB is compiled to Python code multiple times, the same code is generated each time--at least, aside from the header comments. The pysmi code generator currently already gets very close to that goal, but falls short when it comes to the generated lists of symbols per imported module. This commit makes generation of those lists deterministic as well.

Due to being misnamed, the set of basic unit test cases for the TEXTUAL-CONVENTION macro were not being executed. From the way they were implemented, it is clear they were never tried in the first place, either. This commit rewrites the unit tests to work properly. As it turns out, they also reveal a bug: the Python code generation unintentionally omitted the REFERENCE clause from the generated code. This commit also fixes that aspect.

The switch to Jinja2 based code generation in git-a42d170 inadvertently broke support for DEFVAL clauses of object types with syntax Bits. Ever since, MIB files that made use of DEFVAL for Bits types would cause pysmi's pysnmp code generation process to fail on an exception. JSON code generation did work, but did not produce the intended output. This commit fixes the DEFVAL support for Bits, thereby making various previously uncompilable MIB files not only compile, but also produce the correct default values for Bits objects for objects upon use. The commit also adds a pair of regression test cases. This commit leaves open the issue that a DEFVAL clause with an empty bit set will result in object instances that have no default value. Please note that as indicated above, this commit makes a small breaking change to the JSON output format structure for Bits DEFVAL values.

Previously, DEFVAL values of 0 (zero) for integer syntaxes would be ignored, thereby causing instantiated objects to have no default value rather than the default value of 0. Similarly, Empty DEFVAL bit sets would be ignored, causing corresponding Bits-type object instances to have no default value rather than an empty bit set as default value. These problems, which have been in place since the inception of pysmi, appear to be due mainly to an if-condition that intended to test for an altogether different case, namely invalid object identifier values. This commit addresses both above problems, with changes to the parser. The test set is extended with new tests to cover the three involved cases. An existing test needed to be fixed as well, by removing its DEFVAL clause: the change revealed a latent bug in that test, which could also be triggered previously by using a non-zero DEFVAL value.

For DEFVAL clauses in objects of syntax OBJECT IDENTIFIER, the pysnmp code generation process did does not generate the intended code, which resulted in nonsensical actual default values for such object instances, instead of usable OIDs. This issue has been in place since the switch to Jinja2 based code generation introduced by git-a42d170, which seems to expect named OIDs whereas this case involves only numeric OIDs, thus resulting in an unintended "stringification" of a Python tuple. This commit fixes the problem and adds a basic test case.

An AUGMENTS clause causes the pysnmp code generation to issue a call on the object being augmented, from the context of the object performing the augumentation. However, if the former object is defined later on in the MIB, it does not yet exist when being augmented. The result is that loading the generated code fails on a NameError exception. This commit fixes the problem by issuing the augmentation-related calls in a second pass, after all the objects have been defined. During the second pass, the object being augmented is guaranteed to exist.

MIB symbol names that contain hyphens cannot be used directly as identifier names in the generated Python code. Therefore, such hyphens are replaced with underscores when generating code for various MIB nodes. However, the generated export table also exports the symbols using the "Pythonized" name as symbol name, rather than the original name. Before the Jinja2 rewrite of git-a42d170, the original symbol names were then made available by generating calls to setLabel() on MIB nodes. The label generation got lost as part of that rewrite, with only some loose code fragments remaining. That caused import errors for various MIBs. Restoring the setLabel() calls would not be difficult. However, there is no reason to export symbols using their Pythonized name at all: instead, they can be exported by means of their original names from the MIB files. The effect is the same, with less code and fewer exceptions to be made. In addition, a next commit will enable exporting types. Types are not covered by the existing setLabel() facility, and therefore would always have required being exported by their real names. Therefore, this commit exports the MIB nodes with their original names, so that those nodes can be imported using those same names, as was already generally expected. Therefore, this commit gets rid of the aforementioned import errors, mainly by simplifying existing code. As part of this change, the Jinja2 template is cleaned up, as it performed many hyphen-to-underscore conversions that were entirely redundant: with imported symbols as only exception, the symbol names are always already converted to their Python versions before being passed to the template. The template clean-up is necessary to avoid confusing the reader as to which names are "Pythonized" or not already, and also serves as the basis for another next commit (regarding Python keywords). The existing tests are extended with new test cases to ensure that the use of hyphens results in generation of proper Python code and that the resulting MIB nodes' getLabel() method calls return the original names. Please note that this commit makes a breaking change to the JSON output: the "name" entries now contain the original, un-Pythonized version of their corresponding entities. Note that in the previous JSON format, the original node name from the MIB was not available at all.

Many MIBs rely on the ability to import type declarations from other MIBs. For unknown reasons, the conversion to Jinja2 dropped the export of type declarations, therefore causing those MIBs to fail to load. That omission was likely unintentional; there are no unit tests that cover this aspect. This commit enables exporting type declarations from MIBs again, so that those can be imported from other MIBS as well. The commit also adds unit tests to avoid future regressions. Note that such tests necessarily rely on a bit more functionality from pysnmp.

Even though RFC 2579 Sec. 3.5 explicitly forbids using a Textual Convention as SYNTAX value of another Textual Convention, deriving textual conventions from other textual conventions is widely used in practice. However, for such cases, pysmi currently generates Python code that cannot be loaded, as it leads to "Method Resolution Order" errors. The reason for those errors is that classes for textual conventions derive from both the TextualConvention base class and the underlying syntax class--in that order. In these "stacked" cases, the underlying syntax class itself also derives from the TextualConvention base class. This commit resolves those errors by changing the order of the base classes. The test set is extended to check various cases that should now be working properly as a result.

This commit fixes issues related to the use of Python keywords in MIB files, which cause problems in the generated Python code if left as is. Support for translating such reserved symbols to non-reserved symbols was already already in place to some degree, but inadequate, as there were still issues for both defining and importing symbols. For defined symbols, this change replaces the incomplete solution from git-b091479b with the original fix of git-00177b06, which was still in place for symtable.py but got dropped for intermediate.py as part of the Jinja2 rewrite of git-a42d170. For imported symbols, this change centralizes the symbol replacement and ensures that the Jinja2 template generates an import table with local symbol names that have been subject to appropriate replacement. The commit adds various new tests for both cases.

Free-form text values, such as DESCRIPTION strings, may contain a few characters that can be problematic when transformed to Python strings: 1. Line terminator characters may be used in any textual strings, as per RFC 2578 Sec. 3.1.1. The Jinja2 template uses triple-quote strings to support such characters in some of those textual strings, but not in all of them. When line terminator characters are used within strings that are transformed to single-quote Python strings, the resulting Python code can not be compiled. 2. Backslash characters, which are similarly allowed in such textual strings, are interpreted within Python strings. They should instead be converted to literal characters. Note that the final character of a string may be a backslash, so use of Python's "r" string prefix does not fully solve this problem. This commit adds a Jinja2 filter that can be used to convert any MIB- provided textual string to a Python string with proper quoting and escaping of backslashes. That new filter is now used for all textual strings. To be exact: it is now used for all strings that are ultimately obtained via the QUOTED_STRING parser token, which is (or at least should be) the only way through which free-form text values can make their way from input MIBs into the code generator. As direct effect of this change, the proper functionality of mibdump's "--keep-texts-layout" switch is restored. Since the switch to Jinja2 of commit git-a42d1705, triple-quoted lines had an extra line terminator character added at the end, and were always getting word-wrapped. Both aspects are dropped with this change, so that "--keep-texts-layout" results in a compiled MIB that perfectly preserves the original text (aside from the original line terminator encodings, i.e., LF vs CR/LF). The existing tests have been changed to reflect the removal of the extra line terminator character as added with git-a42d1705. A few extra representative tests are added to ensure that the above two cases are handled properly in general and that the implementation of the "--keep-texts-layout" switch functions as expected.

As per RFC 2578, conceptual-table object types (i.e., xyzTable) are expected to have a "SEQUENCE OF" syntax whose type matches the syntax of the table row (i.e., of xyzEntry). The pysmi symbol table building procedure relies on that fact, and currently rejects any MIBs that do not conform to that rule. However, there are several real-world MIBs that have typos in their "SEQUENCE OF" type names. This commit adds a leniency exception to the code in order to allow MIBs with such typos to be compiled and loaded after all. The exception is applied only when compiling the MIB would certainly fail otherwise, so compilation of valid MIBs is guaranteed not to be affected by this change. The exception requires keeping track of SEQUENCE types, but as non-SEQUENCE types are already tracked as well, this is not considered a substantial resource drain. A test case is added to verify the leniency.

SMIv2 requires that an INDEX clause lists a series of object-type names. Unlike SMIv2 however, SMIv1 (as per RFC 1212 Sec. 2) also supports that such a series contains certain index syntaxes instead of (or in addition to) object-type names. For example, SMIv1 allows "INDEX { INTEGER }". While pysmi's parser supports the latter case, its code generator has support in place in its symbol table builder only, with the intermediate module merely having a few loose and non-functional code fragments with the remainder being left as a to-do item. This commit finishes the support for SMIv1 index syntaxes, by properly generating "fake column" object-type instances for the index syntaxes, as was likely the original intention as well. A few duplicate class variables in the symbol table and intermediate modules are moved to the base class, as successful Python code generation relies on the fact that both subclasses use exactly the same values for those. While there, a small set of unused and obsolete class variables is removed altogether. The test set is extended with a new test module that covers the new SMIv1-only functionality.

As part of pysmi's code generation, various imports are rewritten from SMIv1 to SMIv2, so as to simplify the way in which standard MIBs are handled. However, an apparent copy-paste error prevents that imports from RFC1158-MIB are being rewritten as intended. The result is that MIBs importing from RFC1158-MIB could be compiled but not loaded. This commit fixes the copy-paste error, and adds a few basic test cases to ensure that each group of rewrites is working as intended.

Omitting the IMPORTS statement is valid, but pysmi does not deal with this case as it should. When there is an IMPORTS statement, pysmi extends the parsed imports table with necessary extra imports. When the IMPORTS statement is omitted however, that extension is applied to a temporary throw-away import table instead. For example, when a MIB without an IMPORTS table uses the INTEGER type, pysmi rewrites INTEGER to Integer32 as usual. However, since the MIB's imports table is not extended in this case, Integer32 itself is never imported from SNMPv2-SMI, resulting in a load-time error. This commit ensures that there is always a parsed import table, even if it is empty. The extensions are then applied to it in all cases, and the above example works as intended. A test case is added as well.

This fix is specifically for WIENER-CRATE-MIB, which uses a "SEQUENCE" declaration that includes a table object symbol. As a result, the corresponding object type is considered to be a table column, resulting in generated Python code that cannot be loaded. This commit adds an extra restriction for object types to be considered table columns, namely that they must not have a "SEQUENCE OF" type syntax. In other words, object types that are tables are no longer ever considered to be table columns. That restriction allows WIENER-CRATE-MIB to be loaded properly. The commit also adds a test case, which consists of a simplified version of the construction found in WIENER-CRATE-MIB.

CLAassistant · 2024-08-27T13:19:54Z

All committers have signed the CLA.

dcvmoole · 2024-08-27T14:00:54Z

I have read the CLA Document and I hereby sign the CLA

dcvmoole · 2024-08-27T14:01:01Z

I have read the Code of Conduct and I hereby accept the Terms

dcvmoole · 2024-08-27T14:05:02Z

I have read the Code of Conduct and I hereby accept the Terms

..that does not seem to have had the intended effect, but I suppose it is the thought that counts.

lextm · 2024-09-02T20:41:22Z

Thanks. It took us a while to review the changes you made, and now we are happy to accept the tremendous contribution.

Don't worry about the breaking changes (like changes in JSON output). We can bump the version number to warn users of such.

dcvmoole · 2024-09-03T07:43:58Z

Thank you for reviewing and merging the changes!

dcvmoole added 18 commits August 24, 2024 21:12

lextm added the enhancement New feature or request label Sep 2, 2024

lextm merged commit cddb4fc into lextudio:main Sep 2, 2024
17 of 19 checks passed

dcvmoole deleted the improved-mib-support branch September 3, 2024 07:44

dcvmoole mentioned this pull request Sep 3, 2024

Fix unit test for Python 3.13 #4

Merged

dcvmoole mentioned this pull request Oct 21, 2024

Fix for named values of BITS used via types #6

Merged

dcvmoole mentioned this pull request Oct 29, 2024

Fix support for display hints being applied #7

Merged

dcvmoole mentioned this pull request Nov 1, 2024

Fixes to allow for regeneration of pysnmp base MIBs #8

Merged

dcvmoole mentioned this pull request Nov 18, 2024

DEFVAL improvements #11

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Various fixes and improvements regarding support of real-world MIBs #3

Various fixes and improvements regarding support of real-world MIBs #3

dcvmoole commented Aug 27, 2024

CLAassistant commented Aug 27, 2024 •

edited

Loading

dcvmoole commented Aug 27, 2024

dcvmoole commented Aug 27, 2024

dcvmoole commented Aug 27, 2024

lextm commented Sep 2, 2024

dcvmoole commented Sep 3, 2024

Various fixes and improvements regarding support of real-world MIBs #3

Various fixes and improvements regarding support of real-world MIBs #3

Conversation

dcvmoole commented Aug 27, 2024

Summary

The full story

The pull request

Looking further ahead

Footnotes

CLAassistant commented Aug 27, 2024 • edited Loading

dcvmoole commented Aug 27, 2024

dcvmoole commented Aug 27, 2024

dcvmoole commented Aug 27, 2024

lextm commented Sep 2, 2024

dcvmoole commented Sep 3, 2024

CLAassistant commented Aug 27, 2024 •

edited

Loading