-
Notifications
You must be signed in to change notification settings - Fork 293
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
txscript: Increase opcode coverage in reference script tests #1331
Comments
I'd like to work on this. |
Please limit it to a single PR per opcode / opcode group as I did in the aforementioned PRs. It's much easier to deal with targeted PRs than massive ones that reach across every opcode. Also, please make sure that all of the new tests added conform to the clean stack requirements. I'd ultimately like to remove that flag and would prefer any new tests not make that more work than it will already be to transform all of the existing ones that don't conform. |
This has been implemented via the various linked PRs. |
In the future, there will be new script versions which build on the current script version with various improvements. Some examples might be the improved signature hash algorithm proposed in #950 and improvements to the semantics of various opcodes.
This implies a comprehensive set of reference tests for the current script engine version would be extremely beneficial to use a base for future versions to build on. While it is true that the current tests in
scripts_tests.json
already do have a respectable level of coverage, many of them originally came from Bitcoin, are pretty poorly organized, and are not exhaustive of each opcode.This should be remedied by individually examining every opcode (or at least opcode group in the case of data pushes) and providing the same treatment as I did in PRs like #1288, #1289, #1290, and #1291. However, it should be noted that those PRs have the positive and negative tests split across
script_valid.json
andscript_invalid.json
, while both the negative and positive test cases have been combined as of #1320, which have been significantly improved in terms of added detection of specific reasons why the test(s) failed versus just checking that it failed to execute properly.This is important, because, while there is typically no difference in terms of scripts being rejected regardless of whether they finished with a false item on the stack or encountered an error which resulted in early exit, since neither result in successful execution, it does make a difference in ensuring the tests are actually testing what they are intended to test, and it is also possible to distinguish between encountering an error and a false item on the stack via the
NOT
opcode in various circumstances.Ideally, the final result should provide the coverage for each opcode in order according to their numeric value as much as possible so that it is easy and logical to find the tests associated with each opcode and then specific execution. Some exceptions might be if a test relies on the correctness of another opcode, it would be ideal to ensure the specific semantics required by the test in question are preceded (within the same section) by a targeted test which proves that semantic in order to help avoiding falsely attributing a failure to the wrong opcode.
Every test should provide a comment which discusses the specific rule or semantic that is being tested, similar to what I did in the aforementioned PRs.
Next, tests which involve checking overall script engine rules and semantics which are not tied to particular opcodes such as max opcodes per script limits, max script size, max element size pushes, and max stack elements should ideally be located after all of the opcodes themselves have individually been tested. The max limits are already pretty well tested as of recent PRs, however they will undoubtedly need to be relocated as discussed.
Finally, all tests should make use of the newly added repetition syntax I added in PRs #1299 and #1300.
The text was updated successfully, but these errors were encountered: