Parser testing is incredibly inconvenient using Test262 #1356
For what it's worth, I do try to maintain test262-parser-tests, but haven't been porting tests from test262 proper (and am still undecided if and how to go about that), so it does tend to have less coverage, especially of new features. I agree that it would be nice to not use eval in tests which test something else. In a lot of cases it would be enough to just split files into an `eval`-based version and a static version.
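A minimal sketch of what such a split could look like (hypothetical files of my own devising, not actual tests from the suite; `assert.throws` is the Test262 harness helper, and the `negative` frontmatter is Test262's convention for statically-declared parse errors):

```js
// eval-based version (the shape many legacy tests have today):
assert.throws(SyntaxError, function () {
  eval("var if;"); // reserved word in a binding position
});

// Static version: the file itself is the invalid program, so a parser
// can consume it without ever evaluating anything.
/*---
negative:
  phase: parse
  type: SyntaxError
---*/
throw "Test262: This statement should not be evaluated.";
var if;
```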
I don't know if this is worth it. I maintain a project which amounts to an implementation that doesn't support `eval`. On the other hand, if (as I think *should* be the case, whether or not it is) every test containing a call to `eval` were flagged as such, implementations like mine could simply skip them.
You could probably find a way to use a variant of what I have (which generally works, but it's a friend who's been primarily using it). I'll also note that its highly Shift-specific nature makes it much more difficult to use for ESTree-based parsers.
This is probably among the weakest of mine, and when I initially thought about it (after typing it out), I almost removed it before filing the issue, since it didn't really affect parsers so much.
I was thinking that, too, except you'll run into issues with indirect `eval` calls.
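For readers following along, the direct/indirect distinction matters because an indirect call evaluates at global scope, so any tooling that mechanically rewrites or flags `eval` can't rely on the call's shape alone. A tiny illustration (mine, not from the suite):

```js
var x = "outer";
function f() {
  var x = "inner";
  var direct = eval("x");        // direct eval sees the local scope: "inner"
  var indirect = (0, eval)("x"); // indirect eval runs at global scope: "outer"
  return [direct, indirect];
}
f(); // ["inner", "outer"]
```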
Say more? I've run it against other parsers without problems; what's the Shift-specific nature you ran into? (I guess this might be the wrong place for that discussion; please feel free to ping me on IRC or open an issue there if you'd rather not continue here.)
More just the fact that Shift != ESTree, and the substantial structural difference between the two AST specs. If you want to find me on Gitter, I can elaborate a little more on this if you'd like.
Would it be worth revisiting the possibility of merging test262-parser-tests into test262? I'd be curious about any problems with test262-parser-tests too; it would probably be useful to explain here for others.
I'll note that the Shift concerns I expressed earlier were just echoing another person, and the more I look at that repo, the more I see it as kind of moot. (I've been half hand-holding certain parts of his parser-related work already. :-()
My main issue personally holding me back from adopting test262-parser-tests has just been its outdated state. It'd also be nice if it offered the ability to validate ASTs as well, but that's something apart from this bug.
I imagine it's hard to validate ASTs without picking one, which could be a lot of harness work to adapt to different projects with different ASTs. Maybe "SyntaxError or not" is a good start.
Thanks for the detailed report, Isiah! I'm reading some frustration in the tone of your prose. That may be a misreading on my end, but I can understand if you're feeling discouraged. Test262 wasn't created with parsers in mind, and you've had to put in a lot of extra work in order to benefit from those extra 300 tests. Over the past several years, we've been working to make up for this deficiency (see gh-196, gh-360, gh-382, gh-542, gh-655, gh-778, gh-1254), so the fact that we are able to discuss improvements like these at all reflects a fair amount of effort. We clearly still have some work ahead of us, but you can take this as a sure sign that the project maintainers are interested. It might also be encouraging for you to know that all of the troubled tests are named according to the "legacy" naming scheme. I point this out to demonstrate that the issues you've identified are technical debt and don't reflect the current standards for new contributions. Regarding your recommendations:
I am usually the first to argue in favor of "more metadata," but I've come to appreciate the maintenance burden that can carry. This isn't to say that we shouldn't introduce an `eval` flag, only that the cost of maintaining it wouldn't be zero.
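For concreteness, such a flag would presumably look like any other `features` entry in the frontmatter (a sketch only; `eval` is not an actual entry in Test262's feature list):

```js
/*---
description: >
  Some behavior that genuinely requires dynamic evaluation
features: [eval]  # hypothetical flag, not a real Test262 feature name
---*/
// eval returns the completion value of the evaluated script:
assert.sameValue(eval("1; 2;"), 2);
```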
This could help, but it is a very long-term solution. As of today, the latest attempt at standardizing `do` expressions is still an early-stage proposal.
We're already on board with you there. As noted above, I consider the tests you've identified to be technical debt. Your script may come in handy here. I'm sure there are other things you would have rather spent those three days doing, but you might be able to wring out some more value by using it to update the tests. What do you say?
The best idea I've had for this is to provide a "maximally obvious" version of every syntactically correct file, so that parsers can assert that the "obvious" version parses to "the same" AST, modulo location differences, as the primary version (where the terms in quotes are not particularly well defined). That's what test262-parser-tests currently does.
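A consumer-side sketch of that check, assuming an ESTree parser like acorn and the `pass`/`pass-explicit` layout that test262-parser-tests uses (the blunt stripping of `start`/`end`/`loc`/`range` and the JSON comparison are my own approach, not something the project prescribes):

```js
const fs = require("fs");
const acorn = require("acorn");

// Remove location-bearing properties so structurally identical
// programs compare equal regardless of formatting.
function stripLocations(node) {
  if (Array.isArray(node)) return node.map(stripLocations);
  if (node && typeof node === "object") {
    const out = {};
    for (const key of Object.keys(node)) {
      if (key === "start" || key === "end" || key === "loc" || key === "range") continue;
      out[key] = stripLocations(node[key]);
    }
    return out;
  }
  return node;
}

// Blunt structural comparison; good enough for a sketch.
function sameAST(fileA, fileB) {
  const parse = (f) =>
    stripLocations(acorn.parse(fs.readFileSync(f, "utf8"), { ecmaVersion: "latest" }));
  return JSON.stringify(parse(fileA)) === JSON.stringify(parse(fileB));
}

// Usage: compare a primary test against its "maximally obvious" twin.
// (File paths are hypothetical.)
console.log(sameAST("pass/some-test.js", "pass-explicit/some-test.js"));
```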
@jugglinmike I appreciate your detailed response and all of your work on this issue. I agree with your inclination not to immediately take up most of the initial suggestions, and look forward to your continued work in adding more parser tests and facilitating more syntax error tests. I'm actually fine with leaving in complex, valid tests, but simpler tests could be a good addition as well. Previously, I thought there was an idea that, for each syntax error test, we'd have to be able to identify the line of the spec that bans it (is this right?). This is a bit hard for a grammar, and I believe it was holding test262-parser-tests back from being merged into test262. What are your current thoughts on that question?
You did catch on to my frustration correctly. I tried to highlight it while still keeping it civil and polite.
I'm fully aware it's a very long-term solution unlikely to be implemented soon beyond Babylon and friends. It's partially why I said "assuming they still return completion values" - I personally recall suggesting another interpretation method (for loops specifically - pushing to and returning an array rather than just the completion value) in es-discuss, but it's a bit buried.
The script itself won't exactly be useful for dealing with that (it just links tests to a parser and a testing framework), but the auxiliary data would at least give a guide for how to update them. I doubt you'd be able to simply run an automated script to convert them, either - the examples are all missing their frontmatter as well as most of the associated data, and some of the replacements are non-functional one-liners rather than the fully fledged tests they replaced. So I expect it'd take even longer to accomplish the rest. If someone else wants to, feel free, but I'm kind of burnt out on it. (I also decided to take a bit of a coding break afterwards. Between that and other things, it was much needed.)
Aw thanks, Dan!
From my perspective, Test262 benefits immensely from a strong file organization. Being able to definitively identify a single "correct" location for any given test helps maintainers avoid duplication and locate holes in coverage. These are some of the biggest challenges in maintaining a project with over 30,000 tests. It's why I personally have not been motivated by the occasional complaint from contributors about the extra effort the practice requires. It's unrealistic to expect a completely objective organization structure (particularly when it comes to test file names), but I think something (e.g. directory structure) is far better than nothing. Unfortunately, when it comes to tests that specifically concern parsing, this policy applies unevenly:
It's that last consideration that has made me most reluctant to incorporate the test material in the "test262-parser-tests" project. But it may be that I've lost the forest for the trees. Having written all this out, I'm forced to recognize that the "strong naming scheme" requirement is only in place because it serves an end: "avoid duplication and locate holes in coverage." One interesting thing about @bakkot's tooling is that it attempts to serve that same end in the absence of a meaningful naming scheme. In light of that, maybe we should consider bringing in a subset of "test262-parser-tests" -- specifically the tests labeled "fail". We would likewise eschew any sort of meaningful naming convention (likely also omitting metadata for the same reasons) and place all the files in some generic directory. In case it isn't clear: I'm just spit-balling at this point. There are certainly more details to work out (notably: whether Test262 can/should maintain tooling in Python and JavaScript, and the task of reviewing all of those tests), but I would like to hear from you folks (plus @rwaldron and @leobalter) before going too far with this line of thinking.
For what it's worth, I think you did that well :)
Fair enough; we can file this as a "nice to have" effort, given that the current state is technically valid. I'll leave it to @leobalter to decide if a new "tracking" issue is in order.
I'm not against doing something like this, importing the parser tests fully or partially. It does leave me with a question: how does this merge come with a guarantee of maintenance? The file naming scheme and the metadata help not only with identifying coverage of the spec text, but also with locating things in the project. If anything changes in the spec text, these become the tools to find and update the respective tests.
Having tests that aren't tied to specific language material is a case where every change in the grammar - not only breaking changes, but even new language features - will demand a search through the non-specific syntax tests. I'm thinking of any random thing a person might type.
I'm fine with maintaining tooling, especially in JS, as long as it is not required for consuming the tests, but is instead useful for creating new files or maintaining existing ones. @isiahmeadows:
You're not wrong to flag the need for improvements here. Most of these files are legacy files we really want to update, and consuming Test262 from parsers makes this fix even more appealing.
After 2+ years of work on this project, I have to say my list of things I'd like to improve is still growing, and I spend most of my non-work time on different sorts of scripts. I appreciate your dedication here.
+1
+1. If we solve no. 1 first, this could be reevaluated, but I wouldn't mind flagging tests, as this could be really helpful for tests using indirect eval calls. I'll check what I can do to find any cases with direct and indirect eval calls.
If the behavior can be verified without eval, the static version should never be a fallback, but the mainline. Eval should only come in where it's explicitly necessary; for other cases, a test runner could simply wrap the content itself (see the sketch after this comment).
This is a matter we should discuss only after do expressions are well implemented in the projects consuming Test262, to avoid forcing adoption of a new feature in order to test other features, old or not. I'll try to work on item 1 first, which involves reviewing the already-listed files. I'd also be very thankful to anyone willing to contribute; please feel free to engage with this work, as my availability is no more than 1 day per week for the next 2 months.
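To illustrate the "static mainline, eval only when necessary" idea above, here is a hypothetical runner sketch of mine (the `parse` entry point and the opt-in flag are assumptions, not anything Test262 defines):

```js
const fs = require("fs");

// Static-first runner: the file on disk is the program under test, and
// only tests explicitly marked as needing dynamic evaluation get eval'd.
// `parse` is whatever parser entry point the consuming host provides.
function runTest(path, parse, { needsEval = false } = {}) {
  const source = fs.readFileSync(path, "utf8");
  if (!needsEval) {
    // Mainline: hand the raw source straight to the parser.
    return parse(source);
  }
  // Opt-in path, e.g. for statement completion-value tests; indirect
  // eval keeps the test out of the runner's local scope.
  return (0, eval)(source);
}
```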
This is covered by my first point pretty succinctly: "Don't use eval unless you have to". There are certain tests that require it, particularly those that test completion values of statements, but those are incredibly rare from what I observed. The few exceptions I listed in the original comment were about 50% of the tests I found where `eval` was genuinely necessary.
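For context, statement completion values are the canonical case where `eval` really is unavoidable today, because nothing else in the language exposes them. A short example (mine; the `do` expression line is stage-1 proposal syntax, shown commented out):

```js
// A statement's completion value is only observable through eval:
assert.sameValue(eval("if (true) { 42; }"), 42);
assert.sameValue(eval("if (false) { 42; }"), undefined);

// Hypothetical do-expression equivalent, which a parser could handle
// statically if the proposal ever lands:
// assert.sameValue(do { if (true) { 42; } }, 42);
```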
Agreed, and it was more of a forward-thinking thing rather than a mandatory "do it now". Note the bolded part in what I told @jugglinmike in a reply - I'm fully aware of its status.
(I've been meaning to file that as a potential issue in the proposal's repo, too.)
I'm sorry if I appeared redundant; I just wanted to get this issue back on track with the initial requests. It initially felt like I was only answering the matters concerning negative syntax tests.

My comments also sort of work as a note to my future self, as I plan to have this fixed.
@rwaldron I presume this was closed by mistake? (based on reading the relevant PR)
This issue was closed prematurely, but I think I've finished refactoring all the unnecessary uses of `eval`. I may have missed a few, though I definitely skipped some intentionally. Some turned out to be necessary (particularly those concerning statement completion values). If anyone finds more, please let me know. Here's hoping Test262 is a little more friendly for parsers!
Also, thanks to @leobalter and @rwaldron for reviewing the patches :)
I was going through the tests to adapt them for parsers (since this is hardly maintained, much less kept up to date), and I found a couple of major issues:
Some tests aren't actually testing what they're supposed to
Here are a few examples:
A test (the link was lost in formatting) passes its source through a string literal that decodes the Unicode escape before `eval` ever sees it. So what's `eval`ed is `function __funcA(__arg){return __arg;}; __funcA`, not the expected `function __func\u0041(__arg){return __arg;}; __funcA`. It also has the below issue.

A ton of tests are using `eval` unnecessarily

I've found literally over 300 instances where I had to replace a test with another because of `eval` getting in the way. And with few exceptions (like these three), the vast majority of them could be refactored to check for an early error without meaningfully changing the test. As for how this could be achieved, I did find two good examples of what would be helpful for parsers. Those two, I don't have to preprocess and replace their contents, unlike every single other example here.

What would I like to see happen?

Here are a few ideas:

1. Don't use `eval` unless you have to. Any time you use `eval`, the code is inaccessible to parsers that only concern themselves with static semantics. This includes files like this, where you can always use the raw characters themselves (see the sketch after this list).
2. Flag `eval` in `features` in the frontmatter whenever you use it. This doesn't provide much direct benefit to parsers, but it could definitely provide benefit to engines that don't support dynamic evaluation. (Kinoma XS6 is one example of this, but I suspect there may be others.) Edit: This is probably the weakest among them, since it doesn't directly affect me.
3. For tests that genuinely require `eval`, provide both an `eval`-based version and a static fallback.
4. Once `do` expressions appear, assuming they still return completion values, start substituting those for equivalent `eval` versions as applicable. Parsers will understand the former, but they can't process the latter.

This was for a pretty simple script I wrote, but the workaround took about 3 days' worth of work, so I'm obviously not happy with the way this turned out...
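To make the double-escaping problem above concrete, here's the shape of it (a reconstruction of the pattern, not the verbatim test file):

```js
// The test means to check that \u0041 parses as an identifier escape,
// but the string literal decodes the escape before eval ever runs, so
// the parser only ever sees the plain character:
eval("function __funcA(__arg){return __arg;}; __funcA");

// Writing the raw characters directly in the file exercises the actual
// parsing behavior, with no eval needed:
function __func\u0041(__arg) { return __arg; }
assert.sameValue(__funcA("x"), "x");
```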