Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Task filters in benchmark definition #529

Closed
dbeyer opened this issue Nov 25, 2019 · 11 comments
Closed

Task filters in benchmark definition #529

dbeyer opened this issue Nov 25, 2019 · 11 comments
Assignees

Comments

@dbeyer
Copy link
Member

dbeyer commented Nov 25, 2019

Currently, I can select tasks by two components:

  • a .set file that defines a set of task definitions and
  • a .prp file that specifies a property.

But, for example in SV-COMP, it is often crucial to also select
by expected verdict.

For example, I would like to run a result validation based on violation witnesses
on files that contain a bug, in order to validate that the bug is described by the witness.

@PhilippWendler
Copy link
Member

That makes sense, and we might want to extend the possible task filters to cover more than the expected result. I am not sure how to express this in the XML, though.

One possible way would be to use the expression language that tools like yq and jq use. However, we would not want to implement this ourselves and I do not know whether an appropriate library exists.

For now, it is quite easy to do this manually by writing a small script that extracts information from task-definition files using yq and outputs a set file that contains the desired tasks. There already exists a nice example script for this which just need to be adjusted for filtering on the expected result instead of the property (starting in line 26 where it calls yq).

@PhilippWendler PhilippWendler changed the title Extend Run Set Selection by Expected Verdict Task filters in benchmark definition Nov 25, 2019
@kfriedberger
Copy link
Member

kfriedberger commented Nov 25, 2019

It might be nicer for a developer and for documentation to directly have something like

<tasks name="ECA">  
  <includesfile>../programs/benchmarks/ReachSafety-ECA.set</includesfile>  
  <propertyfile>../programs/benchmarks/properties/unreach-call.prp</propertyfile>  
  <propertyfilter>true</propertyfilter>  
</tasks>  

or

<tasks name="ECA">  
  <includesfile>../programs/benchmarks/ReachSafety-ECA.set</includesfile>  
  <propertyfile filter="true">../programs/benchmarks/properties/unreach-call.prp</propertyfile>  
</tasks>  

than filtering with a separate script and having an XML file directly containing hundreds of tasks.

@dbeyer
Copy link
Member Author

dbeyer commented Nov 25, 2019

I had in mind the following:

  <tasks name="ReachSafety-BitVectors">
    <includesfile>../sv-benchmarks/c/ReachSafety-BitVectors.set</includesfile>
    <propertyfile expectedverdict="false">../sv-benchmarks/c/properties/unreach-call.prp</propertyfile>
  </tasks>

This defines a set of tasks N x (P x V), where

  • N is a set of file names,
  • P is a set of properties, and
  • V is a set of expected verdicts.
    A task is included if it matches the file name, has the property, and the expected verdict is as given.
  <tasks name="ReachSafety-BitVectors">
    <includesfile>../sv-benchmarks/c/ReachSafety-BitVectors.set</includesfile>
    <propertyfile expectedverdict="false">../sv-benchmarks/c/properties/unreach-call.prp</propertyfile>
    <propertyfile expectedverdict="true">../sv-benchmarks/c/properties/nooverflow.prp</propertyfile>
  </tasks>

@dbeyer
Copy link
Member Author

dbeyer commented Nov 25, 2019

@kfriedberger In the meanwhile I like your original proposal (expected verdict as attribute for property, which I wrote above) much better than my original proposal (expected verdict as tag, which you wrote above).

@PhilippWendler
Copy link
Member

@kfriedberger Of course it is nicer, I already agreed to that. And of course you never need to use XML files that directly contain hundreds of tasks, I already suggested to generate a set file instead.

@dbeyer Interesting idea. This would certainly be implementable, and could also be extended to allow filtering for specific subproperties. So it would cover the current use case.

It seems difficult to extend in the future, though. If we add more information to the task-definition files (obvious ideas here are the language of a task, or things like the machine model which SV-Benchmarks currently needs to track separately), then people naturally would want to filter on these fields as well. And then we would have two different ways on how to specify task filters, although both ways would filter based on the content of the task-definition files.

So the question would be that given the existing workaround, is it still important enough to have this soon rather than as part of a more general solution in the future?

@dbeyer
Copy link
Member Author

dbeyer commented Nov 25, 2019

to generate a set file instead.

Temporarily, just for the BenchExec run, or maintain an extra set of .set files in the repository?

I would also prefer the way it is now: the task definitions (.yml) define the "abilities" or "use cases" for the program, and the benchmark definition (.xml) define the selection for a particular benchmark.

given the existing workaround

You mean asking the validator developers to implement an option?
(This option is considered a bad and unnatural option for a validator.
I would never want to implement this as part of a validator, yet I need to ask people to do it
to circumvent this problem.)

I agree that a query language would be nice, but wouldn't this replace the property-based selection anyway? And the expectedverdict-based selection is just part of the property-based selection?

I like the idea of using a query language, but we would need to experiment with it. It could be that people find this too difficult to understand.

@PhilippWendler
Copy link
Member

to generate a set file instead.

Temporarily, just for the BenchExec run, or maintain an extra set of .set files in the repository?

I would suggest just temporarily, but this is a matter of taste.

given the existing workaround

You mean asking the validator developers to implement an option?

No, I mean the generated set files.

I agree that a query language would be nice, but wouldn't this replace the property-based selection anyway?

No, the property file is used for more than just filtering, and would not be replaced by some general filter. For example, the property file is passed on to the tool, and it is necessary for benchexec to know which expected result from the task-definition file to use.

I like the idea of using a query language, but we would need to experiment with it. It could be that people find this too difficult to understand.

Could be the case. Your proposal would of course be easy to understand, I like that aspect.

@dbeyer
Copy link
Member Author

dbeyer commented Nov 25, 2019

This makes the choice between 'tag' an 'attribute' easy:
tags specify the things that are to be given to the tool (input, property)
and attributes specify the things that are used just for filtering.

@PhilippWendler PhilippWendler self-assigned this Nov 26, 2019
@PhilippWendler
Copy link
Member

It makes sense to extend this functionality to matching for subproperties while we are at it.

Would you prefer

<propertyfile expectedverdict="false(valid-deref)">memsafety.prp</propertyfile>

or

<propertyfile expectedverdict="false" expectedsubproperty="valid-deref">memsafety.prp</propertyfile>

PhilippWendler added a commit that referenced this issue Nov 26, 2019
Implementation of the most important part of #529.
The syntax is <propertyfile expectedverdict="...">,
where the value can be either "true", "false", or "unknown".
Filtering on anything else is not implement.
The filters also work only for tasks that are defined in
task-definition files, not for the deprecated kind of tasks
where the expected verdict is encoded in the file name.

No tests exist yet.
@dbeyer
Copy link
Member Author

dbeyer commented Nov 26, 2019

I would slightly prefer the former because this coincides with what appears in the tables.
While the second better corresponds to the task definition in the .yml files.

PhilippWendler added a commit that referenced this issue Nov 26, 2019
Now the filter from a6df2bd can also look like
<propertyfile expectedverdict="false(valid-deref)">

Part of #529.
@PhilippWendler
Copy link
Member

This is implemented now, and I think this covers all kinds of filters that would make sense for the current state of task-definition files. We can reconsider this after task-definition files have been extended, so I am closing this for now.

Boolean combinations of filters are not implemented, but only disjunctions make sense at all (e.g., all tasks with expected verdict false(valid-deref) or false(valid-free)), and a workaround for this is to simply add several <tasks> tags, each with its own <propertyfile> tag and filter (but the same file-name patterns). An example for this can be seen in the test case from ba2bd23.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

3 participants