Task filters in benchmark definition #529

dbeyer · 2019-11-25T17:58:55Z

Currently, I can select tasks by two components:

a .set file that defines a set of task definitions and
a .prp file that specifies a property.

But, for example in SV-COMP, it is often crucial to also select
by expected verdict.

For example, I would like to run a result validation based on violation witnesses
on files that contain a bug, in order to validate that the bug is described by the witness.

PhilippWendler · 2019-11-25T18:13:38Z

That makes sense, and we might want to extend the possible task filters to cover more than the expected result. I am not sure how to express this in the XML, though.

One possible way would be to use the expression language that tools like yq and jq use. However, we would not want to implement this ourselves and I do not know whether an appropriate library exists.

For now, it is quite easy to do this manually by writing a small script that extracts information from task-definition files using yq and outputs a set file that contains the desired tasks. There already exists a nice example script for this which just need to be adjusted for filtering on the expected result instead of the property (starting in line 26 where it calls yq).

kfriedberger · 2019-11-25T18:20:27Z

It might be nicer for a developer and for documentation to directly have something like

<tasks name="ECA">  
  <includesfile>../programs/benchmarks/ReachSafety-ECA.set</includesfile>  
  <propertyfile>../programs/benchmarks/properties/unreach-call.prp</propertyfile>  
  <propertyfilter>true</propertyfilter>  
</tasks>

or

<tasks name="ECA">  
  <includesfile>../programs/benchmarks/ReachSafety-ECA.set</includesfile>  
  <propertyfile filter="true">../programs/benchmarks/properties/unreach-call.prp</propertyfile>  
</tasks>

than filtering with a separate script and having an XML file directly containing hundreds of tasks.

dbeyer · 2019-11-25T18:20:52Z

I had in mind the following:

  <tasks name="ReachSafety-BitVectors">
    <includesfile>../sv-benchmarks/c/ReachSafety-BitVectors.set</includesfile>
    <propertyfile expectedverdict="false">../sv-benchmarks/c/properties/unreach-call.prp</propertyfile>
  </tasks>

This defines a set of tasks N x (P x V), where

N is a set of file names,
P is a set of properties, and
V is a set of expected verdicts.
A task is included if it matches the file name, has the property, and the expected verdict is as given.

  <tasks name="ReachSafety-BitVectors">
    <includesfile>../sv-benchmarks/c/ReachSafety-BitVectors.set</includesfile>
    <propertyfile expectedverdict="false">../sv-benchmarks/c/properties/unreach-call.prp</propertyfile>
    <propertyfile expectedverdict="true">../sv-benchmarks/c/properties/nooverflow.prp</propertyfile>
  </tasks>

dbeyer · 2019-11-25T18:23:10Z

@kfriedberger In the meanwhile I like your original proposal (expected verdict as attribute for property, which I wrote above) much better than my original proposal (expected verdict as tag, which you wrote above).

PhilippWendler · 2019-11-25T18:33:02Z

@kfriedberger Of course it is nicer, I already agreed to that. And of course you never need to use XML files that directly contain hundreds of tasks, I already suggested to generate a set file instead.

@dbeyer Interesting idea. This would certainly be implementable, and could also be extended to allow filtering for specific subproperties. So it would cover the current use case.

It seems difficult to extend in the future, though. If we add more information to the task-definition files (obvious ideas here are the language of a task, or things like the machine model which SV-Benchmarks currently needs to track separately), then people naturally would want to filter on these fields as well. And then we would have two different ways on how to specify task filters, although both ways would filter based on the content of the task-definition files.

So the question would be that given the existing workaround, is it still important enough to have this soon rather than as part of a more general solution in the future?

dbeyer · 2019-11-25T18:44:51Z

to generate a set file instead.

Temporarily, just for the BenchExec run, or maintain an extra set of .set files in the repository?

I would also prefer the way it is now: the task definitions (.yml) define the "abilities" or "use cases" for the program, and the benchmark definition (.xml) define the selection for a particular benchmark.

given the existing workaround

You mean asking the validator developers to implement an option?
(This option is considered a bad and unnatural option for a validator.
I would never want to implement this as part of a validator, yet I need to ask people to do it
to circumvent this problem.)

I agree that a query language would be nice, but wouldn't this replace the property-based selection anyway? And the expectedverdict-based selection is just part of the property-based selection?

I like the idea of using a query language, but we would need to experiment with it. It could be that people find this too difficult to understand.

PhilippWendler · 2019-11-25T18:49:27Z

to generate a set file instead.

Temporarily, just for the BenchExec run, or maintain an extra set of .set files in the repository?

I would suggest just temporarily, but this is a matter of taste.

given the existing workaround

You mean asking the validator developers to implement an option?

No, I mean the generated set files.

I agree that a query language would be nice, but wouldn't this replace the property-based selection anyway?

No, the property file is used for more than just filtering, and would not be replaced by some general filter. For example, the property file is passed on to the tool, and it is necessary for benchexec to know which expected result from the task-definition file to use.

I like the idea of using a query language, but we would need to experiment with it. It could be that people find this too difficult to understand.

Could be the case. Your proposal would of course be easy to understand, I like that aspect.

dbeyer · 2019-11-25T18:57:16Z

This makes the choice between 'tag' an 'attribute' easy:
tags specify the things that are to be given to the tool (input, property)
and attributes specify the things that are used just for filtering.

PhilippWendler · 2019-11-26T11:21:58Z

It makes sense to extend this functionality to matching for subproperties while we are at it.

Would you prefer

<propertyfile expectedverdict="false(valid-deref)">memsafety.prp</propertyfile>

or

<propertyfile expectedverdict="false" expectedsubproperty="valid-deref">memsafety.prp</propertyfile>

Implementation of the most important part of #529. The syntax is <propertyfile expectedverdict="...">, where the value can be either "true", "false", or "unknown". Filtering on anything else is not implement. The filters also work only for tasks that are defined in task-definition files, not for the deprecated kind of tasks where the expected verdict is encoded in the file name. No tests exist yet.

dbeyer · 2019-11-26T13:13:16Z

I would slightly prefer the former because this coincides with what appears in the tables.
While the second better corresponds to the task definition in the .yml files.

Now the filter from a6df2bd can also look like <propertyfile expectedverdict="false(valid-deref)"> Part of #529.

PhilippWendler · 2019-11-26T16:10:39Z

This is implemented now, and I think this covers all kinds of filters that would make sense for the current state of task-definition files. We can reconsider this after task-definition files have been extended, so I am closing this for now.

Boolean combinations of filters are not implemented, but only disjunctions make sense at all (e.g., all tasks with expected verdict false(valid-deref) or false(valid-free)), and a workaround for this is to simply add several <tasks> tags, each with its own <propertyfile> tag and filter (but the same file-name patterns). An example for this can be seen in the test case from ba2bd23.

PhilippWendler added the enhancement label Nov 25, 2019

PhilippWendler changed the title ~~Extend Run Set Selection by Expected Verdict~~ Task filters in benchmark definition Nov 25, 2019

PhilippWendler self-assigned this Nov 26, 2019

PhilippWendler added a commit that referenced this issue Nov 26, 2019

Tests for task filters from a6df2bd (#529)

67bf466

PhilippWendler added a commit that referenced this issue Nov 26, 2019

Task filters for benchexec based on expected subproperty violations

d5ad875

Now the filter from a6df2bd can also look like <propertyfile expectedverdict="false(valid-deref)"> Part of #529.

PhilippWendler closed this as completed Nov 26, 2019

This was referenced Feb 23, 2024

Discussion: Qualitative properties and extended task definitions #994

Closed

Tagging Properties and Filtering #998

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Task filters in benchmark definition #529

Task filters in benchmark definition #529

dbeyer commented Nov 25, 2019

PhilippWendler commented Nov 25, 2019

kfriedberger commented Nov 25, 2019 •

edited

Loading

dbeyer commented Nov 25, 2019

dbeyer commented Nov 25, 2019

PhilippWendler commented Nov 25, 2019

dbeyer commented Nov 25, 2019

PhilippWendler commented Nov 25, 2019

dbeyer commented Nov 25, 2019

PhilippWendler commented Nov 26, 2019

dbeyer commented Nov 26, 2019

PhilippWendler commented Nov 26, 2019

Task filters in benchmark definition #529

Task filters in benchmark definition #529

Comments

dbeyer commented Nov 25, 2019

PhilippWendler commented Nov 25, 2019

kfriedberger commented Nov 25, 2019 • edited Loading

dbeyer commented Nov 25, 2019

dbeyer commented Nov 25, 2019

PhilippWendler commented Nov 25, 2019

dbeyer commented Nov 25, 2019

PhilippWendler commented Nov 25, 2019

dbeyer commented Nov 25, 2019

PhilippWendler commented Nov 26, 2019

dbeyer commented Nov 26, 2019

PhilippWendler commented Nov 26, 2019

kfriedberger commented Nov 25, 2019 •

edited

Loading