-
Notifications
You must be signed in to change notification settings - Fork 127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Agent-enhancement integration (Step 2) #569
Conversation
…ause it must include at least one parts field"
A sample result of a local experiment:
Note that benchmark There are some "false negative build failures" because LLM modified
I will fix this after we support cloud experiments, which will allow me to verify if my fix is universal. |
/gcbrun skip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
pipeline.py
Outdated
|
||
|
||
class Pipeline(): | ||
"""The fuzzing main pipeline, with 3 iterative stages.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add some more details (e.g. one sentence per stage) in the docstring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, good point. Done!
pipeline.py
Outdated
"""Executes the stages once.""" | ||
results.append(self.writing_stage.execute(prev_stage_results=results)) | ||
|
||
def execute(self, results: list[Result]) -> list[Result]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain a bit the execution strategy/loop here? Why do we repeatedly execute every stage ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
result_classes.py
Outdated
@@ -0,0 +1,81 @@ | |||
"""The data structure of all result kinds.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: let's not call this "result_classes". How about just "results" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, fixed.
Should it be in a separated dir as well (e.g., experiment
or common
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can just keep it where it is for now and move it later if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we merge this?
The next step is enabling Prototyper for cloud experiments.
Once done, we will be able to run a full experiment.
stage/writing_stage.py
Outdated
|
||
def _write_new_fuzz_target(self, prev_results: list[Result]) -> Result: | ||
"""Writes a new fuzz target.""" | ||
return self.get_agent('Prototyper').execute(prev_results) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If particular stages expect agents of certain types to exist, should we ensure they are passed in the constructor? that way, we can never get into a state where they don't exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, now we pass all agents expected by each stage in the pipeline constructor.
agent/prototyper.py
Outdated
prompt.save(work_dirs.prompt) | ||
return prompt | ||
|
||
def _parse_tag(self, response: str, tag: str) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should some of these go into BaseAgent
? I imagine a lot of these things related to formatting, parsing/executing commands would be required for all agents?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good point, thanks!
I will go through them and move suitable ones to the base class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done below
agent/prototyper.py
Outdated
|
||
def execute(self, prev_results: list[Result]) -> BuildResult: | ||
"""Executes the agent based on previous result.""" | ||
logger.info('Trial: %d Executing Prototyper', prev_results[-1].trial) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this prev_results[-1]
pattern seems a bit confusing.
Is it always guaranteed that this list is not empty? What happens on the very first run? And also, why does it have to be a list?
Also, "Previous" implies that these are only results from a previous execution, but here it seems like it contains the current trial as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this prev_results[-1] pattern seems a bit confusing.
AH I think we no longer have this in the latest code. I made trial
a class attribute.
Is it always guaranteed that this list is not empty?
Yes, the list is never empty. The list contains 1 Result
item in the first iteration, storing the benchmark-related and experiment-related information. I did this mainly for simplicity reasons: All stages/agents require those info (function signature, project, work directory, etc.) and they all need the previous result too, so I pack those info in Result and pass Result
around. It also makes sense for Result
to store those because we can easily know the result's corresponding benchmark/trail/project/dir. I can use parameters to avoid passing a result item at the first run, but I suspect it will complicate data sharing a little bit.
And also, why does it have to be a list?
Later, we may find the result history helpful for comparisons. For example, the Analysis Stage can learn Result
item 1 with fuzz target source F1
covered project code blocks C1
, and compare it against Result
item 2 with F2
covering C2
. Similar comparison can benefit Enchander in fixing fuzz targets.
Also, "Previous" implies that these are only results from a previous execution, but here it seems like it contains the current trial as well.
Yep good point. I think that has been fixed in the latest code. I renamed them to result_history
.
/gcbrun skip |
Running an experiment locally before merging. |
Implemented:
TODO: