Agent-enhancement integration (Step 2) #569

Merged
merged 30 commits into from
Sep 4, 2024

DonggeLiu
Collaborator

@DonggeLiu DonggeLiu commented Sep 2, 2024

Implemented:

  1. Scaffold for pipeline, stages, agents, and tools.
  2. Prototyper for local experiments.
  3. LLM Chat model.
  4. Placeholders for other stages/agents/tools/models.
  5. Updated dependencies to address known bugs/vulnerabilities.

TODO:

  1. Prototyper for cloud experiments.
  2. Other agents: Evaluator, Analyzer.
  3. Better logging and file saving.

@DonggeLiu DonggeLiu mentioned this pull request Sep 2, 2024
@DonggeLiu
Collaborator Author

DonggeLiu commented Sep 2, 2024

A sample result of a local experiment:

2024-09-03 04:35:11,955 [PID: 636020] INFO [run_all_experiments._print_experiment_results]: ================================================================================
*ada-url, struct url ada::url ada::parser::parse_url<ada::parse_url<ada::url>(string_view, const struct url *)*
build success rate: 0.9, crash rate: 0.0, found bug: 0, max coverage: 0.0, max line coverage diff: 0.0
max coverage sample:
max coverage diff sample:
max coverage diff report: None

2024-09-03 04:35:11,955 [PID: 636020] INFO [run_all_experiments._print_experiment_results]: ================================================================================
*ada-url, bool ada_can_parse_with_base(const char *, size_t, const char *, size_t)*
build success rate: 0.8, crash rate: 0.0, found bug: 0, max coverage: 0.0, max line coverage diff: 0.0
max coverage sample:
max coverage diff sample:
max coverage diff report: None

2024-09-03 04:35:11,955 [PID: 636020] INFO [run_all_experiments._print_experiment_results]: ================================================================================
*ada-url, struct ada_owned_string ada_idna_to_ascii(const char *, size_t)*
build success rate: 0.7, crash rate: 0.0, found bug: 0, max coverage: 0.0, max line coverage diff: 0.0
max coverage sample:
max coverage diff sample:
max coverage diff report: None

2024-09-03 04:35:11,955 [PID: 636020] INFO [run_all_experiments._print_experiment_results]: ================================================================================
*ada-url, ada_url ada_parse_with_base(const char *, size_t, const char *, size_t)*
build success rate: 0.7, crash rate: 0.0, found bug: 0, max coverage: 0.0, max line coverage diff: 0.0
max coverage sample:
max coverage diff sample:
max coverage diff report: None

2024-09-03 04:35:11,956 [PID: 636020] INFO [run_all_experiments._print_experiment_results]: ================================================================================
*ada-url, struct url ada::url_aggregator ada::parser::parse_url<ada::parse_url<ada::url>(string_view, const struct url *)*
build success rate: 0.7, crash rate: 0.0, found bug: 0, max coverage: 0.0, max line coverage diff: 0.0
max coverage sample:
max coverage diff sample:
max coverage diff report: None

Note that benchmark *ada-url, struct url ada::url ada::parser::parse_url<ada::parse_url<ada::url>(string_view, const struct url *)* previously had a build success rate of 0.

There are some "false negative build failures" because the LLM modified build.sh and built the fuzz target binary under a different name, e.g. ada_c:

+ clang++ -O1 -fno-omit-frame-pointer -gline-tables-only -Wno-error=enum-constexpr-conversion -Wno-error=incompatible-function-pointer-types -Wno-error=int-conversion -Wno-error=deprecated-declarations -Wno-error=implicit-function-declaration -Wno-error=implicit-int -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION -fsanitize=address -fsanitize-address-use-after-scope -fsanitize=fuzzer-no-link -O1 -fno-omit-frame-pointer -gline-tables-only -Wno-error=enum-constexpr-conversion -Wno-error=incompatible-function-pointer-types -Wno-error=int-conversion -Wno-error=deprecated-declarations -Wno-error=implicit-function-declaration -Wno-error=implicit-int -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION -fsanitize=address -fsanitize-address-use-after-scope -fsanitize=fuzzer-no-link -stdlib=libc++ -fsanitize=fuzzer ./ada.o ada_c.o -o /out/ada_c
+ cp /src/ada-url/fuzz/url.dict /src/ada-url/fuzz/ada_c.options /src/ada-url/fuzz/parse.options /out/

2024-09-02 15:02:01,983 [PID: 37604] DEBUG [prototyper._validate_fuzz_target_and_build_script]: ROUND 98 Fuzz target compile Succeessfully: True
2024-09-02 15:02:01,983 [PID: 37604] DEBUG [container_tool.execute]: Executing command (ls /out/parse) in b4a0f1ad77527a5fc63e88c3136f9fc230a02811a54b5cb747a83e1e7da4e5a5:
2024-09-02 15:02:02,102 [PID: 37604] DEBUG [container_tool._execute_command]: Executing command (['docker', 'exec', 'b4a0f1ad77527a5fc63e88c3136f9fc230a02811a54b5cb747a83e1e7da4e5a5', '/bin/bash', '-c', 'ls /out/parse']) in container b4a0f1ad77527a5fc63e88c3136f9fc230a02811a54b5cb747a83e1e7da4e5a5: Return code 2. STDOUT: , STDERR: ls: cannot access '/out/parse': No such file or directory

2024-09-02 15:02:02,103 [PID: 37604] DEBUG [prototyper._validate_fuzz_target_and_build_script]: ROUND 98 Final fuzz target binary exists: False
2024-09-02 15:02:02,391 [PID: 37604] DEBUG [container_tool._execute_command]: Executing command (['docker', 'stop', 'b4a0f1ad77527a5fc63e88c3136f9fc230a02811a54b5cb747a83e1e7da4e5a5']): Return code 0. STDOUT: b4a0f1ad77527a5fc63e88c3136f9fc230a02811a54b5cb747a83e1e7da4e5a5

I will fix this after we support cloud experiments, which will allow me to verify if my fix is universal.
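One possible way to avoid this kind of false negative is to compare the executables in the output directory before and after the build, instead of checking for one fixed binary name. The sketch below is only an illustration under assumptions: `list_executables` and `find_new_binaries` are hypothetical helpers that are not part of this PR, and they inspect a local directory rather than running `ls` through `docker exec`.

```python
import os
import stat


def list_executables(out_dir: str) -> set[str]:
  """Returns names of regular files in out_dir that have an executable bit."""
  names = set()
  for name in os.listdir(out_dir):
    mode = os.stat(os.path.join(out_dir, name)).st_mode
    is_executable = mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    if stat.S_ISREG(mode) and is_executable:
      names.add(name)
  return names


def find_new_binaries(before: set[str], after: set[str]) -> set[str]:
  """Returns executables that appeared in the output dir during the build."""
  return after - before
```

With a check like this, a build that produces /out/ada_c instead of /out/parse would still be recognized as having produced a fuzz target, while non-executable files such as url.dict would be ignored.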

@DonggeLiu
Collaborator Author

/gcbrun skip

Collaborator

@oliverchang oliverchang left a comment

nice!

pipeline.py Outdated


class Pipeline():
  """The fuzzing main pipeline, with 3 iterative stages."""

Collaborator

can you add some more details (e.g. one sentence per stage) in the docstring?

Collaborator Author

Yep, good point. Done!

pipeline.py Outdated
    """Executes the stages once."""
    results.append(self.writing_stage.execute(prev_stage_results=results))

  def execute(self, results: list[Result]) -> list[Result]:

Collaborator

Can you briefly explain the execution strategy/loop here? Why do we repeatedly execute every stage?

Collaborator Author

Done!
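The final docstring and loop are not shown in this thread. As a rough sketch of what an iterative loop over the three stages could look like, assuming a fixed cap on rounds and a success flag on the last result (`max_rounds`, `success`, and the function-style stages are all hypothetical and not taken from the PR):

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Result:
  """Hypothetical stand-in for the PR's Result class."""
  stage: str
  success: bool = False


# A stage is modelled here as a function from the result history to a Result.
Stage = Callable[[list[Result]], Result]


class Pipeline:
  """Runs three stages in a loop (illustrative sketch only).

  1. Writing stage: produces or refines a fuzz target.
  2. Evaluation stage: builds and runs the fuzz target.
  3. Analysis stage: inspects the evaluation outcome to guide the next round.
  """

  def __init__(self, writing: Stage, evaluation: Stage, analysis: Stage,
               max_rounds: int = 3):
    self.stages = [writing, evaluation, analysis]
    self.max_rounds = max_rounds

  def _execute_one_cycle(self, results: list[Result]) -> None:
    """Executes each stage once, appending its result to the history."""
    for stage in self.stages:
      results.append(stage(results))

  def execute(self, results: list[Result]) -> list[Result]:
    """Repeats the cycle until the last result succeeds or rounds run out."""
    for _ in range(self.max_rounds):
      self._execute_one_cycle(results)
      if results[-1].success:
        break
    return results
```

The point of repeating the stages is that each round can use what the previous round learned; the stop condition above is only one of several reasonable choices.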

@@ -0,0 +1,81 @@
"""The data structure of all result kinds."""
Collaborator

nit: let's not call this "result_classes". How about just "results"?

Collaborator Author

Yep, fixed.
Should it be in a separate dir as well (e.g., experiment or common)?

Collaborator

we can just keep it where it is for now and move it later if needed.

Collaborator Author

Shall we merge this?
The next step is enabling Prototyper for cloud experiments.
Once done, we will be able to run a full experiment.


  def _write_new_fuzz_target(self, prev_results: list[Result]) -> Result:
    """Writes a new fuzz target."""
    return self.get_agent('Prototyper').execute(prev_results)

Collaborator

If particular stages expect agents of certain types to exist, should we ensure they are passed in the constructor? That way, we can never get into a state where they don't exist.

Collaborator Author

Done, now we pass all agents expected by each stage in the pipeline constructor.
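To illustrate the suggestion, here is a minimal sketch in which a stage receives its agent as a required constructor argument instead of looking it up by name at call time. The `Agent` class and the exact constructor signature are assumptions made for this example; the merged code may be organized differently.

```python
class Agent:
  """Hypothetical minimal agent interface."""

  def __init__(self, name: str):
    self.name = name

  def execute(self, result_history: list) -> str:
    return f'{self.name} ran on {len(result_history)} result(s)'


class WritingStage:
  """Requires its agent up front instead of looking it up by name later."""

  def __init__(self, prototyper: Agent):
    if prototyper is None:
      raise ValueError('WritingStage requires a Prototyper agent')
    self.prototyper = prototyper

  def _write_new_fuzz_target(self, result_history: list) -> str:
    """Delegates to the agent that is guaranteed to exist."""
    return self.prototyper.execute(result_history)
```

A missing agent then fails when the stage is constructed, rather than in the middle of an experiment when a string lookup such as `get_agent('Prototyper')` finds nothing.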

    prompt.save(work_dirs.prompt)
    return prompt

  def _parse_tag(self, response: str, tag: str) -> str:

Collaborator

should some of these go into BaseAgent? I imagine a lot of these things related to formatting, parsing/executing commands would be required for all agents?

Collaborator Author

Yes, good point, thanks!
I will go through them and move suitable ones to the base class.

Collaborator Author

Done below
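As an illustration of the refactoring discussed here, a shared helper such as `_parse_tag` could live in the base class so that every agent inherits it. The body below is an assumption: the thread only shows the method signature, and this sketch guesses that responses wrap content in XML-like tags such as `<bash>...</bash>`.

```python
import re


class BaseAgent:
  """Hypothetical base class holding helpers shared by all agents."""

  def _parse_tag(self, response: str, tag: str) -> str:
    """Returns the stripped text inside the first <tag>...</tag> pair, or ''."""
    pattern = rf'<{re.escape(tag)}>(.*?)</{re.escape(tag)}>'
    match = re.search(pattern, response, re.DOTALL)
    return match.group(1).strip() if match else ''


class Prototyper(BaseAgent):
  """Inherits _parse_tag instead of defining its own copy."""
```

Any other agent subclass would get the same parsing behavior without duplicating the regular expression.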


  def execute(self, prev_results: list[Result]) -> BuildResult:
    """Executes the agent based on previous result."""
    logger.info('Trial: %d Executing Prototyper', prev_results[-1].trial)

Collaborator

this prev_results[-1] pattern seems a bit confusing.

Is it always guaranteed that this list is not empty? What happens on the very first run? And also, why does it have to be a list?

Also, "Previous" implies that these are only results from a previous execution, but here it seems like it contains the current trial as well.

Collaborator Author

@DonggeLiu DonggeLiu Sep 3, 2024

> this prev_results[-1] pattern seems a bit confusing.

Ah, I think we no longer have this in the latest code. I made trial a class attribute.

> Is it always guaranteed that this list is not empty?

Yes, the list is never empty. In the first iteration it contains one Result item, which stores the benchmark-related and experiment-related information. I did this mainly for simplicity: all stages/agents require that information (function signature, project, work directory, etc.) and they all need the previous result too, so I pack it into Result and pass Result around. It also makes sense for Result to store it, because we can then easily find a result's corresponding benchmark/trial/project/dir. I could use parameters to avoid passing a Result item on the first run, but I suspect that would complicate data sharing a little.

> And also, why does it have to be a list?

Later, we may find the result history helpful for comparisons. For example, the Analysis Stage can learn that Result item 1, with fuzz target source F1, covered project code blocks C1, and compare it against Result item 2, with F2 covering C2. A similar comparison can help the Enhancer fix fuzz targets.

> Also, "Previous" implies that these are only results from a previous execution, but here it seems like it contains the current trial as well.

Yep, good point. I think that has been fixed in the latest code: I renamed them to result_history.
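A small sketch of the idea described above: the history starts with one Result that carries the benchmark and experiment information, and later entries can be compared against each other. All field names and the `coverage_gain` helper are hypothetical and chosen only for illustration; they are not the PR's actual Result definition.

```python
from dataclasses import dataclass


@dataclass
class Result:
  """Hypothetical result record; field names are illustrative only."""
  benchmark: str
  trial: int
  work_dir: str
  fuzz_target_source: str = ''
  covered_blocks: frozenset = frozenset()


def coverage_gain(result_history: list[Result]) -> frozenset:
  """Returns blocks covered by the latest result but not by the one before."""
  if len(result_history) < 2:
    return frozenset()
  previous, latest = result_history[-2], result_history[-1]
  return latest.covered_blocks - previous.covered_blocks
```

Because the first entry always exists, stages can read the benchmark, trial, and work directory from the history without separate parameters, and a comparison like F1/C1 versus F2/C2 reduces to a set difference.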

@DonggeLiu
Collaborator Author

/gcbrun skip

@DonggeLiu
Collaborator Author

Running an experiment locally before merging.

@DonggeLiu DonggeLiu merged commit 0a356e4 into main Sep 4, 2024
6 checks passed
@DonggeLiu DonggeLiu deleted the agent-enhancement-2 branch September 4, 2024 01:00