Agent-enhancement integration (Step 2) #569

DonggeLiu · 2024-09-02T05:19:53Z

Implemented:

Scaffold for pipeline, stages, agents, and tools.
Prototyper for local experiments.
LLM Chat model.
Placeholders for other stages/agents/tools/models.
Updated dependencies to address known bugs/vulnerabilities.

TODO:

Prototyper for cloud experiments.
Other agents: Evaluator, Analyzer.
Better logging and file saving.

DonggeLiu · 2024-09-02T05:26:31Z

A sample result of a local experiment:

2024-09-03 04:35:11,955 [PID: 636020] INFO [run_all_experiments._print_experiment_results]: ================================================================================
*ada-url, struct url ada::url ada::parser::parse_url<ada::parse_url<ada::url>(string_view, const struct url *)*
build success rate: 0.9, crash rate: 0.0, found bug: 0, max coverage: 0.0, max line coverage diff: 0.0
max coverage sample:
max coverage diff sample:
max coverage diff report: None

2024-09-03 04:35:11,955 [PID: 636020] INFO [run_all_experiments._print_experiment_results]: ================================================================================
*ada-url, bool ada_can_parse_with_base(const char *, size_t, const char *, size_t)*
build success rate: 0.8, crash rate: 0.0, found bug: 0, max coverage: 0.0, max line coverage diff: 0.0
max coverage sample:
max coverage diff sample:
max coverage diff report: None

2024-09-03 04:35:11,955 [PID: 636020] INFO [run_all_experiments._print_experiment_results]: ================================================================================
*ada-url, struct ada_owned_string ada_idna_to_ascii(const char *, size_t)*
build success rate: 0.7, crash rate: 0.0, found bug: 0, max coverage: 0.0, max line coverage diff: 0.0
max coverage sample:
max coverage diff sample:
max coverage diff report: None

2024-09-03 04:35:11,955 [PID: 636020] INFO [run_all_experiments._print_experiment_results]: ================================================================================
*ada-url, ada_url ada_parse_with_base(const char *, size_t, const char *, size_t)*
build success rate: 0.7, crash rate: 0.0, found bug: 0, max coverage: 0.0, max line coverage diff: 0.0
max coverage sample:
max coverage diff sample:
max coverage diff report: None

2024-09-03 04:35:11,956 [PID: 636020] INFO [run_all_experiments._print_experiment_results]: ================================================================================
*ada-url, struct url ada::url_aggregator ada::parser::parse_url<ada::parse_url<ada::url>(string_view, const struct url *)*
build success rate: 0.7, crash rate: 0.0, found bug: 0, max coverage: 0.0, max line coverage diff: 0.0
max coverage sample:
max coverage diff sample:
max coverage diff report: None

Note that benchmark *ada-url, struct url ada::url ada::parser::parse_url<ada::parse_url<ada::url>(string_view, const struct url *)* previously had a build success rate of 0.

There are some "false negative build failures" because the LLM modified build.sh and built the fuzz target binary under a different name, e.g. ada_c:

+ clang++ -O1 -fno-omit-frame-pointer -gline-tables-only -Wno-error=enum-constexpr-conversion -Wno-error=incompatible-function-pointer-types -Wno-error=int-conversion -Wno-error=deprecated-declarations -Wno-error=implicit-function-declaration -Wno-error=implicit-int -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION -fsanitize=address -fsanitize-address-use-after-scope -fsanitize=fuzzer-no-link -O1 -fno-omit-frame-pointer -gline-tables-only -Wno-error=enum-constexpr-conversion -Wno-error=incompatible-function-pointer-types -Wno-error=int-conversion -Wno-error=deprecated-declarations -Wno-error=implicit-function-declaration -Wno-error=implicit-int -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION -fsanitize=address -fsanitize-address-use-after-scope -fsanitize=fuzzer-no-link -stdlib=libc++ -fsanitize=fuzzer ./ada.o ada_c.o -o /out/ada_c
+ cp /src/ada-url/fuzz/url.dict /src/ada-url/fuzz/ada_c.options /src/ada-url/fuzz/parse.options /out/

2024-09-02 15:02:01,983 [PID: 37604] DEBUG [prototyper._validate_fuzz_target_and_build_script]: ROUND 98 Fuzz target compile Succeessfully: True
2024-09-02 15:02:01,983 [PID: 37604] DEBUG [container_tool.execute]: Executing command (ls /out/parse) in b4a0f1ad77527a5fc63e88c3136f9fc230a02811a54b5cb747a83e1e7da4e5a5:
2024-09-02 15:02:02,102 [PID: 37604] DEBUG [container_tool._execute_command]: Executing command (['docker', 'exec', 'b4a0f1ad77527a5fc63e88c3136f9fc230a02811a54b5cb747a83e1e7da4e5a5', '/bin/bash', '-c', 'ls /out/parse']) in container b4a0f1ad77527a5fc63e88c3136f9fc230a02811a54b5cb747a83e1e7da4e5a5: Return code 2. STDOUT: , STDERR: ls: cannot access '/out/parse': No such file or directory

2024-09-02 15:02:02,103 [PID: 37604] DEBUG [prototyper._validate_fuzz_target_and_build_script]: ROUND 98 Final fuzz target binary exists: False
2024-09-02 15:02:02,391 [PID: 37604] DEBUG [container_tool._execute_command]: Executing command (['docker', 'stop', 'b4a0f1ad77527a5fc63e88c3136f9fc230a02811a54b5cb747a83e1e7da4e5a5']): Return code 0. STDOUT: b4a0f1ad77527a5fc63e88c3136f9fc230a02811a54b5cb747a83e1e7da4e5a5

I will fix this after we support cloud experiments, which will allow me to verify if my fix is universal.
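
For illustration only (this is not the fix referred to above): one way to avoid depending on a fixed binary name such as /out/parse is to scan the output directory for any executable that contains the libFuzzer entry-point name. The function name `find_fuzz_target_binaries` is hypothetical, and the sketch assumes the check runs on a directory visible to the Python process, whereas the logs above run `ls` inside the container.

```python
import os
import stat

# libFuzzer harnesses define this entry point; its name normally survives
# in the linked binary, so it can serve as a heuristic marker (assumption).
FUZZER_ENTRY_POINT = b'LLVMFuzzerTestOneInput'


def find_fuzz_target_binaries(out_dir: str) -> list[str]:
  """Returns names of executables in out_dir that look like fuzz targets."""
  found = []
  for name in sorted(os.listdir(out_dir)):
    path = os.path.join(out_dir, name)
    if not os.path.isfile(path):
      continue
    # Skip non-executables such as *.options and *.dict files.
    if not os.stat(path).st_mode & stat.S_IXUSR:
      continue
    with open(path, 'rb') as binary:
      if FUZZER_ENTRY_POINT in binary.read():
        found.append(name)
  return found
```

With a check like this, a build that emits /out/ada_c instead of /out/parse would still be reported as producing a fuzz target, at the cost of accepting any harness in the directory rather than one specific name.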

DonggeLiu · 2024-09-02T05:35:02Z

/gcbrun skip

oliverchang

nice!

oliverchang · 2024-09-02T05:40:40Z

pipeline.py

+
+
+class Pipeline():
+  """The fuzzing main pipeline, with 3 iterative stages."""


can you add some more details (e.g. one sentence per stage) in the docstring?

Yep, good point. Done!

oliverchang · 2024-09-02T05:41:26Z

pipeline.py

+    """Executes the stages once."""
+    results.append(self.writing_stage.execute(prev_stage_results=results))
+
+  def execute(self, results: list[Result]) -> list[Result]:


Can you explain the execution strategy/loop here a bit? Why do we repeatedly execute every stage?

oliverchang · 2024-09-02T05:42:09Z

result_classes.py

@@ -0,0 +1,81 @@
+"""The data structure of all result kinds."""


nit: let's not call this "result_classes". How about just "results"?

Yep, fixed.
Should it be in a separate dir as well (e.g., experiment or common)?

we can just keep it where it is for now and move it later if needed.

Shall we merge this?
The next step is enabling Prototyper for cloud experiments.
Once done, we will be able to run a full experiment.

oliverchang · 2024-09-02T05:46:51Z

stage/writing_stage.py

+
+  def _write_new_fuzz_target(self, prev_results: list[Result]) -> Result:
+    """Writes a new fuzz target."""
+    return self.get_agent('Prototyper').execute(prev_results)


If particular stages expect agents of certain types to exist, should we ensure they are passed in the constructor? That way, we can never get into a state where they don't exist.

Done, now we pass all agents expected by each stage in the pipeline constructor.
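
A minimal sketch of the constructor-injection idea discussed here, not the code that was merged. `Pipeline`, `writing_stage`, and `Prototyper` are names from this PR; `WritingStage`, the `author` field, `execute_one_cycle`, and every signature are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Result:
  """Stand-in for the PR's Result class; these fields are assumptions."""
  trial: int
  author: str = ''


class BaseAgent:
  """Hypothetical minimal agent interface."""

  def __init__(self, name: str):
    self.name = name

  def execute(self, result_history: list[Result]) -> Result:
    # A real agent would query the LLM; this stub only tags the result.
    return Result(trial=result_history[-1].trial, author=self.name)


class WritingStage:
  """Takes its agent at construction, so it can never be missing later."""

  def __init__(self, prototyper: BaseAgent):
    self.prototyper = prototyper

  def execute(self, result_history: list[Result]) -> Result:
    return self.prototyper.execute(result_history)


class Pipeline:
  """Receives every agent its stages need through the constructor."""

  def __init__(self, prototyper: BaseAgent):
    self.writing_stage = WritingStage(prototyper)

  def execute_one_cycle(self, result_history: list[Result]) -> list[Result]:
    # Only the writing stage is sketched; other stages would follow suit.
    result_history.append(self.writing_stage.execute(result_history))
    return result_history
```

Compared with a lookup such as `get_agent('Prototyper')`, a missing agent here fails at construction time with a `TypeError` rather than partway through an experiment.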

oliverchang · 2024-09-03T04:45:17Z

agent/prototyper.py

+    prompt.save(work_dirs.prompt)
+    return prompt
+
+  def _parse_tag(self, response: str, tag: str) -> str:


should some of these go into BaseAgent? I imagine a lot of these things related to formatting, parsing/executing commands would be required for all agents?

Yes, good point, thanks!
I will go through them and move suitable ones to the base class.
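
As a rough sketch of what moving such a helper into the base class could look like: the signature of `_parse_tag` is taken from the diff above, but the body is an assumption, namely that the LLM wraps content in XML-like `<tag>...</tag>` pairs and that an empty string is returned when the tag is absent.

```python
import re


class BaseAgent:
  """Hypothetical base class holding helpers shared by all agents."""

  def _parse_tag(self, response: str, tag: str) -> str:
    """Returns the text inside the first <tag>...</tag> pair, or ''."""
    escaped = re.escape(tag)
    # DOTALL lets the content span multiple lines; '.*?' stops at the
    # first closing tag instead of the last one.
    match = re.search(rf'<{escaped}>(.*?)</{escaped}>', response, re.DOTALL)
    return match.group(1).strip() if match else ''


class Prototyper(BaseAgent):
  """Inherits the parsing helper instead of defining its own copy."""
```

For example, `Prototyper()._parse_tag('Try <bash>ls /out</bash>', 'bash')` would return `'ls /out'` under these assumptions.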

oliverchang · 2024-09-03T04:49:41Z

agent/prototyper.py

+
+  def execute(self, prev_results: list[Result]) -> BuildResult:
+    """Executes the agent based on previous result."""
+    logger.info('Trial: %d Executing Prototyper', prev_results[-1].trial)


this prev_results[-1] pattern seems a bit confusing.

Is it always guaranteed that this list is not empty? What happens on the very first run? And also, why does it have to be a list?

Also, "Previous" implies that these are only results from a previous execution, but here it seems like it contains the current trial as well.

> this prev_results[-1] pattern seems a bit confusing.

Ah, I think we no longer have this in the latest code. I made trial a class attribute.

> Is it always guaranteed that this list is not empty?

Yes, the list is never empty. In the first iteration it contains one Result item, which stores the benchmark-related and experiment-related information. I did this mainly for simplicity: all stages/agents require that info (function signature, project, work directory, etc.) and they all need the previous result too, so I pack that info into Result and pass Result around. It also makes sense for Result to store it, because we can then easily tell which benchmark/trial/project/dir a result corresponds to. I could use parameters to avoid passing a Result item on the first run, but I suspect that would complicate data sharing a little.

> And also, why does it have to be a list?

Later, we may find the result history helpful for comparisons. For example, the Analysis Stage can learn that Result item 1, with fuzz target source F1, covered project code blocks C1, and compare it against Result item 2, with F2 covering C2. A similar comparison can help the Enhancer fix fuzz targets.

> Also, "Previous" implies that these are only results from a previous execution, but here it seems like it contains the current trial as well.

Yep, good point. I think that has been fixed in the latest code. I renamed them to result_history.
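
To make the seeded-history design concrete, here is a small sketch under stated assumptions. `Result` and `BuildResult` are class names from this PR, but the fields shown and the helpers `seed_history` and `record_build` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Result:
  """Carries the benchmark/experiment context that every stage needs."""
  benchmark: str
  trial: int
  work_dir: str


@dataclass
class BuildResult(Result):
  """Adds build-specific fields on top of the shared context."""
  compiles: bool = False
  fuzz_target_source: str = ''


def seed_history(benchmark: str, trial: int, work_dir: str) -> list[Result]:
  """Creates a history that is never empty: item 0 holds only context."""
  return [Result(benchmark, trial, work_dir)]


def record_build(result_history: list[Result], compiles: bool,
                 source: str) -> BuildResult:
  """Appends a BuildResult that inherits context from the latest item."""
  last = result_history[-1]
  build = BuildResult(benchmark=last.benchmark,
                      trial=last.trial,
                      work_dir=last.work_dir,
                      compiles=compiles,
                      fuzz_target_source=source)
  result_history.append(build)
  return build
```

Because item 0 always exists, `result_history[-1]` is safe on the first run, and keeping earlier items allows the kind of F1/C1 versus F2/C2 comparison described above.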

DonggeLiu · 2024-09-03T06:34:58Z

/gcbrun skip

DonggeLiu · 2024-09-03T06:35:17Z

Running an experiment locally before merging.

DonggeLiu added 19 commits September 2, 2024 10:08

Update Google AI platform to fix error: "Unable to submit request because it must include at least one parts field"

d6c6807

Update requests to mitigate potential vul

ad566b7

Add functions to check and prepare project image

b7bbe3a

Add ChatModel + related functions and settings, make other models compatible

13f0cd1

Allow prompt builder to build a prompt with given text

8c1372d

Allow appending to prompt text

61b28c7

Add an arg to enable agent mode

1de866f

Integrate fuzzing pipeline when running experiment

133c0b2

Capture another known API error

e7d1a5f

Fuzzing pipeline

507f42b

Add stages in pipeline

2b099ca

LLM agents for stages in fuzzing pipeline

70b56b0

Tools for LLM agents

f50a3f5

The initial prompt for container tool

b144d0b

The classes for result data types

14f7ba6

Implement get_tool in base_agent

a20ffa7

Add placeholders for now

ede6090

Minor format fix

9d4a0c4

More place holder for GPT models

ef9bcd4

DonggeLiu mentioned this pull request Sep 2, 2024

Agent-enhancement #558

Open

DonggeLiu requested a review from oliverchang September 2, 2024 05:22

oliverchang reviewed Sep 2, 2024

DonggeLiu added 6 commits September 3, 2024 09:33

Add Trial id in logs

03f1dfa

Remove docker run ... from the command sent back to LLM

92ee7f1

More detailed docstrings

5d24cb5

Rename result class

c7b1a24

Rename variables to avoid confusion

8b424da

More doc string

0033a0c

DonggeLiu added 4 commits September 3, 2024 11:47

Initialize agents in constructor

64c0d6d

Relocate tool prompts and generalize the way to read from it

a4c5516

More doc strings

cec3dba

More docstrings

8ceaac9

oliverchang reviewed Sep 3, 2024

oliverchang approved these changes Sep 3, 2024

Moving reusable functions into base class

d7bbbf1

DonggeLiu merged commit 0a356e4 into main Sep 4, 2024
6 checks passed

DonggeLiu deleted the agent-enhancement-2 branch September 4, 2024 01:00
