benchmarks: update JVM benchmarks #445

DavidKorczynski · 2024-07-06T18:01:01Z

Changes the JVM benchmarks to be split into three buckets:
1) jvm-all: all Java projects in OSS-Fuzz.
2) jvm-medium: a smaller set of ~20 projects, which is useful for
testing semantic changes in prompts without needing to check all
projects.
3) jvm-small: a set of three projects that can be used to test changes in
infrastructure to ensure no infra regressions.
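
As a rough illustration of how the three buckets map to kinds of changes, here is a minimal sketch. The bucket names come from this PR; the helper names (`pick_benchmark_set`, `trigger_comment`) and the change-kind labels (`infra`, `prompt`, `full`) are hypothetical and not part of the repository. The comment string only mirrors the shape of the `/gcbrun exp` comments used in this thread and is not a documented interface.

```python
# Hypothetical helpers: only the bucket names are taken from this PR.
# The change-kind labels and function names are illustrative assumptions.
CHANGE_KIND_TO_SET = {
    "infra": "jvm-small",    # three projects, quick infra regression check
    "prompt": "jvm-medium",  # ~20 projects, semantic changes in prompts
    "full": "jvm-all",       # all Java projects in OSS-Fuzz
}


def pick_benchmark_set(change_kind: str) -> str:
    """Return the benchmark bucket suggested for a kind of change."""
    try:
        return CHANGE_KIND_TO_SET[change_kind]
    except KeyError:
        known = ", ".join(sorted(CHANGE_KIND_TO_SET))
        raise ValueError(f"unknown change kind {change_kind!r}; expected one of: {known}")


def trigger_comment(name: str, model: str, change_kind: str) -> str:
    """Build a comment shaped like the '/gcbrun exp' comments in this thread."""
    benchmark_set = pick_benchmark_set(change_kind)
    return f"/gcbrun exp -n {name} -m {model} -b {benchmark_set}"


if __name__ == "__main__":
    print(trigger_comment("dk-test-jvm3234", "vertex_ai_gemini-1-5", "infra"))
```

The idea is simply to start with the smallest bucket that can surface the kind of regression being checked, and only move to jvm-all once the smaller run looks healthy.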

DavidKorczynski · 2024-07-06T18:02:16Z

Generated using [far-reach-low-coverage,jvm-public-candidates,easy-params-far-reach] and max 6 targets per oracle

DavidKorczynski · 2024-07-06T18:02:49Z

/gcbrun exp -n dk-test-jvm3234 -m vertex_ai_gemini-1-5 -b jvm-small

DavidKorczynski · 2024-07-06T18:43:13Z

Experiment looks good

DonggeLiu · 2024-07-07T23:57:38Z

benchmark-sets/jvm-all/antlr3-java.yaml

@@ -0,0 +1,55 @@
+"functions":
+- "exceptions": []


Is exceptions a new field in FI?
I noticed this new field in the new C/C++ benchmarks, too.
It is harmless to OFG, but if it serves no actual purpose, removing it would be cleaner.

Nice catch. It's used, but we need to bump introspector for this to be reflected in the benchmarks generated from introspector.oss-fuzz.com versus locally. I'm doing this before landing this PR by way of google/oss-fuzz#12170.

The benchmarks have been updated now! Will run a small experiment and if all goes well there then I will run a large experiment

DavidKorczynski · 2024-07-10T20:25:35Z

/gcbrun exp -n dk-test-infra5205 -m vertex_ai_gemini-1-5 -b jvm-small -i

Signed-off-by: David Korczynski <david@adalogics.com>

DavidKorczynski · 2024-07-10T20:50:12Z

Small experiment is looking great https://llm-exp.oss-fuzz.com/Result-reports/ofg-pr/2024-07-11-445-dk-test-infra5205-jvm-small/index.html, let's do a full run

DavidKorczynski · 2024-07-10T20:52:37Z

/gcbrun exp -n dk-test-infra5209 -m vertex_ai_gemini-1-5 -b jvm-all -i

DavidKorczynski requested a review from DonggeLiu July 6, 2024 18:43

DonggeLiu approved these changes Jul 7, 2024

Merge branch 'main' into update-jvm-benchmarks

c4afbee

update benchmarks to latest FI data, including exceptions

53c8aca

Signed-off-by: David Korczynski <david@adalogics.com>

DavidKorczynski requested a review from DonggeLiu July 10, 2024 22:10

DonggeLiu approved these changes Jul 11, 2024

DavidKorczynski merged commit baad3a1 into main Jul 11, 2024
7 checks passed

DavidKorczynski deleted the update-jvm-benchmarks branch July 11, 2024 08:07
