Feature/autoeval #1043
… with better cases and include format strings; update tier_1 analysis in deep_analysis.py to include jailbreak feedback from analytics.
OK maybe my bad for leaving the issue underspecified
For me the first part of this task is to prepare the artefacts used in qualitative review. The second is to select some pieces of text for suggestion in a model card.
I would prefer to amend this so that we get these features:
- Read an eval report.jsonl and identify failing scores, based on tier, absolute score, and calibration z-score
- Create a sheet of samples for qualitative analysis, where from each failing probe, a random selection of ten (or n) prompt:output pairs is given. This sheet would have four columns - probe, detector, prompt, output.
Let's take a chat elsewhere re: target workflow
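A rough sketch of the two features listed above, for discussion only. It assumes scores, calibration data, and attempts have already been parsed out of the report into plain dicts; the field names, the `min_score` and `min_z` thresholds, and the helper names (`failing_pairs`, `sample_sheet`, `write_sheet`) are all hypothetical and do not reflect garak's actual report schema or API.

```python
import csv
import random
from collections import defaultdict

# The four columns requested for the qualitative review sheet.
SHEET_COLUMNS = ["probe", "detector", "prompt", "output"]


def failing_pairs(scores, calibration, tiers, min_score=0.4, min_z=-1.0):
    """Return the set of (probe, detector) pairs treated as failing.

    scores:      {(probe, detector): pass rate in [0, 1]}
    calibration: {(probe, detector): {"mu": float, "sigma": float}}
    tiers:       {probe: int}; only tier 1 and tier 2 probes are checked.
    The thresholds are placeholders, not values taken from garak.
    """
    failing = set()
    for (probe, detector), score in scores.items():
        if tiers.get(probe) not in (1, 2):
            continue
        cal = calibration.get((probe, detector))
        z = None
        if cal and cal["sigma"] > 0:
            z = (score - cal["mu"]) / cal["sigma"]
        if score < min_score or (z is not None and z < min_z):
            failing.add((probe, detector))
    return failing


def sample_sheet(attempts, failing, n=10, seed=0):
    """Pick up to n random prompt:output rows per failing (probe, detector)."""
    by_pair = defaultdict(list)
    for attempt in attempts:
        key = (attempt["probe"], attempt["detector"])
        if key in failing:
            by_pair[key].append(attempt)
    rng = random.Random(seed)  # seeded so the sheet is reproducible
    rows = []
    for key in sorted(by_pair):
        pool = by_pair[key]
        for attempt in rng.sample(pool, min(n, len(pool))):
            rows.append({col: attempt[col] for col in SHEET_COLUMNS})
    return rows


def write_sheet(rows, fh):
    """Write the sampled rows as CSV with the four-column header."""
    writer = csv.DictWriter(fh, fieldnames=SHEET_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
```

Keeping the failure test and the sampling step separate means the selection criteria can change without touching how the sheet is produced.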
from garak.data import path as data_path

TIER_1_PROBE_GROUPS = {
Grouping brings simplification to reporting. Can we get some guidelines on how the groups are defined, so that the following future questions can be answered:
- "What do these reporting groups mean?"
- "Which group do I add this new probe to?"
TIER_1_PROBES = list(set().union(TIER_1_PROBE_GROUPS.values()))

TIER_2_PROBE_GROUPS = {
Tier and group seem orthogonal information, can they be stored in separate data structures?
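One possible shape for that separation, as a sketch only: one mapping from probe to tier and an independent mapping from probe to reporting group. The probe names, group labels, and helper functions here are made up for illustration and are not taken from the PR.

```python
from collections import defaultdict

# Hypothetical probe names and group labels, for illustration only.
PROBE_TIER = {
    "example.ProbeA": 1,
    "example.ProbeB": 1,
    "example.ProbeC": 2,
}

PROBE_GROUP = {
    "example.ProbeA": "jailbreak",
    "example.ProbeB": "toxicity",
    "example.ProbeC": "jailbreak",
}


def probes_in_tier(tier):
    """List the probes assigned to a tier, independent of grouping."""
    return sorted(p for p, t in PROBE_TIER.items() if t == tier)


def groups_for_tier(tier):
    """Derive the per-tier group mapping from the two flat mappings."""
    groups = defaultdict(list)
    for probe in probes_in_tier(tier):
        groups[PROBE_GROUP.get(probe, "ungrouped")].append(probe)
    return dict(groups)
```

With this layout, moving a probe to a different tier or a different group is a one-line change, and the per-tier group view is derived rather than hand-maintained.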
@lru_cache
def load_calibration_data(calibration_filename=CALIBRATION_DATA):
garak.analyze.calibration does some of this processing already, may be a good place to concentrate logic for this kind of thing
minus_sigma_scores = [
    probe_data[probe]["mu"] - probe_data[probe]["sigma"]
    for probe in v
    if probe in probe_data.keys()
]
plus_sigma_scores = [
    probe_data[probe]["mu"] + probe_data[probe]["sigma"]
    for probe in v
    if probe in probe_data.keys()
]
what do these represent? the ±1 s.d. bounds of an average model's score for a given probe:detector pair?
def load_feedback(feedback_type: str) -> dict:
    match feedback_type:
        case "jailbreak":
            with open(JAILBREAK_FEEDBACK_LOCATION, "r", encoding="utf-8") as f:
                feedback = json.loads(f.read().strip())

        case _:
            feedback = dict()
    return feedback
can we use feedback files named according to a group and just skip if file not found, allowing us to drop the case statements?
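A minimal sketch of what that could look like, assuming one JSON file per group stored under a single directory. The `<group>.json` naming scheme and the `feedback_dir` parameter are assumptions made here, not something the PR defines.

```python
import json
from pathlib import Path


def load_feedback(group: str, feedback_dir: Path) -> dict:
    """Load feedback for a group from <feedback_dir>/<group>.json.

    Returns an empty dict when no file exists for the group, so adding
    feedback for a new group only requires adding a file, not a new case.
    The file naming scheme is hypothetical.
    """
    path = Path(feedback_dir) / f"{group}.json"
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}
```

Catching only `FileNotFoundError` keeps malformed JSON loud: a broken feedback file still raises instead of being silently skipped.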
def daninthewild_feedback(results: dict) -> str:
    # Currently returns nothing, as we do not have a bag written.
from garak/garak/data/calibration/calibration.json:
"dan.DanInTheWildMini/mitigation.MitigationBypass": {
    "mu": 0.38435897435897437,
    "sigma": 0.12534659378422308,
    "sw_p": 0.060981681027192626
},
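For reference, a small sketch of how an entry like this could feed the feedback logic: it computes the ±1 s.d. band around `mu` and the z-score of an observed score. The `mu` and `sigma` values are copied from the entry above; the helper names are illustrative and are not part of garak.analyze.calibration.

```python
# mu and sigma copied from the calibration.json entry quoted above.
MU = 0.38435897435897437
SIGMA = 0.12534659378422308


def sigma_bounds(mu: float, sigma: float) -> tuple:
    """Return the (mu - sigma, mu + sigma) band for one probe:detector pair."""
    return mu - sigma, mu + sigma


def z_score(score: float, mu: float, sigma: float) -> float:
    """Number of standard deviations the score sits from the calibration mean."""
    return (score - mu) / sigma


lower, upper = sigma_bounds(MU, SIGMA)
# A score equal to the calibration mean has z = 0; a score at the
# upper bound has z = 1 (up to floating point error).
z_at_mean = z_score(MU, MU, SIGMA)
z_at_upper = z_score(upper, MU, SIGMA)
```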
def deep_analysis(report_path, bag_path=ANALYSIS_FILE) -> Tuple[str, str]:
    """
    Take garak report jsonl file and perform qualitative analysis on the probe results for the target.
"perform qualitative analysis"
It's all quantitative, right? We compare quantities and choose blocks of text: no human in the loop, no qualitative method.
Partial fulfillment of #984