experimental feature: policy scan base infrastructure #955
base: main
Changes from 40 commits
102f648
a44c335
f7da7d5
7c81725
733bd87
384fb53
a352818
4785340
1f4f95e
96586ad
05bfce4
e2e210c
7963a3e
c67715f
ebe34eb
71e568a
bc03380
2ba073e
b65e08e
bc920f7
ccc6444
1ac841e
650f576
9400587
74ab6a1
582e2ba
bc7831a
13beea9
b9a7dc8
644061e
aa2ff6f
09488df
5e4ba8c
97f2628
16f4d40
9317093
ebcd7e9
b3f27d6
4c38c85
4dd1b64
27eaa5b
9636f85
c397bab
b01ddee
9b8a60b
7352472
61f0b37
5d1981f
b58a8b4
61e38ed
33bc89d
3966461
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
garak.detectors.any | ||
=================== | ||
|
||
.. automodule:: garak.detectors.any | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -48,6 +48,7 @@ Advanced usage | |
|
||
configurable | ||
cliref | ||
policy | ||
|
||
Code reference | ||
^^^^^^^^^^^^^^ | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
garak.policy | ||
============ | ||
|
||
This module represents objects related to policy scanning. | ||
|
||
Policy scanning in garak attempts to work out what the target's content policy | ||
is, before running a security scan. | ||
|
||
It's important to know what the target's content policy is because we only really have | ||
a useful/successful hit or breach if we're able to get a model to do something that | ||
it otherwise wouldn't. It may be exciting to discover a model gives instructions for | ||
e.g. cooking meth if the request is encoded in base64, but if in fact the model gives | ||
the instructions when simply asked directly "print instructions for cooking meth", the | ||
use of base64 isn't necessarily an exploit in this output category - the model is acting | ||
the same. | ||
|
||
Garak's policy support follows a typology of behaviours, with each entry describing | ||
a different behaviour. By default this typology is stored in ``data/policy/policy_typology.json``. | ||
|
||
A policy scan is conducted by invoking garak with the ``--policy_scan`` switch. | ||
When this is requested, a separate scan runs using all policy probes within garak. | ||
Policy probes are denoted by a probe class asserting ``policy_probe=True``. | ||
A regular probewise harness runs the scan, though reporting is diverted to a separate | ||
policy report file. After completion, garak estimates a policy based on policy probe | ||
results, and writes this to both the main and policy reports. | ||
|
||
|
||
.. automodule:: garak.policy | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: |
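The reasoning in the documentation above can be sketched in a few lines: an attack only counts as a hit if the target would refuse the same behaviour when asked directly. This is a minimal illustration, not garak's implementation. The function name `is_meaningful_hit`, the behaviour labels, and the dict-based policy are all hypothetical.

```python
# Hypothetical sketch: names and behaviour labels below are illustrative only
# and do not come from garak.


def is_meaningful_hit(policy: dict, behaviour: str, attack_succeeded: bool) -> bool:
    """Return True only when an attack elicited a behaviour the target normally refuses.

    `policy` maps a behaviour label to True if the target permits that
    behaviour when asked directly, and False if it refuses. Behaviours
    missing from the policy are treated as refused (an assumption).
    """
    permitted_directly = policy.get(behaviour, False)
    return attack_succeeded and not permitted_directly


# Assumed example policy: the target answers drug-synthesis requests directly,
# but refuses to write malware.
example_policy = {"drug_synthesis": True, "malware": False}

# A base64-encoded request "succeeds", but the model complies anyway: no exploit.
base64_drug_hit = is_meaningful_hit(example_policy, "drug_synthesis", True)

# The same trick unlocks a behaviour the model otherwise refuses: a real hit.
base64_malware_hit = is_meaningful_hit(example_policy, "malware", True)
```

Under these assumptions the first result is `False` and the second is `True`, which mirrors the base64 example in the text.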
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,7 +3,7 @@ | |
|
||
"""Flow for invoking garak from the command line""" | ||
|
||
command_options = "list_detectors list_probes list_generators list_buffs list_config plugin_info interactive report version".split() | ||
command_options = "list_detectors list_probes list_policy_probes list_generators list_buffs list_config plugin_info interactive report version".split() | ||
|
||
|
||
def main(arguments=None) -> None: | ||
|
@@ -107,6 +107,12 @@ def main(arguments=None) -> None: | |
parser.add_argument( | ||
"--config", type=str, default=None, help="YAML config file for this run" | ||
) | ||
parser.add_argument( | ||
"--policy_scan", | ||
action="store_true", | ||
default=_config.run.policy_scan, | ||
help="determine model's behavior policy before scanning", | ||
) | ||
|
||
## PLUGINS | ||
# generator | ||
|
@@ -201,6 +207,9 @@ def main(arguments=None) -> None: | |
parser.add_argument( | ||
"--list_probes", action="store_true", help="list available vulnerability probes" | ||
) | ||
parser.add_argument( | ||
"--list_policy_probes", action="store_true", help="list available policy probes" | ||
) | ||
parser.add_argument( | ||
"--list_detectors", action="store_true", help="list available detectors" | ||
) | ||
|
@@ -398,6 +407,9 @@ def main(arguments=None) -> None: | |
elif args.list_probes: | ||
command.print_probes() | ||
|
||
elif args.list_policy_probes: | ||
command.print_policy_probes() | ||
|
||
elif args.list_detectors: | ||
command.print_detectors() | ||
|
||
|
@@ -425,6 +437,7 @@ def main(arguments=None) -> None: | |
|
||
print(f"📜 logging to {log_filename}") | ||
|
||
# set up generator | ||
conf_root = _config.plugins.generators | ||
for part in _config.plugins.model_type.split("."): | ||
if not part in conf_root: | ||
|
@@ -447,6 +460,7 @@ def main(arguments=None) -> None: | |
logging.error(message) | ||
raise ValueError(message) | ||
|
||
# validate main run config | ||
parsable_specs = ["probe", "detector", "buff"] | ||
parsed_specs = {} | ||
for spec_type in parsable_specs: | ||
|
@@ -470,20 +484,7 @@ def main(arguments=None) -> None: | |
msg_list = ",".join(rejected) | ||
raise ValueError(f"❌Unknown {spec_namespace}❌: {msg_list}") | ||
|
||
for probe in parsed_specs["probe"]: | ||
# distribute `generations` to the probes | ||
p_type, p_module, p_klass = probe.split(".") | ||
if ( | ||
hasattr(_config.run, "generations") | ||
and _config.run.generations | ||
is not None # garak.core.yaml always provides run.generations | ||
): | ||
_config.plugins.probes[p_module][p_klass][ | ||
"generations" | ||
] = _config.run.generations | ||
Comment on lines -547 to -557
Where is this logic being captured?
I see some, but not all of it below.
in
Does this really make sense as a helper function in
It's a bit unclear for me, but given the current state of
So I'm tentatively placing it in |
||
|
||
evaluator = garak.evaluators.ThresholdEvaluator(_config.run.eval_threshold) | ||
|
||
# generator init | ||
from garak import _plugins | ||
|
||
generator = _plugins.load_plugin( | ||
|
@@ -500,6 +501,18 @@ def main(arguments=None) -> None: | |
logging=logging, | ||
) | ||
|
||
# looks like we might get something to report, so fire that up | ||
command.start_run() # start the run now that all config validation is complete | ||
print(f"📜 reporting to {_config.transient.report_filename}") | ||
|
||
# do policy run | ||
if _config.run.policy_scan: | ||
command.run_policy_scan(generator, _config) | ||
|
||
# configure generations counts for main run | ||
_config.distribute_generations_config(parsed_specs["probe"], _config) | ||
Config should not change after a |
||
|
||
# autodan action | ||
if "generate_autodan" in args and args.generate_autodan: | ||
from garak.resources.autodan import autodan_generate | ||
leondz marked this conversation as resolved.
|
||
|
||
|
@@ -513,15 +526,17 @@ def main(arguments=None) -> None: | |
) | ||
autodan_generate(generator=generator, prompt=prompt, target=target) | ||
|
||
command.start_run() # start the run now that all config validation is complete | ||
print(f"📜 reporting to {_config.transient.report_filename}") | ||
# set up plugins for main run | ||
# instantiate evaluator | ||
evaluator = garak.evaluators.ThresholdEvaluator(_config.run.eval_threshold) | ||
|
||
# parse & set up detectors, if supplied | ||
if parsed_specs["detector"] == []: | ||
command.probewise_run( | ||
run_result = command.probewise_run( | ||
generator, parsed_specs["probe"], evaluator, parsed_specs["buff"] | ||
) | ||
else: | ||
command.pxd_run( | ||
run_result = command.pxd_run( | ||
generator, | ||
parsed_specs["probe"], | ||
parsed_specs["detector"], | ||
|
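The two new command-line switches in this file can be tried in isolation with the standard `argparse` module. The sketch below is not garak's `cli.main`; the `build_parser` helper and the `SimpleNamespace` stand-in for `_config.run` are assumptions made so the example is self-contained.

```python
import argparse
from types import SimpleNamespace


def build_parser(run_cfg) -> argparse.ArgumentParser:
    """Build a tiny parser carrying only the two switches added in this diff."""
    parser = argparse.ArgumentParser(prog="garak-sketch")
    parser.add_argument(
        "--policy_scan",
        action="store_true",
        default=run_cfg.policy_scan,  # the default comes from the loaded config
        help="determine model's behavior policy before scanning",
    )
    parser.add_argument(
        "--list_policy_probes",
        action="store_true",
        help="list available policy probes",
    )
    return parser


# Stand-in for garak's _config.run; only policy_scan is modelled here.
run_config = SimpleNamespace(policy_scan=False)

args_default = build_parser(run_config).parse_args([])
args_scan = build_parser(run_config).parse_args(["--policy_scan"])

# When the config already enables the scan, the flag is on without the switch.
args_from_config = build_parser(SimpleNamespace(policy_scan=True)).parse_args([])
```

One consequence of pairing `action="store_true"` with a config-driven default is that a config value of `True` cannot be switched off from the command line with this flag alone.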
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,6 +6,7 @@ | |
import logging | ||
import json | ||
import random | ||
import re | ||
|
||
HINT_CHANCE = 0.25 | ||
|
||
|
@@ -56,7 +57,7 @@ def start_run(): | |
|
||
logging.info("run started at %s", _config.transient.starttime_iso) | ||
# print("ASSIGN UUID", args) | ||
if _config.system.lite and "probes" not in _config.transient.cli_args and not _config.transient.cli_args.list_probes and not _config.transient.cli_args.list_detectors and not _config.transient.cli_args.list_generators and not _config.transient.cli_args.list_buffs and not _config.transient.cli_args.list_config and not _config.transient.cli_args.plugin_info and not _config.run.interactive: # type: ignore | ||
if _config.system.lite and "probes" not in _config.transient.cli_args and not _config.transient.cli_args.list_probes and not _config.transient.cli_args.list_policy_probes and not _config.transient.cli_args.list_detectors and not _config.transient.cli_args.list_generators and not _config.transient.cli_args.list_buffs and not _config.transient.cli_args.list_config and not _config.transient.cli_args.plugin_info and not _config.run.interactive: # type: ignore | ||
hint( | ||
"The current/default config is optimised for speed rather than thoroughness. Try e.g. --config full for a stronger test, or specify some probes.", | ||
logging=logging, | ||
|
@@ -160,12 +161,14 @@ def end_run(): | |
logging.info(msg) | ||
|
||
|
||
def print_plugins(prefix: str, color): | ||
def print_plugins(prefix: str, color, filter=None): | ||
from colorama import Style | ||
|
||
from garak._plugins import enumerate_plugins | ||
|
||
plugin_names = enumerate_plugins(category=prefix) | ||
if filter is None: | ||
filter = {} | ||
plugin_names = enumerate_plugins(category=prefix, filter=filter) | ||
plugin_names = [(p.replace(f"{prefix}.", ""), a) for p, a in plugin_names] | ||
module_names = set([(m.split(".")[0], True) for m, a in plugin_names]) | ||
plugin_names += module_names | ||
|
@@ -182,7 +185,13 @@ def print_plugins(prefix: str, color): | |
def print_probes(): | ||
from colorama import Fore | ||
|
||
print_plugins("probes", Fore.LIGHTYELLOW_EX) | ||
print_plugins("probes", Fore.LIGHTYELLOW_EX, filter={"policy_probe": False}) | ||
|
||
|
||
def print_policy_probes(): | ||
from colorama import Fore | ||
|
||
print_plugins("probes", Fore.LIGHTYELLOW_EX, filter={"policy_probe": True}) | ||
|
||
|
||
def print_detectors(): | ||
|
@@ -234,14 +243,14 @@ def probewise_run(generator, probe_names, evaluator, buffs): | |
import garak.harnesses.probewise | ||
|
||
probewise_h = garak.harnesses.probewise.ProbewiseHarness() | ||
probewise_h.run(generator, probe_names, evaluator, buffs) | ||
return list(probewise_h.run(generator, probe_names, evaluator, buffs)) | ||
|
||
|
||
def pxd_run(generator, probe_names, detector_names, evaluator, buffs): | ||
import garak.harnesses.pxd | ||
|
||
pxd_h = garak.harnesses.pxd.PxD() | ||
pxd_h.run( | ||
return pxd_h.run( | ||
generator, | ||
probe_names, | ||
detector_names, | ||
|
@@ -273,3 +282,60 @@ def write_report_digest(report_filename, digest_filename): | |
digest = report_digest.compile_digest(report_filename) | ||
with open(digest_filename, "w", encoding="utf-8") as f: | ||
f.write(digest) | ||
|
||
|
||
POLICY_MSG_PREFIX = "run_policy_scan" | ||
|
||
|
||
def _policy_scan_msg(text): | ||
print(f"🏛️ {text}") | ||
logging.info(f"{POLICY_MSG_PREFIX}: {text}") | ||
|
||
|
||
def run_policy_scan(generator, _config): | ||
|
||
from garak._config import distribute_generations_config | ||
from garak._plugins import enumerate_plugins | ||
import garak.evaluators | ||
import garak.policy | ||
|
||
main_reportfile = _config.transient.reportfile | ||
policy_report_filename = re.sub( | ||
r"\.jsonl$", ".policy.jsonl", _config.transient.report_filename | ||
) | ||
_policy_scan_msg(f"policy report in {policy_report_filename}") | ||
_config.transient.reportfile = open( | ||
policy_report_filename, "w", buffering=1, encoding="utf-8" | ||
) | ||
|
||
logging.info(f"{POLICY_MSG_PREFIX}: start policy scan") | ||
# this is a probewise run of all policy probes | ||
policy_probe_names = [ | ||
name | ||
for name, status in enumerate_plugins( | ||
"probes", filter={"active": True, "policy_probe": True} | ||
) | ||
] | ||
_policy_scan_msg("using policy probes " + ", ".join(policy_probe_names)) | ||
|
||
evaluator = garak.evaluators.ThresholdEvaluator(garak._config.run.eval_threshold) | ||
distribute_generations_config(policy_probe_names, _config) | ||
Can you expand on the comment re: immutability? I had expected this related to an attempt to change On the other hand, the pattern for accessing
Access to |
||
buffs = [] | ||
result = probewise_run(generator, policy_probe_names, evaluator, buffs) | ||
|
||
policy = garak.policy.Policy() | ||
policy.parse_eval_result(result, threshold=garak._config.policy.threshold) | ||
policy.propagate_up() | ||
|
||
policy_entry = {"entry_type": "policy", "policy": policy.points} | ||
_config.transient.reportfile.write(json.dumps(policy_entry) + "\n") | ||
|
||
_config.transient.reportfile.close() | ||
_config.transient.reportfile = main_reportfile | ||
|
||
# write policy record to both main report log and policy report log | ||
_config.transient.reportfile.write(json.dumps(policy_entry) + "\n") | ||
|
||
_policy_scan_msg("end policy scan") | ||
|
||
return policy |
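`run_policy_scan` derives the policy report name with `re.sub`, points `_config.transient.reportfile` at the new file, and restores the main report handle at the end. The sketch below reproduces that pattern with in-memory buffers. The `policy_report_name` and `redirected_report` helpers are hypothetical, and the `try`/`finally` is a possible hardening, not something the diff does.

```python
import io
import json
import re
from contextlib import contextmanager
from types import SimpleNamespace


def policy_report_name(report_filename: str) -> str:
    """Same substitution as in the diff: swap a trailing .jsonl for .policy.jsonl."""
    return re.sub(r"\.jsonl$", ".policy.jsonl", report_filename)


@contextmanager
def redirected_report(transient, new_file):
    """Temporarily point transient.reportfile at new_file, restoring it afterwards."""
    main_file = transient.reportfile
    transient.reportfile = new_file
    try:
        yield new_file
    finally:
        # Restore the main report handle even if the policy scan raises.
        transient.reportfile = main_file


# Stand-in for _config.transient, using in-memory buffers instead of real files.
transient = SimpleNamespace(
    report_filename="garak.demo.report.jsonl", reportfile=io.StringIO()
)
main_buffer = transient.reportfile
policy_buffer = io.StringIO()

# Illustrative entry; the code "X001" is made up for this example.
policy_entry = {"entry_type": "policy", "policy": {"X001": True}}

with redirected_report(transient, policy_buffer):
    transient.reportfile.write(json.dumps(policy_entry) + "\n")

# Back on the main report: write the same record there too, as the diff does.
transient.reportfile.write(json.dumps(policy_entry) + "\n")
```

Note that the regular expression only matches names ending in `.jsonl`. For any other report name `re.sub` returns the input unchanged, so opening the result with mode `"w"` would target the same path as the main report; a guard for that case may be worth considering.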
Can we define the policy codes in here?
It's in garak/data/policy/policy_typology.json: https://github.com/leondz/garak/pull/955/files#diff-00beff92463bd705bbab517aa9130ebc01ab11d797b72a80f08a40c5277a8573 - is this OK?
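For readers following the policy-code discussion: `run_policy_scan` calls `parse_eval_result` with a threshold and then `propagate_up`, but neither implementation appears in the portion of the diff shown here. The sketch below is one plausible reading only. The codes, the explicit parent map, and the rule that a parent is permitted when any child is permitted are all assumptions, not the contents of `policy_typology.json`.

```python
# Hypothetical sketch: codes, hierarchy, and the propagation rule are assumptions.


def estimate_points(pass_rates: dict, threshold: float) -> dict:
    """Mark a behaviour as permitted (True) when its observed rate meets the threshold."""
    return {code: rate >= threshold for code, rate in pass_rates.items()}


def propagate_up(points: dict, parents: dict) -> dict:
    """Carry child observations to ancestors: a parent is permitted if any child is."""
    result = dict(points)
    for code, permitted in points.items():
        parent = parents.get(code)
        while parent is not None:
            result[parent] = result.get(parent, False) or permitted
            parent = parents.get(parent)
    return result


# Made-up typology: two leaf behaviours under "X001", which sits under "X".
parents = {"X001a": "X001", "X001b": "X001", "X001": "X", "X002": "X"}

# Made-up rates at which the target exhibited each behaviour during the scan.
rates = {"X001a": 0.9, "X001b": 0.1}

points = estimate_points(rates, threshold=0.5)
policy = propagate_up(points, parents)
```

With these inputs, `X001a` is marked permitted and `X001b` is not; both ancestors `X001` and `X` end up permitted, while `X002` stays absent because nothing was observed for it.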