-
Notifications
You must be signed in to change notification settings - Fork 263
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Holistic Evaluation of Text-to-Image Models (HEIM) #1939
Conversation
…into heim_merge
…into heim_merge
…into heim_merge
…into heim_merge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks promising, much better than the last PR!
Partial review, since 14k loc is too much for me to review at once.
Also, can we arrange a mini design review meeting for the schema changes?
src/helm/benchmark/metrics/image_generation/detectors/vitdet.py
Outdated
Show resolved
Hide resolved
…into heim_merge
…into heim_merge
…into heim_merge
…into heim_merge
…into heim_merge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial review, will do the rest in a couple of hours.
src/helm/benchmark/window_services/image_generation/clip_window_service.py
Show resolved
Hide resolved
src/helm/benchmark/window_services/image_generation/lexica_search_window_service.py
Show resolved
Hide resolved
# Initialize `FileCache` for text-to-image model APIs | ||
local_file_cache_path: str = os.path.join(self.cache_path, "output", host_organization) | ||
file_cache: FileCache = LocalFileCache(local_file_cache_path, file_extension="png") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make file_cache
a provider_binding
rather than a constant_binding
i.e. lazily create the folder in a lambda.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind explaining provider_binding
vs constant_binding
a bit more and sharing how the lambda looks like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For provider_binding
, the value should be a parameter-less function that returns the value, rather than the value itself. It can either be a lambda or a named method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll send you a PR for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened PR #2175 for this.
with open(file_path, "wb") as f: | ||
f.write(compute()) | ||
|
||
return file_path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be relative to base_path
, as opposed to relative to the current working directory? Currently this means that I can't change my cache from "prod_env/cache" to anything else without also having to migrate all my requests strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is we need the images when computing metrics, but the metrics are not aware of base_path
. They just have the paths in ScenarioState
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this pull request, the MediaObject
in RequestResponse
in the ScenarioState
requires a path. The HEIM pull requests populates this path with things like prod_env/cache/openai/77437ea482144bf7b9275a0acee997db.png
. I have a bunch of issues with this:
- The name file cache, and the location of the cache in
prod_env/cache
, suggests that it is temporary storage. However, it is not actually temporary -prod_env/cache
contains the only copy of generated images, and deletingprod_env/cache
results in loss of data that comprises part of thescenario_state.json
files. - The paths in the responses will have the
prod_env/cache
prefix baked in which means that if the user sets the--local-path
flag to something else, this breaks all paths in the responses cache and thescenario_state.json
files, and the user has to do some manual migration. - It's not really clear what relative file paths should be relative to - this pull request takes the position that it should be relative to the current working directory, but there are advantages to making it relative to the
prod_env/cache
, or to thebenchmark_output
. - We do not upload anything in
prod_env/cache
, which means that the publicly uploadedscenario_state.json
files will contain paths to unavailable files. - Even if I wanted to upload the files in
prod_env/cache
to a web server, I would be forced to upload them withprod_env/cache
as part of the URL so that the links keep working. - Images from different run specs and different run suites are mixed, so I can't upload the images for a specific suite or run - I'd have to write a script to do this.
Some possible alternatives:
- Remove FileCache: Store the image in the response cache. Both SQLite and MongoDB have binary blob types that do this (or we can base64 encode, which is less efficient).
- Write out the files to the run folder. i.e.
/benchmark_output/run/some_scenario:model=some_model/media/77437ea482144bf7b9275a0acee997db.png
InMediaObject
inRequestResponse
in theScenarioState
, the paths will bemedia/77437ea482144bf7b9275a0acee997db.png
- relative to the location ofscenario_state.json
.
Summarizing some stuff from a Slack converation:
I thought of another solution that is decent, quick to implement and addresses most of the issues you mentioned. We can cache the images in prod_env when the clients receive a request. When we generate the Scenario State, we can create a copy of the files in benchmark_output and specify the copied path in Scenario State, so benchmark_output will have the copy of the images. The downside is we now will have an extra copy of the images.
Me:
That sounds reasonable as well - a second copy is fine because the cache copy is temporary, so we only have one permanent copy.
It wouldn't address the baked-in paths in the response cache, but I would be okay with compromising on that.
Also, with this, we wouldn't need the base64 encoding any more - we can just link to the actual file. So we actually save on storage by eliminating the base64 encoded copy (because binary is more efficient than base64)
Percy:
Having a temporary cache in
prod_env/cache
makes sense because it mirrors the way we cache everything else.
But at that point, why not just store in mongo or the sqlite db?
To understand the proposal for
scenario_state.json
, the files would just be parallel to that? It seems likeprod_env/cache
should not appear in the paths.
Merging HEIM changes (scenarios, models, and metrics) changes into HELM.