
Holistic Evaluation of Text-to-Image Models (HEIM) #1939

Merged: 109 commits into main from heim_merge, Dec 21, 2023
Conversation

Member

@teetone commented Oct 24, 2023

Merging HEIM changes (scenarios, models, and metrics) into HELM.

@teetone requested a review from yifanmai, October 24, 2023 20:44
Collaborator

@yifanmai left a comment

Looks promising, much better than the last PR!

Partial review, since 14k loc is too much for me to review at once.

Also, can we arrange a mini design review meeting for the schema changes?

Review threads (resolved):
scripts/offline_eval/deepfloyd/deepfloyd.py (outdated)
setup.cfg
setup.cfg
src/helm/benchmark/adaptation/adapter_spec.py (outdated)
src/helm/benchmark/adaptation/adapter_spec.py (outdated)
src/helm/benchmark/scenarios/scenario.py (outdated)
src/helm/common/request.py
src/helm/benchmark/metrics/statistic.py (outdated)
scripts/offline_eval/deepfloyd/deepfloyd.py (outdated)
setup.cfg (outdated)
setup.cfg
Collaborator

@yifanmai left a comment

Partial review, will do the rest in a couple of hours.

Review threads (resolved):
src/helm/proxy/clients/clip_score_client.py
src/helm/proxy/clients/auto_client.py (outdated)
src/helm/config/model_metadata.yaml
src/helm/config/model_deployments.yaml
src/helm/proxy/clients/image_generation/__init__.py (outdated)
src/helm/benchmark/run_expander.py
src/helm/benchmark/test_model_properties.py (outdated)
src/helm/proxy/services/remote_service.py
src/helm/common/file_caches/file_cache.py (outdated)
src/helm/common/file_caches/local_file_cache.py (outdated)
Comment on lines +71 to +73
# Initialize `FileCache` for text-to-image model APIs
local_file_cache_path: str = os.path.join(self.cache_path, "output", host_organization)
file_cache: FileCache = LocalFileCache(local_file_cache_path, file_extension="png")
Collaborator

Make file_cache a provider_binding rather than a constant_binding, i.e. lazily create the folder in a lambda.

Member Author

Do you mind explaining provider_binding vs constant_binding a bit more, and sharing what the lambda would look like?

Collaborator

For provider_binding, the value should be a parameter-less function that returns the value, rather than the value itself. It can either be a lambda or a named method.
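The distinction can be illustrated with a toy injector. This is a sketch only: the `Injector` class and its method names here are stand-ins for illustration, not the actual HELM dependency-injection API.

```python
# Toy illustration of constant_binding vs provider_binding.
# The Injector class here is hypothetical, not HELM's real API.
import os
import tempfile


class Injector:
    def __init__(self):
        self._constants = {}
        self._providers = {}

    def constant_binding(self, key, value):
        # The value is computed eagerly, at registration time.
        self._constants[key] = value

    def provider_binding(self, key, provider):
        # `provider` is a parameter-less callable (lambda or named method);
        # the value is computed lazily, on first lookup.
        self._providers[key] = provider

    def get(self, key):
        if key in self._constants:
            return self._constants[key]
        return self._providers[key]()


def make_cache_dir(path):
    os.makedirs(path, exist_ok=True)
    return path


injector = Injector()
cache_dir = os.path.join(tempfile.mkdtemp(), "output", "openai")

# Lazy: the folder is only created when "file_cache" is first requested.
injector.provider_binding("file_cache", lambda: make_cache_dir(cache_dir))

assert not os.path.exists(cache_dir)              # nothing created at binding time
assert os.path.isdir(injector.get("file_cache"))  # created on first use
```

With a constant_binding, `make_cache_dir` would run at registration time and create the folder even if no client ever needs it; the provider defers that side effect.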

Collaborator

I'll send you a PR for this.

Collaborator

Opened PR #2175 for this.

with open(file_path, "wb") as f:
    f.write(compute())

return file_path
Collaborator

I think this should be relative to base_path, as opposed to relative to the current working directory? Currently this means that I can't change my cache from "prod_env/cache" to anything else without also having to migrate all my request strings.

Member Author

The problem is we need the images when computing metrics, but the metrics are not aware of base_path. They just have the paths in ScenarioState.

Collaborator

In this pull request, the MediaObject in RequestResponse in the ScenarioState requires a path. The HEIM pull request populates this path with things like prod_env/cache/openai/77437ea482144bf7b9275a0acee997db.png. I have a bunch of issues with this:

  1. The name file cache, and the location of the cache in prod_env/cache, suggests that it is temporary storage. However, it is not actually temporary - prod_env/cache contains the only copy of generated images, and deleting prod_env/cache results in loss of data that comprises part of the scenario_state.json files.
  2. The paths in the responses will have the prod_env/cache prefix baked in which means that if the user sets the --local-path flag to something else, this breaks all paths in the responses cache and the scenario_state.json files, and the user has to do some manual migration.
  3. It's not really clear what relative file paths should be relative to - this pull request takes the position that they should be relative to the current working directory, but there are advantages to making them relative to prod_env/cache, or to benchmark_output.
  4. We do not upload anything in prod_env/cache, which means that the publicly uploaded scenario_state.json files will contain paths to unavailable files.
  5. Even if I wanted to upload the files in prod_env/cache to a web server, I would be forced to upload them with prod_env/cache as part of the URL so that the links keep working.
  6. Images from different run specs and different run suites are mixed, so I can't upload the images for a specific suite or run - I'd have to write a script to do this.

Some possible alternatives:

  1. Remove FileCache: Store the image in the response cache. Both SQLite and MongoDB have binary blob types that do this (or we can base64 encode, which is less efficient).
  2. Write out the files to the run folder, i.e. /benchmark_output/run/some_scenario:model=some_model/media/77437ea482144bf7b9275a0acee997db.png. In the MediaObject in RequestResponse in the ScenarioState, the paths will be media/77437ea482144bf7b9275a0acee997db.png - relative to the location of scenario_state.json.
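Under alternative 2, consumers would resolve a MediaObject path against the directory containing scenario_state.json rather than the current working directory. A minimal sketch (the helper name is hypothetical, not part of HELM):

```python
# Sketch of resolving a media path recorded relative to scenario_state.json
# (alternative 2 above). resolve_media_path is a hypothetical helper.
import os


def resolve_media_path(scenario_state_path: str, media_path: str) -> str:
    """Resolve a MediaObject path against the run folder, not the CWD."""
    run_dir = os.path.dirname(scenario_state_path)
    return os.path.normpath(os.path.join(run_dir, media_path))


resolved = resolve_media_path(
    "benchmark_output/run/some_scenario:model=some_model/scenario_state.json",
    "media/77437ea482144bf7b9275a0acee997db.png",
)
# The resolved path lands inside the run's own media folder.
assert os.path.basename(resolved) == "77437ea482144bf7b9275a0acee997db.png"
```

Because the recorded path never mentions prod_env/cache or benchmark_output, the run folder can be moved, renamed, or uploaded wholesale without rewriting any paths.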

Summarizing some stuff from a Slack conversation:

@teetone:

I thought of another solution that is decent, quick to implement, and addresses most of the issues you mentioned. We can cache the images in prod_env when the clients receive a request. When we generate the ScenarioState, we can copy the files into benchmark_output and record the copied path in the ScenarioState, so benchmark_output will have its own copy of the images. The downside is that we will now have an extra copy of the images.
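This copy step can be sketched roughly as follows; `copy_media_to_run` is a hypothetical helper for illustration, not HELM code:

```python
# Rough sketch of the proposal: the client keeps its cached file under
# prod_env/cache, and at run time we copy it into the run's media folder,
# recording the copied path in the ScenarioState. Hypothetical helper.
import os
import shutil


def copy_media_to_run(cache_file: str, run_dir: str) -> str:
    media_dir = os.path.join(run_dir, "media")
    os.makedirs(media_dir, exist_ok=True)
    dest = os.path.join(media_dir, os.path.basename(cache_file))
    shutil.copyfile(cache_file, dest)
    return dest  # this is the path that would go into the ScenarioState
```

The cache copy remains untouched, so deleting prod_env/cache no longer loses the only copy of the generated images.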

Me:

That sounds reasonable as well - a second copy is fine because the cache copy is temporary, so we only have one permanent copy.
It wouldn't address the baked-in paths in the response cache, but I would be okay with compromising on that.
Also, with this, we wouldn't need the base64 encoding any more - we can just link to the actual file. So we actually save on storage by eliminating the base64-encoded copy (binary is more efficient than base64).

Percy:

Having a temporary cache in prod_env/cache makes sense because it mirrors the way we cache everything else.
But at that point, why not just store in mongo or the sqlite db?

To understand the proposal for scenario_state.json, the files would just be parallel to that? It seems like prod_env/cache should not appear in the paths.
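On the storage point raised above: base64 encodes every 3 input bytes as 4 output characters, so a base64 copy is about 33% larger than the raw binary. A quick check:

```python
# base64 overhead: 4 output bytes per 3 input bytes (~33% larger),
# which is why linking to the binary file saves storage.
import base64
import os

raw = os.urandom(3000)          # e.g. a small image payload
encoded = base64.b64encode(raw)
assert len(encoded) == 4000     # 3000 bytes -> 4000 base64 bytes
```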

@teetone requested a review from yifanmai, December 20, 2023 16:03
@yifanmai merged commit 346f765 into main, Dec 21, 2023
6 checks passed
@yifanmai deleted the heim_merge branch, December 21, 2023 00:23