
Question: How to use Stanford HELM for local model setup? #1858

Open

Anindyadeep opened this issue Sep 25, 2023 · 5 comments
Labels
competition (Support for the NeurIPS Large Language Model Efficiency Challenge), user question

Comments

@Anindyadeep
Contributor

Anindyadeep commented Sep 25, 2023

Prologue

Here is my understanding of the high-level architecture of HELM (please correct me if I am wrong). HELM acts as a client that sends some text (from the benchmark dataset) to some server (Hugging Face, OpenAI, etc.), and the string returned to the client is then evaluated by HELM. The models we are interested in are written down in the spec file. When it comes to a local model, we provide neurips/local so that HELM knows we are using a local model.

NeurIPS Efficiency Challenge

NeurIPS's LLM Efficiency Challenge has recently become very popular. There are implementations in the sample submission folder by Lightning AI and Llama Recipes; I am using Lightning AI's implementation here. They implemented a simple FastAPI server with two endpoints, /tokenize and /process.
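To make my questions concrete, here is a minimal sketch of what such a server might look like. The field names and dummy responses below are my own illustration based on my reading of the sample submission, not the authoritative schema; please check the sample submission code for the exact contract.

```python
# Minimal sketch of a /tokenize + /process server, loosely following the
# sample submission's FastAPI layout. Field names and the dummy responses
# are illustrative assumptions; see the sample submission for the real schema.
from typing import List, Optional

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class TokenizeRequest(BaseModel):
    text: str


class TokenizeResponse(BaseModel):
    tokens: List[int]


class ProcessRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 50
    temperature: float = 0.8
    seed: Optional[int] = None


class ProcessResponse(BaseModel):
    text: str


@app.post("/tokenize", response_model=TokenizeResponse)
def tokenize(request: TokenizeRequest) -> TokenizeResponse:
    # Placeholder: replace with the real tokenizer of your local model.
    return TokenizeResponse(tokens=[ord(c) for c in request.text])


@app.post("/process", response_model=ProcessResponse)
def process(request: ProcessRequest) -> ProcessResponse:
    # Placeholder: replace with a real generate() call against your local model.
    return ProcessResponse(text=request.prompt + " ...generated text...")
```

(Something like `uvicorn main:app --port 8080` would serve this locally.)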

My questions

  1. Is it safe to assume that HELM expects these two API endpoints so that it can send requests and evaluate the responses?
  2. Is there any other way to evaluate local models? Right now we are doing this through neurips/local; can I do it through <myusername>/local? I skimmed through the documentation and it seems like we can add more models, but it was not clear to me.
  3. Even when neurips/local is set, how does HELM know that a local server has been set up and that it should send requests there? Does it look for a specific port?
  4. With the current configuration, can I evaluate multiple LLMs?
  5. Do I always have to use a client-server setup, or is there a functional or modular way to do the same?
  6. Are there any metrics or benchmarks that require external internet access? For example, if I set up the HELM client and my compute server (which hosts the LLM) entirely locally, without internet, will all the evaluation benchmarks work?

The HELM package is super helpful and an awesome initiative by Stanford CRFM for evaluating LLMs across such a huge number of scenarios, benchmarks, and metrics.

@msaroufim added the competition (Support for the NeurIPS Large Language Model Efficiency Challenge) label on Sep 26, 2023
@yifanmai
Collaborator

Is it safe to assume that HELM expects these two API endpoints so that it can send requests and evaluate the responses?

That is correct.

Is there any other way to evaluate local models? Right now we are doing this through neurips/local; can I do it through <myusername>/local? I skimmed through the documentation and it seems like we can add more models, but it was not clear to me.

Yes, you will eventually be able to change the model name to whatever you want, and also add multiple model names. Support will be added in #1861, which will be merged in the next couple of days. If you'd like to try it out first, you can use that branch.

Even when neurips/local is set, how does HELM know that a local server has been set up and that it should send requests there? Does it look for a specific port?

Currently HELM expects the model server to be at http://localhost:8080, but #1861 will allow you to change the URL and port to anything you want, including non-localhost URLs.
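As a rough sanity check that your server is reachable at that address before running HELM against it, you can send it the same kind of request yourself. The payload fields below mirror the server sketch earlier in this thread and are assumptions, not HELM's exact client code.

```python
import requests

# Rough sanity check that a local model server is reachable at the address
# HELM currently uses. The payload fields are assumptions mirroring the
# server sketch above; inspect the sample submission for the exact names.
BASE_URL = "http://localhost:8080"

tokens = requests.post(f"{BASE_URL}/tokenize", json={"text": "Hello HELM"}).json()
print("tokenize ->", tokens)

completion = requests.post(
    f"{BASE_URL}/process",
    json={"prompt": "The capital of France is", "max_new_tokens": 16, "temperature": 0.0},
).json()
print("process ->", completion)
```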

With the current configuration, can I evaluate multiple LLMs?

Multiple LLMs will be supported by #1861.

Do I always have to use a client-server setup, or is there a functional or modular way to do the same?

Yes, we support a few other ways. The most relevant ones are:

  • If you're using Lit-GPT, we will soon support evaluating directly, without a client-server setup, via the Lit-GPT model integration (#1783), which will hopefully land today.
  • If you are using Hugging Face, you can evaluate using a Hugging Face checkpoint. If you're interested, I can provide more instructions on how to do this.

Are there any metrics or benchmarks that require external internet access? For example, if I set up the HELM client and my compute server (which hosts the LLM) entirely locally, without internet, will all the evaluation benchmarks work?

Currently HELM requires an internet connection, but we do want to support running it without an internet connection eventually. If this is a high-priority issue for you, please file new issues for any problems you find.

  • All benchmarks require downloading the external datasets to local disk. To run in offline mode, you need to pre-populate your local dataset cache in benchmark_output/scenarios by running all the run specs once (e.g. using a debugging model like simple/model); see the sketch after this list. Subsequent runs will use the local cache rather than fetching the dataset from the internet.
  • Some metrics require downloading external files or calling an external API. For instance, toxicity_metrics requires calling the Perspective API. We should probably provide an offline mode that disables these metrics.
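As a rough illustration of that offline workflow, after a warm-up run you can check that the scenario cache exists before going offline. The directory layout here is the one described above; the exact contents depend on which run specs you ran.

```python
from pathlib import Path

# Sanity-check the local scenario cache before attempting an offline run.
# The benchmark_output/scenarios path follows the layout described above;
# exact contents depend on which run specs were used for the warm-up run.
cache_dir = Path("benchmark_output/scenarios")

if cache_dir.is_dir():
    cached = sorted(p.name for p in cache_dir.iterdir() if p.is_dir())
    print(f"Found cached data for {len(cached)} scenario(s): {cached}")
else:
    print(
        "No local scenario cache yet; run the run specs once while online "
        "(e.g. against a debugging model like simple/model) to populate it."
    )
```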

@Anindyadeep
Contributor Author

Thank you so much @yifanmai for the descriptive answers. This is really helpful. I will file an issue for the offline setup if it becomes a priority for us as well.

@pongib
Contributor

pongib commented Sep 27, 2023

Hello @yifanmai ,

I came across the response above where you mentioned:

If you are using Hugging Face, you can evaluate using a Hugging Face checkpoint. If you're interested, I can provide more instructions on how to do this.

I have checked the documentation at https://crfm-helm.readthedocs.io/en/latest/huggingface_models/ regarding evaluating using a Hugging Face checkpoint. However, I wanted to confirm if there are any additional instructions or recommendations beyond what's covered in that documentation. I would greatly appreciate any further guidance on this matter.

Thank you for your assistance and the excellent work on HELM!

@yifanmai
Collaborator

Regarding the Hugging Face Hub model integration, the documentation should cover all of the main functionality.

Additionally, there is a hidden experimental flag --enable-local-huggingface-models that will let you evaluate a Hugging Face model on local disk (not the Hub). If you'd like to try it out, there are instructions in the description of #1505. This flag is experimental and will probably be removed in the next release of HELM; we will soon provide a better way to evaluate local models.

@anmolagarwal999

@yifanmai
Regarding the server-side code for the NeurIPS Efficiency Challenge, is it possible to know where exactly in the HELM codebase the request is made to the endpoints and the response is parsed/evaluated?

Also, what outputs does HELM expect when it calls the /process and /tokenize endpoints? I tried to infer this by looking at the code, but the server-side code currently looks slightly buggy (issue: llm-efficiency-challenge/neurips_llm_efficiency_challenge#47). cc @drisspg
