tests : add integration test for lora adapters #8957

ltoniazzi · 2024-08-09T19:43:41Z

Context

Partially solving Issue #8708, by adding a test that:

Downloads reduced-sized llms for several architectures in .safetensors (where each architecture is overfitted to some text ) along with a corresponding adapter (overfitted to another text).
Converts base and lora models to .gguf.
Runs inference asserting the overfitted texts are returned as completions with llama-cli (--lora)

TODOs

Figure out why lora adapter for Gemma-2 is not working (mix of last layer=first layer and setting temp=0).
Add llama-3 (working correctly)
Add more Phi-3 (working correctly).
~~make models smaller (size 32?, now 64-> ~50/70MB)~~. (Getting input size issues in gguf with llama at 32)
Add assertions.
~~Figure out how sh files works in the actions. The script needs llama-cli, python env and git lfs~~ In a separate PR, follow .github/workflows/server.yml
Currently models downloaded from this personal hf repo, so need to move the models to ggml-org. (Models generated with code here). Created repo lora-tests

Current output

./tests/test-lora-conversion-inference.sh --verbose:

PR checks

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

ltoniazzi · 2024-08-10T13:49:40Z

@ngxson I've done it for Llama-3 and Gemma-2.

✅ Llama-3 is working fine (see image in this PR description)!
❌ Gemma-2 I'm still struggling to make the adapter work. Will have to look into it a bit more.

In the meantime, I was planning to do Phi-3 now. Which other architectures are popular in llama.cpp to add a couple more?

ngxson · 2024-08-11T22:32:03Z

tests/test_lora_conversion_and_inference.sh

+
+    echo "Running llama-cli with exported lora for $model_name with size $size_matrix..."
+    OUTPUT_LORA_MERGED=$(llama-cli -m $MODELS_REPO/$model_name/size=$size_matrix/base/Base-F32-lora-merged.gguf \
+    -p "I see a little silhouetto" -n 50 --seed 42)


I think you're missing --temp 0 for consistent result.

ltoniazzi · 2024-08-12T09:50:33Z

@ngxson Phi-3 is also working fine, and I added assertions in the .sh test file. I think we can finish off this PR by:

By setting it up to run correctly in the actions (not sure if it needs more setup)
Moving models to the ggml hf repo.

For Gemma-2 I'll try figure out where the bug is this week (as it could very well be in my code to reduce the size of the model) and add it as a todo in the issue #8708.

ltoniazzi · 2024-08-16T07:16:58Z

@ngxson @mofosyne I fixed and added the Gemma2 model to the tests (the bug was in my code from the fact that Gemma embedding weights equal the lm_head weights , which I think can cause some issues when converting adapters, I'll open an issue about it this weekend)

Anyway, to complete this PR I still need to:

See if new test script runs in the git-actions (not sure if it needs more setup). How to trigger an actions run?
Move models to the ggml hf repo. Who should I contact to do this?

ggerganov

Approved the PR so that the CI can run when you push new commits.

I'll add you to https://huggingface.co/ggml-org - let me know your HF username

ltoniazzi · 2024-08-16T07:42:28Z

I'll add you to https://huggingface.co/ggml-org - let me know your HF username

@ggerganov Thanks! My HF Username is ltoniazzi

ltoniazzi · 2024-08-16T08:09:23Z

@ggerganov Should I create a new model repo in ggml-org called reduced-models-for-testing?

I'm not sure what is a good naming, as also I want to make sure people do not think it's related to reducing size by quantization (basically these "reduced" models keep the original architecture the same apart from reducing the hidden_size parameter).

ggerganov · 2024-08-16T08:48:03Z

Should I create a new model repo in ggml-org called reduced-models-for-testing?

Yes, make a new repo. Can call it lora-tests to avoid confusion

ngxson · 2024-08-16T09:19:52Z

See if new test script runs in the git-actions (not sure if it needs more setup). How to trigger an actions run?

I'm not sure if you will need special permissions to create a new github workflow. The code for CI can be based on .github/workflows/server.yml

But I think I can do this later on. I'll need to see what's the condition to trigger the workflow, because I think it's not useful to run the workflow on all commits.

Move models to the ggml hf repo.

I see you now have access to ggml-org. Ideally I think each model should be in a dedicated repo (since you're also uploading safetensors files, not just GGUF). The model can be prefixed, for example test-llama or test-gemma. The small models can also be used in other tests, not just lora test.

For adapters, you can also upload the PEFT safetensors files, one repo per adapter, with prefix lora-test-*

ltoniazzi · 2024-08-16T10:19:08Z

@ngxson Thanks, but before refactoring the repo can we make the current test run successfully?

I added the test to the Make/CMake files so it runs locally with make test, but it fails in the build because it's running the python scripts to convert base/adapters and it seems the build environment does not have python dependencies.
What is the best approach to fix this?

ltoniazzi · 2024-08-16T10:25:10Z

(since you're also uploading safetensors files, not just GGUF).

@ngxson I am uploading only safetensors, as the test includes conversion to gguf of both base and adapter.

ngxson · 2024-08-16T10:27:00Z

make test is only used for ctest, so I prefer not to add the test there. For lora test (it's e2e test), it's better to have its own workflow.

ltoniazzi · 2024-08-16T10:34:13Z

@ngxson If you point me to a base workflow that I can mimic I can have a go at it. Or do you prefer to merge the new test script alone and create the new workflow separately?

ngxson · 2024-08-16T10:43:51Z

I think it's ok to merge this PR as-is, please tell me when the models are uploaded to huggingface so I can run the test locally.

If you still want to try out the workflow:

The code for CI can be based on .github/workflows/server.yml

Server workflow already have python and cmake, you can take it as the template.

ltoniazzi · 2024-08-16T10:52:55Z

please tell me when the models are uploaded to huggingface

@ngxson They are uploaded, you can run the script locally

./tests/test-lora-conversion-inference.sh --verbose

Ok let's merge this, then I'll have a go at the workflow in a separate PR and we can discuss the model's folder structure in the meantime.

ltoniazzi · 2024-08-16T20:00:32Z

@ngxson After you merge this I prepared the workflow PR here #9058.

ngxson · 2024-08-18T09:57:51Z

I made some minor changes in this commit: d6f7b8f

I ran it locally and confirmed that it works. So it's good to merge now. Thanks for taking your time on this.

* Add printing to check weights match torch version * minor code style changes --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

github-actions bot added the testing Everything test related label Aug 9, 2024

mofosyne added Review Complexity : Low Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix bugfix fixes an issue or bug labels Aug 10, 2024

ltoniazzi changed the title ~~[WIP] Add test for --lora flag~~ [WIP] Add integration test for lora adapters Aug 10, 2024

ltoniazzi force-pushed the testing/add-test-for-hot-lora branch from 9b4f2ff to 35d04e7 Compare August 10, 2024 13:15

ngxson reviewed Aug 11, 2024

View reviewed changes

ltoniazzi force-pushed the testing/add-test-for-hot-lora branch from 9aa01b7 to 56a8992 Compare August 12, 2024 10:05

ltoniazzi changed the title ~~[WIP] Add integration test for lora adapters~~ Add integration test for lora adapters Aug 13, 2024

ltoniazzi force-pushed the testing/add-test-for-hot-lora branch 4 times, most recently from 5ebdefc to c19e4fa Compare August 16, 2024 07:09

ltoniazzi marked this pull request as ready for review August 16, 2024 07:09

ltoniazzi force-pushed the testing/add-test-for-hot-lora branch from c19e4fa to 32ed624 Compare August 16, 2024 07:23

ggerganov approved these changes Aug 16, 2024

View reviewed changes

ltoniazzi force-pushed the testing/add-test-for-hot-lora branch from 32ed624 to 163f43f Compare August 16, 2024 09:57

Add printing to check weights match torch version

9a8f050

ltoniazzi force-pushed the testing/add-test-for-hot-lora branch from 163f43f to 9a8f050 Compare August 16, 2024 10:49

ltoniazzi mentioned this pull request Aug 16, 2024

Add lora test workflow (WIP) #9058

Open

8 tasks

minor code style changes

d6f7b8f

ngxson changed the title ~~Add integration test for lora adapters~~ tests: add integration test for lora adapters Aug 18, 2024

ngxson changed the title ~~tests: add integration test for lora adapters~~ tests : add integration test for lora adapters Aug 18, 2024

ngxson merged commit 2339a0b into ggerganov:master Aug 18, 2024
8 checks passed

ltoniazzi deleted the testing/add-test-for-hot-lora branch August 19, 2024 15:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tests : add integration test for lora adapters #8957

tests : add integration test for lora adapters #8957

ltoniazzi commented Aug 9, 2024 •

edited

Loading

ltoniazzi commented Aug 10, 2024 •

edited

Loading

ngxson Aug 11, 2024

ltoniazzi commented Aug 12, 2024

ltoniazzi commented Aug 16, 2024

ggerganov left a comment

ltoniazzi commented Aug 16, 2024

ltoniazzi commented Aug 16, 2024

ggerganov commented Aug 16, 2024

ngxson commented Aug 16, 2024 •

edited

Loading

ltoniazzi commented Aug 16, 2024 •

edited

Loading

ltoniazzi commented Aug 16, 2024

ngxson commented Aug 16, 2024

ltoniazzi commented Aug 16, 2024

ngxson commented Aug 16, 2024

ltoniazzi commented Aug 16, 2024 •

edited

Loading

ltoniazzi commented Aug 16, 2024

ngxson commented Aug 18, 2024

tests : add integration test for lora adapters #8957

tests : add integration test for lora adapters #8957

Conversation

ltoniazzi commented Aug 9, 2024 • edited Loading

Context

TODOs

Current output

PR checks

ltoniazzi commented Aug 10, 2024 • edited Loading

ngxson Aug 11, 2024

Choose a reason for hiding this comment

ltoniazzi commented Aug 12, 2024

ltoniazzi commented Aug 16, 2024

ggerganov left a comment

Choose a reason for hiding this comment

ltoniazzi commented Aug 16, 2024

ltoniazzi commented Aug 16, 2024

ggerganov commented Aug 16, 2024

ngxson commented Aug 16, 2024 • edited Loading

ltoniazzi commented Aug 16, 2024 • edited Loading

ltoniazzi commented Aug 16, 2024

ngxson commented Aug 16, 2024

ltoniazzi commented Aug 16, 2024

ngxson commented Aug 16, 2024

ltoniazzi commented Aug 16, 2024 • edited Loading

ltoniazzi commented Aug 16, 2024

ngxson commented Aug 18, 2024

ltoniazzi commented Aug 9, 2024 •

edited

Loading

ltoniazzi commented Aug 10, 2024 •

edited

Loading

ngxson commented Aug 16, 2024 •

edited

Loading

ltoniazzi commented Aug 16, 2024 •

edited

Loading

ltoniazzi commented Aug 16, 2024 •

edited

Loading