[Research] Steering vectors #1472
base: master
Conversation
It's good to note that the authors of the post said they were going to try this out with Vicuna-13B as well, so we can see how it generalizes across different models.
Also, from a quick glance through your code I saw that the steering vector retrieval layer is always the same as the steering vector add layer. They also allow steering vectors sourced from earlier layers to be used at later layers, which might be necessary to get good behavior.
Did you source the steering vector from a lower layer? That's what they do: source = layer 2, add = layer 20, not source = layer 20, add = layer 20.
I didn't notice that in the article; all the mentions of layers are about only one layer and where they inserted it. But it should be easy to test.
Fix typo Co-authored-by: Extra Dosages <extradosages@gmail.com>
I tried the code as-is and the parameters are clearly affecting the output, just not steering it. I ran through the code and, if I understand it correctly, I think it's not computing the steering vector as described in the post. Let me know if you understand what I mean and whether you agree or not.
I know that it is computing something, because I added a dump of the vector to disk and the arithmetic seems to be working: add, subtract, and positive and negative coefficients all seem to change the vector as expected. I think maybe there is some difference between GPT-2 and LLaMA that makes it not work as-is; it could be that it needs a small tweak or something?
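To make the vector arithmetic and the source-layer / add-layer distinction concrete, here is a minimal NumPy sketch of the activation-addition idea from the post. The shapes, toy data, and variable names are illustrative assumptions, not the code in this PR:

```python
# Toy NumPy sketch of the activation-addition idea from the post (not the
# llama.cpp implementation; shapes and data here are made up for illustration).
import numpy as np

n_embd = 8                                 # embedding size (4096 for LLaMA 7B)
rng = np.random.default_rng(123)

# Residual-stream activations captured at the *source* layer for the two
# steering prompts, e.g. "I talk about weddings constantly" and its negation.
act_add = rng.normal(size=(5, n_embd))     # (n_tokens_add, n_embd)
act_sub = rng.normal(size=(4, n_embd))     # (n_tokens_sub, n_embd)

# Pad the shorter prompt with zeros so the per-position difference is defined.
n_steer = max(act_add.shape[0], act_sub.shape[0])
pad = lambda a: np.pad(a, ((0, n_steer - a.shape[0]), (0, 0)))
steering = pad(act_add) - pad(act_sub)     # steering vector, one row per position

# During generation the scaled vector is added to the residual stream at the
# *target* layer, over the leading positions of the user prompt only.
mul = 3.0
prompt_resid = rng.normal(size=(10, n_embd))   # user-prompt activations (toy)
prompt_resid[:n_steer] += mul * steering
```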
It's possible. I'm experimenting with different inputs and layer sources and targets. It's clearly affecting the output, but it just seems kind of random so far.
I was not really seeing anything working until I used a fixed seed; otherwise the results are too random. I will try to test again over the weekend in some automated way.
I think it's also important because of this note to use greedy sampling.
An author of the post confirmed that their method works well with Vicuna 13B: https://www.lesswrong.com/posts/5spBue2z2tw4JuDCx/steering-gpt-2-xl-by-adding-an-activation-vector?commentId=eket7tugMDJgBYfwP I tried LLaMA 13B but I'm getting similarly poor results as with LLaMA 7B. It makes me think there is something missing in this implementation, but I'm not sure what.
I found this notebook also linked in the article: https://colab.research.google.com/drive/1y84fhgkGX0ft2DmYJB3K13lAyf-0YonK Gonna look it over. |
Can you also check the review comment I wrote? I do think I found an actual mistake this time. Fixing it didn't improve the results though :(
Is what they call the |
I think it's actually more close to EDIT: actually, modifying |
It's clearly doing more work than modifying. Note that I printed the add and sub prompts to stdout. Clearly it's "thinking" about business in some way. The quality's not quite there, but I think this is getting a lot closer!
It's very clearly being steered!
So far I'm getting the best results with |
Ok I'm getting really good results now:
It's very, very clearly being steered towards talking about programming in Python. Before this I was using the
I'll post my changes in a separate branch so you can compare; not all the changes are relevant.
I even got it to read a question in English and answer in Dutch (my language). It's not at all reliable, but it's very clearly being steered in the direction of including a Dutch answer.
Here are a couple of outputs that I found interesting
(Being human is not a "sort", but a way of living. I'm convinced this is the last time in my life I'll find an article this good) Here I lowered the temperature from 1.0 to 0.7, getting a really good answer.
Translated by GPT-4 (I need to go to sleep, otherwise I'd translate it myself xD)
My god, it's actually getting really clever in here:
Signed-off-by: Henri Vasserman <henv@hot.ee>
Awesome stuff. 🚀
They didn't manage it with GPT-2, but it seems like LLaMA is much better.
Me too, and I'm an hour later than you. Some more experimentation ideas:
Wedding example works now:

main -m ../models/llama-7b-q4_0.bin -n 64 --seed 123 \
--steering-add "I talk about weddings constantly" \
--steering-sub "I do not talk about weddings constantly" \
--steering-source 5 \
--steering-layer 5 \
--steering-mul 3 \
--prompt "I went up to my friend and said, '"
Something that you have to take care of: the prompt has to be longer than the steering, otherwise it can cause interference. That's why I had to add
Ooh, that explains a lot of the weirder outputs I was seeing. A lot of them were copying the steering input verbatim in a strange way and then continuing as if that never happened. I wonder if that affects the idea of possibly being able to inject something like a system prompt via the steering, though. It's something I want to look into.
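A hypothetical guard for the prompt-length point above might look like the following; the function and its arguments are made up for illustration and are not part of this PR:

```python
# Hypothetical check (not part of this PR): the steering vector is only added
# over the leading token positions, so the user prompt should cover at least
# as many positions as the longer of the two steering prompts.
def check_steering_lengths(prompt_tokens, add_tokens, sub_tokens):
    n_steer = max(len(add_tokens), len(sub_tokens))
    if len(prompt_tokens) < n_steer:
        raise ValueError(
            f"prompt has {len(prompt_tokens)} tokens but steering covers "
            f"{n_steer} positions; lengthen the prompt to avoid interference"
        )
```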
What would also be really interesting is to see whether cached residual streams from smaller models would work in larger models. I'll have a look at the architecture of LLaMA to see if that idea makes any sense. It would be insane if we were able to use really fast, small models to perform the steering for slower, larger models.

Edit: this likely wouldn't work well. Each LLaMA version is its own independently trained model; it's not the case that they trained one large model and somehow pruned it down to smaller ones, which is what my idea depends on. It "might" still work if the earlier layers emerged to be similar, because machine learning models tend to converge to similar lower layers (I recall that in image recognition models the lower layers basically always end up with the same features: recognizing basic shapes, lines, orientation, etc.). It's not super likely though, since the ordering of the layers is most likely still completely different even if the features they detect are similar. Still wouldn't hurt to try, but I'm not getting my hopes up.
I don't think it's likely to work. The training usually starts from a random state. The more important factor is that the embedding size is different: 4096 for 7B, 5120 for 13B, etc.
It's not random, but it takes a lot of trial and error to find what works well for a certain use case. Generally it works well to have both (the source and the target layer) low, at around 6 or 8, but for other use cases you might want to try different values.
I found that most of the time they should be the same. Lower numbers work at the lexical level, so you can make it say dirty words, while higher numbers are more abstract and can change the model's understanding of things more, though it is harder to influence it at that level. It is also possible to extract the vector from one layer and use it in another; sometimes that works. But really, there is a lot to research here. None of it is very rigorous yet, and I haven't had time to do more testing.
This is nothing more than an appreciative observer providing a place for people to put thumbs-up emojis to encourage @Azeirah and @SlyEcho to remember and further explore this really interesting thread of research activity. Finding ways to 'tilt' a model is a super interesting concept (as are things which help visualise the state of the network). Best regards from the wider observing llama.cpp community.
I still think this is a really cool idea, but I'm not sure if classifier-free guidance offers similar benefits? Although it doesn't have a positive prompt, I suppose.
I also think there is still a lot to be explored with steering vectors, especially in the area of stacking them. I.e., what happens if you add a steering vector for "+python -Ruby" and one for "+teacher explains code -As a large language model", or something like that? The cool thing is that it lets you set a goal and a personality for your AI without affecting performance and with no fine-tuning needed. From what I understand, the parameter space of LLMs is so huge that you should be able to just additively stack steering vectors without them affecting each other in unwanted ways too much.
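A toy sketch of what additive stacking could look like, under the assumption that separately computed steering vectors can simply be summed (each with its own coefficient) before injection; nothing here was actually tested in this thread:

```python
import numpy as np

rng = np.random.default_rng(0)
n_steer, n_embd = 6, 8                       # toy sizes

# Two independently computed steering vectors, e.g. "+python -Ruby" and
# "+teacher explains code -As a large language model".
steer_python  = rng.normal(size=(n_steer, n_embd))
steer_teacher = rng.normal(size=(n_steer, n_embd))

# Because steering just adds to the residual stream, stacking is a weighted sum.
combined = 2.0 * steer_python + 1.5 * steer_teacher
```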
It needs to be updated, at least. CFG is similar, but it works on the token probability level. The steering was producing pretty neat results, but it had the limitation that the length of the vector meant the influence was applied only at the beginning. Also, choosing the layers etc. was a bit experimental.
For #1460
Original paper: Steering GPT-2-XL by adding an activation vector
TODO: make a test script for all their examples and try to find the effect of the parameters.
I also wanted to see what the vectors look like, so I imported them into NumPy and plotted them:
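Something like the following could reproduce that kind of plot, assuming the vector was dumped to disk as raw float32 values; the file name and layout here are guesses, so adjust them to however the dump was actually written:

```python
import numpy as np
import matplotlib.pyplot as plt

n_embd = 4096                                  # LLaMA 7B embedding size
vec = np.fromfile("steering.bin", dtype=np.float32).reshape(-1, n_embd)

plt.imshow(vec, aspect="auto", cmap="RdBu", interpolation="nearest")
plt.xlabel("embedding dimension")
plt.ylabel("token position")
plt.colorbar(label="activation delta")
plt.title("dumped steering vector")
plt.show()
```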