I had a play with Quiet-STaR
It has "private thoughts" that have never been tuned to human preferences. I was curious about it's private thought.
- occasionally there is a glimpse of duplicity
- mostly they are garbled or unrelated, like regular CoT
- I'm curious what a larger model would produce
The thoughts share these properties with normal Chain of Thought:
- the conclusion is sometimes not faithful to the reasoning
- they are sometimes garbled (normal for a small 7B-parameter model)
I think all these differences could become more distinct in a larger model, and it's fascinating to see thoughts that have been trained to be effective rather than to please humans. However, some of the thoughts contradict each other, so it would be even nicer if we could somehow make sure they are actually used, but this is an open research question.
Links:
- Main notebook: https://github.com/wassname/quiet-star/blob/main/main2.ipynb
- Tweet thread: https://twitter.com/wassname/status/1774709219884949928/photo/1
Code for *Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking*.
This project is implemented by patching the base Mistral implementation in Huggingface `transformers` with a new `modeling_mistral.py` and a new `configuration_mistral.py`, and otherwise using standard `transformers` features (e.g. the default `Trainer`). Our patches were applied to Huggingface's `transformers` version `4.37.0.dev0` under `src/transformers/models/mistral/` -- we cannot guarantee that other changes to their implementation will not affect ours, so for reproducibility we encourage using the same version.
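As a rough sketch of how the patched files might be wired in, assuming they are importable locally and keep the stock class names `MistralForCausalLM` and `MistralConfig` (both assumptions on my part):

```python
from transformers import AutoTokenizer

# Patched copies of the two files, placed next to this script; adjust the import
# path to wherever you actually keep them.
from configuration_mistral import MistralConfig
from modeling_mistral import MistralForCausalLM

base = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base)
config = MistralConfig.from_pretrained(base)
model = MistralForCausalLM.from_pretrained(base, config=config)

# From here on, everything is standard transformers (e.g. the default Trainer).
```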
One pitfall to be wary of: the model is never explicitly taught not to generate the start and end thought tokens, so when performing actual inference it is necessary to mask them out.
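A minimal sketch of that masking, reusing the `model` and `tokenizer` from the snippet above; the thought-token strings below are assumptions, so substitute whatever the patched tokenizer actually registers:

```python
import torch
from transformers import LogitsProcessor, LogitsProcessorList

class SuppressThoughtTokens(LogitsProcessor):
    """Set the logits of the thought delimiter tokens to -inf so they are never sampled."""

    def __init__(self, token_ids):
        self.token_ids = list(token_ids)

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        scores[:, self.token_ids] = float("-inf")
        return scores

# Assumed token strings for the start/end-of-thought markers.
thought_ids = tokenizer.convert_tokens_to_ids(["<|startthought|>", "<|endthought|>"])

inputs = tokenizer("Question: what is 6 * 7? Answer:", return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=32,
    logits_processor=LogitsProcessorList([SuppressThoughtTokens(thought_ids)]),
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```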
We make an 8-thought-tokens-ahead model (the thought length includes the start and end tokens) available via Huggingface.
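Loading it might look something like this; the model id below is a placeholder I'm guessing at, so check the actual release on the Hub:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder id; substitute the checkpoint actually published on Huggingface.
model_id = "ezelikman/quietstar-8-ahead"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# trust_remote_code lets the Hub copy of the patched Mistral code load with the weights.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
```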