+1 for this. Also, I'd prefer llama_session_save / llama_session_load to take a struct save_config.
The reason is that we may have other save options in the future. For example, in my playground I'm experimenting with keeping the KV cache as f16 in memory but saving/loading it as q4_K.
With that, we could also choose whether or not to save embeddings and logits.
Feature Description
Would it be possible to create functions that looked something like this:
size_t llama_kv_save_seq(struct llama_context * ctx, llama_seq_id seq_id, uint8_t * dst);
size_t llama_kv_load_seq(struct llama_context * ctx, llama_seq_id seq_id, const uint8_t * src);
Motivation
In llama.cpp it is possible to save and load the entire context state in one operation with llama_copy_state_data and llama_set_state_data. For example, this could be used to evaluate a large system prompt once, save it to disk, and then load the state every time a new conversation is started.

However, with batch decoding this isn't really possible: if many sequences are being evaluated at once, they can only be saved and loaded all together.