+1 for this. Also, I'd prefer llama_session_save / llama_session_load to take a struct save_config.
The reason is that we may have other save options in the future. For example, in my playground I'm experimenting with keeping the KV cache as f16 in memory but saving/loading it as q4_K.
With that, we could also choose whether or not to save embeddings and logits.
Feature Description
Would it be possible to create functions that looked something like this:
size_t llama_kv_save_seq(struct llama_context * ctx, llama_seq_id seq_id, uint8_t * dst);
size_t llama_kv_load_seq(struct llama_context * ctx, llama_seq_id seq_id, const uint8_t * src);
Motivation
In llama.cpp it is possible to save and load the entire context state in one operation with llama_copy_state_data and llama_set_state_data. For example, this could be used to evaluate a large system prompt once, save it to disk, and then load the state every time a new conversation is started.

However, with batch decoding this isn't really possible: if many sequences are being evaluated at once, they can only be saved and loaded all together.