In the Llama model, only the embedding layer is converted to a LoRA layer. #14
Without code to look at, I can only speculate that it's because the linear layers aren't being converted, or you're not calling `get_lora_model` on them. Since you're getting the embedding weights (which likely sit in the top module), I suspect you might only be calling the top module's `get_lora_model`.

I'm trying to finetune a Phi-3 model myself.
Here's the Phi model's top module:

```rust
#[replace_layer_fields]
#[derive(Debug, Clone, AutoLoraConvert)]
pub struct PhiModel {
    embed_tokens: Embedding,
    layers: Vec<DecoderLayer>,
    norm: RmsNorm,
    lm_head: Linear,
    device: Device,
    dtype: DType,
}
```

This would convert the `Embedding` and `Linear` fields of the top module, where, in each `self.load`, `get_lora_model` is called for each module's child modules.
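To illustrate, here is a rough sketch (not the crate's actual Phi/Llama code; the `Attention` struct, its field names, and the config values are made up) of what a child module needs so that its own linear layers get converted as well:

```rust
use candle_lora::{LoraConfig, LoraLinearConfig};
use candle_lora_macro::{replace_layer_fields, AutoLoraConvert};
use candle_nn::{Linear, VarBuilder};

// Hypothetical attention sub-module: its projections are only converted to
// LoRA layers if this struct also carries the macros and `get_lora_model`
// is called on it, not just on the top-level model struct.
#[replace_layer_fields]
#[derive(Debug, AutoLoraConvert)]
struct Attention {
    q_proj: Linear,
    k_proj: Linear,
    v_proj: Linear,
    o_proj: Linear,
}

impl Attention {
    fn convert_to_lora(&mut self, config: LoraConfig, vb: &VarBuilder, hidden_size: usize) {
        // The Some(..) argument covers the Linear fields; the None arguments are
        // the conv1d/conv2d/embedding configs, which this module doesn't have.
        self.get_lora_model(
            config,
            vb,
            Some(LoraLinearConfig::new(hidden_size, hidden_size)),
            None,
            None,
            None,
        );
    }
}
```

The key point is that every struct owning `Linear`/`Embedding` fields gets the macros and its own `get_lora_model` call, not only the top-level model.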
Another thing to keep in mind is that when doing

```rust
let mut optimizer = candle_nn::SGD::new(varmap.all_vars(), 0.003).unwrap();
```

you'll have both LoRA and normal variables in the varmap. Unclear if this causes issues/slowdown/OOM when training; haven't tried. I think you can filter that out by doing:

```rust
let vars = varmap
    .data()
    .lock()
    .unwrap()
    .iter()
    .filter(|(name, _)| name.contains("lora"))
    .map(|(_, var)| var.clone())
    .collect::<Vec<_>>();
let mut optimizer = candle_nn::SGD::new(vars, 0.003).unwrap();
```

since you can name the variables so that "lora" appears in them. However, I'm getting OOM with only these LoRA parameters, counted by

```rust
let num_params = vars
    .iter()
    .map(|var| var.shape().dims().iter().product::<usize>()) // assuming vectors and matrices of weights
    .sum::<usize>();
```

while running Phi-3 with no input on an RTX 3090 24GB, so I'm not entirely sure I'm doing this correctly either. Perhaps @EricLBuehler can give some inputs?
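For what it's worth, here is a minimal sketch of saving just those filtered tensors to a separate safetensors file (the helper name and path are placeholders; it assumes the same "lora" naming convention as above):

```rust
use std::collections::HashMap;

use candle_core::Tensor;
use candle_nn::VarMap;

// Hypothetical helper: collect only the variables whose names contain "lora"
// and write them to their own safetensors file.
fn save_lora_tensors(varmap: &VarMap, path: &str) -> candle_core::Result<()> {
    let tensors: HashMap<String, Tensor> = varmap
        .data()
        .lock()
        .unwrap()
        .iter()
        .filter(|(name, _)| name.contains("lora"))
        .map(|(name, var)| (name.clone(), var.as_tensor().clone()))
        .collect();
    candle_core::safetensors::save(&tensors, path)
}
```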
Found this, but the following modification (using a `HashMap`) doesn't help with the OOM:

```rust
// Eagerly load every tensor from the mmaped safetensors files into a HashMap,
// then build a VarBuilder from it.
pub fn from_mmaped_safetensors<'a, P: AsRef<Path>>(
    paths: &[P],
    dtype: DType,
    device: &Device,
    silent: bool,
) -> Result<VarBuilderArgs<'a, Box<dyn SimpleBackend>>, Error> {
    let mut map = HashMap::new();
    {
        let tensors = unsafe { candle_core::safetensors::MmapedSafetensors::multi(paths)? };
        if silent {
            for (name, _) in tensors.tensors() {
                let tensor = tensors
                    .load(&name, device)?
                    .to_device(device)?
                    .to_dtype(dtype)?;
                map.insert(name.clone(), tensor);
            }
        } else {
            for (name, _) in tensors.tensors().iter() {
                let tensor = tensors
                    .load(name, device)?
                    .to_device(device)?
                    .to_dtype(dtype)?;
                map.insert(name.clone(), tensor);
            }
        }
    }
    Ok(VarBuilder::from_tensors(map, dtype, device))
}
```
@AntBlo memory usage of backprop is very high; what is your GPU memory capacity?
From nvidia-smi:

Put this on the back burner for a bit, but if there's anything I can test then let me know.
I tried to fine-tune TinyLlama with this crate. After training, the saved safetensors file contains only two tensors. I expanded the macro in `mod llama` and found that these two tensors are used only in the embedding layer. So none of the linear layers in the self-attention blocks are converted to LoRA layers, and when I use my fine-tuned model, it behaves exactly the same as before.
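A quick way to check which layers actually received LoRA weights is to list the tensor names in the saved file. A minimal sketch, assuming the fine-tuned weights were saved to a safetensors file (the file name here is a placeholder):

```rust
use candle_core::Device;

fn main() -> candle_core::Result<()> {
    // Load the saved safetensors file and print every tensor name and shape.
    let tensors = candle_core::safetensors::load("finetuned.safetensors", &Device::Cpu)?;
    for (name, tensor) in tensors.iter() {
        println!("{name}: {:?}", tensor.shape());
    }
    Ok(())
}
```

If only the embedding's LoRA A/B matrices show up, the attention blocks were never converted.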