In the Llama model, only the embedding layer is converted to a LoRA layer #14

Open
Adamska1008 opened this issue Apr 9, 2024 · 5 comments

Adamska1008 commented Apr 9, 2024

I tried to fine-tune TinyLlama with this crate. After training, the saved safetensors file contains only two tensors:

lora_llama.b0
lora_llama.a0

I expanded the macro in mod llama and found that these two tensors belong only to the embedding layer:

        pub fn get_lora_model<'a>(
            &'a mut self,
            lora_config: candle_lora::LoraConfig,
            vb: &candle_nn::VarBuilder,
            linear_config: Option<candle_lora::LoraLinearConfig>,
            conv1d_config: Option<candle_lora::LoraConv1dConfig>,
            conv2d_config: Option<candle_lora::LoraConv2dConfig>,
            embed_config: Option<candle_lora::LoraEmbeddingConfig>,
        ) {
            let mut linear: ::std::collections::HashMap<
                String,
                &dyn candle_lora::LinearLayerLike,
            > = ::std::collections::HashMap::new();
            let mut conv1d: ::std::collections::HashMap<
                String,
                &dyn candle_lora::Conv1dLayerLike,
            > = ::std::collections::HashMap::new();
            let mut conv2d: ::std::collections::HashMap<
                String,
                &dyn candle_lora::Conv2dLayerLike,
            > = ::std::collections::HashMap::new();
            let mut embed: ::std::collections::HashMap<
                String,
                &dyn candle_lora::EmbeddingLayerLike,
            > = ::std::collections::HashMap::new();
            [(embed.insert("wte".to_string(), &*self.wte))];
            if !linear.is_empty() && linear_config.is_none() {
                {
                    ::core::panicking::panic_fmt(
                        format_args!("Config not speified for linear layers."),
                    );
                };
            }
            if !conv1d.is_empty() && conv1d_config.is_none() {
                {
                    ::core::panicking::panic_fmt(
                        format_args!("Config not speified for conv1d layers."),
                    );
                };
            }
            if !conv2d.is_empty() && conv2d_config.is_none() {
                {
                    ::core::panicking::panic_fmt(
                        format_args!("Config not speified for conv2d layers."),
                    );
                };
            }
            if !embed.is_empty() && embed_config.is_none() {
                {
                    ::core::panicking::panic_fmt(
                        format_args!("Config not speified for embedding layers."),
                    );
                };
            }
            let mut builder = candle_lora::SelectedLayersBuilder::new();
            if linear_config.is_some() {
                builder = builder.add_linear_layers(linear, linear_config.unwrap());
            }
            if conv1d_config.is_some() {
                builder = builder.add_conv1d_layers(conv1d, conv1d_config.unwrap());
            }
            if conv2d_config.is_some() {
                builder = builder.add_conv2d_layers(conv2d, conv2d_config.unwrap());
            }
            if embed_config.is_some() {
                builder = builder.add_embed_layers(embed, embed_config.unwrap());
            }
            let selection = builder.build();
            let new_layers = candle_lora::Lora::convert_model(selection, lora_config, &vb);
            [
                (self
                    .wte = ::std::sync::Arc::new(
                    new_layers.embed.get("wte").unwrap().clone(),
                )),
            ];
        }

So none of the linear layers in the self-attention blocks are converted to LoRA layers. When I use my fine-tuned model, it behaves exactly the same as before.
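
For reference, here is a minimal sketch of what I would expect the attention block to need so that its projections also get registered (the struct and field names, the candle_lora_macro import path, and the LoraLinearConfig::new arguments are assumptions on my side, based on the expanded get_lora_model above, not the crate's documented API):

use candle_lora_macro::{replace_layer_fields, AutoLoraConvert};
use candle_nn::Linear;

// Hypothetical attention block: deriving AutoLoraConvert here generates its own
// get_lora_model, so q/k/v/o end up in the `linear` selection map instead of
// only `wte` landing in the `embed` map of the top-level struct.
#[replace_layer_fields]
#[derive(Debug, Clone, AutoLoraConvert)]
struct CausalSelfAttention {
    q_proj: Linear,
    k_proj: Linear,
    v_proj: Linear,
    o_proj: Linear,
}

// Sketch of the per-block conversion call (hidden_size is a placeholder):
// attn.get_lora_model(
//     lora_config.clone(),
//     &vb.pp("self_attn"),
//     Some(candle_lora::LoraLinearConfig::new(hidden_size, hidden_size)),
//     None, // conv1d
//     None, // conv2d
//     None, // embed
// );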


AntBlo commented Apr 28, 2024

Without code to look at, I can only speculate that it's because the linear layers aren't being converted by:

#[replace_layer_fields]
#[derive(Debug, Clone, AutoLoraConvert)]

or you're not calling get_lora_model on those layers.

Since you're getting the embedding weights (which likely sit in the top module), I suspect you might only be calling the top module's get_lora_model method.

I'm trying to fine-tune a Phi-3 model myself.
I'm getting the attention layers here:

[src/main.rs:38:5] &vars = [
    (
        "model.layers.4.mlp.down_proj.lora_linear.a0.weight",
        Var(
            Tensor[dims 1, 8192; bf16, cuda:0],
        ),
    ),
    (
        "model.layers.9.self_attn.o_proj.lora_linear.a0.weight",
        Var(
            Tensor[dims 1, 3072; bf16, cuda:0],
        ),
    ),
    (
        "model.layers.9.mlp.down_proj.lora_linear.a0.weight",
        Var(
            Tensor[dims 1, 8192; bf16, cuda:0],
        ),
    ),
... // more weights
]

Here's the Phi model's top module:

#[replace_layer_fields]
#[derive(Debug, Clone, AutoLoraConvert)]
pub struct PhiModel {
    embed_tokens: Embedding,
    layers: Vec<DecoderLayer>,
    norm: RmsNorm,
    lm_head: Linear,
    device: Device,
    dtype: DType,
}

This would convert the Embedding and Linear layers to LoRA layers.
However, it would not convert self.layers automatically. You'd have to iterate through them yourself, with each module's load calling get_lora_model on its child modules; a rough sketch follows.
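
Roughly like this (a sketch only; it assumes DecoderLayer also carries #[replace_layer_fields] / AutoLoraConvert so it gets its own generated get_lora_model, and that the config types implement Clone):

impl PhiModel {
    // Hypothetical helper: convert the top module, then walk the children,
    // since the derive on PhiModel does not descend into Vec<DecoderLayer>.
    fn convert_to_lora(
        &mut self,
        lora_config: candle_lora::LoraConfig,
        vb: &candle_nn::VarBuilder,
        linear_config: candle_lora::LoraLinearConfig,
        embed_config: candle_lora::LoraEmbeddingConfig,
    ) {
        // Top module: picks up embed_tokens and lm_head.
        self.get_lora_model(
            lora_config.clone(),
            vb,
            Some(linear_config.clone()),
            None,
            None,
            Some(embed_config),
        );
        // Children: register each decoder layer's projections explicitly.
        for (i, layer) in self.layers.iter_mut().enumerate() {
            layer.get_lora_model(
                lora_config.clone(),
                &vb.pp(format!("model.layers.{i}")),
                Some(linear_config.clone()),
                None,
                None,
                None,
            );
        }
    }
}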

I'm not sure if inner: Box<dyn LinearLayerLike> is converted. It seems like a combination of the old way:

layer: Box<dyn LinearLayerLike>,

and the new way.

Another thing to keep in mind is that when doing:

let mut optimizer = candle_nn::SGD::new(varmap.all_vars(), 0.003).unwrap();

you'll have both LoRA and regular variables in the varmap. It's unclear whether this causes issues, slowdowns, or OOM when training; I haven't tried.

I think you can filter those out by doing:

let vars = varmap
        .data()
        .lock()
        .unwrap()
        .iter()
        .filter(|(name, _)| name.contains("lora"))
        .map(|(name, var)| (name.clone(), var.clone()))
        .collect::<Vec<_>>();
// SGD::new expects Vec<Var> (and needs `use candle_nn::Optimizer;` in scope).
let mut optimizer =
    candle_nn::SGD::new(vars.iter().map(|(_, var)| var.clone()).collect(), 0.003).unwrap();

since you can name the variables with VarBuilder before passing it to get_lora_model.
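
For example (a sketch; the "lora" prefix and the dtype are arbitrary choices here, and the actual get_lora_model call is elided):

use candle_core::{DType, Device};
use candle_nn::{VarBuilder, VarMap};

// Everything created through this builder is namespaced under "lora", so the
// contains("lora") filter above will match the adapter weights.
let varmap = VarMap::new();
let device = Device::cuda_if_available(0).unwrap();
let vb = VarBuilder::from_varmap(&varmap, DType::BF16, &device).pp("lora");

// model.get_lora_model(lora_config, &vb, Some(linear_config), None, None, Some(embed_config));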

However, I'm getting OOM with only 1,608,000 params, calculated using this:

let num_params = vars
    .iter()
    .map(|(_, var)| var.shape().dims().iter().product::<usize>()) // assuming vectors and matrices of weights
    .sum::<usize>();

while running Phi-3 with no input on an RTX 3090 24GB. So not entirely sure I'm doing this correctly either.
(Specifically, it panics with CUDA_ERROR_OUT_OF_MEMORY at https://github.com/huggingface/candle/blob/c68ed8963fb6fc842f20d84baa07ff97b56aedb4/candle-nn/src/optim.rs#L21, but I feel like this might be a separate issue, or just me not understanding the memory usage of backprop.)

Perhaps @EricLBuehler can give some input?


AntBlo commented Apr 29, 2024

Found this:
huggingface/candle#2079

But the following modification (using a HashMap) doesn't help with the OOM:

pub fn from_mmaped_safetensors<'a, P: AsRef<Path>>(
    paths: &[P],
    dtype: DType,
    device: &Device,
    silent: bool,
) -> Result<VarBuilderArgs<'a, Box<dyn SimpleBackend>>, Error> {
    let mut map = HashMap::new();
    {
        let tensors = unsafe { candle_core::safetensors::MmapedSafetensors::multi(paths)? };

        // Both branches below eagerly load every tensor to `device` at `dtype`;
        // only the iteration style differs.
        if silent {
            for (name, _) in tensors.tensors() {
                let tensor = tensors
                    .load(&name, device)?
                    .to_device(device)?
                    .to_dtype(dtype)?;
                map.insert(name.clone(), tensor);
            }
        } else {
            for (name, _) in tensors.tensors().iter() {
                let tensor = tensors
                    .load(name, device)?
                    .to_device(device)?
                    .to_dtype(dtype)?;
                map.insert(name.clone(), tensor);
            }
        };
    }

    Ok(VarBuilder::from_tensors(map, dtype, device))
}
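
For reference, this gets called roughly like the following (the file names and dtype are placeholders):

let vb = from_mmaped_safetensors(
    &["model-00001-of-00002.safetensors", "model-00002-of-00002.safetensors"], // placeholder paths
    DType::BF16,
    &Device::cuda_if_available(0).unwrap(),
    false,
)
.unwrap();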

EricLBuehler (Owner) commented

@AntBlo The memory usage of backprop is very high; what is your GPU memory capacity?


AntBlo commented Aug 23, 2024

@EricLBuehler

From nvidia-smi:
NVIDIA GeForce RTX 3090 with 24576MiB (24GB VRAM)

I've put this on the back burner for a bit, but if there's anything I can test, let me know.

EricLBuehler (Owner) commented

@AntBlo 24GB should be enough for backprop. This may be connected to #21.
