RuntimeError: linalg.vector_norm: Expected a floating point or complex tensor as input. Got Long #34573

Closed
qmin2 opened this issue Nov 2, 2024 · 4 comments


qmin2 commented Nov 2, 2024

System Info

transformers == 4.45
torch == 2.4.1+cu118
accelerate == 1.0.1

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import torch
from time import perf_counter

from accelerate import Accelerator
from datasets import load_dataset
from torch.utils.data import DataLoader
from tqdm import tqdm
from transformers import AutoTokenizer, LlamaForCausalLM

# args, config, optimizer and lr_scheduler are defined earlier in the script (omitted here).
dataset = load_dataset("pg19")
dataloader = {
    split: DataLoader(dataset[split], batch_size=args.batch_size, shuffle=(split == 'train'),
                      pin_memory=True)
    for split in ['train', 'validation', 'test']}

accelerator = Accelerator()
device = accelerator.device
model_name = "meta-llama/Meta-Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({'pad_token': '[PAD]'})  # or e.g. tokenizer.pad_token = tokenizer.eos_token
model = LlamaForCausalLM.from_pretrained(model_name, config=config, torch_dtype=torch.bfloat16).to(device)
model.resize_token_embeddings(len(tokenizer))
train_dataloader, eval_dataloader, model, optimizer, lr_scheduler = accelerator.prepare(
    dataloader["train"], dataloader["validation"], model, optimizer, lr_scheduler
)

for epoch in range(1, args.num_epochs + 1):
    start_time = perf_counter()

    model.train()
    train_loss = 0

    for idx, batch in enumerate(tqdm(train_dataloader, disable=args.disable_tqdm)):
        # Tokenize on the fly, padding to the longest sequence in the batch.
        inputs = tokenizer(batch['text'], padding="longest", truncation=True, max_length=2200,
                           return_tensors='pt', return_token_type_ids=False).to(device)

        # Use the input ids as labels and mask padded positions out of the loss.
        inputs['labels'] = inputs['input_ids'].clone()
        label_mask = inputs['attention_mask'].bool()
        inputs['labels'][~label_mask] = -100

        loss = model(**inputs).loss

        accelerator.backward(loss)

        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()

Expected behavior

I'm using PyTorch 2.4.1+cu118 and transformers 4.45, training with a batch size of 2 on 2 NVIDIA A100-80GB GPUs. Whenever a batch contained padding, the attention_mask in LlamaSdpaAttention was activated (i.e. it was not None at this step):

causal_mask = attention_mask
if attention_mask is not None:
    causal_mask = causal_mask[:, :, :, : key_states.shape[-2]]

After the torch.nn.functional.scaled_dot_product_attention operation, I encountered the following error at this line:

accelerator.backward(loss)

RuntimeError: linalg.vector_norm: Expected a floating point or complex tensor as input. Got Long
For now, I’ve resolved this by skipping batches that include padding, but I would like to understand the root cause and potential solutions for this issue.
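
Roughly, the padding-skip workaround looks like this (simplified sketch; only the extra check differs from the training loop in the Reproduction section above):

for idx, batch in enumerate(tqdm(train_dataloader, disable=args.disable_tqdm)):
    inputs = tokenizer(batch['text'], padding="longest", truncation=True, max_length=2200,
                       return_tensors='pt', return_token_type_ids=False).to(device)

    # Skip any batch that required padding, i.e. whose attention mask contains zeros.
    # This avoids the error but discards data, so it is only a stopgap.
    if (inputs['attention_mask'] == 0).any():
        continue

    # ... the rest of the training step is unchanged ...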

qmin2 added the bug label Nov 2, 2024
Rocketknight1 (Member) commented

cc @muellerzr @SunMarc !

SunMarc (Member) commented Nov 4, 2024

Hey @qmin2, can you share your accelerate config? I've seen other posts reporting the same issue you are facing, so it might be relevant.

qmin2 (Author) commented Nov 10, 2024

Sorry for the late reply.

This is my accelerate config:

compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  deepspeed_config_file: /home/qmin2/3rd_semester_research/mixed_tokens/ds_config.json
  zero3_init_flag: false
distributed_type: DEEPSPEED
downcast_bf16: 'no'
enable_cpu_affinity: false
machine_rank: 0
main_training_function: main
num_machines: 1
num_processes: 2
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false

And this is my DeepSpeed config (ds_config.json):

{
    "bf16": {
        "enabled": true
    },
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": 1e-5,
            "weight_decay": 1e-5,
            "torch_adam": true,
            "adam_w_mode": true
        }
    },
    "scheduler": {
        "type": "WarmupCosineLR",
        "params":{
            "total_num_steps" : 7500,
            "warmup_min_ratio" : 0.1
        }
    },
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": true
        },
        "allgather_partitions": true,
        "allgather_bucket_size": 2e8,
        "overlap_comm": true,
        "reduce_scatter": true,
        "reduce_bucket_size": "auto",
        "contiguous_gradients": true
    },
    "gradient_accumulation_steps": 1,
    "gradient_clipping": "auto",
    "steps_per_print": 2000,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "wall_clock_breakdown": false
}
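
For completeness, the script itself only calls Accelerator() and relies on accelerate launch with the YAML above to pick these settings up. If I understand accelerate's API correctly, the rough programmatic equivalent (a sketch only, not my actual code) would be:

from accelerate import Accelerator, DeepSpeedPlugin

# Sketch: point the DeepSpeedPlugin at the same ds_config.json the YAML references.
ds_plugin = DeepSpeedPlugin(
    hf_ds_config="/home/qmin2/3rd_semester_research/mixed_tokens/ds_config.json",
    zero3_init_flag=False,
)
accelerator = Accelerator(deepspeed_plugin=ds_plugin)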

I also encountered another, similar issue.
I am passing a custom 4D attention mask to LlamaForCausalLM as input. The model (Llama 3.1) is loaded in bfloat16, and I run into an issue with scaled_dot_product_attention at the following line:

attn_output = torch.nn.functional.scaled_dot_product_attention(
    query_states,
    key_states,
    value_states,
    attn_mask=causal_mask,
    dropout_p=self.attention_dropout if self.training else 0.0,
    is_causal=is_causal,
)

The error message I get is a dtype mismatch between query_states and attention_bias. To resolve this, I converted my custom attention_mask to bfloat16 to match the Llama 3.1 model's dtype. After making this change, the previous error disappears, but a new issue arises during the backward pass with accelerator.backward(loss):

RuntimeError: linalg.vector_norm: Expected a floating point or complex tensor as input. Got Long

I suspect this issue is related to the activation of the causal_mask in LlamaSdpaAttention: the same error occurs whenever padding is present in the input and the causal mask is therefore not None.
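
For reference, this is roughly the kind of conversion I mean, as a standalone sketch (the shapes and the masked_fill / finfo.min pattern here are illustrative, not my actual code):

import torch
import torch.nn.functional as F

# Turn a boolean 4D mask of shape (batch, 1, q_len, kv_len) into an additive
# bias in the model's compute dtype so SDPA accepts it without a dtype mismatch.
bsz, n_heads, q_len, kv_len, head_dim = 2, 4, 8, 8, 16
dtype = torch.bfloat16

bool_mask = torch.tril(torch.ones(q_len, kv_len, dtype=torch.bool))    # example causal pattern
bool_mask = bool_mask[None, None, :, :].expand(bsz, 1, q_len, kv_len)  # broadcast to 4D

additive_mask = torch.zeros(bsz, 1, q_len, kv_len, dtype=dtype)
additive_mask = additive_mask.masked_fill(~bool_mask, torch.finfo(dtype).min)

q = torch.randn(bsz, n_heads, q_len, head_dim, dtype=dtype)
k = torch.randn(bsz, n_heads, kv_len, head_dim, dtype=dtype)
v = torch.randn(bsz, n_heads, kv_len, head_dim, dtype=dtype)
out = F.scaled_dot_product_attention(q, k, v, attn_mask=additive_mask)  # no dtype mismatch here

In my actual run this makes the forward pass go through; the Long-tensor error then only appears in accelerator.backward(loss).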

github-actions bot commented Dec 4, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
