
ORPO seems broken with micro_batch_size or eval_batch_size > 1 #1489

Closed
xzuyn opened this issue Apr 7, 2024 · 1 comment · Fixed by #1551
Labels
bug Something isn't working

Comments

xzuyn (Contributor) commented Apr 7, 2024

Please check that this issue hasn't been reported before.

  • I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

It should run without an error, as it does when you have micro_batch_size and eval_batch_size set to 1.

Current behaviour

It returns two errors:

ValueError: expected sequence of length 406 at dim 1 (got 75)

ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (rejected_input_ids in this case) have excessive nesting (inputs type list where type int is expected).

Traceback (most recent call last):
  File "/media/xzuyn/NVMe/LClones/axolotl/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 759, in convert_to_tensors
    tensor = as_tensor(value)
  File "/media/xzuyn/NVMe/LClones/axolotl/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 721, in as_tensor
    return torch.tensor(value)
ValueError: expected sequence of length 406 at dim 1 (got 75)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/media/xzuyn/NVMe/LClones/axolotl/src/axolotl/cli/train.py", line 59, in <module>
    fire.Fire(do_cli)
  File "/media/xzuyn/NVMe/LClones/axolotl/venv/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/media/xzuyn/NVMe/LClones/axolotl/venv/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/media/xzuyn/NVMe/LClones/axolotl/venv/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/media/xzuyn/NVMe/LClones/axolotl/src/axolotl/cli/train.py", line 35, in do_cli
    return do_train(parsed_cfg, parsed_cli_args)
  File "/media/xzuyn/NVMe/LClones/axolotl/src/axolotl/cli/train.py", line 55, in do_train
    return train(cfg=cfg, cli_args=cli_args, dataset_meta=dataset_meta)
  File "/media/xzuyn/NVMe/LClones/axolotl/src/axolotl/train.py", line 160, in train
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
  File "/media/xzuyn/NVMe/LClones/axolotl/venv/lib/python3.10/site-packages/transformers/trainer.py", line 1780, in train
    return inner_training_loop(
  File "/media/xzuyn/NVMe/LClones/axolotl/venv/lib/python3.10/site-packages/transformers/trainer.py", line 2085, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File "/media/xzuyn/NVMe/LClones/axolotl/venv/lib/python3.10/site-packages/accelerate/data_loader.py", line 451, in __iter__
    current_batch = next(dataloader_iter)
  File "/media/xzuyn/NVMe/LClones/axolotl/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
    data = self._next_data()
  File "/media/xzuyn/NVMe/LClones/axolotl/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 675, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/media/xzuyn/NVMe/LClones/axolotl/src/axolotl/monkeypatch/data/batch_dataset_fetcher.py", line 32, in fetch
    return self.collate_fn(data)
  File "/media/xzuyn/NVMe/LClones/axolotl/src/axolotl/utils/collators.py", line 106, in __call__
    features = self.tokenizer.pad(
  File "/media/xzuyn/NVMe/LClones/axolotl/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3369, in pad
    return BatchEncoding(batch_outputs, tensor_type=return_tensors)
  File "/media/xzuyn/NVMe/LClones/axolotl/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 224, in __init__
    self.convert_to_tensors(tensor_type=tensor_type, prepend_batch_axis=prepend_batch_axis)
  File "/media/xzuyn/NVMe/LClones/axolotl/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 775, in convert_to_tensors
    raise ValueError(
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (`rejected_input_ids` in this case) have excessive nesting (inputs type `list` where type `int` is expected).
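
For what it's worth, the second error looks reproducible outside of axolotl. A minimal sketch (not the collator code itself), assuming the collator hands tokenizer.pad per-example lists keyed like the error above: pad only pads the standard model-input keys, so an extra key such as rejected_input_ids stays ragged, and converting the batch to tensors fails the same way. With batch size 1 every key trivially has a single length, which would explain why that case works.

```python
from transformers import BatchEncoding

# Two "examples" whose standard keys are padded to a common length but whose
# rejected_input_ids are not; tensor conversion then fails like the traceback above.
batch = {
    "input_ids": [[1, 2, 3], [4, 5, 6]],          # already padded to the same length
    "rejected_input_ids": [[7, 8, 9, 10], [11]],  # left ragged by tokenizer.pad
}
try:
    BatchEncoding(batch, tensor_type="pt")
except ValueError as err:
    print(err)  # Unable to create tensor ... (`rejected_input_ids` in this case) ...
```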

Steps to reproduce

Run the YAML provided, which has a micro_batch_size and eval_batch_size of 2.

I tested:

micro_batch_size: 1 & eval_batch_size: 1 - Works
micro_batch_size: 2 & eval_batch_size: 2 - Errors
micro_batch_size: 2 & eval_batch_size: 1 - Errors
micro_batch_size: 1 & eval_batch_size: 2 - Errors

Config yaml

wandb_project: MV02-7B
wandb_entity:
wandb_watch:
wandb_name: ORPO-QLoRA-run_1-Test-1
wandb_log_model:

output_dir: ./MV02-Test-1-run_1-ORPO-7B-QLoRA
resume_from_checkpoint:
save_steps: 10
saves_per_epoch:
save_safetensors: true
save_total_limit: 5
hub_model_id:
hub_strategy:

base_model: alpindale/Mistral-7B-v0.2-hf
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
is_llama_derived_model: false
is_mistral_derived_model: true
is_falcon_derived_model: false
is_qwen_derived_model: false

bf16: true
fp16: false
tf32: false

load_in_8bit: false
load_in_4bit: true
strict: false

sequence_len: 4096
s2_attention: false
sample_packing: false
pad_to_sequence_len: false
train_on_inputs: false
group_by_length: false

adapter: qlora
lora_model_dir:
lora_r: 64
lora_alpha: 64
lora_dropout: 0.125
lora_fan_in_fan_out:
lora_target_linear:
save_embedding_layers:
peft_layers_to_transform:
peft_use_dora:
peft_use_rslora: true
peft_layer_replication:
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj
lora_modules_to_save:

unfrozen_parameters:

rl: orpo
orpo_alpha: 0.1
remove_unused_columns: false
chat_template: chatml
datasets:
  - path: argilla/ultrafeedback-binarized-preferences-cleaned
    type: orpo.chat_template
val_set_size: 0.01
eval_sample_packing: false
evaluation_strategy: steps
eval_steps: 10
evals_per_epoch:
test_datasets:
dataset_prepared_path: ./Test-1-seed42
push_dataset_to_hub:
hf_use_auth_token:
shuffle_merged_datasets: true

num_epochs: 1
gradient_accumulation_steps: 8
micro_batch_size: 2
eval_batch_size: 2
warmup_steps: 0
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.00001
loraplus_lr_ratio: 8
loraplus_lr_embedding:
cosine_min_lr_ratio:
weight_decay: 0.01
max_grad_norm: 1.0
logging_steps: 1

gradient_checkpointing: true
early_stopping_patience: false
local_rank:
xformers_attention: false
flash_attention: false
sdp_attention: true

loss_watchdog_threshold: 100.0
loss_watchdog_patience: 3

debug: true
seed: 42
deepspeed:
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"

Possible solution

No response
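
For reference only, a rough sketch of the kind of collator change that might avoid the mismatch: pad every chosen/rejected key in the batch to its own maximum length before building tensors, rather than relying on tokenizer.pad. The helper name and key handling below are assumptions on my part; the actual fix landed in #1551 and may look quite different.

```python
import torch

def pad_orpo_features(features, pad_token_id, label_pad_token_id=-100):
    """Hypothetical helper: pad each key (input_ids, rejected_input_ids,
    *_attention_mask, *_labels, ...) to the longest sequence for that key
    in the batch, so batched tensors line up when micro_batch_size > 1."""
    batch = {}
    for key in features[0]:
        seqs = [list(f[key]) for f in features]
        max_len = max(len(s) for s in seqs)
        if key.endswith("labels"):
            pad_value = label_pad_token_id   # padding positions ignored by the loss
        elif key.endswith("attention_mask"):
            pad_value = 0                    # padding positions masked out
        else:
            pad_value = pad_token_id         # *_input_ids
        batch[key] = torch.tensor(
            [s + [pad_value] * (max_len - len(s)) for s in seqs]
        )
    return batch
```

For example, two features with input_ids of lengths 2 and 1 and rejected_input_ids of lengths 1 and 3 come out as 2x2 and 2x3 tensors instead of raising.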

Which Operating Systems are you using?

  • Linux
  • macOS
  • Windows

Python Version

3.10.12

axolotl branch-commit

main/bda48f0

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this bug has not been reported yet.
  • I am using the latest version of axolotl.
  • I have provided enough information for the maintainers to reproduce and diagnose the issue.
xzuyn added the bug label Apr 7, 2024

LeeWonc commented Apr 8, 2024

Same issue....
