AttributeError: 'torch.dtype' object has no attribute 'element_size' #30304
Comments
Hi @ashmalvayani,
It seems that there is still a problem in Transformers (see huggingface/peft#1635): transformers/src/transformers/modeling_utils.py, Lines 1164 to 1166 at 28a2283. It can be resolved after this PR is merged.
Please find below the complete traceback:
Upgrading torch to 2.1.0 and changing the flash-attention installation fixed both the element_size issue and the bf16 issue. However, I think this is a workaround and not the actual fix. I could be wrong; can you please let me know?
I had the same issue with the transformers line you mentioned. I managed to fix it using the solution from the issue you highlighted, but I'm not entirely sure it will work properly, since the fix only involved setting a constant.
I see there was a solution of manually adding a helper like this: quant_storage = self.hf_quantizer.quantization_config.bnb_4bit_quant_storage. However, the lines you've highlighted ("Lines 1164 to 1166") are still the problem: quant_storage has neither item_size (in earlier transformers versions) nor element_size (in later ones).
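Since `Tensor.element_size()` exists in every torch release but `torch.dtype` objects lack `element_size`/`itemsize` on older torch versions, one version-independent way to get the byte width of a storage dtype is to materialize an empty tensor of that dtype and query it. This is a hedged sketch of that idea, not the fix transformers eventually merged; the helper name `dtype_byte_size` is hypothetical:

```python
import torch

def dtype_byte_size(dtype: torch.dtype) -> int:
    """Bytes per element of `dtype`.

    Hypothetical compatibility helper: avoids calling
    `dtype.element_size()` / `dtype.itemsize`, which raise
    AttributeError on older torch releases, and instead uses
    `Tensor.element_size()`, which is available everywhere.
    """
    return torch.empty((), dtype=dtype).element_size()

print(dtype_byte_size(torch.uint8))    # 1
print(dtype_byte_size(torch.float16))  # 2
print(dtype_byte_size(torch.float32))  # 4
```

The empty tensor is zero-dimensional, so no meaningful allocation happens; the call only inspects the element width of the dtype.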
System Info
transformers version: 4.40.0.dev
Python version: 3.10
torch: 2.0.1 (CUDA 11.7)
I am fine-tuning the https://huggingface.co/CohereForAI/c4ai-command-r-v01 model with the axolotl framework. The lora.yaml config file is as follows:
To reproduce, install the axolotl environment and run the following:
The problem occurs when I run the model quantized in 4-bit; in 8-bit it has no issues and runs smoothly.
Who can help?
@pacman100 @SunMarc @younesbelkada
Information
Tasks
One of the scripts in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
Expected behavior
It should start the training, but with 4-bit QLoRA quantization it raises the error above.