kaggle :: GPU P100 :: TypeError: LoraLayer_update_layer() got an unexpected keyword argument 'use_dora' #201

Closed
dsbyprateekg opened this issue Feb 28, 2024 · 23 comments
Labels
fixed Fixed!

Comments

@dsbyprateekg

Hi,

I am trying to run the Alpaca + Gemma 7b full example.ipynb notebook in a Kaggle environment and am getting the following error:

[screenshot: TypeError: LoraLayer_update_layer() got an unexpected keyword argument 'use_dora']

while running the code below:

[screenshot of the code cell]

Installed library versions are: langchain-0.1.9, langchain-community-0.0.24, langchain-core-0.1.27, sentence-transformers-2.4.0.
Please have a look at this issue.

@Jonaskouwenhoven

Just encountered the same error on Colab. Seems to be a new issue.

@DeanChugall

Just downgrade HF PEFT to 0.8.2 until the Unsloth team fixes the new DoRA support from HF PEFT.

!pip install --force-reinstall --no-cache-dir peft==0.8.2
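
(As a quick sanity check after the reinstall, and assuming the kernel is restarted first, something like the following should confirm the pin took effect; nothing here is from the thread beyond the 0.8.2 version number.)

import peft
assert peft.__version__ == "0.8.2", f"unexpected peft version: {peft.__version__}"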

@danielhanchen
Contributor

Oh my I will get this fixed ASAP

@danielhanchen danielhanchen added the URGENT BUG Urgent bug label Feb 28, 2024
@RonanKMcGovern

Yeah, it's because HuggingFace just merged their DoRA branch to main in the last few days. Probably that new argument is slipping through.
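
(For context, a hedged sketch of where the new keyword comes from, assuming a PEFT release with DoRA support, i.e. 0.9.0 or later: LoraConfig exposes a use_dora flag, and that flag ends up being forwarded to LoraLayer.update_layer, which trips older patched signatures.)

from peft import LoraConfig

# use_dora only exists on newer PEFT releases; on 0.8.2 this line itself would fail.
config = LoraConfig(r = 16, lora_alpha = 16, use_dora = False)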

@DeanChugall

It would be great if we could vendor PEFT internally in Unsloth to prevent these kinds of breaking changes coming from external packages.

@BenjaminBossan

Thanks @RonanKMcGovern for sending me here.

Let's set up CI using PEFT and unsloth main to prevent this in the future. Do you want to set it up on your side or should we look into adding it to PEFT?

Regarding this specific error, if possible, add **kwargs to the method so that future additions won't lead to the same kind of error.
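
(A minimal illustration of that suggestion, using a made-up class rather than the real PEFT/Unsloth code: once the override accepts **kwargs, a newly added keyword such as use_dora no longer raises a TypeError.)

# Hypothetical stand-in for a patched LoraLayer.update_layer, not the real PEFT code.
class PatchedLoraLayer:
    def update_layer(self, adapter_name, r, lora_alpha, **kwargs):
        # Unknown future keywords (e.g. use_dora) are simply ignored here.
        self.r = r
        self.lora_alpha = lora_alpha

layer = PatchedLoraLayer()
layer.update_layer("default", r = 16, lora_alpha = 16, use_dora = False)  # no TypeError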

@danielhanchen
Contributor

@BenjaminBossan Should be fine in the future hopefully - I rewrote the code to use inspect.getsource to patch it internally :) I used to have 1 custom function, but now it's dynamic patching
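
(A self-contained sketch of that dynamic-patching idea, not Unsloth's actual implementation: fetch a function's source with inspect.getsource, rewrite it as text, recompile it with exec, and swap it back onto the class.)

import inspect
import textwrap

class DemoLayer:  # hypothetical example class, purely for illustration
    def update_layer(self, adapter_name, r):
        return f"updated {adapter_name} with r={r}"

# 1. Fetch the current source and dedent it so it compiles at module level.
src = textwrap.dedent(inspect.getsource(DemoLayer.update_layer))

# 2. Edit the source as plain text.
src = src.replace('return f"updated {adapter_name} with r={r}"',
                  'return f"patched {adapter_name} with r={r}"')

# 3. Recompile the edited source and re-attach the function to the class.
namespace = {}
exec(compile(src, "<patched update_layer>", "exec"), namespace)
DemoLayer.update_layer = namespace["update_layer"]

print(DemoLayer().update_layer("default", r = 16))  # -> patched default with r=16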

@danielhanchen
Contributor

Doing some tests on my end and will push it asap!! Sorry everyone for the issue and also thanks for notifying me!

@danielhanchen
Contributor

@DeanChugall @dsbyprateekg @Jonaskouwenhoven Again sorry - just fixed it!! On Kaggle / Colab, a reinstall of Unsloth will have to take place - no need to disconnect - just press restart and run all.

For local machines: pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git

Again sorry and also thanks for notifying me!!

@danielhanchen danielhanchen added the fixed - pending confirmation Fixed, waiting for confirmation from poster label Feb 28, 2024
@dsbyprateekg
Author

@danielhanchen Thanks a lot for the quick response and the fix.
It's working, but I'm facing another error:

ValueError: Invalid pattern: '**' can only be an entire path component

Can you please check and help me resolve this as well?

@danielhanchen
Contributor

@dsbyprateekg That's a weird bug - do you have a more complete error trace, i.e. are you just using our notebook?

@dsbyprateekg
Author

> @dsbyprateekg That's a weird bug - do you have a more complete error trace, i.e. are you just using our notebook?

It's my bad, I forgot to attach the logs.
Please find attached the complete logs of the error-
logs_kaggle.txt

@danielhanchen
Contributor

@dsbyprateekg Is your Kaggle instance connected to the internet?

@dsbyprateekg
Author

> @dsbyprateekg Is your Kaggle instance connected to the internet?

Yes.

@danielhanchen
Contributor

Hmm weird bug indeed

@danielhanchen
Contributor

@dsbyprateekg Oh, try pip install --upgrade datasets. I might have to change the datasets version.
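
(Hedged side note: that Invalid pattern: '**' error usually comes from an older datasets release paired with a newer fsspec, so after the upgrade it may be worth confirming which versions the session actually loaded.)

import datasets, fsspec
print(datasets.__version__, fsspec.__version__)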

@dsbyprateekg
Author

@DeanChugall Thanks again! It solved my issue and I am able to proceed.

@danielhanchen
Contributor

@dsbyprateekg Oh the datasets issue is fine as well? Also I'll reopen this temporarily for people who might have the same issue!! I'll close this in a few days :)

@danielhanchen danielhanchen reopened this Feb 28, 2024
@danielhanchen danielhanchen added fixed Fixed! and removed URGENT BUG Urgent bug fixed - pending confirmation Fixed, waiting for confirmation from poster labels Feb 28, 2024
@dsbyprateekg
Author

@danielhanchen Yes, the datasets issue was also resolved. But now I'm facing another error:
TypeError: '>' not supported between instances of 'NoneType' and 'int'

While running the training command:
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        num_train_epochs = 1,
        max_steps = None,
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)

Logs are attached.
logs_kaggle.txt

@dsbyprateekg
Author

So the issue is resolved once I commented out the line max_steps = None.
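
(That matches the underlying type error: TrainingArguments expects max_steps to be an int, with -1 as the "train by epochs" default, so passing None breaks an internal max_steps > 0 comparison. A minimal sketch of the two safe options:)

from transformers import TrainingArguments

# Either omit max_steps entirely, or pass the default of -1 when training by epochs.
args = TrainingArguments(
    num_train_epochs = 1,
    max_steps = -1,   # default value; same effect as leaving the argument out
    output_dir = "outputs",
)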

The next error occurs with the command trainer_stats = trainer.train() and is related to the wandb login, although I have not used wandb anywhere in the code. It seems to be picked up internally.
UsageError Traceback (most recent call last)
Cell In[11], line 1
----> 1 trainer_stats = trainer.train()

File /opt/conda/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:331, in SFTTrainer.train(self, *args, **kwargs)
328 if self.neftune_noise_alpha is not None and not self._trainer_supports_neftune:
329 self.model = self._trl_activate_neftune(self.model)
--> 331 output = super().train(*args, **kwargs)
333 # After training we make sure to retrieve back the original forward pass method
334 # for the embedding layer by removing the forward post hook.
335 if self.neftune_noise_alpha is not None and not self._trainer_supports_neftune:

File /opt/conda/lib/python3.10/site-packages/transformers/trainer.py:1624, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
1622 hf_hub_utils.enable_progress_bars()
1623 else:
-> 1624 return inner_training_loop(
1625 args=args,
1626 resume_from_checkpoint=resume_from_checkpoint,
1627 trial=trial,
1628 ignore_keys_for_eval=ignore_keys_for_eval,
1629 )

File :272, in _fast_inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)

File /opt/conda/lib/python3.10/site-packages/transformers/trainer_callback.py:370, in CallbackHandler.on_train_begin(self, args, state, control)
368 def on_train_begin(self, args: TrainingArguments, state: TrainerState, control: TrainerControl):
369 control.should_training_stop = False
--> 370 return self.call_event("on_train_begin", args, state, control)

File /opt/conda/lib/python3.10/site-packages/transformers/trainer_callback.py:414, in CallbackHandler.call_event(self, event, args, state, control, **kwargs)
412 def call_event(self, event, args, state, control, **kwargs):
413 for callback in self.callbacks:
--> 414 result = getattr(callback, event)(
415 args,
416 state,
417 control,
418 model=self.model,
419 tokenizer=self.tokenizer,
420 optimizer=self.optimizer,
421 lr_scheduler=self.lr_scheduler,
422 train_dataloader=self.train_dataloader,
423 eval_dataloader=self.eval_dataloader,
424 **kwargs,
425 )
426 # A Callback can skip the return of control if it doesn't change it.
427 if result is not None:

File /opt/conda/lib/python3.10/site-packages/transformers/integrations/integration_utils.py:767, in WandbCallback.on_train_begin(self, args, state, control, model, **kwargs)
765 args.run_name = None
766 if not self._initialized:
--> 767 self.setup(args, state, model, **kwargs)

File /opt/conda/lib/python3.10/site-packages/transformers/integrations/integration_utils.py:740, in WandbCallback.setup(self, args, state, model, **kwargs)
737 init_args["name"] = args.run_name
739 if self._wandb.run is None:
--> 740 self._wandb.init(
741 project=os.getenv("WANDB_PROJECT", "huggingface"),
742 **init_args,
743 )
744 # add config parameters (run may have been created manually)
745 self._wandb.config.update(combined_dict, allow_val_change=True)

File /opt/conda/lib/python3.10/site-packages/wandb/sdk/wandb_init.py:1195, in init(job_type, dir, config, project, entity, reinit, tags, group, name, notes, magic, config_exclude_keys, config_include_keys, anonymous, mode, allow_val_change, resume, force, tensorboard, sync_tensorboard, monitor_gym, save_code, id, settings)
1193 if logger is not None:
1194 logger.exception(str(e))
-> 1195 raise e
1196 except KeyboardInterrupt as e:
1197 assert logger

File /opt/conda/lib/python3.10/site-packages/wandb/sdk/wandb_init.py:1172, in init(job_type, dir, config, project, entity, reinit, tags, group, name, notes, magic, config_exclude_keys, config_include_keys, anonymous, mode, allow_val_change, resume, force, tensorboard, sync_tensorboard, monitor_gym, save_code, id, settings)
1170 try:
1171 wi = _WandbInit()
-> 1172 wi.setup(kwargs)
1173 assert wi.settings
1174 except_exit = wi.settings._except_exit

File /opt/conda/lib/python3.10/site-packages/wandb/sdk/wandb_init.py:306, in _WandbInit.setup(self, kwargs)
303 settings.update(init_settings, source=Source.INIT)
305 if not settings._offline and not settings._noop:
--> 306 wandb_login._login(
307 anonymous=kwargs.pop("anonymous", None),
308 force=kwargs.pop("force", None),
309 _disable_warning=True,
310 _silent=settings.quiet or settings.silent,
311 _entity=kwargs.get("entity") or settings.entity,
312 )
314 # apply updated global state after login was handled
315 wl = wandb.setup()

File /opt/conda/lib/python3.10/site-packages/wandb/sdk/wandb_login.py:317, in _login(anonymous, key, relogin, host, force, timeout, _backend, _silent, _disable_warning, _entity)
314 return logged_in
316 if not key:
--> 317 wlogin.prompt_api_key()
319 # make sure login credentials get to the backend
320 wlogin.propogate_login()

File /opt/conda/lib/python3.10/site-packages/wandb/sdk/wandb_login.py:247, in _WandbLogin.prompt_api_key(self)
241 if status == ApiKeyStatus.NOTTY:
242 directive = (
243 "wandb login [your_api_key]"
244 if self._settings._cli_only_mode
245 else "wandb.login(key=[your_api_key])"
246 )
--> 247 raise UsageError("api_key not configured (no-tty). call " + directive)
249 self.update_session(key, status=status)
250 self._key = key

UsageError: api_key not configured (no-tty). call wandb.login(key=[your_api_key])

@danielhanchen
Contributor

@dsbyprateekg On wandb:

import os
os.environ["WANDB_DISABLED"] = "true"

then for TrainingArgs:

  seed = 3407,
  output_dir = "outputs",
  report_to = "none",

@dsbyprateekg
Author

dsbyprateekg commented Feb 29, 2024

@danielhanchen I have added my wandb login, but now I am facing a nbclient.exceptions.DeadKernelError: Kernel died error while running the training command trainer_stats = trainer.train().

Please check the logs and see if you find something wrong here.
logs_kaggle.txt

@danielhanchen
Contributor

@dsbyprateekg Oh, on the topic of Kaggle - would the Mistral notebook we have help? https://www.kaggle.com/code/danielhanchen/kaggle-mistral-7b-unsloth-notebook I tested that vigorously, so hopefully that one doesn't have any issues.
