[Feature] Add config for RoPE scaling #342
Related #245
What do the inputs mean? I want to try this, but I'm a little unsure what they mean.
@NanoCode012 Like rope_scaling_type and rope_scaling_factor.
Please see #343 for more details, @ashercn97. I added a README sample. I chose to keep the default names.
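For anyone following along: those default names line up with the rope_scaling dict that transformers' LlamaConfig accepts, so the two options presumably end up there. A minimal sketch of that mapping, not axolotl's actual wiring, with the model name as a placeholder:

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Sketch only: pass the two scaling options through to the model
# config using transformers' default key names ("type" and "factor").
config = AutoConfig.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder model
    rope_scaling={"type": "linear", "factor": 2.0},
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", config=config
)
```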
@NanoCode012 Okay, checking it out right now!
@ashercn97, please let me know how it goes. I don't have the compute or the time to run it right now.
@NanoCode012 What are the options for the two settings? I think the factor should be greater than 1, but the only value I saw elsewhere was something like 0.5, so I'm a little confused. Also, what are the options for the type? Sorry, one more thing: where does it go in the config file?
Oh wait, I think I got it. Let me try setting up a RunPod and running it!
Okay, I got it working, but what does the float value do? Is there a resource you can point me to that explains what the number means?
@NanoCode012 Do I change sequence_len when I'm doing the RoPE scaling? I saw something somewhere about 0.5, and it said it doubles the context or something, so do I make the sequence length bigger or keep it the same?
@ashercn97, an example from LLongMA-2 is linear with factor 2 or 4. I would experiment with both seq_len settings (keep the default 4k, or increase it to your expected max). This is something I am also unsure about.
Regarding a reference for the factor, please see the link in the first post or the one in the PR.
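To make the factor concrete: with linear scaling, position indices are divided by the factor before the rotary embedding is computed, so a factor of 2 stretches a model trained at 4k positions across roughly 8k tokens. The 0.5 mentioned above is most likely the same idea written as a multiplier rather than a divisor; transformers expects a factor greater than 1. A rough sketch of the idea, not the library's exact code:

```python
import torch

orig_max_pos = 4096  # context length the base model was trained with
factor = 2.0         # the rope_scaling factor

# Linear RoPE scaling: divide position indices by the factor so that
# positions up to orig_max_pos * factor fall back inside the trained range.
positions = torch.arange(int(orig_max_pos * factor), dtype=torch.float32)
scaled = positions / factor
assert scaled.max() < orig_max_pos  # still within what the model saw in training
```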
Okay, thanks so much, @NanoCode012!
Seems like we can expose the two config options. Validation is done on the transformers side.
Ref: https://github.com/gante/transformers/blob/30409af6e1b2b5efb6d9932b3e3b4ce20cfdb30e/src/transformers/models/llama/configuration_llama.py#L155-L174
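For reference, the linked validation boils down to roughly the following (condensed from the permalink above; exact error wording may differ):

```python
def _rope_scaling_validation(self):
    # Condensed sketch of the checks in the linked configuration_llama.py.
    if self.rope_scaling is None:
        return
    if not isinstance(self.rope_scaling, dict) or len(self.rope_scaling) != 2:
        raise ValueError(
            f"`rope_scaling` must be a dict with fields `type` and `factor`, got {self.rope_scaling}"
        )
    scaling_type = self.rope_scaling.get("type", None)
    scaling_factor = self.rope_scaling.get("factor", None)
    if scaling_type not in ("linear", "dynamic"):
        raise ValueError(f"`rope_scaling`'s type must be 'linear' or 'dynamic', got {scaling_type}")
    if not isinstance(scaling_factor, float) or scaling_factor <= 1.0:
        raise ValueError(f"`rope_scaling`'s factor must be a float > 1, got {scaling_factor}")
```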