Finetune LORA #2632
Commits on Jul 28, 2023
(commit 5d124d0)
remove unnecessary Adam(W) optimizer tensors.
reduces optimizer memory overhead from 7*modelsize to 2*modelsize. additionally allows optimizing models with more than 2^31 parameters by replacing int with int64_t. bumps the training checkpoint file version, but old checkpoints can still be read. the new version with fewer tensors is saved.
(commit d39c8e6)
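As a quick illustration of why only two persistent optimizer tensors per parameter (first and second moment) are needed, here is a minimal AdamW step sketch; the function name, argument list and simplifications are mine, not the ggml implementation.

```c
#include <math.h>
#include <stdint.h>

// Minimal AdamW step (sketch): besides the parameters x, only the first
// moment m and the second moment v persist between iterations, i.e. about
// 2*modelsize of optimizer state. int64_t indices allow > 2^31 parameters.
static void adamw_step(float * x, const float * grad, float * m, float * v,
                       int64_t n, int64_t t, float alpha, float sched,
                       float beta1, float beta2, float eps, float wd) {
    const float lr = alpha * sched;                 // effective learning rate
    const float c1 = 1.0f - powf(beta1, (float) t); // bias corrections
    const float c2 = 1.0f - powf(beta2, (float) t);
    for (int64_t i = 0; i < n; ++i) {
        m[i] = beta1*m[i] + (1.0f - beta1)*grad[i];
        v[i] = beta2*v[i] + (1.0f - beta2)*grad[i]*grad[i];
        const float mh = m[i]/c1;
        const float vh = v[i]/c2;
        // decoupled weight decay, relative to the effective learning rate
        x[i] = x[i]*(1.0f - lr*wd) - lr*mh/(sqrtf(vh) + eps);
    }
}
```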
(commit d395b19)
(commit d7003a9)
implement gradient checkpointing for training
reduces memory overhead from O(n_layer) to O(sqrt(n_layer)) as explained in readme of https://github.com/cybertronai/gradient-checkpointing
(commit 6e3f95b)
(commit e05e441)
add and use function ggml_build_backward_expand to avoid stack overflows with large maximum number of nodes
GGML_API void ggml_build_backward_expand(struct ggml_context * ctx, struct ggml_cgraph * gf, struct ggml_cgraph * gb, bool keep);
(commit ed4319e)
change AdamW decay parameter to work like the torch AdamW decay parameter
It is now relative to the Adam learning rate `alpha*sched`. Before, it was relative to `sched` only, with `alpha` being the maximum learning rate and `sched` being a scaling parameter in [0..1].
(commit a80f184)
change default AdamW weight decay parameter used in training to 0.1 as used in nanoGPT
(commit f175ead)
change default AdamW weight decay parameter defined in ggml to 0.0, making Adam default instead of AdamW
btw: the default weight decay parameter for torch.optim.AdamW is 0.01
(commit 97964a4)
bug fixes for cross entropy loss
ggml_cross_entropy_loss: sums were not correctly added in the workload of each thread. ggml_cross_entropy_loss_back: simplify the backward process, reducing numerical issues. guard usage of the f16 exp lookup in cross entropy by #define GGML_CROSS_ENTROPY_EXP_FP16. cross entropy loss is only used once during training, but it is quite sensitive to numerical errors introduced by the exp-f16-lookup, so the exp-f16-lookup for cross entropy loss is disabled by default, trading better gradients for very slightly worse runtime performance.
(commit 2c6985f)
fix test-grad0 for cross_entropy_loss
the second argument to cross_entropy_loss must sum up to 1 for each row
(commit 2d1e6e0)
don't use only sum as aggregation, because the sum of softmax is always 1 -> finite differences would not work. instead use sum(log(soft_max()*(1-eps)+eps)); eps avoids log(0)
(commit 864e7e3)
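A small sketch of the aggregation used here; the function name and the single-row handling are illustrative, not the actual test code.

```c
#include <math.h>
#include <stdint.h>

// sum(softmax(x)) is identically 1, so finite differences see no change;
// sum(log(softmax(x)*(1-eps)+eps)) does vary with x, and eps keeps the
// argument of log strictly positive.
static float softmax_log_aggregate(const float * x, int64_t n, float eps) {
    float xmax = x[0];
    for (int64_t i = 1; i < n; ++i) { if (x[i] > xmax) xmax = x[i]; }
    float denom = 0.0f;
    for (int64_t i = 0; i < n; ++i) { denom += expf(x[i] - xmax); }
    float agg = 0.0f;
    for (int64_t i = 0; i < n; ++i) {
        const float p = expf(x[i] - xmax)/denom;   // softmax(x)[i]
        agg += logf(p*(1.0f - eps) + eps);
    }
    return agg;
}
```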
(commit 87febee)
change cross_entropy_loss to output average over all rows
this helps keep the loss and gradients in a sane range
(commit 51dc770)
improve gradient checkpointing
sqrt(n_layers) is only the best checkpoint step when the mem size of checkpoints and the mem size of layers are equal. since layers require more memory than the single-tensor checkpoints we use, the optimal values are computed differently:
```
given: n, u, v
objective: minimize(a*u + b*v) where a*b = n, a > 0, b > 0
b = n/a
minimize(a*u + v*n/a)
diff(a*u + v*n/a, a) = u - (v*n/a)/a
diff(a*u + v*n/a, a) == 0
u - (v*n/a)/a == 0
u == v*n/(a*a)
u*a*a = v*n
a*a = v*n/u
a = sqrt(n*v/u)
```
this change results in more checkpoints, requiring fewer layers to store between checkpoints, overall improving memory usage.
(commit 3744a9b)
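To make the formula concrete: with n = 16 layers and layer activations costing four times as much as one checkpoint (v/u = 4), a = sqrt(16*4) = 8 checkpoints are optimal instead of sqrt(16) = 4. A tiny helper sketch (the name and rounding choice are mine):

```c
#include <math.h>

// a*u + (n/a)*v is minimized at a = sqrt(n*v/u), where u is the memory per
// checkpoint and v the memory per layer kept between checkpoints
static int optimal_n_checkpoints(int n_layer, double u, double v) {
    const int a = (int) llround(sqrt((double) n_layer * v / u));
    return a < 1 ? 1 : a;
}
```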
(commit fc379a2)
(commit d0fbb7d)
--enable-restart N           Only for Adam optimizer. Enable restarts of cos-decay
--disable-restart N          Only for Adam optimizer. Disable restarts of cos-decay
--opt-past N                 Number of optimization iterations to track for delta convergence test. Disabled when zero.
--opt-delta N                Maximum delta for delta convergence test. Disabled when <= zero.
--opt-max-no-improvement N   Maximum number of optimization iterations with no improvement. Disabled when <= zero.
--adam-epsf N                AdamW epsilon for convergence test. Disabled when <= zero.
--adam-min-alpha N           Adam minimum learning rate alpha, usually 0.1 * alpha
(commit c6a18e1)
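For illustration only (the binary name and the specific values are assumptions, not taken from this commit): these options might be combined as `./finetune ... --opt-past 32 --opt-delta 1e-5 --opt-max-no-improvement 8 --adam-min-alpha 1e-5`, enabling the delta convergence test over the last 32 iterations and stopping after 8 iterations without improvement.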
replace memcpy with reshape operation so that the graph is not cut at the input
this makes it possible to store other values into the input tensor and then simply recompute the graph without rebuilding it
(commit ce937bc)
(commit ff759d9)
(commit e843d6e)
add optimization callback to ggml_opt_resume_g
this callback is called before each iteration with custom data and pointer to learning schedule parameter (only used in Adam(W)). can be used for dynamic learning schedule and setting input data for batches before each iteration
(commit bfc3119)
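A hedged sketch of what such a callback can look like; the signature and the struct below are illustrative only, not the actual typedef added to ggml.

```c
// Illustrative callback shape: opaque user data plus a pointer to the
// learning-schedule scalar that may be overwritten before each iteration.
struct batch_provider {
    int iter;
    int warmup;
    // ... tokenized training data, pointers to the graph's input tensors, etc.
};

static void opt_iteration_callback(void * user_data, float * sched) {
    struct batch_provider * bp = (struct batch_provider *) user_data;
    // dynamic learning schedule: linear warmup, then constant
    *sched = bp->iter < bp->warmup ? (float) bp->iter / (float) bp->warmup : 1.0f;
    // here the next batch would be copied into the input tensors
    bp->iter++;
}
```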
use optimization callback in training
allows a dynamic learning schedule and different batch data for each iteration without relying on low n_iter and high n_examples parameters. reduces runtime by avoiding restarts of the optimization function and improves training convergence by providing a different batch for each iteration.
(commit d7aa4d9)
add minimum number of tensor dimensions to apply weight decay (default 2)
this allows not applying weight decay to bias parameters
(commit e6ff072)
rename training parameter cos-decay-alpha to cos-decay-min and clarify that adam-min-alpha also applies to warmup
(commit 58024d3)
fix increase of model.train_samples and model.train_tokens
now that each optimizer iteration gets its own batch we need to multiply by number of opt iterations
(commit 17a0898)
change sampling parameters for prediction after training to defaults of common.h and clarify what is context for prediction and what are generated tokens
(commit 24a4b09)
(commit 1065c3b)
add conditional compilation of using F16 exp in flash attention
uncomment `// #define GGML_FLASH_ATTN_EXP_FP16` to enable usage of f16 exp in flash attention
(commit dbbc263)
(commit 47055c9)
(commit 0f6a8ab)
remove commented-out vectorized code of opt_adam
the vectorized code might be a bit faster for a low number of parameters, but it had a big memory usage overhead
(commit 87035b9)
(commit ecdc161)
(commit c1a5e11)
(commit 22cb368)
Commits on Aug 6, 2023
(commit d43af4b)
add train function using automatic gradient checkpointing backward pass and allocator
(commit 2bf422e)
Commits on Aug 14, 2023
in train function replace add_inplace by regular add
because using add_inplace seems to result in different gradients
(commit fc826c8)
don't allocate hash_map on context
because the context has no_alloc=True when using the memory allocator, resulting in NULL data pointers
(commit d437415)
(commit cfddc36)
(commit 0dd496c)
(commit 52c92c0)
correctly clone view tensors by setting data pointers
without this the checkpointing would only work when being used together with memory allocator
(commit 345f516)
(commit 5a11b75)
swap arguments to commutative ops to be the same as in `forward_batch_wo_cache_flash_attn`
(commit b2f1310)
add input tensors as checkpoints
so that recursive tensor cloning of gradient checkpointing terminates on input tensors
(commit 5884b43)
(commit 9716eb8)
make sure some tensors are not reallocated by inserting new temporary nodes depending on them:
output and parameter gradient tensors need to be available at the end of the graph execution. parameter gradient tensors also need to be available before the graph execution, because they are set to zero before each optimizer iteration. checkpoint tensors are allocated all together to reduce memory allocator fragmentation. afterwards, in addition to the temporary nodes, we also need to reset the temporary leafs.
(commit 38f4438)
(commit d6c5b03)
(commit 4ed096c)
integrate unified training function which may use memory allocator
the unified training function also supports arguments whether to use flash attention and/or gradient checkpointing
(commit 865c4cd)
(commit 3e99a8d)
(commit 75baed2)
(commit fe788a1)
(commit c954f41)
(commit 271e4d6)
(commit 6f161c7)
remove unused train params: mem_compute1_gb & mem_compute2_gb
mem_compute_gb is used for compute when the automatic memory allocator is not enabled, otherwise it can be very small and only hold the tensor definitions. mem_compute0_gb is used for the automatic memory allocator (as long as measurement of max required size is not implemented).
(commit 3794dce)
(commit 6e280b2)
add debug asserts in ggml_allocr_alloc to some common pitfalls when using this function directly
(commit faf3e21)
(commit 098654c)
fix test when to create temporary backward graph
temporary backward graph is only necessary when using checkpointing
(commit 3e6468b)
fix memory "leak" in optimizers
each iteration a new cplan with new memory for work data was allocated. now cplan creation only happens at the start of optimization, with each iteration reusing the cplan and its work data.
(commit 5622846)
reverse order of for loop in ggml_build_backward_expand to save memory when using gradient checkpointing and allocator
with this loop order, gradient checkpointing with allocator on a 16 layer model saves 13% memory; on a 2 layer model it saves 2% memory. the computation results are the same
(commit 3b5515b)
Commits on Aug 15, 2023
(commit 316b070)
(commit 5e059ac)
(commit 9eb1ef8)
Commits on Aug 16, 2023
add API functions to access remaining model parameters: mult, head and rot
(commit c0a372f)
(commit 28ee0c8)
(commit 50b1e66)
bug fixes to make finetune compile
automatic allocator does not work yet
(commit be7e564)
(commit 6202753)
(commit 0ab2507)
avoid stack overflow resulting from big ggml_cgraph
replace stack allocation and ggml_build_forward by ggml_new_graph in combination with ggml_build_forward_expand
(commit 39a2d15)
replace llama API functions to get model tensors by one function to get model tensor by name
LLAMA_API struct ggml_tensor * llama_get_model_tensor(struct llama_model * model, const char * name);
(commit 1151653)
(commit 79ad888)
(commit 83cb9ed)
(commit 83a4ad7)
(commit f80e245)
add ggml_add_cast API function
this function works like ggml_add, but accepts a data type for the resulting tensor. only supported for quantized src0 input.
(commit 9198b24)
use ggml_add_cast in finetuning
lora-applied weights will now have data type F32, which improves gradients when finetuning quantized base models
(commit 714fec0)
Commits on Aug 17, 2023
(commit 0bb897c)
Commits on Aug 18, 2023
make sure base model tensor data cannot be used in viewable operations
the memory allocator would try to make lora application inplace on base model tensors. since those are memory mapped, this would result in memory access violations
(commit 44526cb)
(commit a252111)
avoid keeping in memory ALL of the gradients
The problem here stems from ggml_graph_reset. This function is called in the optimization function, before each graph computation, to reset the gradients to zero. This required a unique memory slot for each gradient: allocating memory from a previously freed memory location might lead to non-zero input gradients. During ggml_compute_backward the gradients are built stepwise by adding or subtracting new values, starting from an OP_NONE tensor which needs to contain zero values; this requires the graph reset. To avoid this, ggml_build_backward_expand now remembers the original OP_NONE gradient tensors in a hash table, which is passed to ggml_compute_backward. There, instead of using add (or sub or similar), we test whether the existing gradient to be changed is a zero-valued tensor by looking up its existence in the hash table. If it is such a zero tensor, it is not modified but replaced by the value to be added; otherwise the regular add (not inplace, the allocator will take care of this) is used. This way none of those zero-tensor values are needed in the final backward graph and, more importantly, they don't need a unique memory slot just to make them zero.
(commit f358204)
(commit 011f47f)
(commit a0c2752)
(commit 113c90f)
(commit 7a63d42)
change default finetune params lora_r and lora_alpha to match the n_rank parameters of 4
(commit 63cb374)
(commit 6c98640)
remove unnecessary src tensor from ggml_get_rows_back
we don't need data of src[2] for computation, only to setup the correct output shape. remove dependency on src[2], so that allocator can work more freely. the computational graph is still completely determined, because the output shape is naturally included. this is similar to how ggml_reshape does it.
(commit 65b0561)
remove unnecessary src tensor from ggml_repeat & ggml_repeat_back
we don't need data of src[1] for computation, only to setup the correct output shape. remove dependency on src[1], so that allocator can work more freely. the computational graph is still completely determined, because the output shape is naturally included
(commit 3e47890)
allocator will only make it inplace when they are of the same type
(commit 37dfb54)
Commits on Aug 20, 2023
mixing multiple LORA adapters is now possible
pass more than one '--lora FNAME' argument to apply more than one LORA. use '--lora-scaled FNAME S' when you want to specify a user-defined scale for an adapter.
(commit d61ed6b)
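For example (file names and the scale value are hypothetical): `./main -m base.gguf --lora style.bin --lora-scaled domain.bin 0.5 ...` applies the first adapter at its default scale and the second one scaled by 0.5.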
(commit 27c24ff)
Commits on Aug 21, 2023
also save latest finetune output with ITERATION="LATEST" and print where files are saved
saving with LATEST makes it easier to resume training from the latest checkpoint. the string "LATEST" can be configured with the command line option "--fn-latest STR"
(commit 8b4106a)
Commits on Aug 23, 2023
(commit 77a3092)
(commit 1a5f0a3)
(commit 7df517c)
Commits on Aug 28, 2023
Merge branch 'master' into finetune-lora
# Conflicts:
#   examples/CMakeLists.txt
#   examples/train-text-from-scratch/train-text-from-scratch.cpp
#   ggml.c
#   llama.cpp
#   llama.h
(commit b04263c)
(commit aecc3b3)
(commit aa8016e)
(commit daedc6f)
(commit 5ce92ae)
remove prediction related code to reduce duplicated code with main
use main instead
(commit 271c030)
reduce large memory overhead in train-text-from-scratch
all gradients had to be pinned so that graph_reset works correctly. this is no longer necessary with the changes to ggml_compute_backward introduced in this PR.
(commit 9a28bce)
(commit 49af7fb)
(commit 007280c)
(commit 1faee64)
(commit a3b4529)
(commit ca97583)
add LLM_KV_TRAINING_TYPE to train-text-from-scratch checkpoints
so that they can be differentiated from lora finetune checkpoints
(commit e030f7b)
(commit ecb1b20)
Commits on Aug 29, 2023
(commit 0564f4e)
(commit 6134ad4)
(commit 1425968)
remove code to print data checksums which was used to verify correctness of new gguf code
(commit ebff3a1)
omit tokenization when training is disabled, only save llama lora adapter
training can be disabled by passing '-n 0' to finetune
(commit 5813ac8)
(commit a6165da)
(commit e28cf7e)
(commit 794bb7e)
(commit 5f0a4e9)
add ggml API functions ggml_unravel_index, ggml_get_i32_nd and their analogs for set and for f32
ggml_get_i32_1d, ggml_set_i32_1d, ggml_get_f32_1d, ggml_set_f32_1d now support non-contiguous tensors. in case of a non-contiguous tensor, the 1d index is unraveled into a multi index using ggml_unravel_index and passed to the '_nd' function equivalent. this fixes a bug in test-grad0 which happens because ggml_build_backward no longer builds purely contiguous tensors.
(commit 82c5247)
(commit 5fcfa7e)
(commit b1aa26f)
(commit a76e66a)
remove unused 'inplace' argument from ggml_compute_backward function
inplace operations to add gradients are no longer created by ggml_compute_backward; the allocator is used to automatically make operations inplace
(commit dd4e4bc)
add missing argument 'int i0' to ggml_get_i32_nd & ggml_set_i32_nd header declarations
(commit 8a96d4c)
(commit 281245a)
(commit 5854f51)
ggml_build_backward_expand was previously replaced by ggml_build_backward, but the assignment of the forward graph to the backward graph was missing
(commit bf70e27)
Commits on Aug 30, 2023
(commit b1709f2)
(commit 2392b67)
move gradient checkpointing code into ggml, new API function:
// build gradient checkpointing backward graph gb for gf using provided checkpoints
// gb_tmp will contain original backward graph with rewritten backward process nodes,
// but without the second forward pass nodes.
GGML_API void ggml_build_backward_gradient_checkpointing(
        struct ggml_context   * ctx,
        struct ggml_cgraph    * gf,
        struct ggml_cgraph    * gb,
        struct ggml_cgraph    * gb_tmp,
        struct ggml_tensor  * * checkpoints,
        int                     n_checkpoints);
(commit d487e05)
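A brief usage sketch of the declared function, assuming the ggml context `ctx`, the forward graph `gf` and the array of chosen checkpoint tensors already exist (graph creation via ggml_new_graph as mentioned in an earlier commit):

```c
// gb receives the gradient-checkpointing backward graph,
// gb_tmp is scratch for the rewritten original backward graph
struct ggml_cgraph * gb     = ggml_new_graph(ctx);
struct ggml_cgraph * gb_tmp = ggml_new_graph(ctx);

ggml_build_backward_gradient_checkpointing(
    ctx, gf, gb, gb_tmp, checkpoints, n_checkpoints);
// gb now recomputes activations between checkpoints instead of storing them all
```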
(commit e6b7158)
train-text-from-scratch can train (full finetune) gguf models
just pass the gguf model via `--checkpoint-in FN`. after this, to continue training, pass the generated checkpoint instead of the original gguf model. tested with smaller models, bigger models may exceed available memory. use (LORA) finetune for those.
(commit fc456ed)
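For example (file names are hypothetical): a first run could pass `--checkpoint-in base-model.gguf`, and subsequent runs would pass the generated checkpoint file to `--checkpoint-in` instead of the original gguf model to continue training.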
(commit f3590ad)
(commit b26bd4c)
(commit 4e986ac)
(commit 0c57f9f)
(commit 4fd51c4)
Commits on Aug 31, 2023
remove finetune option to disable allocator
the allocator should always be used. by making sure that it is always used it gets easier to implement automatic memory requirements computation
(commit e0da168)
(commit 4914f85)
Commits on Sep 1, 2023
(commit d554a70)
add ggml-alloc API function 'ggml_allocr_max_size' to get max size of alloc
GGML_API size_t ggml_allocr_max_size(struct ggml_allocr * alloc);
(commit 7e01d11)
finetune: automatically allocate all memory and changes to command line options
remove '--n_examples N' parameter, as it no longer makes sense to call the optimization process multiple times in a loop. add '--only_write_lora' command line option: will skip tokenization and training, to only write a llama.cpp compatible LORA adapter. remove memory buffer related command line options. improve iteration console output.
(commit 5bba329)
(commit 6cbf55a)
(commit 7acb124)
(commit 6809eb7)
(commit c32ad44)
Commits on Sep 2, 2023
increase measured alloc size by tensor_alignment
ggml_allocr_reset will reduce the given size by up to tensor_alignment-1
(commit 6ee12b1)
(commit cfe217f)
(commit ded6382)
bug fix, probably solves the 'ggml_allocr_alloc: not enough space in the buffer' issue
(commit 8d982c8)
"bug fix, probably solves the 'ggml_allocr_alloc: not enough space in the buffer' issue" "alloc was freeing an externally allocated tensor, because it calculated the end of allocator memory as alloc->data + alloc->max_size instead of alloc->data + alloc->size." This is intentional to reduce the risk of freeing external tensors when measuring. Unless max_size is not properly calculated, I don't see why this is an issue.
(commit 1ce7023)
(commit 2d2bdc0)
(commit 80ac697)
Commits on Sep 3, 2023
(commit 406e075)
(commit e07f5c5)
(commit bdb7092)
(commit 50589ed)
Commits on Sep 4, 2023
(commit 9ea2f7f)
(commit d3afd71)
specify number of accumulation steps with '--grad-acc N'. this will simulate a bigger batch size of grad_acc*batch.
(commit c1c3b0e)
Commits on Sep 5, 2023
(commit d07b6aa)
(commit 786e786)
(commit d375b8f)
(commit 867e7c2)
Commits on Sep 6, 2023
improve finetune time measurement
fix printf warnings on systems where int64_t is (long int). change time datatypes to double because values get big with long training times. exclude file saving from time measurement. converge faster to actual time per iteration by removing the very small first duration before the first iteration was performed. fix bug in output of total training time: the reported value was 1000 times too small.
(commit 8c2d7e3)
specify default lora rank with '--lora-r N'
'--lora-r N' will specify the default rank for all tensors. '--rank-wq N', etc. will override this default rank for specific tensor types.
(commit c08fcf5)
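For example (values are illustrative): `--lora-r 16 --rank-wq 8` would use rank 16 for all tensors except the wq tensors, which would get rank 8.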
Merge branch 'master' into finetune-lora
# Conflicts:
#   common/common.cpp
(commit 0393116)
(commit de6170d)
(commit 0c2c9c7)
Commits on Sep 9, 2023
support grouped-query-attention in ggml_flash_attn and ggml_flash_attn_back
k and v can now be repeated in q along ne[2].
in the forward pass just use modulo to compute k and v indices, like ik2 = iq2 % nek2.
in the backward pass this won't work as easily, because multiple threads will compete to accumulate to the same k->grad[:,ik1,ik2,ik3] and v->grad[:,iv1,iv2,iv3]. so we change the parallelization over q rows to be over k rows. this ensures non-overlapping (ik2,ik3) across threads. in each thread we then iterate over the number of repetitions of k/v in q to compute iq2 as iq2 = ik2 + irep*nek2.
since ne2 is not the same for q, k and v, we also change how the gradients are concatenated into the result tensor. additionally the offsets of gradq, gradk and gradv in the result tensor are now memory aligned.
we also simplify the compute_backward part of flash_attn to use ggml_reshape instead of switching over the number of dimensions. this needs a small change to ggml_reshape, removing the assertion that the second argument is contiguous. since only the shape (ne) of the second reshape argument is of relevance, its memory layout (nb) is irrelevant -> it can very well be non-contiguous.
change test-grad0 to also test for repeated k/v in q. this changes the rng and now results in small gradient differences in softmax. these come solely from using the f16 exp table lookup in forward softmax: when temporarily changing softmax to use the actual exp function, the reported gradient differences go away. gradient differences coming solely from the f16 table lookup are acceptable. added a note to explain this.
(commit d7aade7)
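A compact sketch of the index mapping described above; the function names are illustrative, only the index arithmetic comes from the commit message.

```c
// forward: query head iq2 reads kv head ik2 = iq2 % nek2
static inline int kv_head_for_query(int iq2, int nek2) {
    return iq2 % nek2;
}

// backward: parallelize over kv heads; each thread walks the query heads
// that map to its kv head, so gradient accumulation never overlaps
static void backward_indices_for_kv_head(int ik2, int nek2, int neq2) {
    const int n_rep = neq2 / nek2;          // how often k/v are repeated in q
    for (int irep = 0; irep < n_rep; ++irep) {
        const int iq2 = ik2 + irep*nek2;    // query head using this kv head
        (void) iq2; // accumulate into k->grad / v->grad for (iq2, ik2) here
    }
}
```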
(commit 833a56c)
fix finetune to support grouped-query-attention (using flash-attention)
note: ggml changes to ggml_out_prod are necessary to support grouped-query-attention without flash-attention.
(commit 35260f7)
support broadcastable a in out_prod(a, b) and backward pass of broadcasting mul_mat(a, b)
(commit aea8b6b)
(commit dd32786)
decouple random number generator of each operation test
when changing one test, the rng of other tests is not influenced anymore
(commit 9738526)
(commit d3aaf08)
(commit d3f1b43)
add cgraph evaluation order member and corresponding enum type
this controls in which order ggml_build_forward visits source nodes. by default the nodes are visited left to right, i.e. src[0] first. in some cases it is beneficial for ggml-alloc to visit in a different order. two possible orders are supported: left-to-right (src[0] first) and right-to-left (src[0] last).
(commit 917d287)
measure max compute size for each cgraph eval order and use best order
this can bring huge memory savings: e.g. codellama-34b with n_ctx=64, n_batch=1 goes from 92927.8 MB down to 4627.6 MB
(commit ace9088)
Merge branch 'master' into finetune-lora
# Conflicts:
#   examples/train-text-from-scratch/train-text-from-scratch.cpp
#   llama.h
(commit 54b21a3)
(commit 1cef459)
Commits on Sep 13, 2023
add sample start patterns and options to force new or by default resume last shuffling
(commit 0e32932)
(commit 7898652)
(commit ec57689)
(commit 7f378a7)
Commits on Sep 14, 2023
(commit f627e2f)
account for possible leading whitespace that will be added by tokenizer
e.g. '\t' will be tokenized by llama spm tokenizer to [29871, 12]
(commit 2c59f7b)
use unrolled vec_mad in out_prod
y is the vec_mad result vec. x is the vec_mad input vec. v is the vec_mad input scalar. ggml_vec_mad_f32_unroll will internally loop over x and v with the same y. GGML_VEC_MAD_UNROLL is by default defined to 32. This value was empirically optimized using performance test runs of out-prod in openllama-3b finetune with 256 context length and batch size 1. It gives a 23% performance boost for out_prod.

Full measurements of out-prod runtime in ms (second column unrolls xv, third column unrolls yv instead):
unroll   unroll_xv   unroll_yv
 1       67014.643   87826.469
 2       77117.552   89077.656
 4       72091.311  109121.657
 8       61077.543   88678.334
16       56914.67    79514.947
24       59024.595   84350.254
28       55952.446   83368.73
32       51476.658   85177.745
36       55973.792   84659.92
40       55139.616   93844.738
48       60736.392   93330.267
64       99856.878  116994.99
(commit 20cf1a4)
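A hedged sketch of this kind of unrolling; the macro value matches the commit message, but the function body is illustrative rather than the actual ggml_vec_mad_f32_unroll.

```c
#include <stdint.h>

#define VEC_MAD_UNROLL 32   // empirically chosen value from the commit message

// y[i] += sum_u x_u[i] * v_u for VEC_MAD_UNROLL different (x, v) pairs,
// reusing the same y so it stays hot in registers/cache
static void vec_mad_f32_unroll(int64_t n, float * y,
                               const float * x,   // VEC_MAD_UNROLL rows of length n
                               const float * v) { // VEC_MAD_UNROLL scalars
    for (int64_t i = 0; i < n; ++i) {
        float acc = y[i];
        for (int u = 0; u < VEC_MAD_UNROLL; ++u) {
            acc += x[(int64_t) u*n + i] * v[u];
        }
        y[i] = acc;
    }
}
```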
set lora_alpha to value of lora_r if it is not set via command line
otherwise only changing lora_r will change scaling of lora adapter used in prediction
(commit 3a9c1d7)
reshuffle original sample order instead of the previous shuffled order
otherwise resumed reshuffle will not result in same sample order
(commit 0971fee)
block tiling for out-prod inspired by mul-mat
block sizes are empirically optimized; this roughly doubles the flops of out-prod
(commit d88dae2)
exclude some more known zero values from computations in flash_attn_f32 & flash_attn_back_f32
(commit 76804fa)
Commits on Sep 15, 2023
(commit 4f2ce91)
(commit cc60b3f)
update train-text-from-scratch with tokenization, sample selection and shuffling from finetune
(commit ab56b63)
Commits on Sep 16, 2023
(commit 00b656f)
(commit 9f4b1bf)
(commit a8c8907)
move train data saving code into callback to unify code of opt_callback
train_params are still different in finetune and train-text-from-scratch, so it can't yet be moved to train.h|cpp
(commit ee27333)
(commit e9758ae)
(commit bef1e97)
(commit 7aa9ea7)
(commit 48d3509)
increase train_samples by used_samples instead of number of batches
one batch can contain more than one sample when option "fill_with_next_samples" is used
(commit 571dc94)
Merge branch 'master' into finetune-lora
# Conflicts:
#   Makefile
#   examples/baby-llama/baby-llama.cpp
#   examples/train-text-from-scratch/train-text-from-scratch.cpp
#   llama.cpp
(commit d3e06d3)
(commit 7930caf)
(commit 8d82d4c)
(commit 9139fec)
(commit 1d33ec5)
use die("msg") to replace GGML_ASSERT(!"msg") or throw std::runtime_error("msg")
(commit 1d09965)
(commit 9db2664)
remove terminating '\0' from tokenization
(llama_tokenize is now passed the string length instead of relying on terminating '\0')
(commit dd3e763)
(commit 83061fb)
(commit 8721785)
Commits on Sep 17, 2023
use new/delete for train_state instead of malloc/free
using malloc may result in seg faults when trying to assign string fields
(commit ddf5ac2)
(commit 151bfe9)
(commit bf2ad65)
add train option "--sample-random-offsets"
Use samples beginning at random offsets. The offset is only applied to the first sample in each batch context window. Together with "--fill-with-next-samples" this may help for training endless text generation. For example given a dataset containing samples "abcd", "ABCD", "0123". With context size of 8 and options "--fill-with-next-samples", "--no-separate-with-eos", "--no-separate-with-bos", the context windows of batches could only be filled with "abcdABCD", "ABCDabcd", "0123abcd", etc. With "--sample-random-offsets" it can also be filled with "23abcdAB", "bcd0123A", etc.
(commit d1bb6fb)
(commit 56a03fa)
(commit 1dbd6bc)
(commit 5ed3098)
(commit b0ee563)
move some params from lora hparams into model hparams and load model params from gguf
this equalizes the model definition in finetune and text-from-scratch and removes the need for additional llama api functions to get model parameters
(commit 934ad8d)
remove now unnecessary llama API functions to get model params that were added by this PR
(commit dd94ce4)
train-text-from-scratch: automatically allocate model tensors, remove option '--mem-model N'
(commit 9e10fa9)
(commit db38d2b)
(commit f9b5d9b)
(commit c993246)
(commit 3b9d974)
(commit 5ce74ee)
Commits on Sep 22, 2023
(commit 0ede0f4)
(commit b91e3dd)
(commit d38260b)
(commit 904c19b)
add export-lora build dependency to llama
because it depends on common, which depends on llama
(commit 758c46c)
(commit 9145c87)
(commit da05205)
Commits on Sep 24, 2023
improve handling of export-lora arguments
print errors and warnings when files could not be read or created
(commit 2912f17)
Fix export-lora.cpp "not enough space in the context's memory pool" (#1)
* Fix export-lora.cpp "not enough space in the context's memory pool"
  Without this patch, export-lora would sometimes error with "not enough space in the context's memory pool (needed 656784, available 656800)".
* increase required context size by 5*GGML_MEM_ALIGN instead of plain 16
Co-authored-by: xaedes <xaedes@gmail.com>
(commit ad64e33)
(commit 1660658)
Commits on Sep 28, 2023
(commit 5461129)