Need help with training a simple network #894

Alkamist · 2024-07-19T16:28:12Z

I'm relatively new to machine learning and I can't seem to figure out how to train a simple model. I am using the model in https://github.com/ggerganov/ggml/blob/master/examples/mnist/main.cpp, and I can load the weights and evaluate it properly, but I have no idea how to train it from scratch. I found a few training examples, but they are either based on complicated networks or use old APIs that no longer exist.

My main problem is that I don't really know the exact procedure for training and what needs to be done beforehand. Would someone be so kind as to explain the training process with this library, and maybe provide some example code for a simple network such as the one above?

I have a ton of questions and I guess I can just ask them here, maybe it will give a better idea of where I'm at:

What tensors in my model do I need to set as parameters? I'm assuming just the weights and biases.
How many contexts should I be using? What is the proper technique for figuring out how much memory they need?
Where does all of the backend stuff fit into this? What procedure should I be going through to properly set up the backend?
How do I set up and iterate on training batches? Should I be using ggml_opt or ggml_opt_resume_g?
What exactly are ggml_set_input and ggml_set_output and when should I be using them with respect to training?
When I call ggml_opt on my loss tensor, it crashes because my loss tensor apparently doesn't have a grad. How do I get it to have a grad?

I can post code if necessary, but it is in the language Odin.

Alkamist (Author) · 2024-07-20T23:00:20Z

Well, I figured out how to get some basic training going. The crash was caused by my not setting the weights and biases as parameters in the same context.

Another problem was that I tried to copy data directly into the tensor's data. You have to set tensor values through the provided functions for it to work properly, as per the warning in the comments in ggml.h.

I'm still unclear about how exactly to use contexts so that everything is fast and efficient, and I'm not sure at all how the backend stuff works, but I'll be experimenting with that at some point.

2 replies

ggerganov Jul 22, 2024
Maintainer

Thanks for the interest in the project. The training functionality in ggml is at a very early stage at this point. I personally did a few experiments at the very beginning of the project, optimizing simple functions (see test2.c and test3.c) using the automatic differentiation (AD) capability of ggml. But these are really very basic examples, and a lot more work is necessary to make this project suitable for efficient training.

A more advanced training example is available in llama.cpp/examples/finetune, though with time the code has become a bit outdated and I personally haven't studied it in great detail yet.

Here's an attempt to answer some of your questions:

What tensors in my model do I need to set as parameters? I'm assuming just the weights and biases.

You need to call ggml_set_param on all tensors that you are going to train. The optimizer will optimize the function with respect to those tensors.
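To make "optimize with respect to those tensors" concrete, here is a minimal sketch in plain C that does not call ggml at all. The names `linear_params`, `mse_loss`, and `gd_step` are hypothetical and exist only for this illustration. The idea it shows is that only the values treated as parameters (`w` and `b`) are updated, while the inputs and targets stay fixed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model: y_hat = w * x + b.
 * Only w and b are trainable; x and y are fixed data. */
typedef struct {
    float w;
    float b;
} linear_params;

/* Mean squared error of the model over n samples. */
float mse_loss(const linear_params *p, const float *x, const float *y, size_t n) {
    assert(n > 0);
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        const float err = p->w * x[i] + p->b - y[i];
        sum += err * err;
    }
    return sum / (float) n;
}

/* One full-batch gradient-descent step with learning rate lr.
 * The gradient is taken with respect to the parameters only. */
void gd_step(linear_params *p, const float *x, const float *y, size_t n, float lr) {
    assert(n > 0);
    float grad_w = 0.0f;
    float grad_b = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        const float err = p->w * x[i] + p->b - y[i];
        grad_w += 2.0f * err * x[i]; /* d(err^2)/dw */
        grad_b += 2.0f * err;        /* d(err^2)/db */
    }
    p->w -= lr * grad_w / (float) n;
    p->b -= lr * grad_b / (float) n;
}
```

Calling `gd_step` repeatedly on data generated from `y = 2x + 1` should move `w` toward 2 and `b` toward 1. In ggml the gradients come from automatic differentiation rather than hand-written formulas, so treat this only as a picture of what the optimizer is doing.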

How many contexts should I be using? What is the proper technique for figuring out how much memory they need?

Where does all of the backend stuff fit into this? What procedure should I be going through to properly set up the backend?

It's best to study the simple and gpt-2 examples to understand how to create contexts, allocate tensors, and build compute graphs.

How do I set up and iterate on training batches? Should I be using ggml_opt or ggml_opt_resume_g?

Not really sure - probably look for clues in the finetune example above.
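Independently of which ggml entry point ends up being used, the batching itself is ordinary index arithmetic. The sketch below is plain C with no ggml calls. The names `batch_range`, `batch_count`, and `batch_at` are hypothetical helpers made up for this illustration, and it makes no claim about how ggml_opt or ggml_opt_resume_g expect to be driven.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: a contiguous slice of the dataset. */
typedef struct {
    size_t offset; /* index of the first sample in the batch */
    size_t count;  /* number of samples in the batch */
} batch_range;

/* Number of batches needed to cover n_samples (the last one may be short). */
size_t batch_count(size_t n_samples, size_t batch_size) {
    assert(batch_size > 0);
    return (n_samples + batch_size - 1) / batch_size;
}

/* Slice of the dataset covered by batch number `index`. */
batch_range batch_at(size_t n_samples, size_t batch_size, size_t index) {
    assert(batch_size > 0);
    assert(index < batch_count(n_samples, batch_size));
    batch_range r;
    r.offset = index * batch_size;
    const size_t remaining = n_samples - r.offset;
    r.count = remaining < batch_size ? remaining : batch_size;
    return r;
}
```

One epoch would then loop `index` from 0 to `batch_count(...) - 1`, copy the samples in `batch_at(...)` into the input and target tensors, and run one optimization step on that batch.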

What exactly are ggml_set_input and ggml_set_output and when should I be using them with respect to training?

See #727. Use ggml_set_input on tensors that are constants and can change between graph computations (e.g. KQ_mask, KQ_pos). Use ggml_set_output to prevent later calculations in the graph from overwriting intermediate results that you would like to keep until the end of the computation.

When I call ggml_opt on my loss tensor, it crashes because my loss tensor apparently doesn't have a grad. How do I get it to have a grad?

The loss tensor has to be a function of at least one tensor marked with ggml_set_param (see test2.c, test3.c). Calling ggml_set_param on a tensor will initialize its grad, and then all subsequent tensors that are a function of this tensor will also have a grad.
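The rule described here can be pictured with a toy graph in plain C. This is not ggml's internal code; `toy_node` and `toy_needs_grad` are hypothetical names invented for this sketch. It only illustrates the idea that a node carries a gradient if it is a parameter or if any of its sources does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy graph node with at most two sources. */
typedef struct toy_node {
    bool is_param;               /* marked as trainable */
    const struct toy_node *src0; /* first operand, or NULL */
    const struct toy_node *src1; /* second operand, or NULL */
} toy_node;

/* A node needs a gradient if it is a parameter itself,
 * or if it depends (directly or indirectly) on one. */
bool toy_needs_grad(const toy_node *n) {
    if (n == NULL) {
        return false;
    }
    if (n->is_param) {
        return true;
    }
    return toy_needs_grad(n->src0) || toy_needs_grad(n->src1);
}
```

In this picture, a loss built only from plain inputs has no gradient, which matches the crash described in the question. Once one of the tensors feeding into the loss is marked as a parameter, every node downstream of it carries a gradient too.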

But in any case, don't expect everything to work and always assume that there might be stuff that does not function correctly. At the very least, some backward operators don't have CPU/GPU implementations yet.

I recommend looking at @xaedes's work, as to date they have written most of the ggml training examples:

Alkamist Jul 22, 2024
Author

Thank you for your detailed answers! I appreciate you taking the time. I think I understand a bit more how to use the library, although I am still unclear on a lot of details.

I managed to get some training up and running on the CPU, but I can't get CUDA to work without crashing. For some reason my accuracy caps at around 65% with what I'm doing, and then the Adam optimizer just quits. Even setting max_no_improvement very high makes no difference; it just doesn't seem to improve beyond that point. I know more is possible, since I can load pretrained weights and get 98%.

I'm sure I'm probably doing something wrong, but I'll post my code here. Feel free to take a look if you are feeling curious but no worries if you don't want to.
