Add a simpler main example #3080
Conversation
I also found something pretty weird in examples/simple-inference/simple-inference.cpp, lines 359 to 369 (at commit 33782a7).

I'm not sure if this happens every time there's interactive input, but it definitely seems to happen when specifying a prompt. On larger models, doing an extra (apparently unneeded) eval can be pretty slow. Is there an actual reason it works like that?
The main idea of the example is to stay as simple as possible. The extra eval should not be necessary. Do you get the same results as main if you remove it?

I expect the answer is yes, but this still is not OK and we should look into it.
I'm working on a TUI version of main, with a couple of nice features, like a tty separated from stdin/stdout, so it's possible to naively pipe raw text in and out and still have a TUI with keyboard input. I just got piping to work on Windows yesterday (turns out "con" is equivalent to "/dev/tty" on Windows, and it works with native cmd :) ).

I'm planning on adding a notebook-like interface and a chat interface, which would allow recreating the current interactive mode from main. Basically, if this simple version gets accepted and you want to simplify main, I can move the discarded functionality to the TUI example. Though there's still quite a bit of work to do for the TUI to be functional, and I'm not sure how long it's gonna take.
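For illustration, a minimal sketch of the "/dev/tty" vs "con" trick mentioned above - this is not code from the TUI branch, and open_tty is a made-up helper name:

```cpp
// Open the terminal device directly so keyboard I/O keeps working even when
// stdin/stdout are redirected to pipes. "CON" plays the role of /dev/tty on Windows.
#include <cstdio>

static FILE * open_tty(const char * mode) {
#if defined(_WIN32)
    return std::fopen("CON", mode);
#else
    return std::fopen("/dev/tty", mode);
#endif
}

int main() {
    FILE * tty_in  = open_tty("r");
    FILE * tty_out = open_tty("w");
    if (!tty_in || !tty_out) {
        return 1; // no controlling terminal available
    }

    std::fputs("prompt> ", tty_out); // UI goes to the terminal...
    std::fflush(tty_out);

    char line[256];
    if (std::fgets(line, sizeof(line), tty_in)) {
        std::fputs(line, stdout);    // ...while raw text can still be piped via stdout
    }

    std::fclose(tty_in);
    std::fclose(tty_out);
    return 0;
}
```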
@staviq I've slowly been working on a more user-friendly interface to main. It is difficult without using 3rd-party libraries - moving around in the terminal in such a way that it works with most terminal emulators was very difficult to figure out - but I do have it to where you can freely edit the current input (moving around with arrow keys, home, end, etc.). It can also catch most command keys (ctrl/alt/F1-10) on Windows, Linux, and macOS. I haven't added in what other keys do yet, but it's working in all environments and terminals I've been able to test it in.

I could make a PR in the next couple of days to push those features over. They wouldn't break anything as is; I just felt like - since it's a larger patch - it would be easier to understand why I'm adding so much code when you can see the more advanced interface additions working.

My next step was to either add a new function or change readLine to return a more general struct/object that could contain the next line of text the user entered or the command they gave. This would allow handling text and commands in one place. I only loosely have things planned out further than that. I would have to edit main to print to the terminal by handing the tokens to a new function that prints to the console, so that I could track where the token boundaries are; then you could edit and resume or generate new text at any point in your history instead of just your current input, as is functional (but uncommitted) as of now.

I don't see any of these changes at odds with your plans, but I do think there may be some overlap. And there's also some room for multiple implementations of main, as we see here.
@DannyDaemonic You are ahead of me, because my code is still in separate PoC chunks. I'd probably need a couple more weeks, so don't worry about it. Do your thing, and when I see your PR, if I find something useful in my version to offer, I'll post a comment. You could probably do a draft PR so others could add to it too, because I have a feeling more people than just us have tried to implement a better main, or something to that extent.

Edit: I hope this simple main gets accepted, so there is always a solid, stable, "don't touch it" version available.
The answer is actually no! Changing that part to just

```cpp
// Required to match output from main example with a specific seed - but why?
if (true) {
    llama_token id = llama_sample_token(ctx, NULL, grammar, params, last_tokens, candidates);

    const std::string token_str = llama_token_to_piece(ctx, id);
    fputs(token_str.c_str(), stdout);
    fflush(stdout);
}
```

does not change the output at all. It seems to actually need an eval with the sampled token to sync up. (In that simple code it doesn't add the token to last_tokens.)

If I just do:

```cpp
if (true) {
    llama_token id = 3557; // this is the id that was sampled

    if (llama_eval(ctx, &id, 1, last_tokens.size(), params.n_threads)) {
        LOG_TEE("%s : failed to eval\n", __func__);
        return 1;
    }

    const std::string token_str = llama_token_to_piece(ctx, id);
    fputs(token_str.c_str(), stdout);
    fflush(stdout);
}
```

the output does match up. I think the complicated logic for handling interactive mode is making it skip the logits after evaluating the prompt. I'm not sure if this also applies to interactive input too.
Great, I'll try to work on cleaning this up a bit.

About the TUI stuff - that's something that would definitely be really useful. I've been thinking about how there's so much duplicated backend code in these examples (and I'm making it worse with this pull too). There should be a way to consolidate most of the backend stuff so examples like this one and a TUI frontend could share it.

I think the backend could actually be really simple: it could just take token ids as input and output token ids. It wouldn't even need to implement stuff like reverse prompts itself; the approach it uses could just require an ACK or added input as a response to the tokens it outputs. If there wasn't a need to interactively add input, the ACK could happen instantly and wouldn't really affect performance. If the frontend detected the reverse prompt, it just wouldn't ACK and would read input to send instead. I think it could work with callbacks, or even by running as an external program talking to the frontend through a pipe or something like that. Not sure if that makes sense, I can go into more detail if not.
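Purely as an illustration of the ACK idea (none of these types exist in llama.cpp; the names are invented), a backend that only deals in token ids and defers all interactivity decisions to the frontend could look roughly like this:

```cpp
#include <cstdint>
#include <functional>
#include <vector>

using token_id = int32_t; // stand-in for llama_token

// What the frontend sends back after each generated token.
struct frontend_response {
    bool ack  = true;                 // keep generating
    bool stop = false;                // end generation
    std::vector<token_id> new_input;  // tokens to evaluate before continuing (e.g. user reply)
};

using on_token_fn = std::function<frontend_response(token_id)>;

// Hypothetical backend loop: it only evaluates and samples tokens; reverse-prompt
// detection, interactive input, etc. all live in the frontend callback.
void generate(const std::vector<token_id> & prompt, int n_max, const on_token_fn & on_token) {
    // eval(prompt) would go here ...
    for (int i = 0; i < n_max; i++) {
        token_id id = 0; // sample() would go here ...
        frontend_response r = on_token(id);
        if (r.stop) {
            break;
        }
        if (!r.new_input.empty()) {
            // The "don't ACK, send input instead" path: eval(r.new_input) before continuing.
            continue;
        }
        // Immediate ACK: nothing extra to do, so non-interactive use pays no cost.
        // eval(id) would go here ...
    }
}
```

The same contract would also work with the backend running as a separate process and the response serialized over a pipe instead of returned from a callback.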
As a side note, I personally often find interesting projects, with examples, where somebody "unified" too much. The examples end up having another API on top of the actual API, and that makes it really hard to actually use those examples as examples.

So I think the question is: are the examples here meant to be just examples, sort of a code template, like (at least to me) the word "example" suggests? Or should they be what most people seem to use them for, sort of "official" front ends?

It might be that this is just my questionable English making me misinterpret things, but I sort of see a slight divergence between the "intended" use of the examples and the practical way people actually use them. At least from my subjective perspective. Maybe, if that question gets settled, it would be clearer how far to take the unification.

What do you think?
just a few thoughts, but generally in support of this idea! i guess in the spirit of readability, this is less readable than simple.cpp because of the logging macros and gpt_params validations. if we want to minimally demo how to use API, i think it would make sense to build functionality on top of simple.cpp into small but structurally similar files like simple-grammar.cpp & simple-lora.cpp rather than stripping functionality from main.
It's certainly 100% possible to go too far. (And very possible for me to be the one that goes too far also. :)
"Example" is a pretty flexible concept, I think. It could mean an example of the code, it could mean an example of usage, it could be an example of a specific feature.
I definitely wouldn't presume to say what they're for or what they're supposed to be. It seems to me like main has grown well past being a simple example, though. Maybe I'm wrong and a reasonably competent C++ person (not a category I can claim to be part of) can look at main and follow it without much trouble.
That's absolutely true,
I'm definitely not aiming to replace main with this. Also note that this is still a draft; I will probably do more to clean up/simplify stuff. I just wanted to see what kind of reception the idea got before putting more work into it.
If writing a simpler example is the goal, wouldn't it make more sense to write it in much more readable plain C instead of much less friendly C++, which significantly fewer people can parse?
If you take a general pulse of this project being referenced around the web, people seem to pretty much be using main and server as official frontends. I wonder how many people even realize that main.cpp is in the examples directory.

My 2c: main and server are already diverging (e.g. speculative execution shipped in main but not in server). I would suggest that more of main's functionality get encapsulated into the llama lib, such that main and server end up with just interface-specific (TUI, HTTP, etc.) code. As it stands, if you want to use any language bindings, you have to replicate a lot of stuff that lives in main. Doing this would actually fulfill the spirit of this PR in a way that is more encompassing for other projects using this lib as a lib.
I didn't really "write" it, I modified the existing main example.

This part isn't really an argument against what you said, but how much energy something takes to do definitely has some effect.
I don't think I agree with this. I used to do a lot of C, and in more recent history I'm a Rust person. I really never even touched C++ until I started contributing here ~6ish months ago. It's true that C is simpler, but it's also way more tedious to do stuff in, a lot more exacting, and easier to shoot yourself in the foot with. The C++ code in this project isn't really that hard to follow, either.

I had a really low opinion of C++ before I started using it a bit. That improved a bit, and I'd say at this point I'd definitely rather use it than plain C. (But I still vastly prefer Rust to either C variant.)
I think we can call a demonstration of using something an "example". I don't think
You make some excellent points. Also, there's probably no reason an "example" can't be a reusable library. That's probably easier to make a case for (and possibly move stuff into the actual llama.cpp lib once it's used/tested more).

I'll think about this some more instead of just going forward with the cleanups required to make it merge-worthy. I do like the idea of trying to make a more general library and cleaning up some duplication in the project (and making stuff easier for other projects to access also)... The only thing is my supply of time and energy is sadly very finite.
Ah, of course, I didn't understand that you started from the existing code. I'm definitely not asking anyone to go into a full rewrite effort; I agree the effort involved matters.
It's easy to shoot oneself in the foot with any language, actually.
For me it's totally obfuscated: tons of colon-colon stuff, '&<>' whatever in function arguments, and cryptic names coming out of nowhere that make it extremely hard for me to contribute even trivial changes. I continue to find it the most inelegant and least expressive language ever invented (leaving brainfuck aside, of course), which, a bit like Perl, is mostly write-only :-( For example, I tried to hack around the EOS issues and just couldn't understand anything there after half an hour of digging, so I had to give up.

But anyway, the goal was not to open a language debate here. I was asking since I thought you were restarting from scratch and saw this as a good opportunity to make the code more accessible and understandable, given that most modern languages still share inheritance from plain old C.
I feel like this is maybe more of a "main is super complex" problem than a C++ problem. Also (and no disrespect intended toward whoever wrote it originally), the logic and names of stuff seem weird and confusing to me. I see a variable called
If you want to point out some stuff in this example that you find difficult, I can certainly take that kind of thing into account. I'm not going to promise to make specific changes, but having an idea of the kind of thing people might have a tough time with can certainly help with trying to avoid that. I assume something like
It's one of these. The C++ syntax is totally horrible for me. I need to feel like I can pronounce the code I'm reading, and in this regard, C++ with its extreme abuse of operators makes things very complicated to me. I feel like it could simply be declared as a struct or array of structs or something like this. I'm not sure because I cannot conceptualize the memory representation of this thing. Same here for this small block, I don't understand what this does:
I suspect the first line is a declaration (mixing code and declarations is another horror that Stroustrup brought with C++). The second one seems to check the non-emptiness of "something" according to "some method" defined somewhere, and if it's not empty, calls some parse function on top of a conversion of this thing via a c_str() method, which returns another type I have no idea about. All of this would be super simple in plain C: no obfuscated types, functions or whatever, just strings, pointers, functions. A simple "git grep" reveals everything, and for what's outside it's always in "man foo".

Note that I'm saying this to illustrate what makes the reading needlessly complicated for me; I'm not asking for comments on the code.
It's all a matter of perspective, I guess. To me, a vector is just an array you malloc() but with automatic realloc() and free(). And I think
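To make that concrete (the names here are made up, and this is not code from the example), the two constructs being asked about map pretty directly onto familiar C idioms:

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical parse function, just to have something to call.
static void parse_grammar(const char * /*text*/) {}

static void cpp_style() {
    std::vector<int> tokens;   // roughly: int *buf = NULL; size_t len = 0, cap = 0;
    tokens.push_back(42);      // grows the buffer as needed (realloc behind the scenes)
                               // freed automatically when tokens goes out of scope

    std::string grammar = "root ::= ...";
    if (!grammar.empty()) {                // same check as: grammar[0] != '\0'
        parse_grammar(grammar.c_str());    // c_str() is just the underlying NUL-terminated char *
    }
}

static void c_style() {
    size_t len = 0, cap = 4;
    int * tokens = (int *) malloc(cap * sizeof(*tokens));
    if (!tokens) {
        return;
    }
    tokens[len++] = 42;        // would need a manual realloc once len == cap
    free(tokens);

    const char * grammar = "root ::= ...";
    if (grammar[0] != '\0') {
        parse_grammar(grammar);
    }
}

int main() {
    cpp_style();
    c_style();
    return 0;
}
```

Under the hood a std::vector really is just a heap buffer plus a length and a capacity, so the "malloc with automatic realloc/free" mental model above is accurate.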
That's definitely among the difficulties. For me, imagining that I have to fire up a browser for every word I don't know is really a pain.
Hehe it's very possible indeed, just as I'm biased with C, but thanks regardless for explaining and for trying ;-)
I guess this begs the question: what is llama.cpp? Is it (1) a collection of examples built around a minimal inference core, or (2) a library that other projects can build on, with the examples as demos?
I would push us towards (2). I can justify putting our engineers (at osmos.io) toward helping here if the project heads in that direction, as it also provides some assurance of ROI (examples, after all, can be dropped at any time). At the end of the day, though, @ggerganov is the captain of this ship and I would love to hear his take :)
Why not trust the description in the main README? :) https://github.com/ggerganov/llama.cpp#description If it can run models, be used to test GGML features, and we're learning stuff, that seems like it fits the project's goals.
#elif defined (_WIN32)
#define WIN32_LEAN_AND_MEAN
#ifndef NOMINMAX
#define NOMINMAX
I don't understand why we use NOMINMAX everywhere instead of including system headers before library headers and using #undef if necessary, or pre-defining it in the build script.
I don't really know anything about the Windows stuff; this part was just copied over from the main example. If there's a better way, I'm certainly open to applying it, but I don't have a way to test that it doesn't break anything.
llama_free(ctx);
llama_free_model(model);

if (grammar != NULL) {
This null check is redundant - llama_grammar_free just calls delete, which is explicitly documented to accept NULL.
I'm not sure the caller can assume that, though? Ideally, if that function got changed to do more stuff in the future, it would check for NULL. I'm inclined to leave this as is (it was also copied from main) just because it doesn't hurt anything and is more future proof.
Well, if I were in charge of the API I would guarantee that all *_free() functions will accept NULL, just like free(), delete, and functions like GLib's g_free() do, simply because this pattern of freeing an optional object at the end of a function is so common.
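A tiny sketch of the convention being argued for (illustrative names, not the actual llama.cpp API):

```cpp
struct my_grammar { /* ... */ };

// Follows the free()/delete convention: freeing NULL is a documented no-op,
// so callers never need their own NULL check.
void my_grammar_free(my_grammar * g) {
    delete g; // delete on a null pointer does nothing
}

int main() {
    my_grammar * grammar = nullptr; // e.g. no --grammar option was given
    // ... grammar may or may not get allocated with new my_grammar ...
    my_grammar_free(grammar);       // safe either way
    return 0;
}
```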
I imagine that, with time, some of the examples can become part of the library and be exposed to third-party projects through the API, but we don't want to bloat the library with too much stuff that can become obsolete at some point. For example, I don't see a clean way to integrate some of it.

In any case, we can do a lot of things with this project because the field offers many possibilities, and as long as people are interested and enjoying it, we will do them. We just need to keep things simple, avoid over-engineering and try to keep the door open for new things.
That NOMINMAX thing started something like 10 years ago, because Windows headers messed with the standard library by redefining min and max, breaking lots of things in a very, very unintuitive way, and since windows.h used to get randomly included from standard headers back then, that caused a lot of problems depending on include order. Somebody posted that solution somewhere on the internet, it made its way into a lot of popular projects, and it became boilerplate code to "fix Windows compatibility". Lots of people, especially when porting native Linux projects to Windows, used it to the point it became muscle memory :)

Though things have certainly changed in the past 10 years, and if Cebtenzzre says so, I believe him :)
Oh, I thought it had to do with MIN and MAX, not min and max. Those aren't defined in Windows 10 as far as I can tell, but they are on Windows 7, which I think we still support despite it being EOL as of 2020. So we should use NOMINMAX, but I would prefer if we did this in the build script - I'm not a fan of boilerplate.
Yes, the problem involved std::min/std::max, though that was a really long time ago when I originally had such a problem and read about that solution, so it might not be accurate anymore.
A clean approach to portability usually consists of moving all these ifdefs to their own file: for example, include "compat.h" and put all the OS-specific stuff there. It keeps the code clean and lets you define the macros that are provided by some OSes and not others, which you're willing to rely on. Usually such files end up full of ifndef/define/endif, and that doesn't look ugly because it's expected there, and it keeps the rest pretty clean.
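A sketch of what such a header could look like for the Windows defines discussed above (the file name and contents are illustrative, not something that exists in the repo):

```cpp
// compat.h - hypothetical single home for OS-specific boilerplate
#pragma once

#if defined(_WIN32)
    // Keep windows.h from pulling in rarely used APIs and from defining the
    // min/max macros that clash with std::min/std::max on older SDKs.
    #ifndef WIN32_LEAN_AND_MEAN
    #define WIN32_LEAN_AND_MEAN
    #endif
    #ifndef NOMINMAX
    #define NOMINMAX
    #endif
    #include <windows.h>
#else
    #include <unistd.h>
#endif
```

The build-script alternative mentioned above would be to pass the define globally (e.g. -DNOMINMAX on the compiler command line) instead of repeating it in headers.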
@alitariq4589 Please don't post the same comment in a million pull requests. You cause everyone who's interacted with the pull to get a notification. It's not even clear what "verify this patch" means. Verify what? That it's safe? That it works? You can generally just read the discussion in the pull to get information about its current status.
@KerfuffleV2 Sorry about that. I was setting up the CI. The comment is generated by the bot on the CI platform. I am deleting all of these comments.
Reopening since there was clearly interest in this PR at one point. @ggerganov does this look like something we would want to work towards merging?
GG is pretty busy, but looking at the number of thumbs up, it appears there is some interest in a simplified main. FYI, you may want to rebase on top of the latest changes to this repo, as a lot has changed since then.
Sorry for letting this PR go stale. A simpler main example still seems worth having.
@KerfuffleV2 I think your effort will have a real use case in decoupling testing from public usage, as I've at least observed in #7534. How much has it diverged, and how easy would it be for you to synchronize this PR (or do you need to make a new PR)?

The key philosophy of this simpler main (to be mentioned in its readme) would be to focus on enabling a tighter test/development loop and to have sane defaults focused more on debuggability and observability (e.g. special tokens enabled by default). (+ automated tests?)
Is there interest in something like this? Personally, I find the existing main example extremely difficult to understand, mainly because of the complicated logic involved in interactive mode.

This pull is based on the main example but removes a number of features. It also restructures the code to be a bit more modular, rather than just having everything in the main function.

I know there's a simple example, but it's a bit too simple and maybe doesn't have enough features for testing out some stuff. I started this before the sampling stuff got moved to common, so there's less of a use case now, I guess.

Anyway, just throwing it out there. Feel free to close it if it doesn't seem useful. It could use a bit more work (and I don't know about the current name), but I don't want to invest more time into it unless it actually seems like it could get merged.