Add a simpler main example #3080

KerfuffleV2 · 2023-09-08T07:22:43Z

Is there interest in something like this? Personally, I find the existing main example extremely difficult to understand. It's mainly because of the complicated logic involved in interactive mode.

This pull is based on the main example but removes:

Interactive mode
Classifier Free Guidance
Infinite context

It also restructures the code to be a bit more modular rather than just having everything in the main function.

I know there's a simple example, but it's a bit too simple and maybe not have enough features for testing out some stuff. I started this before the sampling stuff got moved to common so there's less of a use case now I guess.

Anyway, just throwing it out there. Feel free to close if it doesn't seem useful. It could use a bit more work (and I don't know about the current name) but I don't want to invest more time into it unless it actually seems like it could get merged.

KerfuffleV2 · 2023-09-08T07:25:14Z

I also found something pretty weird in the main example. It seems like it skips the logits after evaluating the prompt and does an extra llama_eval:

llama.cpp/examples/simple-inference/simple-inference.cpp

Lines 359 to 369 in 33782a7

    
           // Required to match output from main example with a specific seed - but why? 
        
           if (false) { 
        
               llama_token id = llama_sample_token(ctx, NULL, grammar, params, last_tokens, candidates); 
        
               if (llama_eval(ctx, &id, 1, last_tokens.size(), params.n_threads)) { 
        
                   LOG_TEE("%s : failed to eval\n", __func__); 
        
                   return 1; 
        
               } 
        
               const std::string token_str = llama_token_to_piece(ctx, id); 
        
               fputs(token_str.c_str(), stdout); 
        
               fflush(stdout); 
        
           }

I'm not sure if this happens every time there's interactive input, but it definitely seems to happen when specifying a prompt. On larger models, doing an extra (apparently unneeded) eval can be pretty slow. Is there an actual reason it works like that?

ggerganov · 2023-09-08T11:13:39Z

The main idea of examples is to demonstrate different ways of using llama.cpp. I think as long as we have the man power to maintain the examples, we can add as many as we like to. So I don't see a problem with adding a simpler main now. I also don't like that main has become very complicated and probably it's time to refactor it or have a simpler alternative like the one in this PR.

The extra llama_eval is not needed and if main is doing it then it looks like a mistake.

Do you get the same results as main if you just do this:

 // Required to match output from main example with a specific seed - but why? 
 if (true) { 
     llama_token id = llama_sample_token(ctx, NULL, grammar, params, last_tokens, candidates); 
     const std::string token_str = llama_token_to_piece(ctx, id); 
     fputs(token_str.c_str(), stdout); 
     fflush(stdout); 
 }

I expect the answer is yes, but this still is not OK and we should look into main and see why it is doing it.

staviq · 2023-09-08T15:57:20Z

I'm working on TUI version of main, with couple of nice features , like tty separated from stdin/stdout, so it's possible to naively pipe raw text in and out, and still have TUI with keyboard input. I just got piping to work on windows yesterday ( turns out, "con" is equivalent to "/dev/tty" on windows, and it works with native cmd :) )

I'm planning on adding notebook-like interface and chat interface, which would allow to recreate current interactive mode from main too.

If this simple main PR gets accepted, I can try making the TUI backwards compatible with main arguments etc.

Basically, if you want to simplify main, I can move the discarded functionality to TUI example.

Though there's still quite a bit of work to do for TUI to be functional, I'm not sure how long it's gonna take.

DannyDaemonic · 2023-09-08T19:45:00Z

@staviq I've slowly been working on a more user friendly interface to main. It is difficult without using 3rd party libraries - moving around in the terminal in such a way that it works with most terminal emulators was very difficult to figure out - but I do have it to where you can freely edit the current input (moving around with arrow keys and home, end, etc.). It can also catch most command keys (ctrl/alt/F1-10) on Windows, Linux, and MacOS. I haven't added in what other keys do yet, but it's working in all environments and terminals I've been able to test it in.

I could make a PR in the next couple days to push those features over. They wouldn't break anything as is, I just felt like - since it's a larger patch - it would be easier to understand why I'm adding so much code when you can see the more advanced interface additions working.

My next step was to either add a new function or change readLine to return a more general struct/object that could contain the next line of text the user entered or the command they gave. This would allow for making Ctrl-Q and the vi-like Esc : q sequence to quit. (Or allow us to change the temp or even switch models with a :command.) At some point I would have had to start rewriting main, but for these changes, I wouldn't have to change anything about the overall structure of main.

I only loosely have things planned out further than that. I would have to edit main to print to the terminal by handing the tokens to a new function that print to the console so that I could track where the token boundaries are so you could edit and resume or generate new text at any point in your history instead of just your current input as is functional (but uncommitted) as of now.

I don't any of see these changes at odds with your plans, but I do think there may be some overlap. And there's also some room for multiple implementations of main, as we see here.

staviq · 2023-09-08T20:09:19Z

@DannyDaemonic You are ahead of me, because my code is still in separate poc chunks. I'd need couple more weeks probably, so don't worry about it.

Do your thing, and when I see your PR, if I find something useful in my version to offer, I'll post a comment.

You could probably do a draft PR, so others could add to it too, because I have a feeling more people than us tried to implement better main, or something to that extent.

Edit: I hope this simple main gets accepted, so there is always a solid, stable and "don't touch it" main, easier to maintain and update, and we could have slightly less restrictions on "advanced" main, somewhat like server is. But that's just my subjective and personal concept of it.

KerfuffleV2 · 2023-09-08T20:59:50Z

@ggerganov

I expect the answer is yes, but this still is not OK

The answer is actually no!

 // Required to match output from main example with a specific seed - but why? 
 if (true) { 
     llama_token id = llama_sample_token(ctx, NULL, grammar, params, last_tokens, candidates); 
     const std::string token_str = llama_token_to_piece(ctx, id); 
     fputs(token_str.c_str(), stdout); 
     fflush(stdout); 
 }

does not change the output at all. It seems to actually need an eval with the sampled token to sync up. (In that simple code it doesn't add the token to last_tokens so it would probably get desynched eventually due to sampling differences).

If if I just do:

    if (true) {
        llama_token id = 3557; // this is the id that was sampled
        if (llama_eval(ctx, &id, 1, last_tokens.size(), params.n_threads)) {
            LOG_TEE("%s : failed to eval\n", __func__);
            return 1;
        }
        const std::string token_str = llama_token_to_piece(ctx, id);
        fputs(token_str.c_str(), stdout);
        fflush(stdout);
    }

the output does match up.

I think the complicated logic for handling interactive mode is making it skip the logits after evaluating the prompt. I'm not sure if this also applies to interactive input too.

So I don't see a problem with adding a simpler main now.

Great, I'll try to work on cleaning this up a bit.

About the TUI stuff - something that would definitely be really useful. I've been thinking about how there's so much duplicated backend code in these examples (and I'm making it worse with this pull too). There should be a way to consolidate most of the backend stuff so examples like main, a TUI, server, etc only have to worry about frontend stuff.

I think the backend could actually be really simple: it could just take token ids as input and output token ids. It wouldn't even need to implement stuff like reverse prompts itself, the approach it uses could just require an ACK or add input as a response to the tokens it outputs. If there wasn't a need to interactively add input, the ACK could happen instantly and wouldn't really affect performance. If the front end detected the reverse prompt, it just wouldn't ACK and would read input to send instead. I think it could work with callbacks or even running as an external program talking to the frontend through a pipe or something like that.

Not sure if that makes sense, I can go into more detail if not.

staviq · 2023-09-08T22:27:37Z

duplicated backend code in these examples

As a side note, I personally often find interesting projects, with examples, where somebody "unified" too much. Examples end up having another API on top of the actual API, and that makes it really hard to actually use those examples, as examples.

So I think the question is, are examples here, meant to be just examples, sort of a code template, like ( at least to me ) the word "example" suggests,

Or, should they be what most people seem to use them for, sort of "official" front ends ?

It might be, that's just my questionable English making me misinterpret this, but I sort of see there is a slight divergency between the "intended" use of examples, and actual practical way people use them. At least in my subjective perspective.

Maybe, if main, server etc are meant to be more of a front-end, rather than an example, something as simple as moving them out of examples directory would suffice to clear this up ?

What do you think ?

kchro3 · 2023-09-08T22:43:46Z

just a few thoughts, but generally in support of this idea!

i guess in the spirit of readability, this is less readable than simple.cpp because of the logging macros and gpt_params validations. if we want to minimally demo how to use API, i think it would make sense to build functionality on top of simple.cpp into small but structurally similar files like simple-grammar.cpp & simple-lora.cpp rather than stripping functionality from main.

KerfuffleV2 · 2023-09-08T23:43:10Z

@staviq

As a side note, I personally often find interesting projects, with examples, where somebody "unified" too much.

It's certainly 100% possible to go too far. (And very possible for me to be the one that goes too far also. :)

So I think the question is, are examples here, meant to be just examples, sort of a code template, like ( at least to me ) the word "example" suggests,

"Example" is a pretty flexible concept, I think. It could mean an example of the code, it could mean an example of usage, it could be an example of a specific feature.

What do you think ?

I definitely wouldn't presume to say what they're for or what they're supposed to be. It seems to me like main and maybe server also are too complicated to really be used as a code template though. Also, most of their complexity doesn't really come from GGML or LLaMA (or a different model) but the details of doing their frontend stuff and supporting a wide variety of options.

Maybe I'm wrong and a reasonable competent C++ person (not a category I can claim to be part of) can look at main and understand what it's doing relatively easily?

@kchro3

i guess in the spirit of readability, this is less readable than simple.cpp because of the logging macros and gpt_params validations.

That's absolutely true, simple seems like a minimal example. This pull supports most of the features of main. I wouldn't replace the simple with this because a minimal example is very useful.

rather than stripping functionality from main.

I'm definitely not aiming to replace main here either. It's possible a more understandable main could start with this as the base that would be a separate project. What I'm looking to do is provide something that has most of the features of main while being relatively simple and readable.

Also note that this is still a draft, I will probably do more to clean up/simplify stuff. I just wanted to see what kind of reception the idea got before putting more work into it.

wtarreau · 2023-09-10T08:31:10Z

If writing a simpler example is the goal, wouldn't it make more sense to write it in much more readable plain C instead of much less friendly C++ that significantly less people can parse ?

kiratp · 2023-09-10T08:45:14Z

If you take a general pulse of this project being referenced around the web, people seem to pretty much be using main and server as official frontends. I wonder how many people even realize that main.cpp is in the examples directory.

My 2c: Main and server are already diverging (eg: speculative execution shipped in main but not in server). I would suggest that more of main's functionality get encapsulated into the llama lib such that main and server end up with just interface (TUI, http etc) specific code.

As it's stands, if you want to use any language bindings, you have to replicate a lot of stuff that lives in main.

Doing this effort would actually fulfill the spirit of this PR in a way that is more encompassing to other projects using this lib as a lib.

KerfuffleV2 · 2023-09-10T09:14:38Z

@wtarreau

If writing a simpler example is the goal, wouldn't it make more sense to write it

I didn't really "write" it, I modified the existing main example. There's definitely a big difference in the level of effort required to write something from scratch compared to refactoring existing code.

This part isn't really an argument against what you said, but how much energy doing something takes definitely is something that has some effect.

in much more readable plain C instead of much less friendly C++ that significantly less people can parse ?

I don't think I agree with this. I used to do a lot of C, and in more recent history I'm a Rust person. I really never even touched C++ until I started contributing here ~6ish months ago. It's true that C is simpler but it's also way more tedious to do stuff and a lot more exacting and easier to shoot yourself in the foot with.

The C++ code in llama.cpp (that I've seen at least) also basically just uses C++ to take advantage of convenience features. It's not object oriented. I don't think it's difficult at all for someone coming from C to understand and the small effort of learning a few new things is probably outweighed by the convenience of having access to stuff like hash maps and vectors and strings (that aren't just a list of characters with a terminator).

I had a really low opinion of C++ before I started using it a bit. That improved a bit and I'd say at this point I'd definitely rather use it than plain C. (But I still vastly prefer Rust to either C variant.)

@kiratp

I wonder how many people even realize that main.cpp is in the examples directory.

I think we can call a demonstration of using something an "example". I don't think main is much of an example from the code standpoint.

Doing this effort would actually fulfill the spirit of this PR in a way that is more encompassing to other projects using this lib as a lib.

You make some excellent points. Also, there's probably no reason an "example" can't be a reusable library. That's probably easier to make a case for (and possibly move stuff into the actual llama.cpp lib once it's used/tested more).

I'll think about this some more instead of just going forward with the cleanups required to make it merge-worthy. I do like the idea of trying to make a more general library and cleaning up some duplication in the project (and making stuff easier for other projects to access also)... The only thing is my supply of time and energy is sadly very finite.

wtarreau · 2023-09-10T09:34:04Z

If writing a simpler example is the goal, wouldn't it make more sense to write it

I didn't really "write" it, I modified the existing main example. There's definitely a big difference in the level of effort required to write something from scratch compared to refactoring existing code.

Ah of course, I didn't understand you restarted from it. I'm definitely not requesting anyone to go into a full rewrite effort, I agree it can be important.

in much more readable plain C instead of much less friendly C++ that significantly less people can parse ?

I don't think I agree with this. I used to do a lot of C, and in more recent history I'm a Rust person. I really never even touched C++ until I started contributing here ~6ish months ago. It's true that C is simpler but it's also way more tedious to do stuff and a lot more exacting and easier to shoot yourself in the foot with.

It's easy to shoot oneself in the foot with every language actually.

The C++ code in llama.cpp (that I've seen at least) also basically just uses C++ to take advantage of convenience features. It's not object oriented. I don't think it's difficult at all for someone coming from C to understand and the small effort of learning a few new things is probably outweighed by the convenience of having access to stuff like hash maps and vectors and strings (that aren't just a list of characters with a terminator).

For me it's totally obfuscated with tons of colon-colon stuff, '&<>' whatever in function arguments and cryptic names coming out of nowhere that makes it extremely hard for me to contribute even trivial changes. I continue to find it the most inelegant and least expressive language ever invented (leaving brainfuck aside of course), which, a bit like perl, is mostly write-only :-( For example I tried to hack around the EOS issues and just couldn't understand anything there after half an hour of digging, so I had to resign. But anyway the goal was not to open a language debate here. I was asking since I thought you were restarting from scratch and saw this as a good opportunity to make the code more accessible and understandable given than most modern languages still share inheritance from plain old C.

KerfuffleV2 · 2023-09-10T11:11:59Z

For example I tried to hack around the EOS issues and just couldn't understand anything there after half an hour of digging

I feel like this is maybe more of a "main is super complex" problem than a C++ problem. Also (and no disrespect intended toward whoever wrote it originally) but the logic and names of stuff seems weird and confusing to me. I see a variable called embd_inp and I think it's going to be something to do with embeddings but it... it's not.

For me it's totally obfuscated with tons of colon-colon stuff, '&<>' whatever in function arguments and cryptic names coming out of nowhere

If you want to point out some stuff in this example that you find difficult I can certainly take that kind of thing into account. I'm not going to promise specifically making changes, but having an idea of the kind of thing that people might have a tough time with can certainly help with trying to avoid that.

I assume something like std::vector<llama_token> input_tokens; isn't what you're talking about. Correct?

wtarreau · 2023-09-10T11:24:59Z

I assume something like std::vector<llama_token> input_tokens; isn't what you're talking about. Correct?

It's one of these. The C++ syntax is totally horrible for me. I need to feel like I can pronounce the code I'm reading, and in this regard, C++ with its extreme abuse of operators makes things very complicated to me. I feel like it could simply be declared as a struct or array of structs or something like this. I'm not sure because I cannot conceptualize the memory representation of this thing. Same here for this small block, I don't understand what this does:

    grammar_parser::parse_state parsed_grammar;
    if (!params.grammar.empty()) {
        parsed_grammar = grammar_parser::parse(params.grammar.c_str());

I'm suspecting the first like is a declaration (mixing code and declaration is another horror that stroustrup brought with C++) . The second one seems to check for non-emptiness of "something" according to "some method" defined somewhere, and if not empty, calls some parse function on top of a conversion of this thing via a c_str() method which returns another type I have no idea about. All of this would be super simple in plain C, no obfuscated types, functions or whatever. Just strings, pointers, functions. A simple "git grep" reveals everything, and for what's outside it's always in "man foo".

Note that I'm saying this to illustrate what makes the reading needlessly complicated to me, I'm not querying comments on the code.

cebtenzzre · 2023-09-10T15:36:36Z

The C++ syntax is totally horrible for me.

It's all a matter of perspective, I guess. To me, a vector is just an array you malloc() but with automatic realloc() and free(). And I think if (!params.grammar.empty()) is more intuitive than if (*params.grammar != 0). And it may not be in the man pages, but there's always cppreference. But I've been programming in C++ for a decade, so I'm probably biased.

wtarreau · 2023-09-10T16:59:15Z

And it may not be in the man pages, but there's always [cppreference]

That's definitely among the difficulties. For me, imagining that I have to fire a browser for every word I don't know is really a pain.

But I've been programming in C++ for a decade, so I'm probably biased.

Hehe it's very possible indeed, just as I'm biased with C, but thanks regardless for explaining and for trying ;-)

kiratp · 2023-09-10T19:22:49Z

@KerfuffleV2

I think we can call a demonstration of using something an "example". I don't think main is much of an example from the code standpoint.

I guess this begs the question: what is llama.cpp? Is it

A high performance library that outputs logits and that's it? I.e Tensorflow Serving but supports many platforms
A "batteries included" LLM inference library's that includes all the state of the art steps that come after the logits (all the sampling techniques etc.)

I would push use towards (2). I can justify putting our engineers (at osmos.io) to help here if the project heads in that direction as it also provides some assurance of ROI (examples after all can be dropped anytime).

End of the day though @ggerganov is the captain of this ship and I would love to hear his take :)

KerfuffleV2 · 2023-09-10T20:21:31Z

I guess this begs the question: what is llama.cpp?

Why not to trust the description in the main README? :)

https://github.com/ggerganov/llama.cpp#description

If it can run models, be used to test GGML features and we're learning stuff that seems like it fits the project's goals.

examples/simple-inference/simple-inference.cpp

cebtenzzre · 2023-09-13T18:08:13Z

examples/simple-inference/simple-inference.cpp

+#elif defined (_WIN32)
+#define WIN32_LEAN_AND_MEAN
+#ifndef NOMINMAX
+#define NOMINMAX


I don't understand why we use NOMINMAX everywhere instead of ~~including system headers before library headers, and using #undef if necessary.~~ pre-defining it in the build script.

I don't really know anything about the Windows stuff, this part was just copied over from the main example. If there's a better way, I'm certainly open to applying it but I don't have a way to test that it doesn't break anything.

examples/simple-inference/simple-inference.cpp

cebtenzzre · 2023-09-13T18:12:31Z

examples/simple-inference/simple-inference.cpp

+    llama_free(ctx);
+    llama_free_model(model);
+
+    if (grammar != NULL) {


This null check is redundant - llama_grammar_free just calls delete, which is explicitly documented to accept NULL.

I'm not sure the caller can assume that though? Ideally if that function got changed to do more stuff in the future, it would check for NULL though. I'm inclined to leave this as is (was also copied from main) just because it doesn't hurt anything and is more future proof.

Well, if I were in charge of the API I would guarantee that all *_free() functions will accept NULL, just like free(), delete, and functions like GLib's g_free() do, simply because this pattern of freeing an optional object at the end of a function is so common.

ggerganov · 2023-09-13T19:00:28Z

I guess this begs the question: what is llama.cpp?

I imagine llama.cpp mainly as an experimentation playground where we explore new strategies for LLM inference, we optimize things and we come up with fun ideas for new applications. Some of the core ggml developments eventually get synced in the ggml library and these can be used in other projects. I like that llama.cpp already finds application in third-party projects as an "inference engine" which is great and we should try to support this by keeping the llama.cpp API stable and efficient.

With time some of the examples can become part of the library and exposed to third-party projects through the API, but we don't want to bloat the library with too much stuff that can become obsolete at some point. For example, I don't see a clean way to integrate the main and speculative examples into the API and there is probably no point in doing so as they can be replicated in a few lines of code.

The server example is a great collaboration result that I'm amazed to see happening and I almost have no idea how the implementation is done and how it works, because I haven't been following it very closely. I guess it is not something one would want to use in production, but it is likely a very good starting point for building similar functionality.

In any case, we can do a lot of things with this project because the field offers many possibilities and as long as people are interested and enjoying it, we will do them. We just need to keep things simple, avoid over-engineering and try to keep the door open for the new things

staviq · 2023-09-13T19:57:51Z

I don't really know anything about the Windows stuff, this part was just copied over from the main example. If there's a better way, I'm certainly open to applying it but I don't have a way to test that it doesn't break anything.

That NOMINMAX thing started something like 10 years ago, because windows screwed with standard library, redefining min and max, breaking lots of things in a very very unintuitive way, and since it used to be that windows.h was randomly included in standard headers, back then that caused a lot of problems depending on include order. Somebody posted that solution somewhere on the internet, and it made its way into a lot of popular projects, and it became a boilerplate code to "fix windows compatibility", and lots of people, especially when porting native Linux projects to Windows, used that to the point it became muscle memory :)

Though things certainly changed in the past 10 years, and if Cebtenzzre says so, I believe him :)

cebtenzzre · 2023-09-13T21:11:45Z

windows screwed with standard library, redefining min and max

Oh, I thought it had to do with MIN and MAX, not min and max. Those aren't defined in Windows 10 as far as I can tell, but they are on Windows 7, which I think we still support, despite being EOL as of 2020. So, we should use NOMINMAX, but I would prefer if we did this in the build script - I'm not a fan of boilerplate.

staviq · 2023-09-13T23:08:04Z

windows screwed with standard library, redefining min and max

Oh, I thought it had to do with MIN and MAX, not min and max. Those aren't defined in Windows 10 as far as I can tell, but they are on Windows 7, which I think we still support, despite being EOL as of 2020. So, we should use NOMINMAX, but I would prefer if we did this in the build script - I'm not a fan of boilerplate.

Yes, the problem involved std::min/std::max, though that was really long time ago when I originally had such problem and read about that solution, so this might not be accurate currently.

wtarreau · 2023-09-14T02:50:51Z

A clean approach to portability usually consists in moving all these ifdefs to their own file. For example, include "compat.h", and put all that OS-specific stuff there. It allows to keep the code clean and to define the macros that are already provided by some OS and not other ones, that you're willing to rely on. Usually such files end up full of ifndef/define/endif and that doesn't look ugly because it's expected there, and it keeps the rest pretty clean.

KerfuffleV2 · 2023-09-15T13:48:21Z

@alitariq4589 Please don't post the same comment in a million pull requests. You cause everyone who's interacted with the pull to get a notification.

It's not even clear what "verify this patch" means. Verify what? That it's safe? That it works?

You can generally just read the discussion in the pull to get information about its current status.

alitariq4589 · 2023-09-15T13:51:26Z

@KerfuffleV2 Sorry about that. I was setting up the CI. The comment is generated by the bot at the CI platform. I am deleting all of these comments

cebtenzzre · 2023-12-02T12:40:41Z

Reopening since there was clearly interest in this PR at one point. @ggerganov does this look like something we would want to work towards merging?

mofosyne · 2024-05-25T14:08:36Z

GG is pretty busy, but looking at the number of thumbs up, it appears there is some interest in a simplified main.
Just checking if this is still a living ticket. Still interested in developing this further?

FYI, you may want to rebase to be on top of the latest changes to this repo as a lot has changed since then.

ggerganov · 2024-05-26T13:32:23Z

Sorry for letting this PR stale. A simpler main could be useful. Let's close this PR for now, and we might start a new one from scratch. Thanks

mofosyne · 2024-05-27T04:16:21Z

@KerfuffleV2 I think your effort will have a real use case in decoupling testing from public usage as I've at least observed in #7534 . How much has diverged and how easy would it be for you to synchronizer this PR (or do you need to make a new PR)?

The key philosophy of this simpler main (to be mentioned in the readme for this simpler main) would be to focus on enabling tighter test/development loop and to have sane defaults that is focused more on enabling debuggability and observability (e.g. special tokens enabled by default). (+automated tests?)

KerfuffleV2 added 2 commits September 13, 2023 05:56

Add a simpler main example

62c5c6f

Clean up interrupt handling, avoid globals

d75698c

KerfuffleV2 force-pushed the feat-simple-main-example branch from 33782a7 to d75698c Compare September 13, 2023 11:58

cebtenzzre reviewed Sep 13, 2023

View reviewed changes

Apply some code cleanup suggestions. Thanks!

c7e1427

KerfuffleV2 mentioned this pull request Nov 2, 2023

Implementation of a sequence repetition penalty sampler #2593

Draft

KerfuffleV2 closed this Nov 29, 2023

cebtenzzre reopened this Dec 2, 2023

mofosyne added examples Review Complexity : Medium Generally require more time to grok but manageable by beginner to medium expertise level labels May 10, 2024

Merge branch 'master' into feat-simple-main-example

362f71e

mofosyne added the obsolete? Marker for potentially obsolete PR label May 25, 2024

ggerganov closed this May 26, 2024

mofosyne mentioned this pull request May 27, 2024

main: replace --no-special with --special (and also set control token output to stdout to off by default) #7534

Merged

Add a simpler main example #3080

Add a simpler main example #3080

Conversation

KerfuffleV2 commented Sep 8, 2023

KerfuffleV2 commented Sep 8, 2023

ggerganov commented Sep 8, 2023

staviq commented Sep 8, 2023

DannyDaemonic commented Sep 8, 2023 • edited Loading

staviq commented Sep 8, 2023 • edited Loading

KerfuffleV2 commented Sep 8, 2023

staviq commented Sep 8, 2023

kchro3 commented Sep 8, 2023

KerfuffleV2 commented Sep 8, 2023

wtarreau commented Sep 10, 2023

kiratp commented Sep 10, 2023

KerfuffleV2 commented Sep 10, 2023

wtarreau commented Sep 10, 2023

KerfuffleV2 commented Sep 10, 2023

wtarreau commented Sep 10, 2023

cebtenzzre commented Sep 10, 2023

wtarreau commented Sep 10, 2023

kiratp commented Sep 10, 2023

KerfuffleV2 commented Sep 10, 2023

cebtenzzre Sep 13, 2023 • edited Loading

Choose a reason for hiding this comment

KerfuffleV2 Sep 13, 2023

Choose a reason for hiding this comment

cebtenzzre Sep 13, 2023

Choose a reason for hiding this comment

KerfuffleV2 Sep 13, 2023

Choose a reason for hiding this comment

cebtenzzre Sep 13, 2023

Choose a reason for hiding this comment

ggerganov commented Sep 13, 2023 • edited Loading

staviq commented Sep 13, 2023

cebtenzzre commented Sep 13, 2023

staviq commented Sep 13, 2023

wtarreau commented Sep 14, 2023

KerfuffleV2 commented Sep 15, 2023

alitariq4589 commented Sep 15, 2023

cebtenzzre commented Dec 2, 2023

mofosyne commented May 25, 2024

ggerganov commented May 26, 2024

mofosyne commented May 27, 2024 • edited Loading

DannyDaemonic commented Sep 8, 2023 •

edited

Loading

staviq commented Sep 8, 2023 •

edited

Loading

cebtenzzre Sep 13, 2023 •

edited

Loading

ggerganov commented Sep 13, 2023 •

edited

Loading

mofosyne commented May 27, 2024 •

edited

Loading