-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Cohere API as available language model #395
base: master
Are you sure you want to change the base?
Conversation
Could consider using standard gitignore file for python in full rewrite, e.g. https://github.com/github/gitignore/blob/main/Python.gitignore
Not yet functional or tested.
Also ensures empty context tokens work
Note that the API does not accept empty context, thus adding a newline to get generations with empty context, ideally this doesn't affect distribution much
Thank you for this contribution! We collaborate with Cohere and can look at getting some credits for testing.
This is weird, I rather thought they did. I’m going to investigate this a bit.
I think it’s probably a better idea to raise an exception and inform the user that the CohereAPI doesn’t permit this. I’m worried that using a new-line token as a substitute for an empty context can introduce hidden gotchas into other people’s experiments.
I think this is likely the most robust approach, especially if they start using different tokenizers for different models the way OpenAI does.
I’m not sure, I’ll find out. @jon-tow @haileyschoelkopf do you happen to know? |
Great, thanks for the quick response @StellaAthena!
About this: I raised this in the cohere discord (here) and the dual-call (with one greedy) was the method suggested there.
Sounds good. I will need to make some hanges to fully remove the original tokenizer used in the GPT-3 model.
Makes sense, I will change this. Maybe I should leave the adding-a-newline hack as an option when instantiating the model class, but disable it by default? I can't really judge how often this edge case would be relevant - just saw it in the tests for other models. |
I'm not entirely sure, but I'd guess that if you're worried about timeouts you'd want to make completion calls on shorter prompts first to avoid any network issues that could result from slow sampling on large sequences. This way you also get to cache results immediately. |
Just catching up with this discussion--re: the empty context issue: I'm pretty sure that in other models the solution to empty input context is to pass in only the EOT token instead of an empty string. The times when you want to input an empty string as context is for perplexity calculation right? Sorry if I've missed something, will proceed to read through your code more fully, this was just my first instinct here. |
@haileyschoelkopf good point about the EOT token being passed, I hadn't noticed this code before! However, for the cohere API, we give the prompt as a string and not as tokens. So we don't use the encoded context but rather the original string. That's why the code you mentioned doesn't actually change much in the cohere class. I am not sure if it could be applied. But it's good that you're mentioning the use-case for word-perplexity. Maybe then the EDIT: I have now fixed some things (in the commits below):
EDIT 2:
|
This replaces old transformer tokenizer with API calls. Also removes tokenisation were not strictly necessary.
Removes one of two newlines at end of file
That would be brilliant, thanks @LauraRuis! What tests would be good/necessary/recommended to run @haileyschoelkopf? Not familiar enough to make a judgement on this myself. |
Thanks @LauraRuis for the help! How would perplexity task: loglikelihood / mcqa: greedy gen: Sound? If this is too many to run / too many credits required then not a problem! Just wanted to try to hit at least one task of each type. |
I'm getting a notimplementederror from line 198 in base.py
I called:
Am I calling the wrong thing perhaps @rdnfn? |
unlike the gpt3 model we need a custom loglikelihood_rolling implementation because we don't know the eot token, thus we use a separate prefix token. Preferred setting separate prefix token to avoid confusion. This also adds the capability to _loglikelihood_tokens to use tokens instead of str requests
Thanks for catching this error @LauraRuis, and sorry for the delay! I missed a detail about the Note that this fix now relies again on the newline hack: a newline string ("\n") is used as the context (only) when computing the loglikelihood of the first token in a rolling_loglikelihood computation. I can't use the end-of-text token because I could not figure out what Cohere's end-of-text token is in either string of numerical form. |
Thanks @rdnfn ! However, now I'm getting an internal server error:
I think it has something to do with the fact that the input exceeds the maximum length and needs to be truncated. If I only pass the
It does work. |
@LauraRuis thanks for running the test again! That's a quite confusing error! The truncation should be set by the Is there any chance that the eval harness overwrites my default to be something else (I set it with a |
No it's set to "START" as you say. Here's the first context + continuation that immediately fails:
|
@rdnfn In case you’re unaware, you can pass arbitrary names arguments to the model constructor using I agree with you that it seems like passing |
The command I'm running is:
If I print out the command and try it independently it fails after a really long time with a code 500 (internal server error):
You can reproduce this error more quickly by setting To me it seems like truncating is the issue. E.g. if I run a prompt that has 1 token too many without the
The response is:
And if I run with the truncate set to "START" which should handle this, I get the internal server error again:
If I take this out of the code and just run independently, the same thing happens. Changing to truncate="END" does not help. Once the text doesn't need to be truncated anymore, it does work again. |
Thanks @StellaAthena and @LauraRuis for looking into this further! I was able to reproduce the error with an example similar to Laura's. This seems to be a bug inside Cohere's API and I reported it on Cohere's discord (see here). Let's see if they might come up with a quick fix for this issue. I would first wait for their response. But if there is no quick fix from their side, we could change |
Okay, looks like we're in a holding pattern until Cohere fixes their API then. Thanks for looking into this. |
Good news: cohere have now confirmed that the bug should be fixed. Some prelim tests on my side confirmed this. @LauraRuis could you try to run the tests again? |
@LauraRuis are we good to merge? |
Sorry, running tests on the tasks Hailey suggested now. They are incredibly slow so will come back here once done. (Incredibly slow meaning each many API requests take >>a minute) |
Yikes. Is this an inefficiency on our end or a limitation of the API? |
API's side. Also, the code now works for the first two calls in Wikitext, but he same error is back for the third call. We'll have to go back to Cohere and see what's going on. Error:
|
Following up on this PR! Did we ever hear back from Cohere? If not, I can test out this PR again myself since Cohere provides trial API access. |
Thanks for following up @haileyschoelkopf! We are currently still waiting to hear back from Cohere on their discord, just sent a follow-up reminder myself over there (you can find the bug report conversation here, requires access to "cohere community" discord). EDIT: Good news! The issues on cohere's side have now been resolved and with two small fixes this all appears to work now on wikitext. Will run some more tests soon (beyond wikitext) to confirm. |
also prevent tokenisation of strings that too long
@rdnfn Following up, how did the tests go? |
Hi @StellaAthena thanks for following-up! I was unable to run further tests at the time for unrelated reasons, and unfortunately no longer have time/capacity to further debug and test this implementation. Happy for somebody else to pick this PR up if they are keen on the Cohere integration, otherwise we can close the PR if there is no further interest. |
I'm pretty keen on it and can see about handing off testing to someone else. Thanks! |
This PR adds the Cohere API as an available language model in
lm_eval/models/cohere_lm.py
, addressing https://github.com/EleutherAI/lm-eval2/issues/12. It also includes minimal test for the new model intests/test_models.py
.Key differences between Cohere (v4.1.0) and OpenAI (v0.27.1) Python APIs:
co.generate( )
method, but can return tokens and their loglikelihoods.Open questions:
gpt3.py
LM class do loglikelihood in descending order of overall tokenlength, but the reverse seems to be true for greedy generation? (the _collate(x) function returns a negative token legnth in the first case, and positive in the latter)Other notes:
Thanks for taking a look at this!