bigcode-project / bigcode-evaluation-harness Public

Notifications You must be signed in to change notification settings
Fork 219
Star 825

Code
Issues 49
Pull requests 28
Actions
Projects 1
Security
Insights

Additional navigation options

Code
Issues
Pull requests
Actions
Projects
Security
Insights

Issues: bigcode-project/bigcode-evaluation-harness

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

49 Open 98 Closed

Author

Filter by author

Label

Filter by label

Use alt + click/return to exclude labels

or ⇧ + click/return for logical OR

Projects

Filter by project

Milestones

Filter by milestone

Assignee

Filter by who’s assigned

Assigned to nobody

Sort

Sort by

Newest Oldest Most commented Least commented Recently updated Least recently updated Best match

Most reactions

Issues list

Could you share a completed file of generations_mbppplus.json

#287 opened Nov 13, 2024 by marybloodyzz

mbpp pass@1 is lower than mbppplus pass@1 on starcoder2?

#285 opened Nov 6, 2024 by Liuzzyg

Continuing / Extending Previous Results from Generating and Evaluating?

#284 opened Nov 1, 2024 by RylanSchaeffer

Docker image for multiple evalulation broken

#283 opened Oct 27, 2024 by Extirpater

[REQUEST] Apply tokenizer chat template for HumanEvalPack

#282 opened Oct 25, 2024 by timrbula

[REQUEST] support model-parallel evaluation for big models

#278 opened Oct 18, 2024 by tarahjjeon

trust_remote_code is not been passed to dataset

#276 opened Sep 24, 2024 by login256

ValueError: Infilling not yet supported for:/Meta-Llama-3.1-8B

#275 opened Sep 23, 2024 by kbmlcoding

Evaluation result of bigcode/starcoder2-3b on gsm8k_pal does not matched the paper

#272 opened Sep 13, 2024 by nongfang55

Evaluating a Model with a Local Dataset in an Offline Environment

#271 opened Sep 12, 2024 by ankush13r

Any pypi package for this tool?

#270 opened Sep 10, 2024 by zhimin-z

Unable to execute the MultiPL-E task for python language

#268 opened Sep 4, 2024 by manthan0227

ImportError: cannot import name 'SyncManager' from partially initialized module 'multiprocessing.managers'

#267 opened Sep 3, 2024 by xinghuang2050

What is fine-tuning in task submission?

#266 opened Aug 29, 2024 by zhimin-z

Error of testing codegeex4-all-9b model

#264 opened Aug 21, 2024 by Gumingbro

Humaneval and MBPP results of deepseek-6.7b-coder-instruct are lower than offical report of Deepseek team

#262 opened Aug 9, 2024 by jessyford

HumanEval evaluation results mismatch for Codegemma-2b

#261 opened Jul 26, 2024 by berserank

[Possibly system specific] Wild (12% vs 20%) run-to-run swings in multiple-cpp reported scores

#258 opened Jul 18, 2024 by alat-rights

Need some context for certain args for Instruct Human Eval

#256 opened Jul 18, 2024 by teknium1

The evaluation results are inconsistent across different GPUs

#252 opened Jul 8, 2024 by DonteFlynn

Using the humanevalpack to test the ChatGLM3 model results in an abnormal score.

#251 opened Jul 5, 2024 by burger-pb

MBPP Llama3-8B-Instruct lower pass@1 score expected

#246 opened Jun 16, 2024 by YangZhou08

Using custom prompts and postprocessing

#245 opened Jun 14, 2024 by anil-gurbuz

API-based evaluation support (humanevalpack_openai.py is too old)

#234 opened May 10, 2024 by s-natsubori

If I want to add my own designed prompts before each question, how should I modify the code

#230 opened Apr 27, 2024 by ALLISWELL8

Previous 1 2 Next

Previous Next

ProTip! Find all open issues with in progress development work with linked:pr.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly