
🛠️ DI-Bench: Benchmarking Large Language Models on Dependency Inference with Testable Repositories

🚀 Quick Start

Ensure that the Docker engine is installed and running on your machine.

Important

Our testing infrastructure requires ⚙️sysbox (a container runtime for Docker) to be installed on your system to ensure isolation and security.
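As a quick sanity check (assuming a standard sysbox installation, which registers the sysbox-runc runtime with the Docker daemon), you can confirm that Docker sees the runtime before running any evaluations:

# Optional sanity check; assumes a standard sysbox install that registers "sysbox-runc"
docker info | grep -i sysbox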

# Suggested Python version: 3.10
pip install ".[eval,llm,pattern]"

# Used by the local CI runner to authenticate when downloading actions from GitHub; the token needs no special permissions
export GITHUB_TOKEN=<your_github_token>
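If you want to verify the token before running the CI, one option (not part of DI-Bench itself) is to query GitHub's rate-limit endpoint, which accepts any valid token:

# Optional check, unrelated to DI-Bench: a valid token returns your API rate-limit info
curl -s -H "Authorization: Bearer $GITHUB_TOKEN" https://api.github.com/rate_limit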

⬇️ Download DI-Bench Dataset

Dataset release page

After downloading the dataset, extract each *.tar.gz archive into the data directory .cache/repo-data/{language}, where {language} is one of python, rust, csharp, or javascript.

mkdir -p .cache/repo-data
tar -xvzf .cache/dibench-regular-python.tar.gz -C .cache/repo-data
# ...

Each repository instance's data can be found in .cache/repo-data/{language}/{instance_id}.
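If you downloaded the archives for all four languages, a small loop covers the extraction (assuming the remaining archives follow the same dibench-regular-{language}.tar.gz naming as the Python one above):

# Assumption: the other archives follow the same naming pattern as the Python one
for lang in python rust csharp javascript; do
    tar -xvzf ".cache/dibench-regular-${lang}.tar.gz" -C .cache/repo-data
done

# Spot-check the extracted instances
ls .cache/repo-data/python | head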

😎 Evaluation

Evaluate the correctness of the inferred dependencies by checking whether each project's tests pass.

# --result_dir: the root of the generated results, e.g. tests/data/example-results
# --repo_instances_dir: path to the extracted repo data
# --dataset_name_or_path: the regular or large dataset file (*.jsonl)
dibench.eval \
    --result_dir [results_dir] \
    --repo_instances_dir [repo_instances_dir] \
    --dataset_name_or_path [regular_dataset_path/large_dataset_path]
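For example, to score the bundled example results against the repositories extracted above (the dataset filename here is only illustrative, and the repo_instances_dir assumes the .cache/repo-data layout from the download step; substitute the *.jsonl you downloaded from the release page):

# Illustrative invocation; the dataset path is a placeholder for your downloaded *.jsonl
dibench.eval \
    --result_dir tests/data/example-results \
    --repo_instances_dir .cache/repo-data \
    --dataset_name_or_path dibench-regular.jsonl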

📃 Documentation
