This is the source code for the paper *LTLBench: Towards Benchmarks for Evaluating Temporal Logic Reasoning in Large Language Models*. It contains both the dataset construction pipeline and the code for the experiments and analyses.
- Download this repository.
- Install the required packages:

  ```
  pip install -r requirements.txt
  ```

- (Optional) If you want to evaluate the OpenAI models, create a `.env` file in the root directory and set an `OPENAI_KEY` variable in it with your OpenAI API key (see the example below).
- (Optional) Download Ollama if you want to evaluate the models available on Ollama.
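For reference, a minimal `.env` file looks like this (the value is a placeholder; substitute your own key):

```
OPENAI_KEY=sk-your-openai-api-key
```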
- Run the following command to generate the LTLBench dataset:

  ```
  python -m src.main generate -c 2000 -e 3 -l 3 -s 1
  ```

  For the command above, `-c` is the number of formulas to generate, `-e` is the number of events, `-l` is the number of operators, and `-s` is the random seed.
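As an illustration (not necessarily the dataset's exact syntax), an LTL formula over 3 events using 3 temporal operators might look like `G(e1 -> F e2) U e3`, where `G`, `F`, and `U` are the "globally", "finally", and "until" operators.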
- Run the following command to batch-generate additional datasets:

  ```
  python -m src.main batch-generate -c 300 -e 2 -l 1,2,3,4,5,7,9 -s 1
  ```

  or

  ```
  python -m src.main batch-generate -c 300 -e 2,3,4,5,7,9 -l 2 -s 1
  ```

  For the commands above, both `-e` and `-l` accept a comma-separated list of values for the number of events and operators, respectively.
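Conceptually (an inference from the flags, not verified against the source), each command above produces one dataset per value in the comma-separated list, so the first command yields seven datasets of 300 problems each, one for every operator count in 1, 2, 3, 4, 5, 7, 9.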
Note: before running the experiments, make sure to set your OpenAI API key in the `.env` file, to download Ollama, and to pull the models you want.
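For example, once Ollama is installed, `ollama pull llama2` downloads a model so it can be evaluated locally (the model name here is only an example).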
- Run the following command to evaluate a model on the LTLBench dataset:

  ```
  python -m src.main evaluate -c 2000 -e 3 -l 3 -m gpt-3.5-turbo
  ```

  For the command above, you can replace `gpt-3.5-turbo` with any other supported model; see `src/models/choose.py` for the list of available models. If the model you want to evaluate is not listed there, a small modification to the code is enough to add it (see the sketch below).
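As a rough idea of what such a modification might involve, here is a hypothetical sketch of a model registry; the actual structure of `src/models/choose.py` differs, and every name below (`my_new_model`, `MODELS`, `choose`) is invented for illustration:

```python
# Hypothetical sketch only; the real src/models/choose.py is structured
# differently. The idea: map the name passed via -m to a callable that
# takes a prompt and returns the model's answer.
from typing import Callable

def my_new_model(prompt: str) -> str:
    """Invented example wrapper: call your model's API here."""
    return "yes"  # replace with a real API call

MODELS: dict[str, Callable[[str], str]] = {
    "my-new-model": my_new_model,
}

def choose(name: str) -> Callable[[str], str]:
    """Look up the callable for a model name, failing loudly if unknown."""
    try:
        return MODELS[name]
    except KeyError:
        raise ValueError(f"Unknown model: {name}") from None
```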
- Run the following command to batch-evaluate a model on the LTLBench dataset:

  ```
  python -m src.main batch-evaluate -c 300 -e 2 -l 1,2,3,4,5,7,9 -m gpt-3.5-turbo
  ```
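To compare several models, a small driver script can loop the same command over each one. A minimal sketch using only the CLI documented above (the model list is an example; adjust it to the models you have access to):

```python
# Minimal driver sketch (not part of the repository): batch-evaluate
# several models in sequence. Model names are examples only.
import subprocess

MODELS = ["gpt-3.5-turbo", "llama2"]  # adjust to your available models

for model in MODELS:
    subprocess.run(
        ["python", "-m", "src.main", "batch-evaluate",
         "-c", "300", "-e", "2", "-l", "1,2,3,4,5,7,9",
         "-m", model],
        check=True,  # stop on the first failure
    )
```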