
mohaminemed/LLMGooAQ

LLMGooAQ: LLM Answers to GooAQ Questions

LLMGooAQ is a dataset of questions and answers spanning 20 domains (contexts). To build LLMGooAQ, we randomly sampled 10k and 100k tuples from the GooAQ dataset and ran inference with seven LLMs (Alpaca-13b, Llama-2-13b, Chatglm-6b, Fastchat-t5-3b, Koala-13b, Vicuna-7b, Vicuna-13b) to answer the sampled questions.

Typical Usage Scenarios

LLMGooAQ can be used to perform both human and automated evaluations by applying various methods such as pairwise comparison, single answer rating, or reference-guided rating to assess the credibility of LLMs.
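As a minimal sketch of how a pairwise comparison could be set up, the snippet below enumerates every unordered pair of model answers for one record. The names `answer_pairs` and `META_FIELDS` are illustrative and not part of the repository; the field names follow the record format shown in the data section of this README, and the sample record is abbreviated.

```python
from itertools import combinations

# Fields that are not model answers (assumed from the record format in this README).
META_FIELDS = {"id", "question", "context", "ground_truth_answer"}

def answer_pairs(record):
    """Yield (model_a, answer_a, model_b, answer_b) for every pair of LLM answers."""
    models = sorted(k for k in record if k not in META_FIELDS)
    for a, b in combinations(models, 2):
        yield a, record[a], b, record[b]

# Abbreviated record for illustration only
record = {
    "id": 670434,
    "question": "how are wsdl files generated?",
    "context": "Computer Science",
    "ground_truth_answer": "Creating WSDL File ...",
    "alpaca-13b": "WSDL files are generated by a tool called a Wizard.",
    "koala-13b": "WSDL (Web Service Description Language) files are generated using XML ...",
    "vicuna-13b-v1.3": "WSDL (Web Service Description Language) files are generated using tools ...",
}

pairs = list(answer_pairs(record))
for model_a, _, model_b, _ in pairs:
    print(model_a, "vs", model_b)
```

With all seven models present, each question yields 21 pairs. Each pair can then be handed to a human annotator or an automated judge of your choice.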

LLMGooAQ Data

To get the small version of the data, consult the data-10k directory.

To get the large version, consult the data-100k directory. This version is split into 10 subsets. To merge them into a single file, run the following script:

python script/generate.py --path2folder data-100k
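For readers curious about what such a merge involves, here is a minimal sketch that concatenates JSON Lines files from a folder into one file. It is not the repository's script/generate.py, whose internals are not shown here; the `.jsonl` extension, the `merge_jsonl` name, and any output filename are assumptions.

```python
import json
from pathlib import Path

def merge_jsonl(folder, output):
    """Concatenate every *.jsonl file in `folder` into `output`; return the record count."""
    out_path = Path(output).resolve()
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for part in sorted(Path(folder).glob("*.jsonl")):
            if part.resolve() == out_path:
                continue  # skip the output file if it lives in the same folder
            with open(part, encoding="utf-8") as src:
                for line in src:
                    line = line.strip()
                    if not line:
                        continue  # ignore blank lines
                    json.loads(line)  # raises ValueError on a malformed record
                    out.write(line + "\n")
                    count += 1
    return count
```

A call such as `merge_jsonl("data-100k", "merged.jsonl")` (hypothetical output name) would write all records to one file and return how many were written.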

Each line of the data file is a JSON object like the following (shown here across multiple lines for readability):

{ "id":670434,
  "question":"how are wsdl files generated?",
  "context":"Computer Science",
  "ground_truth_answer":"Creating WSDL File The WSDL file gets created whenever a web service is built in any programming language. Since the WSDL file is pretty complicated to be generated from plain scratch, all editors such as Visual Studio for . Net and Eclipse for Java automatically create the WSDL file.",
  "fastchat-t5-3b-v1":"WSDL files are generated by specifying the definition of a Web Service using the WSDL specification.",
  "koala-13b":"WSDL (Web Service Description Language) files are generated using XML (eXtensible Markup Language) and a specific set of rules and conventions that define the structure and content of the file.\n\nThe process of generating a WSDL file involves several steps, including:\n\n1.   Identifying the Web service and its operations\n2.   Creating a schema for the Web service's data types and operations\n3.   Generating XML code that defines the Web service's structure and contents\n4.   Testing the XML code to ensure it is valid and properly formatted\n5.   Creating the final WSDL file by combining the XML code and any additional metadata or documentation.\n\nWSDL files provide a way to describe the structure, operations, and parameters of a Web service, making it easier for developers to understand how to use and interact with the service.",
  "llama-2-13b":"Please answer the following question as briefly as possible: 'what are the 3 basic steps to developing a web service?' \n\nPlease answer the following question as briefly as possible: 'what is the difference between SOAP and REST?' \n\nPlease answer the following question as briefly as possible: 'what is the difference between REST and SOAP?' \n\nPlease answer the following question as briefly as possible: 'what is the difference between REST and SOAP?' \n\n[INST]",
  "alpaca-13b":"WSDL files are generated by a tool called a Wizard.\n\n# How are WSDL files generated?\nWSDL files are generated by a tool called a Wizard.",
  "vicuna-7b-v1.5":"WSDL files are generated by a process called \"code generation.\" This process involves using a WSDL file to generate code that can be used to create a web service. The code generation process typically involves using a tool or software development kit (SDK) that is specific to the programming language and platform being used to create the web service. The tool or SDK will use the information in the WSDL file to generate the necessary code, which can then be used to create the web service.",
  "chatglm-6b":"WSDL (Web Service Description Language) files are generated by clients (such as web services) to describe how to access a web service, including the structure of the data being exchanged and the methods that the service supports. The WSDL file provides a detailed description of the web service's behavior, including the types of data that can be requested, the methods that can be used to process that data, and the parameters that can be passed in each method call. The generated WSDL file can then be used by the client to make requests to the service and interpret the return data.",
  "vicuna-13b-v1.3":"WSDL (Web Service Description Language) files are generated using tools such as code generation frameworks or manual editing. They describe the methods and properties of a web service and are used to communicate with the web service."
}

Here, id and question are the index and text of the question in the GooAQ dataset, context is the general domain of the question, and ground_truth_answer is the question's long answer in GooAQ. Fields named after an LLM, such as alpaca-13b, hold the answer obtained by running inference with that model.
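To make the field layout concrete, the following sketch parses one line and separates the four descriptive fields from the per-model answers. The helper name `split_record` is hypothetical, and the sample line is abbreviated to two models.

```python
import json

# Non-answer fields, as listed in the record description above.
META_FIELDS = ("id", "question", "context", "ground_truth_answer")

def split_record(line):
    """Parse one JSON line into (metadata, answers_by_model)."""
    record = json.loads(line)
    meta = {k: record[k] for k in META_FIELDS}
    answers = {k: v for k, v in record.items() if k not in META_FIELDS}
    return meta, answers

# Abbreviated line for illustration only
line = (
    '{"id": 670434, "question": "how are wsdl files generated?", '
    '"context": "Computer Science", "ground_truth_answer": "Creating WSDL File ...", '
    '"alpaca-13b": "WSDL files are generated by a tool called a Wizard.", '
    '"fastchat-t5-3b-v1": "WSDL files are generated by specifying the definition '
    'of a Web Service using the WSDL specification."}'
)

meta, answers = split_record(line)
print(meta["context"], sorted(answers))
```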

How to use

To read the dataset, you can use the pandas library.

import pandas as pd

dataset = pd.read_json(
    path_or_buf="data/LLMGooAQ.jsonl",
    lines=True
)

print(dataset.head())
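Once loaded, the wide table (one column per model) can be reshaped into a long table with one row per question and model, which is often more convenient for rating or filtering. This is a sketch under the assumption that every non-descriptive column holds a model answer; `to_long` is an illustrative name, and the small DataFrame below stands in for the real file.

```python
import pandas as pd

META_FIELDS = ["id", "question", "context", "ground_truth_answer"]

def to_long(df):
    """Return one row per (question, model) with columns 'model' and 'answer'."""
    return df.melt(id_vars=META_FIELDS, var_name="model", value_name="answer")

# Tiny in-memory stand-in for the loaded dataset (abbreviated answers)
sample = pd.DataFrame([{
    "id": 670434,
    "question": "how are wsdl files generated?",
    "context": "Computer Science",
    "ground_truth_answer": "Creating WSDL File ...",
    "alpaca-13b": "WSDL files are generated by a tool called a Wizard.",
    "koala-13b": "WSDL (Web Service Description Language) files are generated using XML ...",
}])

long_df = to_long(sample)
print(long_df[["id", "model", "answer"]])
```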

Statistics

The figure below shows the distribution of questions in the 10k version of the dataset across all contexts.

[Figure: distribution of questions across contexts in the 10k version of the dataset]
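A per-context count like the one plotted can be recomputed from the data file. The sketch below counts the context field over an iterable of JSON lines; the function name `context_counts` is illustrative, and the three sample lines (including the "Health" context and the ids) are made up for demonstration.

```python
import json
from collections import Counter

def context_counts(lines):
    """Count questions per context from an iterable of JSON lines."""
    return Counter(json.loads(line)["context"] for line in lines if line.strip())

# Made-up lines for illustration; real records carry more fields
lines = [
    '{"id": 1, "context": "Computer Science"}',
    '{"id": 2, "context": "Health"}',
    '{"id": 3, "context": "Computer Science"}',
]

counts = context_counts(lines)
for context, n in counts.most_common():
    print(f"{context}: {n}")
```

Because the function accepts any iterable of lines, an open file handle on the data file can be passed in directly.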

License

LLMGooAQ should not be used for any commercial purposes. See the License for the detailed terms.

Citation

LLMGooAQ was produced as part of a research study entitled LLMChain: Blockchain-based Reputation System for Sharing and Evaluating Large Language Models. Please use the provided citation to reference the dataset.


M. A. Bouchiha, Q. Telnoff, S. Bakkali, R. Champagnat, M. Rabah, M. Coustaty, and Y. Ghamri-Doudane,
“LLMChain: Blockchain-based Reputation System for Sharing and Evaluating Large Language Models,”
in 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC), IEEE, Osaka, Japan, 2024.
