This repository has been archived by the owner on Oct 25, 2024. It is now read-only.

[NeuralChat] RAG evaluation #1333

Open
wants to merge 158 commits into
base: main
158 commits
f820019
add retrieval dataset construction codes
Liangyx2 Mar 1, 2024
06f8162
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 1, 2024
5ef0332
Update llm_generate_raw_data.py
Liangyx2 Mar 1, 2024
ee1db83
Delete intel_extension_for_transformers/neural_chat/tools/evaluation/…
Liangyx2 Mar 1, 2024
89597f2
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 1, 2024
b132d66
Delete intel_extension_for_transformers/neural_chat/tools/evaluation/…
Liangyx2 Mar 1, 2024
8e955ce
update
Liangyx2 Mar 1, 2024
635b906
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 1, 2024
d7d3d03
Delete intel_extension_for_transformers/neural_chat/tools/evaluation/…
Liangyx2 Mar 1, 2024
c9fec02
Delete intel_extension_for_transformers/neural_chat/tools/evaluation/…
Liangyx2 Mar 1, 2024
5e32113
Delete intel_extension_for_transformers/neural_chat/tools/evaluation/…
Liangyx2 Mar 1, 2024
f67622c
Delete intel_extension_for_transformers/neural_chat/tools/evaluation/…
Liangyx2 Mar 1, 2024
f2e344a
Delete intel_extension_for_transformers/neural_chat/tools/evaluation/…
Liangyx2 Mar 1, 2024
383e5b3
Update prompt.py
Liangyx2 Mar 4, 2024
81014d1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 4, 2024
4b7bec7
Update llm_generate_raw_data.py
Liangyx2 Mar 4, 2024
0df51a6
Update llm_generate_raw_data.py
Liangyx2 Mar 4, 2024
95b16bd
Update retrieval_dataset_construction.py
Liangyx2 Mar 4, 2024
80dd21b
Update llm_generate_raw_data.py
Liangyx2 Mar 4, 2024
f495b22
Update mine_hard_negatives_check_similarity.py
Liangyx2 Mar 4, 2024
593dee3
add test_evaluation.py to nightly test
Liangyx2 Mar 4, 2024
cf59b18
Update and rename requirements.txt to requirements_cpu.txt
Liangyx2 Mar 4, 2024
40e0b0e
Create requirements_cuda.txt
Liangyx2 Mar 4, 2024
bf1b1aa
Update requirements.txt
Liangyx2 Mar 4, 2024
5552ebc
Update retrieval_dataset_construction.py
Liangyx2 Mar 4, 2024
d3b7579
Update llm_generate_raw_data.py
Liangyx2 Mar 4, 2024
f500b2b
Update retrieval_dataset_construction.py
Liangyx2 Mar 4, 2024
b65c4bf
Update llm_generate_raw_data.py
Liangyx2 Mar 4, 2024
c43ab73
Update test_evaluation.py
Liangyx2 Mar 4, 2024
feda3c0
Update retrieval_dataset_construction.py
Liangyx2 Mar 4, 2024
1c2c22c
Update mine_hard_negatives_check_similarity.py
Liangyx2 Mar 4, 2024
55a5cda
add README.md
Liangyx2 Mar 6, 2024
7a74f86
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 6, 2024
39754d0
Update README.md
Liangyx2 Mar 7, 2024
d7e95f0
add evaluate_retrieval.py
Liangyx2 Mar 8, 2024
186ab43
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 8, 2024
1496219
Update test_evaluation.py
Liangyx2 Mar 11, 2024
03a768e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 11, 2024
128d587
Update test_evaluation.py
Liangyx2 Mar 11, 2024
25177bd
Merge branch 'main' into yuxiang/evaluation
XuehaoSun Mar 11, 2024
705752a
add README.md
Liangyx2 Mar 11, 2024
675fe2e
Update prompt.py
Liangyx2 Mar 12, 2024
988e542
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 12, 2024
d0c3c34
add llm_generate_truth.py and data
Liangyx2 Mar 12, 2024
be1106b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 12, 2024
48788d4
add ragas_evaluation.py
Liangyx2 Mar 12, 2024
54cc6c0
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 12, 2024
e1b5585
Create requirements.txt
Liangyx2 Mar 12, 2024
88a4293
Update llm_generate_truth.py
Liangyx2 Mar 12, 2024
83060f9
Update evaluate_retrieval.py
Liangyx2 Mar 12, 2024
76b1175
Update ragas_evaluation.py
Liangyx2 Mar 12, 2024
b775095
Update test_evaluation.py
Liangyx2 Mar 12, 2024
edbb32c
Update llm_generate_truth.py
Liangyx2 Mar 12, 2024
8962abf
Update README.md
Liangyx2 Mar 14, 2024
2ef4e05
Update README.md
Liangyx2 Mar 14, 2024
d2ab7d8
add README.md
Liangyx2 Mar 14, 2024
bcdf209
Update README.md
Liangyx2 Mar 14, 2024
102649b
Update README.md
Liangyx2 Mar 14, 2024
36a28a4
Update README.md
Liangyx2 Mar 14, 2024
548fdd9
Add files via upload
Liangyx2 Mar 15, 2024
36448ea
Delete intel_extension_for_transformers/neural_chat/tests/ci/tools/te…
Liangyx2 Mar 15, 2024
26e3e9d
Update requirements.txt
Liangyx2 Mar 15, 2024
e4793d3
Update README.md
Liangyx2 Mar 15, 2024
0569b54
Update hn_mine.py
Liangyx2 Mar 15, 2024
2d15ec0
Update README.md
Liangyx2 Mar 15, 2024
e8127e9
Update ragas_evaluation.py
Liangyx2 Mar 18, 2024
321e9b6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 18, 2024
f9b4dab
Update requirements.txt
Liangyx2 Mar 18, 2024
76dc219
Update README.md
Liangyx2 Mar 18, 2024
b9db553
Update README.md
Liangyx2 Mar 18, 2024
d7b68cb
Update README.md
Liangyx2 Mar 18, 2024
48de606
Update requirements.txt
Liangyx2 Mar 18, 2024
415ebc8
Update ragas_evaluation.py
Liangyx2 Mar 18, 2024
f03badd
Update test_evaluation.py
Liangyx2 Mar 18, 2024
2b92e74
Update README.md
Liangyx2 Mar 18, 2024
9091729
Update retrieval_dataset_construction.py
Liangyx2 Mar 18, 2024
be32736
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 18, 2024
2c4f452
Update hn_mine.py
Liangyx2 Mar 18, 2024
c48f66a
Update llm_generate_raw_data.py
Liangyx2 Mar 18, 2024
654c44a
Update mine_hard_negatives_check_similarity.py
Liangyx2 Mar 18, 2024
5208c98
Update hn_mine.py
Liangyx2 Mar 18, 2024
ace1090
Update test_evaluation.py
Liangyx2 Mar 18, 2024
83f10e9
Update ragas_evaluation.py
Liangyx2 Mar 18, 2024
ac0aef1
Update README.md
Liangyx2 Mar 18, 2024
8deaabd
Update README.md
Liangyx2 Mar 19, 2024
2eb084c
Update README.md
Liangyx2 Mar 19, 2024
510e801
Update README.md
Liangyx2 Mar 19, 2024
dd1f37c
Update README.md
Liangyx2 Mar 19, 2024
ed95d2d
Update prompt.py
Liangyx2 Mar 19, 2024
e253f41
Update ragas_evaluation.py
Liangyx2 Mar 19, 2024
fc0b6b9
add evaluate_retrieval_auto.py
Liangyx2 Mar 20, 2024
6f081b5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 20, 2024
746adec
Update evaluate_retrieval_auto.py
Liangyx2 Mar 21, 2024
100322e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 21, 2024
5e07789
Update evaluate_retrieval.py
Liangyx2 Mar 21, 2024
0a2f742
Update ragas_evaluation.py
Liangyx2 Mar 21, 2024
1752684
Update test_evaluation.py
Liangyx2 Mar 21, 2024
2a2238e
Update ragas_evaluation.py
Liangyx2 Mar 22, 2024
e8f0f9c
Update README.md
Liangyx2 Mar 22, 2024
8d65078
Update and rename evaluate_retrieval_auto.py to evaluate_retrieval_be…
Liangyx2 Mar 22, 2024
a951a89
Update evaluate_retrieval_benchmark.py
Liangyx2 Mar 25, 2024
13921f6
add retrieval_benchmark.py
Liangyx2 Mar 25, 2024
02c0813
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 25, 2024
d212d66
Update retrieval_benchmark.py
Liangyx2 Mar 25, 2024
20529a4
add ragas_benchmark ragas_evaluation_benchmark
Liangyx2 Mar 26, 2024
5026421
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 26, 2024
cfa7d9c
Update retrieval_benchmark.py
Liangyx2 Mar 26, 2024
8d1215e
Update evaluate_retrieval_benchmark.py
Liangyx2 Mar 26, 2024
3458a8e
Update retrieval_benchmark.py
Liangyx2 Mar 26, 2024
4effd37
Update ragas_evaluation_benchmark.py
Liangyx2 Mar 26, 2024
3c38ae6
Update ragas_benchmark.py
Liangyx2 Mar 26, 2024
b02da07
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 26, 2024
a2a7de1
Update ragas_evaluation_benchmark.py
Liangyx2 Mar 26, 2024
4191f4b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 26, 2024
35b2d7d
Update evaluate_retrieval_benchmark.py
Liangyx2 Mar 27, 2024
56037b9
Update ragas_evaluation_benchmark.py
Liangyx2 Mar 27, 2024
de44f0d
add retrieval_benchmark.sh
Liangyx2 Mar 27, 2024
67456e4
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 27, 2024
2a91336
add ragas_benchmark.sh
Liangyx2 Mar 27, 2024
8f05a34
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 27, 2024
c64ca3c
add data.txt
Liangyx2 Mar 27, 2024
fbef1f6
Update ragas_benchmark.sh
Liangyx2 Mar 27, 2024
f50aeb4
Update ragas_evaluation_benchmark.py
Liangyx2 Mar 28, 2024
84aea7c
Update ragas_benchmark.sh
Liangyx2 Mar 28, 2024
ad1814a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 28, 2024
932562d
Update and rename ragas_benchmark.py to ragas_superbenchmark.py
Liangyx2 Mar 28, 2024
50d8c83
Update evaluate_retrieval_benchmark.py
Liangyx2 Mar 28, 2024
a4ea5dd
Update retrieval_benchmark.sh
Liangyx2 Mar 28, 2024
6e29d43
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 28, 2024
702f9a9
Update and rename retrieval_benchmark.py to retrieval_superbenchmark.py
Liangyx2 Mar 28, 2024
0452526
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 28, 2024
008a892
add README.md
Liangyx2 Mar 28, 2024
5303837
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 28, 2024
8957b18
Update README.md
Liangyx2 Mar 28, 2024
96f477c
Update README.md
Liangyx2 Mar 29, 2024
c99856d
Update README.md
Liangyx2 Mar 29, 2024
19dfb93
Update README.md
Liangyx2 Apr 1, 2024
99940f3
Update README.md
Liangyx2 Apr 1, 2024
464d52b
Update README.md
Liangyx2 Apr 1, 2024
da2e829
Update README.md
Liangyx2 Apr 1, 2024
3ce2cb2
Update README.md
Liangyx2 Apr 1, 2024
268d89c
Update README.md
Liangyx2 Apr 1, 2024
40fc2e9
Update README.md
Liangyx2 Apr 1, 2024
13bb3b8
Update README.md
Liangyx2 Apr 1, 2024
763bd1d
add config file form rag evaluation
xmx-521 Apr 10, 2024
092e951
complete config superbenchmark
xmx-521 Apr 15, 2024
e931143
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 15, 2024
f0a0cd6
Merge branch 'main' into yuxiang/evaluation
XuhuiRen May 8, 2024
895075b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 8, 2024
6b60154
Create test_evaluation.py in CI
Liangyx2 May 10, 2024
c73a68f
Update requirements.txt
Liangyx2 May 11, 2024
c6f8906
Merge branch 'main' into yuxiang/evaluation
Liangyx2 May 11, 2024
7c80ce2
Merge branch 'main' into yuxiang/evaluation
VincyZhang May 13, 2024
576ce57
Merge branch 'main' into yuxiang/evaluation
Liangyx2 May 14, 2024
2a3ddd9
Merge branch 'main' into yuxiang/evaluation
Liangyx2 May 15, 2024
b4c0e67
Update ragas_evaluation_benchmark.py
Liangyx2 Jun 3, 2024
e75bbe4
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 3, 2024
a0853a8
Merge branch 'main' into yuxiang/evaluation
Liangyx2 Jun 3, 2024
36 changes: 36 additions & 0 deletions intel_extension_for_transformers/neural_chat/prompts/prompt.py
@@ -321,3 +321,39 @@ def generate_sqlcoder_prompt(qurey, metadata_file):
        qurey=qurey, table_metadata_string=table_metadata_string
    )
    return prompt

QUERYGENERATE_PROMPT = """
Task: You are asked to act as a human annotator. Your role is to generate 2 specific, open-ended questions based on the provided context.
Each question should aim to extract or clarify key information from the context, focusing on a single aspect or detail.
The questions must be directly related to the context to form a query-positive pair, suitable for use in constructing a retrieval dataset.
---
Requirements:
1. Questions should be based on the keywords, such as phrases at the beginning, phrases before colon, and recurring phrases in the context.
2. Use the terms in the context instead of pronouns.
---
Desired format:
1. <question_1>
2. <question_2>
---
### Context:
{context}
---
Generated questions:
"""

TRUTHGENERATE_PROMPT = """
Task: You are asked to act as a human annotator. Your role is to generate the right answer based on the context and question provided.
Answers should aim to extract or clarify the key information of the question from the context, focusing on a single aspect or detail.
The answer must be directly related to the context and the question, suitable for use in constructing a synthetic retrieval evaluation dataset.
---
Desired format:
1. <ground_truth>
---
### Question:
{question}
---
### Context:
{context}
---
Generated ground_truth:
"""
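For reviewers, here is a minimal sketch of how a template like `QUERYGENERATE_PROMPT` might be consumed: fill `{context}` with `str.format`, then parse the numbered reply that the "Desired format" section asks for. The template below is abbreviated, and the helper `parse_numbered_questions` and the sample reply are hypothetical. None of it is taken from this PR.

```python
import re
from typing import List

# Abbreviated stand-in for QUERYGENERATE_PROMPT; only the {context} placeholder matters here.
QUERYGENERATE_PROMPT = """
Task: generate 2 specific, open-ended questions based on the provided context.
---
Desired format:
1. <question_1>
2. <question_2>
---
### Context:
{context}
---
Generated questions:
"""


def parse_numbered_questions(reply: str) -> List[str]:
    """Hypothetical helper: collect the text of every 'N. text' line in a reply."""
    questions = []
    for line in reply.splitlines():
        match = re.match(r"^\s*\d+\.\s*(.+?)\s*$", line)
        if match:
            questions.append(match.group(1))
    return questions


# Fill the template; str.format substitutes the value without re-parsing it.
prompt = QUERYGENERATE_PROMPT.format(context="NeuralChat supports retrieval plugins.")

# A made-up reply in the requested format, used only to exercise the parser.
sample_reply = "1. What does NeuralChat support?\n2. Which plugins does NeuralChat provide?"
questions = parse_numbered_questions(sample_reply)
print(questions)
```

The same pattern would apply to `TRUTHGENERATE_PROMPT`, which takes both `{question}` and `{context}`. Real model output may not follow the requested format exactly, so a parser along these lines should tolerate replies that yield fewer than two questions.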
@@ -0,0 +1,116 @@
# !/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright (c) 2023 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import unittest, os, shutil
from unittest.mock import patch
from intel_extension_for_transformers.neural_chat.tools.evaluation.data_augmentation import retrieval_dataset_construction, llm_generate_truth
from intel_extension_for_transformers.neural_chat.tools.evaluation.retriever import evaluate_retrieval
from intel_extension_for_transformers.neural_chat.tools.evaluation.framework import ragas_evaluation

class TestEvaluation(unittest.TestCase):
    def setUp(self) -> None:
        if os.path.exists("data"):
            shutil.rmtree("data", ignore_errors=True)
        if os.path.exists("ground_truth.jsonl"):
            os.remove("ground_truth.jsonl")
        if os.path.exists("output"):
            shutil.rmtree("output", ignore_errors=True)
        return super().setUp()

    def tearDown(self) -> None:
        if os.path.exists("data"):
            shutil.rmtree("data", ignore_errors=True)
        if os.path.exists("ground_truth.jsonl"):
            os.remove("ground_truth.jsonl")
        if os.path.exists("output"):
            shutil.rmtree("output", ignore_errors=True)
        return super().tearDown()

    def test_retrieval_dataset_construction(self):
        path = \
            "/intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/assets/docs/retrieve_multi_doc/"
        if os.path.exists(path):
            input_path=path
        else:
            input_path='../assets/docs/retrieve_multi_doc/'
        argv = ['--llm_model', '/tf_dataset2/models/nlp_toolkit/neural-chat-7b-v3-1', \
                '--embedding_model', '/tf_dataset2/inc-ut/gte-base', \
                '--input', input_path, \
                '--output', './data', \
                '--range_for_sampling', '2-2', \
                '--negative_number', '1']
        with patch('sys.argv', ['python retrieval_dataset_construction.py'] + argv):
            retrieval_dataset_construction.main()
            self.assertTrue(os.path.exists("./data/minedHN_split.jsonl"))

    def test_llm_generate_truth(self):
        path = \
            "/intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/tools/evaluation/data_augmentation/example.jsonl"
        if os.path.exists(path):
            input_path=path
        else:
            input_path='../tools/evaluation/data_augmentation/example.jsonl'
        argv = ['--llm_model', '/tf_dataset2/models/nlp_toolkit/neural-chat-7b-v3-1', \
                '--input', input_path, \
                '--output', 'ground_truth.jsonl']
        with patch('sys.argv', ['python llm_generate_truth.py'] + argv):
            llm_generate_truth.main()
            self.assertTrue(os.path.exists("ground_truth.jsonl"))

    def test_evaluate_retrieval(self):
        path1 = \
            "/intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/tools/evaluation/data_augmentation/candidate_context.jsonl"
        if os.path.exists(path1):
            index_file_jsonl_path=path1
        else:
            index_file_jsonl_path='../tools/evaluation/data_augmentation/candidate_context.jsonl'
        path2 = \
            "/intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/tools/evaluation/data_augmentation/example.jsonl"
        if os.path.exists(path2):
            query_file_jsonl_path=path2
        else:
            query_file_jsonl_path='../tools/evaluation/data_augmentation/example.jsonl'
        argv = ['--index_file_jsonl_path', index_file_jsonl_path, \
                '--query_file_jsonl_path', query_file_jsonl_path, \
                '--embedding_model', '/tf_dataset2/inc-ut/gte-base']
        with patch('sys.argv', ['python evaluate_retrieval.py'] + argv):
            result = evaluate_retrieval.main()
            self.assertIsNotNone(result)

    def test_ragas_evaluation(self):
        path1 = \
            "/intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/tools/evaluation/data_augmentation/answer.jsonl"
        if os.path.exists(path1):
            answer_file_path=path1
        else:
            answer_file_path='../tools/evaluation/data_augmentation/answer.jsonl'
        path2 = \
            "/intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/tools/evaluation/data_augmentation/ground_truth.jsonl"
        if os.path.exists(path2):
            ground_truth_file_path=path2
        else:
            ground_truth_file_path='../tools/evaluation/data_augmentation/ground_truth.jsonl'
        argv = ['--answer_file', answer_file_path, \
                '--ground_truth_file', ground_truth_file_path, \
                '--llm_model', '/tf_dataset2/models/nlp_toolkit/neural-chat-7b-v3-1', \
                '--embedding_model', '/tf_dataset2/inc-ut/gte-base']
        with patch('sys.argv', ['python ragas_evaluation.py'] + argv):
            result = ragas_evaluation.main()
            self.assertIsNotNone(result)

if __name__ == '__main__':
    unittest.main()
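The tests in this file drive each script's `main()` by patching `sys.argv`. Below is a self-contained sketch of that technique. It assumes a hypothetical `main()` built on `argparse` and writes into a temporary directory rather than the working directory. The flag names mirror the test, but the function body is invented for illustration and is not the PR's implementation.

```python
import argparse
import os
import tempfile
import unittest
from unittest.mock import patch


def main():
    """Hypothetical stand-in for a script entry point such as llm_generate_truth.main()."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", type=str, required=True)
    parser.add_argument("--output", type=str, required=True)
    args = parser.parse_args()  # reads sys.argv[1:] when called without arguments
    with open(args.output, "w", encoding="utf-8") as f:
        f.write(args.input + "\n")
    return args.output


class TestMainViaArgv(unittest.TestCase):
    def test_main_writes_output(self):
        # TemporaryDirectory removes its contents on exit, so no manual cleanup is needed.
        with tempfile.TemporaryDirectory() as tmp:
            out = os.path.join(tmp, "ground_truth.jsonl")
            # The first element plays the role of the program name and is skipped by argparse.
            argv = ["prog", "--input", "example.jsonl", "--output", out]
            with patch("sys.argv", argv):
                returned = main()
            self.assertEqual(returned, out)
            self.assertTrue(os.path.exists(out))


# Run the case programmatically so the outcome can be inspected afterwards.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMainViaArgv)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Writing into a temporary directory is one way to avoid the repeated `os.path.exists` / `shutil.rmtree` cleanup in `setUp` and `tearDown`. Whether that fits the CI environment used by this repository is left as an open question.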
@@ -40,6 +40,7 @@ langid
librosa
lm-eval
markdown
modelscope
neural-compressor
neural_speed==1.0a0
num2words
@@ -64,6 +65,7 @@ python-docx
python-multipart
pyyaml
qdrant-client==1.8.2
ragas==0.1.7
rank_bm25
resampy==0.3.1
rouge_score