QA-generator

根据统计学语料生成高质量统计学知识问答。语料的获取方式是将教材转化为文本后分成小段。

configs

参数配置

converter_config.py

CONVERTER_INPUT_DIR: $\texttt{.pdf}$格式的英文书籍所在文件夹

CONVERTER_OUTPUT_DIR: 转化成$\texttt{.mmd}$的输出文件夹

model_config.py

MODEL: 选用的模型名称

TEMPERATURE: 模型温度 $\in [0, 2.0]$

FREQUENCY_PENALTY: 重复token惩罚项 $\in [-2.0, 2.0]$

PRESENCE_PENALTY: 现有token惩罚项 $\in [-2.0, 2.0]$

processor_config.py

PROCESSOR_INPUT_DIR: 需要处理的文本所在文件夹

PROCESSOR_OUTPUT_BASE_DIR: 经过处理的文本的输出文件夹

qa_generator

问答生成模型

model.py

QAModel: 使用OpenAI的API接口模型生成问答

prompts.py

SYSTEM_PROMPT: 系统提示词

HUMAN_PROMPT: 第一轮对话的用户提示

AI_PROMPT: 第一轮对话的结果

INPUT_TEMPLATE: 第二轮对话中用户输入的模板

CHAT_HISTORY: 包含以上所有prompts的最终输入的模板

text_processor

对文本进行分段，有若干分段策略供选择

pieces.py

ChunkPiece: 固定长度段落

SectionPiece: 小节段落

text.py

Text: 包含原始的文本，可以通过调用segment对其进行分段

Name		Name	Last commit message	Last commit date
Latest commit History 58 Commits
configs		configs
pdf		pdf
qa_generator		qa_generator
text_processor		text_processor
texts		texts
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
export_json_to_dataset.py		export_json_to_dataset.py
main.py		main.py
response_viewer.py		response_viewer.py
show_chat_history.py		show_chat_history.py
test.py		test.py
test_textprocessor.py		test_textprocessor.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

QA-generator

configs

converter_config.py

model_config.py

processor_config.py

qa_generator

model.py

prompts.py

text_processor

pieces.py

text.py

About

Releases

Packages

Languages

License

Aoblex/QA-generator

Folders and files

Latest commit

History

Repository files navigation

QA-generator

configs

converter_config.py

model_config.py

processor_config.py

qa_generator

model.py

prompts.py

text_processor

pieces.py

text.py

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages