llmtoolkit is a toolkit for NLP (Natural Language Processing) and LLMs (Large Language Models) built on PyTorch. It implements many language models and data preprocessing methods and, more importantly, provides many examples that run end-to-end.
Supported Language Models:
Supported Transformer Models:
Requirements:
- Python 3.7+
- PyTorch 1.5.0+
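
The version requirements above can be checked before running any example. The following is a minimal sketch, not part of llmtoolkit itself: the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical and introduced here only for illustration. It assumes PyTorch exposes its version string as `torch.__version__` and that local build suffixes such as `+cu117` can be ignored for the comparison.

```python
import importlib
import re
import sys

# Minimum versions taken from the requirements list above.
MIN_PYTHON = (3, 7)
MIN_TORCH = (1, 5, 0)


def parse_version(version):
    """Turn a string such as '1.13.1+cu117' into the tuple (1, 13, 1).

    Anything after '+' (a local build tag) is dropped, and missing
    components are padded with zeros so '2.1' becomes (2, 1, 0).
    """
    numbers = re.findall(r"\d+", str(version).split("+")[0])[:3]
    parts = [int(n) for n in numbers]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def meets_minimum(version, minimum):
    """Return True if the version string is at least the minimum tuple."""
    return parse_version(version) >= tuple(minimum)


def check_environment():
    """Return a list of human-readable problems; empty means all good."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python {}.{}+ is required".format(*MIN_PYTHON))
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        problems.append("PyTorch is not installed")
    else:
        if not meets_minimum(torch.__version__, MIN_TORCH):
            problems.append(
                "PyTorch {}.{}.{}+ is required, found {}".format(
                    *MIN_TORCH, torch.__version__
                )
            )
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("Problem:", issue)
    if not issues:
        print("Environment meets the stated requirements.")
```

Returning a list of problems instead of raising on the first failure lets a setup script report every unmet requirement in one pass.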
References:
- Dive into Deep Learning (D2L.ai): https://zh.d2l.ai/
- GluonNLP: NLP made easy: https://github.com/dmlc/gluon-nlp/
- huggingface/tokenizers, an implementation of today's most used tokenizers, with a focus on performance and versatility: https://github.com/huggingface/tokenizers
- self-attention-cv, self-attention building blocks for computer vision applications in PyTorch: https://github.com/The-AI-Summer/self-attention-cv
- 自然语言处理:基于预训练模型的方法 (Natural Language Processing: Methods Based on Pre-trained Models), by Wanxiang Che, Jiang Guo, and Yiming Cui
llmtoolkit is released under the Apache 2.0 license.
If you use the data or code in this repository, please cite it:
@misc{llmtoolkit,
author = {jianzhnie},
title = {llmtoolkit: llmtoolkit is a toolkit for NLP and LLMs using Pytorch},
year = {2023},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/jianzhnie/LLMToolkit}},
}