Stars
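An intelligent BI (Business Intelligence) project based on Spring Boot and the ChatGPT API. Users only need to enter an analysis request and import XLS data, and the AI generates charts and analyzes the data, reducing the cost and improving the efficiency of data analysis.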
A model that predicts the punctuation of English, Italian, French and German texts.
A python package for deep multilingual punctuation prediction.
A Fairseq fork for sequence tagging/labeling tasks
The case study and multilingual performance of ICASSP submission
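A curated summary of the knowledge a natural language processing (NLP) engineer needs to build up, including interview questions, fundamentals, and engineering skills, to strengthen core competitiveness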
[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
Python port of Moses tokenizer, truecaser and normalizer
Source code for the ACL 2019 paper "Bridging the Gap between Training and Inference for Neural Machine Translation"
This is a code repository for the ACL 2022 paper "Learning to Generalize to More: Continuous Semantic Augmentation for Neural Machine Translation"
Implementation of our paper "Self-training Sampling with Monolingual Data Uncertainty for Neural Machine Translation" to appear in ACL-2021.
Tool to fix bitexts and tag near-duplicates for removal
A list of awesome Machine Translation frameworks, libraries, software and papers
OpusFilter - Parallel corpus processing toolkit
Sequence-to-sequence framework with a focus on Neural Machine Translation based on PyTorch
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Code for paper "Vocabulary Learning via Optimal Transport for Neural Machine Translation"
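Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm model series)
One-click Chinese data augmentation package; NLP data augmentation, BERT data augmentation, EDA: pip install nlpcda
Chinese and English sensitive words, language detection, Chinese and foreign mobile/landline number location and carrier lookup, gender inference from names, mobile number extraction, ID card number extraction, email extraction, Chinese and Japanese name databases, Chinese abbreviation database, character decomposition dictionary, word sentiment values, stop words, reactionary word list, violence and terrorism word list, Traditional/Simplified Chinese conversion, English imitation of Chinese pronunciation, Wang Feng lyrics generator, job title lexicon, synonym lexicon, antonym lexicon, negation word lexicon, car brand lexicon, car parts lexicon, continuous English text segmentation, various Chinese word vectors, company name collection, classical poetry corpus, IT lexicon, finance lexicon, idiom lexicon, place name lexicon, …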
Multilingual word vectors in 78 languages
Open-Source Machine Translation Quality Estimation in PyTorch
Source code for the Apple reproduction