NLP

A curated list of speech and natural language processing resources
NLPK: NLP cafe of Prof. 강승식
Introduction to NLP
An easy introduction to Natural Language Processing
Introduction to Natural Language Processing for Text
Introduction To Natural Language Processing | Machine Learning Projects | Eduonix
5—INTRO TO NLP AND RNNS
A Review of the Neural History of Natural Language Processing
Keyword extraction in Java
Extracting meaningful text from webpages
Extracting (meaningful) text from webpages - II
Why 'Siri' still fails to understand what we say
Heteronym (linguistics)
Pronounceable Anagrams
ROC Curve, AUC
Part 1: For Beginners - Bag of Words (캐글뽀개기, June, 이상열)
Writers Choose Their Favorite Words (predicting an author from the kinds of words used?)
Algorithms for text fingerprinting?
How Democrats and Republicans see the world differently, in one chart
Ask HN: What are the best tools for analyzing large bodies of text?
Special Section: Reconceiving Text Analytics
Top NLP Algorithms & Concepts
ExoBrain
- Natural language QA workshop for human-machine knowledge communication – ExoBrain AI
한자로
Making Apps Understand Natural Language
Automatically spotting interesting sentences in parliamentary debates
Tone Analyzer
Bag of Words Meet Bags of Popcorn - (1) Part 1: Bag of Words
WHERE TECHNOLOGY MEETS BUSINESS. TYING TEXT ANALYTICS TO YOUR BUSINESS GOALS
For 40 years, computer scientists looked for a solution that doesn’t exist edit distance
Deep Learning for NLP Best Practices
DAWG data structure in Word Judge
A Simple Artificial Intelligence Capable of Basic Reading Comprehension
The future of programmers
IBM 'Watson' upgraded to a cognitive computing service
How To Create Natural Language Semantic Search For Arbitrary Objects With Deep Learning
Natural Language Processing (NLP) for Semantic Search | Pinecone
politeness - Write in a more polite, friendly tone
Understanding Natural Language with Deep Neural Networks Using Torch
An Inside View of Language Technologies at Google
Notes on the Natural Language API in Google Cloud
Getting a Google Cloud service account key and sharing GCS
Understanding Convolutional Neural Networks for NLP
- Understanding convolutional neural networks for solving NLP problems
Convolutional Methods for Text
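- RNNs such as LSTM/GRU dominate text processing, but CNNs have advantages too, and this post summarizes them well
- RNNs are sensitive to word order, whereas in a CNN words from somewhat distant sentences can also help shape a word's meaning
- CNNs are better suited to looking at the whole input at once
- A very good post for understanding NLP in general and the strengths and weaknesses of the different DNN types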
Convolutional Sequence-to-Sequence Learning (2017)
- (For newcomers to NLP)
1. A brief walkthrough from the RNN encoder-decoder to conv seq2seq
2. 10+ papers to read in order to understand conv s2s
Learning NLP little by little
- Hands-on materials for Learning NLP little by little
collocations.de - Association Measures
Perplexity
- Perplexity in LM
- Lecture 4: Evaluating language models
- speech recognition & LM
- Word recognition simulation based on hypernetwork molecular computing
An Experimental Study on Open Source Korean Morphological Analyzers for Evaluating Noun Extraction
Episode 22: NLP special, part 1 – with 김용범 of Microsoft's NLP lab
Espresso - AIR LAB, Changwon National University
Bad Comment Generator using RNN (악평생성기), by 송치성
- Bad Comment Generator using RNN
Generating text using a Recurrent Neural Network
DeepElastic - search + robot journalism + cognitive neurolinguistics + deep learning NLP
Building a language detector with PHP + MySQL
- Language detector
word-rnn - a fork of Andrej Karpathy's wonderful char-rnn
A computer writes novels
Next Word Auto-Completion
2015 Workshop on Natural Language Processing and Information Retrieval
"Have you met it on Naver? The AI chat robot"
Introducing DeepText: Facebook's text understanding engine
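- Facebook unveils DeepText, an AI that understands content 'at a human level'
NLP (natural language processing)
Analyzing Niconico Douga's public comment data with deep learning
- Understanding LSTM, along with recent trends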
Generative Models
Building an online Korean POS tagger
NLP basics with Python
Introducing Cloud Natural Language API, Speech API open beta and our West Coast region expansion
Google releases natural language and speech recognition APIs, Korean included
NLP techniques in machine learning (I)
An AI robot lawyer built by a 19-year-old in the UK
ko_restoration - Module for restoring Korean text working with KomornaPy
Research trends in NLP with deep learning
Exploring Session Context using Distributed Representations of Queries and Reformulations
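- Builds word embeddings of queries with a CNN from users' query session data and document click data
- Converts queries and the relations between them into vectors
- The relation vector between two queries is simply the difference of the two query vectors,
- yet clustering these relation vectors groups them by the intent behind the query reformulation
  - Reformulations with the same intent but a different query form
  - Reformulations that narrow the search intent
  - Reformulations that jump to a different intent altogether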
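The relation-vector idea in the notes above can be sketched in a few lines. This is only a toy illustration: the three-dimensional embeddings are made-up values, and `reformulation_vector` and `cosine` are hypothetical helper names, not anything from the paper. It assumes query embeddings already exist and shows that two "narrowing" reformulations end up with similar difference vectors while an intent jump does not.

```python
import math

def reformulation_vector(query_from, query_to):
    """Relation between two queries: the element-wise difference of their embeddings."""
    return [b - a for a, b in zip(query_from, query_to)]

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

# Made-up embeddings, for illustration only.
embeddings = {
    "hotels": [1.0, 0.0, 0.0],
    "hotels in paris": [1.0, 1.0, 0.0],
    "flights": [0.0, 0.0, 1.0],
    "flights to paris": [0.0, 1.0, 1.0],
}

# Two reformulations that narrow the intent in the same way.
narrow_a = reformulation_vector(embeddings["hotels"], embeddings["hotels in paris"])
narrow_b = reformulation_vector(embeddings["flights"], embeddings["flights to paris"])
# A reformulation that jumps to a different intent.
jump = reformulation_vector(embeddings["hotels in paris"], embeddings["flights to paris"])

same_kind = cosine(narrow_a, narrow_b)
different_kind = cosine(narrow_a, jump)
```

Reformulations of the same kind point in the same direction, so a clustering step over such difference vectors would tend to group them together.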
Applications of machine learning and deep learning
Universal Dependencies
BabelNet
- META prize 2015: BabelNet!
An Intuitive Natural Language Understanding System
An NLP Approach to Analyzing Twitter, Trump, and Profanity
Deep Learning Cases: Text and Image Processing
CS 124: From Languages to Information
NLP Seminar Schedule — Winter 2019
Just paste English text and it tells you the parts of speech
PyData Paris 2016 - Statistical Topic Extraction
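28th Conference on Hangul and Korean Information Processing
- Resources
- Developing and applying a named entity recognition system
Probabilistic grammar
A junior data analyst analyzes girl group data
korean.abcthesaurus.com synonym dictionary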
Microsoft Concept Graph Preview For Short Text Understanding
en.wikipedia.org/wiki/Precision_and_recall
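- Actual and predicted agree: True Positive / Negative
- Actual and predicted disagree: False Positive / Negative
- Predicted to occur: Positive; predicted not to occur: Negative
- Precision and recall
- The difference between accuracy, precision, and recall
- The difference between accuracy and precision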
- en.wikipedia.org/wiki/Sensitivity_and_specificity
- Correlation between measures
- #2.6. Accuracy, Precision, Recall
- Accuracy, Precision, Recall for armchair developers
- Evaluation criteria for classification models, part 1
- Evaluating classification & clustering models
- Fighting Financial Fraud with Targeted Friction
- Beyond Accuracy: Precision and Recall
- Precision vs Recall
- Comparison of the best NSFW Image Moderation APIs 2018
- Understand Classification Performance Metrics
- Sensitivity and specificity
- Confusion matrix :: 통컨 (statistics consulting)
- Notes on precision and recall | Pacientes Devlog
- Notes on AP (Average Precision) and mAP (mean Average Precision) | Pacientes Devlog
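The true/false positive/negative definitions at the top of this list translate directly into the usual metrics. The sketch below is a minimal illustration with made-up labels; `classification_metrics` is a hypothetical helper name, and it assumes binary labels where 1 means "positive".

```python
def classification_metrics(y_true, y_pred):
    """Count the confusion-matrix cells and derive the standard binary metrics."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # predicted positive, actually positive
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # predicted negative, actually negative
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted positive, actually negative
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # predicted negative, actually positive
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,       # also called sensitivity
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }

# Made-up example: 3 actual positives, 5 actual negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

Here accuracy is 0.75 while precision and recall are both 2/3, which is the gap between accuracy and precision that several of the links above discuss.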
Natural Language Understanding with Distributed Representation
Repository for PyCon 2016 workshop Natural Language Processing in 10 Lines of Code
Deep Learning the Stock Market
NLP: Everyday, Analytical & Unusual Uses
Welcome to Railroad Diagram Generator! BNF rule to diagram
Awesome-Korean-NLP
Awesome-korean-nlp
Is Google Hyping it? Why Deep Learning cannot be Applied to Natural Languages Easily
ratsgo.github.io/blog/categories
- Recent research trends in deep learning-based NLP techniques
A deep learning guide for NLP
Information Extraction with Reinforcement Learning
Last Words: Computational Linguistics and Deep Learning
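- Work by Rumelhart and McClelland on the PDP (connectionist) side - neural network-based models of semantics
- Cognitive science research on human language - from the perspective of how language is learned and how concepts are organized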
Computational Linguistics and Deep Learning
4 APPROACHES TO NATURAL LANGUAGE PROCESSING & UNDERSTANDING
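- Distributional: the currently popular ML approach. It can broaden coverage but does not capture depth
- Frame-based: Marvin Minsky. Strong at logical semantics. The drawback is that firm supervision must exist
- Model-theoretical: strong at Q/A and rich semantics. Even more labor-intensive and narrow in scope than frame-based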
- Interactive learning: language as a cooperative game between speaker and listener
  - Syntax – what is grammatical? : “no compiler errors”
  - Semantics – what is the meaning?: “no implementation bugs”
  - Pragmatics – what is the purpose or goal?: “implemented the right algorithm.”
Deep Learning for Text Understanding from Scratch
How to get started in NLP
NATURAL LANGUAGE GENERATION
NLP for Korean
- nlp4kor
- CNN for MNIST
- CNN for MNIST #1
- CNN for MNIST #2
- FFNN for Korean word spacing
- DAE for spelling error correction
Teaching Machines to Describe Images with Natural Language Feedback
Sang-Kil Park's Jupyter Notebooks
An Adversarial Review of “Adversarial Generation of Natural Language”
Deep Learning for Speech and Language
deep learning nlp best practices
Natural Language Processing in Artificial Intelligence is almost human-level accurate. Worse yet, it gets smart!
Language Emergence
Speech and Language Processing (3rd ed. draft)
Memory Augmented Neural Networks for Natural Language Processing
EMNLP 2018 attendance report
EMNLP 2017
Natural Language Processing Tasks and Selected References
Linguistics fundamentals for NLP
- Discourse analysis
- Pragmatics
- Semantics
- Syntax
- Phrases and sentences
- Morphology
- Word formation
- The origin of language
Deep Learning for NLP, advancements and trends in 2017
Deep NLP: NLP with deep learning
AI: NLP
ML/NLP PUBLICATIONS IN 2017
Experiments Codes for Bi-directional Block Self-attention
- Bi-Directional Block Self-Attention for Fast and Memory-Efficient Sequence Modeling
- Splits the given sequence into several blocks, models local context with an intra-block SAN, then models long-range dependencies with an inter-block SAN
- Addresses the excessive memory use of the conventional Self-Attention Network (SAN)
- Self-attention techniques appear to have become standard in many NLP areas (especially translation), with active follow-up research
  - (e.g. Non-autoregressive transformer, Masked self-attention, Directional self-attention)
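The block idea described above (attention inside each block, then attention across blocks) can be sketched without any learned parameters. This is a simplified illustration, not the paper's architecture: it uses plain dot-product attention, summarizes each block by mean pooling, and adds the block-level context back to every token, all of which are assumptions made for brevity. `block_self_attention` and the toy input are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Plain scaled dot-product self-attention with no learned weights."""
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in vectors]
        weights = softmax(scores)
        outputs.append([sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)])
    return outputs

def block_self_attention(sequence, block_size):
    dim = len(sequence[0])
    # 1) Split the sequence into blocks.
    blocks = [sequence[i:i + block_size] for i in range(0, len(sequence), block_size)]
    # 2) Intra-block attention models local context.
    local = [self_attention(block) for block in blocks]
    # 3) Summarize each block (mean pooling is a simplification).
    summaries = [[sum(tok[i] for tok in block) / len(block) for i in range(dim)] for block in local]
    # 4) Inter-block attention models long-range dependencies.
    block_context = self_attention(summaries)
    # 5) Add each block's global context back to its tokens.
    return [
        [t + c for t, c in zip(token, context)]
        for block, context in zip(local, block_context)
        for token in block
    ]

tokens = [[float(i), 1.0] for i in range(6)]  # toy sequence of six 2-d vectors
result = block_self_attention(tokens, block_size=2)
```

With sequence length n and block size b, the attention score tables here are b×b per block plus one (n/b)×(n/b) table, rather than a single n×n table, which is where the memory saving comes from.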
Understanding and Applying Self-Attention for NLP - Ivan Bilan
How to solve 90% of NLP problems: a step-by-step guide
NLP with Python
Text Analysis Developers' Workshop 2018 attendance report
Text Analysis in Excel: Real world use-cases
Auto Tagging Stack Overflow Questions
A Neural Network Model That Can Reason - Prof. Christopher Manning
- Compositional Attention Networks for Machine Reasoning
NLP with attention
Team AURA - 1st Meeting Summary
Introduction to Clinical Natural Language Processing: Predicting Hospital Readmission with Discharge Summaries
Feature-wise transformations - A simple and surprisingly effective family of conditioning mechanisms
RNN and beam search
NLP with deep learning
Understanding Hangul in Unicode 2.0
Decomposing Hangul Unicode syllables into jamo
PyConKr 2018 Why I learn, How I learn
Analogy and Analogical Reasoning
Languages that deep learning has not explored, and five tasks
How NLP is Automating the complete Text Analysis Process for Enterprises?
Can reinforcement learning be used for NLP? (The reward sparsity problem and ways around it)
NLP's ImageNet moment has arrived
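- Concludes that it is only a matter of time; the arrival of BERT brought it closer to reality (ELMo - LSTM / OpenAI's GPT, BERT - Transformer)
- Fine-tuning pre-trained models is essential; the implication is that human language understanding is nothing more than a huge amount of computation (is that really so?)
- The right direction may now be to increase computation and make it faster, rather than to reduce it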
github.com/warnikchow
- DLK2NLP: Day-by-day Line-by-line Keras-based Korean NLP
  - 3i4K - Intonation-aided intention identification for Korean
- KorInto - 5-class sentence-final intonation classifier for a syllable-timed and head-final language (Korean)
- KorEmo - 5-class Korean emotion classifier
- raws - Real-time Automatic Word Segmentation (for user-generated texts), Korean-English noisy text segmentation
NLP Guide: Identifying Part of Speech Tags using Conditional Random Fields
Industrial strength Natural Language Processing
Beating the state of the art in NLP with HMTL
Analyzing open-ended text? It's easier than you think!
Fast Word Segmentation of Noisy Text
Solving NLP task using Sequence2Sequence model: from Zero to Hero
Natural Language Processing is Fun! How computers understand Human Language
NLP 2018 highlights
Deep learning NLP - from RNN to BERT
Deep learning NLP - YouTube
Natural Language Processing in Python
A Practitioner's Guide to Natural Language Processing (Part I) — Processing & Understanding Text Proven and tested hands-on strategies to tackle NLP tasks
The 7 NLP Techniques That Will Change How You Communicate in the Future
- (Part I)
- (Part II)
Natural Language Understanding benchmark
- NLU / Intent Detection Benchmark by Intento, August 2017
Can't adding a cola be easier? (building search over decomposed initial/medial/final jamo for menu search)
Machine Learning with Python: NLP and Text Recognition
Deploying Handwritten Text Recognition Using Tensorflow and CNN
I build my ideas #8 - 07/19/20 - I build my ideas from Jordan Singer
Text generation with a Variational Autoencoder
Sentence Simplification with Seq2Seq
seq2seq.ipynb - Colaboratory
Integrating Transformer and Paraphrase Rules for Sentence Simplification
How Transformers Work
Implementing Transformer (Attention Is All You Need) (1/3)
Implementing Transformer (Attention Is All You Need) (2/3)
Implementing Transformer (Attention Is All You Need) (3/3)
Transformer - Harder, Better, Faster, Stronger - a look at the Transformer architecture and techniques for improving it
Google AI Reformer: the efficient Transformer (ipynb)
Transformer: a novel neural network architecture for language understanding
How-to Build a Transformer for Language Classification in TensorFlow
NLP paper implementation: Transformer (Attention is All You Need) in PyTorch – Hansu Kim
tta: Transformer-based Text Auto-encoder (T-TA) using TensorFlow 2
Transformers Explained Visually (Part 1): Overview of Functionality | by Ketan Doshi | Towards Data Science
Transformers Explained Visually (Part 2): How it works, step-by-step | by Ketan Doshi | Towards Data Science
Transformer in CV. The increasing convergence of computer… | by Cheng He | Towards Data Science
Generative Python Transformer p.1 - Acquiring Raw Data - YouTube
Generative Python Transformer p.2 - Raw Data Cleaning - YouTube
Generative Python Transformer p.3 - Preprocessing Dataset - YouTube
Generative Python Transformer p.4 - Tokenizing - YouTube
Generative Python Transformer p.5 - Training and some testing of GPT-2 model - YouTube
Generative Python Transformer p.6 - Testing larger model - YouTube
Sentdex/GPyT · Hugging Face
- GPyT - Generative Python Transformer Model released (the off-brand Github Copilot) - YouTube
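Pretrained Transformers as universal computation engines
2021-dialogue-summary-competition: repo sharing the dialogue summarization training and inference code of team 알라꿍달라꿍 from the dialogue summarization track of the 2021 Hunminjeongeum Korean speech and natural language AI competition
Types and analysis of position encoding, by 박승원 (http://swpark.me/) | by Team Deepest | Feb, 2021 | Medium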
10 Exciting Ideas of 2018 in NLP
Talk Powerpoint Generator
Techniques for improving recurrent neural network performance on natural language and sequences, part one
Natural language processing of customer reviews
NLP word representation
Introduction to Natural Language Processing (NLP) and Bias in AI
nlp_applications ipynb
NLP News By Sebastian Ruder
NLP 101: a repository of resources for learning deep learning and NLP
Natural Language Processing RoadMap - 2019
Nlp Roadmap
NLP Year in Review — 2019
NLP Highlights - Allen Institute for Artificial Intelligence
SKC_Text_Preprocessing - SKC text preprocessing lecture
한국어 전처리.ipynb - Colaboratory
PRODUCTIONIZING NLP MODELS
PowerPoint slides for learning deep learning NLP (Deep Learning for Natural Language Processing)
Distilling knowledge from Neural Networks to build smaller and faster models
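Getting started with a Japanese NLP project
Natural language processing (NLP): what is it, and what are its technology and market?
Event comment analysis for planners and marketers - feat. the Inflearn New Year's resolution event
- Text data analysis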
NLU sense
A no-frills guide to most Natural Language Processing Models — The Pre-LSTM Ice-Age — (R)NNLM, GloVe, Word2Vec & fastText
Natural Language Processing(NLP) Real World Project in Web Using Flask:- Himanshu Tripathi
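Building a Korean-speaker verification interface using the Cyworld-style sentimental writing style
Project: analyzing consumer reactions through Naver Smart Store purchase reviews (text analysis)
NLP 100 Exercises 2020 (Rev 1) - NLP100 2020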
Text-to-SQL Learning to query tables with natural language
CleanBot 2.0: a context-aware AI for detecting abusive (short) comments (ELMo)
Semantic Segmentation PyTorch Tutorial & ECCV 2020 VIPriors Challenge participation write-up
- semantic-segmentation-tutorial-pytorch: A simple PyTorch codebase for semantic segmentation using Cityscapes
awesome-semantic-segmentation: awesome-semantic-segmentation
Developing an OpenChat clean score model with machine learning - LINE ENGINEERING
badword_check: a project for classifying Korean profanity with deep learning
Weekly NLP - jiho-ml
Automate Data Cleaning with Unsupervised Learning | by Marco Cerliani | Towards Data Science
Knowledge Graphs in Natural Language Processing @ ACL 2020 | by Michael Galkin | Towards Data Science
Introducing AttnIO, a model that explores paths in a knowledge graph
AI Grand Challenge: first-place write-up and description of the winning model (speech recognition + text classification)
ML and NLP Research Highlights of 2020
An overview of NLU research trends, 2018-2020
Dealing with bias and other harms in natural language generation - ITWorld Korea
Best Practices: Designing autosuggest experiences
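- Designing the user experience of autosuggest | GeekNews
'It answers in the persona of an object': Google shows off its AI capabilities at its developer conference - CIO Korea (LaMDA, MUM)
Deep learning techniques for changing text style | Kakao Enterprise AI Research (text style transfer)
Posts in the 'Big Data/Big Data with Python' category (building a web search engine)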
How MDN's autocomplete search works - Mozilla Hacks - the Web developer blog
- How MDN implements search autocomplete | GeekNews
Making the LINE household ledger more convenient and special with NLP, OCR, and machine learning - LINE ENGINEERING
Machine Learning Won't Solve Natural Language Understanding NLU
NLP in Fintech. Introduction | by FinTech MK | Sep, 2021 | Medium
Selecting optimal subsets of Amazon Reviews & Large Scale Data Pipeline for Scraping Amazon Reviews - YouTube
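Boost Customer Experience With NLP | LinkedIn (not a technical piece, but about raising a product's value with NLP)
"Linguists know how to cut the cost of developing hyperscale AI"... interview with Prof. 박진호 of Seoul National University - AI Times
essay-grading-hackathon: 🥇1st-place solution, essay text AI training data hackathon
NLP and HR analytics
Machine learning and NLP research highlights of 2021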
Using Kubeflow to solve natural language processing problems
Document understanding and multi-modal embedding for information extraction (DRAMA&COMPANY AI Lab)

Word spacing

Automatic Korean word spacing with machine learning
A word spacing model using eojeol unigrams
Sentence boundary disambiguation
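Let's train automatic Korean word spacing with python-crfsuite
Automatic Korean word spacing with an RNN
Building a word spacing model that adapts to conversational text
- A word spacing model from Pingpong that works well with chat-style text
Release of a deep learning-based automatic Korean word spacing API
- Performance improvements to the deep learning automatic Korean word spacing model, and an API update
Taking on a Korean word spacing program
korean-spacing-model: a model for spacing Korean sentences (deleting/adding spaces), written so that you can train it yourself after preparing data
- Writing a Korean word spacing model – Jeong Ukjae
KoSpacing: release of an automatic Korean word spacing package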
- KoSpacing - R package for automatic Korean word spacing
soyspacing. Heuristic Korean Space Correction, A safer space corrector

Annotation

Korean Treebank Annotations Version 2.0
- sample EUC-KR encoded
brat rapid annotation tool online environment for collaborative text annotation
- brat rapid annotation tool (brat) - for all your textual annotation needs
doccano: Open source text annotation tool for machine learning practitioner

BERT

Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT TensorFlow code and pre-trained models for BERT
bert-as-service - Mapping a variable-length sentence to a fixed-length vector using pretrained BERT model
BERT – STATE OF THE ART LANGUAGE MODEL FOR NLP
Language Learning with BERT - TensorFlow and Deep Learning Singapore
BERT-NER - Use google BERT to do CoNLL-2003 NER !
BERT-BiLSTM-CRF-NER - Tensorflow solution of NER task Using BiLSTM-CRF model with Google BERT Fine-tuning
Notes on applying NER with BERT :: MezzanineX
Dissecting BERT
Bert state Of The Art pre Training for nlp Post
bert-multiple-gpu - A multiple GPU support version of BERT
NVIDIA Achieves 4X Speedup on BERT Neural Network
A close look at BERT
The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning)
SQuAD 2.0 and BERT (2)
Multi-label Text Classification using BERT – The Mighty Transformer
Multi-GPU Ready BERT
BERT paper summary
Visualization tool for Transformer-based language representation models (demonstrated on BERT)
Guide KorQuAD upload to leaderboard (EM 68.947 / F1 88.468) model which only use BERT-multilingual(single) https://korquad.github.io
Transformer-Encoder-with-Char
Language Model Overview: From word2vec to BERT
BERT Explained: State of the art language model for NLP
Efficient Training of Bert by Progressively Stacking
- Source code for "Efficient Training of BERT by Progressively Stacking"
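How can KakaoTalk data be cleaned? - Building Dialog-BERT, part 1
Who does it best? Finding a tokenizer that suits conversational text - Building Dialog-BERT, part 2
Can BERT be trained well on KakaoTalk conversation data? - Building Dialog-BERT, part 3
Can we generate replies that reflect the conversation's context? - Building Dialog-BERT, part 4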
A Simple Guide On Using BERT for Binary Text Classification
Fast implementation of BERT inference directly on NVIDIA (CUDA, CUBLAS) and Intel MKL
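How to train on KorQuAD with ETRI's Korean BERT model in a multi-GPU environment
- nlp-api - a Mecab morphological analyzer API built for use with ETRI KoBERT
AI needs to study Korean too! The story of KorQuAD 2.0, Korea's only Korean dataset
A thorough, easy-to-follow review of the XLNet paper
Training XLNet on Korean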
Google Brain’s XLNet bests BERT at 20 NLP tasks
XLNet through actual code (code review)
Introducing MASS – A pre-training method that outperforms BERT and GPT in sequence to sequence language generation tasks
Presentation slides explaining BERT
PyCon 2019: building a smart everyday-conversation AI with 10 billion KakaoTalk messages
Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT
A training algorithm for language models that surpasses GPT-3 emerges - AI Times
More on Transformers: BERT and friends
GLUE: understanding BERT through benchmarks
2020.02.06 Why did we drop glue?
StructBert Review
Using BERT For Classifying Documents with Long Texts
One year in an AI organization
Implementing BERT (Bidirectional Encoder Representations from Transformers) (1/2)
Implementing BERT (Bidirectional Encoder Representations from Transformers) (2/2)
BERT-related Papers
Implementing Q&A with BERT, with SQuAD and Keras
주정헌 - Revealing the Dark Secrets of BERT - YouTube
Pruning BERT parameters with the nn.utils.prune module
Movie Reviews with bert-for-tf2 on TPU.ipynb - Colaboratory
BERT for Sentiment Analysis on Sustainability Reporting
Training BERT from scratch on a TPU in Colab - Tensorflow/Google ver. - Beomi's Tech blog
- Public version: pretraining KcBERT from scratch on a TPU in Colab, with Korpora - Colaboratory
PyCon2020 NLP beginner's BERT challenge
Filtering neighborhood-life posts with deep learning. How we developed a BERT-based model for filtering neighborhood-life posts… | by matthew l | 당근마켓 team blog | Medium
Using BERT to Battle Job Scams. The BERT model has many practical… | by Sadrach Pierre, Ph.D. | Towards Data Science
Pydata Berlin Meetup October 2020: Long Story Short: - YouTube
An easy way to check sentence similarity algorithms in Python
deep learning NLP easy to understand BERT - YouTube
Why and how to use BERT for NLP Text Classification? - Analytics Vidhya
ALBERT Review
- ALBERT: self-supervised learning of language representations
Bart : Denoising Sequence-to-Sequence Pre-training for Natural Langua…
deberta-v2-xlarge-mnli · Hugging Face
exBERT - A Visual Analysis Tool to Explore Learned Representations in Transformers Models
HanBert-54kN
A simple implementation with Keras-Bert (94% accuracy) - DACON
KeyBERT: Minimal keyword extraction with BERT
- KeyBERT.ipynb - Colaboratory
KoBART: Korean BART Bidirectional and Auto-Regressive Transformers, a Korean encoder-decoder language model
- Korean BERT pre-trained cased (KoBERT)
- KoBART-summarization: Summarization module based on KoBART
- kobart-transformers: kobart on huggingface transformers
Korean ALBERT
KoreanCharacterBert - Korean BERT model using character tokenizer
korpatbert: KorPatBERT, a Korean AI language model for the patent domain
ko-sentence-transformers: using Korean BERT models with the sentence-transformers library
KR-BERT-SimCSE: Implementing SimCSE using KR-BERT
- SimCSE review & implementing it with KR-BERT – Jeong Ukjae
MT-DNN Review
publicservant_AI
RoBERTa Review
- Decoding-Enhanced BERT with Disentangled Attention Paper explained - YouTube
SBERT Basic NLP: how to train SBERT with the sentence-transformers library
soongsil-bert-base-nsmc.ipynb - Colaboratory
TinyBERT
XLNet: Generalized Autoregressive Pretraining for Language Understanding(19.06.25)
- A Simple Explanation of XLNet

Book

Neural Network Methods for Natural Language Processing
- A Primer on Neural Network Models for Natural Language Processing
Quantitative corpus linguistics with R: a practical introduction
Speech and Language Processing (3rd ed. draft)
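Introduction to NLP with deep learning
- tensorflow-nlp-tutorial: a deep learning NLP repository collecting TensorFlow code, from text preprocessing to downstream tasks with recent models such as BERT and GPT
Deep learning starting from speech recognition
Recommended NLP websites, video courses, and books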
7 Best Natural Language Processing Books In 2020
7 Best Natural Language Processing Books In 2021
practical-nlp: Official Repository for 'Practical Natural Language Processing' by O'Reilly

ChatBot

Building a chatbot with HipChat
DEEP LEARNING FOR CHATBOTS, PART 1 – INTRODUCTION
- Deep learning chatbots, PART 1 – INTRODUCTION (Korean translation)
DEEP LEARNING FOR CHATBOTS, PART 2 – IMPLEMENTING A RETRIEVAL-BASED MODEL IN TENSORFLOW
- Deep learning chatbots, PART 2 – IMPLEMENTING A RETRIEVAL-BASED MODEL IN TENSORFLOW (Korean translation)
Deep learning for Chatbot Developers
- DL-for-Chatbot
Clippy’s Back: The Future of Microsoft Is Chatbots
Build a bot without coding - Launch a full-featured chatbot in 7 minutes
Microsoft Bot Framework
People make chatbots
- Microsoft Bot Framework courses
- 20180120Hands_on_Lab
Building a Slack bot with AWS Lambda and API Gateway
Your next shopping experience starts with a text
Join the AWS serverless chatbot competition!
The White House's New Facebook Messenger Bot Makes It Easy To Send A Message To Obama
Wonder is a bot that will remember anything for you
Introducing the Bots Landscape: 170+ companies, $4 billion in funding, thousands of bots
Deep and wide deep learning for intelligent conversation, PyCon APAC 2016
- PyCon 2016's TensorFlow materials
- 1. A model that understands images (photos of human faces) and generates them itself
  - carpedm20.github.io/faces
- github.com/carpedm20/DCGAN-tensorflow
- A Turing machine built with neural networks
- Question Answering, Language Model
- Teaching Machines to Read and Comprehend
- Neural Variational Inference for Text Processing
Stanfy Blog
- Advanced Natural Language Processing Tools for Bot Makers – LUIS, Wit.ai, Api.ai and others
- The Rise of Chat Bots: Useful Links, Articles, Libraries and Platforms
- Know Your Bot, Part II: Slack, The Bot Paradise
- Know Your Bot, Part I: Telegram And Twitter
- s2 lab1-1: API.ai concept and terms
- s2 lab1-2: API.ai making bot demo
Multi-domain Neural Network Language Generation for Spoken Dialogue Systems(NAACL-HLT 2016)
- code
A chatbot built without coding
Do-it-yourself NLP for bot developers
Making Friends With Artificial Intelligence: Eric Horvitz at TEDxAustin
Fourth Industrial Revolution special feature: 'The Showdown with Machines', part 2
Facebook steps in to prove the value of chatbots with Tommy Hilfiger
The rise of bots... acquisitions!
Like a Poncho: a JiveScript weather chatbot
Developing a Korean chatbot on your own
ChatFlow, a chatbot development framework, launches in beta
Build a restaurant reservation Messenger bot using IBM Watson with no code
DeepQA - My tensorflow implementation of "A neural conversational model", a Deep learning based chatbot
- Deep Q&A
Getting started with chatbots
Challenges in designing conversational chatbots
A developer's guide to chatbots
UX Bookmark #10. Chatbot A-Z
TF-KR Conf 2, lecture 2: 조재민, Developing Korean chatbot 101
Developing Korean Chatbot 101
Retrieval-Based Conversational Model in Tensorflow (Ubuntu Dialog Corpus)
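20170121 Korea Artificial Intelligence Association - 7th open seminar - chatbots (1/5)
20170121 Korea Artificial Intelligence Association - 7th open seminar - chatbots (2/5)
20170121 Korea Artificial Intelligence Association - 7th open seminar - chatbots (3/5)
20170121 Korea Artificial Intelligence Association - 7th open seminar - chatbots (4/5)
20170121 Korea Artificial Intelligence Association - 7th open seminar - chatbots (5/5)
Analysis of the global chatbot ecosystem
20170227 Building a chatbot with Python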
KahWee Teng: Coding Chat Bots - JSConf.Asia 2016
Building a Kakao bot with Node.JS
Implementing a campus cafeteria menu bot with the KakaoTalk auto-reply API
Building a KakaoTalk bot with the KakaoTalk auto-reply API
The Conversational Intelligence Challenge
Visual Dialog - a novel task that requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content
- Visual Dialog Challenge 2018
Natural Language Pipeline for Chatbots
Contextual Chat-bots with Tensorflow
How To Build an Interactive Chatbot for Twitter Direct Messages
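Slack bots: how far have you taken yours?
An easy KakaoTalk chatbot with Watson, 1: building an AI conversation service with the Watson Conversation service
Node.js Facebook chatbot quick start: building a 369 bot
Two AI models for building chatbots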
Deal or no deal? Training AI bots to negotiate
- Deal or No Deal? End-to-End Learning for Negotiation Dialogues pytorch
NLP techniques for developing dialogue systems
Creating AnswerBot with Keras and TensorFlow (TensorBeat)
- tensorbeat-answerbot
Build a chatbot in 30 minutes, part 1
Build a chatbot in 30 minutes, part 2
Own ChatBot Based on Recurrent Neural Network
Chatbots: Theory and Practice
Show me red! – feat. building a chatbot with Seoul Museum of Art data
AI chatbot development with Python and Tensorflow, and applying it in practice
Building an "AI" retail chatbot
Seq2Seq Chatbot
Building a Facebook chatbot
Neural Network Dialog System Papers
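Reinforcement learning chatbot: Dialogue Policy Optimization
A summary of chatbot development methods using deep learning
AI chatbot development with Python and Tensorflow, lecture 4
- AI chatbot development with Python and Tensorflow, and applying it in practice
- Let's build a simple Q/A bot with Seq2Seq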
Automated Text Classification Using Machine Learning
sunwoobot - 선우봇, a Kakao i Open Builder chatbot
A Repository of Conversational Datasets (released by PolyAI): hundreds of millions of conversation examples collected from Reddit, OpenSubtitles, AmazonQA, and others
AI chatbots and deep learning NLP
Not another Conversational AI report
How do Dialogue Systems decide what to say or which actions to take?
Review of the paper on Google's open-domain chatbot 'Meena'
Build a WhatsApp Chatbot With Ruby, Sinatra, and Twilio
Dialogue Generation
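Building an analytics system for an everyday-conversation chatbot, part 1 - data pipeline – Pingpong team blog
Building an analytics system for an everyday-conversation chatbot, part 2 - data visualization – Pingpong team blog
How should conversations for a chatbot be designed?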
Recipes for building an open-domain chatbot
How We Built Our In-house Chat Platform for the Web
How We Improved Agent Chat Efficiency with Machine Learning
Exploring AI models #1, prologue: landing on the moon of AI | by AI Network | AI Network_KR | Mar, 2021 | Medium
Day 1 - notes from a deep learning chatbot study group - tokenizing, embedding, text similarity : Naver Cafe
Using Semantic Search to Drive Smart Annotations for Chatbot Models | by Samarth Agarwal | DBS Tech Blog | Jan, 2022 | Medium
Botkit - Building Blocks for Building Slack Bots
bots.duolingo.com
Chatbot
Dialog System - http://nlp.postech.ac.kr/research/dialog_system
Heek is a chatbot that can build you a website
Kino - My Personal Assistant (a Quantified Self project through a personal Slack bot)
www.luis.ai
Plato Research Dialogue System
- Introducing the Plato Research Dialogue System: A Flexible Conversational AI Platform
Building a Slack bot with slacker
Stephanie - YOUR VIRTUAL ASSISTANT!
- Stephanie Virtual Assistant
- Stephanie - an open-source platform built specifically for voice-controlled applications as well as to automate daily tasks imitating much of a virtual assistant's work
- SOUNDER ALGORITHM
- Sounder API - the Sounder Library API, which is an abstraction of the Sounder Algorithm
- USAGE
wit.ai
- Wit.ai stories/conversational app demo
x.ai is a personal assistant who schedules meetings for you

ChatBot Python

Building a Slack bot with Slacker
Building AI Chat bot using Python 3 & TensorFlow
- Chat bot making process using Python 3 & TensorFlow
- 신정규 : Creating AI chat bot with Python 3 and Tensorflow - PyCon APAC 2016
- Scripts used for preparing PyCON APAC 2016 presentation https://speakerdeck.com/inureyes/building-ai-chat-bot-using-python-3-and-tensorflow
Create a Chatbot for Telegram in Python to Summarize Text
Using a Telegram bot from Python
Putting a Telegram bot to use with Python
- 1 Basic setup
- 2 Channels
- 3 Chatbot
- 4 Inline Keyboard
Learn to build your first bot in Telegram with Python
Building a Telegram Bot 🤖 to Automate Web Processes Using Python, Selenium and Telegram
KakaoTalk conversation generator (http://jsideas.net/python/2017/04/05/kakao_rnn.html)
Building a botnet on PyPi
ChatOps with PowerShell - Matthew Hodgkins
Let Android dream electric sheep: Making emotion model for chat-bot with Python3, NLTK and TensorFlow
Building a Simple Chatbot from Scratch in Python (using NLTK)
A Transformer Chatbot Tutorial with TensorFlow 2.0
How To Make a Chatbot in Python | Python Chat Bot Tutorial
Building a chat program : Naver Blog
Build a Collaborative Chatbot with Google Sheets and TensorFlow | Jonathan Bgn
Build A Simple Chatbot In Python With Deep Learning | by Kurtis Pykes | Mar, 2021 | Towards Data Science
Blender, Facebook State-of-the-Art Human-Like Chatbot, Now Open Source
- A state-of-the-art open source chatbot
- Blender Bot 2.0: An open source chatbot that builds long-term memory and searches the internet
  - Facebook releases Blenderbot 2.0 | GeekNews
- Facebook Open-Sources BlenderBot 2.0 Chatbot
dialogpt-chat: Chatting with DialoGPT (Large-Scale Generative Pre-training for Conversational Response Generation)
- P.1 Chatbot with Mic input/Speaker output using Python, Jarvis, and DialoGPT - YouTube
- P.2 Chatbot with Mic input/Speaker output using Python, Jarvis, and DialoGPT - YouTube
- Microsoft Releases DialogGPT AI Conversation Model
kochat: Opensource Korean chatbot framework based on deep learning
openchat: Opensource chatting framework for generative models
- Exploring AI models #2: what is a chatbot? Open chat using NLP techniques | by AI Network | AI Network_KR | Apr, 2021 | Medium
Parrot: A practical and feature-rich paraphrasing framework to augment human intents in text form to build robust NLU models for conversational engines
- To build a chatbot you need data for your intent classification. But what if you have too little? Paraphrasing is one option for augmentation. But what is a good paraphrase?
- Almost all conditioned text generation models are validated on 2 factors:
  1. If the generated text conveys the same meaning as the original context (Adequacy)
  2. If the text is fluent / grammatically correct English (Fluency)
- For instance Neural Machine Translation outputs are tested for Adequacy and Fluency
- But a good paraphrase should be adequate and fluent while being as different as possible on the surface lexical form. With respect to this definition, the 3 key metrics that measure the quality of paraphrases are:
  1. Adequacy: Is the meaning preserved adequately?
  2. Fluency: Is the paraphrase fluent English?
  3. Diversity: Lexical / Phrasal / Syntactical → how much has the paraphrase changed the original sentence?
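Of the three metrics above, diversity is the easiest to approximate without a model. The sketch below is a crude stand-in, not Parrot's own scoring: it assumes whitespace tokenization and measures diversity as one minus the Jaccard overlap of the two token sets. `lexical_diversity` and the example sentences are hypothetical. Adequacy and fluency would need a semantic-similarity or language model and are not covered.

```python
def lexical_diversity(original, paraphrase):
    """1 - Jaccard overlap of lowercase token sets: 0.0 means identical vocabulary."""
    source_tokens = set(original.lower().split())
    target_tokens = set(paraphrase.lower().split())
    if not source_tokens and not target_tokens:
        return 0.0
    overlap = len(source_tokens & target_tokens)
    union = len(source_tokens | target_tokens)
    return 1.0 - overlap / union

source = "how do i reset my password"
candidates = [
    "how do i reset my password",
    "how can i reset my password",
    "what is the way to change my password",
]

# Score every candidate and pick the one that differs most on the surface.
scores = {text: lexical_diversity(source, text) for text in candidates}
most_diverse = max(scores, key=scores.get)
```

A score like this only makes sense as a tie-breaker among candidates that already pass adequacy and fluency checks, since an unrelated sentence would also score as highly "diverse".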
RASA - Create assistants that go beyond basic FAQs
- Building a chatbot with Rasa
- Building a Conversational Chatbot for Slack using Rasa and Python -Part 1
- How to build a voice assistant with open source Rasa and Mozilla tools
- Rasa youtube channel
- GPT-3 vs. Rasa chatbots. Comparing the performance of GPT-3 and… | by Mark Ryan | Aug, 2020 | Towards Data Science
- Building your first chatbot in Python - Rachael Tatman | PyData Jeddah - YouTube
TextFeatureSelection

Classification

Bag-of-words model
Implementing a CNN for Text Classification in TensorFlow
- Convolutional Neural Network for Text Classification in Tensorflow
- IMPLEMENTING A CNN FOR TEXT CLASSIFICATION IN TENSORFLOW (한글 번역)
- CNNs for sentence classification
- 합성곱 신경망(CNN) 딥러닝을 이용한 한국어 문장 분류
- MIT 6.S191 Lecture 2: Sequence Modeling with Neural Networks
Free Code Friday - Better and Faster Machine Learning Classifiers in Python
Time series classification
“What is Relevant in a Text Document?”
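- For example, when building a document classifier from category-labeled news training data,
- it is hard to trace back how much each word in a document contributed to the predicted class (unless something like Maximum Entropy is used)
- This paper presents a method for tracing that back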
Text Classification using Neural Networks
Text Classification using Algorithms
Text Classifier Algorithms in Machine Learning
Tensorflow Text Classification – Python Deep Learning
lime
On Building a “Fake News” Classification Model *update
scalawox fakenews
Automated Text Classification Using Machine Learning
TRAIN ONCE, TEST ANYWHERE: ZERO-SHOT LEARNING FOR TEXT CLASSIFICATION
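- Zero-shot learning: building a text classification model without training data
  - Zero-shot learning is a way to infer the members of a dataset without training on it
  - It is mostly achieved through some form of transfer learning, in which knowledge acquired from one dataset is applied to another
- Several zero-shot learning methods have been proposed for vision tasks, where knowledge from the ImageNet dataset can be reused for new tasks, but this is presented as the first for text classification
  - It learns the relationship between sentences and their categories from a large, noisy dataset, then generalizes to new categories or new datasets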
- TRY OUR CUSTOM CLASSIFIER DEMO
Alisa Dammer - Baby steps in short-text classification with python
Actionable and Political Text Classification Using Word Embeddings and LSTM
Pycon Ireland 2017: Text Classification with Word Vectors & Recurrent Neural Networks - Shane Lynn
Machine Learning - Text Classification with Python, nltk, Scikit & Pandas
Introduction to Natural Language Processing with Python - Asyncjs
Patrick Harrison | Modern NLP in Python
Advanced Python 2: Advanced Text Processing
Creating a simple text classifier using Google CoLaboratory - how to build a simple binary text classifier with scikit-learn in the Google CoLaboratory environment
Text Classification with TensorFlow Estimators
Multi-Class Text Classification with Scikit-Learn
Multi Label Text Classification with Scikit-Learn
Recurrent Neural Network for Text Classification
Introducing state of the art text classification with universal language models
Evaluating Classifiers: Confusion Matrix for Multiple Classes
The last 3 years in Text Classification
CNN으로 문장 분류하기
Introducing Custom Classifier – Build Your Own Text Classification Model Without Any Training Data
Practical Text Classification With Python and Keras
Multi-Class Text Classification with SKlearn and NLTK in Python | A Software Engineering Use Case
Tutorial on Text Classification (NLP) using ULMFiT and fastai Library in Python
Deep Transfer Learning for Natural Language Processing — Text Classification with Universal Embeddings
Democratizing NLP content modeling with transfer learning using GPUs - Sanghamitra Deb
The State of Transfer Learning in NLP
Using Transfer Learning for NLP with Small Data
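- For image classification, transfer learning has proven very effective, giving good accuracy on datasets with few labels
- Transfer learning is a technique for transferring knowledge learned on one dataset to another
- This project makes transfer learning easy to use for text classification and reaches 83% classification accuracy with only 500 IMDB movie reviews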
Adapters: A Compact and Extensible Transfer Learning Method for NLP
A Light Introduction to Transfer Learning for NLP | by Elvis | dair.ai | Medium
Develop a NLP Model in Python & Deploy It with Flask, Step by Step - Flask API, Document Classification, Spam Filter
NLP Classification Tutorial with PyTorch CBOW, CNN, DCNN, RNN, LSTM
Multi-Class Text Classification Using PySpark, MLlib & Doc2Vec
Intro to Text classification through tensorflow in Python
Using Doc2Vec to classify movie reviews
A Basic NLP Tutorial for News Multiclass Categorization
- Natural Language Processing, Support Vector Machine, TF-IDF, deep learning, spaCy, Attention LSTM
- Shows multi-class classification of text data in Python by identifying the news type from the headline and a short description
NLP 튜토리얼: 라벨링 없이 트위터 유저들을 자동으로 나누어보기
소설 작가 분류 AI 경진대회
- Baseline + 1D CNN - DACON
- Baseline + Bidirectional LSTM - DACON
How-to Build a Transformer for Language Classification in TensorFlow
TextFeatureSelection: Python library for feature selection for text features. It has filter method, genetic algorithm and TextFeatureSelectionEnsemble for improving text classification models. Helps improve your machine learning models

Clustering

dbscan
Finding Topics in Harry Potter using K-Means Clustering
언론사가 알아야 할 알고리즘
- ① k-means 클러스터링
- ② 협업 필터링 추천
Comparing different clustering algorithms on toy datasets
Density-Based Clustering
Text Clustering : Get quick insights from Unstructured Data 1
Text Clustering : Get quick insights from Unstructured Data 2
14 Great Articles and Tutorials on Clustering
The 5 Clustering Algorithms Data Scientists Need to Know
Understanding Hate Speech on Reddit through Text Clustering

Conference

JSALT 2019 Montréal: Dive into Deep Learning for Natural Language Processing
LangCon
- 발표소개 | LangCon
- 2020Langcon - YouTube
텐서플로 월드2019 행사 핵심요약 2. NLP가 대세입니다!
이 선 넘으면 침범이야 BEEP! - 문지형 - PyCon Korea 2020 - YouTube
Smart Use of Legal NLP | Dr. Benjamin Werthmann, RAILS PyData Südwest / Big Data BBQ - YouTube
Highly-Scalable NLP to Answer Questions on COVID-19 WhatsApp Hotline | PyData Global 2021 - YouTube
Natural Language Processing: Trends, Challenges and Opportunities | PyData Global 2021 - YouTube

Corpus

CORPORA AND OTHER LANGUAGE AND SPEECH DATA UNDER DICE
UTagger + KorpuSQL을 이용해서 코퍼스 구축하기
KorpuSQL 클릭만으로 간편하게 코퍼스 구축하기
PHP, MySQL 코퍼스를 통해 관련어 추출
인공지능 씨앗 한글 말뭉치, 2007년 멈춰선 까닭
④ 송철의 국립국어원장 "한국어 AI 시대의 기초는 말뭉치..제2의 세종계획 추진해야"
언제까지 포털 영어사전만 쓸 건가요? – 말뭉치(코퍼스)를 활용한 영어 글쓰기 기초 편
형태소 분석기와 Branching entropy를 활용한 비지도 신조어 탐색 – Ukjae Jeong (not a corpus, but related because it covers how to find neologisms)
Facebook, NYU expand available languages for natural language understanding systems
TextNet Linguist가 수행하는 대화자원구축 service
개체명 인식용 말뭉치
국어사전 데이터
표준국어대사전.csv
모두의 말뭉치
NIA(National Information Society Agency) Dictionary
Korean Parallel corpora (of https://sites.google.com/site/koreanparalleldata/)
koSentences - a large-scale web corpus of Korean text

Course MOOC Lecture

언어와 컴퓨터 (100.130)
- lecture/2021/LC at master · suzisuti/lecture · GitHub
자연어처리 특강 - YouTube
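Donkuk_AI_NLP_MachineTranslation - lecture materials on AI, natural language processing, and machine translation for the Division of English Language and Literature at Dongguk University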
A Primer on Neural Network Models for Natural Language Processing
List of free resources to learn Natural Language Processing
Learn Natural Language Processing
9 Best Tensorflow Courses & Certifications Online- Discover the Best One!
DeepMind x UCL | Deep Learning Lectures | 7/12 | Deep Learning for Natural Language Processing - YouTube
NLP Course | For You word embeddings, text classification, language modeling, seq2seq and attention
Best Natural Language Processing Courses Online in 2021-UPDATED
11 Best Natural Language Processing Courses Online- Bestseller in 2021
Computational Linguistics
CS224d: Deep Learning for Natural Language Processing
- DSBA CS224d
- CS224d 2017 video subtitles translation project for everyone
CS224n: Natural Language Processing with Deep Learning
- cs224n-winter17-notes
- CS 224N: TensorFlow Tutorial
- Lecture Collection | Natural Language Processing with Deep Learning (Winter 2017)
- CS224n: Natural Language Processing with Deep Learning | Winter 2019
CS224U: Natural Language Understanding
- Distributional word representations
Deep Learning for NLP
Deep Learning for Natural Language Processing: 2016-2017
- Oxford Deep NLP 2017 course
- Lecture 8 - Generating Language with Attention Chris Dyer
CS4650 and CS7650 ("Natural Language") at Georgia Tech
CS 447: Natural Language Processing
CS 20SI: Tensorflow for Deep Learning Research
YSDA Natural Language Processing course
- NLP_COURSE: A Deep Learning YSDA Natural Language Processing Course By GitHub

Data

Justin J. Nguyen: Exposing Dark Data in the enterprise with custom NLP | PyData Miami 2019
handwritten Hangul Datasets: PE92, SERI95, and HanDB
Building A Gigaword Corpus Lessons on Data Ingestion, Management, and Processing for NLP
Learning Deep Structured Semantic Models for Web Search using Clickthrough Data
Extracting Structured Data From Recipes Using Conditional Random Fields
The Big Bad NLP Database - Quantum Stat
Mimesis - a package for Python, which helps generate big volumes of fake data for a variety of purposes in a variety of languages
- Random-based by default: rather than generating values, it picks them from lists of possible values stored in JSON. Some values, such as hashes, are produced fully at random
- The API is designed to be simple, so you can write your own generator and mix it with the existing ones
- 1. Locale-dependent: Address, Business, Datetime, Food, Person, Science, Text
- 2. Locale-independent: Clothing, Code, Choice, Cryptographic, Development, File, Hardware, Internet, Numbers, Path, Structure, Transport, UnitSystem
- 3. Write your own generator by inheriting from something like BaseDataProvider
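The notes above describe a design rather than an API, so here is a minimal standard-library sketch of that design. It is not Mimesis code: `LOCALE_DATA`, `SketchProvider`, `PersonProvider`, and `TokenProvider` are hypothetical names, and the tiny tables stand in for the JSON files the library ships.

```python
import random

# Hypothetical locale tables; the real library keeps much larger ones in JSON.
LOCALE_DATA = {
    "en": {"first_names": ["Alice", "Bob"]},
    "ko": {"first_names": ["민준", "서연"]},
}


class SketchProvider:
    """Base class: holds a seeded random generator so output is reproducible."""

    def __init__(self, seed=None):
        self.random = random.Random(seed)


class PersonProvider(SketchProvider):
    """Locale-dependent provider: picks values from the table for its locale."""

    def __init__(self, locale="en", seed=None):
        super().__init__(seed)
        self.data = LOCALE_DATA[locale]

    def first_name(self):
        return self.random.choice(self.data["first_names"])


class TokenProvider(SketchProvider):
    """Locale-independent provider: builds a fully random hex string."""

    def hex_token(self, length=8):
        return "".join(self.random.choice("0123456789abcdef")
                       for _ in range(length))
```

A custom generator in this sketch is just another subclass of the base class, which mirrors the inheritance approach mentioned in the last note.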
The Pile

Disambiguation

Automatic disambiguation of English puns
Discovering Types for Entity Disambiguation

Doc2Vec

REDDIT 2 VEC - Use Doc2Vec to get SubReddit Suggestions

Filtering

집단지성프로그래밍 ch6. 문서 필터링

Knowledge

국가생물종지식정보시스템
:BaseKB Gold Ultimate is now available in AWS
- :BaseKB Gold Ultimate
Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources

Language Model LM

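Language modeling is a core problem for many natural language processing tasks, such as speech-to-text, conversational systems, and text summarization
Text Generation
- Text generation is a type of language modeling problem
- A well-trained language model learns the likelihood of a word occurring based on the preceding sequence of words in the text
- Language models can operate at the character level, n-gram level, sentence level, or paragraph level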
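The simplest n-gram-level case is a bigram model, which estimates the probability of a word from the single word before it. A minimal sketch, assuming whitespace tokenization and plain maximum-likelihood counts with no smoothing; the function names are illustrative and not taken from any of the linked resources.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count word bigrams, padding each sentence with <s> and </s> markers."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            counts[prev][word] += 1
    return counts


def next_word_probability(counts, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 if prev is unseen."""
    following = counts.get(prev, Counter())
    total = sum(following.values())
    return following[word] / total if total else 0.0
```

Without smoothing, any bigram missing from the training text gets probability zero, which is one reason practical models add smoothing or move to neural architectures.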
WHAT EVERY NLP ENGINEER NEEDS TO KNOW ABOUT PRE-TRAINED LANGUAGE MODELS
Language modeling a billion words
확률론적 언어 모형
Perplexed by Game of Thrones. A Song of N-Grams and Language Models
Character-Aware Neural Language Models
- Character-Aware Neural Language Models
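- Reaches state-of-the-art results with a character-level CNN and a highway network whose output feeds an LSTM
- Achieves high performance with far fewer parameters than earlier models, which suits settings such as mobile phones where model size matters
- No morphological tagging is needed for word embedding
- Outperforms earlier models on morphologically rich languages (low language dependence)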
14. 텐서플로우(TensorFlow)를 이용해서 언어 모델(Language Model) 만들기 – Recurrent Neural Networks(RNNs) 예제 2 – PTB(Penn Tree Bank) 데이터셋
How to Develop a Word Embedding Model for Predicting Movie Review Sentiment keras, word2vec
MUSE: Multilingual Unsupervised and Supervised Embeddings
Dynamic Meta Embeddings DME
LSTM and QRNN Language Model Toolkit
Generating Drake Rap Lyrics using Language Models and LSTMs
Recurrent Neural Networks: The Powerhouse of Language Modeling
Language Models are Open Knowledge Graphs .. but are hard to mine! | by Nikhil Dharap | Jan, 2021 | Towards Data Science
Large-scale LM에 대한 얕고 넓은 지식들(part 1) - YouTube
- season2/advanced at main · jiphyeonjeon/season2
Large-scale LM에 대한 얕고 넓은 지식들(part 2) - YouTube
Beauty Domain-Specific Pre-trained Language Model 개발하기 –
What Have Language Models Learned?
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest and Most Powerful Generative Language Model | NVIDIA Developer Blog
- MS와 Nvidia가 세계 최대규모 언어 모델 MT-NLG 530B를 발표 | GeekNews
Do large language models understand us? | by Blaise Aguera y Arcas | Dec, 2021 | Medium
Ecco - Look Inside Language Models
- Jay Alammar - Take A Look Inside Language Models With Ecco | PyData Khobar - YouTube
GSLM
- 텍스트 없는 자연어처리?... 음성 인공지능 NLP 시대 열어, 페이스북 AI ‘생성적 화자 언어 모델’ 오픈 소스로 공개 https://www.aitimes.kr/news/articleView.html?idxno=22445
- Textless NLP: Generating expressive speech from raw audio
KLUE Benchmark
- KLUE-benchmark/KLUE: 📖 Korean NLU Benchmark
- klue-transformers-tutorial: KLUE 데이터를 활용한 HuggingFace Transformers 튜토리얼
- KLUE 한국어 데이터 셋 | GeekNews
- NIKL-KLUE: 모두의 말뭉치 인공 지능 언어 능력 평가 1등 솔루션입니다
KoBigBird: 🦅 Pretrained BigBird Model for Korean (up to 4096 tokens)
LAMA: LAnguage Model Analysis
lassl: Easy framework for pre-training language models
lbox-open
- LBox Open: 한국어 AI Benchmark Dataset
Legal-BERT, 법률 도메인에 특화된 언어모델 개발기
LM-kor: Pretrained Language Models for Korean
PLMpapers
SNgramExtractor: Python package code repo for Implementation of syntactic n-grams (sn-gram) extraction
- Extracts syntactic n-grams using the structure of the dependency parse tree
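A syntactic n-gram follows edges of the dependency tree instead of surface word order. A minimal sketch, assuming the parse is already available as a list of head indices (a hand-written toy parse in the usage note, not parser output) and covering only continuous head-to-dependent paths; `syntactic_ngrams` is a hypothetical helper, not SNgramExtractor's API.

```python
def syntactic_ngrams(words, heads, n=2):
    """Return head-to-dependent paths of length n in a dependency tree.

    words[i] is the i-th token; heads[i] is the index of its head,
    or -1 for the root.
    """
    children = {i: [] for i in range(len(words))}
    for child, head in enumerate(heads):
        if head >= 0:
            children[head].append(child)

    def paths_from(node, length):
        # All downward paths with `length` nodes that start at `node`.
        if length == 1:
            return [[node]]
        return [[node] + rest
                for child in children[node]
                for rest in paths_from(child, length - 1)]

    return [tuple(words[i] for i in path)
            for start in range(len(words))
            for path in paths_from(start, n)]
```

For "the cat chased a mouse" with heads `[1, 2, -1, 4, 2]`, the bigrams include `("chased", "mouse")`, a pair that is never adjacent in the surface string.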
tunib-electra: Korean-English Bilingual Electra Models
WellnessConversationAI: a conversational language model for psychological counseling, built on Korean language models

Language Model LM GPT

GPT 한글판 - YouTube
OpenAI GPT-2: Understanding Language Generation through Visualization
- Better Language Models and Their Implications GPT-2 based artificial news
- GPT-2 Playground
The Way you Write Code Is About to Change: Join the Waiting List | by Dimitris Poulopoulos | Towards Data Science OpenAI
The Illustrated GPT-2 (Visualizing Transformer Language Models)
OpenGPT-2: We Replicated GPT-2 Because You Can Too
Fine-Tuning GPT-2 from Human Preferences
Algpt2 Part 2 | Bilal Khan
KoGPT2 - Korean GPT-2 pretrained cased (KoGPT2)
KoGPT2ForParaphrasing
Too big to deploy: How GPT-2 is breaking servers
The Annotated GPT-2
KorGPT2Tutorial: Tutorial for pretraining Korean GPT-2 model
KoGPT2-chatbot: Simple Chit-Chat based on KoGPT2
Does GPT-2 Know Your Phone Number? – The Berkeley Artificial Intelligence Research Blog
This Code Does Not Exist - code generation with GPT-2
자연어 인공지능 모델 해킹하기 | GeekNews - attacks targeting GPT-2
awesome-gpt3
How GPT3 Works - Visualizations and Animations – Jay Alammar – Visualizing machine learning one concept at a time
What is GPT-3? Showcase, possibilities, and implications - YouTube
GPT-3가 뭐길래, 제2의 알파고? - YouTube
GPT-3, 인류 역사상 가장 뛰어난 언어 AI – 핑퐁팀 블로그
OpenAI GPT-3 - Good At Almost Everything! 🤖 - YouTube
Can GPT-3 Make Analogies?. By Melanie Mitchell | by Melanie Mitchell | Aug, 2020 | Medium
GPT-3의 다섯 가지 한계 – 핑퐁팀 블로그
GPT-3 paper를 읽고 써보는 간략한 리뷰, Language Models are Few-Shot Learners
대화형 인공지능(GPT-3) 한방에 이해하기 feat. 솔트룩스 이경일 대표 - YouTube
The First Wave of GPT-3 Enabled Applications Offer a Preview of Our AI Future
영상 초보자도 쉽게 GPT-3를 사용해 혼자서 GPT-3 모델을 구현한다 - 인공지능신문
과연 GPT-3는 얼마나 똑똑한 걸까? – 핑퐁팀 블로그
GPT-3은 얼마내고 써야할까요? (the cost calculation is interesting to follow)
구독자 GPT-3는 우리 중에 최약체지
GPT-3 is not That Smart. With a Reason | LinkedIn
GPT-3 is No Longer the Only Game in Town - Last Week in AI
OpenAI’s API Now Available with No Waitlist
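- To prevent misuse of GPT-3, the API was previously limited to approved users; with safeguards now in place, users in supported countries can use the GPT-3 API just by signing up. The API may be used only under the content guidelines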
GPT-3 공식문서번역 1. Get started - Introduction — BetaMan의 공사장
GPT-3 공식문서번역 2. Get started - Developer quickstart — BetaMan의 공사장
GPT-3 공식문서번역 3. Get started - Engines — BetaMan의 공사장
GPT-3 공식문서번역 4. Get started - Going live — BetaMan의 공사장
GPT-3 공식문서번역 5. Get started - Usage guidelines — BetaMan의 공사장
AI Can Write in English. Now It's Learning Other Languages | WIRED GPT3
Can’t Access GPT-3? Here’s GPT-J — Its Open-Source Cousin | by Alberto Romero | Towards Data Science
HyperCLOVA 서빙 프레임워크 선정 | CLOVA Engineering Blog
OpenAI Codex 공개 및 파이썬 퍼즐 챌린지 예정 | GeekNews
gpt-neo: An implementation of model parallel GPT2 & GPT3-like models, with the ability to scale up to full GPT3 sizes (and possibly more!), using the mesh-tensorflow library
21년 2월 2주 - from future import dream
minGPT: A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training; a GPT meant mainly for teaching
꿀벌개발일지 :: 뉴스: GPT-Neo 를 개발하고 있다는 소식
AI x Bookathon｜인공지능을 수필 쓰는 작가로 학습시켜보자! GPT, hdf5, scrapy, selenium
- Ai bookathon public
Google AI Blog: Advancing NLP with Efficient Projection-Based Model Architectures - in contrast to GPT-3, about building models with few parameters
Goopt: 🔍 Search Engine for a Procedural Simulation of the Web with GPT-3
kogpt: KakaoBrain KoGPT (Korean Generative Pre-trained Transformer), a Korean-specialized AI language model based on GPT-3
- if(kakao) 2021
- kogpt at web-app
mesh-transformer-jax: Model parallel transformers in JAX and Haiku
- Checking out a 6-Billion parameter GPT model, GPT-J, from Eleuther AI - YouTube

LDA Latent Dirichlet Allocation

Latent Dirichlet Allocation
Yes24 책 추천 알고리즘, 어떻게 구현했나
Latent Dirichlet Allocation (LDA) with Python
Latent Dirichlet Allocation, LDA
word2vec, LDA, and introducing a new hybrid algorithm: lda2vec
Spectral LDA on Spark
LDA in Python – How to grid search best topic models?
- scikit-learn provides a convenient interface for topic modeling with algorithms such as Latent Dirichlet Allocation (LDA), LSI, and Non-Negative Matrix Factorization
- This tutorial shows how to build the best LDA topic model and present its results in a meaningful way
Language Modelling and Text Generation using LSTMs — Deep Learning for NLP
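- Explains how to implement and train a state-of-the-art RNN to build a language model that generates natural language text
- The goal of the model is to generate new text given some input text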
Topic Modeling and Latent Dirichlet Allocation (LDA) in Python
The Hottest Topics In Machine Learning - Analyzing machine learning trends in research

Library

Free Term Extractors
Hugging Face "Tokenizers"와 PyTorch "Captum" 라이브러리 사용기 ipynb
- huggingface.co/nlp/viewer
- ML (Huggingface transformers) coding tips from Yannic Kilcher
- A small timing experiment on the new Tokenizers library — a write-up
- Beyond Classification With Transformers and Hugging Face | by Nikhil Dharap | Towards Data Science
- huggingface를 이용한 한국어 BART 학습 후기
- transformers에 모델 기여하기 | LASSL
- koclip: KoCLIP: Korean port of OpenAI CLIP, in Flax
  - CLIP (Contrastive Language–Image Pre-training), released by OpenAI in January 2021, is a multimodal model that learns from natural language and images together, with accuracy and generality above earlier models on tasks such as ImageNet
  - KoCLIP is the first Korean multimodal AI released as open source, trained on AIHub's Korean image caption dataset
  - Two versions, KoCLIP-Base and KoCLIP-Large, were built on a TPU v3-8 VM provided during Flax Community Week
  - KoCLIP-Base uses klue/roberta-large as the text encoder and openai/clip-vit-base-patch32 as the image encoder; KoCLIP-Large uses the same text encoder with google/vit-large-patch16-224 as the image encoder
  - KoCLIP can be applied in many directions; the following three features are deployed via Streamlit
  - Text2Image: given a text query, returns the stored photo most similar to the query
  - Image2Text: a kind of zero-shot classifier; given a photo and several labels, returns the label that best fits the photo
  - Text2Patch: also a kind of zero-shot classifier; given a photo and a text query, returns the photo patch most relevant to the text
- nlp_tutorials: huggingface를 이용하여 downstream task 수행하기
- optimum
  - Introducing Optimum: The Optimization Toolkit for Transformers at Scale
  - Exporting 🤗 Transformers Models
- parallelformers: Parallelformers: An Efficient Model Parallelization Toolkit for Deployment
3 Natural Language Processing Tools From AWS to Python | by SeattleDataGuy | Better Programming | Oct, 2020 | Medium
- Amazon Comprehend - Natural Language Processing (NLP) and Machine Learning (ML)
- Cloud Natural Language | Google Cloud
- TextBlob: Simplified Text Processing
  - How to Perform Emotion detection in Text via Python | Hacker Noon
Open Source Natural Language Processing Libraries To Get You Started
꼬꼬마 프로젝트!
날개셋
- 다음 버전 개발 근황
오픈 한글
은전한닢 프로젝트 - 검색에서 쓸만한 오픈소스 한국어 형태소 분석기를 만들자!
- elasticsearch-analysis-seunjeon 5.0.0.0 배포합니다
academictorrents.com
Adapt Intent Parser - an open source software library for converting natural language into machine readable data structures
AllenNLP - An open-source NLP research library, built on PyTorch
- An open-source NLP research library, built on PyTorch
  - crf
Autosub - Command-line utility for auto-generating subtitles for any video file
Babelpish.github.io
CLaF: Clova Language Framework https://naver.github.io/claf
Compact Language Detector 2
ConceptNet - a multilingual knowledge base, representing words and phrases that people use and the common-sense relationships between them
coreferee: Coreference resolution for English, German and Polish, optimised for limited training data and easily extensible for further languages
Cubism
- SF Scala: Enhancing Spark's Power with ZIO, Qubism and NLP at Scale, Using Nix for Haskell
Daon 형태소 분석기
decaNLP - The Natural Language Decathlon: A Multitask Challenge for NLP
- The Natural Language Decathlon
fastT5: ⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x
fastText is a library for efficient learning of word representations and sentence classification
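- C++, no additional library dependencies
- Accuracy comparable to deep-learning classifiers, but much faster
- Can train on more than a billion words in under ten minutes on a multi-core CPU, and classify 500,000 sentences among 312K classes in under a minute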
- Bag of Tricks for Efficient Text Classification
  - our fast text classifier fastText is often on par with deep learning classifiers in terms of accuracy, and many orders of magnitude faster for training and evaluation
  - We can train fastText on more than one billion words in less than ten minutes using a standard multicore CPU, and classify half a million sentences among 312K classes in less than a minute.
- Enriching Word Vectors with Subword Information
- Facebook’s Artificial Intelligence Research lab releases open source fastText on GitHub
- Introduction to Natural Language Processing with fastText
- FastText.zip: Compressing text classification models
- Pre-trained word vectors
- Aligning the fastText vectors of 78 languages
- FastText Tutorial - How to Classify Text with FastText
- 한국어를 위한 어휘 임베딩의 개발 -1-
- 한국어를 위한 어휘 임베딩의 개발 -2-
- FastText, 실전 사용하기
- 글쓰기 화면에서 카테고리 자동 추천하는 모델 만들기
- FastText Pre-trained 한국어 모델 사용기 – Inah Jeon – Inah Jeon's personal blog
- fastText4j - Java port of C++ version of Facebook Research fastText
- fastText_doc2vec
- fastText for Korean
- fasttext.js: FastText for Node.js
  - FastText for Node.js
- models.fasttext – FastText model gensim example
- Production Machine Learning Pipeline for Text Classification with fastText
  - Running fastText in Valohai
- pyfasttext
- scikit-learn wrappers for Python fastText
- SwiftFastText - Swift wrapper for the Facebook FastText Library for efficient text classification and representation learning
GluonNLP: NLP made easy
- Attention API로 간단히 어텐션 사용하기 gluonNLP
go-freeling - Golang Natural Language Processing
graph4nlp: Graph4nlp is the library for the easy use of Graph Neural Networks for NLP
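hangul-toolkit - supports Hangul jamo decomposition and composition (automaton), attaching particles, splitting and joining initial/medial/final sounds, and checking whether text is Hangul, Hanja, or English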
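Jamo decomposition and particle attachment both come down to arithmetic on the Unicode Hangul Syllables block, which starts at U+AC00 and orders syllables by initial, medial, and final sound. A minimal sketch of that arithmetic using only the standard library; it is not hangul-toolkit's API, the function names are hypothetical, and only the object particle 을/를 is handled.

```python
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"          # 19 initial consonants
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"      # 21 vowels
# 28 final slots: index 0 means "no final consonant".
JONGSEONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def decompose(syllable):
    """Split one precomposed Hangul syllable into (initial, medial, final)."""
    offset = ord(syllable) - 0xAC00
    if not 0 <= offset < 19 * 21 * 28:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    cho, rest = divmod(offset, 21 * 28)
    jung, jong = divmod(rest, 28)
    return CHOSEONG[cho], JUNGSEONG[jung], JONGSEONG[jong]


def compose(cho, jung, jong=""):
    """Inverse of decompose: build a syllable from its jamo."""
    index = (CHOSEONG.index(cho) * 21 + JUNGSEONG.index(jung)) * 28
    return chr(0xAC00 + index + JONGSEONG.index(jong))


def attach_object_particle(word):
    """Append 을 after a final consonant, 를 otherwise."""
    return word + ("을" if decompose(word[-1])[2] else "를")
```

The sketch raises `ValueError` for words that do not end in a Hangul syllable; a full library would also handle digits, Latin letters, and the other particle pairs.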
InferSent - a sentence embedding method that provides semantic sentence representations
Jarvis Introducing NVIDIA Jarvis: A Framework for GPU-Accelerated Conversational AI Applications
JoSH: KDD 2020 Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding
kakaotalk_msg_preprocessor: a library that preprocesses files created by exporting conversations from KakaoTalk chat rooms
Kanji recognition - implementation of Nei Kato's directional feature extraction algorithm
KETI/KE-T5-Vision
khaiii
- 카카오의 딥러닝 기반 형태소 분석기
- kakao의 오픈소스 Ep9 - Khaiii : 카카오의 딥러닝 기반 형태소 분석기
- 카카오 형태소 분석기(khaiii) 설치와 은전한닢(mecab) 형태소 분석기 비교
- 카카오 형태소 분석기(khaiii) 분석 시간 및 딥러닝 모델 성능 비교
- 한국어 형태소 분석기 성능 비교
- 임재수 khaiii(카카오 형태소 분석기)
Kiwi - 지능형 한국어 형태소 분석기(Korean Intelligent Word Identifier)
- 좋아, 형태소 분석기를 만들어봅시다. - 0
- 좋아, 형태소 분석기를 만들어봅시다. - 1
- 좋아, 형태소 분석기를 만들어봅시다. - 2
- 좋아, 형태소 분석기를 만들어봅시다. - 3
- 지능형 한국어 형태소 분석기 ver 0.2
- 지능형 한국어 형태소 분석기 ver 0.3 - 알고리즘 최적화 & 메모리 풀
- 지능형 한국어 형태소 분석기 0.4버전 업데이트
- kiwigo: https://github.com/bab2min/Kiwi for go
  - kiwigo - 한글 형태소 분석기인 kiwi의 go binding | GeekNews
- kiwipiepy: Python API for Kiwi
  - Kiwi로 한국어 문장 분리하기
knwl - A Javascript Natural Language Parser
KoalaNLP = Korean + Scala + NLP. A collection of Korean morphological analyzers and syntactic parsers
KoParadigm: Korean Inflectional Paradigm Generator
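- In linguistics, a paradigm is the conjugation table of a verb or adjective. For example, English go inflects as go, went, going, goes, and so on
- Korean inflection is complex: the rules depend on the type and sound of the verb and the ending. The project compiles those rules into tables and publishes them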
Korpora: Korean corpus repository
KorpuSQL
- 웹용 KorpuSQL 실행기
Koshort - a high-level API project for Korean NLP
LASER - Zero-shot transfer across 93 languages: Open-sourcing enhanced LASER library
lit: The Language Interpretability Tool: Interactively analyze NLP models for model understanding in an extensible and framework agnostic interface
live caption 구글 크롬, Live Caption 기능 공개 | GeekNews
madlibs: Generates random strings with random verbs, nouns, and adjectives
Mecab
- Taku Kudo - Mecab developer
- mecab-ko 윈도우에서 빌드하기
- 윈도우 python3.X mecab 설치 간단~
- Google Colab에서 Mecab-ko-dic 쉽게 사용하기
- mecab-bind: Binding MeCab Tagger to Python3 and TensorFlow
  - mecab-ko-dic-prebuilt: 미리 빌드되어 있는 mecab-ko-dic
- natto-py - combines the Python programming language with MeCab, the part-of-speech and morphological analyzer for the Japanese language
- python-mecab - A repository to bind mecab for Python 3.5+. Not using swig nor pybind. https://pypi.org/project/python-mecab
Memory Networks
mesh-transformer-jax: Model parallel transformers in JAX and Haiku
mit-nlp
name2nat: a Python package for nationality prediction from a name
NGT - Neighborhood Graph and Tree for Indexing High-dimensional Data
- A library for quickly finding approximate k nearest items in high-dimensional data such as word embeddings
- Similar to annoy, but indexes with a graph and tree
nlg-eval - Evaluation code for various unsupervised automated metrics for Natural Language Generation
nori-clone: Standalone Nori (Korean Morphological Analyzer)
parserator - a framework for making parsers using natural language processing (NLP) methods
pattern: Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization
pecab: Pure python mecab analyzer for Japanese and Korean
Pororo: A Deep Learning based Multilingual Natural Language Processing Library
- Welcome to Pororo’s documentation! — Pororo 0.1.2 documentation
- Pororo 출시기념 이메일 요약기 - YouTube
  - gmail-summary.ipynb - Colaboratory
- stock-news-summary: a module that emails summaries of news about the stocks you follow
  - NLP 주식 뉴스 요약 메일링 프로그램 - 러닝머신의 Train Data Set
Pragmatic Segmenter - a rule-based sentence boundary detection gem that works out-of-the-box across many languages
python-nori: Pynori - Lucene Nori, Korean Morphological Analyzer, in Python
PyText - a deep-learning based NLP modeling framework built on PyTorch
- PyText - A natural language modeling framework based on PyTorch https://fb.me/pytextdocs
- Open-sourcing PyText for faster NLP development
- 페이스북, 자연어 처리 프로젝트를 오픈소스로 전환
- Introducing PyText - Facebook’s New Framework for Better NLP Development
quepy - A python framework to transform natural language questions to queries in a database query language
recsys-nlp-graph: 🛒 Simple recommender with matrix factorization, graph, and NLP
- 그래프 & 자연어처리 기법으로 추천 시스템 개발하기 - pytorch - 러닝머신의 Train Data Set
Rouzeta - a finite-state Korean morphological analyzer
- 유한 상태 기반의 한국어 형태소 분석기
SentencePiece - an unsupervised text tokenizer and detokenizer mainly for Neural Network-based text generation systems where the vocabulary size is predetermined prior to the neural model training
- SentencePiece 알고리즘
Simplenlg - a simple Java API designed to facilitate the generation of Natural Language
spark-nlp: State of the Art Natural Language Processing
- Advanced Natural Language Processing with Apache Spark NLP - YouTube
- Scale By The Bay 2020: David Talby, State of the art natural language understanding at scale - YouTube
SPARTA: Semantic Parsing And Relational Table Aware Model that generates SQL from question written in Korean language
- Text2SQL 한국어 데이터 테스트 - YouTube
Stanford Natural Language Processing Group
- corenlp
- Stanford CoreNLP – a suite of core NLP tools
- nlp.stanford.edu/teaching
- StanfordNLP: A Python NLP Library for Many Human Languages
- Stanford CS224U: Natural Language Understanding | Spring 2019
Stanza - A Python NLP Library for Many Human Languages
StarSpace - Learning embeddings for classification, retrieval and ranking
- Embed All The Things
The Super Duper NLP Repo
text-to-text-transfer-transformer: Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
- kolang-t5-base: T5-base model for Korean
tacit - Text Analysis, Collection and Interpretation Tool
Text Understanding from Scratch
TextWorld: A learning environment for training reinforcement learning agents, inspired by text-based games
teachable-nlp Ainize | Launchpad for open-source AI projects
- AI 모델 탐험기 #3 모델 Fine-Tuning(feat. Teachable NLP) | by AI Network | AI Network_KR | Apr, 2021 | Medium

Library Java

한글 받침에따라서 '을/를' 구분하기
Autocomplete words with spring boot and redis 자동완성
KLAY - Korean Language AnalYzer (Korean morphological analyzer)
lucene-Korean-Analyzer Lucene Analyzer For Korean
- 03. Solr 5.0.0 - 아리랑(arirang) 한글 형태소 분석기 적용
VWL 텍스트 분석기 0.9

Library JavaScript

TajaJS is a simple Hangul library in JavaScript

Library Python

13 Deep Learning Frameworks for Natural Language Processing in Python
자연어 처리(NLP)가 필요하다면?··· 추천 파이썬 라이브러리 8종 - CIO Korea: CoreNLP, Gensim, NLTK, Pattern, Polyglot, PyNLPl, spaCy, TextBlob
Annoy (Approximate Nearest Neighbors Oh Yeah) - a C++ library with Python bindings to search for points in space that are close to a given query point
- Approximate nearest neighbor methods and vector models – NYC ML meetup
- Approximate Nearest Neighbors
Document Clustering with Python
ecco: Visualize and explore NLP language models. Ecco creates interactive visualizations directly in Jupyter notebooks explaining the behavior of Transformer-based language models (like GPT2)
Ekphrasis - a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction
Emoji for Python
- A Python script to check if a character is or a text contains emoji
flair - A very simple framework for state-of-the-art Natural Language Processing (NLP)
- A very simple framework for state-of-the-art Natural Language Processing (NLP)
- Text Classification with State of the Art NLP Library — Flair - A new version of Flair - simple Python NLP library has just been released by Zalando Research!
- Tadej Magajna - State of the Art NLP with Flair
Hangulize - 외래어 자동 한글 변환 모듈
keystroke practice
Keyword finder: automatic keyword extraction from text
KoNLPy: Korean NLP in Python
- github.com/konlpy/konlpy
- 자바, 미안하다! 파이썬 한국어 NLP
- 자바, 미안하다! Korean NLP with Python
- word2vec을 하기 앞서 형태소 분석을 해보자
- Pycon2017 koreannlp
- customized KoNLPy
- MAC OSX에서 konlpy 설치 시 ImportError: No module named 'jpype' 오류 해결
- 파이썬과 커뮤니티와 한국어 오픈데이터
- 말뭉치를 이용한 한국어 용언 분석기 (Korean Lemmatizer)
- docker-ubuntu-konlpy
- KoNLPy-homi: Redesigned KoNLPy (Wrapper) for Usability and Portability with gRPC Using Homi
korean - A library for Korean morphology
- gist.github.com/allieus/0e8b609fe146ad63462ca81c70b2f5a2
ko_restoration - Module for restoring Korean text working with KomornaPy
- 파이썬(Python) 형태소 분석기를 활용한 한국어 원형 복원 분석기 설치 및 설정하기
Koshort - a Python project for Korean natural language processing... or maybe Korean domestic cat
- Goorm - A little word cloud generator in Python - Korean wrapper
krtpy - Korean Romanization/Hangulization utility written in python
kss - Korean Sentence Splitter
- Korean Sentence Splitter
- 한글 문장 분리기
- Kss: A Toolkit for Korean sentence segmentation
- kss-java: Korean Sentence Splitter
NLP Architect - an open-source Python library for exploring the state-of-the-art deep learning topologies and techniques for natural language processing and natural language understanding
NLTK
- book
- 한국어와 NLTK, Gensim의 만남
- NLP with NLTK – Part 1
- python_nltk
- github.com/zerosum99/python_nltk
- NLTK로 배우는 자연언어처리
- Tutorial 5: Analyzing text using Python NLTK
- NLTK Basic Text Analytics
- NLTK with Python 3 for Natural Language Processing
- 22 Python NLTK Corpus
- NLTK Text Processing Tutorial Series
- Computing Document Similarity with NLTK (March 2014)
- Tokenizing Words Sentences with Python NLTK
- Natural Language Processing (NLP) Tutorial with Python & NLTK
- Complete NLTK Tokenizer Tutorial for Beginners | MLK - Machine Learning Knowledge
- TOKENIZE | NLTK | DATA CLEANING - YouTube
ParlAI (pronounced “par-lay”) - a framework for dialog AI research, implemented in Python
- ParlAI: A new software platform for dialog research
PreNLP - Preprocessing Library for Natural Language Processing
pyeunjeon (python + eunjeon) - a standalone Python interface to the Korean morphological analyzer based on the Eunjeon (은전한닢) project and MeCab
pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection that works out-of-the-box
PyStruct - Structured Learning in Python
Python-jamo is a Python Hangul syllable decomposition and synthesis library for working with Hangul characters and jamo
soynlp - provides word extraction, tokenization, part-of-speech tagging, and preprocessing
spaCy - a library for industrial-strength natural language processing in Python and Cython
- NLP (SpaCy) - four chapters on how to use the spaCy package
- spaCy Cheat Sheet: Advanced NLP in Python
- dependency parse tree visualization
- Dead Code Should be Buried
- spaCy: Industrial-strength NLP
- Neural coref - State-of-the-art coreference resolution based on neural nets and spaCy
  - A coreference resolution library using neural networks and spaCy
  - State-of-the-art neural coreference resolution for chatbots
- NLP With Python: Build a Haiku Machine in 50 Lines Of Code | by Sean Zhai | Better Programming | Oct, 2020 | Medium
- yujuwon.tistory.com/m/tag/spaCy
- Machine Learning for Text Classification Using SpaCy in Python
- Korean support
- irl.spacy.io/2019
  - SPACY IRL 2019
- Advanced NLP with spaCy
- Natural Language in Python using spaCy: An Introduction
- Vincent Warmerdam - Playing by the Rules-Based-Systems | PyData Eindhoven 2020 - YouTube
- merge-idioms: Implementation of spaCy's NLP pipeline for merging idioms as standalone tokens; a library that keeps each idiom as one token instead of splitting it word by word during tokenization
- Prodigy: A new tool for radically efficient machine teaching
- Introducing spaCy v3.1 · Explosion
- spaCyOpenTapioca · spaCy Universe
TextBlob Sentiment: Calculating Polarity and Subjectivity python
- Natural Language Basics with TextBlob
TextFeatureSelection · PyPI
- TextFeatureSelection — A Python package - Praveen Govindaraj - Medium
Text Generation With LSTM Recurrent Neural Networks in Python with Keras
textgenrnn - Python module to easily generate text using a pretrained character-based recurrent neural network
twitter_optimus_twint: Analyzing tweets with Twint, Optimus and Apache Spark
- Analyzing Tweets with NLP in minutes with Spark, Optimus and Twint | by Favio Vázquez | Towards Data Science
UTagger
Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning. http://hunch.net/~vw
- vwnlp - Solving NLP problems with Vowpal Wabbit: Tutorial and more

Library R

KoNLP - R package for Korean NLP http://cran.r-project.org/web/packages/KoNLP/index.html
KoSpacing - Automatic Korean word spacing with R
- KoSpacing : 한글 자동 띄어쓰기 패키지 공개
- Automatic Korean word spacing with neural n-gram detector(NND)

Library Scala

Open Korean Text Processor - An Open-source Korean Text Processor
twitter-korean-text - a Korean text processor built by Twitter

LSA

잠재 디리클레 할당
A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction and Representation of Knowledge
Latent Semantic Variable Models
Word vectors using LSA, Part - 2
Sentence Embedding
숨은의미분석 LSA(Latent Semantic Analysis)

LSH

LSH (Locality sensitive hashing)

Named Entity

Named Entity Recognition: Examining the Stanford NER Tagger
한국어 개체명 인식 기술(Named Entity Recognition)
K-ICT 빅데이터센터
Entity extraction using Deep Learning
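- Tags each word in an article with one of four categories: organisation, person, miscellaneous, and other
- A deep learning model classifies each word into these four categories in order to find the most prominent organisations and names in the article
- A rule-based approach then filters out unwanted tags and picks the most prominent names and organisations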
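The rule-based filtering step described in the bullets above could look roughly like the following. This is only a minimal sketch, not the article's code: it assumes the tagger output is a list of `(token, tag)` pairs, and the function name `top_entities`, its parameters, and the sample names are all hypothetical.

```python
from collections import Counter


def top_entities(tagged_tokens, keep=("person", "organisation"), top_n=3):
    """Merge consecutive tokens that share a kept tag, then rank spans by frequency."""
    counts = {tag: Counter() for tag in keep}
    span, span_tag = [], None
    # A trailing sentinel flushes the last open span.
    for token, tag in list(tagged_tokens) + [("", None)]:
        if tag == span_tag and tag in keep:
            span.append(token)  # still inside the same entity
            continue
        if span:
            counts[span_tag][" ".join(span)] += 1
        # Start a new span only for tags we want to keep (drops "other", "miscellaneous").
        span, span_tag = ([token], tag) if tag in keep else ([], None)
    return {tag: [name for name, _ in c.most_common(top_n)] for tag, c in counts.items()}


# Made-up tagger output for illustration.
tagged = [
    ("Jane", "person"), ("Doe", "person"), ("leads", "other"),
    ("Acme", "organisation"), ("Corp", "organisation"), (".", "other"),
    ("Acme", "organisation"), ("Corp", "organisation"), ("hired", "other"),
    ("Jane", "person"), ("Doe", "person"),
]
print(top_entities(tagged))
```

One limitation of this simple rule: two different entities of the same category that appear back to back are merged into a single span.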
Named Entity Recognition: Milestone Models, Papers and Technologies
Introduction to Named Entity Recognition
Named Entity Recognition (NER), Meeting Industry’s Requirement by Applying state-of-the-art Deep
Parsing XML, Named Entity Recognition in One-Shot - Conditional Random Fields, Sequence Prediction, Sequence Labelling
Named Entity Recognition with NLTK and SpaCy
Multilingual Named Entity Recognition: Research to Reality
etagger - reference tensorflow code for named entity recognition
GENRE: Autoregressive Entity Retrieval
NeuroNER - A Named-Entity Recognition Program based on Neural Networks and Easy to Use

News

“포털 야구 중계, 로봇 저널리즘이 대체 가능해“
- 이 기사는 로봇이 썼을까, 기자가 썼을까
③로봇, 저널리즘을 넘보다
마커, “뉴스, 다 읽지 마세요. 형광펜 처리된 중요한 부분만 보세요”
“수 없이 쏟아지는 읽을거리, 중요한 것만 밑줄 쳐 드립니다”, 마커 정철현 대표
‘뉴욕타임스’, 머신러닝 기반 자동 태그 시스템 개발
지난 26년간 언론에서 가장 중요한 정보원은 누구였을까?
세월호 참사 1년 동안의 언론보도를 통해 드러난 언론매체의 정치적 경도
왜 언론사는 채팅봇에 흥분하는가
뉴스 빅데이터 분석 시스템 ‘빅카인즈’ 공식 출범
네이버 뉴스 댓글 ‘남성’ 많고 ’10대·여성’ 적고
뉴스를 재미있게 만드는 방법; 뉴스잼
- 김경훈: 뉴스를 재미있게 만드는 방법; 뉴스잼 - PyCon APAC 2016
- 20160813, PyCon2016APAC 뉴스를 재미있게 만드는 방법; 뉴스잼
‘2억9천만원 아파트’ 기사에 달린 댓글로 본 사회학
Google starts highlighting fact-checks in News
Extract News In Three Words Using Triples
factcheck.snu.ac.kr
컴퓨테이셔널 저널리즘
딥러닝을 활용한 뉴스 메타 태깅
스포츠 저널리즘의 변화와 AI의 활용
8.15 광화문 집회로 인한 코로나 재확산, 통합당 책임 vs 통합당과 무관, 정부가 야당을 탄압하려는 정치적 시도 (데이터 분석으로 알아보자) - YouTube
- news-analysis-8.15-rally: 8.15 광화문 집회로 인한 코로나 재확산, 통합당 책임 vs 통합당과 무관, 정부가 야당을 탄압하려는 정치적 시도? 여러분의 선택은??? 데이터 분석으로 알아보자!

Ontology

Protege Ontology Library
Disease Ontology
SNOMED CT
jena Ontology API와 sparQL을 사용하여 검색시스템 만들기

Paper

Semantics, Representations and Grammars for Deep Learning
Language Understanding for Text-based Games Using Deep Reinforcement Learning
Linguistic Knowledge as Memory for Recurrent Neural Networks
Recent Trends in Deep Learning Based Natural Language Processing
Awesome Korean NLP Papers
57 SUMMARIES OF MACHINE LEARNING AND NLP RESEARCH
100 Must-Read NLProc Papers
100-nlp-papers: 100 Must-Read NLP Papers
NLP papers
Paper in Natural Language Processing
Attention Is All You Need
EMNLP-IJCNLP 2019 프리뷰
Paper Digest: EMNLP 2019 Highlights
핑퐁팀 ML 세미나, 그 네번째
핑퐁팀 ML 세미나, 그 다섯 번째 – 핑퐁팀 블로그
핑퐁팀 ML 세미나, 그 여섯 번째 – 핑퐁팀 블로그
집현전 NLP 리뷰 모임
Kakao Enterprise AI Research | 카카오엔터프라이즈 연구 성과를 공개하는 리서치 플랫폼

Parser

Announcing SyntaxNet: The World’s Most Accurate Parser Goes Open Source
Grammatical Framework - A programming language for multilingual grammar applications
Syntactic Parsing of Web Queries with Question Intent
Phoenix Server - a Galaxy-wrapped version of the Phoenix robust semantic CFG parser
SLING: A Natural Language Frame Semantic Parser
SQLova - a neural semantic parser translating natural language utterance to SQL query

QA Question Answer

SQuAD - The Stanford Question Answering Dataset
- BiDAF - Bi-Directional Attention Flow for Machine Comprehension
- SQuAD - The Stanford Question Answering Dataset
- 어떻게 해야 기계에게 글을 잘 읽고 말할 수 있게 할까?
www.facebook.com/groups/AIKoreaOpen/permalink/1207284209305687
- Query-Regression Networks
carpedm20.github.io
Implementation of Dynamic memory networks by Kumar et al. http://arxiv.org/abs/1506.07285
Implementation of the Convolution Neural Network for factoid QA on the answer sentence selection task
Generating Factoid Questions With Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus
Deep Language Modeling for Question Answering using Keras
FRDF Frame Semantic-based QA system
- FRDF: Frame-semantic-based QA system
gotquestions.org
OKBQA Home
KBQA: An Online Template Based Question Answering System over Freebase
KBQA: Learning Question Answering over QA Corpora and Knowledge Bases
Question Answering System using Multiple Information Source and Open Type Answer Merge
qald.sebastianwalter.org
SearchQA
START - Natural Language Question Answering System
TriviaQA: A Large Scale Dataset for Reading Comprehension and Question Answering
Reading Wikipedia to Answer Open-Domain Questions
SIGIR2017에서 발표한 RNN을 이용한 자연어 질의 변환
PR-037: Ask me anything: Dynamic memory networks for natural language processing
Learning to reason by reading text and answering questions
- Learning to reason by reading text and answering questions
강화학습 기반 QA 시스템 - 김영삼
MRQA 2018: Machine Reading for Question Answering
MRC 시리즈 1편: MRC가 뭐예요? : 네이버 블로그
Transparency-by-Design networks (TbD-nets)
Relational Network Review
Building a Question-Answering System from Scratch— Part 1
2018 06-11-active-question-answering
Bilinear attention networks for visual question answering
Presenting Multitask Learning as Question Answering: The Natural Language Decathlon
ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning
Run your own Q&A Platforms like Stackoverflow or Quora with Open Source Projects for free!
7 open source Q&A platforms
QA Search Engine: Amazon Kendra, Canada project, Talk to Books, etc
정답 유형을 분류하는 딥러닝 기술
gpt3-krtranslated-qa

Sentiment

A comparison of open source tools for sentiment analysis
감정어휘 평가사전과 의미마디 연산을 이용한 영화평 등급화 시스템
- 감정어휘 평가사전 1.0
TextBlob Sentiment: Calculating Polarity and Subjectivity python
- Natural Language Basics with TextBlob
Modern Methods for Sentiment Analysis
LSTM Networks for Sentiment Analysis
Sentiment Analysis using LSTM network
KOrean Sentiment Analysis Corpus, KOSAC
Naver sentiment movie corpus v1.0
Naver Movie Sentiment Classification
The emotional arcs of stories are dominated by six basic shapes
- 컴퓨터가 분석한 6가지 이야기 유형
  - The emotional arcs of stories are dominated by six basic shapes
dracula.sentimentron.co.uk/sentiment-demo
Sentiment Analysis and Aspect classification for Hotel Reviews
Exploring Sentiment in Literature with Deep Learning
Learning when to skim and when to read
감성분석 API
Sentiment analysis on Twitter using word2vec and keras
- Sentiment analysis on forum articles using word2vec and Keras
TWITTER SENTIMENT ANALYSIS USING COMBINED LSTM-CNN MODELS
한국어 감성 분석기
How to Develop an N-gram Multichannel Convolutional Neural Network for Sentiment Analysis
5 Things You Need to Know about Sentiment Analysis and Classification
Sentiment analysis in Korean
IMDB 영화리뷰 감정 분석
소셜 미디어 감성분석을 통한 주가 예측
Detecting Sarcasm with Deep Convolutional Neural Networks
Basic Data Cleaning/Engineering Session Twitter Sentiment Data
Perform sentiment analysis with LSTMs, using TensorFlow
Sentence classification by MorphConv
Sentiment Analysis
How to build a simple text classifier with TF-Hub
- In the example, the text embedding function is fed straight into the estimator, so the feature vector itself cannot be accessed
- A workaround for this: demo_sentence_feature.ipynb
Sentiment analysis : Frequency-based models
Sentiment analysis : Machine-Learning approach
A Beginner’s Guide on Sentiment Analysis with RNN
Sentiment Classification with Natural Language Processing on LSTM
Sentiment Analysis: Concept, Analysis and Applications
sentiment_dataset
Sentiment Analysis using Deep Learning with Tensorflow
python-machine-learning-book-3rd-edition - Naver movie review sentiment classification
Sentiment Analysis (Opinion Mining) with Python - NLP Tutorial | Towards AI
Sentiment-analysis-using-tensorflow: a simple sentiment analysis of Amazon product reviews using the Universal Sentence Encoder
국민·고객·직원의 '마음'을 엿본다··· ‘정서 분석’ 가이드 - CIO Korea
HuggingFace KoElectra로 NSMC 감성분석 Fine-tuning해보기 | by 김희규 | Aug, 2020 | Medium
NLTK | Sentiment Analysis with python | NLP - YouTube
nsmc: Naver sentiment movie corpus
nsmc-tf-text: an easy start to NSMC classification with TensorFlow Text
- 편리한 NLP를 위한 TensorFlow-Text와 RaggedTensor – Jeong Ukjae
vader
- Simplifying Sentiment Analysis using VADER in Python (on Social Media Text)
- Rule-based Sentiment Analysis of App Store Review in Python | by Ng Wai Foong | Jun, 2020 | Towards Data Science

Similarity

Analyzing stylistic similarity amongst authors A quantitative comparison of writing styles in 12,590 books from Project Gutenberg
Correlation and dependence
faiss - A library for efficient similarity search and clustering of dense vectors
- faiss-serving: A lightweight Faiss HTTP Server 🚀
Fuzzy string matching using cosine similarity
코사인 유사도의 의미
Jaccard index
Most frequent k characters
Mutual information
- Pointwise mutual information
Similarity measure
Simple matching coefficient
Sørensen–Dice coefficient
Tversky index
FIVE MOST POPULAR SIMILARITY MEASURES IMPLEMENTATION IN PYTHON
Vector Similarity - Python, Java implementation of TS-SS called from "A Hybrid Geometric Approach for Measuring Similarity Level Among Documents and Document Clustering"
MinHash Tutorial with Python Code
Vector_Similarity
NMF 알고리즘을 이용한 유사한 문서 검색과 구현(1/2) matrix factorization
NMF 알고리즘을 이용한 유사 문서 검색과 구현(2/2) sklearn을 이용한 구현
String Matching and Database Merging Machine Learning to compare and join heterogeneous data from heterogeneous sources
Brain's Pick: 단어 간 유사도 파악 방법
- ling.kakaobrain.com/wordweb
Siamese LSTM을 이용한 Quora 질문 유사도 판별
한글 데이터 머신러닝 및 word2vec을 이용한 유사도 분석
EUCLIDEAN DISTANCE FOR FINDING SIMILARITY
PEARSON CORRELATION SCORE
AWS 람다(Lambda)로 실시간 추천하기 – 로켓펀치의 전문기술 정보
WMD 문서 유사도 구하기 (word mover's distance)
Chapter 3 : 단어 임베딩을 사용하여 텍스트 유사성 계산하기
11. Deep Learning Cookbook/03. 단어 임베딩을 사용하여 텍스트 유사성 계산하기
엘라스틱서치의 벡터(Vector) 필드와 텐서플로우를 이용한 문서 유사도 검색 (1) > Similarity Search #elasticsearch
텍스트 요약 모델 성능 평가를 위한 새로운 척도, RDASS를 소개합니다. | Kakao Enterprise AI Research
Nearest Neighbor Indexes for Similarity Search | Pinecone
- The Rise of Vector Data - YouTube
Find anything blazingly fast with Google's vector search technology | Google Cloud Blog

Summary, Summarize

Automatic summarization
Text summarization with TensorFlow
How to Run Text Summarization with TensorFlow
NDC 2017 마이크로토크 - 프로그래머가 뉴스 읽는 법
tldr - Text summarization service
24 A Serious NLP Application Text Auto Summarization using Python
Summarizing Tweets in a Disaster
Unsupervised Text Summarization using Sentence Embeddings
Understand Text Summarization and create your own summarizer in python - An Introduction to Text Summarization
Text Summarization on the Books of Harry Potter
Simple Text Summarizer Using Extractive Method
분석 DeepTitle : 한국어 기사 자동 요약
Natural Language Processing: A Road Map leading to Extractive Summarization | LinkedIn
summarizers: Package for controllable summarization
text-summarization
Text-Summarization-Repo: a repository that accumulates papers, recommended materials, and information on data related to text summarization

Summary, Summarize TextRank

An Introduction to Text Summarization using the TextRank Algorithm (with Python implementation)
TextRank를 이용한 문서요약
TextRank for Korean
LexRank for Korean
NDC 2017 마이크로토크 - 프로그래머가 뉴스 읽는 법
Text Summarization with Gensim - gensim's TextRank
파이썬으로 3줄 요약기
한국어 3줄 요약기 - a Chrome extension that summarizes Korean text in three lines using the TextRank algorithm
Text Summarization (1) - TextRank 알고리즘
python-rake - keyword extraction package
summariz3
textacy: higher-level NLP built on spaCy

Spark

Natural Language Processing With Apache Spark
Introducing the Natural Language Processing Library for Apache Spark
- spark-nlp - Natural Language Understanding Library for Apache Spark
Deep learning text NLP and Spark Collaboration - Korean deep learning text NLP & Spark
Deep Learning and NLP with Spark by Andy Petrella and Melanie Warrick
Classifying Text in Money Transfers with Apache Spark - Jose A. Rodriguez-Serrano
Deep Learning and NLP with Spark - by Andy Petrella
SF Scala, David Hall, ScalaNLP Epic
Natural Language Processing with CNTK and Apache Spark - Ali Zaidi
Text By the Bay 2015: Marek Kolodziej, Unsupervised NLP Tutorial using Apache Spark
TextMining과 NaiveBayes분류 알고리즘

Speller

How to Write a Spelling Corrector
- 철자 교정기 작성하기
Deep Spelling
How to Strike a Match
파이썬으로 네이버 맞춤법 검사하기
한글 검색 질의어 오타 패턴 분석과 사용자 로그를 이용한 질의어 오타 교정 시스템 구축
사쿠라 훈민정음
Word Prediction using Convolutional Neural Networks
단디 - 한국어 맞춤법 검사기
비슷한 명령어 추천은 어떻게 하는걸까? – ~/xo.dev – Levenshtein
facebook 맞춤법 검사기 봇
Spelling Checker Program in Python - Python Programming - PyShark
py-hanspell - a Python library for Korean spell checking (uses the Naver spell checker)
- 파이썬 네이버 맞춤법 검사 API, Py-hanspell
SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm

Text Mining

Kaggle Solution: What’s Cooking ? (Text Mining Competition)
How to create a text mining algorithm with Python
Python을 활용한 텍스트 마이닝
Text Mining 101: A Stepwise Introduction to Topic Modeling using Latent Semantic Analysis (using Python)
Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Training | Edureka
Mining English and Korean text with Python
teanaps: an educational Python package for text analysis

TFIDF, TF-IDF

TFIDF In Java
The fastest way to identify keywords in news articles — TFIDF with Wikipedia (Python version)
Machine Learning with Text - TFIDF Vectorizer MultinomialNB Sklearn (Spam Filtering example Part 2)
Tf-idf 가중치
입 개발자를 위한 TF-IDF
What is TF-IDF? The 10 minute guide
How I used text mining to decide which Ted Talk to watch
Keyword Extraction with TF-IDF and scikit-learn – Full Working Example
시멘틱 웹 검색 엔진 만들기 python, mssql - YouTube
- 웹 페이지 수집하기

Tokenization

한국어 데이터 Tokenizer
한국어 자연어처리 1편 서브워드 구축(Subword Tokenizer, huggingface VS SentencePiece)
kortok: The code and models for "An Empirical Study of Tokenization Strategies for Various Korean NLP Tasks" (AACL-IJCNLP 2020)

Topic Modeling

Topic Modeling with LDA Introduction
Text Mining 101: Topic Modeling
Topic Modeling in Multi-Aspect Reviews
Topic Modeling of Twitter Followers
Topic Modeling With Python
Topic Modelling in Python with NLTK and Gensim
Extracting Hidden Topics in a Corpus
Topic Modeling with Scikit Learn
간편한 토픽 모델링 툴 Tomoto Gui
An NLP Approach to Mining Online Reviews using Topic Modeling (with Python codes)
Topic Modeling with LSA, PLSA, LDA & lda2Vec
Topic modeling using Khaiii
토픽 모델링으로 그리게 될 LINER의 미래 - The Highlights - 라이너 팀 블로그
tomotopy - Python package of Tomoto, the Topic Modeling Tool
- Python용 토픽 모델링 패키지 - tomotopy 개발
- Python tomotopy로 쉽게 토픽 모델링 실시하기

Translation

Introduction to Neural Machine Translation with GPUs (Part 1)
Introduction to Neural Machine Translation with GPUs (Part 2)
Introduction to Neural Machine Translation with GPUs (part 3)
Machine Translation Survey (vol1) : Background - YouTube
문자 단위의 Neural Machine Translation
Jointly Modeling Embedding and Translation to Bridge Video and Language
Tips on Building Neural Machine Translation Systems
- Subword Neural Machine Translation
Machine Learning is Fun Part 5: Language Translation with Deep Learning and the Magic of Sequences
Google's Neural Machine Translation System
Peeking into the neural network architecture used for Google's Neural Machine Translation
Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation - training on several languages at once makes translation possible even for language pairs never used in training
ZERO-SHOT LEARNING FOR VISION AND MULTIMEDIA
GNMT로 알아보는 신경망 기반 기계번역
Deep Learning Takes on Translation
TensorFlow에서 나만의 신경 기계 번역 시스템 구축
Learned in translation: contextualized word vectors
OpenSubtitles2016
카카오번역기가 양질의 대규모 학습 데이터를 확보하는 방법
신경망 번역 모델의 진화 과정
Machine Translation Without the Data
How to Configure an Encoder-Decoder Model for Neural Machine Translation
Neural Korean to English Machine Translator with Gluon
신경망 한영 번역기 코드 공개
A history of machine translation from the Cold War to deep learning
UNdreaMT: Unsupervised Neural Machine Translation pytorch
Neural Machine Translation : Everything you need to know
Neural Translation Model with Attention
Word Piece Model (a.k.a sentencepiece) RNN
Character Word LSTM Language Models paper review
기계번역 시퀀스 투 시퀀스 + 어텐션 모델
- Neural Machine Translation with Attention
Neural Machine Translation With Attention Mechanism
Attn: Illustrated Attention
모두를 위한 기계번역
SK T아카데미 모두를 위한 기계번역
SKC_MachineTranslation 강의자료
토크ON 58차. 기계번역 입문 | T아카데미
Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
Combined Quality Estimation and Automatic Post Editing in Machine Translation (기계번역 품질예측과 사후처리의 기술 융합)
제주어 기계번역 모델과 음성합성 모델에 관한 연구를 소개합니다
Google Document Translation Now Generally Available
cjk_trans: Pre-trained Machine Translation Models of Korean from/to ECJ
gtbot - a Slack translation bot that uses the Google Translate API
LibreTranslate: Free and Open Source Machine Translation API. 100% self-hosted, no limits, no ties to proprietary services. Built on top of Argos Translate
nmtpy - a suite of Python tools, primarily based on the starter code provided in github.com/nyu-dl/dl4mt-tutorial for training neural machine translation networks using Theano
onlinedoctranslator.com - a translation service built on the Google API
OpenNMT - an industrial-strength, open-source (MIT) neural machine translation system utilizing the Torch mathematical toolkit
- Open-Source Neural Machine Translation in PyTorch http://opennmt.net
- OpenNMT-Colab-Tutorial OpenNMT Colab Tutorial Pytorch && Tensorflow
- OpenNMT_Library_Tutorial_Using_Colab
- OpenNMT-py: Open Source Neural Machine Translation in PyTorch
py-googletrans - (unofficial) Googletrans: Free and Unlimited Google translate API for Python. Translates totally free of charge
word2word - Easy-to-use word-to-word translations for 3,564 language pairs

Tutorial

Natural Language Processing (NLP) Tutorial | Data Science Tutorial | Simplilearn
Over 200 of the Best Machine Learning, NLP, and Python Tutorials — 2018 Edition
Natural Language Processing Tutorial Part 1 | NLP Training Videos | Text Analysis
Natural Language Processing Tutorial Part 2 | NLP Training Videos | Text Analysis
Natural Language Processing Tutorial Part 3 | NLP Training Videos | Text Analysis
Natural Language Processing Tutorial Part 4 | NLP Training Videos | Text Analysis
Natural Language Processing Tutorial Part 5 | NLP Training Videos | Text Analysis
Natural Language Processing Tutorial Part 6 | NLP Training Videos | Text Analysis
Natural_language_Processing_self_study
NLP Tutorial with Deep Learning using tensorflow
Natural Language Processing with TensorFlow 2
Natural Language Processing Tutorial for Deep Learning Researchers TensorFlow and Pytorch
Concrete solutions to real problems
Natural Language Processing with TensorFlow 2 - Beginner's Course
Natural Language Processing Distinguish yourself by learning to work with text data
Natural Language Processing Tutorial for Deep Learning Researchers pytorch
Tutorial: Natural Language Processing (NLP) in Python - From Zero to Hero
Natural Language Processing using TensorFlow: From Zero To Hero
Ben Batorsky - Introduction to Natural Language processing | PyData Boston May Meetup - YouTube
MrBananaHuman/KorNlpTutorial: Korean natural language processing tutorial
NLP 언제까지 미룰래? 일단 들어와!! #1.자연어 처리란? - DACON
NLP 언제까지 미룰래? 일단 들어와!! #2. NLP 전처리 - DACON
NLP 언제까지 미룰래? 일단 들어와!! #3. Vectorization - DACON
NLP 언제까지 미룰래? 일단 들어와!! #4. word embedding - DACON
NLP 언제까지 미룰래? 일단 들어와!! #5. Modeling(완) - DACON
nlp-review: nlp review repository for jiphyeonjeon group
NLP with Python for Machine Learning Essential Training
1/13~13/13 모음 국민청원으로 파이썬 자연어처리 입문하기 - YouTube
NLP Tutorial Playlist Python - YouTube
large-scale-lm-tutorials: Large-scale language modeling tutorials with PyTorch
nlp_tutorials: performing downstream tasks with huggingface

Twitter

Analyzing Twitter Part 1
Analyzing Twitter Part 2
Analyzing Twitter Part 3

Voice

THE COMPUTERS ARE LISTENING HOW THE NSA CONVERTS SPOKEN WORDS INTO SEARCHABLE TEXT
“음성인식 기술로 만화 주인공과 대화 나눠요”
Google voice search: faster and more accurate
Baidu Deep Voice explained Part 2 — Training
Neural Voice Cloning with a Few Samples
Tutorial: Asynchronous Speech Recognition in Python
책 읽어주는 딥러닝: 배우 유인나가 해리포터를 읽어준다면 DEVIEW 2017
- Multi-speaker Tacotron in TensorFlow. An open-source deep learning multi-speaker speech synthesis engine. http://carpedm20.github.io/tacotron
카카오미니는 목소리를 어떻게 인식할까?
SPEECH TO TEXT(STT) 라이브러리와 프로세싱을 이용하여 음성인식 테스트하기
Getting robots to understand speech: Using Watson’s Natural Language Classifier service
DeepSpeech 0.6: Mozilla’s Speech-to-Text Engine Gets Fast, Lean, and Ubiquitous
- Mozilla, 음성데이터세트 ‘딥스피치(DeepSpeech)’ 공개
How to build a simple speech recognition app
딥러닝 음성합성 multi-speaker-tacotron(tacotron+deepvoice) 설치 및 사용법
딥 러닝 음성 인식에 필요한 훈련 데이터를 직접 만들어보자
Towards end-to-end speech recognition
컴퓨터는 어떻게 소리를 들을까?
How to Make a Speech Emotion Recognizer Using Python And Scikit-learn Librosa, Numpy, Soundfile, Scikit-learn, PyAudio
음성인식 코드 짜는 최단 경로 (With Naver Cloud Platform): 순식간에 STT 완성하기
텍스트를 음성 mp3로 간단하게 변환하기 (With Naver Cloud Platform)
연구자로 성장하기 Audio알못에서 VCC2020참가까지 (카카오엔터프라이즈 인턴 후기) - Subinium의 코딩일지
꿀벌개발일지 :: 클럽하우스와 음성 데이터
Transcribe Audio and Use Speech Recognition in Python - YouTube
이렇게 사용하세요! AI 음성인식 API로 음성 변환 서비스 쉽게 만들 (CLOVA Speech Recognition, CSR)
Neural Instrument Cloning from very few samples
AudioSet - A massive dataset of manually annotated audio events
DKTC: Dataset of Korean Threatening Conversations
Frill 텐서플로우 라이트 이용한 혁신적인 음성 임베딩... 음성 AI모델, 온 디바이스로 구현하는 'FRILL' 오픈 소스로 공개
g2pK: g2p module for Korean - a pronunciation generation module, commonly used as a preprocessing step for TTS
Hound Internal Demo
- 숨쉬기 힘들 때까지 말해도…놀라운 음성인식엔진
Kaldi Speech Recognition Toolkit
- Kaldi asr(automatic speech recognition) 음성인식 오픈소스 라이브러리 사용법 및 예제 정리
- install (feat. on Mac)
- Run sample script on Mac
- 음성인식모델로 음성합성 데이터 만들기 (kaldi 음성 인식 모델 환경 구현)
KoG2P - Korean grapheme-to-phone conversion in Python (pronunciation generation module)
KoSpeech: Open Source Project for Korean End-to-End (E2E) Automatic Speech Recognition (ASR) in Pytorch for Deep Learning Researchers
KsponSpeech-preprocess: Pre-processing KsponSpeech corpus (Korean Speech dataset) provided by AI Hub
Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
MockingBird: 🚀 AI voice cloning: Clone a voice in 5 seconds to generate arbitrary speech in real-time
NVIDIA Jarvis | NVIDIA Developer
- 1. Live coding Jarvis Transcriptions for Speech to Text Dataset p.1 - YouTube
- 2. Live coding Jarvis Transcriptions for Speech to Text Dataset p.2 - YouTube
openspeech: Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra
pyttsx3 - Text-to-speech x-platform — pyttsx3 2.6 documentation
- pyttsx3 : 파이썬에서 사용가능한 TTS | Acid Paper
- Convert Text To Speech Using Python | Python Projects Tutorials - YouTube
ratsgo's speechbook
SEPIA Framework
SoundStream 구글, End-to-End 뉴럴 오디오 코덱 SoundStream 공개 | GeekNews
speech-recognition: Develop speech recognition models with Tensorflow 2
Tabletop Bringing Tabletop Audio to Actions on Google through media responses
Tacotron, Wavenet-Vocoder, Korean TTS
- 딥러닝 음성합성 multi-speaker-tacotron(tacotron+deepvoice)설치 및 사용법
Toolkits for robust speech processing
tweepy 민트 초코 논란! 자연어 처리(NLP)로 종결해드림. - YouTube
wav2letter - a simple and efficient end-to-end Automatic Speech Recognition (ASR) system from Facebook AI Research
wav2letter++ Introducing Wav2letter++ - How Facebook Implements Speech Recognition Systems Completely Based on Convolutional Neural Networks
- Open sourcing wav2letter++, the fastest state-of-the-art speech system, and flashlight, an ML library going native
Wav2Lip: This repository contains the codes of "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020
Wav2vec 2.0: Learning the structure of speech from raw audio
- Fine-Tune Wav2Vec2 for English ASR in Hugging Face with 🤗 Transformers
- Fine-Tune XLSR-Wav2Vec2 for low-resource ASR with 🤗 Transformers
- Wav2vec: Semi and Unsupervised Speech Recognition | Vaclav Kosar’s Blog
WaveNet: A Generative Model for Raw Audio
- 인간처럼 톤․억양 재현한 음성을…
VocGAN 더 깨끗하고 완벽한 AI 음성을 위해, 뉴럴 보코더(Neural Vocoder)
voice Common Voice Project
voice2json | Command-line tools for speech and intent recognition on Linux

Wikipedia

practice - wikipedia
A Multilingual Corpus of Automatically Extracted Relations from Wikipedia
Exploring Wikipedia with Gremlin Graph Traversals
Fact Extraction from Wikipedia Text
LSA-ing Wikipedia with Apache Spark
wiki - Command line tool to fetch summaries from mediawiki wikis, like Wikipedia
What are the ten most cited sources on Wikipedia? Let’s ask the data
Transforming Wikipedia into an accurate cultural knowledge quiz
Wikipedia Data Science: Working with the World’s Largest Encyclopedia
750,000 triples with an entity, attribute, and value, extracted from the knowledge contained in 160,000 major articles of the Korean Wikipedia
Data-Mining Wikipedia for Fun and Profit – 🦉 billpg industries™

Word2Vec

awesome-sentence-embedding - A curated list of pretrained sentence(and word) embedding models
awesome-network-embedding
An Idiot’s Guide to Word2vec Natural Language Processing
Modern Methods for Sentiment Analysis
Word vectors (word2vec) on named entities and phrases - I
w.elnn.kr
Five crazy abstractions my Deep Learning word2vec model just did
Neural Language Model and Word2Vec
2015 py con word2vec이 추천시스템을 만났을 때
한국어와 NLTK, Gensim의 만남
Word2Vec Vector Algebra Comparison - Python(Gensim) VS Scala(Spark)
FastText and Gensim word embeddings
word2vec with gensim
단어 임베딩의 원리와 gensim.word2vec 사용법
models.word2vec – Deep learning with word2vec
Word2vec with Gensim - Python
Getting started with Word2Vec in Gensim and making it work!
Gensim Word2Vec Tutorial – Full Working Example
Fast Sentence Embeddings is a Python library that serves as an addition to Gensim
word2vec tutorial
- word2vec_tutorial.ipynb
- doc2vec_tutorial.ipynb
Word2Vec Tutorial
- The Skip-Gram Model
- Part 2 - Negative Sampling
- Commented (but unaltered) version of original word2vec C implementation
- Word2Vec Resources
Demystifying Neural Network in Skip-Gram Language Modeling
word2vec, LDA, and introducing a new hybrid algorithm: lda2vec
Bag of Words Meets Bags of Popcorn
- Bag of Words Meets Bags of Popcorn
An introduction to Bag of Words and how to code it in Python for NLP
Vector Representations of Words
- 단어의 벡터 표현 (Vector Representations of Words)
브런치 작가 추천과 Word2Vec
word2vec_basic.ipynb
The Amazing Power of Word Vectors
Audio Word2Vec: Unsupervised Learning of Audio Segment Representations using Sequence-to-sequence Autoencoder
word2vec
How to give a specific word to a word2vec model in TensorFlow
한국어 Word2Vec
tag2vec - a tag vector space trained on Instagram tags with Word2vec. https://tag2vec.herokuapp.com
Making Sense of Everything with words2map
github.com/leeyonghwan92/news_clustering - graduation project by a fourth-year student at Dongguk University
한글을 이용한 데이터마이닝및 word2vec이용한 유사도 분석
5-1. 텐서플로우(TensorFlow)를 이용해 자연어를 처리하기(NLP) – Word Embedding(Word2vec)
On word embeddings - Part 3: The secret ingredients of word2vec
Ali Ghodsi, Lec [3,1]: Deep Learning, Word2vec
Play with word embeddings in your browser
Introduction to Natural Language Processing (NLP) and Bias in AI
NLP Research part 1. Vector Representations of Words
Word2Vec 그리고 추천 시스템의 Item2Vec
박근혜 탄핵 결정문 전문 Word2Vec Visualization w/Tensorflow
단어를 숫자로! Google의 Word2Vec
code.google.com/archive/p/word2vec
Sample code for vectorizing emotion words, visualize emotion word vectors, and find most similar words for "angry"
Simple NN with Keras
Deep Learning #4: Why You Need to Start Using Embedding Layers
A non-NLP application of Word2Vec
PR-027:GloVe - Global vectors for word representation
카카오 미니의 명령어 분류 방법
Lecture 2 | Word Vector Representations: word2vec
번역에서 배우기 : 문맥화된 단어 벡터(contextualized word vector)
쉽게 씌어진 word2vec
Stop Using word2vec
Word Tensors
Word embeddings in 2017: Trends and future directions
Aerin Kim - Phrase2Vec In Practice #AIWTB 2016
Using Word2vec for Music Recommendations
Use Neural Networks to Find the Best Words to Title Your eBook
Playing with word vectors
Transform anything into a vector; entity2vec: Using cooperative learning approaches to generate entity vectors
Learning meaningful location embeddings from unlabeled visits
Mapping Medium’s Tags
Exploring Word2Vec
Text2Shape: Generating Shapes from Natural Language by Learning Joint Embeddings
Word2Vec 모델 기초
- (1) - 개념 정리
- (2) - 코드 분석
Text Embedding Models Contain Bias. Here's Why That Matters
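- The WEAT test measures how strongly a model associates sets of target words (e.g. African-American names, European-American names, flowers, insects) with sets of attribute words (e.g. "stable", "pleasant", or "unpleasant")
- The association between two given words is defined as the cosine similarity between their embedding vectors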
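As a rough illustration of the cosine-based association in the two bullets above, the sketch below scores one target word against two attribute sets by the difference of mean cosine similarities. The function names and the two-dimensional toy vectors are invented for the example; a real test would use embeddings from a trained model and aggregate over whole target sets.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(word_vec, attrs_a, attrs_b):
    """Mean similarity to attribute set A minus mean similarity to attribute set B."""
    mean_a = sum(cosine(word_vec, a) for a in attrs_a) / len(attrs_a)
    mean_b = sum(cosine(word_vec, b) for b in attrs_b) / len(attrs_b)
    return mean_a - mean_b


# Toy 2-d vectors, made up for illustration only.
flower, insect = [1.0, 0.1], [0.1, 1.0]
pleasant, unpleasant = [[1.0, 0.0]], [[0.0, 1.0]]

print(association(flower, pleasant, unpleasant))  # positive: closer to "pleasant"
print(association(insect, pleasant, unpleasant))  # negative: closer to "unpleasant"
```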
An Intuitive Understanding of Word Embeddings: From Count Vectors to Word2Vec
node2vec: Embeddings for Graph Data
PyData Tel Aviv Meetup: Node2vec - Elior Cohen
Think your Data Different - Learn how node2vec works, and what kind of information it captures that word2vec doesn’t — includes case study
700x faster node2vec models: fastest random walks on a graph
Word2Vec
Text Classification With Word2Vec
Word2vec로 사용할 수 있는 벡터 모델들
딥러닝 프레임워크로 임베딩 제대로 학습해보기
word embedding 관련 정리
word2vec_cluster.py
cluster_vectors.py
K Means Clustering Example with Word2Vec in Data Mining or Machine Learning
ELMO DEEP CONTEXTUALIZED WORD REPRESENTATIONS
The Current Best of Universal Word Embeddings and Sentence Embeddings
Word Embeddings and Document Vectors
- Part 1. Similarity
- Part 2. Classification
- Part 2. Order Reduction
- When in Doubt, Simplify
Various Optimisation Techniques and their Impact on Generation of Word Embeddings
Word Vector Representation for Korean: Evaluation Set
Word2Vec lecture notes
Word2Vec — a baby step in Deep Learning but a giant leap towards Natural Language Processing
Neural Network Embeddings Explained
Beyond Word Embeddings Part 1
word_embedding.ipynb
Word2Vec For Phrases — Learning Embeddings For More Than One Word
How to incorporate phrases into Word2Vec – a text mining approach
Core Modeling at Instagram
Analyzing QANDA (콴다) reviews with Python
When and Why does King - Man + Woman = Queen? (ACL 2019)
Word2vec: fish + music = bass
role2vec - A scalable Gensim implementation of "Learning Role-based Graph Embeddings" (IJCAI 2018)
Graph embedding summary
How do machines understand human language? Word Embedding
Basic but quite fun: playing with word embeddings
성지석-Deep contextualized word representations
KCharEmb - Tutorial for character-level embeddings in Korean sentence classification
- 1909 paclic
Word2Vec checklist
MODUCON 2019: Embedding, the secret to improving NLP model performance - 이기창
Identifying Lexico-Semantic Word Relations — A Beginner’s Guide | by Karan Praharaj | Towards Data Science
bilm-tf
- Unlike lookup-based embedding methods such as word2vec and GloVe, it uses contextual word embeddings to improve downstream task performance
- 1. Train a 2-layer biLSTM language model on a large corpus
- 2. Use a linear combination of the hidden states h at each timestep as the word embedding for that timestep
- 3. The combination weights are tuned through the downstream task's cost function
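
The combination step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the bilm-tf API: the hidden states are toy values, `combine_layers` and its parameters are hypothetical names, and the softmax-normalized weights with a scalar `gamma` follow the usual ELMo formulation. In practice the raw weights and `gamma` would be learned with the downstream task's loss rather than fixed by hand.

```python
import math

def softmax(scores):
    """Normalize raw layer scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def combine_layers(layer_states, raw_weights, gamma=1.0):
    """Weighted sum of per-layer hidden states at every timestep.

    layer_states[j][t] is the hidden vector of layer j at timestep t.
    Returns one combined embedding vector per timestep.
    """
    weights = softmax(raw_weights)
    num_steps = len(layer_states[0])
    dim = len(layer_states[0][0])
    combined = []
    for t in range(num_steps):
        vec = [0.0] * dim
        for w, layer in zip(weights, layer_states):
            for d in range(dim):
                vec[d] += w * layer[t][d]
        combined.append([gamma * x for x in vec])
    return combined

# Toy input (assumed values): 2 layers, 2 timesteps, hidden size 2.
layer_states = [
    [[1.0, 0.0], [0.0, 2.0]],  # layer 0
    [[3.0, 4.0], [2.0, 0.0]],  # layer 1
]

# Equal raw weights give each layer a weight of 0.5 after the softmax.
embeddings = combine_layers(layer_states, raw_weights=[0.0, 0.0])
```

Here each timestep's embedding is simply the average of the two layers. Raising one raw weight shifts the embedding toward that layer, which is the knob the downstream task adjusts.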
graph2vec - A parallel implementation of "graph2vec: Learning Distributed Representations of Graphs" (MLGWorkshop 2017)
GraphWave - A scalable implementation of "Learning Structural Node Embeddings Via Diffusion Wavelets (KDD 2018)"
Magnitude: a fast, simple vector embedding utility library
moe: Misspelling Oblivious Word Embeddings
Word2Bits - Quantized Word Vectors
Word2GM (Word to Gaussian Mixture)
word2vec4kor
word2vec graph - This visualization builds graphs of nearest neighbors from high-dimensional word2vec embeddings
Word2Vec In Java
wordvectors - Pre-trained word vectors of 30+ languages
