mmistakes · SeokHwanHong · Sep 10, 2024
diff --git a/_config.yml b/_config.yml
@@ -17,12 +17,12 @@ minimal_mistakes_skin    : "default" # "air", "aqua", "contrast", "dark", "dirt"
 # Site Settings
 locale                   : "en-US"
 rtl                      : # true, false (default) # turns direction of the page into right to left for RTL languages
-title                    : "Site Title"
+title                    : "Note"
 title_separator          : "-"
 subtitle                 : # site tagline that appears below site title in masthead
-name                     : "Your Name"
-description              : "An amazing website."
-url                      : # the base hostname & protocol for your site e.g. "https://mmistakes.github.io"
+name                     : "Hong Seok Hwan"
+description              : "Note"
+url                      : "https://blackapple1.github.io"
 baseurl                  : # the subpath of your site, e.g. "/blog"
 repository               : # GitHub username/repo-name e.g. "mmistakes/minimal-mistakes"
 teaser                   : # path of fallback teaser image, e.g. "/assets/images/500x300.png"
@@ -125,18 +125,18 @@ author:
     - label: "Website"
       icon: "fas fa-fw fa-link"
       # url: "https://your-website.com"
-    - label: "Twitter"
-      icon: "fab fa-fw fa-twitter-square"
-      # url: "https://twitter.com/"
-    - label: "Facebook"
-      icon: "fab fa-fw fa-facebook-square"
-      # url: "https://facebook.com/"
-    - label: "GitHub"
-      icon: "fab fa-fw fa-github"
-      # url: "https://github.com/"
-    - label: "Instagram"
-      icon: "fab fa-fw fa-instagram"
-      # url: "https://instagram.com/"
+#    - label: "Twitter"
+#      icon: "fab fa-fw fa-twitter-square"
+#      # url: "https://twitter.com/"
+#    - label: "Facebook"
+#      icon: "fab fa-fw fa-facebook-square"
+#      # url: "https://facebook.com/"
+#    - label: "GitHub"
+#      icon: "fab fa-fw fa-github"
+#      # url: "https://github.com/"
+#    - label: "Instagram"
+#      icon: "fab fa-fw fa-instagram"
+#      # url: "https://instagram.com/"
 
 # Site Footer
 footer:

diff --git a/_posts/2024-02-22-Attention is all you need copy.md b/_posts/2024-02-22-Attention is all you need copy.md
@@ -0,0 +1,120 @@
+---
+layout: single        # page layout
+title: 'Attention Is All You Need Review'         # title
+categories: Natural Language Process    # category
+toc: true             # table of contents
+author_profile: false # whether the home page profile also appears on other pages
+sidebar:              # categories shown on the left side of the page
+    nav: "docs"       # sidebar navigation to use
+#search: false # disable search within the blog
+---
+
+# 1. Introduction
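+Recurrent models such as RNNs, LSTMs, and gated RNNs have performed well on sequence modeling and transduction problems.
+
+#### - Sequence Modeling
+A model that performs the task of generating another ordered sequence from sequential data. Existing recurrent models cannot process all of the data at once; the input has to be fed in one step at a time, following the sequence position.
+
+#### - Transduction (Transductive Problem)
+A setting in which the test dataset, and not only the train dataset, is observed in advance at training time. The labels of the test dataset are unobserved, but during training the model infers them by using additional information, such as sharing or propagating the features of the labeled training data and the relationships and patterns among data points. Note that the paper uses "sequence transduction" in the narrower sense of converting an input sequence into an output sequence, as in machine translation.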
+
+
+
+# 2. Background
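+#### - Characteristics of CNN-based models in prior NLP research
+To reduce sequential computation, CNN-based models (Extended Neural GPU, ByteNet, ConvS2S, etc.) stack blocks and compute hidden representations for all input and output positions in parallel. In these models, the number of operations needed to relate signals between two arbitrary input or output positions grows with the distance between them, which makes dependencies between distant positions hard to learn. The Transformer is introduced to address this drawback. However, because it averages attention-weighted positions, the effective resolution is reduced, an effect the paper counteracts with Multi-Head Attention.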
+
+#### - Attention
+![alt text](selfattention.jpg)
+
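+The attention mechanism has become an integral part of strong sequence modeling and transduction models across many tasks, because it can model dependencies regardless of their distance in the input or output sequence. The paper avoids recurrence and instead relies solely on an attention mechanism that captures global dependencies between input and output. The Transformer architecture also allows far more parallelization and reaches state-of-the-art quality.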
+
+- Position-wise Feed-Forward Networks
+- Embeddings and Softmax
+- Positional Encoding
+
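+- Advantages
+1. Lower total computation per layer
+2. More computation that can be parallelized
+3. Shorter paths between long-range dependencies in the network, which makes them easier to learn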
+
+
+
+#### - Self-Attention
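+Self-attention is a mechanism that works by finding and focusing on the related parts within a single input sequence. Unlike earlier RNN models, it does not process the sequence step by step; it learns by considering the relationships between all positions at once.
+Query, Key, and Value all start from the same input. Each is multiplied by its own weight matrix so that it takes distinct values, and the dot products are then taken between the sequence and itself.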
+
+
+# 3. Model Architecture
+- overall architecture
+![alt text](<model architecture.jpg>)
+
+## 3.1. Attention
+
+
+
+- Scaled Dot-Product Attention
+![alt text](sdpa-1.jpg) 
+
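+$\mathrm{Attention}(Q,K,V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$ 
+The attention mechanism operates according to the formula above.
+
+Input: queries and keys of dimension $d_{k}$ (= $d_{q}$), values of dimension $d_{v}$
+
+1. Compute the dot products of Q and K
+2. Scale the result of step 1 by dividing it by $\sqrt{d_k}$
+3. Mask (optional)
+
+4. Apply the softmax function to the result of step 3, which turns each row of scores into non-negative weights that sum to 1 (open question: why softmax rather than another activation function?)
+5. Finally, multiply the result of step 4 by V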
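+
+The five steps above can be written out as a chain of intermediate quantities. This is only a sketch: the names $S$, $\tilde{S}$, $M$, and $A$ are labels introduced here for illustration and do not appear in the paper.
+
+$$
+\begin{aligned}
+S &= QK^{T} && \text{(1) dot products} \\
+\tilde{S} &= \frac{S}{\sqrt{d_k}} && \text{(2) scaling} \\
+\tilde{S}_{ij} &\leftarrow -\infty \ \text{where } M_{ij} = 0 && \text{(3) optional mask } M \\
+A_{ij} &= \frac{\exp(\tilde{S}_{ij})}{\sum_{j'} \exp(\tilde{S}_{ij'})} && \text{(4) row-wise softmax} \\
+\mathrm{Attention}(Q,K,V) &= AV && \text{(5) weighted sum of values}
+\end{aligned}
+$$
+
+If the components of a query and a key are assumed to be independent with mean 0 and variance 1, their dot product has variance $d_k$, which is the reason the scaling step divides by $\sqrt{d_k}$.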
+
+
+
+
+
+
+- Multi-Head Attention
+![alt text](mha-1.jpg) 
+
+## 3.2. Encoder & Decoder Stacks
+- Notation
+$(x_1, x_2, ... , x_n)$ : an input sequence of symbol representations
+$\mathbf{z} = (z_1, z_2, ... , z_n)$ : a sequence of continuous representations
+$(y_1, y_2, ... , y_m)$ : an output sequence
+
+- overall architecture
+#### 3.2.1. Encoder
+In the paper, the encoder is a stack of N = 6 identical layers.
+It maps $(x_1, x_2, ... , x_n)$ to $(z_1, z_2, ... , z_n)$.
+
+#### 3.2.2. Decoder
+Given $\mathbf{z}$, the decoder generates $(y_1, y_2, ... , y_m)$ one element at a time.
+At each step the model is autoregressive: the symbols generated so far are used as additional input when generating the next symbol.
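+
+The autoregressive generation described above corresponds to factorizing the probability of the output sequence one symbol at a time. As a sketch, writing the output length as $m$ and the conditional distribution as $p$ (both are notational choices made here for illustration):
+
+$$
+p(y_1, \dots, y_m \mid \mathbf{z}) = \prod_{t=1}^{m} p(y_t \mid y_1, \dots, y_{t-1}, \mathbf{z})
+$$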
+
+- Applications of Attention in our Model
+
+
+# 4. Why Self-Attention
+
+
+
+# References
+- attention is all you need
+https://brave-greenfrog.tistory.com/19
+
+- sequence model
+https://wooono.tistory.com/241
+https://dos-tacos.github.io/translation/transductive-learning/
+https://jadon.tistory.com/29
+
+- inductive problem
+https://velog.io/@kimdyun/Inductive-Transductive-Learning-%EC%B0%A8%EC%9D%B4%EC%A0%90
+
+- self-attention
+https://codingopera.tistory.com/43
+
+# Related papers
+BERT
+Neural Machine Translation by Jointly Learning to Align and Translate
+Vision Transformer
diff --git a/_posts/2024-02-22-BackPropagation.md b/_posts/2024-02-22-BackPropagation.md
@@ -0,0 +1,32 @@
+---
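+layout: single        # page layout
+title: Propagation        # title
+categories: Deep Learning    # category
+toc: true             # table of contents
+author_profile: false # whether the home page profile also appears on other pages
+sidebar:              # categories shown on the left side of the page
+    nav: "docs"       # sidebar navigation to use
+#search: false # disable search within the blog
+use_math: true # enable when the post needs math rendering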
+---
+
+# 1. Loss Function
+A loss function expresses the difference between the actual values and the predicted values. Commonly used loss functions include log loss, L2 error, cross-entropy, and KL divergence.
+
+# 2. Forward Propagation
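+Forward propagation passes the input from the input layer toward the output layer, applying each layer's weights in turn to compute the prediction and, from it, the error. No weights are updated during this pass.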
+
+![Alt text](E:/공부/Github/blog/images/Propagation/순전파-1.jpg)
+
+
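+As the figure above shows, computation proceeds from the input layer to the output layer, evaluating each layer's weighted sums and activations along the way. The more hidden layers there are, the more operations this pass requires.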
+
+# 3. BackPropagation
+Backpropagation propagates the error in the opposite direction to forward propagation and updates the weights of each layer.
+
+![alt text](역전파.jpg)
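+Starting from the error computed in forward propagation, $(L(y_1-\widehat{y}_1))$, the gradient of the error with respect to each weight is computed from the output layer back to the input layer using the chain rule, and every weight is updated. Training proceeds in this way over all of the data. Repeating the process searches for the weights that minimize the loss score computed by the loss function.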
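+
+As a sketch of the chain rule for a single output-layer weight, assume a hypothetical neuron with pre-activation $z = w x + b$, prediction $\widehat{y} = \sigma(z)$, and learning rate $\eta$ (these symbols are introduced here for illustration only):
+
+$$
+\frac{\partial L}{\partial w} = \frac{\partial L}{\partial \widehat{y}} \cdot \frac{\partial \widehat{y}}{\partial z} \cdot \frac{\partial z}{\partial w} = \frac{\partial L}{\partial \widehat{y}} \, \sigma'(z) \, x, \qquad w \leftarrow w - \eta \, \frac{\partial L}{\partial w}
+$$
+
+For weights in earlier layers, the product gains further factors for each layer the error passes back through.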
+
+# References
+https://www.philgineer.com/2021/09/27-5.html
+https://davinci-ai.tistory.com/20
diff --git a/_posts/2024-02-22-E2E Memory Networks.md b/_posts/2024-02-22-E2E Memory Networks.md
@@ -0,0 +1,40 @@
+---
+layout: single        # page layout
+title: End to End Memory Networks        # title
+categories: Deep Learning    # category
+toc: true             # table of contents
+author_profile: false # whether the home page profile also appears on other pages
+sidebar:              # categories shown on the left side of the page
+    nav: "docs"       # sidebar navigation to use
+#search: false # disable search within the blog
+---
+
+# 1. Definition
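+Literally translated, "fully connected layer". The term means that every neuron in one layer is connected to every neuron in the next layer.
+A layer used to classify images from a matrix that has been flattened into a one-dimensional array.
+
+
+# 1. Introduction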
+
+
+# 2. Background
+
+
+# 3. Model Architecture
+## 3.1. Encoder & Decoder Stacks
+### Encoder
+### Decoder
+
+## 3.2. Attention
+### Scaled Dot-Product Attention
+### Multi-Head Attention
+### Applications of Attention in our Model
+
+## 3.3. Position-wise Feed-Forward Networks
+## 3.4. Embeddings and Softmax
+## 3.5. Positional Encoding
+
+# 4. Why Self-Attention
+
+# References
+https://velog.io/@grovy52/Fully-Connected-Layer-FCL-%EC%99%84%EC%A0%84-%EC%97%B0%EA%B2%B0-%EA%B3%84%EC%B8%B5
diff --git a/_posts/2024-02-22-Layer Normalization.md b/_posts/2024-02-22-Layer Normalization.md
@@ -0,0 +1,40 @@
+---
+layout: single        # page layout
+title: Layer Normalization       # title
+categories: Deep Learning    # category
+toc: true             # table of contents
+author_profile: false # whether the home page profile also appears on other pages
+sidebar:              # categories shown on the left side of the page
+    nav: "docs"       # sidebar navigation to use
+#search: false # disable search within the blog
+---
+
+# 1. Definition
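+Literally translated, "fully connected layer". The term means that every neuron in one layer is connected to every neuron in the next layer.
+A layer used to classify images from a matrix that has been flattened into a one-dimensional array.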
+
+
+# 2. Properties
+
+
+# 2. Background
+
+
+# 3. Model Architecture
+## 3.1. Encoder & Decoder Stacks
+### Encoder
+### Decoder
+
+## 3.2. Attention
+### Scaled Dot-Product Attention
+### Multi-Head Attention
+### Applications of Attention in our Model
+
+## 3.3. Position-wise Feed-Forward Networks
+## 3.4. Embeddings and Softmax
+## 3.5. Positional Encoding
+
+# 4. Why Self-Attention
+
+# References
+https://velog.io/@grovy52/Fully-Connected-Layer-FCL-%EC%99%84%EC%A0%84-%EC%97%B0%EA%B2%B0-%EA%B3%84%EC%B8%B5
diff --git a/_posts/2024-02-22-Multi Layer Perceptron.md b/_posts/2024-02-22-Multi Layer Perceptron.md
@@ -0,0 +1,40 @@
+---
+layout: single        # page layout
+title: Multi-Layer Perceptron (MLP)         # title
+categories: DL    # category
+toc: true             # table of contents
+author_profile: false # whether the home page profile also appears on other pages
+sidebar:              # categories shown on the left side of the page
+    nav: "docs"       # sidebar navigation to use
+#search: false # disable search within the blog
+---
+
+# 1. Neuron
+### In Biology
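+A neuron is an electrically excitable cell that processes and transmits information through electrical and chemical signals. Signals between neurons pass through synapses, which are specialized connections with other cells. Neurons can connect end to end to form neural networks.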
+![alt text](생물학뉴런.png)
+
+### In Deep Learning
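+In deep learning, a neuron is a mathematical function based on the concept of the biological neuron. Its activation function determines whether the neuron is active. If the neuron's output is 0, it is in an inactive state and passes no signal.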
+![alt text](딥러닝뉴런.png)
+
+
+# 2. Perceptron
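+The perceptron is the basic building block of an artificial neural network and is responsible for propagating information. The name combines "percept" and "neuron". As an early trainable neural network model, it introduced concepts such as nodes, weights, and layers, so it is useful for understanding the key components of modern neural networks, including deep learning. Here the activation function $\sigma$ propagates a fixed value when the input arriving at a perceptron exceeds a certain threshold, and propagates nothing otherwise.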
+![alt text](image.png)
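+
+The threshold behaviour of $\sigma$ can be written as a step function. A minimal sketch, assuming inputs $x_i$, weights $w_i$, bias $b$, and threshold $\theta$ (symbols introduced here for illustration, with 1 standing for the propagated value and 0 for no signal):
+
+$$
+y = \sigma\left(\sum_{i} w_i x_i + b\right), \qquad \sigma(z) = \begin{cases} 1 & \text{if } z > \theta \\ 0 & \text{otherwise} \end{cases}
+$$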
+
+
+# 3. Multi-Layer Perceptron
+A Multi-Layer Perceptron (MLP) is a neural network with two or more layers. The layers other than the input layer and the output layer are called hidden layers. A network with one hidden layer is called a shallow neural network, and a network with several is called a deep neural network. In an MLP, neurons within the same layer are not connected to each other, while each neuron is connected to every neuron in the adjacent layers (Fully Connected Layer, FC Layer).
+![alt text](다층퍼셉트론.png)
+
+# 4. Role of Hidden Layer
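+Unlike a single-layer perceptron, a multi-layer perceptron can learn from data that is distributed nonlinearly. Each layer computes a linear function of its inputs and weights, so between layers an activation function transforms that linear output nonlinearly before it is passed on.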
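+
+To see why the activation function is needed, consider two stacked layers with no activation between them. A sketch using hypothetical weights $W_1, W_2$ and biases $b_1, b_2$ (names chosen here for illustration):
+
+$$
+W_2 (W_1 x + b_1) + b_2 = (W_2 W_1)\, x + (W_2 b_1 + b_2)
+$$
+
+The result is again a single linear function of $x$, so without a nonlinearity the extra layer adds no expressive power.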
+
+
+
+# References
+http://kbrain.co.kr/board_FXki69/890
+https://davinci-ai.tistory.com/20
+https://yhyun225.tistory.com/21
+https://compmath.korea.ac.kr/deeplearning/Perceptron.html
diff --git a/_posts/2024-02-22-Residual Connection.md b/_posts/2024-02-22-Residual Connection.md
@@ -0,0 +1,40 @@
+---
+layout: single        # page layout
+title: Residual Connection        # title
+categories: Deep Learning    # category
+toc: true             # table of contents
+author_profile: false # whether the home page profile also appears on other pages
+sidebar:              # categories shown on the left side of the page
+    nav: "docs"       # sidebar navigation to use
+#search: false # disable search within the blog
+---
+
+# 1. Definition
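+Literally translated, "fully connected layer". The term means that every neuron in one layer is connected to every neuron in the next layer.
+A layer used to classify images from a matrix that has been flattened into a one-dimensional array.
+
+
+# 1. Introduction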
+
+
+# 2. Background
+
+
+# 3. Model Architecture
+## 3.1. Encoder & Decoder Stacks
+### Encoder
+### Decoder
+
+## 3.2. Attention
+### Scaled Dot-Product Attention
+### Multi-Head Attention
+### Applications of Attention in our Model
+
+## 3.3. Position-wise Feed-Forward Networks
+## 3.4. Embeddings and Softmax
+## 3.5. Positional Encoding
+
+# 4. Why Self-Attention
+
+# References
+https://velog.io/@grovy52/Fully-Connected-Layer-FCL-%EC%99%84%EC%A0%84-%EC%97%B0%EA%B2%B0-%EA%B3%84%EC%B8%B5
diff --git a/_posts/2024-02-22-Residual Learning.md b/_posts/2024-02-22-Residual Learning.md
@@ -0,0 +1,30 @@
+---
+layout: single        # page layout
+title: Residual Learning         # title
+categories: Deep Learning    # category
+toc: true             # table of contents
+author_profile: false # whether the home page profile also appears on other pages
+sidebar:              # categories shown on the left side of the page
+    nav: "docs"       # sidebar navigation to use
+#search: false # disable search within the blog
+---
+
+# 1. Vanishing / Exploding Gradient
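+The first option to consider when trying to improve model performance is to stack the model deeper, that is, to increase the number of hidden layers. Doing so increases the number of parameters that must be computed during training, and backpropagation then chains together a long run of successive derivatives. As a result, the gradients of the weights can shrink toward zero or grow explosively. This phenomenon is called the vanishing / exploding gradient problem.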
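+
+A rough sketch of why this happens, assuming a chain of $n$ layers with scalar hidden states $h_1, \dots, h_n$ (notation introduced here for illustration): the gradient reaching the first layer is a product of per-layer derivatives,
+
+$$
+\frac{\partial L}{\partial h_1} = \frac{\partial L}{\partial h_n} \prod_{k=1}^{n-1} \frac{\partial h_{k+1}}{\partial h_k}
+$$
+
+If the factors are consistently smaller than 1 in magnitude, the product shrinks toward 0 as $n$ grows; if they are consistently larger than 1, it grows without bound.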
+
+# 2. Skip Connection
+![alt text](<skip connection.jpg>)
+
+# 3. Residual Learning
+
+
+
+# 2. Background
+
+
+# References
+
+https://arxiv.org/abs/1512.03385v1 (Deep Residual Learning for Image Recognition)
+https://meaningful96.github.io/deeplearning/skipconnection/
+
+