[neural network] notation and mathematics

basic notation

reference

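When computing the forward and backward passes of a neural net, in papers or in tools such as TensorFlow,
the notation in the formulas is often confusing.
Input and output are sometimes treated as column vectors and sometimes as row vectors.
Likewise, the rows of the weight matrix sometimes index the output and sometimes the input.
To adapt to conventions that change from one source to the next, you need a reference convention of your own;
mine is as follows.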

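1. input, output, and bias vectors are column vectors by default

2. In the connection input -> weight matrix -> output, the rows of the matrix correspond to the output and the columns to the input.

That is, w(j, k) is the weight of the connection from the k-th input node to the j-th output node,
and it is found at row j, column k of the matrix.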

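3. The weight matrix takes the layer number of its output layer.

That is, layer(l-1) -> w(l) -> layer(l)
With this convention, the forward pass can be computed as follows, where input(l-1) is the output of layer(l-1), i.e. the input to layer(l).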

z(l) = w(l) * input(l-1) + bias(l)
output(l) = activation( z(l) )
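
As a rough sketch of the forward formulas above, here is a small NumPy version. The layer sizes (5, 4, 3), the sigmoid activation, and the helper name forward are my own assumptions for illustration, not something the formulas prescribe.

```python
import numpy as np

def sigmoid(z):
    # Assumed activation; any elementwise activation would fit the formula.
    return 1.0 / (1.0 + np.exp(-z))

def forward(weights, biases, x):
    """Return the pre-activations z(l) and the outputs of every layer."""
    outputs = [x]          # outputs[0] is the network input, a column vector
    zs = []
    for w, b in zip(weights, biases):
        z = w @ outputs[-1] + b      # z(l) = w(l) * input(l-1) + bias(l)
        zs.append(z)
        outputs.append(sigmoid(z))   # output(l) = activation( z(l) )
    return zs, outputs

rng = np.random.default_rng(0)
sizes = [5, 4, 3]  # hypothetical layer sizes
# Each w(l) has shape (n_out, n_in): rows are outputs, columns are inputs.
weights = [rng.standard_normal((n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal((n_out, 1)) for n_out in sizes[1:]]
x = rng.standard_normal((5, 1))

zs, outputs = forward(weights, biases, x)
print([o.shape for o in outputs])  # [(5, 1), (4, 1), (3, 1)]
```

Note that no transpose is needed anywhere in the forward pass, because w(l) already has one row per output node.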

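In back propagation, the error delta is

delta(l) = elementwise_product( w(l+1)T * delta(l+1) , activation'(z(l)) )

Here w(l+1)T * delta(l+1) is a matrix-vector product, and the result is multiplied element by element (Hadamard product, not a dot product) with activation'(z(l)).
In other words, under this convention the transpose of w appears only in back propagation.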
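
To check the delta formula, here is a minimal NumPy sketch for a two-layer net. The sigmoid activation, the quadratic cost 0.5 * ||output - y||^2, and the layer sizes are assumptions I picked for the example; the check relies on the fact that, for z(l) = w(l) * input(l-1) + bias(l), the gradient of the cost with respect to bias(l) equals delta(l).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

rng = np.random.default_rng(0)
w1, b1 = rng.standard_normal((4, 5)), rng.standard_normal((4, 1))
w2, b2 = rng.standard_normal((3, 4)), rng.standard_normal((3, 1))
x, y = rng.standard_normal((5, 1)), rng.standard_normal((3, 1))

def cost(b1_value):
    # Assumed quadratic cost, written as a function of the first-layer bias.
    a1 = sigmoid(w1 @ x + b1_value)
    a2 = sigmoid(w2 @ a1 + b2)
    return 0.5 * float(np.sum((a2 - y) ** 2))

# Forward pass, keeping the pre-activations z(l).
z1 = w1 @ x + b1
a1 = sigmoid(z1)
z2 = w2 @ a1 + b2
a2 = sigmoid(z2)

# Output-layer delta for the quadratic cost.
delta2 = (a2 - y) * sigmoid_prime(z2)
# Hidden-layer delta: transpose of w(l+1), then an elementwise product.
delta1 = (w2.T @ delta2) * sigmoid_prime(z1)

# Numerical gradient of the cost with respect to b1 (central differences).
eps = 1e-6
numeric = np.zeros_like(b1)
for j in range(b1.shape[0]):
    step = np.zeros_like(b1)
    step[j, 0] = eps
    numeric[j, 0] = (cost(b1 + step) - cost(b1 - step)) / (2 * eps)

print(np.allclose(delta1, numeric, atol=1e-6))
```

The shapes make the transpose unavoidable: w2 is (3 x 4) and delta2 is (3 x 1), so only w2.T @ delta2 gives a (4 x 1) vector that lines up with z1.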

What if a paper instead treats the rows of the weight matrix as the input and the columns as the output?
Then you will see a transpose in the forward computation.
z(l) = w(l)T * input(l-1) + bias(l)

If no transpose is used,
it means the multiplication order has been swapped and input, output, and bias are treated as row vectors.

z(l) = input(l-1) * w(l) + bias(l)
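
The three forward formulas describe the same computation, which a small NumPy sketch can confirm. The sizes (5 inputs, 3 outputs) and the variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# My convention: rows of w are outputs, columns are inputs.
w = rng.standard_normal((3, 5))
x = rng.standard_normal((5, 1))   # column vector input
b = rng.standard_normal((3, 1))   # column vector bias
z_col = w @ x + b                 # z = w * input + bias, shape (3, 1)

# Other convention: rows of the matrix are inputs, columns are outputs.
w_in_rows = w.T                   # shape (5, 3)

# With column vectors, the forward pass now needs a transpose.
z_with_transpose = w_in_rows.T @ x + b        # z = wT * input + bias

# With row vectors, the order is swapped and no transpose is needed.
z_row = x.T @ w_in_rows + b.T                 # z = input * w + bias, shape (1, 3)

print(np.allclose(z_col, z_with_transpose))   # True
print(np.allclose(z_col.T, z_row))            # True
```

So when a formula looks unfamiliar, checking the shape of the weight matrix and whether the vectors are rows or columns is usually enough to map it back to the convention above.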

In TensorFlow, for example, it looks like this.

import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder)

X = tf.placeholder("float", [None, 5]) # rows: any batch size, cols: 5 input features
Y = tf.placeholder("float", [None, 3]) # rows: any batch size, cols: 3 output classes
W = tf.Variable(tf.zeros([5, 3])) # rows: 5 inputs, cols: 3 outputs
y = tf.nn.softmax(tf.matmul(X, W)) # softmax over each row, (None x 5) * (5 x 3) = (None x 3)

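To use the tf.matmul(W, X) form instead, transpose both operands:
XT = tf.transpose(X)
WT = tf.transpose(W)
y = tf.nn.softmax(tf.matmul(WT, XT), axis=0) # (3 x 5) * (5 x None) = (3 x None); classes are now along axis 0
Note that the result is the transpose of the previous y, so softmax has to run over axis 0 instead of the default last axis.
This works, but it does not read very naturally.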
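
The same comparison can be tried without a TensorFlow session. Below is a small NumPy sketch; the softmax helper, the batch size of 8, and the random values are assumptions of mine, and it only mirrors the shapes of the TensorFlow snippets, not their API.

```python
import numpy as np

def softmax(logits, axis):
    # Subtract the max along the axis for numerical stability.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 5))   # 8 samples as rows, 5 features each
W = rng.standard_normal((5, 3))   # rows: 5 inputs, cols: 3 outputs

# Row-vector form: one sample per row, classes along axis 1.
y_rows = softmax(X @ W, axis=1)          # shape (8, 3)

# Column-vector form: one sample per column, classes along axis 0.
y_cols = softmax(W.T @ X.T, axis=0)      # shape (3, 8)

print(y_rows.shape, y_cols.shape)        # (8, 3) (3, 8)
print(np.allclose(y_rows, y_cols.T))     # True
```

Both forms give the same probabilities, just transposed; the row form simply matches the usual "one sample per row" layout of a data batch, which is why it feels more natural.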