Fix broken italicisation in JA
Atcold committed Jul 4, 2021
1 parent a06cd9f commit 6faa366
Showing 1 changed file with 8 additions and 8 deletions.
16 changes: 8 additions & 8 deletions docs/ja/week08/08-1.md
@@ -113,9 +113,9 @@ $$
Here, we define the similarity metric between two feature maps/vectors using cosine similarity.
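As a quick illustration of that metric (a minimal sketch of my own, not part of this commit; the 128-dimensional feature vectors are an assumption), cosine similarity between two feature vectors can be computed directly in PyTorch:

```python
import torch
import torch.nn.functional as F

v_i  = torch.randn(128)   # feature vector of the original image (illustrative)
v_it = torch.randn(128)   # feature vector of the transformed image (illustrative)

# Cosine similarity lies in [-1, 1]; values near 1 mean the vectors point the same way.
sim = F.cosine_similarity(v_i, v_it, dim=0)
print(sim.item())
```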


<!-- What PIRL does differently is that it doesn't use the direct output of the convolutional feature extractor. It instead defines different _heads_ $f$ and $g$, which can be thought of as independent layers on top of the base convolutional feature extractor. -->
<!-- What PIRL does differently is that it doesn't use the direct output of the convolutional feature extractor. It instead defines different *heads* $f$ and $g$, which can be thought of as independent layers on top of the base convolutional feature extractor. -->

What PIRL does differently is that it doesn't use the direct output of the convolutional feature extractor. Instead, it defines different _heads_ $f$ and $g$, which can be thought of as independent layers on top of the base convolutional feature extractor.
What PIRL does differently is that it doesn't use the direct output of the convolutional feature extractor. Instead, it defines different *heads* $f$ and $g$, which can be thought of as independent layers on top of the base convolutional feature extractor.
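For intuition only, here is a rough sketch of what such heads might look like, assuming a ResNet-18 trunk and 128-dimensional embeddings (both choices are mine, not taken from the PIRL paper or this repository):

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Shared convolutional feature extractor (trunk), with heads f and g on top.
backbone = nn.Sequential(*list(models.resnet18().children())[:-1])
f = nn.Linear(512, 128)   # head applied to the features of the original image I
g = nn.Linear(512, 128)   # head applied to the features of the transformed image I^t

def embed(x, head):
    h = backbone(x).flatten(1)   # (B, 512) pooled convolutional features
    return head(h)               # (B, 128) head output

I, I_t = torch.randn(4, 3, 224, 224), torch.randn(4, 3, 224, 224)
z_I, z_It = embed(I, f), embed(I_t, g)
```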


<!-- Putting everything together, PIRL's NCE objective function works as follows. In a mini-batch, we will have one positive (similar) pair and many negative (dissimilar) pairs. We then compute the similarity between the transformed image's feature vector ($I^t$) and the rest of the feature vectors in the minibatch (one positive, the rest negative). We then compute the score of a softmax-like function on the positive pair. Maximizing a softmax score means minimizing the rest of the scores, which is exactly what we want for an energy-based model. The final loss function, therefore, allows us to build a model that pushes the energy down on similar pairs while pushing it up on dissimilar pairs. -->
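A hand-wavy version of that objective, written as PyTorch pseudocode under my own assumptions (the temperature value, tensor shapes, and treating the positive as class 0 are illustrative choices, not the reference PIRL implementation):

```python
import torch
import torch.nn.functional as F

def nce_loss(z_t, z_pos, z_negs, tau=0.07):
    """z_t: (D,) feature of I^t; z_pos: (D,) positive; z_negs: (N, D) negatives."""
    pos = F.cosine_similarity(z_t, z_pos, dim=0) / tau                 # scalar
    neg = F.cosine_similarity(z_t.unsqueeze(0), z_negs, dim=1) / tau   # (N,)
    logits = torch.cat([pos.unsqueeze(0), neg])                        # positive first
    # Maximizing the softmax score of the positive pushes the energy down on the
    # similar pair and up on the dissimilar ones.
    return F.cross_entropy(logits.unsqueeze(0), torch.zeros(1, dtype=torch.long))

loss = nce_loss(torch.randn(128), torch.randn(128), torch.randn(31, 128))
```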
@@ -142,7 +142,7 @@ Answer: With an L2 norm, it's very easy to make two vectors similar by making th
<b>Fig. 4</b>: SimCLR Results on ImageNet
</center>

<!-- SimCLR shows better results than previous methods. In fact, it reaches the performance of supervised methods on ImageNet, with top-1 linear accuracy on ImageNet. The technique uses a sophisticated data augmentation method to generate similar pairs, and they train for a massive amount of time (with very, very large batch sizes) on TPUs.
<!-- SimCLR shows better results than previous methods. In fact, it reaches the performance of supervised methods on ImageNet, with top-1 linear accuracy on ImageNet. The technique uses a sophisticated data augmentation method to generate similar pairs, and they train for a massive amount of time (with very, very large batch sizes) on TPUs.
Dr. LeCun believes that SimCLR, to a certain extend, shows the limit of contrastive methods. There are many, many regions in a high-dimensional space where you need to push up the energy to make sure it's actually higher than on the data manifold. As you increase the dimension of the representation, you need more and more negative samples to make sure the energy is higher in those places not on the manifold. -->

@@ -154,7 +154,7 @@ SimCLR, to a certain extent, shows the limits of contrastive methods
## [Denoising autoencoder](https://www.youtube.com/watch?v=ZaVP2SY23nc&t=1384s)
## [Denoising autoencoder]

<!-- In [week 7's practicum](https://atcold.github.io/pytorch-Deep-Learning/en/week07/07-3/), we discussed denoising autoencoder. The model tends to learn the representation of the data by reconstructing corrupted input to the original input.
<!-- In [week 7's practicum](https://atcold.github.io/pytorch-Deep-Learning/en/week07/07-3/), we discussed denoising autoencoder. The model tends to learn the representation of the data by reconstructing corrupted input to the original input.
More specifically, we train the system to produce an energy function that grows quadratically as the corrupted data move away from the data manifold. -->

In last week's practicum, we discussed the denoising autoencoder. The model tends to learn the representation of the data by reconstructing a corrupted input back to the original input. More specifically, we train the system to produce an energy function that grows quadratically as the corrupted data moves away from the data manifold.
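A toy version of that training setup, sketched under my own assumptions (a tiny fully connected autoencoder, Gaussian corruption, and a squared reconstruction error standing in for the energy):

```python
import torch
import torch.nn as nn

ae = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 784))
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)

y = torch.rand(32, 784)                  # clean inputs
y_tilde = y + 0.3 * torch.randn_like(y)  # corrupted inputs

# The squared reconstruction error plays the role of the energy: it grows
# (quadratically here) as corrupted points move away from the data manifold.
loss = ((ae(y_tilde) - y) ** 2).mean()
opt.zero_grad()
loss.backward()
opt.step()
```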
@@ -167,9 +167,9 @@ More specifically, we train the system to produce an energy function that grows


<!-- ### Issues
However, there are several problems with denoising autoencoders. One problem is that in a high dimensional continuous space, there are uncountable ways to corrupt a piece of data. So there is no guarantee that we can shape the energy function by simply pushing up on lots of different locations.
However, there are several problems with denoising autoencoders. One problem is that in a high dimensional continuous space, there are uncountable ways to corrupt a piece of data. So there is no guarantee that we can shape the energy function by simply pushing up on lots of different locations.
Another problem with the model is that it performs poorly when dealing with images due to the lack of latent variables. Since there are many ways to reconstruct the images, the system produces various predictions and doesn't learn particularly good features.
Another problem with the model is that it performs poorly when dealing with images due to the lack of latent variables. Since there are many ways to reconstruct the images, the system produces various predictions and doesn't learn particularly good features.
Besides, corrupted points in the middle of the manifold could be reconstructed to both sides. This will create flat spots in the energy function and affect the overall performance. -->

@@ -189,7 +189,7 @@ There are other contrastive methods such as contrastive divergence, Ratio Matchi

<!-- ### Contrastive Divergence
Contrastive divergence (CD) is another model that learns the representation by smartly corrupting the input sample. In a continuous space, we first pick a training sample $y$ and lower its energy. For that sample, we use some sort of gradient-based process to move down on the energy surface with noise. If the input space is discrete, we can instead perturb the training sample randomly to modify the energy. If the energy we get is lower, we keep it. Otherwise, we discard it with some probability.
Contrastive divergence (CD) is another model that learns the representation by smartly corrupting the input sample. In a continuous space, we first pick a training sample $y$ and lower its energy. For that sample, we use some sort of gradient-based process to move down on the energy surface with noise. If the input space is discrete, we can instead perturb the training sample randomly to modify the energy. If the energy we get is lower, we keep it. Otherwise, we discard it with some probability.
Keep doing so will eventually lower the energy of $y$. We can then update the parameter of our energy function by comparing $y$ and the contrasted sample $\bar y$ with some loss function. -->
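For intuition, here is a crude sketch of that gradient-plus-noise descent on the energy surface (the step count, step size, noise scale, and toy energy network are all my assumptions, not a reference implementation):

```python
import torch
import torch.nn as nn

def cd_negative_sample(energy, y, steps=10, step_size=0.1, noise=0.01):
    """Move a copy of y downhill on the energy surface, with a little noise."""
    y_bar = y.clone().detach().requires_grad_(True)
    for _ in range(steps):
        grad, = torch.autograd.grad(energy(y_bar).sum(), y_bar)
        with torch.no_grad():
            y_bar = y_bar - step_size * grad + noise * torch.randn_like(y_bar)
        y_bar.requires_grad_(True)
    return y_bar.detach()

energy = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
y = torch.randn(32, 128)               # training samples
y_bar = cd_negative_sample(energy, y)  # contrasted samples
# The parameters of `energy` would then be updated with a loss contrasting y and y_bar.
```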

@@ -206,4 +206,4 @@ Eventually, they will find low energy places in our energy surface and will caus

### Persistent contrastive divergence

An improvement over contrastive divergence (CD) is persistent contrastive divergence. The system uses a bunch of "particles" and remembers their positions. These particles are moved down on the energy surface just as we did in regular CD. Eventually, they will find low-energy places on the energy surface and will cause them to be pushed up. However, the system does not scale well as the dimensionality increases.
An improvement over contrastive divergence (CD) is persistent contrastive divergence. The system uses a bunch of "particles" and remembers their positions. These particles are moved down on the energy surface just as we did in regular CD. Eventually, they will find low-energy places on the energy surface and will cause them to be pushed up. However, the system does not scale well as the dimensionality increases.
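A small sketch of the persistent-particle idea (the buffer size, update rule, and energy model below are illustrative assumptions, not part of the source notes):

```python
import torch
import torch.nn as nn

energy = nn.Sequential(nn.Linear(128, 1))
particles = torch.randn(256, 128)  # remembered particle positions

def persistent_cd_step(energy, particles, step_size=0.1, noise=0.01):
    p = particles.clone().requires_grad_(True)
    grad, = torch.autograd.grad(energy(p).sum(), p)
    with torch.no_grad():
        # Particles keep walking downhill (with noise) between parameter updates,
        # seeking the low-energy spots that the loss will then push up.
        return particles - step_size * grad + noise * torch.randn_like(particles)

particles = persistent_cd_step(energy, particles)
```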
