Commit cd69974

fix docs

jonhue committed Feb 22, 2024
1 parent b8c2e10 commit cd69974
Showing 2 changed files with 2 additions and 2 deletions.
afsl/acquisition_functions/kmeans_pp.py — 2 changes: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ class KMeansPP(MaxDist):
|------------|------------------|------------|--------------------|
| ❌ | (✅) | ✅ | embedding / kernel |
- Using the afsl.embeddings.classification.CrossEntropyEmbedding embeddings, this acquisition function is known as BADGE (*Batch Active learning by Diverse Gradient Embeddings*).[^4]
+ Using the afsl.embeddings.classification.HallucinatedCrossEntropyEmbedding embeddings, this acquisition function is known as BADGE (*Batch Active learning by Diverse Gradient Embeddings*).[^4]
[^1]: See [here](max_dist#where-does-the-distance-come-from) for a discussion of how a distance is induced by embeddings or a kernel.
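For context on this hunk: the class renamed above is the embedding under which KMeansPP coincides with BADGE, i.e. k-means++ seeding over per-example loss-gradient embeddings. Below is a self-contained sketch of that seeding rule; this is illustrative NumPy code with a hypothetical helper name (kmeans_pp_select), not afsl's implementation.

```python
# Illustrative sketch of the BADGE selection rule: k-means++ seeding
# over rows of an embedding matrix. Not afsl's implementation.
import numpy as np

def kmeans_pp_select(embeddings: np.ndarray, batch_size: int, rng=None) -> list[int]:
    """Pick `batch_size` indices by k-means++ seeding over rows of `embeddings`."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = embeddings.shape[0]
    selected = [int(rng.integers(n))]  # first center chosen uniformly at random
    # squared distance of every point to its nearest selected center
    d2 = np.sum((embeddings - embeddings[selected[0]]) ** 2, axis=1)
    while len(selected) < batch_size:
        probs = d2 / d2.sum()          # sample proportionally to squared distance
        i = int(rng.choice(n, p=probs))
        selected.append(i)
        d2 = np.minimum(d2, np.sum((embeddings - embeddings[i]) ** 2, axis=1))
    return selected

# Usage: rows are per-example gradient embeddings phi(x).
phi = np.random.default_rng(1).normal(size=(100, 16))
print(kmeans_pp_select(phi, batch_size=5))
```

With loss-gradient embeddings as input, this seeding step is exactly the BADGE batch rule the docstring refers to.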
afsl/model.py — 2 changes: 1 addition & 1 deletion
@@ -48,7 +48,7 @@ class ModelWithEmbedding(Model, Protocol):
- **Output Gradients (empirical NTK):** Another common choice is $\vphi(\vx) = \grad[\vtheta] \vf(\vx; \vtheta)$ where $\vtheta$ are the network parameters.
Its associated kernel is known as the *(empirical) Neural Tangent Kernel* (NTK).[^4][^3][^5]
If $\vtheta$ is restricted to the weights of the final linear layer, then this embedding is simply the last-layer embedding.
- - **Loss Gradients:** Another possible choice is $\vphi(\vx) = \grad[\vtheta] \ell(\vf(\vx; \vtheta); \widehat{\vy}(\vx))$ where $\ell$ is a loss function and $\widehat{\vy}(\vx)$ is some hallucinated label (see afsl.embeddings.classification.CrossEntropyEmbedding).[^6]
+ - **Loss Gradients:** Another possible choice is $\vphi(\vx) = \grad[\vtheta] \ell(\vf(\vx; \vtheta); \widehat{\vy}(\vx))$ where $\ell$ is a loss function and $\widehat{\vy}(\vx)$ is some hallucinated label (see afsl.embeddings.classification.HallucinatedCrossEntropyEmbedding).[^6]
- **Outputs (empirical NNGP):** Another possible choice is $\vphi(\vx) = \vf(\vx)$ (i.e., the output of the network).
Its associated kernel is known as the *(empirical) Neural Network Gaussian Process* (NNGP) kernel.[^2]
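For context on this hunk: HallucinatedCrossEntropyEmbedding realizes the loss-gradient embedding described in the bullet above, with the hallucinated label taken to be the model's own prediction. A minimal PyTorch sketch follows, restricting the gradient to the final linear layer's weights; the helper loss_gradient_embedding is hypothetical, not afsl's API.

```python
# Sketch: a hallucinated cross-entropy loss-gradient embedding,
# phi(x) = grad_theta loss(f(x; theta); y_hat(x)), with theta restricted
# to the final linear layer. Illustrative only, not afsl's implementation.
import torch
import torch.nn.functional as F

def loss_gradient_embedding(model: torch.nn.Module, last_layer: torch.nn.Linear,
                            x: torch.Tensor) -> torch.Tensor:
    """Return one flattened gradient embedding per row of `x`."""
    embeddings = []
    for xi in x:
        logits = model(xi.unsqueeze(0))
        y_hat = logits.argmax(dim=-1)        # hallucinated (predicted) label
        loss = F.cross_entropy(logits, y_hat)
        (grad,) = torch.autograd.grad(loss, last_layer.weight)
        embeddings.append(grad.flatten())
    return torch.stack(embeddings)

# Usage with a toy two-layer classifier:
net = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.ReLU(), torch.nn.Linear(8, 3))
phi = loss_gradient_embedding(net, net[2], torch.randn(10, 4))
print(phi.shape)  # (10, 24): one 8*3 last-layer gradient per input
```

Stacking one flattened gradient per input yields the embedding matrix whose induced kernel the surrounding docstring discusses.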
