cleanup docs (#74)

jonhue · Oct 1, 2024 · 20a5358
1 parent 556fe97
Showing 8 changed files with 122 additions and 54 deletions.
diff --git a/README.md b/README.md
@@ -57,7 +57,7 @@ To start a local server hosting the documentation run ```pdoc ./activeft --math`
 	title        = {Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs},
 	author       = {H{\"u}botter, Jonas and Bongni, Sascha and Hakimi, Ido and Krause, Andreas},
 	year         = 2024,
-	journal      = {TODO}
+	journal      = {arXiv Preprint}
 }
 
 @inproceedings{hubotter2024transductive,

diff --git a/activeft/__init__.py b/activeft/__init__.py
@@ -1,19 +1,20 @@
 r"""
-*Active Fine-Tuning* (`activeft`) is a Python package for informative data selection.
+*Active Fine-Tuning* (`activeft`) is a Python package for intelligent active data selection.
 
 ## Why Active Data Selection?
 
 As opposed to random data selection, active data selection chooses data adaptively utilizing the current model.
 In other words, <p style="text-align: center;">active data selection pays *attention* to the most useful data</p> which allows for faster learning and adaptation.
 There are mainly two reasons for why some data may be particularly useful:
 
-1. **Informativeness**: The data contains information that the model had previously been uncertain about.
-2. **Relevance**: The data is closely related to a particular task, such as answering a specific prompt.
+1. **Relevance**: The data is closely related to a particular task, such as answering a specific prompt.
+2. **Diversity**: The data contains non-redundant information that is not yet captured by the model.
 
+A dataset that is both relevant and diverse is *informative* for the model.
 This is related to memory recall, where the brain recalls informative and relevant memories (think "data") to make sense of the current sensory input.
-Focusing recall on useful data enables efficient few-shot learning.
+Focusing recall on useful data enables efficient learning from few examples.
 
-`activeft` provides a simple interface for active data selection, which can be used as a drop-in replacement for random data selection.
+`activeft` provides a simple interface for active data selection, which can be used as a drop-in replacement for random data selection or nearest neighbor retrieval.
 
 ## Getting Started
 
@@ -23,7 +24,7 @@
 pip install activeft
 ```
 
-We briefly discuss how to use `activeft` for [fine-tuning](#example-fine-tuning) and [in-context learning / retrieval-augmented generation](#example-in-context-learning).
+We briefly discuss how to use `activeft` for standard [fine-tuning](#example-fine-tuning) and [test-time fine-tuning](#example-test-time-fine-tuning).
 
 ### Example: Fine-tuning
 
@@ -81,7 +82,13 @@
 data_loader = ActiveDataLoader.initialize(dataset, target=None, batch_size=64)
 ```
 
-### Example: In-context Learning
+### Example: Test-Time Fine-Tuning
+
+The above example described active data selection in the context of training a model with multiple batches. This usually happens at "train-time" or during "post-training".
+
+The following example demonstrates how to use `activeft` at "test-time" to obtain a model that is as good as possible on a specific test instance.
+For example, with a language model, this would fine-tune the model for a few gradient steps on data selected specifically for a given prompt.
+We refer to the following paper for more details: [Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs](TODO).
 
 We can also use the intelligent retrieval of informative and relevant data outside a training loop — for example, for in-context learning and retrieval-augmented generation.
 
@@ -91,36 +98,59 @@
 ```python
 from activeft import ActiveDataLoader
 
-data_loader = ActiveDataLoader.initialize(dataset, target, batch_size=5)
-context = dataset[data_loader.next(model)]
-model.add_to_context(context)
+data_loader = ActiveDataLoader.initialize(dataset, target, batch_size=10)
+data = dataset[data_loader.next(model)]
+model.step(data)
 ```
 
 Again: very simple!
 
+### Scaling to Large Datasets
+
+By default, `activeft` keeps a matrix in memory whose size grows with the size of the dataset. This is not feasible for very large datasets.
+Some acquisition functions (such as `activeft.acquisition_functions.LazyVTL`) allow for efficient computation of the acquisition function without storing the entire dataset in memory.
+An alternative approach is to pre-select a subset of the data using nearest neighbor retrieval (using [Faiss](https://github.com/facebookresearch/faiss)), before initializing the `ActiveDataLoader`.
+The following is an example of this approach in the context of [test-time fine-tuning](#example-test-time-fine-tuning):
+
+```python
+import torch
+import faiss
+from activeft.sift import Retriever
+
+# Before Test-Time
+embeddings = torch.randn(1000, 768)
+index = faiss.IndexFlatIP(embeddings.size(1))
+index.add(embeddings)
+retriever = Retriever(index)
+
+# At Test-Time, given query
+query_embeddings = torch.randn(1, 768)
+indices = retriever.search(query_embeddings, N=10, K=1_000)
+data = embeddings[indices]
+model.step(data)  # Use data to fine-tune base model, then forward pass query
+```
+
+`activeft.sift.Retriever` first pre-selects `K` nearest neighbors and then uses `activeft` to select the `N` most informative data for the given query from this subset.
+
 ## Citation
 
 If you use the code in a publication, please cite our papers:
 
 ```bibtex
-# Active fine-tuning:
-@inproceedings{huebotter2024active,
-    title={Active Few-Show Fine-Tuning},
-    author={Jonas Hübotter and Bhavya Sukhija and Lenart Treven and Yarden As and Andreas Krause},
-    booktitle={ICLR Workshop on Bridging the Gap Between Practice and Theory in Deep Learning},
-    year={2024},
-    pdf={https://arxiv.org/pdf/2402.15898.pdf},
-    url={https://github.com/jonhue/activeft}
+# Large-Scale Learning at Test-Time with SIFT
+@article{hubotter2024efficiently,
+	title        = {Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs},
+	author       = {H{\"u}botter, Jonas and Bongni, Sascha and Hakimi, Ido and Krause, Andreas},
+	year         = 2024,
+	journal      = {arXiv Preprint}
 }
 
-# Theoretical analysis of "directed" active learning:
-@inproceedings{huebotter2024information,
-    title={Information-based Transductive Active Learning},
-    author={Jonas Hübotter and Bhavya Sukhija and Lenart Treven and Yarden As and Andreas Krause},
-    booktitle={ICML},
-    year={2024},
-    pdf={https://arxiv.org/pdf/2402.15441.pdf},
-    url={https://github.com/jonhue/activeft}
+# Theory and Fundamental Algorithms for Transductive Active Learning
+@inproceedings{hubotter2024transductive,
+	title        = {Transductive Active Learning: Theory and Applications},
+	author       = {H{\"u}botter, Jonas and Sukhija, Bhavya and Treven, Lenart and As, Yarden and Krause, Andreas},
+	year         = 2024,
+	booktitle    = {Advances in Neural Information Processing Systems}
 }
 ```
 

diff --git a/activeft/acquisition_functions/lazy_vtl.py b/activeft/acquisition_functions/lazy_vtl.py
@@ -46,6 +46,8 @@ class LazyVTL(
     """
     Lazy Implementation of [VTL](vtl).[^1]
 
+    See Appendix F.2 of [Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs](TODO).
+
     [^1]: Hübotter, J., Bongni, S., Hakimi, I., and Krause, A. Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs. Preprint, 2024.
     """
 

diff --git a/docs/demo1.py b/docs/demo1.py
@@ -1,7 +1,12 @@
-from activeft import ActiveDataLoader
+import torch
+import faiss
+from activeft.sift import Retriever
 
-train_loader = ActiveDataLoader.initialize(dataset, target, batch_size=32)
+# Before Test-Time
+index = faiss.IndexFlatIP(embeddings.size(1))
+index.add(embeddings)
+retriever = Retriever(index)
 
-while not converged:
-    batch = dataset[train_loader.next(model)]
-    model.step(batch)
+# At Test-Time, given query
+indices = retriever.search(query_embeddings, N=10)
+model.step(dataset[indices])
diff --git a/docs/demo3.py b/docs/demo3.py
@@ -0,0 +1,7 @@
+from activeft import ActiveDataLoader
+
+train_loader = ActiveDataLoader.initialize(dataset, target, batch_size=32)
+
+while not converged:
+    batch = dataset[train_loader.next(model)]
+    model.step(batch)
diff --git a/docs/index.css b/docs/index.css
@@ -44,7 +44,12 @@ header {
   border-bottom-left-radius: 0;
 }
 
-#example-1, #example-2 {
+#example > label[for=view-3] {
+  border-top-left-radius: 0;
+  border-bottom-left-radius: 0;
+}
+
+#example-1, #example-2, #example-3 {
   display: none;
 }
 
@@ -56,6 +61,10 @@ header {
   display: block;
 }
 
+#view-3:checked ~ #example-3 {
+  display: block;
+}
+
 .example-code {
   box-shadow: rgba(0, 0, 0, 0.2) 0 20px 68px;
   border-radius: 5px;
@@ -80,6 +89,7 @@ header {
 
 .example-code .highlight {
   margin: 1em 1em 1.5em;
+  min-width: 40em;
 }
 
 .example-code pre {

diff --git a/docs/index.html.jinja2 b/docs/index.html.jinja2
@@ -55,17 +55,19 @@
     {%- endmacro %}
     <body>
         <header>
-            <h1>Active Few-Shot Learning</h1>
+            <h1>Active Fine-Tuning</h1>
             <p>
-                Efficient fine-tuning & in-context learning by intelligent active data selection.
+                Efficiently fine-tune large neural networks by intelligent active data selection.
             </p>
         </header>
 
         <aside id="example">
             <input type="radio" class="btn-check" name="view-selector" id="view-1" autocomplete="off" checked>
-            <label class="btn btn-outline-dark" for="view-1">Fine-tuning</label>
-            <input type="radio" class="btn-check" name="view-selector" id="view-2" autocomplete="off">
-            <label class="btn btn-outline-dark" for="view-2">In-context learning</label>
+            <label class="btn btn-outline-dark" for="view-1">at Test-Time</label>
+            {# <input type="radio" class="btn-check" name="view-selector" id="view-2" autocomplete="off">
+            <label class="btn btn-outline-dark" for="view-2">during Post-Training</label> #}
+            <input type="radio" class="btn-check" name="view-selector" id="view-3" autocomplete="off">
+            <label class="btn btn-outline-dark" for="view-3">within an Outer Loop</label>
             <div class="example-code" id="example-1">
                 <svg aria-hidden="true" xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14">
                     <g>
@@ -77,7 +79,7 @@
                 <div class="title"></div>
                 {{ example_html1 }}
             </div>
-            <div class="example-code" id="example-2">
+            {# <div class="example-code" id="example-2">
                 <svg aria-hidden="true" xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14">
                     <g>
                         <circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle>
@@ -87,6 +89,17 @@
                 </svg>
                 <div class="title"></div>
                 {{ example_html2 }}
+            </div> #}
+            <div class="example-code" id="example-3">
+                <svg aria-hidden="true" xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14">
+                    <g>
+                        <circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle>
+                        <circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle>
+                        <circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle>
+                    </g>
+                </svg>
+                <div class="title"></div>
+                {{ example_html3 }}
             </div>
         </aside>
 
@@ -102,31 +115,27 @@
                     {{ icon("book-half") }}
                     &nbsp;Documentation
                 </a>
-                <a href="https://arxiv.org/pdf/2402.15898.pdf"
+                <a href="TODO"
                 class="btn btn-dark shadow">
                     {{ icon("newspaper") }}
                     &nbsp;Paper
                 </a>
             </div>
-            <p>
+            <p><center>
                 <code>activeft</code> retrieves data intelligently to maximize the information gain about specified prediction targets.
-                This can be used, for example, to select data for efficient few-shot <i>fine-tuning</i> or to populate a context for <i>in-context learning</i>.
-                The documentation details <a href="https://jonhue.github.io/activeft/docs/activeft.html#getting-started">how to get started</a>.
-                To learn how <code>activeft</code> works, check out our <a href="https://arxiv.org/pdf/2402.15898.pdf">paper</a> or our <a href="https://yas.pub">blog post</a>.
-            </p>
+                This can be used to select data for efficient <i>fine-tuning</i> or to efficiently <i>learn at test-time</i>.
+            </center></p>
             <div id="publications">
                 <h3>Publications</h3>
                 <div>
-                    <h5>Active Few-Shot Fine-Tuning <a href="https://arxiv.org/pdf/2402.15441.pdf">{{ icon("newspaper") }}</a></h5>
-                    {# <p>Jonas Hübotter, Bhavya Sukhija, Lenart Treven, Yarden As, Andreas Krause</p> #}
-                    {# <p>ICLR 2024 Workshop on Bridging the Gap Between Practice and Theory in Deep Learning</p> #}
+                    <h5>Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs</h5>
+                    {# <p>Jonas Hübotter, Sascha Bongni, Ido Hakimi, Andreas Krause</p> #}
                     <p>Preprint</p>
                 </div>
                 <div>
-                    <h5>Information-based Transductive Active Learning <a href="https://arxiv.org/pdf/2402.15898.pdf">{{ icon("newspaper") }}</a></h5>
+                    <h5>Transductive Active Learning: Theory and Applications <a href="https://arxiv.org/abs/2402.15898">{{ icon("newspaper") }}</a></h5>
                     {# <p>Jonas Hübotter, Bhavya Sukhija, Lenart Treven, Yarden As, Andreas Krause</p> #}
-                    {# <p>ICML 2024</p> #}
-                    <p>Preprint</p>
+                    <p>NeurIPS 2024</p>
                 </div>
             </div>
         </main>

diff --git a/docs/make.py b/docs/make.py
@@ -16,6 +16,7 @@
 if __name__ == "__main__":
     demo1 = here / "demo1.py"
     demo2 = here / "demo2.py"
+    demo3 = here / "demo3.py"
     env = Environment(
         loader=FileSystemLoader([here]),
         autoescape=True,
@@ -25,19 +26,23 @@
     formatter = pygments.formatters.html.HtmlFormatter(style="dracula")
     pygments_css = formatter.get_style_defs()
     example_html1 = Markup(
-        pygments.highlight(demo1.read_text("utf8"), lexer, formatter).replace(
-            "converged", '<span class="highlighted">converged</span>'
-        )
+        pygments.highlight(demo1.read_text("utf8"), lexer, formatter)
     )
     example_html2 = Markup(
         pygments.highlight(demo2.read_text("utf8"), lexer, formatter)
     )
+    example_html3 = Markup(
+        pygments.highlight(demo3.read_text("utf8"), lexer, formatter).replace(
+            "converged", '<span class="highlighted">converged</span>'
+        )
+    )
 
     (here / "index.html").write_bytes(
         env.get_template("index.html.jinja2")
         .render(
             example_html1=example_html1,
             example_html2=example_html2,
+            example_html3=example_html3,
             pygments_css=pygments_css,
         )
         .encode()
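
Note on the "Scaling to Large Datasets" section added in `activeft/__init__.py`: the description of `activeft.sift.Retriever` (pre-select `K` nearest neighbors, then pick the `N` most informative points from that subset) can be illustrated with a minimal, dependency-free sketch. This is not the `activeft` implementation and not the SIFT acquisition function. The names `preselect`, `greedy_select`, `two_stage_search`, and `redundancy_weight` are hypothetical, the brute-force inner-product search only stands in for a Faiss `IndexFlatIP` lookup, and the second stage uses a simple relevance-minus-redundancy heuristic purely to show why selecting from the pre-selected subset can differ from plain nearest neighbor retrieval.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def preselect(embeddings: Sequence[Vector], query: Vector, k: int) -> List[int]:
    """Stage 1: indices of the k embeddings with the largest inner product with the query.

    Brute-force stand-in for an inner-product nearest neighbor index.
    """
    scores = [dot(e, query) for e in embeddings]
    order = sorted(range(len(embeddings)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def greedy_select(
    embeddings: Sequence[Vector],
    query: Vector,
    candidates: Sequence[int],
    n: int,
    redundancy_weight: float = 1.0,
) -> List[int]:
    """Stage 2 (hypothetical heuristic, NOT SIFT): greedily pick n candidates.

    Each step picks the candidate with the highest relevance to the query minus
    its largest similarity to any point that was already selected.
    """
    selected: List[int] = []
    remaining = list(candidates)
    while remaining and len(selected) < n:

        def score(i: int) -> float:
            relevance = dot(embeddings[i], query)
            redundancy = max(
                (dot(embeddings[i], embeddings[j]) for j in selected), default=0.0
            )
            return relevance - redundancy_weight * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def two_stage_search(
    embeddings: Sequence[Vector], query: Vector, n: int, k: int
) -> List[int]:
    """Pre-select k nearest neighbors, then choose n of them greedily."""
    return greedy_select(embeddings, query, preselect(embeddings, query, k), n)


if __name__ == "__main__":
    # Index 1 duplicates index 0; index 2 is less similar to the query
    # but carries non-redundant information.
    toy_embeddings = [[1.0, 0.0], [1.0, 0.0], [0.6, 0.8], [-1.0, 0.0]]
    toy_query = [1.0, 0.2]
    print("nearest neighbors:", preselect(toy_embeddings, toy_query, k=2))
    print("two-stage selection:", two_stage_search(toy_embeddings, toy_query, n=2, k=3))
```

On the toy data, plain nearest neighbor retrieval returns the duplicated point twice, while the two-stage selection skips the duplicate in favor of the non-redundant point. This mirrors the "Relevance" and "Diversity" criteria stated in the updated package docstring.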