rename to activeft (#70)
* rename to activeft

* update citation key
jonhue authored Sep 28, 2024
1 parent 840b860 commit ef5184c
Showing 52 changed files with 242 additions and 175 deletions.
57 changes: 50 additions & 7 deletions README.md
@@ -1,11 +1,36 @@
# Active Few-Shot Learning
# Active Fine-Tuning

## Setup
A library for automatic data selection in active fine-tuning of large neural networks.

1. Navigate to the root folder of the project
2. Run `pip install -e .`
**[Website](https://jonhue.github.io/activeft)** | **[Documentation](https://jonhue.github.io/activeft/docs)**

## Maintenance
Please cite our work if you use this library in your research ([bibtex below](#citation)):

- [Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs]()
- [Transductive Active Learning: Theory and Applications](https://arxiv.org/abs/2402.15898) (Section 4)

## Installation

```
pip install activeft
```

## Usage Example

```python
import faiss
import numpy as np

from activeft.sift import Retriever

# Load embeddings (random data for illustration; faiss expects float32)
embeddings = np.random.rand(1000, 512).astype("float32")
query_embeddings = np.random.rand(1, 512).astype("float32")

# Build an inner-product index over the embeddings
d = embeddings.shape[1]
index = faiss.IndexFlatIP(d)
index.add(embeddings)

# Retrieve the 10 most useful data points for the query
retriever = Retriever(index)
indices = retriever.search(query_embeddings, N=10)
```
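
The returned `indices` point into `embeddings`, so the corresponding data points can be looked up and used downstream, e.g., for fine-tuning or as retrieved context.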

## Development

### CI checks

@@ -15,12 +40,30 @@

### Documentation

To start a local server hosting the documentation, run ```pdoc ./afsl --math```.
To start a local server hosting the documentation, run ```pdoc ./activeft --math```.

### Publishing

1. update version number in `pyproject.toml` and `afsl/__init__.py`
1. update version number in `pyproject.toml` and `activeft/__init__.py`
2. build: `poetry build`
3. publish: `poetry publish`
4. push version update to GitHub
5. create new release on GitHub

## Citation

```bibtex
@article{hubotter2024efficiently,
  title   = {Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs},
  author  = {H{\"u}botter, Jonas and Bongni, Sascha and Hakimi, Ido and Krause, Andreas},
  year    = 2024,
  journal = {TODO}
}

@inproceedings{hubotter2024transductive,
  title     = {Transductive Active Learning: Theory and Applications},
  author    = {H{\"u}botter, Jonas and Sukhija, Bhavya and Treven, Lenart and As, Yarden and Krause, Andreas},
  year      = 2024,
  booktitle = {Advances in Neural Information Processing Systems}
}
```
30 changes: 15 additions & 15 deletions afsl/__init__.py → activeft/__init__.py
@@ -1,5 +1,5 @@
r"""
*Active Few-Shot Learning* (`afsl`) is a Python package for intelligent active data selection.
*Active Fine-Tuning* (`activeft`) is a Python package for informative data selection.
## Why Active Data Selection?
@@ -13,30 +13,30 @@
This is related to memory recall, where the brain recalls informative and relevant memories (think "data") to make sense of the current sensory input.
Focusing recall on useful data enables efficient few-shot learning.
`afsl` provides a simple interface for active data selection, which can be used as a drop-in replacement for random data selection.
`activeft` provides a simple interface for active data selection, which can be used as a drop-in replacement for random data selection.
## Getting Started
You can install `afsl` from [PyPI](https://pypi.org/project/afsl/) via pip:
You can install `activeft` from [PyPI](https://pypi.org/project/activeft/) via pip:
```bash
pip install afsl
pip install activeft
```
We briefly discuss how to use `afsl` for [fine-tuning](#example-fine-tuning) and [in-context learning / retrieval-augmented generation](#example-in-context-learning).
We briefly discuss how to use `activeft` for [fine-tuning](#example-fine-tuning) and [in-context learning / retrieval-augmented generation](#example-in-context-learning).
### Example: Fine-tuning
Given a [PyTorch](https://pytorch.org) model which may (but does not have to!) be pre-trained, we can use `afsl` to efficiently fine-tune the model.
Given a [PyTorch](https://pytorch.org) model which may (but does not have to!) be pre-trained, we can use `activeft` to efficiently fine-tune the model.
This model may be generative (e.g., a language model) or discriminative (e.g., a classifier), and can use any architecture.
We only need the following things:
- A dataset of inputs `dataset` (such that `dataset[i]` returns a vector of length $d$) from which we want to select batches for fine-tuning. If one has a supervised dataset returning input-label pairs, then `afsl.data.InputDataset(dataset)` can be used to obtain a dataset over the input space.
- A dataset of inputs `dataset` (such that `dataset[i]` returns a vector of length $d$) from which we want to select batches for fine-tuning. If one has a supervised dataset returning input-label pairs, then `activeft.data.InputDataset(dataset)` can be used to obtain a dataset over the input space.
- A tensor of prediction targets `target` ($m \times d$) which specifies the task we want to fine-tune the model for.
Here, $m$ can be quite small, e.g., equal to the number of classes in a classification task.
If there is no *specific* task for training, then active data selection can still be useful as we will see [later](#undirected-data-selection).
- The `model` can be any PyTorch `nn.Module` with an `embed(x)` method that computes (latent) embeddings for the given inputs `x`, e.g., the representation of `x` from the penultimate layer.
See `afsl.model.ModelWithEmbedding` for more details. Alternatively, the model can have a `kernel(x1,x2)` method that computes a kernel for given inputs `x1` and `x2` (see `afsl.model.ModelWithKernel`).
See `activeft.model.ModelWithEmbedding` for more details. Alternatively, the model can have a `kernel(x1,x2)` method that computes a kernel for given inputs `x1` and `x2` (see `activeft.model.ModelWithKernel`).
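For illustration, here is a minimal sketch of a model exposing such an `embed` method (the class name, architecture, and dimensions are hypothetical):
```python
import torch
from torch import nn

class SimpleClassifier(nn.Module):
    def __init__(self, input_dim: int = 512, num_classes: int = 10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(input_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
        )
        self.head = nn.Linear(128, num_classes)

    def embed(self, x: torch.Tensor) -> torch.Tensor:
        # (Latent) embedding of x: the penultimate-layer representation
        return self.backbone(x)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.embed(x))
```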
.. note::
@@ -46,7 +46,7 @@
With this in place, we can initialize the "active" data loader
```python
from afsl import ActiveDataLoader
from activeft import ActiveDataLoader
data_loader = ActiveDataLoader.initialize(dataset, target, batch_size=64)
```
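A fine-tuning loop on top of this might then look roughly as follows. This is a schematic sketch: `data_loader.next(model)` returning indices into `dataset` mirrors the retrieval example below, while `num_rounds` and `compute_loss` are placeholders.
```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
for _ in range(num_rounds):  # num_rounds: hypothetical number of selection rounds
    batch = dataset[data_loader.next(model)]  # actively select the next batch
    loss = compute_loss(model, batch)  # compute_loss: placeholder loss function
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```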
@@ -86,10 +86,10 @@
We can also use the intelligent retrieval of informative and relevant data outside a training loop — for example, for in-context learning and retrieval-augmented generation.
The setup is analogous to the previous section: we have a pre-trained `model`, a `dataset` to query from, and `target`s (e.g., a prompt) for which we want to retrieve relevant data.
We can use `afsl` to query the most useful data and then add it to the model's context:
We can use `activeft` to query the most useful data and then add it to the model's context:
```python
from afsl import ActiveDataLoader
from activeft import ActiveDataLoader
data_loader = ActiveDataLoader.initialize(dataset, target, batch_size=5)
context = dataset[data_loader.next(model)]
@@ -110,7 +110,7 @@
booktitle={ICLR Workshop on Bridging the Gap Between Practice and Theory in Deep Learning},
year={2024},
pdf={https://arxiv.org/pdf/2402.15898.pdf},
url={https://github.com/jonhue/afsl}
url={https://github.com/jonhue/activeft}
}
# Theoretical analysis of "directed" active learning:
@@ -120,15 +120,15 @@
booktitle={ICML},
year={2024},
pdf={https://arxiv.org/pdf/2402.15441.pdf},
url={https://github.com/jonhue/afsl}
url={https://github.com/jonhue/activeft}
}
```
---
"""

from afsl.active_data_loader import ActiveDataLoader
from afsl import acquisition_functions, data, embeddings, model, sift
from activeft.active_data_loader import ActiveDataLoader
from activeft import acquisition_functions, data, embeddings, model, sift

__all__ = [
"ActiveDataLoader",
@@ -1,13 +1,13 @@
"""
`afsl` supports a wide range of acquisition functions which are summarized here.
`activeft` supports a wide range of acquisition functions which are summarized here.
The default implementation uses [VTL](acquisition_functions/vtl).
You can use a custom acquisition function as follows:
```python
from afsl.acquisition_functions.undirected_vtl import UndirectedVTL
from activeft.acquisition_functions.undirected_vtl import UndirectedVTL
acquisition_function = UndirectedVTL()
data_loader = afsl.ActiveDataLoader(data, batch_size=64, acquisition_function=acquisition_function)
data_loader = activeft.ActiveDataLoader(data, batch_size=64, acquisition_function=acquisition_function)
```
## Overview of Acquisition Functions
@@ -32,9 +32,9 @@
| [Random](acquisition_functions/random) | ❌ | ❌ | (✅) | - |
- **Relevance** and **Informativeness** capture whether obtained data is "useful" as outlined [here](/afsl/docs/afsl#why-active-data-selection).
- **Relevance** and **Informativeness** capture whether obtained data is "useful" as outlined [here](/activeft/docs/activeft#why-active-data-selection).
- **Diversity** captures whether the selected batches are diverse, i.e., whether they cover different "useful" parts of the data space. In a non-diverse batch, most data is not useful conditional on the rest of the batch, meaning that most of the batch is "wasted".
- **Model Requirement** describes the type of model required for the acquisition function. For example, some acquisition functions require an *embedding* or a *kernel* (see afsl.model), while others require the model to output a *softmax* distribution (typically in a classification context).
- **Model Requirement** describes the type of model required for the acquisition function. For example, some acquisition functions require an *embedding* or a *kernel* (see activeft.model), while others require the model to output a *softmax* distribution (typically in a classification context).
---
"""
@@ -45,9 +45,9 @@
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset as TorchDataset, Subset
from afsl.data import Dataset
from afsl.model import Model, ModelWithEmbedding
from afsl.utils import (
from activeft.data import Dataset
from activeft.model import Model, ModelWithEmbedding
from activeft.utils import (
DEFAULT_EMBEDDING_BATCH_SIZE,
DEFAULT_MINI_BATCH_SIZE,
DEFAULT_NUM_WORKERS,
@@ -1,17 +1,17 @@
from typing import NamedTuple
import torch
from afsl.acquisition_functions import (
from activeft.acquisition_functions import (
EmbeddingBased,
SequentialAcquisitionFunction,
Targeted,
)
from afsl.gaussian import GaussianCovarianceMatrix
from afsl.model import (
from activeft.gaussian import GaussianCovarianceMatrix
from activeft.model import (
ModelWithEmbeddingOrKernel,
ModelWithKernel,
ModelWithLatentCovariance,
)
from afsl.utils import (
from activeft.utils import (
DEFAULT_EMBEDDING_BATCH_SIZE,
DEFAULT_MINI_BATCH_SIZE,
DEFAULT_NUM_WORKERS,
@@ -52,7 +52,7 @@ class BaCE(
[^1]: Hübotter, J., Sukhija, B., Treven, L., As, Y., and Krause, A. Information-based Transductive Active Learning. arXiv preprint, 2024.
[^2]: A kernel is also induced by embeddings. See afsl.model.ModelWithEmbedding.
[^2]: A kernel is also induced by embeddings. See activeft.model.ModelWithEmbedding.
"""

noise_std: float | None
@@ -1,12 +1,12 @@
import torch
import torch.nn.functional as F
from afsl.acquisition_functions import (
from activeft.acquisition_functions import (
BatchAcquisitionFunction,
EmbeddingBased,
Targeted,
)
from afsl.model import ModelWithEmbedding
from afsl.utils import (
from activeft.model import ModelWithEmbedding
from activeft.utils import (
DEFAULT_EMBEDDING_BATCH_SIZE,
DEFAULT_MINI_BATCH_SIZE,
DEFAULT_NUM_WORKERS,
@@ -1,5 +1,5 @@
import torch
from afsl.acquisition_functions.bace import TargetedBaCE, BaCEState
from activeft.acquisition_functions.bace import TargetedBaCE, BaCEState


class CTL(TargetedBaCE):
@@ -23,11 +23,11 @@ class CTL(TargetedBaCE):
|------------|------------------|------------|--------------------|
| ✅ | (✅) | ✅ | embedding / kernel |
[^1]: A kernel $k$ on domain $\spX$ induces a stochastic process $\\{f(\vx)\\}_{\vx \in \spX}$. See afsl.model.ModelWithKernel.
[^1]: A kernel $k$ on domain $\spX$ induces a stochastic process $\\{f(\vx)\\}_{\vx \in \spX}$. See activeft.model.ModelWithKernel.
[^3]: Hübotter, J., Sukhija, B., Treven, L., As, Y., and Krause, A. Information-based Transductive Active Learning. arXiv preprint, 2024.
[^4]: see afsl.acquisition_functions.bace.BaCE
[^4]: see activeft.acquisition_functions.bace.BaCE
"""

def compute(self, state: BaCEState) -> torch.Tensor:
@@ -1,9 +1,9 @@
import torch
from afsl.acquisition_functions import BatchAcquisitionFunction
from afsl.acquisition_functions.cosine_similarity import CosineSimilarity
from afsl.acquisition_functions.max_entropy import MaxEntropy
from afsl.model import ModelWithEmbedding
from afsl.utils import (
from activeft.acquisition_functions import BatchAcquisitionFunction
from activeft.acquisition_functions.cosine_similarity import CosineSimilarity
from activeft.acquisition_functions.max_entropy import MaxEntropy
from activeft.model import ModelWithEmbedding
from activeft.utils import (
DEFAULT_EMBEDDING_BATCH_SIZE,
DEFAULT_MINI_BATCH_SIZE,
DEFAULT_NUM_WORKERS,
@@ -1,6 +1,6 @@
import torch
import wandb
from afsl.acquisition_functions.bace import TargetedBaCE, BaCEState
from activeft.acquisition_functions.bace import TargetedBaCE, BaCEState


class ITL(TargetedBaCE):
@@ -39,13 +39,13 @@ class ITL(TargetedBaCE):
`ITL` is computed using $\I{\vf(\spA)}{y(\vx) \mid \spD_i} \approx \I{\vy(\spA)}{y(\vx) \mid \spD_i}$ with \\[\begin{align}
\I{\vy(\spA)}{y(\vx) \mid \spD_i} &= \frac{1}{2} \log\left( \frac{k_i(\vx,\vx) + \sigma^2}{\tilde{k}_i(\vx,\vx) + \sigma^2} \right) \qquad\text{where} \\\\
\tilde{k}_i(\vx,\vx) &= k_i(\vx,\vx) - \vk_i(\vx,\spA) (\mK_i(\spA,\spA) + \sigma^2 \mI)^{-1} \vk_i(\spA,\vx)
\end{align}\\] where $\sigma^2$ is the noise variance and $k_i$ denotes the conditional kernel (see afsl.acquisition_functions.bace.BaCE).
\end{align}\\] where $\sigma^2$ is the noise variance and $k_i$ denotes the conditional kernel (see activeft.acquisition_functions.bace.BaCE).
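As a standalone numeric illustration of this quantity (a sketch with made-up kernel values, independent of this module's internals):
```python
import torch

sigma_sq = 0.1  # noise variance (assumed value)
K_AA = torch.tensor([[1.0, 0.5],
                     [0.5, 1.0]])  # K_i(A, A) for a batch A of size 2
k_xA = torch.tensor([0.8, 0.3])  # k_i(x, A) for a candidate point x
k_xx = torch.tensor(1.0)  # k_i(x, x)

# Conditional variance after observing y(A)
k_tilde = k_xx - k_xA @ torch.linalg.solve(K_AA + sigma_sq * torch.eye(2), k_xA)

# Information gain I(y(A); y(x) | D_i)
itl_value = 0.5 * torch.log((k_xx + sigma_sq) / (k_tilde + sigma_sq))
```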
[^1]: A kernel $k$ on domain $\spX$ induces a stochastic process $\\{f(\vx)\\}_{\vx \in \spX}$. See afsl.model.ModelWithKernel.
[^1]: A kernel $k$ on domain $\spX$ induces a stochastic process $\\{f(\vx)\\}_{\vx \in \spX}$. See activeft.model.ModelWithKernel.
[^3]: Hübotter, J., Sukhija, B., Treven, L., As, Y., and Krause, A. Information-based Transductive Active Learning. arXiv preprint, 2024.
[^4]: see afsl.acquisition_functions.bace.BaCE
[^4]: see activeft.acquisition_functions.bace.BaCE
"""

def compute(self, state: BaCEState) -> torch.Tensor:
@@ -1,7 +1,7 @@
import torch
import wandb
from afsl.acquisition_functions.bace import TargetedBaCE, BaCEState
from afsl.utils import (
from activeft.acquisition_functions.bace import TargetedBaCE, BaCEState
from activeft.utils import (
DEFAULT_EMBEDDING_BATCH_SIZE,
DEFAULT_MINI_BATCH_SIZE,
DEFAULT_NUM_WORKERS,
@@ -1,6 +1,6 @@
import random
import torch
from afsl.acquisition_functions.max_dist import MaxDist
from activeft.acquisition_functions.max_dist import MaxDist


class KMeansPP(MaxDist):
@@ -19,7 +19,7 @@ class KMeansPP(MaxDist):
|------------|------------------|------------|--------------------|
| ❌ | (✅) | ✅ | embedding / kernel |
Using the afsl.embeddings.classification.HallucinatedCrossEntropyEmbedding embeddings, this acquisition function is known as BADGE (*Batch Active learning by Diverse Gradient Embeddings*).[^4]
Using the activeft.embeddings.classification.HallucinatedCrossEntropyEmbedding embeddings, this acquisition function is known as BADGE (*Batch Active learning by Diverse Gradient Embeddings*).[^4]
[^1]: See [here](max_dist#where-does-the-distance-come-from) for a discussion of how a distance is induced by embeddings or a kernel.
@@ -1,14 +1,14 @@
from typing import List, NamedTuple, Tuple
import numpy as np
import torch
from afsl.acquisition_functions import (
from activeft.acquisition_functions import (
EmbeddingBased,
SequentialAcquisitionFunction,
Targeted,
)
from afsl.gaussian import GaussianCovarianceMatrix
from afsl.model import ModelWithEmbeddingOrKernel
from afsl.utils import (
from activeft.gaussian import GaussianCovarianceMatrix
from activeft.model import ModelWithEmbeddingOrKernel
from activeft.utils import (
DEFAULT_EMBEDDING_BATCH_SIZE,
DEFAULT_MINI_BATCH_SIZE,
DEFAULT_NUM_WORKERS,
@@ -1,7 +1,7 @@
import torch
from afsl.acquisition_functions import BatchAcquisitionFunction
from afsl.model import Model
from afsl.utils import get_device, mini_batch_wrapper
from activeft.acquisition_functions import BatchAcquisitionFunction
from activeft.model import Model
from activeft.utils import get_device, mini_batch_wrapper


class LeastConfidence(BatchAcquisitionFunction):
@@ -1,8 +1,12 @@
from typing import NamedTuple
import torch
from afsl.acquisition_functions import EmbeddingBased, SequentialAcquisitionFunction
from afsl.model import ModelWithEmbedding, ModelWithEmbeddingOrKernel, ModelWithKernel
from afsl.utils import (
from activeft.acquisition_functions import EmbeddingBased, SequentialAcquisitionFunction
from activeft.model import (
ModelWithEmbedding,
ModelWithEmbeddingOrKernel,
ModelWithKernel,
)
from activeft.utils import (
DEFAULT_EMBEDDING_BATCH_SIZE,
DEFAULT_MINI_BATCH_SIZE,
DEFAULT_NUM_WORKERS,