rename to activeft (#70)
* rename to activeft

* update citation key
jonhue authored Sep 28, 2024
1 parent 840b860 commit ef5184c
Showing 52 changed files with 242 additions and 175 deletions.
57 changes: 50 additions & 7 deletions README.md
@@ -1,11 +1,36 @@
# Active Few-Shot Learning
# Active Fine-Tuning

## Setup
A library for automatic data selection in active fine-tuning of large neural networks.

1. Navigate to the root folder of the project
2. Run `pip install -e .`
**[Website](https://jonhue.github.io/activeft)** | **[Documentation](https://jonhue.github.io/activeft/docs)**

## Maintenance
Please cite our work if you use this library in your research ([bibtex below](#citation)):

- [Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs]()
- [Transductive Active Learning: Theory and Applications](https://arxiv.org/abs/2402.15898) (Section 4)

## Installation

```
pip install activeft
```

## Usage Example

```python
import faiss
import numpy as np

from activeft.sift import Retriever

# Load embeddings (random data for illustration; faiss expects float32)
embeddings = np.random.rand(1000, 512).astype("float32")
query_embeddings = np.random.rand(1, 512).astype("float32")

# Build an inner-product index over the embeddings
d = embeddings.shape[1]
index = faiss.IndexFlatIP(d)
index.add(embeddings)

# Retrieve the 10 most useful data points for the query
retriever = Retriever(index)
indices = retriever.search(query_embeddings, N=10)
```
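
The returned `indices` point into `embeddings`, so the corresponding data points can be looked up and used downstream, e.g., for fine-tuning or as retrieved context.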

## Development

### CI checks

@@ -15,12 +40,30 @@

### Documentation

To start a local server hosting the documentation, run ```pdoc ./afsl --math```.
To start a local server hosting the documentation, run ```pdoc ./activeft --math```.

### Publishing

1. update version number in `pyproject.toml` and `afsl/__init__.py`
1. update version number in `pyproject.toml` and `activeft/__init__.py`
2. build: `poetry build`
3. publish: `poetry publish`
4. push version update to GitHub
5. create new release on GitHub

## Citation

```bibtex
@article{hubotter2024efficiently,
  title   = {Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs},
  author  = {H{\"u}botter, Jonas and Bongni, Sascha and Hakimi, Ido and Krause, Andreas},
  year    = 2024,
  journal = {TODO}
}

@inproceedings{hubotter2024transductive,
  title     = {Transductive Active Learning: Theory and Applications},
  author    = {H{\"u}botter, Jonas and Sukhija, Bhavya and Treven, Lenart and As, Yarden and Krause, Andreas},
  year      = 2024,
  booktitle = {Advances in Neural Information Processing Systems}
}
```
30 changes: 15 additions & 15 deletions afsl/__init__.py → activeft/__init__.py
@@ -1,5 +1,5 @@
r"""
*Active Few-Shot Learning* (`afsl`) is a Python package for intelligent active data selection.
*Active Fine-Tuning* (`activeft`) is a Python package for informative data selection.
## Why Active Data Selection?
@@ -13,30 +13,30 @@
This is related to memory recall, where the brain recalls informative and relevant memories (think "data") to make sense of the current sensory input.
Focusing recall on useful data enables efficient few-shot learning.
`afsl` provides a simple interface for active data selection, which can be used as a drop-in replacement for random data selection.
`activeft` provides a simple interface for active data selection, which can be used as a drop-in replacement for random data selection.
## Getting Started
You can install `afsl` from [PyPI](https://pypi.org/project/afsl/) via pip:
You can install `activeft` from [PyPI](https://pypi.org/project/activeft/) via pip:
```bash
pip install afsl
pip install activeft
```
We briefly discuss how to use `afsl` for [fine-tuning](#example-fine-tuning) and [in-context learning / retrieval-augmented generation](#example-in-context-learning).
We briefly discuss how to use `activeft` for [fine-tuning](#example-fine-tuning) and [in-context learning / retrieval-augmented generation](#example-in-context-learning).
### Example: Fine-tuning
Given a [PyTorch](https://pytorch.org) model which may (but does not have to!) be pre-trained, we can use `afsl` to efficiently fine-tune the model.
Given a [PyTorch](https://pytorch.org) model which may (but does not have to!) be pre-trained, we can use `activeft` to efficiently fine-tune the model.
This model may be generative (e.g., a language model) or discriminative (e.g., a classifier), and can use any architecture.
We only need the following things:
- A dataset of inputs `dataset` (such that `dataset[i]` returns a vector of length $d$) from which we want to select batches for fine-tuning. If one has a supervised dataset returning input-label pairs, then `afsl.data.InputDataset(dataset)` can be used to obtain a dataset over the input space.
- A dataset of inputs `dataset` (such that `dataset[i]` returns a vector of length $d$) from which we want to select batches for fine-tuning. If one has a supervised dataset returning input-label pairs, then `activeft.data.InputDataset(dataset)` can be used to obtain a dataset over the input space.
- A tensor of prediction targets `target` ($m \times d$) which specifies the task we want to fine-tune the model for.
Here, $m$ can be quite small, e.g., equal to the number of classes in a classification task.
If there is no *specific* task for training, then active data selection can still be useful as we will see [later](#undirected-data-selection).
- The `model` can be any PyTorch `nn.Module` with an `embed(x)` method that computes (latent) embeddings for the given inputs `x`, e.g., the representation of `x` from the penultimate layer.
See `afsl.model.ModelWithEmbedding` for more details. Alternatively, the model can have a `kernel(x1,x2)` method that computes a kernel for given inputs `x1` and `x2` (see `afsl.model.ModelWithKernel`).
See `activeft.model.ModelWithEmbedding` for more details. Alternatively, the model can have a `kernel(x1,x2)` method that computes a kernel for given inputs `x1` and `x2` (see `activeft.model.ModelWithKernel`).
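For illustration, here is a minimal sketch of a model exposing such an `embed` method (the class name, architecture, and dimensions are hypothetical):
```python
import torch
from torch import nn

class SimpleClassifier(nn.Module):
    def __init__(self, input_dim: int = 512, num_classes: int = 10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(input_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
        )
        self.head = nn.Linear(128, num_classes)

    def embed(self, x: torch.Tensor) -> torch.Tensor:
        # (Latent) embedding of x: the penultimate-layer representation
        return self.backbone(x)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.embed(x))
```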
.. note::
@@ -46,7 +46,7 @@
With this in place, we can initialize the "active" data loader
```python
from afsl import ActiveDataLoader
from activeft import ActiveDataLoader
data_loader = ActiveDataLoader.initialize(dataset, target, batch_size=64)
```
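A fine-tuning loop on top of this might then look roughly as follows. This is a schematic sketch: `data_loader.next(model)` returning indices into `dataset` mirrors the retrieval example below, while `num_rounds` and `compute_loss` are placeholders.
```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
for _ in range(num_rounds):  # num_rounds: hypothetical number of selection rounds
    batch = dataset[data_loader.next(model)]  # actively select the next batch
    loss = compute_loss(model, batch)  # compute_loss: placeholder loss function
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```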
@@ -86,10 +86,10 @@
We can also use the intelligent retrieval of informative and relevant data outside a training loop — for example, for in-context learning and retrieval-augmented generation.
The setup is analogous to the previous section: we have a pre-trained `model`, a `dataset` to query from, and `target`s (e.g., a prompt) for which we want to retrieve relevant data.
We can use `afsl` to query the most useful data and then add it to the model's context:
We can use `activeft` to query the most useful data and then add it to the model's context:
```python
from afsl import ActiveDataLoader
from activeft import ActiveDataLoader
data_loader = ActiveDataLoader.initialize(dataset, target, batch_size=5)
context = dataset[data_loader.next(model)]
@@ -110,7 +110,7 @@
booktitle={ICLR Workshop on Bridging the Gap Between Practice and Theory in Deep Learning},
year={2024},
pdf={https://arxiv.org/pdf/2402.15898.pdf},
url={https://github.com/jonhue/afsl}
url={https://github.com/jonhue/activeft}
}
# Theoretical analysis of "directed" active learning:
@@ -120,15 +120,15 @@
booktitle={ICML},
year={2024},
pdf={https://arxiv.org/pdf/2402.15441.pdf},
url={https://github.com/jonhue/afsl}
url={https://github.com/jonhue/activeft}
}
```
---
"""

from afsl.active_data_loader import ActiveDataLoader
from afsl import acquisition_functions, data, embeddings, model, sift
from activeft.active_data_loader import ActiveDataLoader
from activeft import acquisition_functions, data, embeddings, model, sift

__all__ = [
"ActiveDataLoader",
@@ -1,13 +1,13 @@
"""
`afsl` supports a wide range of acquisition functions which are summarized here.
`activeft` supports a wide range of acquisition functions which are summarized here.
The default implementation uses [VTL](acquisition_functions/vtl).
You can use a custom acquisition function as follows:
```python
from afsl.acquisition_functions.undirected_vtl import UndirectedVTL
from activeft.acquisition_functions.undirected_vtl import UndirectedVTL
acquisition_function = UndirectedVTL()
data_loader = afsl.ActiveDataLoader(data, batch_size=64, acquisition_function=acquisition_function)
data_loader = activeft.ActiveDataLoader(data, batch_size=64, acquisition_function=acquisition_function)
```
## Overview of Acquisition Functions
@@ -32,9 +32,9 @@
| [Random](acquisition_functions/random) | ❌ | ❌ | (✅) | - |
- **Relevance** and **Informativeness** capture whether obtained data is "useful" as outlined [here](/afsl/docs/afsl#why-active-data-selection).
- **Relevance** and **Informativeness** capture whether obtained data is "useful" as outlined [here](/activeft/docs/activeft#why-active-data-selection).
- **Diversity** captures whether the selected batches are diverse, i.e., whether they cover different "useful" parts of the data space. In a non-diverse batch, most data is not useful conditional on the rest of the batch, meaning that most of the batch is "wasted".
- **Model Requirement** describes the type of model required for the acquisition function. For example, some acquisition functions require an *embedding* or a *kernel* (see afsl.model), while others require the model to output a *softmax* distribution (typically in a classification context).
- **Model Requirement** describes the type of model required for the acquisition function. For example, some acquisition functions require an *embedding* or a *kernel* (see activeft.model), while others require the model to output a *softmax* distribution (typically in a classification context).
---
"""
@@ -45,9 +45,9 @@
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset as TorchDataset, Subset
from afsl.data import Dataset
from afsl.model import Model, ModelWithEmbedding
from afsl.utils import (
from activeft.data import Dataset
from activeft.model import Model, ModelWithEmbedding
from activeft.utils import (
DEFAULT_EMBEDDING_BATCH_SIZE,
DEFAULT_MINI_BATCH_SIZE,
DEFAULT_NUM_WORKERS,
@@ -1,17 +1,17 @@
from typing import NamedTuple
import torch
from afsl.acquisition_functions import (
from activeft.acquisition_functions import (
EmbeddingBased,
SequentialAcquisitionFunction,
Targeted,
)
from afsl.gaussian import GaussianCovarianceMatrix
from afsl.model import (
from activeft.gaussian import GaussianCovarianceMatrix
from activeft.model import (
ModelWithEmbeddingOrKernel,
ModelWithKernel,
ModelWithLatentCovariance,
)
from afsl.utils import (
from activeft.utils import (
DEFAULT_EMBEDDING_BATCH_SIZE,
DEFAULT_MINI_BATCH_SIZE,
DEFAULT_NUM_WORKERS,
@@ -52,7 +52,7 @@ class BaCE(
[^1]: Hübotter, J., Sukhija, B., Treven, L., As, Y., and Krause, A. Information-based Transductive Active Learning. arXiv preprint, 2024.
[^2]: A kernel is also induced by embeddings. See afsl.model.ModelWithEmbedding.
[^2]: A kernel is also induced by embeddings. See activeft.model.ModelWithEmbedding.
"""

noise_std: float | None
@@ -1,12 +1,12 @@
import torch
import torch.nn.functional as F
from afsl.acquisition_functions import (
from activeft.acquisition_functions import (
BatchAcquisitionFunction,
EmbeddingBased,
Targeted,
)
from afsl.model import ModelWithEmbedding
from afsl.utils import (
from activeft.model import ModelWithEmbedding
from activeft.utils import (
DEFAULT_EMBEDDING_BATCH_SIZE,
DEFAULT_MINI_BATCH_SIZE,
DEFAULT_NUM_WORKERS,
@@ -1,5 +1,5 @@
import torch
from afsl.acquisition_functions.bace import TargetedBaCE, BaCEState
from activeft.acquisition_functions.bace import TargetedBaCE, BaCEState


class CTL(TargetedBaCE):
@@ -23,11 +23,11 @@ class CTL(TargetedBaCE):
|------------|------------------|------------|--------------------|
| ✅ | (✅) | ✅ | embedding / kernel |
[^1]: A kernel $k$ on domain $\spX$ induces a stochastic process $\\{f(\vx)\\}_{\vx \in \spX}$. See afsl.model.ModelWithKernel.
[^1]: A kernel $k$ on domain $\spX$ induces a stochastic process $\\{f(\vx)\\}_{\vx \in \spX}$. See activeft.model.ModelWithKernel.
[^3]: Hübotter, J., Sukhija, B., Treven, L., As, Y., and Krause, A. Information-based Transductive Active Learning. arXiv preprint, 2024.
[^4]: see afsl.acquisition_functions.bace.BaCE
[^4]: see activeft.acquisition_functions.bace.BaCE
"""

def compute(self, state: BaCEState) -> torch.Tensor:
@@ -1,9 +1,9 @@
import torch
from afsl.acquisition_functions import BatchAcquisitionFunction
from afsl.acquisition_functions.cosine_similarity import CosineSimilarity
from afsl.acquisition_functions.max_entropy import MaxEntropy
from afsl.model import ModelWithEmbedding
from afsl.utils import (
from activeft.acquisition_functions import BatchAcquisitionFunction
from activeft.acquisition_functions.cosine_similarity import CosineSimilarity
from activeft.acquisition_functions.max_entropy import MaxEntropy
from activeft.model import ModelWithEmbedding
from activeft.utils import (
DEFAULT_EMBEDDING_BATCH_SIZE,
DEFAULT_MINI_BATCH_SIZE,
DEFAULT_NUM_WORKERS,
@@ -1,6 +1,6 @@
import torch
import wandb
from afsl.acquisition_functions.bace import TargetedBaCE, BaCEState
from activeft.acquisition_functions.bace import TargetedBaCE, BaCEState


class ITL(TargetedBaCE):
@@ -39,13 +39,13 @@ class ITL(TargetedBaCE):
`ITL` is computed using $\I{\vf(\spA)}{y(\vx) \mid \spD_i} \approx \I{\vy(\spA)}{y(\vx) \mid \spD_i}$ with \\[\begin{align}
\I{\vy(\spA)}{y(\vx) \mid \spD_i} &= \frac{1}{2} \log\left( \frac{k_i(\vx,\vx) + \sigma^2}{\tilde{k}_i(\vx,\vx) + \sigma^2} \right) \qquad\text{where} \\\\
\tilde{k}_i(\vx,\vx) &= k_i(\vx,\vx) - \vk_i(\vx,\spA) (\mK_i(\spA,\spA) + \sigma^2 \mI)^{-1} \vk_i(\spA,\vx)
\end{align}\\] where $\sigma^2$ is the noise variance and $k_i$ denotes the conditional kernel (see afsl.acquisition_functions.bace.BaCE).
\end{align}\\] where $\sigma^2$ is the noise variance and $k_i$ denotes the conditional kernel (see activeft.acquisition_functions.bace.BaCE).
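As a standalone numeric illustration of this quantity (a sketch with made-up kernel values, independent of this module's internals):
```python
import torch

sigma_sq = 0.1  # noise variance (assumed value)
K_AA = torch.tensor([[1.0, 0.5],
                     [0.5, 1.0]])  # K_i(A, A) for a batch A of size 2
k_xA = torch.tensor([0.8, 0.3])  # k_i(x, A) for a candidate point x
k_xx = torch.tensor(1.0)  # k_i(x, x)

# Conditional variance after observing y(A)
k_tilde = k_xx - k_xA @ torch.linalg.solve(K_AA + sigma_sq * torch.eye(2), k_xA)

# Information gain I(y(A); y(x) | D_i)
itl_value = 0.5 * torch.log((k_xx + sigma_sq) / (k_tilde + sigma_sq))
```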
[^1]: A kernel $k$ on domain $\spX$ induces a stochastic process $\\{f(\vx)\\}_{\vx \in \spX}$. See afsl.model.ModelWithKernel.
[^1]: A kernel $k$ on domain $\spX$ induces a stochastic process $\\{f(\vx)\\}_{\vx \in \spX}$. See activeft.model.ModelWithKernel.
[^3]: Hübotter, J., Sukhija, B., Treven, L., As, Y., and Krause, A. Information-based Transductive Active Learning. arXiv preprint, 2024.
[^4]: see afsl.acquisition_functions.bace.BaCE
[^4]: see activeft.acquisition_functions.bace.BaCE
"""

def compute(self, state: BaCEState) -> torch.Tensor:
@@ -1,7 +1,7 @@
import torch
import wandb
from afsl.acquisition_functions.bace import TargetedBaCE, BaCEState
from afsl.utils import (
from activeft.acquisition_functions.bace import TargetedBaCE, BaCEState
from activeft.utils import (
DEFAULT_EMBEDDING_BATCH_SIZE,
DEFAULT_MINI_BATCH_SIZE,
DEFAULT_NUM_WORKERS,
@@ -1,6 +1,6 @@
import random
import torch
from afsl.acquisition_functions.max_dist import MaxDist
from activeft.acquisition_functions.max_dist import MaxDist


class KMeansPP(MaxDist):
@@ -19,7 +19,7 @@ class KMeansPP(MaxDist):
|------------|------------------|------------|--------------------|
| ❌ | (✅) | ✅ | embedding / kernel |
Using the afsl.embeddings.classification.HallucinatedCrossEntropyEmbedding embeddings, this acquisition function is known as BADGE (*Batch Active learning by Diverse Gradient Embeddings*).[^4]
Using the activeft.embeddings.classification.HallucinatedCrossEntropyEmbedding embeddings, this acquisition function is known as BADGE (*Batch Active learning by Diverse Gradient Embeddings*).[^4]
[^1]: See [here](max_dist#where-does-the-distance-come-from) for a discussion of how a distance is induced by embeddings or a kernel.
@@ -1,14 +1,14 @@
from typing import List, NamedTuple, Tuple
import numpy as np
import torch
from afsl.acquisition_functions import (
from activeft.acquisition_functions import (
EmbeddingBased,
SequentialAcquisitionFunction,
Targeted,
)
from afsl.gaussian import GaussianCovarianceMatrix
from afsl.model import ModelWithEmbeddingOrKernel
from afsl.utils import (
from activeft.gaussian import GaussianCovarianceMatrix
from activeft.model import ModelWithEmbeddingOrKernel
from activeft.utils import (
DEFAULT_EMBEDDING_BATCH_SIZE,
DEFAULT_MINI_BATCH_SIZE,
DEFAULT_NUM_WORKERS,
@@ -1,7 +1,7 @@
import torch
from afsl.acquisition_functions import BatchAcquisitionFunction
from afsl.model import Model
from afsl.utils import get_device, mini_batch_wrapper
from activeft.acquisition_functions import BatchAcquisitionFunction
from activeft.model import Model
from activeft.utils import get_device, mini_batch_wrapper


class LeastConfidence(BatchAcquisitionFunction):
@@ -1,8 +1,12 @@
from typing import NamedTuple
import torch
from afsl.acquisition_functions import EmbeddingBased, SequentialAcquisitionFunction
from afsl.model import ModelWithEmbedding, ModelWithEmbeddingOrKernel, ModelWithKernel
from afsl.utils import (
from activeft.acquisition_functions import EmbeddingBased, SequentialAcquisitionFunction
from activeft.model import (
ModelWithEmbedding,
ModelWithEmbeddingOrKernel,
ModelWithKernel,
)
from activeft.utils import (
DEFAULT_EMBEDDING_BATCH_SIZE,
DEFAULT_MINI_BATCH_SIZE,
DEFAULT_NUM_WORKERS,