Add README
# WSMF
## tl;dr
This package contains implementations of two novel encoder-based approaches to warm-starting Bayesian hyperparameter optimization. It allows both training the meta-models and using them for this meta-task.


## Contents
**Meta-models** - the package currently provides two approaches to encoder-based warm-starting (see the training examples below):

* Metric learning (`Dataset2VecMetricLearning`) - uses a Dataset2Vec encoder trained so that distances between dataset representations correspond to distances between their landmarkers (vectors of performances of a predefined set of hyperparameter configurations)
* Landmarker reconstruction (`LandmarkerReconstructionTrainingInterface`) - uses a Dataset2Vec encoder that produces a latent representation of the entire dataset (of any size) and passes it to an MLP which predicts the landmarker vector

**Selectors** - for this meta-task `wsmf` provides an API that uses the encoder to propose hyperparameter configurations (see the usage examples below). It contains the following selectors:

* Selector that chooses configurations based on the learned representation, applicable in the metric learning approach (`RepresentationBasedHpSelector`)
* Selector based on the reconstructed landmarkers (`ReconstructionBasedHpSelector`)
* Random selector drawing from the predefined portfolio (`RandomHpSelector`)
* Selector that chooses the configurations which are best on average (`RankBasedHpSelector`)
* Selector that chooses configurations based on the vector of landmarkers itself (`LandmarkerHpSelector`)

## Examples of usage
**Training metric learning based meta-model**
```Python
import pytorch_lightning as pl

# NOTE: import paths below are indicative and may differ from the actual package layout
from wsmf.metamodels import Dataset2VecMetricLearning
from wsmf.metamodels.data import (
    EncoderHpoDataset,
    EncoderMetricLearningLoader,
    GenericRepeatableDataLoader,
)

# tensors X, y are torch.Tensor objects which correspond to feature and target matrices
train_datasets = { # training meta-dataset
"dataset_train_1": (tensor_X1, tensor_y1),
"dataset_train_2": (tensor_X2, tensor_y2),
...
}
val_datasets = { # validation meta-dataset
"dataset_val_1": (tensor_X1, tensor_y1),
"dataset_val_2": (tensor_X2, tensor_y2),
...
}

# tensors l1, l2, ... correspond to vectors of landmarkers in torch.Tensor format
train_landmarkers = { # landmarkers for the training meta-dataset
"dataset_train_1": l1,
"dataset_train_2": l2,
...
}
val_landmarkers = { # landmarkers for the validation meta-dataset
"dataset_val_1": l1,
"dataset_val_2": l2,
...
}

train_dataset = EncoderHpoDataset(train_datasets, train_landmarkers)
train_dataloader = EncoderMetricLearningLoader(train_dataset, train_num_batches, train_batch_size) # *_num_batches and *_batch_size are user-defined ints
val_dataset = EncoderHpoDataset(val_datasets, val_landmarkers)
val_dataloader = EncoderMetricLearningLoader(val_dataset, val_num_batches, val_batch_size)
val_dataloader = GenericRepeatableDataLoader(val_dataloader) # Loader which produces repeatable batches

model = Dataset2VecMetricLearning()
trainer = pl.Trainer()
trainer.fit(model, train_dataloader, val_dataloader)
```
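
**Training landmarker reconstruction based meta-model**

A sketch analogous to the example above, reusing the meta-datasets and landmarkers defined there. The reconstruction-specific loader name and the meta-model constructor arguments are assumptions and may differ from the actual API.
```Python
# A sketch under assumptions: EncoderLandmarkerReconstructionLoader and the
# meta-model constructor arguments are hypothetical; train_datasets,
# train_landmarkers, etc. are the objects defined in the example above
from wsmf.metamodels import Dataset2VecForLandmarkerReconstruction
from wsmf.metamodels.data import EncoderLandmarkerReconstructionLoader

train_dataset = EncoderHpoDataset(train_datasets, train_landmarkers)
train_dataloader = EncoderLandmarkerReconstructionLoader(train_dataset, train_num_batches, train_batch_size)
val_dataset = EncoderHpoDataset(val_datasets, val_landmarkers)
val_dataloader = GenericRepeatableDataLoader(
    EncoderLandmarkerReconstructionLoader(val_dataset, val_num_batches, val_batch_size)
)

# output dimension equals the landmarker vector length
model = Dataset2VecForLandmarkerReconstruction(landmarker_size=len(l1))
trainer = pl.Trainer()
trainer.fit(model, train_dataloader, val_dataloader)
```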

**Using selector based on reconstruction**
```Python
# NOTE: import paths below are indicative and may differ from the actual package layout
from wsmf.metamodels import Dataset2VecForLandmarkerReconstruction
from wsmf.selectors import ReconstructionBasedHpSelector

datasets = { # datasets to search from (in this case used for closest dataset search)
"dataset_1": (tensor_X1, tensor_y1),
"dataset_2": (tensor_X2, tensor_y2),
...
}
landmarkers = { # landmarkers to search from (in this case used for proposing the best configurations)
"dataset_1": l1,
"dataset_2": l2,
...
}
configurations = [
{"hp1": val1, "hp2": val2},
{"hp1": val3, "hp2": val4},
...
]

meta_model = Dataset2VecForLandmarkerReconstruction.load_from_checkpoint("path_to_meta_model.ckpt")
selector = ReconstructionBasedHpSelector(
meta_model,
datasets,
landmarkers,
configurations
)
# Usage
new_dataset = (X, y) # feature and target matrices as torch.Tensor objects
n_configurations = 10
configurations = selector.propose_configurations(new_dataset, n_configurations)
```
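
**Using selector based on learned representations**

A sketch for the metric learning counterpart, assuming `RepresentationBasedHpSelector` takes the same constructor arguments as the reconstruction-based selector above; it reuses `datasets`, `landmarkers`, and `configurations` from that example.
```Python
# A sketch under assumptions: the constructor signature below mirrors
# ReconstructionBasedHpSelector and may differ from the actual API
from wsmf.metamodels import Dataset2VecMetricLearning
from wsmf.selectors import RepresentationBasedHpSelector

meta_model = Dataset2VecMetricLearning.load_from_checkpoint("path_to_meta_model.ckpt")
selector = RepresentationBasedHpSelector(
    meta_model,
    datasets,
    landmarkers,
    configurations
)
configurations_to_try = selector.propose_configurations(new_dataset, n_configurations)
```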


## Development
Commands useful during development:
* Setting env variables - `` export PYTHONPATH=`pwd` ``
* Install dependencies - `pip install -r requirements_dev.txt`
* To run unit tests - `pytest`
* Check code quality - `./scripts/check_code.sh`
* Release - `python -m build && twine upload dist/*`
`pyproject.toml`: package version bumped from 0.0.1 to 0.0.2.
