Updated CONTRIBUTING and user_guide with better explanations
gAldeia committed Oct 16, 2024
1 parent 8507b55 commit 1347936
Showing 3 changed files with 5 additions and 6 deletions.
5 changes: 2 additions & 3 deletions CONTRIBUTING.md
@@ -37,6 +37,7 @@ This folder should contain:
- `eval_kwargs` (optional): a dictionary that can specify method-specific arguments to `evaluate_model.py`.
- `get_population(est) --> List[RegressorMixin]`: a function that returns a list of at most 100 expressions, if using Pareto front, population-based optimization, beam search, or any strategy that allows your algorithm to explore several expressions. If this is not valid for your algorithm, you can just wrap the estimator in a list (_i.e._, `return [est]`). Every element from the returned list must be a compatible `Regressor`, meaning that calling `predict(X)` should work, as well as your custom `model(est, X=None)` method for getting a string representation.
- `get_best_solution(est)`: should provide an easy way of accessing the best solution from the current population, if this feature is valid for your algorithm. If not, then return the estimator itself (_i.e._, `return est`).
- We expect your algorithm to have a `max_time` parameter that lets us control the maximum execution time in seconds. When running the experiments in a cluster, we will give extra time to compensate for the overhead of initializing everything, and the maximum time considered is just the fit process. A signal `signal.SIGALRM` will be sent to your process if `fit(X, y)` exceeds the maximum time, and you can implement strategies to handle this signal. One idea is to store a random initial solution as the best and update it during the execution to ensure the `evaluate_model.py` script will find an equation to work on.
3. `LICENSE` *(optional)* A license file
4. `environment.yml` *(optional)*: a [conda environment file](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-from-an-environment-yml-file) that specifies dependencies for your submission.
It will be used to update the baseline environment (`environment.yml` in the root directory).
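
The `max_time` and `SIGALRM` guidance in this hunk can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `ToyRegressor`, `TimeBudgetExceeded`, and the random search over constant models are hypothetical and not part of SRBench. The sketch only shows the suggested pattern of storing a random initial solution, updating it during the search, and keeping it when the alarm arrives. It relies on `signal.SIGALRM`, which is only available on Unix-like systems, and the handler must be installed from the main thread.

```python
import random
import signal
import time


class TimeBudgetExceeded(Exception):
    """Raised from the SIGALRM handler to interrupt the search loop."""


def _on_alarm(signum, frame):
    raise TimeBudgetExceeded


class ToyRegressor:
    """Hypothetical estimator: random search over constant models y = c."""

    def __init__(self, max_time=1.0, random_state=0):
        self.max_time = max_time  # time budget for fit, in seconds
        self.random_state = random_state

    @staticmethod
    def _loss(c, y):
        return sum((c - yi) ** 2 for yi in y)

    def fit(self, X, y):
        rng = random.Random(self.random_state)
        lo, hi = min(y), max(y)
        # Store a random initial solution first, so a best model always exists.
        first = rng.uniform(lo, hi)
        self.best_ = (first, self._loss(first, y))
        previous = signal.signal(signal.SIGALRM, _on_alarm)
        try:
            deadline = time.monotonic() + self.max_time
            while time.monotonic() < deadline:
                candidate = rng.uniform(lo, hi)
                loss = self._loss(candidate, y)
                if loss < self.best_[1]:
                    # Single assignment keeps value and loss consistent.
                    self.best_ = (candidate, loss)
        except TimeBudgetExceeded:
            pass  # keep whatever best solution was stored so far
        finally:
            # Restore the handler that was active before fit was called.
            signal.signal(
                signal.SIGALRM,
                previous if previous is not None else signal.SIG_DFL,
            )
        return self

    def predict(self, X):
        return [self.best_[0] for _ in X]


def get_population(est):
    # No population in this toy method, so wrap the estimator in a list.
    return [est]


def get_best_solution(est):
    return est
```

Raising an exception from the handler is only one possible strategy; setting a flag that the loop checks on each iteration would work as well and avoids interrupting a step halfway through.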
@@ -58,13 +59,11 @@ If your method names variables some other way, e.g. `[x_0 ... x_m]`, you can
specify a mapping in the `model` function such as:

```python
-def model(est, X):
+def model(est, X=None):
    mapping = {'x_'+str(i):k for i,k in enumerate(X.columns)}
    new_model = est.model_
    for k,v in reversed(mapping.items()):
        new_model = new_model.replace(k,v)
```

2. The operators/functions in the model are available in [sympy's function set](https://docs.sympy.org/latest/modules/functions/index.html).
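
The mapping snippet in this hunk stops before returning anything, and with the new `X=None` default it would fail on `X.columns` when no data is passed. A runnable sketch of the same idea follows. It assumes that the function should return the renamed string and that `X=None` should return `est.model_` unchanged; neither behavior is confirmed by the diff. The `SimpleNamespace` objects are hypothetical stand-ins for a fitted estimator and a pandas `DataFrame`.

```python
from types import SimpleNamespace


def model(est, X=None):
    new_model = est.model_
    if X is None:
        # Assumption: without column names, return the raw expression.
        return new_model
    mapping = {'x_' + str(i): k for i, k in enumerate(X.columns)}
    # Replace higher indices first so that 'x_1' does not clobber 'x_10'.
    for k, v in reversed(mapping.items()):
        new_model = new_model.replace(k, v)
    return new_model


# Hypothetical stand-ins for a fitted estimator and a DataFrame.
est = SimpleNamespace(model_='x_0 + 2*x_10 - x_1')
X = SimpleNamespace(columns=['f' + str(i) for i in range(11)])

print(model(est, X))  # f0 + 2*f10 - f1
print(model(est))     # x_0 + 2*x_10 - x_1
```

Iterating over `reversed(mapping.items())` requires Python 3.8 or later.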

### using populations
2 changes: 1 addition & 1 deletion docs/user_guide.md
@@ -122,7 +122,7 @@ done

When a new algorithm is submitted to SRBench, a GitHub workflow will generate a docker image and push it to [Docker Hub](hub.docker.com). This means that you can also easily pull the images, without having to deal with local installations.

-To use docker, you first run `scripts/make_docker_compose_file.sh`. Then `docker compose up` should create the images.
+To use docker, you first run `bash scripts/make_docker_compose_file.sh` in the root directory. Then `docker compose up` should create the images.

You can now submit arbitrary python commands to the image, _e.g._ `docker compose run feat bash test.sh`

4 changes: 2 additions & 2 deletions experiment/test_population.py
@@ -45,8 +45,8 @@ def test_population(ml):
# Few samples to try to make it quick
sample_idx = np.random.choice(np.arange(len(X_train)), size=10)

-y_train = y_train[sample_idx]
-X_train = X_train.iloc[sample_idx]
+y_train = y_train.iloc[sample_idx]
+X_train = X_train.iloc[sample_idx, :]

##################################################
# fit with max_time
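
The switch from `y_train[sample_idx]` to `y_train.iloc[sample_idx]` matters when `y_train` is a pandas `Series` whose index is not `0..n-1`, which is presumably the case after a train/test split. The sketch below uses made-up data to show the difference: `sample_idx` holds positions, so `.iloc` selects matching rows from both objects, while plain `[]` on an integer-indexed `Series` treats the values as labels.

```python
import numpy as np
import pandas as pd

# Hypothetical training data whose index does not start at 0.
index = [10, 11, 12, 13]
X_train = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0],
                        "b": [5.0, 6.0, 7.0, 8.0]}, index=index)
y_train = pd.Series([0.1, 0.2, 0.3, 0.4], index=index)

rng = np.random.default_rng(0)
# Positions in 0..n-1, sampled with replacement as in the test.
sample_idx = rng.choice(np.arange(len(X_train)), size=3)

# Positional selection keeps features and targets aligned.
X_small = X_train.iloc[sample_idx, :]
y_small = y_train.iloc[sample_idx]

# Label-based selection looks for the labels 0..3, which do not exist here.
try:
    y_train[sample_idx]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True
```

With a default `RangeIndex` both forms select the same rows, which is why the problem only shows up on shuffled or filtered data.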
