Updated CONTRIBUTING and user_guide with better explanations

cavalab · Oct 16, 2024 · 1347936
1 parent 8507b55
commit 1347936

Showing 3 changed files with 5 additions and 6 deletions.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -37,6 +37,7 @@ This folder should contain:
       -   `eval_kwargs` (optional): a dictionary that can specify method-specific arguments to `evaluate_model.py`.
       -   `get_population(est) --> List[RegressorMixin]`: a function that returns a list of at most 100 expressions if your algorithm uses a Pareto front, population-based optimization, beam search, or any other strategy that explores several expressions. If this does not apply to your algorithm, you can just wrap the estimator in a list (_i.e._, `return [est]`). Every element of the returned list must be a compatible `Regressor`, meaning that calling `predict(X)` should work, as well as your custom `model(est, X=None)` method for getting a string representation.
       -   `get_best_solution(est)`: should provide an easy way of accessing the best solution from the current population, if this feature applies to your algorithm. If not, return the estimator itself (`return est`).
+      -   We expect your algorithm to have a `max_time` parameter that lets us control the maximum execution time in seconds. When running the experiments on a cluster, we will allow extra time to compensate for the overhead of initializing everything, so the time limit applies only to the fit process. A `signal.SIGALRM` signal will be sent to your process if `fit(X, y)` exceeds the maximum time, and you can implement strategies to handle this signal. One idea is to store a random initial solution as the best and update it during execution, so that the `evaluate_model.py` script always finds an equation to work on.
   3. `LICENSE` *(optional)* A license file
   4. `environment.yml` *(optional)*: a [conda environment file](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-from-an-environment-yml-file) that specifies dependencies for your submission. 
   It will be used to update the baseline environment (`environment.yml` in the root directory). 
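
Note on the `max_time` bullet added above: a minimal sketch of one way an estimator could react to `signal.SIGALRM` while always keeping a usable solution. The class `RandomSearchRegressor`, its `seed` parameter, and the toy random search over linear coefficients are hypothetical and only illustrate the pattern; they are not SRBench code. The sketch assumes a Unix system, a `fit` call made from the main thread, and `X` given as a list of rows (a real submission would receive the benchmark's own data structures).

```python
import random
import signal
import time


class RandomSearchRegressor:
    """Hypothetical estimator for illustration; not part of SRBench."""

    def __init__(self, max_time=60, seed=0):
        self.max_time = max_time  # time budget for fit(), in seconds
        self.seed = seed

    def _on_alarm(self, signum, frame):
        # Only set a flag; the search loop notices it and exits cleanly.
        self._stop = True

    @staticmethod
    def _predict_with(coef, X):
        return [sum(c * x for c, x in zip(coef, row)) for row in X]

    def _mse(self, coef, X, y):
        preds = self._predict_with(coef, X)
        return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

    def fit(self, X, y):
        rng = random.Random(self.seed)
        self._stop = False
        previous = signal.signal(signal.SIGALRM, self._on_alarm)
        try:
            # Start from a random solution so a model exists from the outset.
            self.coef_ = [rng.gauss(0, 1) for _ in X[0]]
            best = self._mse(self.coef_, X, y)
            deadline = time.monotonic() + self.max_time
            while not self._stop and time.monotonic() < deadline:
                candidate = [c + rng.gauss(0, 0.1) for c in self.coef_]
                err = self._mse(candidate, X, y)
                if err < best:
                    self.coef_, best = candidate, err
        finally:
            if previous is not None:
                signal.signal(signal.SIGALRM, previous)  # restore old handler
        return self

    def predict(self, X):
        return self._predict_with(self.coef_, X)


def get_population(est):
    # Single-solution method: wrap the estimator in a list.
    return [est]


def get_best_solution(est):
    # No population to choose from, so the estimator itself is the best.
    return est
```

The handler only flips a flag rather than raising, so the loop finishes its current iteration and `coef_` is never left half-updated. The loop also checks its own deadline, which means it stops even if no signal arrives.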
@@ -58,13 +59,11 @@ If your method names variables some other way, e.g. `[x_0 ... x_m]`, you can
 specify a mapping in the `model` function such as:
 
 ```python
-def model(est, X):
+def model(est, X=None):
     mapping = {'x_'+str(i):k for i,k in enumerate(X.columns)}
     new_model = est.model_
     for k,v in reversed(mapping.items()):
         new_model = new_model.replace(k,v)
 ```
 
 2. The operators/functions in the model are available in [sympy's function set](https://docs.sympy.org/latest/modules/functions/index.html). 
-
-### using populations
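
Note on the `model` snippet in the hunk above: as shown, it ends without returning the renamed string, and with the new default `X=None` the `X.columns` lookup would fail. A hedged sketch of a complete version follows. Returning the raw `est.model_` when `X` is `None` is an assumption about the intended behavior, and the `SimpleNamespace` objects are stand-ins for a fitted estimator and a pandas DataFrame.

```python
from types import SimpleNamespace


def model(est, X=None):
    """Return the model string with x_i replaced by the dataset's column names."""
    new_model = est.model_
    if X is None:
        # Assumption: without feature names, return the string unchanged.
        return new_model
    mapping = {'x_' + str(i): k for i, k in enumerate(X.columns)}
    # Replace higher indices first so that 'x_1' does not clobber 'x_10'.
    for k, v in reversed(list(mapping.items())):
        new_model = new_model.replace(k, v)
    return new_model


# Stand-ins for a fitted estimator and a DataFrame (only `.columns` is used).
est = SimpleNamespace(model_='x_0 + 2*x_1')
X = SimpleNamespace(columns=['age', 'bmi'])
print(model(est, X))  # age + 2*bmi
```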
diff --git a/docs/user_guide.md b/docs/user_guide.md
@@ -122,7 +122,7 @@ done
 
 When a new algorithm is submitted to SRBench, a GitHub workflow will generate a docker image and push it to [Docker Hub](https://hub.docker.com). This means that you can also easily pull the images, without having to deal with local installations.
 
-To use docker, you first run `scripts/make_docker_compose_file.sh`. Then `docker compose up` should create the images.
+To use docker, you first run `bash scripts/make_docker_compose_file.sh` in the root directory. Then `docker compose up` should create the images.
 
 You can now submit arbitrary python commands to the image, _e.g._ `docker compose run feat bash test.sh`
 

diff --git a/experiment/test_population.py b/experiment/test_population.py
@@ -45,8 +45,8 @@ def test_population(ml):
     # Few samples to try to make it quick
     sample_idx = np.random.choice(np.arange(len(X_train)), size=10)
 
-    y_train = y_train[sample_idx]
-    X_train = X_train.iloc[sample_idx]
+    y_train = y_train.iloc[sample_idx]
+    X_train = X_train.iloc[sample_idx, :]
 
     ##################################################
     # fit with max_time