
📑 Feature Request: Playground documentation and usage #1491

Open
GemmaTuron opened this issue Jan 7, 2025 · 40 comments
Labels: enhancement (New feature or request)

@GemmaTuron
Member

GemmaTuron commented Jan 7, 2025

Describe your feature request.

Hi @Abellegese

I have tried to use the Playground for model testing, but I find it extremely difficult to use with the documentation currently provided in GitBook. User-friendly steps should be detailed, as well as instructions on how to interact with the nox command, similar to the issues I found in the Model Tester documentation. Below are a few questions to clarify before I can rewrite the docs, as well as some bugs I am encountering:

  1. The playground consistently fails on macOS with the following error. Is it only set up to work on Linux?
ERROR commands.py::test_command[fetch-eos3b5e] - FileNotFoundError: [Errno 2] No such file or directory: 'systemctl'
ERROR commands.py::test_command[serve-eos3b5e] - FileNotFoundError: [Errno 2] No such file or directory: 'systemctl'
ERROR commands.py::test_command[run-eos3b5e] - FileNotFoundError: [Errno 2] No such file or directory: 'systemctl'
ERROR commands.py::test_command[close-eos3b5e] - FileNotFoundError: [Errno 2] No such file or directory: 'systemctl'
  2. What is the sequence of commands that one should use for the playground? I am guessing from the last section, but this really needs to be the first thing in the documentation. It should be something like:
pip install ersilia[test]
nox -f test/playground/noxfile.py -s setup
nox -f test/playground/noxfile.py -s test...
  3. How many tests are currently available? Are these four the only ones?
    - test_from_github
    - test_from_dockerhub
    - test_auto_fetcher_decider
    - test_conventional_run

  4. Inside each test, aside from how the model is fetched, are the tests that are run the same? I struggle to understand the difference between the playground and the test command, except in the way the model is fetched. All the tests we want to do on the model outputs (i.e. no nulls, no wildly different values between runs) only happen in the test command? Then isn't the playground incomplete?

  5. The playground is not testing H5 files, but I guess this is part of the refactoring being done?

  6. The models are not deleted, only closed? Meaning they will remain on the users' computers after being tested? I see the delete command first in the Command Execution Summary; I do not know if this is a bug or if it is actually run after close:

                                                                     Command Execution Summary                                                                      
┌────────────────────────────────────────────────────┬─────────────────┬─────────────────┬─────────────────┬──────────────────────┬────────────────────────────────┐
│ Command                                            │ Description     │ Time Taken      │ Max Memory      │ Status               │ Checkups                       │
├────────────────────────────────────────────────────┼─────────────────┼─────────────────┼─────────────────┼──────────────────────┼────────────────────────────────┤
│ ersilia -v delete eos3b5e                          │ delete          │ 0.04 min        │ 142.82 MB       │ PASSED               │                                │
│ ersilia -v fetch eos3b5e --from_dockerhub          │ fetch           │ 0.79 min        │ 143.92 MB       │ PASSED               │ ✔ Folder exists at             │
│                                                    │                 │                 │                 │                      │ /home/gturon/eos/dest/eos3b5e  │
│                                                    │                 │                 │                 │                      │ ✔ DockerHub status is True     │
│ ersilia -v serve eos3b5e                           │ serve           │ 0.04 min        │ 142.90 MB       │ PASSED               │ ✔ DockerHub status is True     │
│ ersilia run -i files/input.csv -o files/result.csv │ run             │ 0.02 min        │ 175.70 MB       │ PASSED               │ ✔ File exists at               │
│                                                    │                 │                 │                 │                      │ files/result.csv               │
│                                                    │                 │                 │                 │                      │ ✔ File content check at        │
│                                                    │                 │                 │                 │                      │ files/result.csv               │
│ ersilia close                                      │ close           │ 0.18 min        │ 141.09 MB       │ PASSED               │                                │
└────────────────────────────────────────────────────┴─────────────────┴─────────────────┴─────────────────┴──────────────────────┴────────────────────────────────┘
  7. The example is generated with a simple function instead of the example command? This can lead to failures with models that have more complex inputs. Why is the example command not implemented instead?
  8. How do I specify the model I want to test? Do I need to modify the config.yml manually? And if so, when does it use the model_id field vs the model_ids field? Shouldn't the model(s) to test be passed as a parameter of the nox command instead? Currently I see both the model_id and model_ids lines populated with models, but the nox command is only running the single model, not the list.
  9. Same for the Python version: is it specified in the config.yml file only?
  10. When I run the command nox -f test/playground/noxfile.py -s test_from_github, if the config.yml file is not manually edited on the fetch_flags line, it will still pull the model from DockerHub, according to what I see in the command execution summary: ersilia -v fetch eos5guo --from_dockerhub. It is not clear to me how this should happen; I believe it is a bug?
  11. cc @DhanshreeA: testing from_github with model eos5guo fails, same as with the test command, so there seems to be an issue with the model? It does work from DockerHub though. I do not understand, @Abellegese, why serve and run appear as "PASSED" if the fetch has failed:
                                                                     Command Execution Summary                                                                      
┌────────────────────────────────────────────────────┬─────────────────┬─────────────────┬─────────────────┬──────────────────────┬────────────────────────────────┐
│ Command                                            │ Description     │ Time Taken      │ Max Memory      │ Status               │ Checkups                       │
├────────────────────────────────────────────────────┼─────────────────┼─────────────────┼─────────────────┼──────────────────────┼────────────────────────────────┤
│ ersilia -v delete eos5guo                          │ delete          │ 0.03 min        │ 142.53 MB       │ PASSED               │                                │
│ ersilia -v fetch eos5guo --from_github             │ fetch           │ 0.32 min        │ 146.68 MB       │ FAILED               │ ✔ Folder exists at             │
│                                                    │                 │                 │                 │                      │ /home/gturon/eos/repository/e… │
│ ersilia -v serve eos5guo                           │ serve           │ 1.67 min        │ 145.94 MB       │ PASSED               │                                │
│ ersilia run -i files/input.csv -o files/result.csv │ run             │ 0.02 min        │ 179.25 MB       │ PASSED               │ ✔ File exists at               │
│                                                    │                 │                 │                 │                      │ files/result.csv               │
│                                                    │                 │                 │                 │                      │ ✔ File content check at        │
│                                                    │                 │                 │                 │                      │ files/result.csv               │
│ ersilia close                                      │ close           │ 0.18 min        │ 140.75 MB       │ PASSED               │                                │
└────────────────────────────────────────────────────┴─────────────────┴─────────────────┴─────────────────┴──────────────────────┴────────────────────────────────┘
========================================================================= short test summary info ==========================================================================
FAILED commands.py::test_command[fetch-eos5guo] - AssertionError: Command 'fetch' failed for model ID eos5guo
================================================================= 1 failed, 3 passed in 133.17s (0:02:13) ==================================================================
nox > Command pytest commands.py -v failed with exit code 1
nox > Session test_from_github failed.

Same for model eos3b5e --from_github: it does fail at fetch time. I do not know whether this is due to the example command, as mentioned in the model test issue, or to a different reason.

@GemmaTuron GemmaTuron added the enhancement New feature or request label Jan 7, 2025
@DhanshreeA DhanshreeA changed the title 📑 Feature Request: Playground documentatio and usage 📑 Feature Request: Playground documentation and usage Jan 7, 2025
@Abellegese
Contributor

Hi @GemmaTuron, thanks for the comments. Your comments are highly valuable for improving this testing system.

For Q1) I created a PR that adds macOS support, but only to check the Docker status; unfortunately we cannot manipulate Docker from Python and subprocess on a macOS machine.
Q2) I will update this in the docs; that sequence is correct. But you can also pick any session and run it on its own; it won't necessarily be sequential. However, running test_fetch_multiple_models and test_serve_multiple_models in the GitHub workflow requires grouping them for parallelization, since they are dependent.
Q3) There are six, including test_fetch_multiple_models and test_serve_multiple_models. They were originally decided in issue #1368.
Q4) The playground was mainly inspired by the question of what we can check after performing some command (fetch, serve, run...). We do these checks using rules defined in rules.py. For instance, let's take this rule:

@register_rule("folder_exists")
class FolderExistsRule(CommandRule):
    def __init__(self):
        pass

    def check(self, folder_path, expected_status):
        actual_status = Path(folder_path).exists() and any(Path(folder_path).iterdir())
        if actual_status != expected_status:
            raise AssertionError(
                f"Expectation failed for FolderExistsRule: "
                f"Expected folder to {'exist' if expected_status else 'not exist'}, "
                f"but it {'exists' if actual_status else 'does not exist'}."
            )
        return {
            "name": f"Folder exists at {folder_path}",
            "status": actual_status,
        }

This rule is executed after we run the fetch command. Using it I can check whether the model folder exists in the required location after the fetch. The expected status is true, since in this case I want the folder to exist in the eos folder.

Other rule example:

@register_rule("file_exists")
class FileExistsRule(CommandRule):
    def __init__(self):
        pass

    def check(self, file_path, expected_status):
        actual_status = Path(file_path).exists()
        if actual_status != expected_status:
            raise AssertionError(
                f"Expectation failed for FileExistsRule: "
                f"Expected file to {'exist' if expected_status else 'not exist'}, "
                f"but it {'exists' if actual_status else 'does not exist'}."
            )
        return {
            "name": f"File exists at {file_path}",
            "status": actual_status,
        }

The above rule can be executed after we run the run command. If I specify the output file in the run command, I can check whether the CLI created that file or not. After this we have FileContentCheckRule: if the file exists, it checks whether its content is valid (previously it only supported JSON and CSV; now it also supports H5, as well as richer data structures).

So, at a quick glance, you can see that we exercise those ersilia commands using any rule we define in order to check the health of the model or the CLI.
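To complement the two rules above, the content check described for FileContentCheckRule could look roughly like this. This is a minimal sketch reusing the CommandRule/register_rule interfaces shown above and covering only CSV and JSON; the real rule also handles H5 and other data structures:

import csv
import json
from pathlib import Path


@register_rule("file_content_check_sketch")
class FileContentCheckSketch(CommandRule):
    def __init__(self):
        pass

    def check(self, file_path, expected_status):
        path = Path(file_path)
        if path.suffix == ".json":
            rows = json.loads(path.read_text())
        else:  # assume CSV for this sketch
            with path.open() as fh:
                rows = list(csv.DictReader(fh))
        # Content is valid if there is at least one row and no cell is empty or null-like.
        actual_status = bool(rows) and all(
            value not in (None, "", "null")
            for row in rows
            for value in (row.values() if isinstance(row, dict) else [row])
        )
        if actual_status != expected_status:
            raise AssertionError(
                f"Expectation failed for FileContentCheckSketch at {file_path}"
            )
        return {
            "name": f"File content check at {file_path}",
            "status": actual_status,
        }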

Q5) It now supports H5.
Q6) The delete command is put first to prevent the fetch command from failing. The first PR for this playground did not have this delete step, and in the workflow the fetch command started to fail: if the model was fetched in a previous job of the workflow, we cannot fetch it again here without removing it first. With the removal it works, so it is a feature.
Q7) This is also included in the new PR, but the example command could not produce more than one example SMILES; it is a bug I was running into before.
Q8) This is also included in the new PR: you can specify everything (all options in the config.yml) from the nox session, like this:

nox -f noxfile.py test_from_github model_id=eos2db3 python_version=3.8 delete_mode=false
nox -f noxfile.py test_serve_multiple_models model_ids=[eos2db3, eos3b5e] runner=multiple 

Q9) You can now also pass the Python version from the command line, as above.
Q10) You are right, it was a bug and it is fixed in the new PR.
Q11) The fetch failed because you already had the model fetched, and serve and run worked because of that: they were serving and running the existing model. That is why you can specify the delete_model option.

@Abellegese
Contributor

@GemmaTuron those comments of yours are very useful. I will update the pipelines to make them easier to use.

@GemmaTuron
Member Author

Thanks @Abellegese, we will discuss all of this in more detail tomorrow. I think there is too much redundancy between the playground and the test module, and it does not really make sense to duplicate all these efforts. Please do not modify the documentation at this point; I will take care of it.

@GemmaTuron
Member Author

Hi @Abellegese I am noting this down in preparation for tomorrow's meeting. Please do not modify any more code or open more PRs before we can discuss everything.

Q1: Where was it documented that the playground only works on Linux? We were considering adding the Playground as the Model Test Workflow; if it cannot be used on macOS it is not useful to that end.
Q2: I will modify the docs this time to make sure they are comprehensive; I hope this serves as a good example for future documentation. I need to understand what you mean about the multiple-models command, for example, as this is not documented anywhere. What / how do you "group for parallelization"?
Q3: Where are these six explained? I cannot see them anywhere.
Q4: The question still stands. What is the difference between the checks used by the playground and the checks used by the Test Module? Shouldn't those be consolidated into a single set? Otherwise you are defining those rules for the Playground and the same ones as checks in the test. It feels like a duplication of effort.
Q5: ok thanks
Q6: If the model is not fetched, the delete command still shows PASSED, which is weird.
Q7: What is the problem with the example command? Where is this bug explained? @DhanshreeA please can you confirm that the example command is working as it should?
Q8: How to pass all this information is neither clear nor documented. Please, before making any new PRs, let's discuss.
Q9: Same as above.
Q10: So how does it work now? If I want to test from GitHub, do I need to edit the config.yml or not?
Q11: I don't understand. The model is first deleted, as you mention in Q6, so if it was deleted why is it still in the system? I think the delete option should always be on by default.

@GemmaTuron
Member Author

And as I play with it, some more questions:
Q1: wouldn't it be better to specify a place outside the Ersilia repo to save the files? Right now it saves the files in the /test folder, so if I make changes to ersilia I need to revert the ones in the /test folder lest I push those test files as well.

Q2: about the different commands: can you explain why we have separate fetch and serve sessions for multiple models, whereas for single models they are fetched, served and run inside the same session (or that is my understanding)?

Q3: why / when is the auto-fetcher used?

Q4: In the conventional run, why are all those values hardcoded, the output file name for example?

@nox.session(venv_backend="conda", python=get_python_version())
def test_conventional_run(session):
    """Run pytest for standard and conventional run."""
    install_dependencies(session)
    update_yaml_values(
        {
            "runner": "single",
            "cli_type": "all",
            "fetch_flags": "--from_dockerhub",
            "output_file": "files/output_eos9gg2_0.json",
            "output_redirection": "true",
            "delete_model": True,
        }
    )
    logger.info("Standard and Conventional Run: Conventional")
    session.run("pytest", "commands.py", "-v", silent=False)
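For reference, a helper like update_yaml_values presumably just merges these keys into the playground config.yml before pytest reads it; a rough sketch of that idea (not the actual implementation):

from pathlib import Path

import yaml


def update_yaml_values(updates, config_path="config.yml"):
    # Load the playground config, overwrite the given keys, and write it back
    # so that commands.py picks up the new values (sketch only).
    path = Path(config_path)
    config = yaml.safe_load(path.read_text()) or {}
    config.update(updates)
    path.write_text(yaml.safe_dump(config))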

Q5: minor point: if Docker is not active, the error you get is the following. I don't know if we want a more informative message, or whether, since this is aimed at developers, it is enough:

Command: ersilia -v fetch eos3b5e --from_dockerhub
Description: fetch
Error: Expectation failed for FolderExistsRule: Expected folder to exist, but it does not exist.

@Abellegese
Contributor

Hi @GemmaTuron we will discuss them in detail in the meeting.

@DhanshreeA
Member

@Abellegese I don't understand, where is the example command not producing more than one input?

@Abellegese
Contributor

Hi @DhanshreeA I attached it below. Same problem from the python API.
example_cmd

@DhanshreeA
Member

@Abellegese Right, I see. The example command is fetching from the predefined example input file. Could you try ersilia example -n 3 -f input.csv --random? I will update the GitBook docs to reflect that by default, predefined is set, and if an example file exists, then the input will always be fetched from there.

@Abellegese
Contributor

Thanks @DhanshreeA

@GemmaTuron
Member Author

More info on the usage of the playground. @Abellegese I see these weird conda environments created by Nox, how do I delete them?

(ersilia) GemmaErsilia:ersilia gemmaturon$ conda env list
# conda environments:
#
                         /Users/gemmaturon/github/ersilia-os/ersilia/test/playground/.nox/setup
                         /Users/gemmaturon/github/ersilia-os/ersilia/test/playground/.nox/test_auto_fetcher_decider
                         /Users/gemmaturon/github/ersilia-os/ersilia/test/playground/.nox/test_fetch_multiple_models
                         /Users/gemmaturon/github/ersilia-os/ersilia/test/playground/.nox/test_from_dockerhub
                         /Users/gemmaturon/github/ersilia-os/ersilia/test/playground/.nox/test_from_github
base                     /Users/gemmaturon/miniconda3
chem                     /Users/gemmaturon/miniconda3/envs/chem
eos3b5e                  /Users/gemmaturon/miniconda3/envs/eos3b5e
eosbase-bentoml-0.11.0-py310     /Users/gemmaturon/miniconda3/envs/eosbase-bentoml-0.11.0-py310
eosbase-bentoml-0.11.0-py311     /Users/gemmaturon/miniconda3/envs/eosbase-bentoml-0.11.0-py311
ersilia               *  /Users/gemmaturon/miniconda3/envs/ersilia

@GemmaTuron
Member Author

Update and please @Abellegese confirm if this is correct:

There were legacy files inside a .nox folder and an entire /ersilia copy inside the playground after running it. I understand I need to delete them manually?

@Abellegese
Contributor

Hi @GemmaTuron, for the first question: nox creates an isolated venv for each session, which is why you see them. For the second question: you can specify overwrite_ersilia_repo, which deletes the ersilia folder if it already exists. Deleting the files inside the .nox folder will not change things, I think; inside it there are sessions with isolated venvs, not ersilia. The ersilia repository cloned from GitHub is saved inside the playground folder. I hope that answers your questions.

@GemmaTuron
Member Author

Hi @Abellegese

Sorry I do not understand.

  1. The venv(s) that .nox creates, how do I delete them?
  2. The ersilia folder that is cloned from GitHub inside the playground folder: this is not good practice; no user-specific files (tests, etc.) should live in a git folder from which they could inadvertently be pushed back to the main GitHub repository. We should create a playground folder somewhere else, if anything. Let's discuss this at Thursday's meeting.

Please let me know how I can delete the .nox-created venvs.

@GemmaTuron
Member Author

@Abellegese I found that there is a hidden folder at ersilia/test/playground/.nox which, if deleted, eliminates these environments. Is that right?
And I still think we should move the testing folders outside the cloned GitHub repo folder.

@Abellegese
Contributor

Yes @GemmaTuron, those are the per-session isolated environments. Indeed, there has to be some way to clean them up after the sessions. Also note that when you run nox it removes them and creates new ones. In the PR I have created, I added a feature to reuse the venv if it was already created for a session.

As for the GitHub repo, yes, we need to move it, maybe to ~/eos/tmp/playground. There we can store things that are created dynamically. I was also thinking of moving the entire nox session folder to that directory.

@DhanshreeA DhanshreeA self-assigned this Jan 22, 2025
@DhanshreeA
Member

My observations from the current playground implementation:

  1. The Ersilia CLI repo gets cloned wherever the playground is run; it should ideally not be cloned afresh if Ersilia is already installed and being used to run the playground, which is what we will end up doing in the CI pipelines.
  2. The playground does not clean up after itself in terms of environments, the ersilia installation, extra files created, etc. This is a non-issue on CI because those are ephemeral machines; however, it will definitely be an issue for users running the playground on their own machines.
  3. The playground tests are quite flaky, with only about 1 in every 3 runs completing end to end successfully.
  4. In test_cli_single particularly, fetch fails over and over again but the serve, run, and close commands seem to work. This does not make sense, because how can a model that didn't fetch be served or run at all?
  5. The logs for the playground tests are limited and do not give a clear picture of what's going on and why a test failed. I see that logs are generated for each nox session; however, those don't show up in the pipeline run and currently aren't being uploaded as artifacts, which should happen so we can figure out what went wrong.

@Abellegese
Contributor

Thanks @DhanshreeA .

@Abellegese
Contributor

Abellegese commented Jan 28, 2025

Hey @DhanshreeA below are the rules I came up with for the commands

1. Fetch

Flags:

  • auto_fetcher, from_github, from_dockerhub, from_s3, version

Checks:

  • Verify the destination and repository folders in eos for the fetched model.
  • Verify that the dest folder contains the necessary files and content, as below:
    • Check the model_source.text file and verify that it has the correct source for the fetched model.
    • Check the api_schema.json file's existence and content.
    • Check the from_dockerhub.json file and verify that it contains "docker_hub": true if fetched from DockerHub, else "docker_hub": false.
    • Check that status.json contains done: true.
  • Ensure the Docker image exists if fetched from DockerHub.
  • Ensure the conda env exists if fetched with from_github or from_s3.
  • Exit with status 0 if no runtime error is encountered. (A sketch of a couple of these checks is given after the notes below.)

Notes:

  • The rules selected for fetch are believed to check the most important functionality of the fetch command, for instance:
    • Copying information
    • Copying schema files
    • Downloading and saving models (including from DockerHub)
    • and more
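A rough sketch of how a couple of these fetch checks could be expressed in Python (the file names and keys follow the list above; the actual rule implementation may differ):

import json
from pathlib import Path


def check_fetch_artifacts(dest_folder, fetched_from_dockerhub):
    # Sketch of the status.json and from_dockerhub.json checks listed above.
    dest = Path(dest_folder)
    status = json.loads((dest / "status.json").read_text())
    assert status.get("done") is True, "status.json should contain done: true after fetch"
    dockerhub_file = dest / "from_dockerhub.json"
    if dockerhub_file.exists():
        flag = json.loads(dockerhub_file.read_text()).get("docker_hub")
        assert flag == fetched_from_dockerhub, "from_dockerhub.json should match the fetch source"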

2. Serve

Flags:

  • No flag used

Checks:

  • Check that a session folder is created for the served model.
  • Check the existence of session.json and the eosxxxx.pi file and validate their content.
  • Check that session.json has "service_class": "pulled_docker" and that it matches the correct service class of the fetched model.
    • Apply the necessary Docker-related operations.
  • Verify that the API responds with a status_code of 200, to check that the model is served correctly.
  • Exit with status 0 from the click CLI runner's .invoke function if no runtime error is encountered.

3. Run

Flags:

  • inp_types (str, list, csv)
  • output_types (csv, json, h5)

Checks:

  • Verify the existence of the output files generated by the run command.
  • Ensure the generated file content is valid (not None, null, or empty).
  • Exit with status 0 from the click CLI runner's .invoke function if no runtime error is encountered.

4. Catalog

Flags:

  • --more, --as-json, --f, --local, --hub

Checks:

  • If local, display the fetched models with their sources.
  • Validate the correctness of the JSON structure (specifically that all keys are present and no values are empty or missing).
  • Ensure the file generated using the -f flag contains all required entries.

5. Example

Flags:

  • --sample, -n, --random, --predefined, -c, -f

Additional Checks:

  • Verify that the input key has compound entries for the --simple and --predefined flags when displaying in the terminal.
  • Same for the generated file.
  • Matching length between generated and requested sample sizes.
  • Verify valid compound entries (optional).

6. Delete

Flags:

  • --all

Checks:

  • Verify that all containers and images are removed if the model was fetched from Docker Hub.
  • Verify that the destination and repository folders for the model no longer exist after cleanup.
  • The conda environment should be removed.

7. Close

Flags:

  • No flag used

Checks:

  • Verify that session files are removed.

8. Test

Flags:

  • --deep, from_dockerhub, as_json

Checks:

  • No checks are used here because most of its operation depends on other commands, whose exceptions will be caught in their own sessions.
  • A simple exit status of 0 from the click CLI runner's .invoke function, if no runtime error is encountered, is enough for the test command (see the sketch below).
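Since several of the checks above boil down to "exit status 0 from the click CLI runner's .invoke function", here is a minimal sketch of that pattern using click.testing.CliRunner (the import path of the ersilia CLI object is an assumption for illustration; the actual playground harness wires this differently):

from click.testing import CliRunner

from ersilia.cli import cli  # assumed entry point, for illustration only


def assert_command_succeeds(args):
    # Invoke an ersilia command in-process and assert it exited cleanly.
    result = CliRunner().invoke(cli, args)
    assert result.exit_code == 0, result.output


# Example: assert_command_succeeds(["test", "eos3b5e", "--shallow", "--from_dockerhub"])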

@Abellegese
Contributor

I will write here how to use the commands in detail:

@Abellegese
Contributor

Abellegese commented Feb 1, 2025

Playground CLI Usage Guidelines

Installation

To use the Playground CLI, first install ersilia using the instructions given here. Then install the package with the testing extras, as shown below, inside the activated ersilia venv:

pip install -e ".[test]"

nox installs ersilia into its isolated virtual environment from the local source every time we run a nox session such as execute.

The playground test folder is found in test/playground. Either you go into this folder, in which case you do not need to specify the nox file, or you specify the nox file from the ersilia root directory. For instance, if you go into the playground folder, you can then run a simple command like clean as given below:

nox -s clean -- --cli <command> [options]

Or, from ersilia root directory, simply:

nox -s clean -f test/playground/noxfile.py -- --cli <command> [options]

Command mutual dependency

The commands in Ersilia are interdependent, meaning that running a single command often requires executing a series of prerequisite commands. For example, to run a model, you must first execute fetch, followed by serve, and finally run. If we want to test, for instance, the healthiness of the run command, we need to execute the prerequisite commands first. To simplify this process in the testing playground, we have introduced a CLI Dependency Map. This map outlines the commands required before a given command can be executed. The details are provided below.

  • serve: fetch
  • run: fetch, serve
  • close: serve
  • example: serve

The delete command requires a prior fetch, but we can run it after all the commands we specified have finished executing; we can also specify it before fetch. Now, if we specify a command alone, for instance:

nox -s execute -- --cli run

the other required commands will be executed first, which in the above case are fetch and serve. This simplifies the commands a bit.
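Conceptually, the dependency expansion works along these lines (a rough sketch of the idea, not the playground's actual code):

# Hypothetical sketch of the CLI dependency map described above: given the
# commands requested with --cli, prepend any missing prerequisites in order.
DEPENDENCY_MAP = {
    "serve": ["fetch"],
    "run": ["fetch", "serve"],
    "close": ["serve"],
    "example": ["serve"],
}


def expand_commands(requested):
    ordered = []
    for cmd in requested:
        for step in DEPENDENCY_MAP.get(cmd, []) + [cmd]:
            if step not in ordered:
                ordered.append(step)
    return ordered


# expand_commands(["run"]) -> ["fetch", "serve", "run"]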

Handling Python virtual environments and files

  • nox venv files will be stored at ~/eos/playground/.nox
  • Other files, such as inputs and outputs, will be stored at ~/eos/playground/files, and error logs will be stored at ~/eos/playground/logs.
  • These files get cleared out with the nox session called clean.


Options & Flags

🔹 Nox built-in flags

Nox provides built-in flags that control how sessions are run. These include:

  • -p: specifies the Python version. If you don't specify one, the sessions will by default be executed on Python 3.8, 3.9, 3.10, 3.11, and 3.12.
  • -fb: stands for force backend and is used to change the backend that runs the nox sessions. By default the sessions are executed on conda, but this can be changed to virtualenv with this flag. A more detailed example is given in the table below.

Note that both of these flags should be specified before the -- separator that divides nox flags from custom flags (e.g. nox -s execute [nox flags] -- [custom flags]).

| Flag | Description | Example |
| --- | --- | --- |
| -p | Used to specify the Python version | nox -s execute -p 3.8 -- [other flags after this] or nox -s execute -p 3.8 3.9 -- |
| -fb | Used to change the Python backend (e.g. from conda to virtualenv) | nox -s execute -fb virtualenv -- [other flags after this] |

🔹 General Settings

| Flag | Description | Default | Example |
| --- | --- | --- | --- |
| --activate_docker | Activates or deactivates Docker. Use: to test whether the auto-fetcher decides not to fetch from DockerHub when Docker is inactive, and vice versa. | true | nox -s execute -- --activate_docker true |
| --log_error | Enables or disables logging of errors to a file stored in ~/eos/playground/logs/. Each command failure creates a standalone file with a datetime string in its name, for instance catalog_20250129_145802.txt. | true | nox -s execute -- --log_error false |
| --silent | Enables or disables logs from ersilia command execution. | true | nox -s execute -- --silent false |
| --show_remark | Displays a remark column in the final execution summary table. The remark is the output shown in the terminal when the ersilia command executes successfully. | false | nox -s execute -- --show_remark true |
| --max_runtime_minutes | Sets the maximum execution time for a run command. Use: to test model speed if it appears to be slow. | 10 | nox -s execute -- --max_runtime_minutes 5 |
| --num_samples | Sets the sample size used to create input for the run command. | 10 | nox -s execute -- --num_samples 5 |

🔹 Command Flags

Note that any values you pass for the flags given below will overwrite the default values.

Command Selection (--cli)

| Flag | Description | Default | Example |
| --- | --- | --- | --- |
| --cli | Specifies the ersilia commands to run, in order (fetch, serve, run, catalog, example, test, close, delete). The default is all, which executes commands in this order: fetch, serve, run, close, catalog, example, delete, test. | all | nox -s execute -- --cli fetch serve run or nox -s execute -- --cli run |

Ersilia Command flags

Note: every ersilia flag for these commands, such as fetch: --from_github, delete: --all, etc., should be passed without the leading --. For example nox -s execute -- --fetch from_dockerhub version [img-tag].

| Flag | Description | Default | Example |
| --- | --- | --- | --- |
| --fetch | Fetches models from sources (from_github, from_dockerhub, from_s3, version) | --from_github | nox -s execute -- --fetch from_dockerhub version dev or nox -s execute -- --fetch from_s3 |
| --run | We don't specifically use this flag; instead we use --input_types and --outputs, which specify the input types (str, list, csv) and the output file types (result.csv, result.json, result.h5). The run flags are then generated automatically in the format ["-i", "input", "-o", "output"]. | None | None |
| --example | Generates example input for a model (-n, --random, -f). If we specify a file name, e.g. example.csv, it will be saved in the path we specify, meaning we should pass the path to the file. | ["-n", 10, "--random"] | nox -s execute -- --example -n 10 random/predefined -c -f example.csv |
| --catalog | Retrieves the model catalog from local or hub | ["--more", "--local", "--as-json"] | nox -s execute -- --catalog hub |
| --test | Tests models at different levels (shallow, deep, from_github, from_dockerhub, from_s3) | ["--shallow", "--from_github"] | nox -s execute -- --test deep from_dockerhub/from_s3/from_github |
| --delete | Used to delete models; has one flag, all | None | nox -s execute -- --delete all |

Note that any values you pass for the flags given below will overwrite the default values.

🔹 Other flags

| Flag | Description | Default | Example |
| --- | --- | --- | --- |
| --outputs | Used with the run command to specify output files (result.csv, result.h5). Note that only the file name is specified; the path is automatically set to ~/eos/playground/files/{file_name}.{csv, json, h5}. | [results.{csv, json, h5}] | nox -s execute -- --outputs result.csv result.h5 |
| --input_types | Also used with the run command, to define input formats (str, list, csv). | List of (str, list, csv) | nox -s execute -- --input_types str list csv |
| --runner | Specifies the execution mode (single, multiple). The single mode executes commands using one model (the default model ID for this mode is eos3b5e), whereas the multiple mode uses multiple models to execute the given commands (by default eos5axz, eos4e40, eos2r5a, eos4zfy, eos8fma). | single | nox -s execute -- --runner multiple |
| --single | Used to specify or override the default model ID used in single running mode. | eos3b5e | nox -s execute -- --single eosxxxx |
| --multiple | Used to specify or override the default model IDs used in multiple running mode. | [eos5axz, eos4e40, eos2r5a, eos4zfy, eos8fma] | nox -s execute -- --multiple eosxxxx eosxxxx eosxxxx |

Example Usage

All examples given below assume you are in the test/playground directory, which does not require specifying the noxfile; nox by default uses the noxfile.py found in test/playground.

Run all commands with their default values

nox -s execute -p 3.11

Fetch a model from DockerHub, serve it and run it, with Python 3.10

In this example the run input types and output files are the defaults.

nox -s execute -p 3.10 -- --cli fetch serve run --fetch from_dockerhub

Fetch a model from DockerHub, serve it and run it just by specifying the single command "run"

In this example we specify the input types and output files.

nox -s execute -p 3.10 -- --cli run --fetch from_dockerhub --input_types str list --outputs result.csv result.h5

Other examples

nox -s execute -p 3.10 -- --cli serve run catalog example

This will clean all nox related resources (venv, files, logs...)

nox -s clean

To test the close command, i.e. whether it successfully cleared out the sessions created during serve:

nox -s execute -p 3.10 -- --cli close

Test a model in shallow mode. We run delete first because there is a fetching step during test and we want to clear out previously existing models in the system:

nox -s execute -p 3.10 -- --cli delete test

Running with multiple mode

nox -s execute -p 3.10 -- --cli fetch serve run --fetch from_dockerhub --runner multiple --outputs result.csv

Environment Variables

| Variable | Description | Example Value |
| --- | --- | --- |
| TEST_ENV | Used to pass the yes_or_no prompt for the test command | TEST_ENV=true |
| CONFIG_DATA | Used to pass config data to the pytest file | CONFIG_DATA=config_json_data |

@GemmaTuron
Member Author

GemmaTuron commented Feb 3, 2025

Okay, a lot of work here thanks @Abellegese !

Help me clarify a bit so we can write solid documentation for end users. I don't think there is any need for additions to the playground at the moment, just to better understand what goes on under the hood, plus maybe some small bugfixes. We can then decide if any final edits are required.

Basic steps I am running:

  1. Check out the corresponding Playground branch, as it was not merged at the time of testing
  2. Pip install ersilia in editable mode with the [test] extra, in a conda env with Python 3.12
  3. cd into ersilia/test/playground
  4. run nox -s execute -p 3.11

Comments:

  • It seems to require the sudo password the first time you run it but not after? At this step: commands.py [sudo] password for gturon:. If that is the behaviour it needs to be added to the documentation. Is that correct?
  • About flags: if I understood the documentation correctly, you need to pass the -- twice, for example nox -s execute -- --activate_docker true? The only flags that do not use this are -p and -fb? I think -v as well.
  • Error logs: there is a sample_error_log.txt file in ersilia/test/playground/files. Maybe this is no longer needed if the logs go into eos? As for the logs, I don't get anything really informative even when models are failing. For example, see attached: the model failed because Docker was not active, but the only thing that appeared in the logs was this:
    delete_20250203_113011.txt
  • Input/output files: while some files appear in eos/files, I also see files being created in ersilia/test/playground. In particular, when I simply run nox -s execute -p 3.11, which I believe uses eos3b5e by default, I see the following files: file.csv, file.h5, file.json, input.csv, output1.csv, output2.csv, result.csv in my playground folder in ersilia/test, not in eos, and I need to delete them manually.
  • activate_docker: perhaps I am not understanding this correctly. It is true by default, so does it mean it will try to activate Docker if it is not active? At the moment the tests fail unless I activate Docker manually on my end.
  • I tried to use the command with activate_docker false to see what it does, but it fails. I understand that with activate_docker false it will simply try to run all the commands fetching from GitHub? This is the output; can you help me understand where it fails? I think this is because the test command is failing, but it should not, right? Also, all the tests in the Playground table printed at the end ("Command Execution Summary") appear as passed, but those do not include the test command: test_20250203_120017.txt and nox_test.txt
  • Seeing the flags listed under nox:
usage: nox [-h] [--version] [-l] [--json] [-s [SESSIONS ...]] [-p [PYTHONS ...]] [-k KEYWORDS] [-t [TAGS ...]] [-v] [-ts] [-db {conda,mamba,micromamba,virtualenv,venv,uv,none}]
           [-fb {conda,mamba,micromamba,virtualenv,venv,uv,none}] [--no-venv] [--reuse-venv {yes,no,always,never}] [-r] [-N] [-R] [-f NOXFILE] [--envdir ENVDIR]
           [--extra-pythons [EXTRA_PYTHONS ...]] [-P [FORCE_PYTHONS ...]] [-x] [--no-stop-on-first-error] [--error-on-missing-interpreters] [--no-error-on-missing-interpreters]
           [--error-on-external-run] [--no-error-on-external-run] [--install-only] [--no-install] [--report REPORT] [--non-interactive] [--nocolor] [--forcecolor] 

are the more technical ones explained somewhere? And is -db the same as -fb?

  • Is there a way to get the results of the playground in a table?

@Abellegese
Contributor

Abellegese commented Feb 3, 2025

Q1) Yes, as you know we have a Docker manipulation system, so it requires privileges.
Q2) Yes, basically -- is the nox way of recognizing or separating out flags that are not built-in.
Q3) Correct, we don't need those logs inside the playground test; we now have traceback-supported error logs. But as I see in your logs they did not show up, which confuses me, and I could not reproduce the error. It seems there is a "permission denied" error, which had been blocking me from testing the delete command; @DhanshreeA solved that issue but it seems to be coming back.
Q4) These files (file.csv, file.h5, file.json, input.csv, output1.csv, output2.csv, result.csv) come from the test command, and we agreed to put them there (cwd), so you might need to delete them manually.

@Abellegese
Contributor

Abellegese commented Feb 3, 2025

Q5) On macOS you need to do it manually, and if your OS is macOS it will raise a runtime error telling the user to do it manually:

 elif system_platform == "Darwin":  # macOS
      print("Stopping Docker programmatically is not supported on macOS.")
      raise RuntimeError("Cannot stop Docker programmatically on macOS.")

This is also true for starting Docker, but at least I implemented opening the Docker Desktop application if it is installed on the user's system.

@GemmaTuron
Member Author

GemmaTuron commented Feb 3, 2025

Q5) On macOS you need to do it manually and if your OS is mac it will raise runtime error telling user to do it manually

elif system_platform == "Darwin": # macOS
print("Stopping Docker programmatically is not supported on macOS.")
raise RuntimeError("Cannot stop Docker programmatically on macOS.")
This is also true for starting docker but at least I implemented opening desktop version of docker if it is installed in user system.

Thanks @Abellegese, this is what I understood, but I am working on Ubuntu, so shouldn't it be automatic? Maybe it is a user setting somewhere in Docker Desktop. At the moment it does not activate Docker Desktop if I pass the flag as true.

@Abellegese
Contributor

Q6) Exactly: if Docker is not activated, since the fetch flag is None by default, it will decide to fetch from GitHub. But I saw the log and it is confusing to work out what happened. The traceback locates the error in ersilia.utils.docker, where I use simple docker calls to get the running containers. This might be related to the Docker-inactive issue.

  • I think @GemmaTuron I need to add a small exception handler to raise an error specific to this Docker-inactive issue on the user's computer.

@Abellegese
Contributor

Q5) On macOS you need to do it manually and if your OS is mac it will raise runtime error telling user to do it manually
elif system_platform == "Darwin": # macOS
print("Stopping Docker programmatically is not supported on macOS.")
raise RuntimeError("Cannot stop Docker programmatically on macOS.")
This is also true for starting docker but at least I implemented opening desktop version of docker if it is installed in user system.

Thanks @Abellegese this is what I understood but I am working on Ubuntu, so shouldn't it be automatic? Maybe it is a user setting somewhere on Docker Desktop

Yes, if this is Ubuntu it should be automatic. I am using Ubuntu but I could not reproduce this on my computer.

@Abellegese
Contributor

Q7) -db stands for default backend and does not always guarantee a backend change, whereas -fb (force backend) always changes the backend to whatever we choose.

@GemmaTuron
Member Author

Q7) -db stands for default backend and does not always guarantee a backend change, whereas -fb (force backend) always changes the backend to whatever we choose.

That helps. I will only reference -fb in the docs.

@GemmaTuron
Member Author

Ok @Abellegese

Before I ask more questions, here is a summary of what we have discussed and agreed so far:

Q1) The Docker manipulation system requires privileges - OK, this is now added in the documentation.
Q2) -- is the nox way of recognizing or separating out flags that are not built-in - OK, this is also clear in the documentation now.
Q3) The logs inside the playground test are not needed, since we now have traceback-supported error logs - Can we remove this folder from ersilia then? Will you modify your PR? About the Permission Denied error, I'll let Dhanshree share more about it.
Q4) The files file.csv, file.h5, file.json, input.csv, output1.csv, output2.csv, result.csv come from the test command and are stored in the cwd by agreement, so they need to be deleted manually - I see, I did not know those came from the test command. I believe it would be best if they were stored somewhere else that is easy to delete, what do you think? @DhanshreeA had you thought of that already?
Q5) Docker: I have tried with a model for which I know Docker works (eos4e40) and indeed I think it worked! Starting Docker... Maybe it was a problem with the model eos3b5e, as I report in the test command issue.
Q6) I believe this is a problem with the model itself and the test command, and I have followed up on that in the Model Tester issue.
Q7) OK, I will only add -fb in the docs.

@Abellegese
Contributor

Yep, that's it @GemmaTuron, nicely summarized.

@GemmaTuron
Member Author

And then a few questions to better understand:

  1. Is running nox -s execute -p 3.10 -- --cli fetch equivalent to running nox -s execute -p -- --fetch?
  2. If I do not specify a source in nox -s execute -p 3.10 -- --cli fetch, will it fetch from_github only, or, like the baseline nox command, will it fetch from all sources?
  3. If the --single or --multiple flag is not passed, does everything run by default on eos3b5e?
  4. I don't truly understand the explanation of the --run flag. If I run nox -s execute -p -- --run, what is the default behaviour? Where is the model fetched from?
  5. Delete flag: does it delete the model specified at fetch time, or does it only work with all by default?
  6. Close flag: it is not in the table but I guess it exists? Does that make sense?
  7. I need help understanding the output of the playground. I have run nox -s execute -p 3.10 --activate_docker true --silent false --single eos4e40 and I get 2 failed, 6 passed, 5 warnings in 1196.95s (0:19:56), but the only real failure I see is in the catalog command. I do not see where the other failure is, and I cannot find the warnings. Could the catalog error be due to the .json file not being updated appropriately? catalog_20250203_162302.txt and eos4e40_playground.txt

I have updated the documentation based on the information you provided! Hope you like it, it's here.

@Abellegese
Contributor

Abellegese commented Feb 3, 2025

Q1) No, it's not equivalent: --cli is used to define the ersilia commands that we want to run, while --fetch is used to pass ersilia's fetch flags to nox. The same goes for other nox flags such as --example and --catalog.
Q2) If you don't specify anything, it fetches by deciding automatically; if Docker is inactive, it fetches from GitHub.
Q3) Yep, but you can pass any model you are interested in, for example --single eos2db3 or --multiple eos2db3 eos3b5e ....
Q4) I explained it there, but it is better to remove it from the documentation; I used it internally to build the run flags. Note that if you don't specify a fetching source such as --fetch from_github, it decides automatically. The run command by default runs the model for input types str, list, csv and output types result.{csv, json, h5}. We can pass those using --input_types and --outputs.
Q5) By default it deletes eos3b5e, but you can pass --cli delete --delete all to delete everything.
Q6) Yep, it does not need any flag; that is why I didn't put serve and close in the table. But we can show them in the usage examples.
Q7) According to the error log:

  • the first error is that it found None in the generated catalog command JSON results ('Input Shape': None and likewise 'Output Shape': None), which is why it raises the error:
1) Check 'Catalog json content is valid': False and Details: 'Validation failed for key 'Input Shape' in object: {'Index': 2, 'Identifier': 'eos3mk2', 'Slug': 'bbbp-marine-kinase-inhibitors', 'Title': 'BBBP model tested on marine-derived kinase inhibitors', 'Task': ['Classification'], 'Input Shape': None, 'Output': ['Probability'], 'Output Shape': None, 'Model Source': 'Local Repository'}'
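The check behind that failure is essentially key/value validation over each catalog entry, roughly like this (a sketch of the idea, not the actual rule):

def validate_catalog_entries(entries):
    # Sketch: every catalog entry must have all keys present with non-empty, non-None values.
    for entry in entries:
        for key, value in entry.items():
            if value in (None, "", [], {}):
                raise AssertionError(
                    f"Validation failed for key '{key}' in object: {entry}"
                )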

@GemmaTuron
Member Author

Hi @Abellegese

Thanks, but then for Q1: it is the same to run --cli fetch as to run --fetch directly, correct? Sorry if that was not clear.
How come the catalog is not correct? I think this has to do with issues in the Model test. Does the model test modify the local catalogue by any chance?

@Abellegese
Contributor

Hi @GemmaTuron, this is an important question: whether the test command is modifying the catalog result. Can you give me more details on how you used the test command for this model (eos3mk2)? Or could you go to eos/temp/eos3mk2/ and see whether the metadata.yml was recently updated?

@Abellegese
Contributor

On the first question: it is not the same. --fetch is a way to pass your fetch flags so that you can fetch from your desired source, whereas --cli fetch uses the auto-decider by default to fetch models.

@GemmaTuron
Member Author

On the first question: it is not the same. --fetch is a way to pass your fetch flags so that you can fetch from your desired source, whereas --cli fetch uses the auto-decider by default to fetch models.

ahh makes sense thanks!

@GemmaTuron
Member Author

Okay @Abellegese,
To summarise and wrap up the Playground features, here is a summary of everything I think needs to be updated:

  1. There is a blocking "permission denied" error on the delete command. Does this need to be fixed system-wide?
  2. The logs in the playground test are not required and can be eliminated.
  3. The test command (not the playground) produces some files that are stored locally wherever you run the test from. Would it make sense to store them somewhere specific, like a temporary folder?
  4. The issue with the catalog is difficult to debug. I do not think it is related to the playground itself, so we can consider it in the test command if anything.

For the rest, I think the documentation is basic but sufficient, also because this feature is aimed at more advanced developers and they can play with the different command combinations as they see fit.

@Abellegese
Contributor

Thanks @GemmaTuron, I will make a change for point 2.

@Abellegese
Contributor

The rest of the problems will be addressed in the test command fix.

Projects: Status: On Hold
Development: No branches or pull requests

3 participants