File Independent irace Run #34

Closed
DE0CH opened this issue Jul 12, 2022 · 7 comments

@DE0CH
Contributor

DE0CH commented Jul 12, 2022

I think irace would be more flexible for some advanced workflows if it could pass all its information through command-line arguments, stdin, and stderr, without using any files or spawning any processes. This is useful when irace is an intermediate step in an automated workflow, and when creating interfaces with other programming languages.

In particular, passing parameters to irace through files is inflexible when several irace runs with different configurations execute simultaneously on one machine: each parameter file needs a unique name, names can clash if not chosen carefully, and files that are not cleaned up after use leak disk space. If everything could be passed on the command line, there would be no clashes or leaked resources. An easy option is to pass the content that would be in parameters.txt and scenario.txt as strings in command-line arguments. Special characters (such as spaces, quotes and newlines) shouldn't be much of an issue, because many programming languages can pass arguments as a list (instead of a single string of space-separated arguments) and receive them as a list.
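
A minimal sketch of the point about argument lists, assuming a stand-in child process rather than irace itself: when the caller passes argv as a list and no shell is involved, text containing spaces, quotes and newlines reaches the child unchanged. The option name `--parameter-text` and the parameter text are made up for illustration; nothing in this thread establishes that irace accepts such an option.

```python
import json
import subprocess
import sys

# Hypothetical parameter-file content, with spaces, quotes and newlines.
PARAMETERS_TEXT = 'alpha "--alpha " r (0.0, 5.0)\nants "--ants " i (5, 100)\n'

# Stand-in child process (not irace): echoes its arguments back as JSON
# so the caller can check that nothing was split or unescaped on the way.
CHILD = "import json, sys; print(json.dumps(sys.argv[1:]))"


def roundtrip_args(args):
    """Run the stand-in child with `args` passed as a list (no shell)."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


# "--parameter-text" is an invented option name, used only for this demo.
received = roundtrip_args(["--parameter-text", PARAMETERS_TEXT])
```

Because the list is handed to the operating system directly, the caller never has to build or escape a command string.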

Furthermore, it would be good if irace can pass the arguments that it uses to call the target runner back by using stdout instead of spawning a new process itself. This would give the user more control over how the target runner is run. For example, they can more easily implement a custom load distributor for a very custom (or hacky) cluster. It will also make it easier to create interfaces with other programming languages because the wrapper just needs to parse arguments given by irace into a native data structure and do whatever next step the user wants (e.g. calling a function passed in as a parameter).

@MLopez-Ibanez
Owner

In particular, passing parameters to irace through files is inflexible when several irace runs with different configurations execute simultaneously on one machine: each parameter file needs a unique name, names can clash if not chosen carefully, and files that are not cleaned up after use leak disk space. If everything could be passed on the command line, there would be no clashes or leaked resources. An easy option is to pass the content that would be in parameters.txt and scenario.txt as strings in command-line arguments. Special characters (such as spaces, quotes and newlines) shouldn't be much of an issue, because many programming languages can pass arguments as a list (instead of a single string of space-separated arguments) and receive them as a list.

Perhaps define a JSON format and pass all the setup as a single input string in stdin so one can do cat conf.json | irace --stdin?

Furthermore, it would be good if irace can pass the arguments that it uses to call the target runner back by using stdout instead of spawning a new process itself. This would give the user more control over how the target runner is run. For example, they can more easily implement a custom load distributor for a very custom (or hacky) cluster. It will also make it easier to create interfaces with other programming languages because the wrapper just needs to parse arguments given by irace into a native data structure and do whatever next step the user wants (e.g. calling a function passed in as a parameter).

I don't understand completely what you mean. You still need to launch a new process (the target-runner process). If you mean that irace would send messages to some pipe and the pipe will be read by a continuously running process and use each line to do something, this seems like a good idea and probably not too hard to implement by adding some option "--pipe pipe_name" that instead of launching a process for each target-runner call, prints the call to the pipe pipe_name.in and reads output from pipe_name.out. But I wonder how parallelization would work in that case. I'm happy to incorporate an implementation of this idea but I don't have the time to implement it myself. Having a dummy but fully functional example that at least works in Linux would be helpful.

Still, I don't see how the above helps with interfacing with other programming languages. Could you explain? Perhaps with an example?

@DE0CH
Contributor Author

DE0CH commented Jul 13, 2022

Perhaps define a JSON format and pass all the setup as a single input string in stdin so one can do cat conf.json | irace --stdin?

That would work too. I imagine this to be used by the Python package (instead of directly by the user), so whichever is easiest to implement.

I don't understand completely what you mean. You still need to launch a new process (the target-runner process). If you mean that irace would send messages to some pipe and the pipe will be read by a continuously running process and use each line to do something, this seems like a good idea and probably not too hard to implement by adding some option "--pipe pipe_name" that instead of launching a process for each target-runner call, prints the call to the pipe pipe_name.in and reads output from pipe_name.out. But I wonder how parallelization would work in that case. I'm happy to incorporate an implementation of this idea but I don't have the time to implement it myself. Having a dummy but fully functional example that at least works in Linux would be helpful.

The purpose of this is to let the user control how a new process is spawned. I was thinking instead of calling an executable and passing in the parameters as command line options, irace would just print those to stdout, and whichever calling process would just read it, execute the target runner and feed the result back to irace through its stdin.

The user would be responsible for spawning new processes, possibly with a custom scheduler and load balancer to improve performance. To keep track of which run is which, irace can assign each call a run id; the user returns each result together with its run id so that irace can match results to runs.
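
The run-id idea can be sketched as follows. This is only an illustration under assumptions: the line formats `<run_id> <args...>` and `<run_id> <cost>` and all function names are invented here, and `toy_runner` is a placeholder for a real target runner. The point it shows is that replies can arrive in any order and still be matched to their requests.

```python
def format_request(run_id, args):
    """Coordinator side: build the line '<run_id> <args...>' (assumed format)."""
    return " ".join([str(run_id), *args])


def handle_request(line, target_runner):
    """Worker side: run the target runner and echo the run id with the cost."""
    run_id, *args = line.split()
    return f"{run_id} {target_runner(args)}"


def collect(reply_lines):
    """Coordinator side: map each run id back to its reported cost."""
    results = {}
    for line in reply_lines:
        run_id, cost = line.split()
        results[int(run_id)] = float(cost)
    return results


def toy_runner(args):
    # Placeholder cost function standing in for a real target runner.
    return float(len(args))


request_lines = [
    format_request(7, ["--alpha", "2.92"]),
    format_request(8, ["--ants", "80", "--eas"]),
]
# Reverse the order to mimic runs finishing out of order.
reply_lines = [handle_request(line, toy_runner) for line in reversed(request_lines)]
results = collect(reply_lines)
```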

Still, I don't see how the above helps with interfacing with other programming languages. Could you explain? Perhaps with an example?

It is useful for allowing the Python (or other language) package to accept a function as the target runner. Let's assume the function that launches irace looks like this (pseudocode):

def run_irace(target_runner, scenarios, parameters, ...):
    call irace and input scenarios, parameters from stdin and capture stdout
    for line in stdout:
        res = target_runner(line)
        write res to stdin

This would be quite hard to implement if irace executes a file, because of closures. For example, if a user writes:

size = 10
def target_runner():
    for i in range(size):
        do something 
run_irace(target_runner, ...)

Python captures the variable size in the closure, but if the package simply writes the body of the function into a Python file and passes it to irace, it will not work.
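
A small self-contained sketch of the problem described above. The "export" step is hypothetical: it mimics a package that writes only the function's source text somewhere and executes it in a fresh namespace, as a separate script would. Here size is a module-level variable rather than a true closure cell, but the failure mode is the same.

```python
import textwrap

size = 10


def target_runner():
    # Reads `size` from the scope in which the function was defined.
    return sum(range(size))


# Calling the function object directly works: it keeps a reference to
# the namespace where `size` lives.
direct_result = target_runner()

# Hypothetical naive export: only the function's source text survives,
# and it is executed in a namespace that knows nothing about `size`.
exported_source = textwrap.dedent(
    """
    def target_runner():
        return sum(range(size))
    """
)
fresh_namespace = {}
exec(exported_source, fresh_namespace)
try:
    fresh_namespace["target_runner"]()
    export_error = None
except NameError as exc:
    export_error = exc
```

Passing the function object itself (as in the run_irace pseudocode) avoids this, because nothing has to be reconstructed from source text.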

@MLopez-Ibanez
Owner

This would be quite hard to implement if irace executes a file, because of closures. For example, if a user writes:

size = 10
def target_runner():
    for i in range(size):
        do something 
run_irace(target_runner, ...)

Python captures the variable size in the closure, but if the package simply writes the body of the function into a Python file and passes it to irace, it will not work.

The above already works in iracepy: See the updated https://github.com/auto-optimization/iracepy/blob/main/example_dual_annealing.py

I'm not sure why a package would write the body of a function to a file without all the code required to make the function work. If the function uses numpy but the package only writes the body and not the line "import numpy", then the function will not work. Maybe I'm missing something, but I see only two use cases:

  1. The user wants to do everything within Python: Then define target_runner and everything it needs in the same file as the call to irace, or in a different file (using a class to keep all state, closures, etc.) or in a proper package that is imported.

  2. The user wants to keep irace and target-runner separated. Then define target_runner within an executable .py file that accepts command-line arguments (there are many examples in the irace R package) and let irace call that executable.

The purpose of this is to let the user control how a new process is spawned. I was thinking instead of calling an executable and passing in the parameters as command line options, irace would just print those to stdout, and whichever calling process would just read it, execute the target runner and feed the result back to irace through its stdin.

You can do this now. Just make your target_runner spawn a daemon, fork or some other technique for parallelism. Subsequent calls will just connect to that daemon. How irace communicates with target_runner is independent of how target_runner communicates with the daemon (it can communicate via internet if you wish to).

The user would be responsible for spawning new processes, possibly with a custom scheduler and load balancer to improve performance. To keep track of which run is which, irace can assign each call a run id; the user returns each result together with its run id so that irace can match results to runs.

irace already has a load balancer for parallel executions, and you can also implement your own by setting the scenario option targetRunnerParallel to your own function (it may even work with a python function, but I haven't tried). This is more flexible than what you propose as you can easily combine different targetRunnerParallel with different target_runner without having to create a function for every possible combination.

If you implement a load balancer that is better (or for a different purpose) than the ones currently available in irace, I will be happy to add it as an option or as an example (either in the R package or the iracepy package).

However, irace currently synchronizes parallelization for every instance. To get the full benefits of load balancing and parallelism, irace itself needs to be asynchronous, which requires changes within irace. The best way to do that would be to use 'futures' (https://future.futureverse.org/). This would allow it to work for any user and with multiple parallelization back-ends (including custom back-ends). Futures also exist in Python: https://docs.python.org/3/library/asyncio-future.html
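
To illustrate what "asynchronous" buys here, the sketch below handles each result as soon as it is ready instead of waiting at a barrier for a whole batch. It uses Python's concurrent.futures rather than the asyncio futures linked above or the R future package, and it says nothing about how irace would implement this internally. The `evaluate` function and the fake runtimes are placeholders.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def evaluate(config_id, instance_id, duration):
    # Placeholder for a target-runner call; `duration` fakes its runtime.
    time.sleep(duration)
    return config_id, instance_id


# (configuration id, instance id, fake runtime in seconds)
jobs = [(1, 1, 0.3), (2, 1, 0.01)]

completion_order = []
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(evaluate, *job) for job in jobs]
    # No barrier: each result is handled as soon as it is ready, so the
    # quick run is not held back by the slow one.
    for future in as_completed(futures):
        completion_order.append(future.result())
```

The same executor interface works with a process pool or a custom back-end, which is the kind of back-end independence futures are meant to provide.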

@DE0CH
Contributor Author

DE0CH commented Nov 22, 2022

The above already works in iracepy: See the updated https://github.com/auto-optimization/iracepy/blob/main/example_dual_annealing.py

I saw that. I agree. It works better than I thought.

The user wants to do everything within Python: Then define target_runner and everything it needs in the same file as the call to irace, or in a different file (using a class to keep all state, closures, etc.) or in a proper package that is imported.

I think this use case will be covered by iracepy once it's done. So I'm happy about it.

But what if the user wants to use another language, like Rust, for which a convenient language binding doesn't exist (as far as I know)? Then perhaps passing information through stdin and stdout would be easier than asking the user to figure out how to run an embedded R. I am not really sure, because I don't know much about dynamically linking libraries and so on.

You can do this now. Just make your target_runner spawn a daemon, fork or some other technique for parallelism. Subsequent calls will just connect to that daemon. How irace communicates with target_runner is independent of how target_runner communicates with the daemon (it can communicate via internet if you wish to).

Yeah, but that requires a client-server model, which complicates things quite a bit. You have to manage a TCP port or Unix socket, plus a session layer on top of it with HTTP, WebSocket or whatnot.

If you implement a load balancer that is better (or for a different purpose) than the ones currently available in irace, I will be happy to add it as an option or as an example (either in the R package or the iracepy package).

I don't think my implementation would be applicable to anyone else. We have a lot of desktop computers at school that I can ssh into, so I built a hacky setup that connects to a machine over ssh, builds a Docker container, runs irace, and then syncs the files back with rsync.

However, irace currently synchronizes parallelization for every instance. To get the full benefits of load balancing and parallelism, irace itself needs to be asynchronous, which requires changes within irace. The best way to do that would be to use 'futures' (https://future.futureverse.org/). This would allow it to work for any user and with multiple parallelization back-ends (including custom back-ends). Futures also exist in Python: https://docs.python.org/3/library/asyncio-future.html

I see.

@DE0CH
Contributor Author

DE0CH commented Feb 21, 2023

Developing iracepy and handling all the data conversion and all sorts of edge cases through rpy2 has proven to be way too painful. So I am suggesting an alternative.

We define a communication interface through stdin/stdout. Stockfish does this too. Instead of calling the target runner, irace prints its "command" to stdout, for example "targetRunner 1 113 734718556 /home/user/instances/tsp/2000-533.tsp --eas --localsearch 0 --alpha 2.92 --beta 3.06 --rho 0.6 --ants 80". Log messages that irace normally prints and targetEvaluator calls can be prefixed with "log " or "targetEvaluator " to tell them apart. The target runner would reply on irace's stdin with "targetRunner <id.configuration> <id.instance> <cost> <time>" or "targetEvaluator <id.configuration> <id.instance> <cost>".

In addition, we can add the parameters "trainingInstancesText", "parameterText", "forbiddenText", "configurationText", "trainInstancesText", and "testInstancesText", which accept as text the contents that would normally be in the corresponding files. Users can shell-escape special characters like spaces and newlines, or they can construct the argv list directly from a programming language.

Users can select this mode by setting targetRunner or targetEvaluator to "stdout://" (it looks like a protocol such as "https://", and it's very unlikely that someone gives their target runner file that name).

This is easy to implement. We just need to make targetRunner print to stdout instead of calling a shell command, and read from stdin instead of reading the output of the program. Users who are familiar with irace can easily understand this and port their code to it, and it can run anywhere with a shell, unlike rpy2, which requires fiddly dynamically linked libraries / shared objects.
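
As a rough sketch of what the receiving side of this proposal might look like, the code below parses the example command line from this comment and formats a reply in the proposed "targetRunner <id.configuration> <id.instance> <cost> <time>" shape. The field order (configuration id, instance id, seed, instance path, then parameters) is an assumption read off the example, and `RunRequest`, `parse_request` and `format_reply` are names invented for illustration; none of this exists in irace.

```python
import shlex
from dataclasses import dataclass


@dataclass
class RunRequest:
    kind: str              # "targetRunner" or "targetEvaluator"
    id_configuration: int
    id_instance: int
    seed: int
    instance: str
    params: list           # remaining command-line style parameters


def parse_request(line):
    """Split one protocol line into its assumed fields."""
    kind, id_conf, id_inst, seed, instance, *params = shlex.split(line)
    return RunRequest(kind, int(id_conf), int(id_inst), int(seed), instance, params)


def format_reply(req, cost, time=None):
    """Build '<kind> <id.configuration> <id.instance> <cost> [<time>]'."""
    fields = [req.kind, req.id_configuration, req.id_instance, cost]
    if time is not None:
        fields.append(time)
    return " ".join(str(field) for field in fields)


line = (
    "targetRunner 1 113 734718556 /home/user/instances/tsp/2000-533.tsp "
    "--eas --localsearch 0 --alpha 2.92 --beta 3.06 --rho 0.6 --ants 80"
)
req = parse_request(line)
reply = format_reply(req, cost=0.5, time=1.25)
```

Using shlex.split rather than str.split means an instance path containing spaces could still be sent, provided the sender quotes it.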

@MLopez-Ibanez
Owner

We define a communication interface through stdin/stdout. Stockfish does this too. Instead of calling the target runner, irace prints its "command" to stdout, for example "targetRunner 1 113 734718556 /home/user/instances/tsp/2000-533.tsp --eas --localsearch 0 --alpha 2.92 --beta 3.06 --rho 0.6 --ants 80". Log messages that irace normally prints and targetEvaluator calls can be prefixed with "log " or "targetEvaluator " to tell them apart. The target runner would reply on irace's stdin with "targetRunner <id.configuration> <id.instance> <cost> <time>" or "targetEvaluator <id.configuration> <id.instance> <cost>".

You could add an alternative to target.runner.default

https://github.com/MLopez-Ibanez/irace/blob/master/R/race-wrapper.R#L481

that prints to stdout instead of calling the target-runner. This alternative could be chosen automatically in checkScenario:

https://github.com/MLopez-Ibanez/irace/blob/master/R/readConfiguration.R#L407

if the targetRunner is "stdout://" as you suggest.

If you are going to have a text-based interface, it would be better to print in json format.
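
A possible shape for such a JSON message, sketched under assumptions: the key names, the one-message-per-line framing, and the helper functions are all invented here, and the instance path with a space is a made-up example. It mainly shows that structured fields remove the need for any quoting or escaping rules.

```python
import json


def encode_call(id_configuration, id_instance, seed, instance, params):
    """Serialize one target-runner call as a single JSON line."""
    return json.dumps({
        "type": "targetRunner",
        "id_configuration": id_configuration,
        "id_instance": id_instance,
        "seed": seed,
        "instance": instance,
        "params": params,
    })


def decode_reply(line):
    """Parse a JSON reply and return (id_configuration, id_instance, cost)."""
    reply = json.loads(line)
    return reply["id_configuration"], reply["id_instance"], reply["cost"]


# Hypothetical instance path containing a space, to show that no
# shell-style quoting is needed.
message = encode_call(
    1, 113, 734718556,
    "/home/user/instances/tsp/my instance.tsp",
    {"alpha": 2.92, "ants": 80},
)
decoded = json.loads(message)
reply = decode_reply('{"id_configuration": 1, "id_instance": 113, "cost": 0.5}')
```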

In addition, we can add the parameters "trainingInstancesText", "parameterText", "forbiddenText", "configurationText", "trainInstancesText", and "testInstancesText", which accept as text the contents that would normally be in the corresponding files. Users can shell-escape special characters like spaces and newlines, or they can construct the argv list directly from a programming language.

@DE0CH
Contributor Author

DE0CH commented Jan 4, 2024

It seems the idea isn't going anywhere and won't be implemented. Closing it as not planned.

DE0CH closed this as not planned on Jan 4, 2024