File Independent irace Run #34
Perhaps define a JSON format and pass all the setup as a single input string in stdin so one can do
I don't completely understand what you mean. You still need to launch a new process (the target-runner process). If you mean that irace would send messages to a pipe that is read by a continuously running process, which uses each line to do something, this seems like a good idea and probably not too hard to implement by adding an option "--pipe pipe_name" that, instead of launching a process for each target-runner call, prints the call to the pipe pipe_name.in and reads the output from pipe_name.out. But I wonder how parallelization would work in that case. I'm happy to incorporate an implementation of this idea, but I don't have the time to implement it myself. A dummy but fully functional example that at least works in Linux would be helpful. Still, I don't see how the above helps with interfacing with other programming languages. Could you explain, perhaps with an example?
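A dummy but fully functional sketch of the process on the other end of such a pipe could look like the following. It assumes the hypothetical `--pipe pipe_name` option described above, with one target-runner call per line in `pipe_name.in` and one cost per line written to `pipe_name.out`; `open_pipes` and `serve` are illustrative names, not part of irace. FIFOs (`os.mkfifo`) only exist on Linux and other Unix-like systems.

```python
from __future__ import annotations

import os
from typing import Callable, TextIO


def open_pipes(name: str) -> tuple[TextIO, TextIO]:
    """Create (if missing) and open the FIFOs name.in and name.out (Unix-like systems only)."""
    for suffix in (".in", ".out"):
        path = name + suffix
        if not os.path.exists(path):
            os.mkfifo(path)
    # Opening a FIFO blocks until the other side opens it too, so this assumes
    # the hypothetical irace option opens name.in first and name.out second.
    commands = open(name + ".in", "r")
    results = open(name + ".out", "w")
    return commands, results


def serve(commands: TextIO, results: TextIO,
          evaluate: Callable[[list[str]], float]) -> int:
    """Read one target-runner call per line and write one cost per line.

    Returns the number of calls handled once the command stream is closed.
    """
    handled = 0
    for line in commands:
        args = line.split()
        if not args:  # ignore blank lines
            continue
        results.write(f"{evaluate(args)}\n")
        results.flush()  # the reader is blocked waiting for this line
        handled += 1
    return handled
```

Because `serve` accepts any text streams, it can be tested with `io.StringIO` and, if irace ever gained such an option, used as `serve(*open_pipes("pipe_name"), evaluate)`. A single loop handles one call at a time, so this does not answer the parallelization question; that would need several workers and some way to match results to calls.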
That would work too. I imagine this to be used by the Python package (instead of directly by the user), so whichever is easiest to implement.
The purpose of this is to let the user control how a new process is spawned. Instead of calling an executable and passing the parameters as command-line options, irace would just print them to stdout; the calling process would read them, execute the target runner, and feed the result back to irace through its stdin. The user would be responsible for spawning new processes, possibly with a custom scheduler and load balancer to improve performance. To keep track of which run is which, irace can attach a run ID to each call; the user keeps track of it and returns it together with the result, so that irace can match each result to its run.
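A minimal sketch of the user side of this protocol, assuming a hypothetical line format `<run_id> <target-runner arguments...>` printed by irace and replies of the form `<run_id> <cost>`. The round-robin host assignment stands in for a custom scheduler, and `run_on_host` is a user-supplied function (faked in the test). Because each reply carries its run ID, replies may come back in any order.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Request:
    run_id: str
    args: list[str]


def parse_requests(lines: Iterable[str]) -> Iterator[Request]:
    """Each non-blank line is '<run_id> <target-runner arguments...>' (hypothetical format)."""
    for line in lines:
        fields = line.split()
        if fields:
            yield Request(fields[0], fields[1:])


def dispatch(requests: Iterable[Request], hosts: list[str],
             run_on_host: Callable[[str, list[str]], float]) -> list[str]:
    """Assign requests to hosts round-robin and return '<run_id> <cost>' reply lines.

    Runs are executed one after another here; a real scheduler would run them concurrently.
    """
    replies = []
    for request, host in zip(requests, itertools.cycle(hosts)):
        cost = run_on_host(host, request.args)
        replies.append(f"{request.run_id} {cost}")
    return replies


def collect(replies: Iterable[str], expected_ids: Iterable[str]) -> dict[str, float]:
    """What the irace side would do: match replies to runs by ID, in any order."""
    results = {}
    for line in replies:
        run_id, cost = line.split()
        results[run_id] = float(cost)
    missing = set(expected_ids) - set(results)
    if missing:
        raise ValueError(f"no result for runs {sorted(missing)}")
    return results
```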
It is useful for allowing the Python (or other language) package to accept a function as the target runner. Let's assume the function that launches irace looks like this (pseudocode):
This would be quite hard to implement if irace executes a file, because of closures. For example, if the user writes:
Python captures the variable size in the closure, but if the package simply writes the body of the function into a Python file and passes it to irace, it would not work.
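The closure problem can be reproduced in a few lines of plain Python. `make_target_runner` and its `size`-based cost are made up for illustration; the `exec` call simulates a package that writes only the function body to a file and runs it in a fresh namespace.

```python
def make_target_runner(size):
    # `size` is a free variable captured in the closure of target_runner.
    def target_runner(x):
        return sum(v * v for v in x[:size])
    return target_runner


runner = make_target_runner(2)
print(runner([1, 2, 3]))            # 5: only the first `size` values are used
print(runner.__code__.co_freevars)  # ('size',)

# Simulate writing only the body to a file and running it elsewhere:
# the captured value of `size` is not part of the text, so it is lost.
body = "def target_runner(x):\n    return sum(v * v for v in x[:size])\n"
namespace = {}
exec(body, namespace)
try:
    namespace["target_runner"]([1, 2, 3])
except NameError as err:
    print("lost closure:", err)     # name 'size' is not defined
```

Passing the function object itself within the same process keeps the closure intact, which is consistent with this case already working in iracepy.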
The above already works in iracepy (see the updated https://github.com/auto-optimization/iracepy/blob/main/example_dual_annealing.py). I'm not sure why a package would write the body of a function to a file without all the code required to make the function work. If the function uses numpy but the package only writes the body and not the line "import numpy", then the function will not work. Maybe I'm missing something, but I see only two use cases:
You can do this now. Just make your target_runner spawn a
irace already has a load balancer for parallel executions, and you can also implement your own by setting the scenario option If you implement a load balancer that is better than (or serves a different purpose from) the ones currently available in irace, I will be happy to add it as an option or as an example (either in the R package or the iracepy package). However, irace currently synchronizes parallel executions for every instance. To get the full benefits of load balancing and parallelism, irace itself needs to become asynchronous, which requires changes within irace; the best way to do that would be to use 'futures' (https://future.futureverse.org/). This would allow it to work for any user and with multiple parallelization back-ends (including custom back-ends). Futures also exist in Python: https://docs.python.org/3/library/asyncio-future.html
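To illustrate the difference between synchronizing per instance and being fully asynchronous, here is a sketch using Python's `concurrent.futures` thread pool (not the asyncio futures linked above, and not R's future package). `target_runner` is a fake that just sleeps. The sketch ignores the statistical tests irace runs between instances, which are why the synchronization exists; it only shows the scheduling difference.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def target_runner(config_id: int, instance_id: int) -> float:
    """Fake target runner: sleeps a little and returns a deterministic cost."""
    time.sleep(0.01 * ((config_id + instance_id) % 3))
    return float(config_id * 10 + instance_id)


def run_per_instance(configs, instances, workers=4):
    """Barrier after each instance: one slow run keeps the other workers idle."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for instance in instances:
            futures = {pool.submit(target_runner, c, instance): c for c in configs}
            for future in as_completed(futures):
                results[(futures[future], instance)] = future.result()
    return results


def run_asynchronously(configs, instances, workers=4):
    """No barrier: every run is submitted up front and idle workers take the next one."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(target_runner, c, i): (c, i)
                   for i in instances for c in configs}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

Both functions compute the same results; only the order in which runs are started and collected differs.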
I saw that. I agree. It works better than I thought.
I think this use case will be covered by But what if the user wants to use other languages like Rust, for which a convenient language binding doesn't exist (as far as I know)? Then perhaps passing information around through stdin and stdout would be easier than asking the user to figure out how to run an embedded R. I am not really sure, because I don't really know how dynamically linked libraries work.
Yeah, but that requires a client-server model, which complicates things quite a bit: you have to manage a TCP port or Unix socket, and a session on top of it with HTTP, WebSocket, or whatnot.
I don't think my implementation would be applicable to anyone else. We have a lot of desktop computers at school that I can SSH into, so I built a hacky way of SSHing into a machine, building a Docker container, running irace, and then syncing the files back with rsync.
I see.
Developing iracepy, handling all the data conversion and all sorts of edge cases through rpy2, has proven to be far too painful. So I am suggesting an alternative: we define a communication interface through stdin/stdout, as Stockfish does. Instead of irace calling the target runner, it prints its "command" to stdout, such as "targetRunner 1 113 734718556 /home/user/instances/tsp/2000-533.tsp". In addition, we can add the parameters "trainingInstancesText", "parameterText", "forbiddenText", "configurationText", "trainInstancesText", and "testInstancesText", which accept the contents that would normally be in the corresponding files. Users can shell-escape special characters such as spaces and newlines, or they can simply construct the argv list from a programming language. Users can enable this mode by setting targetRunner or targetEvaluator to "stdout://" (defining a protocol, like "https://", makes it very unlikely to clash with the name of someone's target-runner file). This is easy to implement: we just need to make the targetRunner print to stdout instead of calling a shell command, and read from stdin instead of reading the output of the program. It would also be easy to understand for users who are familiar with irace, easy for them to port their code to, and it can run anywhere with a shell, unlike rpy2, which requires some fiddly dynamically linked libraries / shared objects.
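A minimal sketch of how a client could recognize the proposed `stdout://` protocol and parse one printed command with `shlex.split`, which undoes shell escaping. The field order (configuration ID, instance ID, seed, instance, then any remaining tokens as parameters) is an assumption based on the example command above; optional fields such as a bound are not handled, and `TargetRunnerCall`, `uses_stdout_protocol`, and `parse_call` are illustrative names.

```python
from __future__ import annotations

import shlex
from dataclasses import dataclass, field

STDOUT_PROTOCOL = "stdout://"


@dataclass
class TargetRunnerCall:
    configuration_id: str
    instance_id: str
    seed: int
    instance: str
    parameters: list[str] = field(default_factory=list)


def uses_stdout_protocol(target_runner: str) -> bool:
    """True if the targetRunner setting selects the proposed stdout protocol."""
    return target_runner.startswith(STDOUT_PROTOCOL)


def parse_call(line: str) -> TargetRunnerCall:
    """Parse 'targetRunner <config> <instance_id> <seed> <instance> [params...]'."""
    tokens = shlex.split(line)  # handles quotes and backslash escapes
    if not tokens or tokens[0] != "targetRunner":
        raise ValueError(f"not a targetRunner command: {line!r}")
    config_id, instance_id, seed, instance, *params = tokens[1:]
    return TargetRunnerCall(config_id, instance_id, int(seed), instance, params)
```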
You could add an alternative to https://github.com/MLopez-Ibanez/irace/blob/master/R/race-wrapper.R#L481 that prints to stdout instead of calling the target-runner. This alternative could be chosen automatically in checkScenario (https://github.com/MLopez-Ibanez/irace/blob/master/R/readConfiguration.R#L407) if the targetRunner is "stdout://", as you suggest. If you are going to have a text-based interface, it would be better to print in JSON format.
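With JSON, each call could be a single JSON object per line (JSON Lines), so spaces, quotes, and newlines in values need no shell escaping. The field names below (`run_id`, `configuration_id`, `cost`, and so on) are hypothetical; irace defines no such format.

```python
from __future__ import annotations

import json


def encode_request(run_id: int, configuration_id: int, instance_id: int,
                   seed: int, instance: str, parameters: dict) -> str:
    """One request per line; json.dumps escapes newlines, so a message never spans lines."""
    message = {
        "run_id": run_id,
        "configuration_id": configuration_id,
        "instance_id": instance_id,
        "seed": seed,
        "instance": instance,
        "parameters": parameters,
    }
    return json.dumps(message) + "\n"


def decode_result(line: str) -> tuple[int, float]:
    """Parse a reply such as {"run_id": 3, "cost": 12.5}."""
    message = json.loads(line)
    if "run_id" not in message or "cost" not in message:
        raise ValueError(f"malformed result: {line!r}")
    return message["run_id"], float(message["cost"])
```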
It seems this idea isn't going anywhere and won't be implemented. Closing it as not planned.
I think it would be more flexible for some advanced workflows if irace could pass all of its information to and from command-line arguments, stdin, and stderr without using any files or spawning any processes. This is useful when irace is used as an intermediate step in an automated workflow and when creating interfaces with other programming languages.
In particular, creating files as parameters to pass into irace is not flexible if there are multiple simultaneous runs of irace on one machine with different configurations, because each parameter file needs a unique name, which can clash if not chosen carefully, and the files leak disk space if they are not cleaned up after use. If everything could be passed in from the command line, there would be no clashes or leaked resources. An easy option is to pass the content that would be in parameters.txt and scenario.txt as strings in the command-line arguments. Escaping special characters (such as spaces, quotes, and newlines) shouldn't be much of an issue, because many programming languages can pass a list of arguments (instead of a single string of space-separated arguments) and receive arguments as a list.
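The claim that a list of arguments avoids escaping problems can be checked directly: the child process below receives parameters.txt-style content, with its quotes, spaces, and newlines, unchanged. The `--parameter-text` option name is hypothetical; the child is just a Python one-liner that echoes its arguments as JSON, standing in for irace.

```python
import json
import subprocess
import sys

parameters_txt = 'algorithm "--algorithm " c (as, mmas)\nalpha "--alpha " r (0.0, 5.0)\n'

# No shell is involved: each list element becomes exactly one argv entry.
argv = [
    sys.executable, "-c", "import json, sys; print(json.dumps(sys.argv[1:]))",
    "--parameter-text", parameters_txt,  # hypothetical option name
]
completed = subprocess.run(argv, capture_output=True, text=True, check=True)
received = json.loads(completed.stdout)
print(received == ["--parameter-text", parameters_txt])  # True
```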
Furthermore, it would be good if irace could pass the arguments it uses to call the target runner back through stdout instead of spawning a new process itself. This would give the user more control over how the target runner is run; for example, they could more easily implement a custom load distributor for a very custom (or hacky) cluster. It would also make it easier to create interfaces with other programming languages, because the wrapper just needs to parse the arguments given by irace into a native data structure and do whatever the user wants next (e.g. calling a function passed in as a parameter).
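A sketch of such a wrapper, assuming the positional fields come first (configuration ID, instance ID, seed, instance), followed by parameter switches of the form `--name value` or `--name=value`. `parse_parameters` and `call_user_function` are illustrative names, not the iracepy API, and guessing int/float/str from the text is a simplification; a real wrapper would use the types declared in parameters.txt.

```python
from __future__ import annotations

from typing import Any, Callable


def _convert(value: str) -> Any:
    """Guess a native type: int, then float, else keep the string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_parameters(tokens: list[str]) -> dict[str, Any]:
    """Turn ['--alpha', '0.5', '--ants=20'] into {'alpha': 0.5, 'ants': 20}."""
    params: dict[str, Any] = {}
    it = iter(tokens)
    for token in it:
        if not token.startswith("--"):
            raise ValueError(f"expected a switch, got {token!r}")
        name, sep, value = token[2:].partition("=")
        if not sep:  # '--name value' form: the value is the next token
            value = next(it, None)
            if value is None:
                raise ValueError(f"missing value for {token!r}")
        params[name] = _convert(value)
    return params


def call_user_function(args: list[str], func: Callable[..., float]) -> float:
    """Split irace's positional fields from the parameters and call a Python function."""
    _config_id, _instance_id, seed, instance, *rest = args
    return func(instance=instance, seed=int(seed), **parse_parameters(rest))
```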