How to use model archiver utility for complex projects? #566

Closed
sagjounkani opened this issue Jul 29, 2020 · 9 comments

Comments

@sagjounkani

I am trying to serve the model from the NCRFpp project using TorchServe. The files required by my custom handler.py are spread across multiple folders, and their import statements rely on this folder hierarchy. The model archiver zips all these files and extracts them into a single temporary folder, flattening the hierarchy, so the import statements fail at runtime. How do I make the imports work under TorchServe with the same folder hierarchy I used during development?

@harshbafna
Contributor

@sagjounkani: You could create a zip file of the dependency Python files in the required folder hierarchy and supply it with the --extra-files parameter when creating the .mar. Then, while initializing the handler, you could extract this zip file into the model's temporary directory, which is already on the PYTHONPATH.

You can refer to the waveglow text-to-speech-synthesizer example.
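
A minimal sketch of the zip-and-extract approach, assuming the dependencies were packed into a hypothetical `deps.zip` and passed via `--extra-files`. The names `extract_dependencies` and `ZipAwareHandler` are illustrative and not part of TorchServe; reading the directory from `context.system_properties["model_dir"]` is how handlers usually locate the extracted archive, but check this against your TorchServe version.

```python
import os
import zipfile


def extract_dependencies(model_dir, zip_name="deps.zip"):
    """Unpack zip_name inside model_dir, keeping its folder hierarchy.

    `deps.zip` is a hypothetical file name; use whatever was passed
    to --extra-files when the .mar was built.
    """
    zip_path = os.path.join(model_dir, zip_name)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(model_dir)
    return model_dir


class ZipAwareHandler:
    """Illustrative handler skeleton, not a TorchServe class."""

    def __init__(self):
        self.initialized = False

    def initialize(self, context):
        # Assumption: model_dir is the temporary directory where the
        # .mar was extracted, which is already on the Python path.
        model_dir = context.system_properties["model_dir"]
        extract_dependencies(model_dir)
        self.initialized = True
```

Because the hierarchy inside the zip is restored under the model directory, package-style imports such as `from pkg.sub import mod` should resolve once `initialize` has run.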

@harshbafna harshbafna self-assigned this Jul 29, 2020
@harshbafna harshbafna added the triaged_wait Waiting for the Reporter's resp label Jul 29, 2020
@misrasaurabh1

Yep, I encountered this problem as well. Even though the files specified by --extra-files may belong to a directory structure, TorchServe copies them all into a single top-level directory while serving. This complicates things for applications, and it also creates a real risk of filename collisions. I would like the directory structure of the files passed to --extra-files to be preserved.
Also, if the current behavior is what is expected, it should be made clear in the documentation.

@sagjounkani
Author

@harshbafna Thank you for the resolution, I was able to deploy the model. I found it easier to append the path for dependencies in my custom handler file compared to the process followed in the waveglow text-to-speech-synthesizer example. Not sure of a standard way to approach this. Agree with @misrasaurabh1 that if somehow the directory structure is preserved within the --extra-files it will make things easier.

@harshbafna
Contributor

harshbafna commented Jul 30, 2020

@sagjounkani: Your solution may work when TorchServe is deployed on your localhost. However, it will fail if you need to register the model on a remote host, where the extra files will not be available on the server's file system.

There are multiple ways you can add your dependency Python files to the model archive:

  • Zip your directory structure as required and unzip it in the handler (as explained earlier).
  • You could also create an egg file for all the dependency Python packages and add it to the model archive (.mar).
  • If your model depends on a third-party Python package, you could also supply a requirements.txt file in the .mar that lists all the required Python modules. Refer to the documentation for more details. Note that this feature is not available in the current GA release but is available in the latest master. It will be part of the upcoming release.
  • In case the project is not available on PyPI, you can manually create the build for that project, supply the generated .tar.gz or .zip file in the model archive, and include it in the requirements.txt.
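
For the egg option, one possible sketch is to put the archive itself on `sys.path`, since Python can import pure-Python modules directly from a zip or egg file. The helper name `add_archive_to_path` and the file name in the usage note are hypothetical, and this assumes the egg was shipped via `--extra-files` and therefore sits in the model directory.

```python
import os
import sys


def add_archive_to_path(model_dir, archive_name):
    """Put a zip or egg archive from model_dir on sys.path.

    Python's zipimport machinery can import pure-Python modules
    directly from such an archive, so no extraction step is needed.
    """
    archive_path = os.path.join(model_dir, archive_name)
    if not os.path.isfile(archive_path):
        raise FileNotFoundError(archive_path)
    if archive_path not in sys.path:
        sys.path.insert(0, archive_path)
    return archive_path
```

A handler could call `add_archive_to_path(model_dir, "shared.egg")` before importing from the packaged code. Compiled extension modules cannot be imported this way, so this only fits pure-Python dependencies.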

@misrasaurabh1:

Also, if the current behavior is what is expected it should be made clear in the documentation.

We will take this up as a part of #561 . Also, if you think the above-provided options are too complicated please feel free to raise a feature request.

@hatzel

hatzel commented Nov 9, 2021

While I think this is certainly an area where TorchServe should support a simpler approach, let me share my workaround with you.

The implementation here just recursively copies any sub directories (of the included directories) over. You can actually trick this into doing what you want by creating a temporary directory into which you symlink the directories you originally wanted to include.

TEMP_DIR=$(mktemp -d)
ln -s "$(pwd)/dir_a" "$TEMP_DIR"
ln -s "$(pwd)/dir_b" "$TEMP_DIR"

# call torch-model-archiver with `--extra-files "$TEMP_DIR"` here

rm -rf "$TEMP_DIR"

The resulting archive will include the top level directories named dir_a and dir_b.
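
The same staging step can be sketched in Python for build scripts that already drive the archiver from Python. The function name `stage_extra_dirs` is hypothetical, and the sketch assumes a platform where creating symlinks is permitted.

```python
import os
import tempfile


def stage_extra_dirs(directories):
    """Create a temp dir holding one symlink per directory to include."""
    staging_dir = tempfile.mkdtemp()
    for directory in directories:
        source = os.path.abspath(directory)
        link = os.path.join(staging_dir, os.path.basename(source))
        os.symlink(source, link, target_is_directory=True)
    return staging_dir
```

The returned path would be passed to `--extra-files`; afterwards `shutil.rmtree(staging_dir)` removes the links without touching the original directories.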

@mhashas

mhashas commented Jan 11, 2023

@harshbafna

How does it work exactly with the egg file / wheel file? You still need to sys.path.append("egg_file_location") right? And you can only get the location from the context model_dir + the name you gave it, correct?

My use case is as follows. Let's assume I implement my own basehandler in the shared module.
Now I work on my_project, and create a my_project_handler. However, I want my baseclass to be BaseHandler from shared.
But shared does not exist in the torchserve/mar environment until my handler either

  • unzips the code
  • adds the egg file location

Is that correct?
Is there any solution for this?

@harshbafna
Contributor

@mhashas :

The model's temporary directory, where the model archive (.mar) is extracted, is already added to the PYTHONPATH.

In your case, you can add the zip file to your model archive using the --extra-files flag and then add a step in your custom handler to extract the zip file into the model's temporary directory.

You can refer to the waveglow text-to-speech-synthesizer example.

@mhashas

mhashas commented Jan 11, 2023

@harshbafna
Yes, and a drawback of your approach is that all imports need to be done locally in functions, after the unzipping is done: https://github.com/pytorch/serve/blob/master/examples/text_to_speech_synthesizer/waveglow_handler.py#L40. If you move this import to the top level, it will fail, because the source files don't exist yet. For my use case, I cannot use the shared handler as a base class for my handler, because it only exists after the handler has been initialized and the zip extracted.

I think in the end @hatzel's response is the easiest one to implement and does solve my problem.
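
The import-ordering constraint can be illustrated with a small sketch. The helper `import_after_extract` and its arguments are hypothetical; the point is only that the import has to run after extraction, which rules out a top-level import in the handler file.

```python
import importlib
import os
import sys
import zipfile


def import_after_extract(model_dir, zip_name, module_name):
    """Extract zip_name into model_dir, then import module_name.

    The import must happen after extraction; the same import at the
    top of the handler file would run before the files exist.
    """
    with zipfile.ZipFile(os.path.join(model_dir, zip_name)) as archive:
        archive.extractall(model_dir)
    if model_dir not in sys.path:
        # TorchServe is said to do this already; kept so the sketch
        # also runs outside TorchServe.
        sys.path.insert(0, model_dir)
    importlib.invalidate_caches()
    return importlib.import_module(module_name)
```

Any class defined in the extracted package is only reachable through the returned module, so it cannot appear in a `class ...(BaseHandler)` statement that is evaluated when the handler file is first loaded.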

@harshbafna
Contributor

harshbafna commented Jan 12, 2023

@mhashas
Yes, that is one way to work around the problem.

Another way is to package the wheel file and a custom requirements.txt file in your model archive. TorchServe will automatically install the package.
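
A rough sketch of how such an archive might be assembled, written as a helper that only builds the command line. The helper name and all file names in the usage note are hypothetical; the flags are standard `torch-model-archiver` options. As far as I know, TorchServe installs per-model requirements only when `install_py_dep_per_model=true` is set in config.properties, so treat that as an assumption to verify.

```python
def build_archiver_command(model_name, handler, wheel, requirements,
                           serialized_file=None, version="1.0"):
    """Assemble a torch-model-archiver call that ships a wheel.

    The wheel travels via --extra-files and requirements.txt is
    expected to list the wheel's file name.
    """
    command = [
        "torch-model-archiver",
        "--model-name", model_name,
        "--version", version,
        "--handler", handler,
        "--extra-files", wheel,
        "--requirements-file", requirements,
    ]
    if serialized_file is not None:
        command += ["--serialized-file", serialized_file]
    return command
```

For example, `build_archiver_command("my_project", "my_project_handler.py", "shared-0.1-py3-none-any.whl", "requirements.txt")` returns a list that can be handed to `subprocess.run`.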
