MLModelScope PyTorch Agent

This is the PyTorch agent for MLModelScope, an open-source, framework- and hardware-agnostic, extensible, and customizable platform for evaluating and profiling ML models across datasets, frameworks, and systems, and within AI application pipelines.

Currently it has most of the models from the PyTorch Model Zoo built in, plus many models acquired from public repositories. Although the agent supports several modalities, including Object Detection and Image Enhancement, most of the built-in models are for Image Classification. More built-in models are coming. You can evaluate the ~50 built-in models on any system of interest with either a local PyTorch installation or the PyTorch Docker images.

Check out MLModelScope; contributions are welcome.

Bare Minimum Installation

Prerequisite System Library Installation

We first discuss a bare minimum pytorch-agent installation without the tracing and profiling capabilities. For this to work, the following system libraries must be preinstalled on your system.

  • The CUDA library (required)
  • The CUPTI library (required)
  • The PyTorch C++ (libtorch) library (required)
  • The libjpeg-turbo library (optional, but preferred)

The CUDA Library

Please refer to the NVIDIA CUDA library installation guide. Find the location of your local CUDA installation, which is typically /usr/local/cuda/, and set up the path to the libcublas.so library. Place the following in either your ~/.bashrc or ~/.zshrc file:

export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/cuda/lib64
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64

The CUPTI Library

Please refer to the NVIDIA CUPTI library installation guide. Find the location of your local CUPTI installation, which is typically /usr/local/cuda/extras/CUPTI, and set up the path to the libcupti.so library. Place the following in either your ~/.bashrc or ~/.zshrc file:

export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64
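
The exports above append unconditionally, so sourcing the file twice duplicates entries, and an initially empty variable ends up with a leading colon. The following is a minimal sketch of a helper that avoids both. The function name append_path is hypothetical (it is not part of MLModelScope), and the directories are the typical locations mentioned above, which may differ on your system.

```shell
# append_path VAR DIR: append DIR to the colon-separated variable VAR,
# skipping duplicates and avoiding a leading colon when VAR is empty.
# (append_path is a hypothetical helper, not part of MLModelScope.)
append_path() {
  local var_name dir current
  var_name=$1
  dir=$2
  eval "current=\${$var_name:-}"      # read the current value of VAR
  case ":$current:" in
    *":$dir:"*) ;;                    # DIR is already listed: nothing to do
    *) export "$var_name=${current:+$current:}$dir" ;;
  esac
}

# Typical install locations; adjust to your system.
for d in /usr/local/cuda/lib64 /usr/local/cuda/extras/CUPTI/lib64; do
  append_path LIBRARY_PATH "$d"
  append_path LD_LIBRARY_PATH "$d"
done
```

Calling the helper repeatedly with the same directory leaves the variable unchanged, so the snippet is safe to source more than once.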

The PyTorch C++ (libtorch) Library

The PyTorch C++ library is required for our PyTorch Go package.

You can download the pre-built PyTorch C++ (libtorch) library from PyTorch. Choose PyTorch Build = Stable (1.8.1), Your OS = <fill>, Package = LibTorch, Language = C++ and CUDA = <fill>. Download the Pre-cxx11 ABI or the cxx11 ABI version, depending on your local gcc/g++ version.

Extract the downloaded archive to /opt/libtorch/.

tar -C /opt/libtorch -xzf (downloaded file)

Configure the linker environment variables, since the PyTorch C++ library is extracted to a non-system directory. Place the following in either your ~/.bashrc or ~/.zshrc file:

Linux

export LIBRARY_PATH=$LIBRARY_PATH:/opt/libtorch/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/libtorch/lib

macOS

export LIBRARY_PATH=$LIBRARY_PATH:/opt/libtorch/lib
export DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:/opt/libtorch/lib

You can test the installed PyTorch C++ library using an example C++ program.
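
Before compiling a test program, a quick look at the extracted tree can catch a wrong extraction path. The sketch below assumes the usual libtorch layout (headers under include/torch and the shared library under lib); check_libtorch is a hypothetical helper name, not something shipped with the agent.

```shell
# check_libtorch DIR: report whether DIR looks like an extracted libtorch tree.
# ASSUMPTION: the usual layout with include/torch and lib/libtorch.{so,dylib}.
check_libtorch() {
  local root
  root=${1:-/opt/libtorch}
  if [ ! -d "$root/include/torch" ]; then
    echo "missing headers: $root/include/torch" >&2
    return 1
  fi
  # Linux builds ship libtorch.so; macOS builds ship libtorch.dylib.
  if [ ! -f "$root/lib/libtorch.so" ] && [ ! -f "$root/lib/libtorch.dylib" ]; then
    echo "missing library under $root/lib" >&2
    return 1
  fi
  echo "libtorch found in $root"
}

check_libtorch /opt/libtorch || echo "libtorch check failed; verify the extraction path"
```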

Use libjpeg-turbo for Image Preprocessing

libjpeg-turbo is a JPEG image codec that uses SIMD instructions (MMX, SSE2, AVX2, NEON, AltiVec) to accelerate baseline JPEG compression and decompression. It outperforms libjpeg by a significant amount.

You need libjpeg installed.

sudo apt-get install libjpeg-dev

libjpeg-turbo is used by default. To opt out, use the build tag nolibjpeg.

To install libjpeg-turbo, refer to libjpeg-turbo.

Linux

  export TURBO_VER=2.0.2
  cd /tmp
  wget https://cfhcable.dl.sourceforge.net/project/libjpeg-turbo/${TURBO_VER}/libjpeg-turbo-official_${TURBO_VER}_amd64.deb
  sudo dpkg -i libjpeg-turbo-official_${TURBO_VER}_amd64.deb

macOS

brew install jpeg-turbo

Installation of Go for Compilation

Since we use Go for MLModelScope development, Go must be installed on your system before proceeding.

Please follow Installing Go Compiler to install Go.

Bare Minimum Pytorch-agent Installation

Download the MLModelScope PyTorch Agent by running the following command in any location, assuming you have installed Go following the above instructions.

git clone https://github.com/c3sr/pytorch.git

The cgo interface passes Go pointers to the C API, which the cgo runtime check reports as an error. You can disable this check by placing

export GODEBUG=cgocheck=0

in your ~/.bashrc or ~/.zshrc file and then running either source ~/.bashrc or source ~/.zshrc.

Build the PyTorch agent with GPU enabled:

cd pytorch/pytorch-agent
go build

Build the PyTorch agent without GPU or libjpeg-turbo:

cd pytorch/pytorch-agent
go build -tags=nogpu,nolibjpeg

If everything is successful, you should have an executable pytorch-agent binary in the current directory.
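
The choice between the two build commands above can be sketched as a small script. The nogpu and nolibjpeg tags come from the commands above; the detection heuristics (nvidia-smi on the PATH, and the /opt/libjpeg-turbo directory used by the official Linux package) and the build_tags helper name are assumptions for illustration. Using each tag on its own is also an assumption, since only the combined form appears above. The script only prints the suggested command so you can review it before running it.

```shell
# build_tags HAS_GPU HAS_TURBO: print the -tags value for `go build`.
# Arguments are "yes" or "no"; prints an empty string when no tag is needed.
# (build_tags is a hypothetical helper, not part of the agent.)
build_tags() {
  local t
  t=""
  [ "$1" = "yes" ] || t="nogpu"
  [ "$2" = "yes" ] || t="${t:+$t,}nolibjpeg"
  echo "$t"
}

# Detection heuristics (assumptions; adjust for your system).
has_gpu=no
if command -v nvidia-smi >/dev/null 2>&1; then has_gpu=yes; fi
has_turbo=no
if [ -d /opt/libjpeg-turbo ]; then has_turbo=yes; fi

tags=$(build_tags "$has_gpu" "$has_turbo")
echo "suggested command: go build${tags:+ -tags=$tags}"
```

On macOS with a Homebrew install, the /opt/libjpeg-turbo check will not match, so set has_turbo by hand.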

Configuration Setup

To run the agent, you need to set up the correct configuration file. Some of the settings may not make sense for every testing scenario, but they are required and will be needed for later-stage testing. Some of the port numbers specified below can be changed depending on how you later set up those services.

So let's just set them up as is, and worry about the detailed configuration parameter values later.

You must have a carml config file. The following configuration can be placed in $HOME/.carml_config.yml or in another location specified via the --config="path" option.

app:
  name: carml
  debug: true
  verbose: true
  tempdir: ~/data/mlmodelscope
registry:
  provider: consul
  endpoints:
    - localhost:8500
  timeout: 20s
  serializer: jsonpb
database:
  provider: mongodb
  endpoints:
    - localhost
tracer:
  enabled: true
  provider: jaeger
  endpoints:
    - localhost:9411
  level: FULL_TRACE
logger:
  hooks:
    - syslog
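
A quick way to catch a missing or truncated config file is to check for the top-level sections shown in the example above. This is only a sketch: check_carml_config is a hypothetical helper, it does not validate the values, and it relies on the top-level keys starting at column 0 as in the example.

```shell
# check_carml_config FILE: verify that FILE exists and contains the
# top-level sections used in the example configuration.
# (check_carml_config is a hypothetical helper, not part of the agent.)
check_carml_config() {
  local file rc section
  file=$1
  if [ ! -f "$file" ]; then
    echo "config not found: $file" >&2
    return 1
  fi
  rc=0
  for section in app registry database tracer logger; do
    # A top-level YAML key starts at column 0 and ends with a colon.
    if ! grep -q "^${section}:" "$file"; then
      echo "missing section: $section" >&2
      rc=1
    fi
  done
  return $rc
}

check_carml_config "$HOME/.carml_config.yml" || echo "fix the config before running the agent"
```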

Test Installation

With the configuration and the above bare minimum installation, you should be ready to test the installation and see how things work.

To see the help message:

./pytorch-agent -h

To see a list of models that can be run with this agent:

./pytorch-agent info models

To run inference using the default DNN model, AlexNet, with a default input image:

./pytorch-agent predict urls --model_name TorchVision_Alexnet

External Service Installation to Enable Tracing and Profiling

We now discuss how to install a few external services that make the agent fully useful in terms of collecting tracing and profiling data.

External Services

MLModelScope relies on a few external services. These services provide tracing, registry, and database servers.

These services can be installed and enabled in different ways. Below we show how to do this with Docker. You can also skip Docker and install the services directly from either binaries or source code.

Installing Docker

Refer to Install Docker.

On Ubuntu, an easy way is using

curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker $USER

On macOS, install Docker Desktop.

Starting Trace Server

This service is required.

  • On x86 (e.g. intel) machines, start jaeger by
docker run -d -e COLLECTOR_ZIPKIN_HTTP_PORT=9411 -p5775:5775/udp -p6831:6831/udp -p6832:6832/udp \
  -p5778:5778 -p16686:16686 -p14268:14268 -p9411:9411 jaegertracing/all-in-one:latest
  • On ppc64le (e.g. minsky) machines, start jaeger by
docker run -d -e COLLECTOR_ZIPKIN_HTTP_PORT=9411 -p5775:5775/udp -p6831:6831/udp -p6832:6832/udp \
  -p5778:5778 -p16686:16686 -p14268:14268 -p9411:9411 carml/jaeger:ppc64le-latest

The trace server runs on http://localhost:16686

Starting Registry Server

This service is not required if you use pytorch-agent only for local evaluation.

  • On x86 (e.g. intel) machines, start consul by
docker run -p 8500:8500 -p 8600:8600 -d consul
  • On ppc64le (e.g. minsky) machines, start consul by
docker run -p 8500:8500 -p 8600:8600 -d carml/consul:ppc64le-latest

The registry server runs on http://localhost:8500

Starting Database Server

This service is not required unless you use a database to publish evaluation results.

  • On x86 (e.g. intel) machines, start mongodb by
docker run -p 27017:27017 --restart always -d mongo:3.0

You can also mount the database volume to a local directory using

docker run -p 27017:27017 --restart always -d -v $HOME/data/carml/mongo:/data/db mongo:3.0
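
After starting the containers, it can help to confirm that each service is accepting connections on the host port published by the docker commands above. The sketch below uses bash's /dev/tcp redirection, so it assumes bash rather than a plain POSIX shell; service_port and port_open are hypothetical helper names. A successful connection only shows that something is listening, not that the service is healthy.

```shell
#!/usr/bin/env bash
# service_port NAME: print the host port published for NAME in the
# docker commands above (jaeger UI, consul, mongodb).
service_port() {
  case "$1" in
    jaeger)  echo 16686 ;;
    consul)  echo 8500 ;;
    mongodb) echo 27017 ;;
    *)       return 1 ;;
  esac
}

# port_open HOST PORT: succeed if a TCP connection can be opened.
# Relies on bash's /dev/tcp pseudo-device (not available in plain sh).
port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

for svc in jaeger consul mongodb; do
  port=$(service_port "$svc")
  if port_open localhost "$port"; then
    echo "$svc: listening on localhost:$port"
  else
    echo "$svc: not reachable on localhost:$port"
  fi
done
```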

Use the Agent through Command Line

Run ./pytorch-agent -h to list the available commands.

Run ./pytorch-agent info models to list the available models.

Run ./pytorch-agent predict to evaluate a model. This runs the default evaluation. ./pytorch-agent predict -h shows the available flags you can set.

An example run is

./pytorch-agent predict general --model_name TorchVision_Alexnet --inputs_file_path $INPUT_FILE_PATH

$INPUT_FILE_PATH is the path to a file that lists the files to run inference on.
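
One way to produce such a list file is to collect the images under a directory. This is a sketch under a stated assumption: the one-path-per-line format is a guess, not something this README specifies, so confirm the expected format with ./pytorch-agent predict -h. The make_input_list helper and the example paths are hypothetical.

```shell
# make_input_list DIR OUT: write the image files found under DIR to OUT,
# one path per line, in sorted order.
# ASSUMPTION: the agent accepts one path per line; this format is not
# confirmed by the README.
make_input_list() {
  find "$1" -type f \( -name '*.jpg' -o -name '*.jpeg' -o -name '*.png' \) | sort > "$2"
}

# Example usage (paths are placeholders):
#   make_input_list "$HOME/data/images" "$HOME/data/inputs.txt"
#   export INPUT_FILE_PATH="$HOME/data/inputs.txt"
```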

Use the Agent through Pre-built Docker Images

We have pre-built Docker images on Docker Hub.

An example run is

docker run --gpus=all --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --privileged=true \
    --network host \
    -v ~/data:/data \
    -v ~/.carml_config.yml:/root/.carml_config.yml \
    -v ~/results:/c3sr/pytorch/results \
    c3sr/pytorch-agent:amd64-cpu-pytorch1.8.1-latest predict general --model_name TorchVision_Alexnet \
    --inputs_file_path $INPUT_FILE_PATH

$INPUT_FILE_PATH is the path to a file that lists the files to run inference on.

NOTE: Docker's default shared memory (SHMEM) allocation limit is 64MB, which may be insufficient for PyTorch. NVIDIA recommends the following flags: --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...
