
Example: DeepSpeed deferred init with opt-30b #2419

Merged: 20 commits, Jun 24, 2023
Conversation

agunapal (Collaborator)

Description

This PR shows how to use DeepSpeed with deferred model loading for a large model such as opt-30b: the model is instantiated without materializing its weights, which are then loaded shard by shard when the inference engine is initialized.

Fixes #(issue)

Type of change

  • New feature (non-breaking change which adds functionality)

Feature/Issue validation/testing


  • Loading the opt-30b model (TorchServe log, 4-way tensor parallel on one node):
2023-06-21T23:14:55,795 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager -  Loading snapshot serializer plugin...
2023-06-21T23:14:55,810 [INFO ] main org.pytorch.serve.ModelServer - Loading initial models: opt.tar.gz
2023-06-21T23:14:55,836 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model opt
2023-06-21T23:14:55,837 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model opt
2023-06-21T23:14:55,837 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model opt loaded.
2023-06-21T23:14:55,837 [INFO ] main org.pytorch.serve.wlm.ModelManager - model opt set minWorkers: 1, maxWorkers: 1 for parallelLevel: 4 
2023-06-21T23:14:55,838 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: opt, count: 1
2023-06-21T23:14:55,845 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2023-06-21T23:14:55,845 [DEBUG] W-29500-opt_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [torchrun, --nnodes, 1, --nproc-per-node, 4, --log-dir, /home/ubuntu/serve/examples/large_models/deepspeed/opt/logs/torchelastic_ts, --rdzv-backend, c10d, --rdzv-id, opt_29500, --max-restarts, 1, /home/ubuntu/anaconda3/envs/torchserve/lib/python3.10/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /tmp/.ts.sock.29500, --metrics-config, /home/ubuntu/anaconda3/envs/torchserve/lib/python3.10/site-packages/ts/configs/metrics.yaml]
2023-06-21T23:14:55,896 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://127.0.0.1:8080
2023-06-21T23:14:55,896 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: EpollServerSocketChannel.
2023-06-21T23:14:55,897 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://127.0.0.1:8081
2023-06-21T23:14:55,897 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2023-06-21T23:14:55,898 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://127.0.0.1:8082
Model server started.
2023-06-21T23:14:56,058 [WARN ] pool-3-thread-1 org.pytorch.serve.metrics.MetricCollector - worker pid is not available yet.
2023-06-21T23:14:57,337 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.
2023-06-21T23:14:57,338 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
2023-06-21T23:14:57,339 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG -   entrypoint       : /home/ubuntu/anaconda3/envs/torchserve/lib/python3.10/site-packages/ts/model_service_worker.py
2023-06-21T23:14:57,339 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG -   min_nodes        : 1
2023-06-21T23:14:57,339 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG -   max_nodes        : 1
2023-06-21T23:14:57,340 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG -   nproc_per_node   : 4
2023-06-21T23:14:57,340 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG -   run_id           : opt_29500
2023-06-21T23:14:57,340 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG -   rdzv_backend     : c10d
2023-06-21T23:14:57,340 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG -   rdzv_endpoint    : 
2023-06-21T23:14:57,340 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG -   rdzv_configs     : {'timeout': 900}
2023-06-21T23:14:57,341 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG -   max_restarts     : 1
2023-06-21T23:14:57,341 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG -   monitor_interval : 5
2023-06-21T23:14:57,341 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG -   log_dir          : /home/ubuntu/serve/examples/large_models/deepspeed/opt/logs/torchelastic_ts
2023-06-21T23:14:57,341 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG -   metrics_cfg      : {}
2023-06-21T23:14:57,342 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - 
2023-06-21T23:14:57,342 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /home/ubuntu/serve/examples/large_models/deepspeed/opt/logs/torchelastic_ts/opt_29500_6o3mkelj
2023-06-21T23:14:57,342 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python
2023-06-21T23:14:57,342 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
2023-06-21T23:14:57,504 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
2023-06-21T23:14:57,505 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG -   restart_count=0
2023-06-21T23:14:57,505 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG -   master_addr=ip-172-31-2-198.us-west-2.compute.internal
2023-06-21T23:14:57,505 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG -   master_port=56255
2023-06-21T23:14:57,505 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG -   group_rank=0
2023-06-21T23:14:57,506 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG -   group_world_size=1
2023-06-21T23:14:57,506 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG -   local_ranks=[0, 1, 2, 3]
2023-06-21T23:14:57,506 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG -   role_ranks=[0, 1, 2, 3]
2023-06-21T23:14:57,507 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG -   global_ranks=[0, 1, 2, 3]
2023-06-21T23:14:57,507 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG -   role_world_sizes=[4, 4, 4, 4]
2023-06-21T23:14:57,507 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG -   global_world_sizes=[4, 4, 4, 4]
2023-06-21T23:14:57,507 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - 
2023-06-21T23:14:57,508 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
2023-06-21T23:14:57,508 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - INFO:torch.distributed.elastic.agent.server.local_elastic_agent:Environment variable 'TORCHELASTIC_ENABLE_FILE_TIMER' not found. Do not start FileTimerServer.
2023-06-21T23:14:57,508 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /home/ubuntu/serve/examples/large_models/deepspeed/opt/logs/torchelastic_ts/opt_29500_6o3mkelj/attempt_0/0/error.json
2023-06-21T23:14:57,508 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /home/ubuntu/serve/examples/large_models/deepspeed/opt/logs/torchelastic_ts/opt_29500_6o3mkelj/attempt_0/1/error.json
2023-06-21T23:14:57,508 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /home/ubuntu/serve/examples/large_models/deepspeed/opt/logs/torchelastic_ts/opt_29500_6o3mkelj/attempt_0/2/error.json
2023-06-21T23:14:57,508 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - INFO:torch.distributed.elastic.multiprocessing:Setting worker3 reply file to: /home/ubuntu/serve/examples/large_models/deepspeed/opt/logs/torchelastic_ts/opt_29500_6o3mkelj/attempt_0/3/error.json
2023-06-21T23:14:57,658 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:ip-172-31-2-198,timestamp:1687389297
2023-06-21T23:14:57,658 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:798.8775291442871|#Level:Host|#hostname:ip-172-31-2-198,timestamp:1687389297
2023-06-21T23:14:57,659 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:170.09454727172852|#Level:Host|#hostname:ip-172-31-2-198,timestamp:1687389297
2023-06-21T23:14:57,659 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:17.6|#Level:Host|#hostname:ip-172-31-2-198,timestamp:1687389297
2023-06-21T23:14:57,659 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,DeviceId:0|#hostname:ip-172-31-2-198,timestamp:1687389297
2023-06-21T23:14:57,659 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0.0|#Level:Host,DeviceId:0|#hostname:ip-172-31-2-198,timestamp:1687389297
2023-06-21T23:14:57,660 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,DeviceId:1|#hostname:ip-172-31-2-198,timestamp:1687389297
2023-06-21T23:14:57,660 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0.0|#Level:Host,DeviceId:1|#hostname:ip-172-31-2-198,timestamp:1687389297
2023-06-21T23:14:57,660 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,DeviceId:2|#hostname:ip-172-31-2-198,timestamp:1687389297
2023-06-21T23:14:57,660 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0.0|#Level:Host,DeviceId:2|#hostname:ip-172-31-2-198,timestamp:1687389297
2023-06-21T23:14:57,660 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,DeviceId:3|#hostname:ip-172-31-2-198,timestamp:1687389297
2023-06-21T23:14:57,661 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0.0|#Level:Host,DeviceId:3|#hostname:ip-172-31-2-198,timestamp:1687389297
2023-06-21T23:14:57,661 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:0|#hostname:ip-172-31-2-198,timestamp:1687389297
2023-06-21T23:14:57,661 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:1|#hostname:ip-172-31-2-198,timestamp:1687389297
2023-06-21T23:14:57,661 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:2|#hostname:ip-172-31-2-198,timestamp:1687389297
2023-06-21T23:14:57,662 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:3|#hostname:ip-172-31-2-198,timestamp:1687389297
2023-06-21T23:14:57,662 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:186218.84765625|#Level:Host|#hostname:ip-172-31-2-198,timestamp:1687389297
2023-06-21T23:14:57,662 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:3257.27734375|#Level:Host|#hostname:ip-172-31-2-198,timestamp:1687389297
2023-06-21T23:14:57,662 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:2.6|#Level:Host|#hostname:ip-172-31-2-198,timestamp:1687389297
2023-06-21T23:14:58,655 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - s_name_part0=/tmp/.ts.sock, s_name_part1=29500, pid=89391
2023-06-21T23:14:58,656 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.29503
2023-06-21T23:14:58,664 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Successfully loaded /home/ubuntu/anaconda3/envs/torchserve/lib/python3.10/site-packages/ts/configs/metrics.yaml.
2023-06-21T23:14:58,664 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - [PID]89391
2023-06-21T23:14:58,664 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Torch worker started.
2023-06-21T23:14:58,665 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Python runtime: 3.10.0
2023-06-21T23:14:58,672 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - s_name_part0=/tmp/.ts.sock, s_name_part1=29500, pid=89389
2023-06-21T23:14:58,672 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.29501
2023-06-21T23:14:58,674 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - s_name_part0=/tmp/.ts.sock, s_name_part1=29500, pid=89390
2023-06-21T23:14:58,675 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.29502
2023-06-21T23:14:58,681 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - s_name_part0=/tmp/.ts.sock, s_name_part1=29500, pid=89388
2023-06-21T23:14:58,681 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.29500
2023-06-21T23:14:58,681 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Successfully loaded /home/ubuntu/anaconda3/envs/torchserve/lib/python3.10/site-packages/ts/configs/metrics.yaml.
2023-06-21T23:14:58,681 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - [PID]89389
2023-06-21T23:14:58,682 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Torch worker started.
2023-06-21T23:14:58,682 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Python runtime: 3.10.0
2023-06-21T23:14:58,684 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Successfully loaded /home/ubuntu/anaconda3/envs/torchserve/lib/python3.10/site-packages/ts/configs/metrics.yaml.
2023-06-21T23:14:58,684 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - [PID]89390
2023-06-21T23:14:58,685 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Torch worker started.
2023-06-21T23:14:58,685 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Python runtime: 3.10.0
2023-06-21T23:14:58,690 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Successfully loaded /home/ubuntu/anaconda3/envs/torchserve/lib/python3.10/site-packages/ts/configs/metrics.yaml.
2023-06-21T23:14:58,690 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - [PID]89388
2023-06-21T23:14:58,690 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Torch worker started.
2023-06-21T23:14:58,691 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Python runtime: 3.10.0
2023-06-21T23:14:58,691 [DEBUG] W-29500-opt_1.0 org.pytorch.serve.wlm.WorkerThread - W-29500-opt_1.0 State change null -> WORKER_STARTED
2023-06-21T23:14:58,694 [INFO ] W-29500-opt_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /tmp/.ts.sock.29500
2023-06-21T23:14:58,709 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.29500.
2023-06-21T23:14:58,710 [INFO ] W-29500-opt_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /tmp/.ts.sock.29501
2023-06-21T23:14:58,711 [INFO ] W-29500-opt_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /tmp/.ts.sock.29502
2023-06-21T23:14:58,712 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.29501.
2023-06-21T23:14:58,712 [INFO ] W-29500-opt_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /tmp/.ts.sock.29503
2023-06-21T23:14:58,715 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.29502.
2023-06-21T23:14:58,716 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.29503.
2023-06-21T23:14:58,718 [INFO ] W-29500-opt_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD to backend at: 1687389298717
2023-06-21T23:14:58,740 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - model_name: opt, batchSize: 1
2023-06-21T23:14:58,753 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - model_name: opt, batchSize: 1
2023-06-21T23:14:58,755 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - [2023-06-21 23:14:58,755] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2023-06-21T23:14:58,766 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - model_name: opt, batchSize: 1
2023-06-21T23:14:58,766 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - [2023-06-21 23:14:58,766] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2023-06-21T23:14:58,778 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - [2023-06-21 23:14:58,778] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2023-06-21T23:14:58,779 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - model_name: opt, batchSize: 1
2023-06-21T23:14:58,791 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - [2023-06-21 23:14:58,791] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2023-06-21T23:14:59,639 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Enabled tensor cores
2023-06-21T23:14:59,639 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2023-06-21T23:14:59,639 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Transformers version 4.30.1
2023-06-21T23:14:59,642 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Model opt loading tokenizer
2023-06-21T23:14:59,644 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Enabled tensor cores
2023-06-21T23:14:59,644 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2023-06-21T23:14:59,644 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Transformers version 4.30.1
2023-06-21T23:14:59,647 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Model opt loading tokenizer
2023-06-21T23:14:59,649 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Enabled tensor cores
2023-06-21T23:14:59,649 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2023-06-21T23:14:59,649 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Transformers version 4.30.1
2023-06-21T23:14:59,652 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Model opt loading tokenizer
2023-06-21T23:14:59,654 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Enabled tensor cores
2023-06-21T23:14:59,654 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2023-06-21T23:14:59,655 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Transformers version 4.30.1
2023-06-21T23:14:59,657 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Model opt loading tokenizer
2023-06-21T23:15:00,273 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Creating deepspeed checkpoint file /tmp/models/a3fab812bd8448e986cc511106eb174b/checkpoints.json
2023-06-21T23:15:00,273 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - [2023-06-21 23:15:00,273] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.9.4, git-hash=unknown, git-branch=unknown
2023-06-21T23:15:00,274 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - [2023-06-21 23:15:00,274] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
2023-06-21T23:15:00,279 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - [2023-06-21 23:15:00,279] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
2023-06-21T23:15:00,279 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - [2023-06-21 23:15:00,279] [INFO] [comm.py:594:init_distributed] cdb=None
2023-06-21T23:15:00,288 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Creating deepspeed checkpoint file /tmp/models/a3fab812bd8448e986cc511106eb174b/checkpoints.json
2023-06-21T23:15:00,288 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - [2023-06-21 23:15:00,288] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.9.4, git-hash=unknown, git-branch=unknown
2023-06-21T23:15:00,289 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - [2023-06-21 23:15:00,289] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
2023-06-21T23:15:00,294 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - [2023-06-21 23:15:00,294] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
2023-06-21T23:15:00,294 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - [2023-06-21 23:15:00,294] [INFO] [comm.py:594:init_distributed] cdb=None
2023-06-21T23:15:00,305 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Creating deepspeed checkpoint file /tmp/models/a3fab812bd8448e986cc511106eb174b/checkpoints.json
2023-06-21T23:15:00,305 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - [2023-06-21 23:15:00,305] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.9.4, git-hash=unknown, git-branch=unknown
2023-06-21T23:15:00,305 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - [2023-06-21 23:15:00,305] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
2023-06-21T23:15:00,310 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Creating deepspeed checkpoint file /tmp/models/a3fab812bd8448e986cc511106eb174b/checkpoints.json
2023-06-21T23:15:00,310 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - [2023-06-21 23:15:00,310] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.9.4, git-hash=unknown, git-branch=unknown
2023-06-21T23:15:00,310 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - [2023-06-21 23:15:00,310] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
2023-06-21T23:15:00,311 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - [2023-06-21 23:15:00,310] [INFO] [comm.py:594:init_distributed] cdb=None
2023-06-21T23:15:00,311 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - [2023-06-21 23:15:00,310] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
2023-06-21T23:15:00,311 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - [2023-06-21 23:15:00,311] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
2023-06-21T23:15:00,316 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - [2023-06-21 23:15:00,316] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
2023-06-21T23:15:00,316 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - [2023-06-21 23:15:00,316] [INFO] [comm.py:594:init_distributed] cdb=None
2023-06-21T23:15:00,318 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Added key: store_based_barrier_key:1 to store for rank: 1
2023-06-21T23:15:01,287 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Added key: store_based_barrier_key:1 to store for rank: 2
2023-06-21T23:15:01,298 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Added key: store_based_barrier_key:1 to store for rank: 3
2023-06-21T23:15:01,303 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Added key: store_based_barrier_key:1 to store for rank: 0
2023-06-21T23:15:01,303 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
2023-06-21T23:15:01,304 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Added key: store_based_barrier_key:2 to store for rank: 0
2023-06-21T23:15:01,304 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
2023-06-21T23:15:01,305 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Added key: store_based_barrier_key:2 to store for rank: 1
2023-06-21T23:15:01,308 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
2023-06-21T23:15:01,308 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
2023-06-21T23:15:01,308 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Added key: store_based_barrier_key:2 to store for rank: 2
2023-06-21T23:15:01,309 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Added key: store_based_barrier_key:2 to store for rank: 3
2023-06-21T23:15:01,309 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Rank 3: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes.
2023-06-21T23:15:01,314 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes.
2023-06-21T23:15:01,315 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes.
2023-06-21T23:15:01,319 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Rank 2: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes.
2023-06-21T23:15:01,829 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Using /home/ubuntu/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
2023-06-21T23:15:01,843 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Using /home/ubuntu/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
2023-06-21T23:15:01,846 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Using /home/ubuntu/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
2023-06-21T23:15:01,851 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Using /home/ubuntu/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
2023-06-21T23:15:01,861 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Detected CUDA files, patching ldflags
2023-06-21T23:15:01,861 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Emitting ninja build file /home/ubuntu/.cache/torch_extensions/py310_cu117/transformer_inference/build.ninja...
2023-06-21T23:15:01,861 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Building extension module transformer_inference...
2023-06-21T23:15:01,861 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
2023-06-21T23:15:01,888 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - ninja: no work to do.
2023-06-21T23:15:01,891 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading extension module transformer_inference...
2023-06-21T23:15:01,892 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Time to load transformer_inference op: 0.06557369232177734 seconds
2023-06-21T23:15:01,893 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - [2023-06-21 23:15:01,892] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed-Inference config: {'layer_id': 0, 'hidden_size': 7168, 'intermediate_size': 28672, 'heads': 56, 'num_hidden_layers': -1, 'dtype': torch.float16, 'pre_layer_norm': True, 'norm_type': <NormType.LayerNorm: 1>, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 4, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 1, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'mlp_act_func_type': <ActivationFuncType.ReLU: 2>, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False, 'max_out_tokens': 1024, 'min_out_tokens': 1, 'scale_attn_by_inverse_layer_idx': False, 'enable_qkv_quantization': False, 'use_mup': False, 'return_single_tuple': False, 'set_empty_params': False, 'transposed_mode': False}
2023-06-21T23:15:01,943 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading extension module transformer_inference...
2023-06-21T23:15:01,944 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Time to load transformer_inference op: 0.10344600677490234 seconds
2023-06-21T23:15:01,946 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading extension module transformer_inference...
2023-06-21T23:15:01,948 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Time to load transformer_inference op: 0.10380434989929199 seconds
2023-06-21T23:15:01,951 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading extension module transformer_inference...
2023-06-21T23:15:01,952 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Time to load transformer_inference op: 0.10337448120117188 seconds
2023-06-21T23:15:02,151 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Using /home/ubuntu/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
2023-06-21T23:15:02,151 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - No modifications detected for re-loaded extension module transformer_inference, skipping build step...
2023-06-21T23:15:02,151 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading extension module transformer_inference...
2023-06-21T23:15:02,152 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Time to load transformer_inference op: 0.0022699832916259766 seconds
2023-06-21T23:15:02,222 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Using /home/ubuntu/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
2023-06-21T23:15:02,223 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - No modifications detected for re-loaded extension module transformer_inference, skipping build step...
2023-06-21T23:15:02,223 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading extension module transformer_inference...
2023-06-21T23:15:02,223 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Time to load transformer_inference op: 0.002281665802001953 seconds
2023-06-21T23:15:02,231 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Using /home/ubuntu/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
2023-06-21T23:15:02,231 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - No modifications detected for re-loaded extension module transformer_inference, skipping build step...
2023-06-21T23:15:02,232 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Time to load transformer_inference op: 0.001988649368286133 seconds
2023-06-21T23:15:02,232 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading extension module transformer_inference...
2023-06-21T23:15:02,232 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Using /home/ubuntu/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
2023-06-21T23:15:02,232 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - No modifications detected for re-loaded extension module transformer_inference, skipping build step...
2023-06-21T23:15:02,232 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading extension module transformer_inference...
2023-06-21T23:15:02,232 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Time to load transformer_inference op: 0.002056598663330078 seconds
2023-06-21T23:15:02,333 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - 
2023-06-21T23:15:02,399 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards:   0%|          | 0/7 [00:00<?, ?it/s]
2023-06-21T23:15:02,404 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards:   0%|          | 0/7 [00:00<?, ?it/s]
2023-06-21T23:15:02,408 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards:   0%|          | 0/7 [00:00<?, ?it/s]
2023-06-21T23:15:20,619 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards:   0%|          | 0/7 [00:00<?, ?it/s]
2023-06-21T23:15:20,628 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards:  14%|█▍        | 1/7 [00:18<01:49, 18.22s/it]
2023-06-21T23:15:20,634 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards:  14%|█▍        | 1/7 [00:18<01:49, 18.29s/it]
2023-06-21T23:15:20,637 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards:  14%|█▍        | 1/7 [00:18<01:49, 18.23s/it]
2023-06-21T23:15:38,736 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards:  14%|█▍        | 1/7 [00:18<01:49, 18.23s/it]
2023-06-21T23:15:38,741 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards:  29%|██▊       | 2/7 [00:36<01:30, 18.16s/it]
2023-06-21T23:15:38,749 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards:  29%|██▊       | 2/7 [00:36<01:30, 18.19s/it]
2023-06-21T23:15:38,751 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards:  29%|██▊       | 2/7 [00:36<01:30, 18.16s/it]
2023-06-21T23:15:56,545 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards:  29%|██▊       | 2/7 [00:36<01:30, 18.16s/it]
2023-06-21T23:15:56,555 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards:  43%|████▎     | 3/7 [00:54<01:11, 18.00s/it]
2023-06-21T23:15:56,559 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards:  43%|████▎     | 3/7 [00:54<01:12, 18.02s/it]
2023-06-21T23:15:56,562 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards:  43%|████▎     | 3/7 [00:54<01:12, 18.00s/it]
2023-06-21T23:15:57,123 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:ip-172-31-2-198,timestamp:1687389357
2023-06-21T23:15:57,123 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:798.8774299621582|#Level:Host|#hostname:ip-172-31-2-198,timestamp:1687389357
2023-06-21T23:15:57,124 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:170.09464645385742|#Level:Host|#hostname:ip-172-31-2-198,timestamp:1687389357
2023-06-21T23:15:57,124 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:17.6|#Level:Host|#hostname:ip-172-31-2-198,timestamp:1687389357
2023-06-21T23:15:57,124 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:65.8719819350356|#Level:Host,DeviceId:0|#hostname:ip-172-31-2-198,timestamp:1687389357
2023-06-21T23:15:57,124 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:15169.0|#Level:Host,DeviceId:0|#hostname:ip-172-31-2-198,timestamp:1687389357
2023-06-21T23:15:57,124 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:65.8719819350356|#Level:Host,DeviceId:1|#hostname:ip-172-31-2-198,timestamp:1687389357
2023-06-21T23:15:57,124 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:15169.0|#Level:Host,DeviceId:1|#hostname:ip-172-31-2-198,timestamp:1687389357
2023-06-21T23:15:57,124 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:65.8719819350356|#Level:Host,DeviceId:2|#hostname:ip-172-31-2-198,timestamp:1687389357
2023-06-21T23:15:57,125 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:15169.0|#Level:Host,DeviceId:2|#hostname:ip-172-31-2-198,timestamp:1687389357
2023-06-21T23:15:57,125 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:65.8719819350356|#Level:Host,DeviceId:3|#hostname:ip-172-31-2-198,timestamp:1687389357
2023-06-21T23:15:57,125 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:15169.0|#Level:Host,DeviceId:3|#hostname:ip-172-31-2-198,timestamp:1687389357
2023-06-21T23:15:57,125 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:0|#hostname:ip-172-31-2-198,timestamp:1687389357
2023-06-21T23:15:57,125 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:1|#hostname:ip-172-31-2-198,timestamp:1687389357
2023-06-21T23:15:57,125 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:2|#hostname:ip-172-31-2-198,timestamp:1687389357
2023-06-21T23:15:57,125 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:3|#hostname:ip-172-31-2-198,timestamp:1687389357
2023-06-21T23:15:57,125 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:154253.171875|#Level:Host|#hostname:ip-172-31-2-198,timestamp:1687389357
2023-06-21T23:15:57,125 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:35181.5859375|#Level:Host|#hostname:ip-172-31-2-198,timestamp:1687389357
2023-06-21T23:15:57,126 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:19.3|#Level:Host|#hostname:ip-172-31-2-198,timestamp:1687389357
2023-06-21T23:15:58,526 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards:  43%|████▎     | 3/7 [00:54<01:12, 18.00s/it]
2023-06-21T23:15:58,545 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards:  57%|█████▋    | 4/7 [00:56<00:35, 11.67s/it]
2023-06-21T23:15:58,552 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards:  57%|█████▋    | 4/7 [00:56<00:35, 11.68s/it]
2023-06-21T23:15:58,567 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards:  57%|█████▋    | 4/7 [00:56<00:35, 11.69s/it]
2023-06-21T23:16:15,063 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards:  57%|█████▋    | 4/7 [00:56<00:35, 11.70s/it]
2023-06-21T23:16:15,072 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards:  71%|███████▏  | 5/7 [01:12<00:26, 13.43s/it]
2023-06-21T23:16:15,075 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards:  71%|███████▏  | 5/7 [01:12<00:26, 13.43s/it]
2023-06-21T23:16:15,080 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards:  71%|███████▏  | 5/7 [01:12<00:26, 13.43s/it]
2023-06-21T23:16:33,255 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards:  71%|███████▏  | 5/7 [01:12<00:26, 13.43s/it]
2023-06-21T23:16:33,263 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards:  86%|████████▌ | 6/7 [01:30<00:15, 15.05s/it]
2023-06-21T23:16:33,267 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards:  86%|████████▌ | 6/7 [01:30<00:15, 15.05s/it]
2023-06-21T23:16:33,270 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards:  86%|████████▌ | 6/7 [01:30<00:15, 15.05s/it]
2023-06-21T23:16:51,270 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - checkpoint loading time at rank 1: 108.87302947044373 sec
2023-06-21T23:16:51,270 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards:  86%|████████▌ | 6/7 [01:30<00:15, 15.05s/it]
2023-06-21T23:16:51,270 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards: 100%|██████████| 7/7 [01:48<00:00, 16.02s/it]
2023-06-21T23:16:51,271 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards: 100%|██████████| 7/7 [01:48<00:00, 15.55s/it]
2023-06-21T23:16:51,279 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - checkpoint loading time at rank 0: 108.94691133499146 sec
2023-06-21T23:16:51,279 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - 
2023-06-21T23:16:51,279 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards: 100%|██████████| 7/7 [01:48<00:00, 16.02s/it]
2023-06-21T23:16:51,279 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards: 100%|██████████| 7/7 [01:48<00:00, 15.56s/it]
2023-06-21T23:16:51,285 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - checkpoint loading time at rank 2: 108.88278388977051 sec
2023-06-21T23:16:51,285 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - 
2023-06-21T23:16:51,286 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards: 100%|██████████| 7/7 [01:48<00:00, 16.02s/it]
2023-06-21T23:16:51,286 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards: 100%|██████████| 7/7 [01:48<00:00, 15.55s/it]
2023-06-21T23:16:51,287 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - checkpoint loading time at rank 3: 108.88037157058716 sec
2023-06-21T23:16:51,287 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - 
2023-06-21T23:16:51,287 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards: 100%|██████████| 7/7 [01:48<00:00, 16.02s/it]
2023-06-21T23:16:51,288 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Loading 7 checkpoint shards: 100%|██████████| 7/7 [01:48<00:00, 15.55s/it]
2023-06-21T23:16:52,041 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Model opt loaded successfully
2023-06-21T23:16:52,041 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Model opt loaded successfully
2023-06-21T23:16:52,042 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Model opt loaded successfully
2023-06-21T23:16:52,042 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Model opt loaded successfully
2023-06-21T23:16:52,047 [DEBUG] W-29500-opt_1.0 org.pytorch.serve.wlm.WorkerThread - sent a reply, jobdone: true
2023-06-21T23:16:52,047 [INFO ] W-29500-opt_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 113269
2023-06-21T23:16:52,047 [DEBUG] W-29500-opt_1.0 org.pytorch.serve.wlm.WorkerThread - W-29500-opt_1.0 State change WORKER_STARTED -> WORKER_MODEL_LOADED
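The `parallelLevel: 4` and single worker seen in the startup logs are driven by the model's config file. A hypothetical `model_config.yaml` consistent with those logs might look like the following (key names follow TorchServe's large-model config; the exact values used in this PR live in the `opt` folder):

```yaml
# Sketch of a model_config.yaml for the opt-30b example.
# Values are inferred from the logs above, not copied from the PR.
minWorkers: 1
maxWorkers: 1
maxBatchDelay: 100
responseTimeout: 1200
parallelType: "tp"
deviceType: "gpu"
torchrun:
    nproc-per-node: 4
```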

  • Inference request
    Client
 curl -v "http://localhost:8080/predictions/opt" -T sample_text.txt
*   Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> PUT /predictions/opt HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.87.0
> Accept: */*
> Content-Length: 54
> Expect: 100-continue
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 
< x-request-id: 70c9a4ff-6c57-4d65-99bc-d42adf0f179c
< Pragma: no-cache
< Cache-Control: no-cache; no-store, must-revalidate, private
< Expires: Thu, 01 Jan 1970 00:00:00 UTC
< content-length: 59
< connection: keep-alive
< 
Today the weather is really nice and I am planning on
* Connection #0 to host localhost left intact
going

Server

2023-06-21T23:16:57,280 [INFO ] W-29500-opt_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd PREDICT to backend at: 1687389417280
2023-06-21T23:16:57,281 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Backend received inference at: 1687389417
2023-06-21T23:16:57,281 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Received text: 'Today the weather is really nice and I am planning on
2023-06-21T23:16:57,281 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - '
2023-06-21T23:16:57,282 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Backend received inference at: 1687389417
2023-06-21T23:16:57,282 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Received text: 'Today the weather is really nice and I am planning on
2023-06-21T23:16:57,282 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - '
2023-06-21T23:16:57,282 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Backend received inference at: 1687389417
2023-06-21T23:16:57,282 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Received text: 'Today the weather is really nice and I am planning on
2023-06-21T23:16:57,283 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - '
2023-06-21T23:16:57,283 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Backend received inference at: 1687389417
2023-06-21T23:16:57,283 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Received text: 'Today the weather is really nice and I am planning on
2023-06-21T23:16:57,283 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - '
2023-06-21T23:16:57,289 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Input length of input_ids is 13, but `max_length` is set to 10. This can lead to unexpected behavior. You should consider increasing `max_new_tokens`.
2023-06-21T23:16:57,289 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Input length of input_ids is 13, but `max_length` is set to 10. This can lead to unexpected behavior. You should consider increasing `max_new_tokens`.
2023-06-21T23:16:57,290 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Input length of input_ids is 13, but `max_length` is set to 10. This can lead to unexpected behavior. You should consider increasing `max_new_tokens`.
2023-06-21T23:16:57,290 [WARN ] W-29500-opt_1.0-stderr MODEL_LOG - Input length of input_ids is 13, but `max_length` is set to 10. This can lead to unexpected behavior. You should consider increasing `max_new_tokens`.
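The warnings above come from `generate()` falling back to `max_length`, which counts input plus output tokens; with a 13-token prompt and `max_length=10` there is no budget left for new tokens. A minimal sketch of that arithmetic (the helper name is hypothetical, not part of the handler):

```python
def remaining_new_tokens(input_len, max_length=20, max_new_tokens=None):
    """Tokens the model may still generate under HF-style length limits."""
    if max_new_tokens is not None:   # max_new_tokens counts output only
        return max_new_tokens
    return max_length - input_len    # max_length counts input + output

# The logged case: a 13-token prompt against max_length=10 leaves no room.
assert remaining_new_tokens(13, max_length=10) <= 0
# Setting max_new_tokens sidesteps the prompt length entirely.
assert remaining_new_tokens(13, max_new_tokens=50) == 50
```

This is why the warning suggests increasing `max_new_tokens` rather than `max_length`.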
2023-06-21T23:16:58,088 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - ------------------------------------------------------
2023-06-21T23:16:58,088 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Free memory : 6.170227 (GigaBytes)  
2023-06-21T23:16:58,088 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Total memory: 22.056641 (GigaBytes)  
2023-06-21T23:16:58,088 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Requested memory: 0.984375 (GigaBytes) 
2023-06-21T23:16:58,088 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Setting maximum total tokens (input + output) to 1024 
2023-06-21T23:16:58,088 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - WorkSpace: 0x7f59f8000000 
2023-06-21T23:16:58,088 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - ------------------------------------------------------
2023-06-21T23:16:58,256 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Generated text: ['Today the weather is really nice and I am planning on\ngoing']
2023-06-21T23:16:58,256 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Generated text: ['Today the weather is really nice and I am planning on\ngoing']
2023-06-21T23:16:58,256 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Generated text: ['Today the weather is really nice and I am planning on\ngoing']
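The repeated "Generated text" lines are expected with tensor parallelism: each torchrun rank computes the full output and logs it, while only a single copy is returned to the client. A toy illustration of that invariant (not the actual handler code):

```python
# Under tensor parallelism every rank decodes the same final string.
world_size = 4
per_rank_outputs = [
    "Today the weather is really nice and I am planning on\ngoing"
] * world_size

# All ranks agree on the output...
assert len(set(per_rank_outputs)) == 1
# ...and only one copy is sent back as the HTTP response.
response = per_rank_outputs[0]
assert response.startswith("Today the weather")
```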

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

@codecov

codecov bot commented Jun 22, 2023

Codecov Report

Merging #2419 (858bd83) into master (a77a150) will decrease coverage by 0.12%.
The diff coverage is 0.00%.

❗ Current head 858bd83 differs from pull request most recent head 1d967d9. Consider uploading reports for the commit 1d967d9 to get more accurate results

@@            Coverage Diff             @@
##           master    #2419      +/-   ##
==========================================
- Coverage   72.01%   71.89%   -0.12%     
==========================================
  Files          78       78              
  Lines        3648     3654       +6     
  Branches       58       58              
==========================================
  Hits         2627     2627              
- Misses       1017     1023       +6     
  Partials        4        4              
Impacted Files Coverage Δ
ts/handler_utils/distributed/deepspeed.py 0.00% <0.00%> (ø)


@HamidShojanazeri
Collaborator

HamidShojanazeri commented Jun 23, 2023

@ankithagunapal can you please move the handler, README, requirements.txt, and sample_text to the parent folder, out of opt? Let's keep only the model_config file in the opt folder. Those files seem to be general regardless of the model; we can always extend the README for other models if needed, otherwise it would be repetitive.

@agunapal agunapal merged commit 603e89f into master Jun 24, 2023
@GuoqiangJia

GuoqiangJia commented Jul 20, 2023

@agunapal, I noticed there are multiple 'Generated text' lines in the serve output; I think it's because you used multiple GPUs. However, is it correct that each GPU generates one output?
