
endpoint for embeddings #814

Open

pseudotensor opened this issue Sep 7, 2023 · 5 comments

pseudotensor (Collaborator) commented Sep 7, 2023

From the Hugging Face write-up on scaling with gunicorn (https://medium.com/huggingface/scaling-a-massive-state-of-the-art-deep-learning-model-in-production-8277c5652d5f):

> We used [falcon](https://falconframework.org/) for the web servers (any other HTTP framework would have worked too) in conjunction with [gunicorn](https://gunicorn.org/) to run our instances and balance the load. Our own [GPT-2 Pytorch implementation](https://github.com/huggingface/pytorch-pretrained-BERT) is the backbone of this project. We have a few examples in our [examples directory](https://github.com/huggingface/pytorch-pretrained-BERT/tree/master/examples) if you're interested in doing something similar.
>
> Gunicorn sets up "workers" which independently run the application, efficiently balancing the load across the different workers. You can check exactly how they work in the [official gunicorn documentation](http://docs.gunicorn.org/en/stable/design.html).
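As a rough sketch of the worker model: gunicorn just forks N processes that each serve the same WSGI callable, so the app itself stays trivial. The endpoint below is a hypothetical placeholder (the route, response shape, and dummy vector are assumptions, not from any project here), shown only to illustrate what gunicorn's workers would run:

```python
import json

def app(environ, start_response):
    # Hypothetical embeddings endpoint: in a real server this is where the
    # model would be invoked; here we return a fixed-size dummy vector.
    body = json.dumps({"embedding": [0.0] * 4}).encode()
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]

# Load-balance across several independent worker processes, e.g.:
#   gunicorn --workers 4 --bind 0.0.0.0:8000 app:app
```

Each worker holds its own copy of the model, so worker count trades memory for throughput.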

HF-supported server: https://localai.io/features/embeddings/index.html

Others:
https://python.langchain.com/docs/integrations/text_embedding/xinference
https://python.langchain.com/docs/integrations/text_embedding/localai
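Servers like LocalAI expose an OpenAI-compatible `/v1/embeddings` route, so a client only needs to POST `{"model": ..., "input": ...}` and read back `data[i].embedding`. A minimal stdlib-only client sketch (the base URL and model name are placeholders; the exact response fields beyond `data[*].embedding` may vary by server):

```python
import json
import urllib.request

def build_request(base_url, model, texts):
    # Build a POST against the OpenAI-compatible embeddings route.
    payload = json.dumps({"model": model, "input": texts}).encode()
    return urllib.request.Request(
        url=f"{base_url}/v1/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def embed(base_url, model, texts):
    # Expected response shape: {"data": [{"embedding": [...], "index": 0}, ...]}
    with urllib.request.urlopen(build_request(base_url, model, texts)) as resp:
        body = json.load(resp)
    return [item["embedding"] for item in body["data"]]

# Usage (assumes a server is actually listening on localhost:8080):
#   vectors = embed("http://localhost:8080", "text-embedding-ada-002", ["hello"])
```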


Far0n commented Jan 9, 2024

@pseudotensor I checked https://github.com/ELS-RD/transformer-deploy#feature-extraction--dense-embeddings:

```shell
docker run -it --rm --gpus all \
  -v $PWD:/project ghcr.io/els-rd/transformer-deploy:0.6.0 \
  bash -c "cd /project && \
    pip3 install \".[GPU]\" -f https://download.pytorch.org/whl/cu116/torch_stable.html --extra-index-url https://pypi.ngc.nvidia.com --no-cache-dir && \
    convert_model -m \"sentence-transformers/msmarco-distilbert-cos-v5\" \
    --backend tensorrt onnx \
    --task embedding \
    --seq-len 16 128 128"
```

after that I'm getting:

```
[01/09/2024-13:11:01] [TRT] [E] 3: [builderConfig.cpp::validatePool::313] Error Code 3: API Usage Error (Parameter check failed at: optimizer/api/builderConfig.cpp::validatePool::313, condition: false. Setting DLA memory pool size on TensorRT build with DLA disabled.
)
[01/09/2024-13:11:01] [TRT] [W] onnx2trt_utils.cpp:369: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[01/09/2024-13:11:01] [TRT] [W] building engine. depending on model size this may take a while
[01/09/2024-13:11:02] [TRT] [E] 2: [optimizer.cpp::getFormatRequirements::2945] Error Code 2: Internal Error (Assertion !n->candidateRequirements.empty() failed. no supported formats)
[01/09/2024-13:11:02] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::636] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
Traceback (most recent call last):
  File "/usr/local/bin/convert_model", line 8, in <module>
    sys.exit(entrypoint())
  File "/usr/local/lib/python3.8/dist-packages/transformer_deploy/convert.py", line 494, in entrypoint
    main(commands=args)
  File "/usr/local/lib/python3.8/dist-packages/transformer_deploy/convert.py", line 311, in main
    engine: ICudaEngine = build_engine(
  File "/usr/local/lib/python3.8/dist-packages/transformer_deploy/backends/trt_utils.py", line 206, in build_engine
    engine: ICudaEngine = runtime.deserialize_cuda_engine(trt_engine)
TypeError: deserialize_cuda_engine(): incompatible function arguments. The following argument types are supported:
    1. (self: tensorrt.tensorrt.Runtime, serialized_engine: buffer) -> tensorrt.tensorrt.ICudaEngine

Invoked with: <tensorrt.tensorrt.Runtime object at 0x7f6c7de46170>, None
free(): invalid pointer
```

Overall, not a good first impression.
