Adding retry logic to the TritonInferenceStage to allow recovering from errors #1548

Merged Apr 3, 2024 (69 commits)

Commits
e3f78ef  convert triton inference client stage to be AsyncioRunnable -based. (cwharris, Feb 27, 2024)
f0edba3  error catching with no retry (cwharris, Feb 27, 2024)
56d9984  rm dead code (cwharris, Feb 27, 2024)
ff9acfa  utilize triton async infer (cwharris, Feb 27, 2024)
2d3eb52  triton inference client exponential backoff (cwharris, Feb 29, 2024)
21959d5  add triton inference client (cwharris, Mar 4, 2024)
be19bae  inference working again (cwharris, Mar 4, 2024)
c834f3c  update input/output mappings (cwharris, Mar 5, 2024)
7834573  remove extrenuous warnings (cwharris, Mar 5, 2024)
0f98804  add a test for triton inference c++ stage (cwharris, Mar 9, 2024)
258633c  remove unused params from InferenceClientStage (cwharris, Mar 9, 2024)
8fd3a4e  prepare for passing triton client factory (cwharris, Mar 9, 2024)
69b4b0f  use triton client interface (cwharris, Mar 10, 2024)
f8b6be8  failing because message is nullptr (cwharris, Mar 10, 2024)
e429589  first passing test with fake triton client (cwharris, Mar 10, 2024)
e0c4dbd  fake inference result (cwharris, Mar 10, 2024)
eca8020  fix output shape (cwharris, Mar 10, 2024)
8aed8b8  add names to exceptions (cwharris, Mar 10, 2024)
96ca47d  make fake triton client throw errors in all scenarios (cwharris, Mar 11, 2024)
c1d2d13  copyright header (cwharris, Mar 11, 2024)
ee53275  move triton inference implementation to .cpp (cwharris, Mar 11, 2024)
0eec287  rm unnecessary csv (cwharris, Mar 11, 2024)
bbbcc32  IInferenceClientSession (cwharris, Mar 12, 2024)
fb37bb9  fix_all (cwharris, Mar 12, 2024)
17dcabb  Merge branch 'branch-24.03' of github.com:nv-morpheus/Morpheus into t… (cwharris, Mar 12, 2024)
f24588f  prefer unique_ptr > shared_ptr (cwharris, Mar 12, 2024)
6f9c7ca  minor style changes (cwharris, Mar 12, 2024)
28ac932  fix input/output mappings (cwharris, Mar 16, 2024)
91ef1d4  fix inout mappings for python (cwharris, Mar 16, 2024)
e6466d5  remove RMMTensor::buffer (cwharris, Mar 18, 2024)
ddc0e3e  separate out triton inference logic from inference stage logic (cwharris, Mar 18, 2024)
a675c05  Merge branch 'branch-24.03' of github.com:nv-morpheus/Morpheus into t… (cwharris, Mar 18, 2024)
728ae44  include inference_client_stage.cpp in build (cwharris, Mar 18, 2024)
6ff49cb  address review feedback (cwharris, Mar 27, 2024)
76d3890  error if inout_mapping and input/output _mapping are specified together (cwharris, Mar 27, 2024)
0bf3c3e  address review feedback (cwharris, Mar 27, 2024)
bb0e963  address review feedback (cwharris, Mar 27, 2024)
dce24ff  address review feedback (cwharris, Mar 27, 2024)
de2d372  address review feedback (cwharris, Mar 27, 2024)
a1f01c7  address review feedback (cwharris, Mar 27, 2024)
6442cc3  address review feedback (cwharris, Mar 27, 2024)
656542b  rm debug statement (cwharris, Mar 27, 2024)
350b5f9  styles (cwharris, Mar 27, 2024)
901acb7  remove constructor test (cwharris, Mar 27, 2024)
063b47e  delete private member checks (cwharris, Mar 27, 2024)
34d6098  styles (cwharris, Mar 27, 2024)
d4b46aa  styles (cwharris, Mar 27, 2024)
4133cab  Merge branch 'branch-24.03' of github.com:nv-morpheus/Morpheus into t… (cwharris, Mar 27, 2024)
5e473b3  styles, again (cwharris, Mar 27, 2024)
f8e0942  styles (cwharris, Mar 27, 2024)
053ad8c  fix missing definitions (cwharris, Mar 27, 2024)
c542087  style (cwharris, Mar 27, 2024)
204ddea  style (cwharris, Mar 27, 2024)
d4c9846  avoid using shared_mutex for InferenceClientStage coroutines (cwharris, Apr 1, 2024)
f6f2182  Merge branch 'branch-24.03' of github.com:nv-morpheus/Morpheus into t… (cwharris, Apr 1, 2024)
49d6507  fix clang tidy errors (cwharris, Apr 1, 2024)
acfc2a1  fix clang-tidy errors (cwharris, Apr 1, 2024)
3fcd081  fix clang tidy errors (cwharris, Apr 1, 2024)
e8d2ca4  fix clang tidy errors (cwharris, Apr 1, 2024)
0079632  fix clang-tidy errors (cwharris, Apr 1, 2024)
7b8ab09  avoid race-condition when resetting inference client stage session (cwharris, Apr 1, 2024)
6c8bf75  styles (cwharris, Apr 1, 2024)
856c7fb  fix cli auto-registration error (cwharris, Apr 2, 2024)
5847127  fix triton inference stage auto registration errors (cwharris, Apr 2, 2024)
e3e38b5  fix styles (cwharris, Apr 2, 2024)
8cdc06f  fix styles (cwharris, Apr 2, 2024)
699eea2  fix tests (cwharris, Apr 2, 2024)
35c4daa  fix styles (cwharris, Apr 2, 2024)
1e63052  fix tests (cwharris, Apr 3, 2024)
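Several of the commits above (notably 2d3eb52, "triton inference client exponential backoff") introduce the retry behavior this PR is named for. Below is a minimal sketch of that pattern in plain C++. The infer_with_retries helper and its blocking sleeps are illustrative assumptions, not the PR's actual coroutine-based implementation; the default retry_max mirrors the m_retry_max = 10 member declared in the header further down.

#include <chrono>
#include <cstdint>
#include <exception>
#include <thread>

// Hypothetical helper illustrating exponential backoff. The real stage
// retries asynchronously via coroutines; blocking sleeps are used here
// only to keep the sketch self-contained.
template <typename Fn>
auto infer_with_retries(Fn&& infer_once,
                        int32_t retry_max = 10,  // mirrors m_retry_max in the header below
                        std::chrono::milliseconds base_delay = std::chrono::milliseconds{100})
    -> decltype(infer_once())
{
    for (int32_t attempt = 0;; ++attempt)
    {
        try
        {
            return infer_once();  // success: hand the result back immediately
        } catch (const std::exception&)
        {
            if (attempt >= retry_max)
            {
                throw;  // retries exhausted: surface the last error to the pipeline
            }
        }
        // Double the wait after each failure: 100 ms, 200 ms, 400 ms, ...
        std::this_thread::sleep_for(base_delay * (1 << attempt));
    }
}

A caller would wrap a single inference attempt, e.g. infer_with_retries([&] { return run_single_inference(); });, where run_single_inference is any hypothetical callable that throws on failure.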
.clang-tidy (4 changes: 3 additions & 1 deletion)

@@ -15,7 +15,9 @@ Checks: >

 WarningsAsErrors: >
   *,
-  -clang-diagnostic-unused-command-line-argument
+  -clang-diagnostic-unused-command-line-argument,
+  -Wno-ignored-optimization-argument,
+  -Qunused-arguments

 #WarningsAsErrors: '*'
 HeaderFilterRegex: '.*\/include\/morpheus\/.*'
examples/log_parsing/inference.py (10 changes: 9 additions & 1 deletion)

@@ -180,4 +180,12 @@ def _convert_one_response(output: MultiResponseMessage, inf: MultiInferenceNLPMessage

         return MultiResponseMessage.from_message(inf, memory=memory, offset=inf.offset, count=inf.mess_count)

     def _get_inference_worker(self, inf_queue: ProducerConsumerQueue) -> TritonInferenceLogParsing:
-        return TritonInferenceLogParsing(inf_queue=inf_queue, c=self._config, **self._kwargs)
+        return TritonInferenceLogParsing(inf_queue=inf_queue,
+                                         c=self._config,
+                                         server_url=self._server_url,
+                                         model_name=self._model_name,
+                                         force_convert_inputs=self._force_convert_inputs,
+                                         use_shared_memory=self._use_shared_memory,
+                                         input_mapping=self._input_mapping,
+                                         output_mapping=self._output_mapping,
+                                         needs_logits=self._needs_logits)
morpheus/_lib/cmake/libmorpheus.cmake (1 change: 1 addition & 0 deletions)

@@ -71,6 +71,7 @@ add_library(morpheus
   src/stages/file_source.cpp
   src/stages/filter_detection.cpp
   src/stages/http_server_source_stage.cpp
+  src/stages/inference_client_stage.cpp
   src/stages/kafka_source.cpp
   src/stages/preprocess_fil.cpp
   src/stages/preprocess_nlp.cpp
morpheus/_lib/include/morpheus/stages/inference_client_stage.hpp (new file: 168 additions)

@@ -0,0 +1,168 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: Apache-2.0
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#pragma once

#include "morpheus/export.h"
#include "morpheus/messages/multi_inference.hpp"
#include "morpheus/messages/multi_response.hpp"
#include "morpheus/types.hpp"

#include <mrc/coroutines/async_generator.hpp>
#include <mrc/coroutines/scheduler.hpp>
#include <mrc/coroutines/task.hpp>
#include <mrc/segment/builder.hpp>
#include <mrc/segment/object.hpp>
#include <pybind11/pybind11.h>
#include <pymrc/asyncio_runnable.hpp>

#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

namespace morpheus {

struct MORPHEUS_EXPORT TensorModelMapping
{
/**
* @brief The field name to/from the model used for mapping
*/
std::string model_field_name;

/**
* @brief The field name to/from the tensor used for mapping
*/
std::string tensor_field_name;
};

class MORPHEUS_EXPORT IInferenceClientSession
{
public:
virtual ~IInferenceClientSession() = default;
/**
@brief Gets the inference input mappings
*/
virtual std::vector<TensorModelMapping> get_input_mappings(std::vector<TensorModelMapping> input_map_overrides) = 0;

/**
@brief Gets the inference output mappings
*/
virtual std::vector<TensorModelMapping> get_output_mappings(
std::vector<TensorModelMapping> output_map_overrides) = 0;

/**
@brief Invokes a single tensor inference
*/
virtual mrc::coroutines::Task<TensorMap> infer(TensorMap&& inputs) = 0;
};

class MORPHEUS_EXPORT IInferenceClient
{
public:
virtual ~IInferenceClient() = default;
/**
@brief Creates an inference session.
*/
virtual std::unique_ptr<IInferenceClientSession> create_session() = 0;
};

/**
* @addtogroup stages
* @{
* @file
*/

/**
* @brief Perform inference against an inference server, such as Triton Inference Server.
* Requests are sent through the IInferenceClient provided at construction; failed requests are retried with
* exponential backoff up to a fixed number of attempts.
*/
class MORPHEUS_EXPORT InferenceClientStage
: public mrc::pymrc::AsyncioRunnable<std::shared_ptr<MultiInferenceMessage>, std::shared_ptr<MultiResponseMessage>>
{
public:
using sink_type_t = std::shared_ptr<MultiInferenceMessage>;
using source_type_t = std::shared_ptr<MultiResponseMessage>;

/**
* @brief Construct a new Inference Client Stage object
*
* @param client : Inference client instance.
* @param model_name : Name of the model that handles the inference requests sent to the inference server.
* @param needs_logits : Determines if logits are required.
* @param input_mapping : Mappings used to rename pipeline input tensors to match the model's input names. Use this
* if the Morpheus names do not match the model.
* @param output_mapping : Mappings used to rename the model's output names to the tensor names the pipeline
* expects. Use this if the model names do not match Morpheus.
*/
InferenceClientStage(std::unique_ptr<IInferenceClient>&& client,
std::string model_name,
bool needs_logits,
std::vector<TensorModelMapping> input_mapping,
std::vector<TensorModelMapping> output_mapping);

/**
Processes a single MultiInferenceMessage by running the constructor-provided inference client against its
tensors, and yields the result as a MultiResponseMessage.
*/
mrc::coroutines::AsyncGenerator<std::shared_ptr<MultiResponseMessage>> on_data(
std::shared_ptr<MultiInferenceMessage>&& data, std::shared_ptr<mrc::coroutines::Scheduler> on) override;

private:
std::string m_model_name;
std::shared_ptr<IInferenceClient> m_client;
std::shared_ptr<IInferenceClientSession> m_session;
bool m_needs_logits{true};
std::vector<TensorModelMapping> m_input_mapping;
std::vector<TensorModelMapping> m_output_mapping;
std::mutex m_session_mutex;

int32_t m_retry_max = 10;
};

/****** InferenceClientStageInterfaceProxy ******************/
/**
* @brief Interface proxy, used to insulate python bindings.
*/
struct MORPHEUS_EXPORT InferenceClientStageInterfaceProxy
{
/**
* @brief Create and initialize an InferenceClientStage, and return the result
*
* @param builder : Pipeline context object reference
* @param name : Name of a stage reference
* @param model_name : Name of the model that handles the inference requests sent to the Triton server.
* @param server_url : Triton server URL.
* @param needs_logits : Determines if logits are required.
* @param input_mapping : Dictionary used to map pipeline input names to Triton model input names. Use this if the
* Morpheus names do not match the model.
* @param output_mapping : Dictionary used to map Triton model output names to pipeline output names. Use this if
* the model names do not match Morpheus.
* @return std::shared_ptr<mrc::segment::Object<InferenceClientStage>>
*/
static std::shared_ptr<mrc::segment::Object<InferenceClientStage>> init(
mrc::segment::Builder& builder,
const std::string& name,
std::string model_name,
std::string server_url,
bool needs_logits,
std::map<std::string, std::string> input_mapping,
std::map<std::string, std::string> output_mapping);
};
/** @} */ // end of group

} // namespace morpheus
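
The fake-client commits above (e429589 "first passing test with fake triton client", 96ca47d "make fake triton client throw errors in all scenarios") suggest how these interfaces are exercised in tests. Below is a minimal sketch of such a test double against the header as declared; the FakeSession and FakeInferenceClient names, the input-echo behavior, and the injected failure count are illustrative assumptions rather than the PR's actual test code.

#include "morpheus/stages/inference_client_stage.hpp"

#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

namespace {

// Illustrative session: echoes its inputs back as outputs and fails the
// first N calls, so the stage's retry path can be exercised in a test.
class FakeSession : public morpheus::IInferenceClientSession
{
  public:
    explicit FakeSession(int fail_first_n = 0) : m_failures_remaining(fail_first_n) {}

    std::vector<morpheus::TensorModelMapping> get_input_mappings(
        std::vector<morpheus::TensorModelMapping> overrides) override
    {
        return overrides;  // no model metadata to consult, so pass the overrides through
    }

    std::vector<morpheus::TensorModelMapping> get_output_mappings(
        std::vector<morpheus::TensorModelMapping> overrides) override
    {
        return overrides;
    }

    mrc::coroutines::Task<morpheus::TensorMap> infer(morpheus::TensorMap&& inputs) override
    {
        if (m_failures_remaining-- > 0)
        {
            throw std::runtime_error("injected failure");  // forces the stage to retry
        }
        co_return std::move(inputs);  // "inference" result: the inputs, unchanged
    }

  private:
    int m_failures_remaining;
};

// Illustrative client: each session it creates fails twice before succeeding.
class FakeInferenceClient : public morpheus::IInferenceClient
{
  public:
    std::unique_ptr<morpheus::IInferenceClientSession> create_session() override
    {
        return std::make_unique<FakeSession>(/*fail_first_n=*/2);
    }
};

}  // namespace

With the constructor declared above, a test could then build the stage as morpheus::InferenceClientStage(std::make_unique<FakeInferenceClient>(), "fake-model", /*needs_logits=*/false, {}, {}).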