Add Health and Status endpoints to grpc spec #1387

Closed
ukclivecox opened this issue Jan 31, 2020 · 4 comments

@ukclivecox
Contributor

These are present in the Python wrapper API, but we need to consolidate the gRPC and REST APIs.

Also connected to #1362

@ukclivecox ukclivecox added the triage Needs to be triaged and prioritised accordingly label Jan 31, 2020
@ukclivecox ukclivecox added this to the 1.1 milestone Jan 31, 2020
@ukclivecox ukclivecox self-assigned this Jan 31, 2020
@ukclivecox ukclivecox removed the triage Needs to be triaged and prioritised accordingly label Jan 31, 2020
@ukclivecox ukclivecox modified the milestones: 1.1, 1.2 Jan 31, 2020
@axsaucedo
Contributor

axsaucedo commented Feb 16, 2020

I was thinking about this issue while working on #1447, as that would have the same problem (i.e. not having these basic endpoints), including other endpoints like /metadata, /metrics, and /ready.

Thinking it through further, I realised that even if the microservice sets up a gRPC or Kafka (or any other) server, the expectation would be that these basic endpoints are still consumable through REST (e.g. Prometheus won't be consuming /metrics through gRPC).

When trying to solve this, the first idea was to leverage the existing functionality in the Python microservice.py file, which can start two servers through the start_servers(...) function.

I implemented this, and it successfully ran the Kafka server together with a REST server (on port 8000) which exposed /metadata, /health/status, /seldon.json, and /health/ping. This was implemented first by adding a new get_rest_basic_endpoints(...) function in microservice.py as follows:

def get_rest_basic_endpoints(user_model):
    app = Flask(__name__, static_url_path="")
    CORS(app)

    _set_flask_app_configs(app)

    if hasattr(user_model, "model_error_handler"):
        logger.info("Registering the custom error handler...")
        app.register_blueprint(user_model.model_error_handler)

    @app.errorhandler(SeldonMicroserviceException)
    def handle_invalid_usage(error):
        response = jsonify(error.to_dict())
        logger.error("%s", error.to_dict())
        response.status_code = error.status_code
        return response

    @app.route("/seldon.json", methods=["GET"])
    def openAPI():
        return send_from_directory("", "openapi/seldon.json")

    @app.route("/health/ping", methods=["GET"])
    def HealthPing():
        """
        Lightweight endpoint to check the liveness of the REST endpoint
        """
        return "pong"

    @app.route("/health/status", methods=["GET"])
    def HealthStatus():
        logger.debug("REST Health Status Request")
        response = seldon_core.seldon_methods.health_status(user_model)
        logger.debug("REST Health Status Response: %s", response)
        return jsonify(response)

    @app.route("/metadata", methods=["GET"])
    def Metadata():
        logger.debug("REST Metadata Request")
        response = seldon_core.seldon_methods.metadata(user_model)
        logger.debug("REST Metadata Response: %s", response)
        return jsonify(response)

    return app

Then this new secondary server was started by adding a code block that assigns it to server2_func when the main server is not a REST server, as follows:

    if hasattr(user_object, "custom_service") and callable(
        getattr(user_object, "custom_service")
    ):
        server2_func = user_object.custom_service
    elif args.api_type != "REST":
        def rest_prediction_server():
            options = {
                "bind": "%s:%s" % ("0.0.0.0", 8000),
                "access_logfile": "-",
                "loglevel": "info",
                "timeout": 5000,
                "reload": "true",
                "workers": args.workers,
                "max_requests": args.max_requests,
                "max_requests_jitter": args.max_requests_jitter,
            }
            app = seldon_microservice.get_rest_basic_endpoints(user_object)
            StandaloneApplication(app, user_object, options=options).run()

        logger.info("REST gunicorn microservice running on port %i", 8000)
        server2_func = rest_prediction_server

    else:
        server2_func = None

This successfully starts the gRPC or Kafka server and concurrently starts the REST service.

The main problem with this approach is that we would now have two processes which, although they point to the same class, are no longer able to share the state of the object. This is a problem because:

  • Now that the executor doesn't collect metrics, each Python/Java microservice is responsible for exposing its custom metrics through the /metrics endpoint, so it needs to be able to modify state
  • Similarly, this applies to the /metadata endpoint, as well as the /ready endpoint

As a quick fix we could decide that the /metadata and /metrics endpoints are static; if the user wanted to add custom metadata or custom metrics, we could explore exposing that through the "request metadata". Now that the executor doesn't modify the request, we may need to think about how this could be implemented through the header data.
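To make the static quick-fix concrete, here is a stdlib-only sketch (not Seldon Core code; the handler and metric names are made up) of a /metrics endpoint that serves fixed values in the Prometheus text exposition format, with no shared mutable state needed:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Static payload in Prometheus text format; no per-request state involved.
STATIC_METRICS = 'model_info{version="1.0"} 1\n'

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            body = STATIC_METRICS.encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

# Usage (blocking):
# HTTPServer(("0.0.0.0", 8000), MetricsHandler).serve_forever()
```

Anything dynamic would then have to travel via the request/header path described above rather than via this endpoint.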

The other alternative is to think about how this data could be passed through in a different way, such as through multiprocessing.Array or even Redis (e.g. implementing a set_metrics() or set_metadata() function that modifies the data in persistent, thread-safe storage). However, it may be easier to just take the approach above initially.

What are your thoughts?

@axsaucedo
Contributor

As a small update, it seems that the multiprocessing thread-safe object approach may not be such a bad idea. I was able to implement metadata with a multiprocessing.Manager dictionary, which seems to work pretty well. The implementation required the changes outlined below.

First I had to modify the start_servers function in microservice.py to ensure I can pass the thread-safe dictionary to all the server functions, as follows:

def start_servers(target1: Callable, target2: Callable) -> None:
    """
    Start servers

    Parameters
    ----------
    target1
       Main flask process
    target2
       Auxiliary flask process

    """
    manager = mp.Manager()
    metadata_dict = manager.dict()
    p2 = mp.Process(target=target2, args=(metadata_dict,))
    p2.daemon = True
    p2.start()

    target1(metadata_dict)

    p2.join()

Now I just had to ensure that the server functions actually assign this thread-safe dictionary. This was done as follows.

For the basic_rest_prediction_server function:

    elif args.api_type != "REST":
        def basic_rest_prediction_server(metadata_dict):
            options = {
                "bind": "%s:%s" % ("0.0.0.0", 8000),
                "access_logfile": "-",
                "loglevel": "info",
                "timeout": 5000,
                "reload": "true",
                "workers": args.workers,
                "max_requests": args.max_requests,
                "max_requests_jitter": args.max_requests_jitter,
            }
            user_object.metadata_dict = metadata_dict
            app = seldon_microservice.get_rest_basic_endpoints(user_object)
            StandaloneApplication(app, user_object, options=options).run()

        logger.info("REST gunicorn microservice running on port %i", port)
        server2_func = basic_rest_prediction_server

And for the Kafka / GRPC servers:

    elif args.api_type == "GRPC":

        def grpc_prediction_server(metadata_dict):

            user_object.metadata_dict = metadata_dict

            if args.tracing:
                from grpc_opentracing import open_tracing_server_interceptor

                logger.info("Adding tracer")
                interceptor = open_tracing_server_interceptor(tracer)
            else:
                interceptor = None

            server = seldon_microservice.get_grpc_server(
                user_object, annotations=annotations, trace_interceptor=interceptor
            )

            try:
                user_object.load()
            except (NotImplementedError, AttributeError):
                pass

            server.add_insecure_port(f"{host}:{port}")

            server.start()

            logger.info("GRPC microservice Running on port %i", port)
            while True:
                time.sleep(1000)

        server1_func = grpc_prediction_server

    elif args.api_type == "KAFKA":

        def kafka_prediction_server(metadata_dict):
            if args.tracing:
                from grpc_opentracing import open_tracing_server_interceptor

                logger.info("Adding tracer")
                interceptor = open_tracing_server_interceptor(tracer)
            else:
                interceptor = None

            user_object.metadata_dict = metadata_dict

            kafka_worker = seldon_microservice.get_kafka_worker(
                user_object,
                log_level=args.log_level,
                tracing=interceptor,
                host=host,
                port=port,
            )

            try:
                user_object.load()
            except (NotImplementedError, AttributeError):
                pass

            kafka_worker.execute_from_commandline()

What this means is that the user no longer needs to implement a metadata() function; instead they only have to modify the metadata_dict object for the change to be reflected. An example would be as follows:

class StreamingModel:
    def __init__(self):
        print("INITIALIZING STREAMINGMODEL")

    def predict(self, data, names=[], meta=[]):
        print(f"Inside predict: data [{data}] names [{names}] meta [{meta}]")
        if "count" not in self.metadata_dict:
            self.metadata_dict["count"] = 0
        else:
            self.metadata_dict["count"] += 1
        return data

    # This function would not be necessary once we implement it in our side
    def metadata(self):
        return dict(self.metadata_dict)

@axsaucedo
Contributor

One important thing to note, however, is that this would require a similar approach in the executor, especially for the metadata endpoint (currently being explored through #1362): arguably, when an external system wants to query metadata about the model, it may want the metadata of the deployment/pod as a whole rather than that of a single container inside the pod. This is worth thinking about when implementing this approach.

@ukclivecox ukclivecox removed this from the 1.2 milestone Apr 23, 2020
@ukclivecox
Contributor Author

Will be available in the V2 dataplane gRPC spec.
