Investigate how session/context should be re-designed to work well for API use-cases #2182

Open
datajoely opened this issue Jan 9, 2023 Discussed in #2134 · 5 comments
Labels
Stage: Technical Design 🎨 (Ticket needs to undergo technical design before implementation)
Stage: User Research 🔬 (Ticket needs to undergo user research before implementation)

Comments

@datajoely
Contributor

Discussed in #2134

Originally posted by illia-shkroba December 16, 2022
Hello.

I'm trying to build a REST API with FastAPI that runs a Kedro pipeline under the hood, and I came up with this solution:

import pathlib
from typing import Any, Iterable

from fastapi import Depends, FastAPI
from kedro.framework.context import KedroContext
from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

app = FastAPI(
    title="FastAPI + Kedro",
    version="0.0.1",
    license_info={
        "name": "GNU GENERAL PUBLIC LICENSE",
        "url": "https://www.gnu.org/licenses/gpl-3.0.html",
    },
)


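def get_session() -> Iterable[KedroSession]:
    # Runs on every request: bootstrap the project found in the current
    # working directory, then open a session that is closed after the response.
    bootstrap_project(pathlib.Path.cwd())
    with KedroSession.create() as session:
        yield session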


def get_context(session: KedroSession = Depends(get_session)) -> Iterable[KedroContext]:
    yield session.load_context()


@app.get("/")
def index(
    session: KedroSession = Depends(get_session),
    context: KedroContext = Depends(get_context),
) -> dict[str, Any]:
    session.run("math")
    catalog = context.catalog
    return catalog.load("output")

session.run("math") runs a simple pipeline that calculates the variance of the input [1, 2, 3].

The solution seems to work as expected, but it takes nearly 2.1 seconds to finish a request:

time curl http://127.0.0.1:8000
# curl http://127.0.0.1:8000  0.00s user 0.00s system 0% cpu 2.097 total

I've noticed that session.load_context() takes about 1 second to finish. I've also found that session.run() calls load_context() itself:

        session_id = self.store["session_id"]
        save_version = session_id
        extra_params = self.store.get("extra_params") or {}
        context = self.load_context()

It seems that load_context() is called twice during the request:

  1. Inside of get_context().
  2. Inside of session.run().
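
One way to confirm the double call is to wrap the method with a counter and timer before serving a request. The following is a minimal sketch using only the standard library. `DummySession` is a hypothetical stand-in for `KedroSession`, and `instrument` is an illustrative helper; neither is part of Kedro.

```python
import functools
import time
from typing import Any, Callable, Dict, List


def instrument(obj: Any, method_name: str) -> List[float]:
    """Wrap obj.<method_name> so each call appends its duration to a list."""
    original: Callable[..., Any] = getattr(obj, method_name)
    durations: List[float] = []

    @functools.wraps(original)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            durations.append(time.perf_counter() - start)

    # Shadow the method on this instance only; the class is left untouched.
    setattr(obj, method_name, wrapper)
    return durations


class DummySession:
    """Hypothetical stand-in for KedroSession; not Kedro's API."""

    def load_context(self) -> Dict[str, str]:
        return {"catalog": "..."}

    def run(self) -> None:
        # Assumption taken from the excerpt above: run() loads the context itself.
        self.load_context()


session = DummySession()
durations = instrument(session, "load_context")
session.load_context()  # explicit call, as in get_context()
session.run()           # implicit call inside run()
print(f"load_context called {len(durations)} times")
```

With a real session, the length of `durations` gives the number of loads per request and its sum gives the time spent in them.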

I've tried to cache the result of session.load_context() like this:

def get_context(session: KedroSession = Depends(get_session)) -> Iterable[KedroContext]:
    context = session.load_context()
    session.load_context = lambda: context
    yield context

Doing that decreased the request processing time to 1.06 seconds:

time curl http://127.0.0.1:8000
# curl http://127.0.0.1:8000  0.00s user 0.00s system 0% cpu 1.062 total

Do you have any suggestions on how I can further optimize session.run()? Should I try a different approach with a plain DataCatalog and SequentialRunner? Or should Kedro's implementation of load_context() be modified to use some caching?
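
On the caching question: the patch above only avoids the second load within a single request, because the `get_session` dependency creates a new session for every request. A minimal sketch of caching the expensive object for the lifetime of the process follows, using `functools.lru_cache`. `load_expensive_context`, `handle_request` and the project path are hypothetical stand-ins, not Kedro functions, and whether a `KedroContext` can safely be reused across requests is an assumption that would need checking.

```python
import functools
from typing import Dict, List

load_calls = 0  # counts how often the expensive load actually runs


@functools.lru_cache(maxsize=None)
def load_expensive_context(project_path: str) -> Dict[str, List[int]]:
    """Hypothetical stand-in for a slow load such as session.load_context()."""
    global load_calls
    load_calls += 1
    return {"input": [1, 2, 3]}


def variance(values: List[int]) -> float:
    """Population variance, used here only as a stand-in computation."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def handle_request(project_path: str) -> float:
    """Simulates one API call: reuse the cached context, then compute."""
    context = load_expensive_context(project_path)
    return variance(context["input"])


# Three simulated requests trigger only one expensive load.
results = [handle_request("/path/to/project") for _ in range(3)]
print(load_calls)
```

The cached object is shared between calls, so handlers should treat it as read-only.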

@merelcht merelcht added Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation and removed Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation labels Jan 26, 2023
@merelcht merelcht added the Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation label Feb 6, 2023
@merelcht
Member

Related discussion: #2169 (comment)

@merelcht merelcht changed the title Investigate whether we can instantiate the context less fequently Investigate how session/context should be re-designed to work well for API use-cases Feb 27, 2023
@merelcht merelcht added the Stage: User Research 🔬 Ticket needs to undergo user research before implementation label Feb 27, 2023
@noklam
Contributor

noklam commented Feb 28, 2023

I think this is related too. We need to document and understand what the use case is and what improvements we can make.

There are some questions we want to ask.

  • How common is it for a Kedro pipeline to be exposed as a web endpoint?

Summary (To be updated)

  • A session can be used only once.
  • Session creation is slow. Creating a session for every API call is unsuitable for services that run lots of small pipelines, because the per-call overhead is significant.
  • A Runner is often used directly to get around the "one session, one run" assumption and to interact with lower-level objects like DataCatalog.
  • APIs often need data injection (some parameters); see "How to distribute and extend kedro pipelines" #795. How can we make a Kedro pipeline work better with a RESTful API? Is there an easy way for a user to pass extra data (common in a RESTful call with JSON) and trigger a Kedro pipeline?
  • What's the downside of using a Runner?
    • The hook system is built for the session rather than the runner.
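
The data-injection point can be illustrated without any Kedro objects. The sketch below assumes that a pipeline is just an ordered list of functions and that the catalog is a plain dictionary. It builds the shared parts once and merges per-request data into a fresh copy on every call. `Node`, `run_pipeline` and the dataset names are hypothetical, not Kedro's API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Node:
    """Hypothetical node: reads one named dataset and writes another."""

    func: Callable[[Any], Any]
    input_name: str
    output_name: str


def run_pipeline(
    nodes: List[Node], base_data: Dict[str, Any], injected: Dict[str, Any]
) -> Dict[str, Any]:
    """Run nodes in order on a per-call copy so requests do not share state."""
    data = {**base_data, **injected}  # injected request data overrides defaults
    for node in nodes:
        data[node.output_name] = node.func(data[node.input_name])
    return data


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


# Built once at startup and reused for every request.
NODES = [Node(mean, "input", "output")]
BASE_DATA: Dict[str, Any] = {"input": [1, 2, 3]}

default_result = run_pipeline(NODES, BASE_DATA, {})
injected_result = run_pipeline(NODES, BASE_DATA, {"input": [10, 20]})
print(default_result["output"], injected_result["output"])
```

The copy is shallow, so a node that mutates an input in place would still affect the shared defaults. A sketch like this also bypasses hooks entirely, which is the downside listed above.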

@astrojuanlu
Member

Moving this to the Session milestone

@astrojuanlu
Member

In light of the interest that kedro-boot is getting (lots of mentions in Slack), the fact that its authors @takikadiri and @Galileo-Galilei have already put a lot of thought into its design, and our broad agreement in Tech Design #2169 (comment) that this is an idea worth pursuing, are we ready for at least a first exploration of this issue from a technical standpoint?

@merelcht @rashidakanchwala I often hear that "the KedroSession was created for Experiment Tracking", do you happen to have any pointers? And besides, should kedro-org/kedro-viz#1624 be a blocker?

@datajoely
Contributor Author

Can I please volunteer myself for a user interview on how my teams have approached this!
