
feat(AI): Create abstractions for generic LLM calls within sentry #68771

Merged

merged 24 commits into master from jferg/llmservice on Apr 19, 2024

Conversation

@JoshFerge (Member) commented Apr 12, 2024

Background

For the User Feedback Spam Detection feature (#61372), we intend to use an LLM. Along with other use cases such as Suggested Fix and code integration, there is a need across the Sentry codebase to be able to call LLMs.

Because Sentry is self-hosted, different features may use different LLMs, and we want to provide modularity, so we need to be able to configure different LLM providers, models, and use cases.

Solution:

  • We define an options-based config for providers and use cases, where you can specify an LLM provider's options and settings (a sketch of possible values follows below).
  • For each use case, you then define which LLM provider and model it uses.

Within `sentry/llm` we define a `providers` module, which consists of a base class and implementations. To start, we have OpenAI, Google Vertex, and a Preview implementation used for testing. These use the provider options to initialize a client and connect to the LLM provider. The providers inherit from `LazyServiceWrapper`.

Also within `sentry/llm`, we define a `usecases` module, which simply consists of a function `complete_prompt`, along with an enum of use cases. These options are passed to the LLM provider per use case and can be configured via the options described above.
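To make this concrete, here is a hedged sketch of what the two layers of configuration might look like. Only the option name `llm.provider.options` and the per-use-case shape (a provider name plus string options) come from this PR; every other key and value below is illustrative.

```python
# Illustrative values only: the provider-options schema, the use case names,
# and the model names are assumptions, not the exact schema from this PR.
llm_provider_options = {
    "openai": {"options": {"api_key": "sk-..."}},
    "vertex": {"options": {"url": "https://<region>-aiplatform.googleapis.com/..."}},
}

llm_usecase_options = {
    # Each use case names a provider plus per-use-case string options,
    # matching the UseCaseConfig TypedDict discussed later in the review.
    "spam_detection": {"provider": "openai", "options": {"model": "gpt-3.5-turbo"}},
    "suggested_fix": {"provider": "vertex", "options": {"model": "text-bison"}},
}
```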

Testing

I've added unit tests which mock the LLM calls, and I've tested in my local environment that calls to the actual services work.

In practice:

To use an LLM, you follow these steps:

  1. Define your use case in the use case enum.
  2. Call the complete_prompt function with your use case, prompt, content, temperature, and max_tokens (see the sketch below).
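A hedged sketch of what such a call might look like; the import path, enum member, and keyword names are assumptions based on the signatures shown later in this review.

```python
from sentry.llm.usecases import LLMUseCase, complete_prompt  # assumed import path

feedback_message = "FREE MONEY!!! click here"  # example user content

response = complete_prompt(
    usecase=LLMUseCase.SPAM_DETECTION,  # hypothetical enum member
    prompt="You are a spam classifier. Answer only 'spam' or 'not spam'.",
    message=feedback_message,
    temperature=0.0,
    max_output_tokens=20,
)
```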

Limitations:

Because each LLM currently has a different interface, this solution does not support provider-specific features such as OpenAI's "function calling", where the output is guaranteed to be in a specific JSON format. Advanced use cases beyond a simple prompt plus text with a single output are not currently supported. It is likely possible to add support for these on a case-by-case basis.

LLM providers have not yet standardized on a consistent API, which makes supporting them somewhat difficult. Third parties have come up with various solutions: LangChain, LiteLLM, LocalAI, OpenRouter.

It will probably make sense eventually to adopt one of these tools, or our own advanced tooling, once our use cases outgrow this solution.

There is also a possible future where we want different use cases to use different API keys, but for now each provider has only one set of credentials.

TODO

  • [ ] create develop docs for how to add a use case or a new LLM provider
  • [x] Follow-up PR to replace Suggested Fix OpenAI calls with the new abstraction
  • [ ] PR in getsentry to set provider / usecase values for SaaS
  • [ ] Follow-up PR to add telemetry information
  • [ ] We'll likely want to support streaming responses.

@github-actions bot added the Scope: Backend label (automatically applied to PRs that change backend components) on Apr 12, 2024

codecov bot commented Apr 12, 2024

Bundle Report

Changes will decrease total bundle size by 552 bytes ⬇️

| Bundle name | Size | Change |
| --- | --- | --- |
| sentry-webpack-bundle-array-push | 26.3MB | 552 bytes ⬇️ |

llm_provider_backends: dict[str, LlmModelBase] = {}


def get_llm_provider_backend(usecase: LlmUseCase) -> LlmModelBase:
Member Author:

an alternative to this solution would be to use the ServiceDelegator, but it seemed a bit of a heavy lift for what we needed here. open to other implementation ideas here.
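For context, the caching pattern this function appears to implement can be sketched in a self-contained way as below; `LlmModelBase` and `PROVIDERS` here are stand-ins, not the real classes or registry from the PR (which also resolves the provider from the use case's config).

```python
from typing import Callable


class LlmModelBase:  # stand-in for the real provider base class
    def __init__(self, options: dict[str, str]) -> None:
        self.options = options


# Hypothetical registry mapping provider names to constructors.
PROVIDERS: dict[str, Callable[[dict[str, str]], LlmModelBase]] = {"preview": LlmModelBase}

llm_provider_backends: dict[str, LlmModelBase] = {}


def get_llm_provider_backend(provider_name: str, options: dict[str, str]) -> LlmModelBase:
    # Build each provider backend once, then reuse it for later calls.
    if provider_name not in llm_provider_backends:
        llm_provider_backends[provider_name] = PROVIDERS[provider_name](options)
    return llm_provider_backends[provider_name]
```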

@JoshFerge JoshFerge marked this pull request as ready for review April 12, 2024 07:28
@JoshFerge JoshFerge requested a review from a team as a code owner April 12, 2024 07:28


def complete_prompt(
*,
Member Author:

i wasn't as familiar with this syntax, but this forces the rest of the arguments to be keyword arguments https://stackoverflow.com/a/14298976, which is a nice pythonic version of https://refactoring.com/catalog/introduceParameterObject.html
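A quick illustration of the bare `*` marker (generic Python, not code from this PR):

```python
def complete_prompt(*, prompt: str, message: str, temperature: float = 0.0) -> None:
    ...


complete_prompt(prompt="system text", message="user text")  # OK: arguments are named
complete_prompt("system text", "user text")                 # TypeError: positional args rejected
```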

logger = logging.getLogger(__name__)


class VertexProvider(LlmModelBase):
Member Author (@JoshFerge, Apr 12, 2024):

will note that this is for "PaLM 2"-class models. Google, in classic fashion, has changed the API format for "Gemini"-class models, so we'll need to write another provider. Will rename this one.

(We aren't using the GCP AI module because it seemed to be quite a heavy dependency.) Will update the naming to reflect that.

if response.status_code == 200:
logger.info("Request successful.")
else:
logger.info(
Member:

would this mean we log info on 401, 500?

Member Author:

yeah, just added these to help with debugging as we were having issues getting this to work in prod. Would more likely want to standardize it across the providers, but just leaving this in and planning to follow up with more telemetry.

@JoshFerge JoshFerge requested a review from markstory April 12, 2024 19:33
from sentry.utils.services import Service


class LlmModelBase(Service):
Member:

Could use a docblock comment I think
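Something along these lines, presumably (the wording is illustrative, not the docstring that was ultimately added):

```python
from sentry.utils.services import Service


class LlmModelBase(Service):
    """Base class for LLM providers.

    Each provider is constructed from the options configured under
    ``llm.provider.options`` and implements ``complete_prompt`` for a
    given use case.
    """
```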



def get_openai_client(api_key: str) -> OpenAI:
# TODO: make this global?
Member:

Why?

Member Author:

will have to look into it, but the client may make an OAuth token request, so not caching the client may have it do that on every request.

Member:

Ah yeah I see what you mean. Like initialize once 👍

Member Author:

yup

Member:

If there is an OAuth flow, does the client need to handle refresh flows to renew tokens? Or are tokens long-lived?

Member Author:

good question, will look into that.



(Additional review threads on src/sentry/llm/providers/base.py, src/sentry/llm/providers/vertex.py, and src/sentry/llm/usecases/__init__.py were marked outdated/resolved.)
@@ -2390,3 +2390,23 @@
default=[],
flags=FLAG_PRIORITIZE_DISK | FLAG_AUTOMATOR_MODIFIABLE,
)

# Options for setting LLM providers and usecases
register("llm.provider.options", default={}, flags=FLAG_NOSTORE)
Member:

Do you need to configure these with options-automator or change them at runtime? If you have no scenarios where you need to change this at runtime, module-level constants/dictionaries could be a simpler solution.

Member Author:

just thinking through what the easiest way is for self-hosted users to configure these -- is it right that they can simply modify their config.yml with options and they'll be picked up?

Member:

> is it right that they can simply modify their config.yml with options and they'll be picked up?

Yes, option values can be defined in the yml/py config for self-hosted. Self-hosted also has a django settings file in python that they can update. Changes to either the yml or python will require a restart.

Member Author:

going with options so config.yml can be the source of config for this; confirmed with the open source team that this makes sense.
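For a self-hosted install, that would presumably look something like the snippet below in the Python config (or the equivalent key in config.yml). Only the option name `llm.provider.options` comes from this PR's diff; the value shape is illustrative.

```python
# sentry.conf.py (self-hosted); a restart is required after changing this.
SENTRY_OPTIONS["llm.provider.options"] = {
    "openai": {"options": {"api_key": "sk-..."}},
}
```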

(A review thread on tests/sentry/llm/test_openai.py was marked outdated/resolved.)
@JoshFerge JoshFerge requested a review from a team as a code owner April 17, 2024 19:26
Member Author (@JoshFerge):

> If there is an OAuth flow, does the client need to handle refresh flows to renew tokens? Or are tokens long-lived?

From what I understand, the OpenAI client itself handles this. We've been running this in prod for a bit now, so I think it is okay.

def complete_prompt(
self,
*,
usecase_config: dict[str, Any],
Member:

this should probably be either a real type or a TypedDict -- especially in brand new code we should not be using dict[str, Any]

Member Author:

i tried typing with a typed dict and it proved to be annoying as each provider can have different options. can try again

Member Author:

Added a TypedDict with narrower types, with options being dict[str, str]. It doesn't seem easy to narrow the specific types for each option set further than that.

(A review thread on src/sentry/llm/providers/openai.py was marked outdated/resolved.)
from sentry.llm.providers.base import LlmModelBase


class PreviewLLM(LlmModelBase):
Member:

personally would try and avoid inheritance for this -- sentry historically has leaned on this a lot and it ends up being unmaintainably tangled (especially when attempting to type properly or reason about the implementation(s))

Member Author:

i feel like in this particular case inheritance seems sensible, as the different providers should all conform to the same interface -- curious what you think an alternative implementation could look like
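For reference, the inheritance-free alternative the reviewer is hinting at could use structural typing via typing.Protocol; a hedged sketch (the method name follows the PR, everything else is illustrative):

```python
from typing import Protocol


class LlmProvider(Protocol):
    """Structural interface: any class with a matching method shape qualifies."""

    def complete_prompt(self, *, prompt: str, message: str) -> str | None: ...


class PreviewLLM:
    # No base class needed; this satisfies LlmProvider purely by shape.
    def complete_prompt(self, *, prompt: str, message: str) -> str | None:
        return ""


def classify(provider: LlmProvider, text: str) -> str | None:
    return provider.complete_prompt(prompt="You are a spam classifier.", message=text)
```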


codecov bot commented Apr 17, 2024

Codecov Report

Attention: Patch coverage is 93.28358% with 9 lines in your changes missing coverage. Please review.

Project coverage is 79.72%. Comparing base (86c6aa6) to head (f491e5e).
Report is 67 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #68771      +/-   ##
==========================================
+ Coverage   79.70%   79.72%   +0.02%     
==========================================
  Files        6421     6431      +10     
  Lines      285320   285752     +432     
  Branches    49161    49245      +84     
==========================================
+ Hits       227420   227829     +409     
- Misses      57464    57487      +23     
  Partials      436      436              
| Files | Coverage Δ |
| --- | --- |
| src/sentry/conf/server.py | 89.18% <ø> (-0.04%) ⬇️ |
| src/sentry/llm/exceptions.py | 100.00% <100.00%> (ø) |
| src/sentry/llm/providers/preview.py | 100.00% <100.00%> (ø) |
| src/sentry/llm/types.py | 100.00% <100.00%> (ø) |
| src/sentry/options/defaults.py | 100.00% <100.00%> (ø) |
| src/sentry/llm/providers/base.py | 93.33% <93.33%> (ø) |
| src/sentry/llm/providers/openai.py | 95.83% <95.83%> (ø) |
| src/sentry/llm/usecases/__init__.py | 93.33% <93.33%> (ø) |
| src/sentry/llm/providers/vertex.py | 84.61% <84.61%> (ø) |

... and 68 files with indirect coverage changes

Comment on lines +33 to +34
prompt: str,
message: str,
Member:

I would make it clearer that the prompt is the system message, as it holds a special role in most LLM models and is treated differently. Also, it's optional; only the initial user message is required.

Member Author:

got it. will add clarification.
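For context, in chat-style APIs the prompt typically becomes the optional system message and the message the required user message; a hedged sketch of that mapping (the role/content dict shape follows the common OpenAI-style chat format, not necessarily this PR's exact code):

```python
def build_messages(prompt: str | None, message: str) -> list[dict[str, str]]:
    # The system prompt is optional; only the user message is required.
    messages: list[dict[str, str]] = []
    if prompt:
        messages.append({"role": "system", "content": prompt})
    messages.append({"role": "user", "content": message})
    return messages
```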

Comment on lines +25 to +26
temperature=temperature
* 2, # open AI temp range is [0.0 - 2.0], so we have to multiply by two
Member:

Scaling the temperature feels a bit weird to me, but I can get behind this use case for allowing it to be easy to swap out models. However, temperature scales differently across different models -- would it make sense to allow it to be configurable in the use case config too? That way, when someone switches out a model, they can adjust to the appropriate temperature for it.

Member Author:

yeah definitely. can follow up on this.
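For reference, the scaling being discussed: callers pass a temperature in [0.0, 1.0], and the OpenAI provider rescales it to OpenAI's [0.0, 2.0] range (a minimal illustration, not the PR's exact code):

```python
def scale_temperature(temperature: float, provider_max: float = 2.0) -> float:
    # temperature is expected in [0.0, 1.0]; OpenAI accepts [0.0, 2.0].
    return temperature * provider_max


scale_temperature(0.3)  # -> 0.6 sent to OpenAI
```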

Member (@cmanallen) left a comment:

Please remove the commented code before merging. The rest of my feedback is optional.

(A review thread on src/sentry/llm/providers/base.py was marked resolved.)

@lru_cache(maxsize=1)
def get_openai_client(api_key: str) -> OpenAI:
return OpenAI(api_key=api_key)
Member:

Why not a singleton?

Member Author:

i like this, clearer than what's there atm
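A minimal sketch of the module-level singleton alternative being discussed (illustrative only; the PR as merged keeps the lru_cache approach):

```python
from openai import OpenAI

_openai_client: OpenAI | None = None


def get_openai_client(api_key: str) -> OpenAI:
    # Build the client once and reuse it, so any token/connection setup
    # happens only on the first call.
    global _openai_client
    if _openai_client is None:
        _openai_client = OpenAI(api_key=api_key)
    return _openai_client
```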


class UseCaseConfig(TypedDict):
provider: str
options: dict[str, str]
Member:

Mentioning this because of all the commented config definitions. We could make the base class (or the protocol definition) generic over type T. This would allow you to pass specifically typed options to specific implementations. But only if you want to take the typing that far.

Member Author:

i tried doing this and my type-fu wasn't up to it :( can look to improve this if necessary in a future PR
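For the record, the generic pattern being described might look roughly like this (a hedged sketch; the options TypedDict and class names are illustrative, and the real base class also derives from Service):

```python
from typing import Generic, TypedDict, TypeVar


class OpenAIOptions(TypedDict):
    api_key: str
    model: str


OptionsT = TypeVar("OptionsT")


class LlmModelBase(Generic[OptionsT]):
    def __init__(self, options: OptionsT) -> None:
        self.options = options


class OpenAIProvider(LlmModelBase[OpenAIOptions]):
    def complete_prompt(self, *, prompt: str, message: str) -> str | None:
        # self.options is typed as OpenAIOptions here, so mypy checks the
        # keys accessed below and flags unknown ones.
        _ = self.options["api_key"]
        return None
```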

(Additional review threads on src/sentry/llm/usecases/__init__.py were marked resolved.)
@JoshFerge JoshFerge merged commit 32f7e6f into master Apr 19, 2024
49 checks passed
@JoshFerge JoshFerge deleted the jferg/llmservice branch April 19, 2024 15:38
MichaelSun48 pushed a commit that referenced this pull request Apr 25, 2024
@github-actions github-actions bot locked and limited conversation to collaborators May 5, 2024
Labels: Scope: Backend (automatically applied to PRs that change backend components)

8 participants