
feat(AI): Create abstractions for generic LLM calls within sentry #68771

Merged

merged 24 commits into master from jferg/llmservice on Apr 19, 2024

Conversation

@JoshFerge (Member) commented Apr 12, 2024

Background

For the User Feedback Spam Detection feature (#61372), we intend to use an LLM. Along with other use cases such as Suggested Fix and code integration, there is a need across the Sentry codebase to be able to call LLMs.

Because Sentry is self-hosted, different features may use different LLMs, and we want to provide modularity, so we need to be able to configure different LLM providers, models, and use cases.

Solution:

  • We define an options-based config for providers and use cases, where you can specify an LLM provider's options and settings (a sketch of possible values follows below).
  • For each use case, you then define which LLM provider and model it uses.

Within `sentry/llm` we define a `providers` module, which consists of a base class and implementations. To start, we have OpenAI, Google Vertex, and a Preview implementation used for testing. These use the provider options to initialize a client and connect to the LLM provider. The providers inherit from `LazyServiceWrapper`.

Also within `sentry/llm`, we define a `usecases` module, which simply consists of a function `complete_prompt`, along with an enum of use cases. These options are passed to the LLM provider per use case and can be configured via the options described above.
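To make this concrete, here is a hedged sketch of what the two layers of configuration might look like. Only the option name `llm.provider.options` and the per-use-case shape (a provider name plus string options) come from this PR; every other key and value below is illustrative.

```python
# Illustrative values only: the provider-options schema, the use case names,
# and the model names are assumptions, not the exact schema from this PR.
llm_provider_options = {
    "openai": {"options": {"api_key": "sk-..."}},
    "vertex": {"options": {"url": "https://<region>-aiplatform.googleapis.com/..."}},
}

llm_usecase_options = {
    # Each use case names a provider plus per-use-case string options,
    # matching the UseCaseConfig TypedDict discussed later in the review.
    "spam_detection": {"provider": "openai", "options": {"model": "gpt-3.5-turbo"}},
    "suggested_fix": {"provider": "vertex", "options": {"model": "text-bison"}},
}
```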

Testing

I've added unit tests which mock the LLM calls, and I've tested in my local environment that calls to the actual services work.

In practice:

To use an LLM, you follow these steps:

  1. Define your use case in the use case enum.
  2. Call the complete_prompt function with your use case, prompt, content, temperature, and max_tokens (see the sketch below).
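A hedged sketch of what such a call might look like; the import path, enum member, and keyword names are assumptions based on the signatures shown later in this review.

```python
from sentry.llm.usecases import LLMUseCase, complete_prompt  # assumed import path

feedback_message = "FREE MONEY!!! click here"  # example user content

response = complete_prompt(
    usecase=LLMUseCase.SPAM_DETECTION,  # hypothetical enum member
    prompt="You are a spam classifier. Answer only 'spam' or 'not spam'.",
    message=feedback_message,
    temperature=0.0,
    max_output_tokens=20,
)
```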

Limitations:

Because each LLM currently has a different interface, this solution does not support provider-specific features such as OpenAI's "function calling", where the output is guaranteed to be in a specific JSON format. Advanced use cases beyond a simple prompt plus text with a single output are not currently supported. It is likely possible to add support for these on a case-by-case basis.

LLM providers have not yet standardized on a consistent API, which makes supporting them somewhat difficult. Third parties have come up with various solutions: LangChain, LiteLLM, LocalAI, OpenRouter.

It will probably make sense eventually to adopt one of these tools, or our own advanced tooling, once our use cases outgrow this solution.

There is also a possible future where we want different use cases to use different API keys, but for now each provider has only one set of credentials.

TODO

  • [ ] create develop docs for how to add a use case or a new LLM provider
  • [x] Follow-up PR to replace Suggested Fix OpenAI calls with the new abstraction
  • [ ] PR in getsentry to set provider / usecase values for SaaS
  • [ ] Follow-up PR to add telemetry information
  • [ ] We'll likely want to support streaming responses.

@github-actions bot added the Scope: Backend label (automatically applied to PRs that change backend components) on Apr 12, 2024

codecov bot commented Apr 12, 2024

Bundle Report

Changes will decrease total bundle size by 552 bytes ⬇️

| Bundle name | Size | Change |
| --- | --- | --- |
| sentry-webpack-bundle-array-push | 26.3MB | 552 bytes ⬇️ |

llm_provider_backends: dict[str, LlmModelBase] = {}


def get_llm_provider_backend(usecase: LlmUseCase) -> LlmModelBase:
Member Author:

an alternative to this solution would be to use the ServiceDelegator, but it seemed a bit of a heavy lift for what we needed here. open to other implementation ideas here.
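For context, the caching pattern this function appears to implement can be sketched in a self-contained way as below; `LlmModelBase` and `PROVIDERS` here are stand-ins, not the real classes or registry from the PR (which also resolves the provider from the use case's config).

```python
from typing import Callable


class LlmModelBase:  # stand-in for the real provider base class
    def __init__(self, options: dict[str, str]) -> None:
        self.options = options


# Hypothetical registry mapping provider names to constructors.
PROVIDERS: dict[str, Callable[[dict[str, str]], LlmModelBase]] = {"preview": LlmModelBase}

llm_provider_backends: dict[str, LlmModelBase] = {}


def get_llm_provider_backend(provider_name: str, options: dict[str, str]) -> LlmModelBase:
    # Build each provider backend once, then reuse it for later calls.
    if provider_name not in llm_provider_backends:
        llm_provider_backends[provider_name] = PROVIDERS[provider_name](options)
    return llm_provider_backends[provider_name]
```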

@JoshFerge JoshFerge marked this pull request as ready for review April 12, 2024 07:28
@JoshFerge JoshFerge requested a review from a team as a code owner April 12, 2024 07:28


def complete_prompt(
*,
Member Author:

i wasn't as familiar with this syntax, but this forces the rest of the arguments to be keyword arguments https://stackoverflow.com/a/14298976, which is a nice pythonic version of https://refactoring.com/catalog/introduceParameterObject.html
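A quick illustration of the bare `*` marker (generic Python, not code from this PR):

```python
def complete_prompt(*, prompt: str, message: str, temperature: float = 0.0) -> None:
    ...


complete_prompt(prompt="system text", message="user text")  # OK: arguments are named
complete_prompt("system text", "user text")                 # TypeError: positional args rejected
```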

logger = logging.getLogger(__name__)


class VertexProvider(LlmModelBase):
Member Author (@JoshFerge, Apr 12, 2024):

will note that this is for "PaLM 2"-class models. Google, in classic fashion, has changed the API format for "Gemini"-class models, so we'll need to write another provider. Will rename this one.

(We aren't using the GCP AI module because it seemed to be quite a heavy dependency.) Will update the naming to reflect that.

if response.status_code == 200:
logger.info("Request successful.")
else:
logger.info(
Member:

would this mean we log info on 401, 500?

Member Author:

yeah, just added these to help with debugging as we were having issues getting this to work in prod. Would more likely want to standardize it across the providers, but just leaving this in and planning to follow up with more telemetry.

@JoshFerge JoshFerge requested a review from markstory April 12, 2024 19:33
from sentry.utils.services import Service


class LlmModelBase(Service):
Member:

Could use a docblock comment I think
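Something along these lines, presumably (the wording is illustrative, not the docstring that was ultimately added):

```python
from sentry.utils.services import Service


class LlmModelBase(Service):
    """Base class for LLM providers.

    Each provider is constructed from the options configured under
    ``llm.provider.options`` and implements ``complete_prompt`` for a
    given use case.
    """
```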



def get_openai_client(api_key: str) -> OpenAI:
# TODO: make this global?
Member:

Why?

Member Author:

will have to look into it, but the client may make an OAuth token request, so not caching the client may have it do that on every request.

Member:

Ah yeah I see what you mean. Like initialize once 👍

Member Author:

yup

Member:

If there is an OAuth flow, does the client need to handle refresh flows to renew tokens? Or are tokens long-lived?

Member Author:

good question, will look into that.



(Additional review threads on src/sentry/llm/providers/base.py, src/sentry/llm/providers/vertex.py, and src/sentry/llm/usecases/__init__.py were marked outdated/resolved.)
@@ -2390,3 +2390,23 @@
default=[],
flags=FLAG_PRIORITIZE_DISK | FLAG_AUTOMATOR_MODIFIABLE,
)

# Options for setting LLM providers and usecases
register("llm.provider.options", default={}, flags=FLAG_NOSTORE)
Member:

Do you need to configure these with options-automator or change them at runtime? If you have no scenarios where you need to change this at runtime, module-level constants/dictionaries could be a simpler solution.

Member Author:

just thinking through what the easiest way is for self-hosted users to configure these -- is it right that they can simply modify their config.yml with options and they'll be picked up?

Member:

> is it right that they can simply modify their config.yml with options and they'll be picked up?

Yes, option values can be defined in the yml/py config for self-hosted. Self-hosted also has a django settings file in python that they can update. Changes to either the yml or python will require a restart.

Member Author:

going with options so config.yml can be the source of config for this; confirmed with the open source team that this makes sense.
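For a self-hosted install, that would presumably look something like the snippet below in the Python config (or the equivalent key in config.yml). Only the option name `llm.provider.options` comes from this PR's diff; the value shape is illustrative.

```python
# sentry.conf.py (self-hosted); a restart is required after changing this.
SENTRY_OPTIONS["llm.provider.options"] = {
    "openai": {"options": {"api_key": "sk-..."}},
}
```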

(A review thread on tests/sentry/llm/test_openai.py was marked outdated/resolved.)
@JoshFerge JoshFerge requested a review from a team as a code owner April 17, 2024 19:26
Member Author (@JoshFerge):

> If there is an OAuth flow, does the client need to handle refresh flows to renew tokens? Or are tokens long-lived?

From what I understand, the OpenAI client itself handles this. We've been running this in prod for a bit now, so I think it is okay.

def complete_prompt(
self,
*,
usecase_config: dict[str, Any],
Member:

this should probably be either a real type or a TypedDict -- especially in brand new code we should not be using dict[str, Any]

Member Author:

i tried typing with a typed dict and it proved to be annoying as each provider can have different options. can try again

Member Author:

Added a TypedDict with narrower types, with options being dict[str, str]. It doesn't seem easy to narrow the specific types for each option set further than that.

(A review thread on src/sentry/llm/providers/openai.py was marked outdated/resolved.)
from sentry.llm.providers.base import LlmModelBase


class PreviewLLM(LlmModelBase):
Member:

personally would try and avoid inheritance for this -- sentry historically has leaned on this a lot and it ends up being unmaintainably tangled (especially when attempting to type properly or reason about the implementation(s))

Member Author:

i feel like in this particular case inheritance seems sensible, as the different providers should all conform to the same interface -- curious what you think an alternative implementation could look like
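For reference, the inheritance-free alternative the reviewer is hinting at could use structural typing via typing.Protocol; a hedged sketch (the method name follows the PR, everything else is illustrative):

```python
from typing import Protocol


class LlmProvider(Protocol):
    """Structural interface: any class with a matching method shape qualifies."""

    def complete_prompt(self, *, prompt: str, message: str) -> str | None: ...


class PreviewLLM:
    # No base class needed; this satisfies LlmProvider purely by shape.
    def complete_prompt(self, *, prompt: str, message: str) -> str | None:
        return ""


def classify(provider: LlmProvider, text: str) -> str | None:
    return provider.complete_prompt(prompt="You are a spam classifier.", message=text)
```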


codecov bot commented Apr 17, 2024

Codecov Report

Attention: Patch coverage is 93.28358% with 9 lines in your changes missing coverage. Please review.

Project coverage is 79.72%. Comparing base (86c6aa6) to head (f491e5e).
Report is 67 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #68771      +/-   ##
==========================================
+ Coverage   79.70%   79.72%   +0.02%     
==========================================
  Files        6421     6431      +10     
  Lines      285320   285752     +432     
  Branches    49161    49245      +84     
==========================================
+ Hits       227420   227829     +409     
- Misses      57464    57487      +23     
  Partials      436      436              
| Files | Coverage Δ |
| --- | --- |
| src/sentry/conf/server.py | 89.18% <ø> (-0.04%) ⬇️ |
| src/sentry/llm/exceptions.py | 100.00% <100.00%> (ø) |
| src/sentry/llm/providers/preview.py | 100.00% <100.00%> (ø) |
| src/sentry/llm/types.py | 100.00% <100.00%> (ø) |
| src/sentry/options/defaults.py | 100.00% <100.00%> (ø) |
| src/sentry/llm/providers/base.py | 93.33% <93.33%> (ø) |
| src/sentry/llm/providers/openai.py | 95.83% <95.83%> (ø) |
| src/sentry/llm/usecases/__init__.py | 93.33% <93.33%> (ø) |
| src/sentry/llm/providers/vertex.py | 84.61% <84.61%> (ø) |

... and 68 files with indirect coverage changes

Comment on lines +33 to +34
prompt: str,
message: str,
Member:

I would make it clearer that the prompt is the system message, as it holds a special role in most LLM models and is treated differently. Also, it's optional; only the initial user message is required.

Member Author:

got it. will add clarification.
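For context, in chat-style APIs the prompt typically becomes the optional system message and the message the required user message; a hedged sketch of that mapping (the role/content dict shape follows the common OpenAI-style chat format, not necessarily this PR's exact code):

```python
def build_messages(prompt: str | None, message: str) -> list[dict[str, str]]:
    # The system prompt is optional; only the user message is required.
    messages: list[dict[str, str]] = []
    if prompt:
        messages.append({"role": "system", "content": prompt})
    messages.append({"role": "user", "content": message})
    return messages
```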

Comment on lines +25 to +26
temperature=temperature
* 2, # open AI temp range is [0.0 - 2.0], so we have to multiply by two
Member:

Scaling the temperature feels a bit weird to me, but I can get behind this use case for allowing it to be easy to swap out models. However, temperature scales differently across different models -- would it make sense to allow it to be configurable in the use case config too? That way, when someone switches out a model, they can adjust to the appropriate temperature for it.

Member Author:

yeah definitely. can follow up on this.
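For reference, the scaling being discussed: callers pass a temperature in [0.0, 1.0], and the OpenAI provider rescales it to OpenAI's [0.0, 2.0] range (a minimal illustration, not the PR's exact code):

```python
def scale_temperature(temperature: float, provider_max: float = 2.0) -> float:
    # temperature is expected in [0.0, 1.0]; OpenAI accepts [0.0, 2.0].
    return temperature * provider_max


scale_temperature(0.3)  # -> 0.6 sent to OpenAI
```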

Member (@cmanallen) left a comment:

Please remove the commented code before merging. The rest of my feedback is optional.

(A review thread on src/sentry/llm/providers/base.py was marked resolved.)

@lru_cache(maxsize=1)
def get_openai_client(api_key: str) -> OpenAI:
return OpenAI(api_key=api_key)
Member:

Why not a singleton?

Member Author:

i like this, clearer than what's there atm
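A minimal sketch of the module-level singleton alternative being discussed (illustrative only; the PR as merged keeps the lru_cache approach):

```python
from openai import OpenAI

_openai_client: OpenAI | None = None


def get_openai_client(api_key: str) -> OpenAI:
    # Build the client once and reuse it, so any token/connection setup
    # happens only on the first call.
    global _openai_client
    if _openai_client is None:
        _openai_client = OpenAI(api_key=api_key)
    return _openai_client
```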


class UseCaseConfig(TypedDict):
provider: str
options: dict[str, str]
Member:

Mentioning this because of all the commented config definitions. We could make the base class (or the protocol definition) generic over type T. This would allow you to pass specifically typed options to specific implementations. But only if you want to take the typing that far.

Member Author:

i tried doing this and my type-fu wasn't up to it :( can look to improve this if necessary in a future PR
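For the record, the generic pattern being described might look roughly like this (a hedged sketch; the options TypedDict and class names are illustrative, and the real base class also derives from Service):

```python
from typing import Generic, TypedDict, TypeVar


class OpenAIOptions(TypedDict):
    api_key: str
    model: str


OptionsT = TypeVar("OptionsT")


class LlmModelBase(Generic[OptionsT]):
    def __init__(self, options: OptionsT) -> None:
        self.options = options


class OpenAIProvider(LlmModelBase[OpenAIOptions]):
    def complete_prompt(self, *, prompt: str, message: str) -> str | None:
        # self.options is typed as OpenAIOptions here, so mypy checks the
        # keys accessed below and flags unknown ones.
        _ = self.options["api_key"]
        return None
```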

(Additional review threads on src/sentry/llm/usecases/__init__.py were marked resolved.)
@JoshFerge JoshFerge merged commit 32f7e6f into master Apr 19, 2024
49 checks passed
@JoshFerge JoshFerge deleted the jferg/llmservice branch April 19, 2024 15:38
MichaelSun48 pushed a commit that referenced this pull request Apr 25, 2024
@github-actions github-actions bot locked and limited conversation to collaborators May 5, 2024
Labels: Scope: Backend (automatically applied to PRs that change backend components)

8 participants