[API Updates] Model / shield / memory-bank routing + agent persistence + support for private headers #92

ashwinb · 2024-09-23T19:29:57Z

This is yet another of those large PRs (hopefully we will have fewer and fewer of them as things mature). This one introduces substantial improvements and some simplifications to the stack.

Most important bits:

- The Agents reference implementation now supports session / turn persistence. The default implementation uses sqlite, but Redis is also supported.
- We have re-architected the structure of the Stack APIs to allow for more flexible routing. The motivating use cases are:
  - routing model A to ollama and model B to a remote provider like Together
  - routing shield A to a local implementation and shield B to a remote provider like Bedrock
  - routing a vector memory bank to Weaviate while routing a keyvalue memory bank to Redis
- Support for provider-specific parameters passed from clients. A client can pass data using the x_llamastack_provider_data parameter, which can be type-checked and provided to the Adapter implementations.
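As a rough illustration of the provider-data idea in the last item, the sketch below type-checks a JSON payload against a per-provider schema before it would be handed to an adapter. Only the parameter name x_llamastack_provider_data comes from the description; the GuardrailProviderData class, its fields, and parse_provider_data are hypothetical, and the actual stack may transport and validate this data differently.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class GuardrailProviderData:
    """Hypothetical schema for data a client sends to one provider."""
    api_key: str
    region: str = "us-east-1"


def parse_provider_data(raw: Optional[str], schema=GuardrailProviderData):
    """Decode a JSON payload and check it against the schema.

    Returns None when the client sent nothing; raises ValueError when the
    payload is malformed, has unknown fields, or has mistyped fields.
    """
    if raw is None:
        return None
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"provider data is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("provider data must be a JSON object")
    unknown = set(payload) - {f.name for f in fields(schema)}
    if unknown:
        raise ValueError(f"unknown provider data fields: {sorted(unknown)}")
    try:
        data = schema(**payload)
    except TypeError as exc:  # a required field is missing
        raise ValueError(str(exc)) from exc
    # Dataclasses do not enforce annotations, so check each field by hand.
    for f in fields(schema):
        if not isinstance(getattr(data, f.name), f.type):
            raise ValueError(f"field {f.name!r} must be {f.type.__name__}")
    return data


# Assumed shape of the parameters a client passes along with a request.
request_params = {"x_llamastack_provider_data": '{"api_key": "secret"}'}
data = parse_provider_data(request_params.get("x_llamastack_provider_data"))
print(data.region)  # prints "us-east-1"
```

Validating before dispatch means an adapter only ever sees a typed object, and a client that sends a bad payload gets an error up front instead of a failure deep inside the provider.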


…ety (#90) * fix llama stack build * fix configure * fix configure for simple case * configure w/ routing * move examples config * fix memory router naming * issue w/ safety * fix config w/ safety * update memory endpoints * allow providers in api_providers * configure script works * all endpoints w/ build->configure->run simple local works * new example run.yaml * run openapi generator


yanxi0830

🚀

This PR makes several core changes to the developer experience surrounding Llama Stack.

Background:

PR #92 introduced the notion of "routing" to the Llama Stack. It introduced three object types: (1) models, (2) shields and (3) memory banks. Each of these objects can be associated with a distinct provider, so you can have model A inferenced locally while models B and C are inferenced remotely, for example. However, this had a few drawbacks:

- you could not address the provider instances, i.e., if you configured "meta-reference" with a given model, you could not assign an identifier to this instance which you could re-use later.
- the above meant that you could not register a "routing_key" (e.g. model) dynamically and say "please use this existing provider I have already configured" for a new model.
- the terms "routing_table" and "routing_key" were exposed directly to the user. In my view, this is way too much overhead for a new user (which almost everyone is). People come to the stack wanting to do ML and encounter a completely unexpected term.

What this PR does:

This PR structures the run config with only a single prominent key:

- providers

Providers are instances of configured provider types. Here's an example which shows two instances of the remote::tgi provider which are serving two different models.

    providers:
      inference:
      - provider_id: foo
        provider_type: remote::tgi
        config: { ... }
      - provider_id: bar
        provider_type: remote::tgi
        config: { ... }

Secondly, the PR adds dynamic registration of { models | shields | memory_banks } to the API surface. The distribution still acts like a "routing table" (as previously) except that it asks the backing providers for a listing of these objects. For example, it asks a TGI or Ollama inference adapter what models it is serving. Only the models that are actually being served can be requested by the user for inference. Otherwise, the Stack server will throw an error.
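To make the lookup behaviour concrete, here is a minimal sketch of a routing table built by asking each configured provider which models it serves. The names StaticProvider, ModelRouter, and route, the model identifiers, and the choice to reject a model served by two providers are all assumptions for illustration, not the Stack's actual implementation; only the provider IDs foo and bar and the remote::tgi type are taken from the example above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StaticProvider:
    """Hypothetical stand-in for an inference adapter (e.g. TGI or Ollama)."""
    provider_id: str
    provider_type: str
    served_models: List[str] = field(default_factory=list)

    def list_models(self) -> List[str]:
        # A real adapter would query its backend here.
        return list(self.served_models)


class ModelRouter:
    """Maps each model identifier to the provider that reports serving it."""

    def __init__(self, providers: List[StaticProvider]) -> None:
        self._table: Dict[str, StaticProvider] = {}
        for provider in providers:
            for model in provider.list_models():
                # Assumption: a model may be served by one provider only.
                if model in self._table:
                    other = self._table[model].provider_id
                    raise ValueError(
                        f"model {model!r} is served by both "
                        f"{other!r} and {provider.provider_id!r}"
                    )
                self._table[model] = provider

    def route(self, model: str) -> StaticProvider:
        """Return the provider for a model, or raise if nobody serves it."""
        try:
            return self._table[model]
        except KeyError:
            raise ValueError(
                f"model {model!r} is not served by any provider"
            ) from None


router = ModelRouter([
    StaticProvider("foo", "remote::tgi", ["model-a"]),
    StaticProvider("bar", "remote::tgi", ["model-b"]),
])
print(router.route("model-b").provider_id)  # prints "bar"
```

Requesting a model that no provider lists, such as router.route("model-c"), raises ValueError, mirroring the error the Stack server is described as throwing.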
When dynamically registering these objects, you can use the provider IDs shown above. Info about providers can be obtained using the Api.inspect set of endpoints (/providers, /routes, etc.)

The above example shows the correspondence between inference providers and models registry items. Things work similarly for the safety <=> shields and memory <=> memory_banks pairs.

Registry:

This PR also makes it so that Providers need to implement additional methods for registering and listing objects. For example, each Inference provider is now expected to implement the ModelsProtocolPrivate protocol (naming is not great!) which consists of two methods:

- register_model
- list_models

The goal is to inform the provider that a certain model needs to be supported so the provider can make any relevant backend changes if needed (or throw an error if the model cannot be supported).

There are many other cleanups included, some of which are detailed in a follow-up comment.
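The registration contract described above could be sketched as a typing.Protocol. Only the names ModelsProtocolPrivate, register_model, and list_models come from the text; the signatures, the use of plain strings for models, and the InMemoryInference class are assumptions made for illustration (the real methods may well be async and take richer model definitions).

```python
from typing import List, Protocol, Set, runtime_checkable


@runtime_checkable
class ModelsProtocolPrivate(Protocol):
    """Assumed shape of the two-method contract named in the text."""

    def register_model(self, model: str) -> None: ...

    def list_models(self) -> List[str]: ...


class InMemoryInference:
    """Hypothetical provider that can only serve a fixed set of models."""

    def __init__(self, supported: Set[str]) -> None:
        self._supported = set(supported)
        self._registered: List[str] = []

    def register_model(self, model: str) -> None:
        # Reject models the backend cannot serve, as the text describes.
        if model not in self._supported:
            raise ValueError(f"cannot support model {model!r}")
        if model not in self._registered:
            self._registered.append(model)

    def list_models(self) -> List[str]:
        return list(self._registered)


provider = InMemoryInference({"model-a", "model-b"})
provider.register_model("model-a")
print(provider.list_models())  # prints ['model-a']
print(isinstance(provider, ModelsProtocolPrivate))  # prints True
```

Using a structural protocol here means a provider does not have to inherit from a base class; it only has to expose the two methods.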

yanxi0830 and others added 30 commits September 20, 2024 11:22

example config

9bb6ce5

add new resolve_impls_with_routing

7d4135d

migrate router for memory wip

cda6111

delete router from providers

9c33587

clean up

3787408

Add a special header per-client call to parser provider data

90a59fd

safety API cleanup part 1

93e4ef3

Sample adapter implementation for Bedrock implementation of Guardrails

simple run config

8df53ac

backward compatibility

308a1d1

Update the meta reference safety implementation to match new API

51245a4

Update safety implementation inside agents

6e0f283

test safety against safety client

9e16b09

Further bug fixes

e5a7001

stage tmp changes

a6be32b

example config

73399fe

add new resolve_impls_with_routing

34f0c11

migrate router for memory wip

08379f5

delete router from providers

d8fab77

clean up

bc4ac2c

simple run config

756e98c

backward compatibility

6a95edc

stage tmp changes

164d0e2

Add a special header per-client call to parser provider data

fa04d86

Revert "Add a special header per-client call to parser provider data"

c22844f

This reverts commit fa04d86.

Revert "stage tmp changes"

3ea55d9

This reverts commit 164d0e2.

Revert "backward compatibility"

74765cc

This reverts commit 6a95edc.

Revert "simple run config"

50d95a6

This reverts commit 756e98c.

Revert "clean up"

515bec3

This reverts commit bc4ac2c.

Revert "delete router from providers"

3939611

This reverts commit d8fab77.

Revert "migrate router for memory wip"

cf8bd10

This reverts commit 08379f5.

ashwinb and others added 15 commits September 22, 2024 17:25

fix clients

622f143

bug fix with routing tables

bc39488

nuke safety/list_shields, we don't need it now

484dc2e

enhance the tracing span utility to make a context manager

5d75c24

opentelemetry -> jaeger

84ebed9

bug fixes to make this work, trace creation worked - spans dont yet

6e5ca13

move example configs to tests/

8cf634e

update openAPI

9210ee2

add memory_banks

21b844c

Undo ollama commenting lol

2d7ce81

update memory_banks method name to for openapi generator to work corr…

75357df

…ectly without naming collision

fix safety shield -> shield_type

98da002

Add strong_typing, add defaults

2f6ce08

Bug fix

ab46465

ashwinb requested review from yanxi0830, hardikjshah, dltn and raghotham as code owners September 23, 2024 19:29

facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Sep 23, 2024

ashwinb added 3 commits September 23, 2024 13:22

We don't need to confuse people by a complex config line when name works

54a73b4

fix sample

e4324f4

Add missing init files

af6b7ca

yanxi0830 approved these changes Sep 23, 2024

ashwinb merged commit ec4fc80 into main Sep 23, 2024
3 checks passed

ashwinb deleted the api_updates_3 branch September 23, 2024 21:22

yanxi0830 mentioned this pull request Sep 24, 2024

API Updates: routing table, models endpoint, inference routing, rebase on safety_refactor #85

Closed

ashwinb mentioned this pull request Oct 7, 2024

Remove "routing_table" and "routing_key" concepts for the user #201

Merged

heyjustinai pushed a commit that referenced this pull request Nov 19, 2024

Corrected agent config key (#92)

681260b
