Upcoming architecture changes for Langfuse 3.0 (self-hosted) #1902

maxdeichmann · 2024-04-29T12:36:04Z

maxdeichmann
Apr 29, 2024
Maintainer

Hi all,

Langfuse is growing a lot, both in feature scope as well as in usage on single instances. Thus we plan for a couple of changes that will be released in Langfuse v3.

Important

A Langfuse v3 developer preview is available. Please see this comment for the current release timeline and more details. Feedback is very much appreciated!

We are currently required to mature our architecture as we are working on the following challenges:
✅ Building model-based evals, which requires us to run asynchronous tasks, rate limited, with failover capabilities.
🧑‍🍳 Improve performance as instances scale out.

I wanted to give you a heads-up on upcoming changes which are required to make these features work. Currently, Langfuse consists of a single Docker container, which takes care of everything we do. This made it fast to set up Langfuse initially, but we need more technical capabilities now. In addition to the existing components (Docker container + Postgres database), we will add the following:

Asynchronous processing in a dedicated container
Redis container (for queues)
OLAP storage (likely Clickhouse) to enable lower latencies on analytical queries and scale out storage for large and multimodal inputs and outputs
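To make the "asynchronous, rate limited, with failover" requirement more concrete, here is a minimal sketch of a worker loop. It is not Langfuse's implementation: an in-memory `queue.Queue` stands in for the Redis queue, simple retries plus a dead-letter list stand in for failover, and all names (`Task`, `run_worker`, `max_per_second`, `max_attempts`) are hypothetical.

```python
import queue
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    payload: str
    attempts: int = 0


def run_worker(
    tasks: "queue.Queue[Task]",
    handler: Callable[[str], None],
    max_per_second: float = 5.0,
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> list[Task]:
    """Drain the queue, spacing out calls and re-queueing failed tasks.

    Returns the tasks that still failed after max_attempts (a dead-letter list).
    """
    min_interval = 1.0 / max_per_second
    dead_letter: list[Task] = []
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            return dead_letter
        task.attempts += 1
        try:
            handler(task.payload)
        except Exception:
            if task.attempts < max_attempts:
                tasks.put(task)  # put it back at the end of the queue to retry later
            else:
                dead_letter.append(task)  # give up and keep it for inspection
        sleep(min_interval)  # crude rate limit: at most max_per_second handler calls
```

Because the web container only enqueues work and a loop like this runs elsewhere, a slow or failing eval would not block API requests, which is the motivation for a dedicated worker container.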

If you self-host Langfuse, this means that we will likely advise changing to the following setup so that you can benefit from the new infra changes easily. We are happy to hear your thoughts on this:

Migrate from a container service (e.g. GCP Cloud Run) to a Virtual Machine to run docker-compose from there. Docker-compose helps to quickly bootstrap the new components.
Either provide connection strings to the docker-compose for Redis, Postgres, and Clickhouse for managed databases or run all of them on the virtual machine as well.

Feel free to share your thoughts below on these topics:

Tool choice: do you have experience with the tools mentioned above and have advice for us?
Do you have ideas on how to make self-hosting easier in a multi-container, multi-database setup than docker compose?
Do you have ideas on what a great migration path could look like?

Find more context in the last Langfuse Townhall meeting. We will provide an easy-to-follow upgrade path for self-hosters once v3 is generally available. The infrastructure change does not affect public APIs, so users of Langfuse Cloud will not be affected by this change. We are currently piloting the async container and queue for the evals feature, which is in public beta on Langfuse Cloud.

UPDATE JULY 22nd

A more detailed overview of planned changes:

Architectures v2 vs. v3

Containers
- web container: hosts public api, and all resources for the user interface
- worker container: asynchronous processes, no exposed ports
Databases
- Redis used as cache and queue
- Postgres stores transactional data such as projects or API keys
- Clickhouse stores tracing data generated by the SDKs. This database will do most of the processing as our server will insert all the SDK data and read it for tables and dashboards.
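As a rough illustration of the layout above, the sketch below checks that each container has a connection string for every data store before it starts. The variable names (`POSTGRES_URL`, `CLICKHOUSE_URL`, `REDIS_URL`) are hypothetical placeholders rather than confirmed Langfuse settings, and it is an assumption here that both the web and the worker container talk to all three stores.

```python
from urllib.parse import urlparse

# Hypothetical variable names, chosen for illustration only.
REQUIRED = {
    "web": ["POSTGRES_URL", "CLICKHOUSE_URL", "REDIS_URL"],
    "worker": ["POSTGRES_URL", "CLICKHOUSE_URL", "REDIS_URL"],
}


def missing_settings(container: str, env: dict[str, str]) -> list[str]:
    """Return required variables that are unset or not a URL with a host."""
    problems = []
    for name in REQUIRED[container]:
        parsed = urlparse(env.get(name, ""))
        if not (parsed.scheme and parsed.hostname):
            problems.append(name)
    return problems
```

A check like this could run at container start-up (for example against `dict(os.environ)`) so that a missing datastore fails fast instead of surfacing later as a runtime error.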

Next to the core application, an application load balancer for TLS termination and routing of requests to the Web container is necessary. We use nginx but you can also use e.g. the fully managed AWS load balancer.

Upgrade path from v2 to v3

Thousands of teams run on Langfuse (~400k docker pulls)
→ we aim to offer the easiest migration experience that is automated and documented

Guidance on scalability of dockerized datastores and on when fully-managed datastores are necessary, including at-rest encryption with custom keys
Guidance on reliable auto-scaling configuration of the application and databases
Migration script to move data from Postgres to Clickhouse, and/or to Clickhouse and a new Postgres instance

Application deployment

You will be able to deploy the containers via Kubernetes, your own container deployment service (such as Google Cloud Run), or docker compose on a virtual machine. In either case, you will also be able to use dockerized databases or you can provide us with connection strings for managed databases.

For low-volume/non-production deployments, dockerized DBs + docker compose is a sensible option to keep complexity low. We will publish guidance on when options 3 and 4 are necessary.

DB deployment

Databases (see above): redis, postgres, clickhouse

Low-volume

All datastores are available as single docker containers
Can be bundled in docker compose or EKS/ECS/K8S deployment

High-volume / fully-managed → databases external of application cluster

Redis
- No special requirements
- AWS: we chose ElastiCache (Redis OSS) for a fully managed Redis instance
Postgres
- No special requirements compared to other applications.
- AWS: we chose RDS for a fully-managed Postgres instance
Clickhouse
- Main data store, thus this database will scale the most.
- AWS:
  - There is no RDS-equivalent for Clickhouse on AWS
  - We use Clickhouse Cloud which is the fully managed Clickhouse database managed by the Clickhouse team. This can be run on an AWS region of your choice with VPC peering for private access and data security.
  - There are potentially additional vendors for managed Clickhouse such as DoubleCloud or Altinity.
  - Alternatively, you can run Clickhouse yourself in ECS, e.g. by using this template.

We will provide guidance on the scale at which high-volume/fully-managed Clickhouse becomes necessary. On hosted Langfuse, we are currently in the process of migrating to arrive at a scalable architecture. Once we are done with the migration, we will release 3.0. We will keep you posted here on updates regarding the migration.

FAQ

Will v3 introduce any non-OSS components that require additional licensing? No, all new infrastructure components (Clickhouse, Redis/Valkey, S3/minio) can be self-hosted without any non-FOSS dependency or licensing requirements.
How long will there be security updates for v2? We will support security updates for V2 until at least end of Q1 of 2025. If you require extended support, please reach out to us.

Timeline

We currently test many of the v3 infrastructure pieces on Langfuse Cloud. We will release v3 once all of these changes are "battle-tested" to be sure that this is a smooth transition for everyone self-hosting Langfuse without uncertainties. While there is no strict timeline, we aim for a release in late November.

MarkEdmondson1234 · 2024-04-30T06:10:14Z

MarkEdmondson1234
Apr 30, 2024

As requested from Discord, my comment:

I really do not want to move off serverless infra to a dedicated VM, I'd say a major reason I chose Langfuse was its Cloud Run deployment I could couple with my existing AlloyDB. And perhaps AlloyDB is quick enough it doesn't need help with analytical queries.

Cloud Run recently introduced side car containers so perhaps that is an option?
https://cloud.google.com/blog/products/serverless/cloud-run-now-supports-multi-container-deployments

There is a managed Redis option too, but it's a bit more pricey.

With my serverless deployment, at the moment I don't pay for Langfuse unless I'm browsing the UI or it's capturing traces, aside from the already sunk cost of the database.

0 replies

noorvir · 2024-04-30T13:26:01Z

noorvir
Apr 30, 2024

The current docs advise not to use Docker Compose for production. I'm guessing that will change for v3? It would be great to have a ready-to-deploy docker compose that just needs an env file to get started. I'd use the cloud hosted service but have relatively strict data privacy requirements.

10 replies

marcklingen May 28, 2024
Maintainer

awesome, thank you @mautini 🙏 We are currently finalizing many infra pieces, will definitely reach out!

benobytes May 28, 2024

Our current stack uses langfuse in Kubernetes so I will be happy to assist as well!

verdverm May 28, 2024

We use k8s operators for the redis, postgres, clickhouse components. Would want a way to have develop vs production setups, which is probably easy from the chart perspective, more about docs / examples

marcklingen May 28, 2024
Maintainer

I think it's sensible to do this similarly to the current chart, i.e. provide two options:

dev/staging, low-volume prod: deploy all components within k8s
high-volume prod: add api keys for dedicated/managed redis, clickhouse, postgres

DevBey Jul 1, 2024

big up vote, k8s is the way to go.

Manouchehri · 2024-04-30T13:56:16Z

Manouchehri
Apr 30, 2024

I'm hesitant about a more complicated docker-compose setup. One of the reasons we were open to using Langfuse to begin with was how easy it was to deploy on a serverless platform.

3 replies

nikcaryo-super Apr 30, 2024

copying from discord, GCP offers: https://cloud.google.com/blog/products/serverless/cloud-run-now-supports-multi-container-deployments but I'm not sure exactly how those would work for asynchronous tasks (e.g. does the sidecar running an async task continue running if the request finishes and the cpu is unallocated?)

maybe there's an architecture where async tasks are just sent as webhooks to whatever "langfuse-worker" url people have setup? and then GCP people can just have a langfuse-worker serverless instance / pubsub task queue or whatever.

MarkEdmondson1234 Apr 30, 2024

That’s how the rest of my infra works, pubsub events to Cloud Run scale to 0 workers

marcklingen Apr 30, 2024
Maintainer

agree, would love to figure out together how to best run this on Google Cloud Run to potentially scale to 0. The worker is necessary to power async workloads (evals, tokenization) but could also scale to zero if the queue could be in pub/sub (GCP specific). I'm unsure how to find a generalizable solution here but we'll figure it out

fernandocamargoai · 2024-04-30T20:41:28Z

fernandocamargoai
Apr 30, 2024

It feels weird to deploy using docker-compose. For me, it has always been a good tool to use during development, but not really for production.

My team is currently planning to deploy it. We haven't decided if we'll deploy to kubernetes or Cloud Run though. We'll stay tuned to see what will work best for us.

6 replies

Manouchehri Apr 30, 2024

As a team with k8s experience.. we try to avoid it at all costs. 😬 We like Langfuse enough that we'd probably cave and do it if needed, but not our first choice.

marcklingen Apr 30, 2024
Maintainer

Agree that we’d want to make it work on GCP/CloudRun and comparable solutions as it is just much easier. There’s really no way around it, though, as without an additional container it seems close to impossible to run async workloads

MarkEdmondson1234 May 1, 2024

There are a few options for async tasks, like Batch, cloud run jobs, workflows? GCP specific though. k8s is most generic but yeah, I’d prefer that only as a last option

zoltan-fedor Jul 9, 2024

We do run Langfuse today on k8s (our team has lots of k8s experience - we run everything there with the exception of databases).
I wrote our own Helm chart for Langfuse (includes HPA (autoscaling), ingress, etc), which we have used for about half a year with no issues both in prod and preprod.
I would advise that Langfuse includes a Helm chart for production-level deployments next to the docker compose option, especially once it becomes multi-container with v3.0

Also Redis should be allowed to be external (running Redis in a container in K8s is not really a good architecture) and connection to it parametrized just like Postgres is. I am sure I will be able to make it so in our Helm charts, but it could be useful for those who are less experienced. We already run Redis external to K8s.
We have no Clickhouse, but I am sure we will be able to set that up on K8s if required.

EDIT: I just saw that further down @marcklingen has already described basically the same thing I was recommending here (external database connection option + official helm chart offered by Langfuse). See https://github.com/orgs/langfuse/discussions/1902#discussioncomment-9729809

marcklingen Jul 15, 2024
Maintainer

thanks for sharing your thoughts @zoltan-fedor 🙏

chai3 · 2024-05-10T14:25:28Z

chai3
May 10, 2024

I want Redis and OLAP Clickhouse to remain opt-in.

I am deploying LangFuse on AWS with a simple architecture.
Just AWS AppRunner and RDS.
AWS AppRunner only supports a single Docker image and cannot use docker compose or sidecars.
ECS, EC2, Redis, and ALB are expensive, but with AppRunner you only pay the CPU cost when there is a user request, and when there are no requests it is very cheap at 0.007 USD / GB-hour (with 2 GB memory, 5 USD/month).

https://github.com/AI4Organization/langfuse-ecr-ecs-deployment-cdk/blob/f21d237324f9c5727227915a57523a7ead779720/lib/langfuse-ecr-apprunner-deployment-cdk-stack.ts
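As a rough check of the idle-cost figure in the comment above, the sketch below multiplies the quoted memory rate by the hours in a month. It assumes the instance stays provisioned for the whole month (about 730 hours) and ignores CPU, request, and build charges; the function name and constant are made up for illustration. Under those assumptions the quoted 0.007 USD / GB-hour works out to about 5 USD per month for 1 GB and about 10 USD for 2 GB, so current AWS pricing is worth checking before relying on either number.

```python
HOURS_PER_MONTH = 730  # assumption: average month, instance provisioned the whole time


def monthly_memory_cost(gb: float, usd_per_gb_hour: float = 0.007) -> float:
    """Memory-only cost of keeping an idle instance provisioned for a month."""
    return gb * usd_per_gb_hour * HOURS_PER_MONTH


for size_gb in (1, 2):
    print(f"{size_gb} GB: {monthly_memory_cost(size_gb):.2f} USD/month")
```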

1 reply

chai3 Aug 10, 2024

I found a similar discussion and v2 implementation with AWS CDK

https://github.com/orgs/langfuse/discussions/2863

https://github.com/aaronsu11/langfuse-on-aws

qymab · 2024-05-12T10:04:02Z

qymab
May 12, 2024

Is there a specific reason for not providing ARM-based images?

2 replies

marcklingen May 12, 2024
Maintainer

See self-hosting docs

Are there prebuilt ARM images available? No, currently we do not publish official ARM images. However, you can build your own ARM images using the Dockerfile in the Langfuse repository.

No specific reason, open for contributions on this!

marcklingen Jul 2, 2024
Maintainer

First version to include arm image was just released: https://github.com/langfuse/langfuse/releases/tag/v2.58.0
Thanks again @rennokki for the contribution

newcworld · 2024-05-24T15:15:19Z

newcworld
May 24, 2024

when clickhouse, I think it's great

0 replies

aiakubovich · 2024-05-29T22:58:40Z

aiakubovich
May 29, 2024

I'm not a big fan of docker-compose and we are unable to use docker-compose files for deployments. Our requirements dictate that we first build an image using an ARM template, scan it with an infosec tool like Aqua in the CI/CD pipeline, and then deploy it. It would be great if there were an option to deploy each container as a standalone app.

1 reply

marcklingen May 30, 2024
Maintainer

makes sense, thanks for sharing!

tiagoantao · 2024-06-10T14:25:47Z

tiagoantao
Jun 10, 2024

We are currently very happy users of Langfuse 2 (massive internal adoption success).
There are a few issues for us with the new architecture:

Maintaining a new OLAP solution. We would prefer not to have it or to be able to integrate it with our existing OLAP solution
We would prefer a container only solution vs a VM based solution. This is purely because our current deployment is container based and we have no VMs
Better yet, containers with nothing attached (e.g. Clickhouse would require attaching volumes).

Thank you for considering our requirements

2 replies

marcklingen Jun 10, 2024
Maintainer

Thanks for your feedback here!

The v3 infrastructure will most likely include clickhouse as it is a very solid yet OSS OLAP db that works well at high throughput and size yet has support for upserts which are required in some scenarios.

we are currently aiming for these 4 deployment options:

| data stores (postgres, redis, clickhouse) | docker compose (vm) | helm chart (k8s) |
| --- | --- | --- |
| dockerized | 1, low volume | 2, low-mid volume |
| external, connection provided via env | 3, low-mid volume | 4, high volume |

We'd encourage deployments with external data stores (3&4) for improved scalability and less ops overhead (no additional storage attached to containers) as there are managed services for all 3 data stores, postgres/redis offered by all major cloud vendors, and clickhouse e.g. via Clickhouse Cloud. At low volume, deployment via docker should work as well and we cross-checked this with many teams who run low-volume clickhouse deployments in docker.
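For readers who want to reason about the four options programmatically, the table above can be written down as a small lookup. This is only a restatement of the table; the key strings and the `Option` and `options_for` names are made up for illustration.

```python
from typing import NamedTuple


class Option(NamedTuple):
    number: int
    volume: str


# Keys: (data store mode, orchestration); values follow the table above.
OPTIONS = {
    ("dockerized", "docker-compose"): Option(1, "low"),
    ("dockerized", "helm"): Option(2, "low-mid"),
    ("external", "docker-compose"): Option(3, "low-mid"),
    ("external", "helm"): Option(4, "high"),
}


def options_for(volume: str) -> list[int]:
    """Return the option numbers the table associates with a volume tier."""
    return sorted(o.number for o in OPTIONS.values() if o.volume == volume)
```

For example, `options_for("low-mid")` yields options 2 and 3, i.e. either dockerized stores on k8s or external stores with docker compose.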

The only alternative would be to maintain an additional postgres-only flavor of Langfuse. However, we do not want to go this route as this would slow down development a lot and would make the two versions drift in performance. Context: many analytics use cases right now are blocked by this change as the postgres instance is limited by IOPS for many teams who deploy langfuse at scale.

verdverm Jun 11, 2024

I'd love to see this integrate with my existing log/mon stack (grafana lgtm, but there are others)

Having to fuse traces & logs across them is not ideal

mrout94 · 2024-06-13T07:59:00Z

mrout94
Jun 13, 2024

Will there be a way to migrate existing traces, prompts and datasets from v2 to v3?

We have had massive internal success and adoption with this, and there is a lot of data, prompts and traces stored in the containers. We are really excited about v3 but do not want to redo these steps.

Wishing the entire team good luck for the v3 release. Looking forward to this.

1 reply

marcklingen Jun 13, 2024
Maintainer

There will be a migration guide and script that will bring the migration effort down to a minimum. We are very much aware that you and many other teams depend on Langfuse for their workflows.

snikch · 2024-06-13T20:12:36Z

snikch
Jun 13, 2024

Hey team. Thanks for the updates in the newsletter.

I agree on the general comments that docker compose isn't a great production solution, however I generally see docker compose as a great way to document the required setup for self hosters. It's simple to grok the required services and interactions / configuration.

We will continue to use a k8s deployment, and are happy to integrate into our existing Redis and Clickhouse services / spin up new instances. Bringing up Clickhouse and Redis shouldn't be considered a large operational hurdle IMO for people wanting to self host. With services becoming more complex over time as features and scale are added, it is a reasonable expectation (Posthog is a good example).

As long as the migration path is documented and clear this all sounds great! 👏

1 reply

marcklingen Jun 13, 2024
Maintainer

Thanks for sharing and your trust in this general path! We’ll follow up with drafts of these docs in this thread when we have a stable end-to-end deployment running across all features in order to get feedback and help folks test v3 ahead of a stable release.

We talked a lot about this and are convinced that this change is necessary to serve all teams that want to scale with Langfuse. If your product is doing well and Langfuse breaks as the infra was kept as simple as possible, that’s not great. Agree that Posthog is a great example as some other OSS analytics projects deliberately run on Postgres and thus can’t keep up with Posthog from my point of view.

marcklingen · 2024-06-27T14:46:10Z

marcklingen
Jun 27, 2024
Maintainer

Update to the above: we plan to release v3 in July (no strict ETA yet) as we are currently going through many optimization steps to make the new setup as performant as possible. We will post an update here once there is documentation and a pre-release version to try.

4 replies

4t8dd Jun 28, 2024

Looking forward to the migration to clickhouse.

marcklingen Jun 28, 2024
Maintainer

me too!

nonlocalStream Aug 12, 2024

Excited for this! any updated timelines?

marcklingen Aug 12, 2024
Maintainer

currently running parts of this in beta on Langfuse Cloud. No strict ETA yet, will release once it is stable and there is a simple one-way migration for everyone

pamelafox · 2024-07-01T18:46:26Z

pamelafox
Jul 1, 2024

I've received an inquiry as to how we'd be able to deploy your changes to Azure:
Azure-Samples/langfuse-on-azure#14

We can set up a Redis connection to Azure Cache for Redis. I'm not familiar with Clickhouse. Do you already have an idea about how we could get that working on Azure?

3 replies

marcklingen Jul 1, 2024
Maintainer

The architecture is not yet finalized but I'd suggest to deploy as mentioned here via either fully-managed services (as the one for Redis) or dockerized versions (e.g. Clickhouse): https://github.com/orgs/langfuse/discussions/1902#discussioncomment-9729809

This should work on Azure as it will on other cloud providers, and we will share drafts of the docs on this ahead of the v3 release here. Would love to chat about potential adaptations to the Azure example once this is finalized.

JungeAlexander Nov 21, 2024

@marcklingen and @pamelafox: now that v3 is being finalized, would it make sense for the two of you to connect? Pamela expressed some concerns on the hosting getting a lot harder on Azure here (which resonate with me) so it would be great to clarify for the sake of continued community adoption of v3.

Steffen911 Nov 21, 2024
Maintainer

Hey @JungeAlexander,

Thank you for the follow-up. I see that the main concerns in the linked ticket are the S3 and the ClickHouse requirement.

Regarding S3: Langfuse supports any S3-API compatible blob storage (https://langfuse.com/docs/deployment/v3/components/blobstorage) and we are aware that Azure Blob Storage doesn't fall within that group. We are working on native ABS support within Langfuse and intend to have that ready with the V3 release or shortly after. There are options like https://github.com/gaul/s3proxy to wrap a S3-compatible interface around ABS, but we understand that this wouldn't be a nice self-hosting experience. One could also use a MinIO container to get an S3 compatible interface hosted on Azure.

Regarding ClickHouse: There are multiple options to address this requirement. For a small development stack the clickhouse-server docker image might be a good fit. As long as you assign a persistent volume, it should serve the use-case perfectly fine. Alternatively, there is a ClickHouse helm chart to get it running on any Kubernetes cluster and a managed ClickHouse Cloud service which can be deployed on Azure. The latter is also available through the Azure Marketplace. We collected more information on managing ClickHouse for Langfuse in our documentation.

Please let me know if you have further questions on either component.

cc @pamelafox

indranilr · 2024-07-01T19:02:07Z

indranilr
Jul 1, 2024

Would one then need to use managed services from AWS for Postgres, Redis and Clickhouse if hosting Langfuse as ECS containers ?

3 replies

marcklingen Jul 1, 2024
Maintainer

You will not need to use managed services. For low-mid volume deployments, you can use dockerized versions of Redis, Postgres and Clickhouse or run them all on a single VM via docker compose. See this comment for more details: https://github.com/orgs/langfuse/discussions/1902#discussioncomment-9729809

indranilr Jul 1, 2024

Are there any specific benchmarks or guidance on the criteria (number ranges) for classifying a deployment as low, mid or high volume (I'm assuming this is query TPS)?

marcklingen Jul 1, 2024
Maintainer

Not yet, we aim to provide guidance together with the v3 migration docs.

Core metrics:

TPS of API
Overall ingestion volume and order of magnitude of objects stored -> relevant for read-heavy APIs, can be approximated by #(Traces/Observations/Scores)/Timeframe
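The second metric can be estimated with simple arithmetic. The sketch below reads "#(Traces/Observations/Scores)/Timeframe" as the combined count of the three object types divided by the length of the window; that reading, and the function name, are assumptions rather than an official definition.

```python
from datetime import timedelta


def ingestion_rate(traces: int, observations: int, scores: int, window: timedelta) -> float:
    """Objects ingested per second over the given window."""
    seconds = window.total_seconds()
    if seconds <= 0:
        raise ValueError("window must be positive")
    return (traces + observations + scores) / seconds
```

For instance, 1,000 traces, 8,000 observations and 1,000 scores over 1,000 seconds gives 10 objects per second; comparing such a number across time windows gives a rough sense of the order of magnitude of stored objects.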

luciowu · 2024-07-16T06:11:19Z

luciowu
Jul 16, 2024

Happy to see Langfuse keeps evolving!
We deployed Langfuse in a k8s cluster on GCP. To meet our OLAP requirements, we built our own ETL pipeline (PG -> BigQuery) to extract the data to BigQuery.
We hope Clickhouse can be made a plugin so that we can keep our current deployment architecture unchanged.

0 replies

clemra · 2024-07-22T10:01:29Z

clemra
Jul 22, 2024
Maintainer

We've just updated the initial post with more information and guidance on the envisaged changes. Here attached in PDF format.
Langfuse_V3_Architecture.pdf

0 replies

abdullah-retorio · 2024-07-27T23:02:42Z

abdullah-retorio
Jul 27, 2024

Hello, AFAIK Redis is not open-source anymore. Does this mean we need a prod-licence of Redis to self-host on our VMs or can we still use their docker containers freely?

1 reply

clemra Jul 29, 2024
Maintainer

Hi @abdullah-retorio -- great point.

We've pinned the Redis version to ensure it's open source. We will swap it out with Valkey soon.

zxkxyz · 2024-08-25T18:51:17Z

zxkxyz
Aug 25, 2024

Any update on the timeline for V3 release? We're looking to deploy self-hosted Langfuse but are hesitant to proceed with V2 given V3 is right around the corner...

1 reply

marcklingen Aug 25, 2024
Maintainer

Thanks for asking, currently we run large pieces of the v3 infrastructure on our cloud service already to test the setup and provide sufficient guidance once rolling it out to everyone self-hosting with v3. I'd recommend getting started with v2 today, as the transition will be simple and v3 might take another 1-2 months on our side. I have updated this post accordingly.

Freezaa9 · 2024-09-25T15:40:49Z

Freezaa9
Sep 25, 2024

Hello,
So will the low-volume Langfuse V3 just be a container upgrade for an existing deployment on GCP using Cloud Run + Postgres, with no additional Redis or Clickhouse needed?

If they are needed, can we keep:
Langfuse in Cloud Run
Postgres in GCP Cloud SQL

and add Clickhouse and Redis in 2 separate Cloud Run containers?

In large organizations, it is a pain (sometimes not even possible) to deploy new services like that, and we don't need high volume yet. Some organizations don't have dedicated teams to manage K8s, and all new services need to pass through multiple committees (security, ...). VMs are not authorized because they are hard to manage, and serverless solutions are preferred.

Thanks in advance !

1 reply

marcklingen Sep 25, 2024
Maintainer

Thanks for sharing your thoughts!

We aim to facilitate a straightforward low-volume deployment of Langfuse. Currently, we assume this will be based on Docker Compose, which may not be suitable for you since it cannot run serverless. We will do more research on potential options, considering these constraints. However, we prefer not to have multiple distributions of the applications targeting different datastores, as this would slow down the project. We want to stay focused on executing the overall roadmap as the market and requirements evolve quickly.

chai3 · 2024-10-01T16:41:27Z

chai3
Oct 1, 2024

We will support security updates for V2 until at least end of the year 2024.

Please extend V2 security updates. For example, a few months after V3 GA.

0 replies

timoschd · 2024-10-03T13:48:26Z

timoschd
Oct 3, 2024

Hi, will caching of llm outputs like Helicone offers be included in v3 or is it in any way planned ? Your planned architecture is going to look quite similar to theirs.

3 replies

marcklingen Oct 4, 2024
Maintainer

As Langfuse is focused on async observability that is not in the critical path and does not add latency, caching is out of scope for now. I know some teams who use LiteLLM for this alongside Langfuse, especially for CI runs, which benefit the most from caching.

timoschd Oct 7, 2024

Thanks for your reply and hinting at LiteLLM. The path forward for Langfuse is now clearer for me.

marcklingen Oct 8, 2024
Maintainer

Yes, highly recommended if you like the proxy approach to extend llm tracing/logging to many projects within your org. Usually projects then shift to async logging via the langfuse integrations over time for more in-depth tracing capabilities

ninjaa · 2024-10-05T16:47:55Z

ninjaa
Oct 5, 2024

Hi, I work with a leading US enterprise AI consultancy.

We love Langfuse V2 because:

Data model really matches Metrics, Logging, Tracing & Evals excellently
Open source containerized approach means we don't need to go through complex procurement workflows when presenting PoCs and prototyping
Langfuse UX is really strong given the early stage - works for us for now
Any data we can't get out of the frontend we feel comfortable fetching from Postgres directly using SQL as a last resort
docker compose has us up and running locally and in stage on a VM in minutes. we were able to do an initial deploy on prod this way too, but expect to have to migrate to k8s.

Regarding the upgrade path, we have the following concerns:

Development environment: We run Langfuse locally during development. Currently V2 only requires PostgreSQL and Langfuse which is ideal in terms of resource usage while developing. An analysis of the memory footprint and processing needs of V2 vs V3 would be very helpful. Rapid local development is really important when prototyping AI apps.
Relative unfamiliarity with ClickHouse vs Postgres: ClickHouse seems to offer an SQL layer for querying and a JDBC endpoint for connecting to it, which should make data access straightforward with standard DB browsers & from the cli. Not tried it yet, but hopefully is straightforward enough. An analysis of the two schemas V2 & V3 and what it would take to get data out of Clickhouse would be useful.

Generally we support the evolution of Langfuse in the direction mentioned. It makes sense to add Redis and ClickHouse for latency and improved reporting and petabyte analytics scale. We are very excited to see what you guys have cooked up, but do keep our requirements in mind. We are really pleased with the small and nimble footprint of V2 for now and haven't run into any scaling issues yet.

Keep up the great work. Excited to be part of your user community!

1 reply

marcklingen Oct 8, 2024
Maintainer

Thanks for taking the time to share this!

Development environment: We run Langfuse locally during development. Currently V2 only requires PostgreSQL and Langfuse which is ideal in terms of resource usage while developing. An analysis of the memory footprint and processing needs of V2 vs V3 would be very helpful. Rapid local development is really important when prototyping AI apps.

Agree! Docker compose is super important for quick POCs and in environments where a production/staging deployment needs too much time due to restrictions within an organization. We will include guidance on this in the v3 documentation.

Relative unfamiliarity with ClickHouse vs Postgres: ClickHouse seems to offer an SQL layer for querying and a JDBC endpoint for connecting to it, which should make data access straightforward with standard DB browsers & from the cli. Not tried it yet, but hopefully is straightforward enough. An analysis of the two schemas V2 & V3 and what it would take to get data out of Clickhouse would be useful.

Clickhouse should be relatively easy to query from any db viewer/interface and the core db-schema stays relatively similar to the current one (which is close to open telemetry). I currently use tableplus to run tests on clickhouse and it works flawlessly.
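Besides GUI clients, ClickHouse also exposes an HTTP interface (port 8123 by default), so data can be pulled with nothing but the standard library. The sketch below only builds the request; the host, credentials and the `traces` table name are assumptions for illustration, since the v3 schema is not described in this thread.

```python
import urllib.parse
import urllib.request


def build_query_request(
    base_url: str, sql: str, user: str, password: str, database: str = "default"
) -> urllib.request.Request:
    """Build (but do not send) a POST request for ClickHouse's HTTP interface."""
    params = urllib.parse.urlencode(
        {"database": database, "default_format": "JSONEachRow"}
    )
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/?{params}",
        data=sql.encode("utf-8"),  # the SQL statement travels in the request body
        headers={"X-ClickHouse-User": user, "X-ClickHouse-Key": password},
        method="POST",
    )


# Hypothetical local instance and table name.
req = build_query_request(
    "http://localhost:8123", "SELECT count() FROM traces", "default", ""
)
# urllib.request.urlopen(req).read() would send it and return one JSON object per row.
```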

Generally we support the evolution of Langfuse in the direction mentioned. It makes sense to add Redis and ClickHouse for latency and improved reporting and petabyte analytics scale. We are very excited to see what you guys have cooked up, but do keep our requirements in mind. We are really pleased with the small and nimble footprint of V2 for now and haven't run into any scaling issues yet.

Thank you! Both being easy to get started and the right solution when projects scale is important to us. Currently Langfuse is doing very well on the former and v3 will solve for the latter.

secsilm · 2024-10-28T08:12:29Z

secsilm
Oct 28, 2024

Any update on the timeline for V3 release?

4 replies

marcklingen Oct 28, 2024
Maintainer

thanks for the ping on this, we currently plan with an initial release in late November.

barseghyanartur Nov 7, 2024

thanks for the ping on this, we currently plan with an initial release in late November.

With that in mind, for how long will you still support LangFuse 2?

aazbeltran Nov 7, 2024

According to the FAQ in this same discussion:

How long will there be security updates for v2? We will support security updates for V2 until at least end of the year 2024. If you require extended support, please reach out to us.

Please note that this was added to the discussion on July 25. Since then, there have been more modifications in the launch planning, and with two months left in the year, I don't know if this EOS will be moved.

marcklingen Nov 7, 2024
Maintainer

Thanks for the ping on this, updated the FAQ. We plan to offer security updates for v2 until end of Q1 of 2025 given the current release plan.

Steffen911 · 2024-11-15T15:21:22Z

Steffen911
Nov 15, 2024
Maintainer

Hey all,

We're reaching the final stages of v3 and run most of Langfuse Cloud on the new stack. Today, I want to share an update on the timelines for our self-hosters and our preliminary documentation. We are seeing big benefits for large-scale deployments and are excited to get all those into your hands as soon as possible.

Thank you everyone in this thread for all the questions and feedback we've already received.

Timeline and Scope

We intend to finalize the v3.0.0 release by the end of November and support docker compose and Kubernetes deployments via Helm from day one. Further deployment guides and options will follow soon after. You can preview the steps to deploy Langfuse v3 at https://langfuse.com/docs/deployment/v3. Keep in mind that Langfuse v3 is not production ready and the configuration and documentation can and will change without notice.

Migration

For existing Langfuse v2 self-hosters, we offer a migration guide at https://langfuse.com/docs/deployment/v3/migrate-v2-to-v3. For most users, the migration will entail provisioning new infrastructure components (ClickHouse, Redis, S3/Blob Store) and an additional Langfuse worker container. We created a Background Migration mechanism that should cover all data transitions from Postgres to ClickHouse transparently, i.e. we don't expect the migration to involve any manual steps on the data level.

Request for Feedback

At this point, we welcome any feedback on the documentation, the deployment guides, and the docker compose and helm chart configurations. Please create issues on GitHub or reply within this discussion for any feedback on those.

We are also looking for early adopters that want to give the new deployment options a go. If you are interested in deploying Langfuse v3 in an experimental setup, please give it a go and let us know how it went.

If you are interested in building the Langfuse v3 Helm Chart with us, you can reach out directly to me at steffen[at]langfuse.com or reply to this thread.

Best,
Steffen

18 replies

Steffen911 Nov 22, 2024
Maintainer

@AlexXi19 No, all traces, observations, and scores will be read from ClickHouse. So those views will have the same performance benefits as the dashboard.

If you're interested in trying V3 for yourself, we have our preview online: https://langfuse.com/docs/deployment/v3/overview. I'm happy to assist with the setup in case you have any questions around it.

AlexXi19 Nov 22, 2024

Awesome, thanks for the info 🙏 we're in the process of getting it up and running.

sapountzis Nov 25, 2024

Thank you for the detailed updates and documentation on v3. I have a question regarding deployment options, particularly for serverless environments.

Would it be possible to run the web container and worker container as separate serverless applications? Are there any anticipated challenges or limitations with such a setup?

Looking forward to your insights!

Steffen911 Nov 26, 2024
Maintainer

Hey @sapountzis,
Could you clarify what you mean by separate serverless applications?

Internally, we run both containers as separate services in AWS ECS on Fargate, i.e. in a fully serverless configuration. That works well as the web and worker container only communicate via our storage layers (S3, Redis, Postgres, ClickHouse) - there is no HTTP interface between them. Hence, as long as they both have access to the storage layers any deployment mechanism would work.

If you want to run them as serverless, Lambda-style functions (AWS Lambda, GCP Cloud Functions, ...), this could become more challenging. I expect that the web container would work almost straight away via an HTTP trigger, but we haven't tested whether you can trigger Lambda-style functions via BullMQ, which is our message queue implementation on Redis. If you find a way to do the latter, even a scale-to-zero cloud function approach should work for the web and the worker container.
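To illustrate the decoupling described above, here is a minimal sketch in which the web side and the worker side share only a queue and a store and never call each other. It is not Langfuse code: `queue.Queue` stands in for the Redis-backed queue, a plain dict stands in for the storage layers, and the function names and event fields are hypothetical.

```python
import queue


def web_ingest(event, jobs):
    """Web side: accept an event and enqueue it. No call to the worker."""
    jobs.put(event)
    return {"status": "accepted", "id": event["id"]}


def worker_drain(jobs, store):
    """Worker side: pull queued events and persist them until the queue is empty."""
    processed = 0
    while True:
        try:
            event = jobs.get_nowait()
        except queue.Empty:
            return processed
        store[event["id"]] = event
        processed += 1


jobs = queue.Queue()  # stand-in for the Redis-backed queue
store = {}            # stand-in for the storage layers
web_ingest({"id": "t1", "name": "chat"}, jobs)
web_ingest({"id": "t2", "name": "eval"}, jobs)
print(worker_drain(jobs, store))  # prints 2
```

Because the two functions touch only `jobs` and `store`, each can run in its own process or service as long as both can reach the shared backends. The open question for scale-to-zero functions is what wakes the worker when a job arrives, which this sketch sidesteps by calling `worker_drain` directly.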

Does this address your question?

sapountzis Nov 26, 2024

Thank you for the detailed explanation. That’s exactly what I was looking to clarify. There was some confusion from others suggesting that a VM was required, but your response makes it clear that as long as managed services are used for Redis, Clickhouse, Postgres, and S3, the worker and web containers can indeed be separated and deployed on serverless compute.

I appreciate the insight about the potential challenges with using a scale-to-zero cloud function approach for the worker container due to the bullmq implementation. This gives a much clearer understanding of the flexibility and constraints.

