We're changing database #408
Small update, as I forgot to include this in the main issue: we previously supported direct connections to the database using the PostgreSQL wire protocol, meaning you could connect with psql, pgcli or pandas, but also with BI tools that "talk postgres" like Tableau, Google Looker Studio, Metabase etc. (Side note: it wasn't actually a direct connection, but rather a pg wire protocol proxy we wrote which checked the query AST for functions we didn't want to call, like ….)

We've had to temporarily switch this off while we migrate to Fusionfire. Instead, we're allowing users to query their data with SQL using an HTTP API (data can be returned as Arrow IPC, JSON or CSV), see #405; this should be available to use in the next few days.

We aim to reimplement the PG wire protocol connections with Fusionfire in a few months. The hardest bit will be getting the information schemas to exactly match Postgres so that the very complex schema introspection queries run by BI tools and pgcli work correctly. If you need this feature urgently, please let us know.
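To make the SQL-over-HTTP API concrete, here is a minimal sketch of assembling such a request. The endpoint URL, parameter name and auth scheme below are assumptions for illustration, not the documented interface; see #405 for the real details.

```python
# Hypothetical sketch of calling the SQL-over-HTTP API described above.
# The URL, the `sql` parameter name and the bearer-token auth are
# assumptions, not the documented Logfire API.

def build_query_request(sql: str, fmt: str = "json") -> dict:
    """Assemble request parameters; `fmt` picks the response encoding."""
    accept = {
        "json": "application/json",
        "csv": "text/csv",
        # Arrow IPC streaming format
        "arrow": "application/vnd.apache.arrow.stream",
    }[fmt]
    return {
        "url": "https://logfire-api.pydantic.dev/v1/query",  # hypothetical
        "params": {"sql": sql},
        "headers": {"Accept": accept, "Authorization": "Bearer <read-token>"},
    }

# A client such as `requests` would then do:
#   requests.get(**build_query_request("SELECT count(*) FROM records"))
```

Keeping the format in the `Accept` header (rather than the URL) is just one plausible design; the point is that one SQL endpoint can serve JSON for quick scripts, CSV for spreadsheets, and Arrow IPC for dataframe libraries.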
Well, congratulations first of all! (Though I'd call it logfusion 🪵⚛️)
@MarcoGorelli worked on a lot of these features for Polars: I don't know if he can contribute, but he's a bit of a Time(zone) lord.
Thanks for reporting @frankie567, I've moved that to #433.
We've been fully switched to the new database for a while now.
Rollout
We're gradually rolling out queries to the new database now. If you're affected, you'll see a banner like this:
If you notice queries taking longer or returning errors or different results, please let us know below or contact us via email or Slack.
If you need to continue querying the old database, you can do so by right-clicking on your profile picture in the top right and setting the query engine to 'TS' (Timescale, the old database):
To get rid of the warning banner, set the query engine to 'TS' and then back to 'FF' (FusionFire, the new database) again.
We will be increasing the percentage of users whose default query engine is FF over time and monitoring the impact. We may decrease it again if we notice problems. If you set a query engine explicitly to either TS or FF, this won't affect you. Otherwise, your query engine may switch back and forth. For most users, there shouldn't be a noticeable difference.
Most queries should be faster with FF, especially if they aggregate lots of data over a long time period. If your dashboards were timing out before with TS, try using FF. However some specific queries that are very fast with TS are slower with FF. In particular, TS can look up trace and span IDs almost instantly without needing a specific time range. If you click on a link to a trace/span ID in a table, it will open the live view with a time range of 30 days because it doesn't know any better. If this doesn't load, reduce the time range.
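Until FF can look up IDs without a time range, one workaround is to constrain the range in SQL yourself. A minimal sketch below: `start_timestamp` is taken from the expressions quoted later in this issue, while the `records` table and `trace_id` column names are assumptions for illustration.

```python
# Sketch: when looking up a trace ID on FF, pass an explicit narrow time
# window instead of relying on the 30-day default. `records` and
# `trace_id` are placeholder names, not confirmed schema.
from datetime import datetime, timedelta, timezone


def trace_lookup_sql(
    trace_id: str,
    around: datetime,
    window: timedelta = timedelta(hours=1),
) -> str:
    """Build a lookup query bounded to `around` +/- `window`."""
    start = (around - window).isoformat()
    end = (around + window).isoformat()
    return (
        "SELECT * FROM records "
        f"WHERE trace_id = '{trace_id}' "
        f"AND start_timestamp BETWEEN '{start}' AND '{end}'"
    )


sql = trace_lookup_sql("abc123", datetime(2024, 8, 28, tzinfo=timezone.utc))
```

The narrower the `BETWEEN` bounds, the less data FF has to scan, which is exactly why the 30-day default can time out where a one-hour window does not.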
Summary
We're changing the database that stores observability data in the Logfire platform from Timescale to a custom database built on Apache Datafusion.
This should bring big improvements in performance, but will lead to some SQL compatibility issues initially (details below).
Background
Timescale is great, it can be really performant when you know the kind of queries you regularly run (so you can set up continuous aggregates) and when you can enable their compression features (which both save money and make queries faster).
Unfortunately we can't use either of those features:
Earlier this year, as the volume of data the Logfire platform received increased in the beta, these limitations became clearer and clearer.
The other more fundamental limitation of Timescale was their open/closed source business model.
The ideal data architecture for us (and any analytics database, I guess) is separated storage and compute: data is stored in S3/GCS as parquet (or equivalent), with an external index used by the query/compute nodes. Timescale has this, but it's completely closed source. So we can either get a scalable architecture but be forced to use their SaaS, or run Timescale as a traditional "coupled storage and compute" database ourselves.
For lots of companies either of those solutions would be satisfactory, but if Logfire scales as we hope it does, we'd be scuppered with either.
Datafusion
We settled on Datafusion as the foundation for our new database for a few reasons (see e.g. `datafusion-functions-json`). Since starting to use datafusion, our team has contributed 20 or 30 pull requests to datafusion and associated projects like `arrow-rs` and `sqlparser-rs`.
Transition
For the last couple of months we've been double-writing to Timescale and Fusionfire (our cringey internal name for the new datafusion-based database), working on improving reliability and performance of Fusionfire for all types of queries.
Fusionfire is now significantly (sometimes >10x) faster than Timescale for most queries. There are a few low-latency queries on very recent data which are still faster on Timescale that we're working on improving.
Currently, the live view, explore view, dashboards and alerts use Timescale by default. You can try Fusionfire now for everything except alerts by right-clicking on your profile picture in the top right and selecting "FF" as the query engine.
In the next couple of weeks we'll migrate fully to Fusionfire and retire timescale.
We're working hard to make Fusionfire more compatible with PostgreSQL (see apache/datafusion-sqlparser-rs#1398, apache/datafusion-sqlparser-rs#1394, apache/datafusion-sqlparser-rs#1360, apache/arrow-rs#6211, apache/datafusion#11896, apache/datafusion#11876, apache/datafusion#11849, apache/datafusion#11321, apache/arrow-rs#6319, apache/arrow-rs#6208, apache/arrow-rs#6197, apache/arrow-rs#6082, apache/datafusion#11307), but there are still a few expressions which currently don't run correctly (a lot related to intervals):
- `generate_series('2024-08-28 00:00:00'::timestamptz, '2024-08-28 00:00:60'::timestamptz, INTERVAL '10 seconds')`
- `3 * interval '10 seconds'`
- `end_timestamp - interval '1 second' > start_timestamp` (will be fixed by "Fix `INTERVAL` parsing to support expressions and units via dialect", apache/datafusion-sqlparser-rs#1398)
- `extract(seconds from end_timestamp - start_timestamp)` (`second` without the trailing `s` works, thanks to "allow `DateTimeField::Custom` with `EXTRACT` in Postgres", apache/datafusion-sqlparser-rs#1394)
- Functions like `jsonb_array_elements` aren't available yet

If you notice any other issues, please let us know on this issue or a new issue, and we'll let you know how quickly we can fix it.
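Until the interval fixes land, one stopgap is to do the interval arithmetic client-side and emit a plain literal, which the failing `3 * interval '10 seconds'` example suggests the current parser should accept. A hedged sketch, assuming only that simple `interval '<n> seconds'` literals parse:

```python
# Client-side workaround: fold expressions like `n * interval '...'` into a
# single literal, so the query avoids the interval arithmetic that the
# parser currently rejects.
from datetime import timedelta


def interval_literal(td: timedelta) -> str:
    """Render a timedelta as a plain SQL interval literal in seconds."""
    return f"interval '{int(td.total_seconds())} seconds'"


# Instead of sending `3 * interval '10 seconds'`:
folded = interval_literal(3 * timedelta(seconds=10))  # "interval '30 seconds'"
```

Doing the multiplication in the client rather than in SQL sidesteps the parser entirely, at the cost of losing the self-documenting `n * unit` form in the query text.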