-
-
Notifications
You must be signed in to change notification settings - Fork 431
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* 📝 first draft blog * 📝 article update * 📝 blog * 📝 improve blog post * 📝blog post improvment * 📝blog post improvment
- Loading branch information
1 parent
917979b
commit 99d3a10
Showing
5 changed files
with
117 additions
and
1 deletion.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,113 @@ | ||
--- | ||
title: "Building OpenStatus: A Deep Dive into Our Infrastructure Architecture" | ||
description: | ||
Let's deep dive in the infra behind OpenStatus. Learn how we built it and how we host it. | ||
author: | ||
name: Thibault Le Ouay Ducasse | ||
url: https://bsky.app/profile/thibaultleouay.dev | ||
publishedAt: 2024-12-29 | ||
tag: engineering | ||
image: /assets/posts/infra-openstatus/tech-infra.png | ||
--- | ||
|
||
## Infrastructure Overview | ||
|
||
OpenStatus is a synthetic monitoring platform designed with resilience, scalability, and efficiency in mind. | ||
Our users rely on us to provide real-time insights into their service health, making it essential to maintain a robust and performant infrastructure. | ||
|
||
In this post, we'll take a deep dive into our infrastructure architecture, exploring the key components, managed services, and design principles that power OpenStatus. | ||
|
||
## Application Landscape | ||
|
||
Our platform consists of several interconnected applications, each designed for a specific purpose: | ||
|
||
1. **Frontend Ecosystem**: | ||
- A NextJS application that powers our marketing site, user dashboard, and status page hosted on [Vercel](https://vercel.com/). | ||
- An Astro + Starlight-powered documentation application hosted on [Cloudflare Pages](https://pages.cloudflare.com/). | ||
|
||
We chose Vercel for the Next.js application because it performs exceptionally well there, the DX is great. And we selected Cloudflare Pages for the documentation since it is a static site and it's super cheap. | ||
|
||
2. **Backend Infrastructure** All our backend services are hosted on Fly.io. | ||
- API server: Our public API and our alerting engine | ||
- Probes/Checker: a golang app deployed globally to monitor your service | ||
- Screenshot app: a service that takes screenshot of your website when we detect an downtime (Playwright) | ||
- Workflow engine: a server that handles the workflow of alerting, and our internal workflows (email automation). | ||
|
||
We chose Fly.io for our backend services because it's a great platform for deploying globally distributed services. It's also very easy to deploy and manage. | ||
We are planning to add more providers (e.g. Koyeb) to our probes to have a more resilient system. | ||
|
||
<Image | ||
alt="Hosting providers" | ||
src="/assets/posts/infra-openstatus/hosting.png" | ||
width={650} | ||
height={575} | ||
/> | ||
|
||
|
||
## Managed Services | ||
|
||
We also rely heavily on managed services to avoid handling it ourselves. Here are the services we use: | ||
|
||
### Scheduling | ||
|
||
Recognizing the critical nature of monitoring, we've heavily rely on CRON to ensure timely checks: | ||
|
||
- **Cron Jobs**: Currently using Vercel Cron, with plans to migrate to Google Cron for an enhanced user experience (better UI e.g. we can see when the cron ran, retry policy). | ||
|
||
|
||
### Queue Architecture | ||
|
||
Due to the critical nature of checks, we are using a queue to handle task processing and retry logic: | ||
|
||
Every check is pushed to a queue and processed by our probes. If the probe fails to process the check, it is retried 3 times before being marked as failed. | ||
|
||
- **Job Queue**: Google Task Queues provide our distributed task management, with strategically segmented queues for different check frequencies | ||
|
||
We've implemented a granular queue system to ensure efficient task processing, each queue is dedicated to a specific check frequency (e.g. every minute, every 10 minutes). | ||
|
||
|
||
<Image | ||
alt="Queue providers" | ||
src="/assets/posts/infra-openstatus/queue.png" | ||
width={650} | ||
height={575} | ||
/> | ||
|
||
|
||
### Data Infrastructure | ||
|
||
We also don't want to handle the data infrastructure by ourselves. We rely on managed services for that: | ||
|
||
- **Primary Database**: [Turso](https://turso.tech?ref=openstatus.dev), providing a cost efficient data storage solution. We love the fact that's it's hosted SQLite database. It's just a file we can embedded in our services and sync it periodically. | ||
- **Analytics Database**: [Tinybird](https://www.tinybird.co?ref=openstatus.dev), enabling complex analytical queries and insights. | ||
|
||
## Design Philosophy | ||
|
||
Our infrastructure design is driven by several key principles: | ||
|
||
- **Resilience**: Ensuring high availability and fault tolerance | ||
- **Scalability**: Architectural choices that allow seamless growth | ||
- **Cost-Efficiency**: Leveraging managed services and cloud credits | ||
- **Performance**: Optimizing each component for maximum efficiency. | ||
|
||
|
||
## How much does it cost us? | ||
|
||
Our current monthly cost is around $328. This includes: | ||
|
||
- Vercel: $40, we are two members in the team, so we had to upgrade to the team plan. | ||
- Fly.io: $154 36*4 (all our probes at $4 average, not all regions cost the same) + $10 (for the api server) | ||
- Google Cloud Platform: $0 (We are still using the free credits, but we expect to pay around $50 for the queue) | ||
- Tinybird: $100 | ||
- Turso: $29 | ||
- Cloudfare: $5 | ||
|
||
|
||
|
||
# Conclusion | ||
|
||
Building a resilient synthetic monitoring platform is hard. It's not just a $5 VPS that you can deploy and forget. It requires a more complex infrastructure to be able to provide a reliable service. | ||
|
||
The drawback of this approach is the complexity of providing an easy self hostable services. Which is annoying because we are an open-source project and we want to provide a self-hostable version of OpenStatus. But we are working on community edition that will be easier to deploy. | ||
|
||
*Want to start monitoring your services with OpenStatus? [Sign up for free](https://www.openstatus.dev/app/login?ref=blogpost-infra) and get started today!* |