I gave this talk at RailsConf 2020 Couch Edition, and this blog article contains the source code and a step-by-step tutorial. The talk and tutorial are inspired by my work on influxdb-rails and contain several patterns and learnings from it.
By the end of this tutorial, we will have a very basic Performance Monitoring tool which measures the response and SQL query times of a Rails application.
Building a Performance Analytics Tool with ActiveSupport
Setting up a performance analytics tool like NewRelic or Skylight is one of the first things many developers do in a new project. However, have you ever wondered how these tools work under the hood?
In this talk, we will build a basic performance analytics tool from scratch. We will take a deep dive into ActiveSupport instrumentation to collect, group and normalize metrics from your app. To persist these metrics, we will use a time series database and visualize the data on a dashboard. By the end of the talk, you will know how your favorite performance analytics tool works.
As a first step, we need to implement a notification framework which we can use to instrument and subscribe to performance metrics.
A basic example of the functionality of our notification framework would be this code:
scenario "It can susbcribe and instrument events" do
events = []
ActiveMonitoring::Notifications.subscribe("test_event") do |name, start, finish, id, payload|
events << { name: name, start: start, finish: finish, id: id, payload: payload }
end
ActiveMonitoring::Notifications.instrument("test_event", payload: :payload) do
1 + 1
end
expect(metrics[0].name).to eq("test_event")
expect(metrics[0].payload).to include(payload: :payload)
end
The implementation can be found in commit 32c3. For the sake of simplicity, this framework is, of course, not thread-safe.
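To illustrate the idea, here is a minimal sketch of such a pub/sub framework that satisfies the test above. It is simplified and, like the real one, not thread-safe; the actual implementation lives in commit 32c3.

```ruby
require "securerandom"

module ActiveMonitoring
  module Notifications
    # Registered subscriber blocks, keyed by event name.
    def self.subscribers
      @subscribers ||= Hash.new { |hash, key| hash[key] = [] }
    end

    def self.subscribe(name, &block)
      subscribers[name] << block
    end

    # Measure the given block and notify all subscribers of the event.
    def self.instrument(name, payload = {})
      start = Time.now
      result = yield
      finish = Time.now
      subscribers[name].each do |subscriber|
        subscriber.call(name, start, finish, SecureRandom.hex(5), payload)
      end
      result
    end
  end
end
```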
This framework is inspired by ActiveSupport Notifications, which already provides various hooks into Rails internals. The implementation of ActiveSupport Notifications can be found here.
ActiveSupport Notifications already provides several hooks to subscribe to, e.g. the processing of controller actions, SQL queries or view rendering. However, Rails does not provide hooks for external libraries like Redis or Memcached. In ef20 and 46c3 we monkey patch Rails so we can subscribe to controller actions and SQL queries.
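For comparison, this is roughly how subscribing to two of the built-in Rails hooks looks with ActiveSupport::Notifications; the event names and payload keys are part of the public Rails instrumentation API, and the duration calculation is simplified for illustration.

```ruby
# Fired once per processed controller action.
ActiveSupport::Notifications.subscribe("process_action.action_controller") do |name, start, finish, id, payload|
  duration_ms = (finish - start) * 1000
  Rails.logger.info "#{payload[:controller]}##{payload[:action]} took #{duration_ms.round(1)} ms"
end

# Fired for every SQL query executed by ActiveRecord.
ActiveSupport::Notifications.subscribe("sql.active_record") do |name, start, finish, id, payload|
  Rails.logger.info "SQL (#{payload[:name]}): #{payload[:sql]}"
end
```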
In chapter two, we discuss an appropriate storage engine for our data. Our data would look, for example, like this:
| Timestamp | Hook | Location | Request ID | Value |
|---|---|---|---|---|
| 01.04.2020 00:31:30 CET | process_action | BooksController#show | cf6d6dcd | 100 ms |
| 01.04.2020 00:32:30 CET | sql | BooksController#show | c7c7c841 | 80 ms |
This data looks tabular, so using a relational database to store it would make sense. However, there are several downsides to this approach.
Relational databases perform very well for applications which map to the basic CRUD operations of the database, like a typical Rails controller:
```ruby
class BooksController < ApplicationController
  def new; end
  def create; end
  def show; end
  def edit; end
  def update; end
  def destroy; end
end
```
However, our plugin does not really use all of these traditional database operations, or it uses them in a different way than a normal Rails app would:

- For each request, we will write one process_action and several SQL query data points. If we extend this application to also cover view render times or background jobs, it's not unlikely we would write 10+ metrics per request. Depending on how many requests your app processes, we could easily create millions of metrics per year.
- We will read the metrics in a dashboard for a certain time range, like today or the last three hours, but never randomly.
- We will almost never go back to update one of the metrics.
- We will only delete metrics in batches, not individually. For instance, if we are only interested in metrics from the last three months, we can run a 'garbage collection' once a month and delete everything older than that.
With all this said, a relational database might not be the best storage engine for this kind of data:

- We will write heavily to the database, which would frequently create new pages in the B-tree structures most relational databases use internally
- We would need to implement some sort of garbage collection to remove old data
- Compression of the data would not be very efficient
For this kind of data, a log-based storage engine, like most time series databases implement, would be more efficient. However, for the sake of simplicity, we store our metrics in a relational database in this tutorial. The implementation can be found in 5967.
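As a sketch, a migration for such a metrics table could look like the following; the table and column names are my assumptions for illustration, the actual schema is in 5967.

```ruby
class CreateMetrics < ActiveRecord::Migration[6.0]
  def change
    create_table :metrics do |t|
      t.datetime :timestamp, null: false
      t.string :hook, null: false  # e.g. "process_action" or "sql"
      t.string :location           # e.g. "BooksController#show"
      t.string :request_id
      t.float :value               # duration in milliseconds
    end

    # Dashboards query by time range, so an index on the timestamp is essential.
    add_index :metrics, :timestamp
  end
end
```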
In the previous chapter we already stored the controller metrics. Before we can store SQL metrics, we need to do some additional work.

ActiveRecord does a lot of 'magic' behind the scenes to abstract persisting objects to the database. Some of this 'magic' requires executing additional database queries, for instance to determine:

- The current version of the database engine
- Which migrations have already run
- Which environment is used

We're only interested in storing application metrics, so we need to filter these queries out. The implementation can be found in 156d.
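For illustration: ActiveRecord tags its internal housekeeping queries with the name "SCHEMA" in the event payload, so a minimal filter (a sketch, not the exact implementation from 156d) could look like this:

```ruby
ActiveSupport::Notifications.subscribe("sql.active_record") do |name, start, finish, id, payload|
  # Skip ActiveRecord's internal housekeeping queries.
  next if payload[:name] == "SCHEMA"

  # ... store the metric for application queries only
end
```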
Depending on the database adapter you use, the queries might contain literal values. To group identical queries together, we need to normalize them.
```sql
SELECT * FROM books WHERE id = 1;
SELECT * FROM books WHERE id = 2;
```
For our performance monitoring tool, these two queries should be treated the same, so we need to normalize them to:
```sql
SELECT * FROM books WHERE id = xxx;
```
In 66a8 we implement a simple query normalizer.
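A minimal sketch of what such a normalizer could look like, assuming a QueryNormalizer module (the actual implementation is in 66a8):

```ruby
module QueryNormalizer
  # Replace literal values and bind placeholders so identical queries
  # with different values are grouped together.
  def self.call(sql)
    sql
      .gsub(/'[^']*'/, "xxx") # string literals
      .gsub(/\b\d+\b/, "xxx") # numeric literals
      .gsub(/\$\d+/, "xxx")   # PostgreSQL bind placeholders
  end
end

QueryNormalizer.call("SELECT * FROM books WHERE id = 1")
# => "SELECT * FROM books WHERE id = xxx"
```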
ActiveRecord is a standalone framework; we can use it, for instance, in migrations or background jobs, so it is perfectly valid to use it outside of a request-response cycle. Therefore, the payload of the SQL event does not contain a request_id. However, in our Performance Monitoring tool we would like to group requests and SQL queries together so we can see if a query causes a slow response.

Luckily, we also implemented a start_processing event in ef20. We can now subscribe to this event and set the request_id and location in a cache which we later read when writing the SQL metrics. In b1f1 we implement a CurrentAttributes class and eventually write the SQL metrics. ActiveSupport ships with CurrentAttributes out of the box since Rails 5.2.
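A minimal sketch of how this could fit together, assuming the Metric model from above and assuming our monkey-patched start_processing event carries the controller, action and request id in its payload (the actual implementation is in b1f1):

```ruby
class CurrentRequest < ActiveSupport::CurrentAttributes
  attribute :request_id, :location
end

# When a request starts, remember its context.
ActiveMonitoring::Notifications.subscribe("start_processing") do |_name, _start, _finish, _id, payload|
  CurrentRequest.request_id = payload[:request_id]
  CurrentRequest.location   = "#{payload[:controller]}##{payload[:action]}"
end

# When a query runs, read the context back and write the metric.
ActiveMonitoring::Notifications.subscribe("sql") do |_name, start, finish, _id, payload|
  Metric.create!(
    timestamp:  start,
    hook:       "sql",
    location:   CurrentRequest.location,
    request_id: CurrentRequest.request_id,
    value:      (finish - start) * 1000
  )
end
```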
We now have several metrics written to our database, which would look something like this:
| Timestamp | Hook | Location | Request ID | Value |
|---|---|---|---|---|
| 01.04.2020 00:31:30 CET | process_action | BooksController#show | cf6d6dcd | 100 ms |
| 01.04.2020 00:32:30 CET | sql | BooksController#show | c7c7c841 | 80 ms |
It would be very hard to spot problems or discover patterns in raw rows like these, so eventually we need to visualize our collected data. In cfca we implement a dashboard to show percentiles and the slowest queries of our Rails app. A very good blog article about data visualization for performance metrics by Richard Schneeman can be found here.
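For instance, a simple nearest-rank percentile over the stored response times could be computed like this (a sketch; the actual dashboard code lives in cfca):

```ruby
# Nearest-rank percentile: sort the values and pick the matching rank.
def percentile(values, pct)
  sorted = values.sort
  rank = (pct / 100.0 * (sorted.length - 1)).round
  sorted[rank]
end

durations = Metric.where(hook: "process_action").pluck(:value)
percentile(durations, 95) # => 95th percentile response time in ms
```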
In a real-world application, I would strongly recommend using dashboard software like Grafana.
Congratulations, we implemented a very basic Performance Monitoring tool in just a few hundred lines of code. We took a deep dive into ActiveSupport Notifications and hooked into Rails events to write request and SQL query metrics to our data storage. As data storage, we compared relational databases with time series databases like InfluxDB. Before we could visualize our data, we needed to clean, normalize and group the metrics.
From here, we can easily add more metrics like view rendering, caching or background jobs.
As initially mentioned, this tutorial is heavily influenced by our work on https://github.com/influxdata/influxdb-rails. If this made you curious, we are always looking for new contributors.
- Profiling and Benchmarking 101 by Nate Berkopec
- Lies, Damned Lies, and Averages: Perc50, Perc95 explained for Programmers by Richard Schneeman
- https://github.com/influxdata/influxdb-ruby
- https://github.com/influxdata/influxdb-rails
- https://docs.influxdata.com/influxdb/v1.7/concepts/storage_engine/
The gem is available as open source under the terms of the MIT License.