Know if you are running high-performing software development teams by making it clear how individual products (i.e. services, APIs, systems...) line up with the DORA metrics.
Dorametrix is a Node.js-based service that helps you calculate your DORA metrics, by inferring your metrics from events you can create manually or with webhooks. It has pre-baked support for push and incident events from GitHub Actions, Bitbucket and Jira (only incidents!), and can be run "as is" as a web service, too.
🏖️ It's super easy to get started with, comes pre-packaged and needs just a tiny bit of fiddling with webhooks and settings on your end: Simply put, the easiest way you can start using DORA metrics in a scalable way today!
✋ ℹ️ Credit where credit is due: This project is strongly influenced by Google Cloud's Four Keys
project. The Four Keys project is great but is, for obvious reasons, Google Cloud-oriented. It also uses a SQL database and ETL pattern which is less than ideal from a serverless perspective. In general, the approach and the structuring and nomenclature is the same here. Most interestingly, that Dorametrix is better decoupled from the specifics of any one CI tool, and uses DynamoDB (NoSQL) instead of BigQuery.
At its heart, Dorametrix is a serverless web service that collects and represents specific delivery-related events that you send to it, which are then stored in a database. As a user you can request these metrics, which are calculated from the same stored events.
For more exact information, see the section "What are the DORA metrics and how does Dorametrix calculate them?" further down.
The most basic data type we have is the Event. These are internally created from inputs to the service. For example, when you push a commit, an Event is added to the database. The event will contain, for example, information like the commit SHA, time of the commit, and similar. We keep the events indefinitely so we can have a complete record of all Dorametrix events that have occurred.
Dorametrix will also (on its own) add individual domain-specific records for (respectively) a Change, Deployment, or Incident, based on the incoming data. This is so we can easily follow up on those typologies and make the desired calculations.
- Changes essentially correspond to
push
-type lifecycle events, since these represent commits. - Deployments are added by you manually, as a separate activity at the end of a deployment pipe.
- Incidents are special, and more complex, since they both have to be raised and (later) be resolved. This can be done manually by calling Dorametrix but it's highly recommended (and much more practical) to let your issue tracker send such events by webhook based on actual user interactions with issues or tickets.
As it stands currently, Dorametrix is implemented in an AWS-oriented manner. This should be fairly easy to modify so it works with other cloud platforms and with other persistence technologies. If there is sufficient demand, I might add extended support. Or you do it! Just make a PR and I'll see how we can proceed.
Please see the generated documentation site for more detailed information.
- Recent Node.js (ideally 18+) installed.
- Amazon Web Services (AWS) account with sufficient permissions so that you can deploy infrastructure. A naive but simple policy would be full rights for CloudWatch, Lambda, API Gateway, DynamoDB, X-Ray, and S3.
- Ideally some experience with Serverless Framework as that's what we will use to deploy the service and infrastructure.
- You will need to deploy the stack prior to working with it locally as it uses actual infrastructure even in local mode.
Clone, fork, or download the repo as you normally would. Run npm install --force
.
The below commands are the most critical ones. See package.json
for more commands! Substitute npm
for yarn
or whatever floats your boat.
npm start
: Run Serverless Framework in offline modenpm test
: Run tests on the codebasenpm run deploy
: Deploy with Serverless Frameworknpm run build
: Package and build the code with Serverless Frameworknpm run teardown
: Removes the deployed stack
You can set certain values in serverless.yml
.
custom.config.accountNumber
: Your AWS account number.custom.config.apiKey
: The "API key" or authorization token you want to use to secure your service.
Note that all unit tests use a separate authorization token that you don't have to care about in regular use.
custom.config.maxDateRange
: This defaults to30
which is a reasonable default.custom.config.maxLifeInDays
: This defaults to90
but can be changed.custom.config.tableName
: This defaults todorametrix
but can be changed.
This is highly recommended but not strictly necessary, though it will become quite a hassle if you do not automate incident handling.
Webhooks are fired automatically from your chosen tool upon your selected events being fired in them. This makes the practical integration very easy to set up.
Create a webhook; see this guide if you need instructions.
Add your Dorametrix endpoint URL ("add event" path; default {API_URL}/event?authorization=API_KEY
), set the content type to application/json
and select the event types Issues
and Push
.
Create a webhook; see this guide if you need instructions.
Add your Dorametrix endpoint URL ("add event" path; default {API_URL}/event?authorization=API_KEY
) and select the event types Repository:Push
, Issue:Created
and Issue:Updated
.
Create a webhook; see this guide if you need instructions.
Add your Dorametrix endpoint URL ("add event" path; default {API_URL}/event?authorization=API_KEY
) and select the event types Issue:created
, Issue:updated
, Issue:deleted
.
Note: Because there is no mapping between the Jira project and a given repository you need to manually provide that context through Jira. Dorametrix will assume a custom field on any issue: The ideal option is the URL field type where the value must conform to either a GitHub or Bitbucket repository URL, i.e. https://github.com/SOMEORG/SOMEREPO
or https://bitbucket.org/SOMEORG/SOMEREPO/
format. The actual name and description of the field does not matter.
If there is a need to support additional Git hosts, then please raise an Issue and explain what the authorative URL format is for your tool.
Create a webhook; see this guide if you need instructions.
Add your Dorametrix endpoint URL ("add event" path; default {API_URL}/event?authorization=API_KEY
Create Shortcut API Token
Create an incident label:
- Creating Shortcut Labels
- The Shortcut Label Page shows the label id in the url
Note: You must pass the following options on your first deployment. Refer to Deployment or update the serverless.yaml
file options. Failing to set these options will result in a runtime error!
custom.config.shortcut.shortcutApiToken
: Shortcut API token to load story details.custom.config.shortcut.shortcutIncidentLabelId
: Label ID to identify incidents over changes.
custom.config.shortcut.shortcutRepoName
: Repository name to be used, defaults tounknown
.
The current version of Dorametrix does not have built-in support for GitHub webhook secrets, but if there is sufficient demand I might add such support.
Note that Bitbucket Cloud and Jira do not have support for webhook secrets: https://jira.atlassian.com/browse/BCLOUD-14683.
The approach used in Dorametrix is instead to make the best of the situation and require an authorization
query string parameter with a custom authorization token. This then gets verified by a Lambda Authorizer function.
All GET requests require that same token but in a more practical Authorization
header.
This approach adds a minimal security measure but is flexible enough to also work effortlessly with any integration tests you might want to run. At the end of the day an acceptable compromise solution, I hope.
Run npm start
.
Note that it will attempt to connect to a database, so deploy the application and infrastructure before any local development.
First make sure that you have a fallback value for your AWS account number in serverless.yml
, for example: awsAccountNumber: ${opt:awsAccountNumber, '123412341234'}
or that you set the deployment script to use the flag, for example npx sls deploy --awsAccountNumber 123412341234
.
If using Shortcut for ticket managmenet you must pass additional options. Refer to the Shortcut Configuration section.
--shortcutApiToken <guid>
--shortcutIncidentLabelId 1234
For example:
npx sls deploy --awsAccountNumber 123412341234 --shortcutApiToken 1232-1234-1234 --shortcutIncidentLabelId 1234
Then you can deploy with npm run deploy
.
Dorametrix uses mikrolog and mikrometric for logging and metrics respectively.
Logs will have a richly structured format and metrics for cached and uncached reads will be output to CloudWatch Logs (using Embedded Metrics Format, under the covers).
You can create deployments either manually, with the provided script, or use ready-made actions or pipes to abstract that part for you.
Download the deployment.sh
script in this repository. In your CI script, call the script at the end of your deployment, for example:
bash deployment.sh "$ENDPOINT" "$API_KEY" "$PRODUCT"
As seen above, the required inputs are:
- The endpoint
- The API key
- The product name
An example using two user-provided secrets for endpoint (DORAMETRIX_ENDPOINT
) and API key (DORAMETRIX_API_KEY
):
steps:
- name: Checkout
uses: actions/checkout@v3
with:
fetch-depth: 0
- name: Dorametrix
uses: mikaelvesavuori/dorametrix-action@v1.0.0
with:
endpoint: ${{ secrets.DORAMETRIX_ENDPOINT }}
api-key: ${{ secrets.DORAMETRIX_API_KEY }}
A full example is available at https://github.com/mikaelvesavuori/demo-dorametrix-action.
The specific action, mikaelvesavuori/dorametrix-action@v1.0.0
, is available for use.
An example using two user-provided secrets and setting the product with a known variable representing the repo name:
- step:
name: Dorametrix
script:
- pipe: docker://mikaelvesavuori/dorametrix-pipe:1.0.0
variables:
ENDPOINT: '$ENDPOINT'
API_KEY: '$API_KEY'
PRODUCT: '$BITBUCKET_REPO_SLUG'
A full example is available at https://github.com/mikaelvesavuori/demo-dorametrix-pipe.
The specific action, docker://mikaelvesavuori/dorametrix-pipe:1.0.0
, is available for use but is highly limited when it comes to pulling it (since it is hosted on a free plan). It's therefore highly recommended that you host or push your own image if you are within the Bitbucket + Docker Hub infrastructure!
See below for the tool-specific conventions.
- Open incident: Create a GitHub Issue with an
incident
label. - Close incident: Close the Issue or unlabel the
incident
label.
- Open incident: Create a Bitbucket Issue with a
bug
label. - Close incident: Close the Issue or unlabel the
bug
label.
- Open incident: Create a Jira Issue with an
incident
label. - Close incident: Close the Issue or unlabel the
incident
label.
Shortcut does not have a specific incident label and can be any label
- Open incident: Create a Story or label with an
incident
label. - Close incident: Close a Story or unlabel the
incident
label.
Quotes from a blog post on Google Cloud.
The period that is taken into account is the one provided in the serverless.yml
configuration, under custom.config.maxDateRange
. It's set to 30 (days) by default.
How often an organization successfully releases to production.
- Send a
CHANGE
event when starting up the CI build. You can do this with either a direct call to the Dorametrix API endpoint, or by using a webhook ("push" event or similar) in GitHub, Bitbucket, or Jira. - Send a
DEPLOYMENT
event after pushing code to your production environment. You can do this with either a direct call to the Dorametrix API endpoint, or by using a convenience GitHub action or Bitbucket pipe for your CI. See the GitHub action demo or Bitbucket pipe demo for more information.
{number of deployments in period} / {number of incidents in period}
is the standard (i.e. number of deployments in the last week).
The amount of time it takes a commit to get into production.
Same as deployment frequency (see above).
The solution used in Dorametrix is based on calculating the difference between the earliest commit timestamp in a change batch (that lead to a deployment) with the timestamp of the actual deployment.
{accumulated time of every first commit for each deployment in period} / {number of deployments in period}
The percentage of deployments causing a failure in production.
Send an INCIDENT
event. You can do this with either a direct call to the Dorametrix API endpoint, or by using a webhook ("opened"/"closed"/"labeled"/"unlabeled" event or similar) in GitHub, Bitbucket, or Jira. The conventions are listed above, in the "Configuration" section.
You could certainly look into other types of integrations and automations, for example by connecting a failing function with calling Dorametrix (or connecting Cloudwatch to it).
Some thinkers in the DORA metrics space will say that we need to understand whether a deployment was successful or failed. In Dorametrix this is seen as unimportant and an over-complication of matters. Instead, all deployments are simply... deployments.
{incident count} / {deployment count}
.
How long it takes an organization to recover from a failure in production.
Depends on the above collection of incidents (change failure rate).
This is very straight-forward, just add the total time of all incidents (from start to being resolved). Unresolved tasks will obviously continue to add up.
{accumulated time of all incidents in period} / {incident count in period}
.
Dorametrix does not collect, store, or process any details on a given individual and their work. All data is strictly anonymous and aggregated. You should feel entirely confident that nothing invasive is happening with the data handled with Dorametrix.
To keep the volume of data manageable, version 2.1.0
introduces a maxLifeInDays
setting. It defaults to 90
days, after which DynamoDB will remove the record after the given period + 1 day. You can set the value to any other value, as needed.
While you can get a range of dates, you can't get more exact responses than a full day.
The most recent date you can get metrics for is the day prior, i.e. "yesterday". The reason for this is partly because it makes no real sense to get incomplete datasets, as well as because Dorametrix caches all data requests. Caching a dataset with incomplete data would not be very good.
Dorametrix uses UTC/GMT/Zulu time.
Timestamps are set internally in Dorametrix and generated based on the UTC/GMT/Zulu time.
This should be fine for most circumstances but will possibly be inaccurate if you have teams that are very widely distributed, in which case certain events may be posted to the wrong date.
To cater for more precise queries, you can use the offset
parameter with values between -12
and 12
(default is 0
) to adjust for a particular time zone.
On any given metrics retrieval request, Dorametrix will behave in one of two ways:
- Cached filled: Return the cached content.
- Cache empty: Query > Store response in cache > Return response.
Caching is always done for a range of dates. All subsequent lookups will use the cached data only if the exact same "from" and "to" date ranges are cached.
All of the below demonstrates "directly calling" the API; since webhook events from GitHub, Bitbucket and Jira have other (varying shapes) they are out-of-scope for the example calls.
POST {{BASE_URL}}/event?authorization=API_KEY
{
"eventType": "change",
"product": "demo"
}
204 No Content
POST {{BASE_URL}}/event?authorization=API_KEY
{
"eventType": "deployment",
"product": "demo",
"changes": [
{
"id": "356a192b7913b04c54574d18c28d46e6395428ab",
"timeCreated": "1642879177"
},
{
"id": "da4b9237bacccdf19c0760cab7aec4a8359010b0",
"timeCreated": "1642874964"
},
{
"id": "77de68daecd823babbb58edb1c8e14d7106e83bb",
"timeCreated": "1642873353"
}
]
}
204 No Content
POST {{BASE_URL}}/event?authorization=API_KEY
{
"eventType": "incident",
"product": "demo"
}
204 No Content
Remember to pass your authorization token in the Authorization
header for any GET requests!
GET {{BASE_URL}}/metrics?repo=SOMEORG/SOMEREPO&last=30
{
"repo": "SOMEORG/SOMEREPO",
"period": {
"from": "20230101",
"to": "20230110",
"offset": 0
},
"total": {
"changesCount": 3,
"deploymentCount": 1,
"incidentCount": 0
},
"metrics": {
"changeFailureRate": "0.00",
"deploymentFrequency": "0.20",
"leadTimeForChanges": "00:00:04:04",
"timeToRestoreServices": "00:00:00:00"
}
}
GET {{BASE_URL}}/metrics?repo=SOMEORG/SOMEREPO&from=20230101&to=20230110
{
"repo": "SOMEORG/SOMEREPO",
"period": {
"from": "20230101",
"to": "20230110",
"offset": 0
},
"total": {
"changesCount": 3,
"deploymentCount": 1,
"incidentCount": 0
},
"metrics": {
"changeFailureRate": "0.00",
"deploymentFrequency": "0.10",
"leadTimeForChanges": "00:00:04:04",
"timeToRestoreServices": "00:00:00:00"
}
}
Note that this works precisely the same for last
usage.
GET {{BASE_URL}}/metrics?repo=SOMEORG/SOMEREPO&from=20230101&to=20230110&offset=-5
{
"repo": "SOMEORG/SOMEREPO",
"period": {
"from": "20230101",
"to": "20230110",
"offset": -5
},
"total": {
"changesCount": 3,
"deploymentCount": 1,
"incidentCount": 0
},
"metrics": {
"changeFailureRate": "0.00",
"deploymentFrequency": "0.10",
"leadTimeForChanges": "00:00:04:04",
"timeToRestoreServices": "00:00:00:00"
}
}
GET {{BASE_URL}}/lastdeployment?repo=SOMEORG/SOMEREPO
{
"id": "de9e97a5f7e60230c440c627b0779629fa2c796b",
"timeCreated": "1644259334000"
}
- Create test data functionality/script
- Add more tests
- Higher type coverage
- The time format
DD:HH:MM:SS
is logically limited to 99 days - but is this a real problem? - Alarms for failures