cardano-tracer: Allow switching EKG service between different nodes. #5975

Merged
merged 1 commit into from
Sep 25, 2024
4 changes: 2 additions & 2 deletions cabal.project
@@ -48,7 +48,7 @@ package cryptonite
flags: -support_rdrand

package snap-server
flags: +openssl
flags: -openssl

package bitvec
flags: -simd
@@ -62,8 +62,8 @@ constraints:

allow-newer:
, katip:Win32
, ekg-wai:time

-- IMPORTANT
-- Do NOT add more source-repository-package stanzas here unless they are strictly
-- temporary! Please read the section in CONTRIBUTING about updating dependencies.

11 changes: 11 additions & 0 deletions cardano-tracer/CHANGELOG.md
@@ -1,5 +1,16 @@
# ChangeLog

## 0.3 (September 20, 2024)

* Abandon `snap` webserver in favour of `wai`/`warp` for Prometheus and EKG Monitoring.
* Add dynamic routing to EKG stores of all connected nodes.
* Derive URL compliant routes from connected node names (instead of plain node names).
* Remove the requirement of two distinct ports for the EKG backend (changing `hasEKG` config type).
* For optional RTView component only: Disable SSL/https connections. Force `snap-server`
dependency to build with `-flag -openssl`.
* Add JSON responses when listing connected nodes for both Prometheus and EKG Monitoring.
* Add consistency check for redundant port values in the config.

## 0.2.4 (August 13, 2024)

* `systemd` is enabled by default. To disable it use the cabal
19 changes: 10 additions & 9 deletions cardano-tracer/cardano-tracer.cabal
@@ -1,7 +1,7 @@
cabal-version: 3.0

name: cardano-tracer
version: 0.2.4
version: 0.3
synopsis: A service for logging and monitoring over Cardano nodes
description: A service for logging and monitoring over Cardano nodes.
category: Cardano,
@@ -155,11 +155,12 @@ library
cardano-git-rev ^>=0.2.2
, cassava
, threepenny-gui
, utf8-string
, vector

build-depends: aeson
, async
, async-extras
, auto-update
, bimap
, blaze-html
, bytestring
@@ -168,21 +169,20 @@ library
, containers
, contra-tracer
, directory
, ekg
, ekg-core
, ekg-forward ^>= 0.5
, ekg-forward >= 0.5
, ekg-wai
, extra
, filepath
, http-types
, mime-mail
, optparse-applicative
, ouroboros-network ^>= 0.17
, ouroboros-network-api
, ouroboros-network-framework
, signal
, slugify
, smtp-mail ^>= 0.5
, snap-blaze
, snap-core
, snap-server
, stm
, string-qq
, text
@@ -191,6 +191,8 @@ library
, trace-forward
, trace-resources
, unordered-containers
, wai ^>= 3.2
, warp ^>= 3.4
, yaml

if flag(systemd) && os(linux)
@@ -281,8 +283,7 @@ library demo-acceptor-lib

exposed-modules: Cardano.Tracer.Test.Acceptor

build-depends: async-extras
, bytestring
build-depends: bytestring
, cardano-tracer
, containers
, extra
14 changes: 4 additions & 10 deletions cardano-tracer/configuration/complete-example.json
@@ -6,16 +6,10 @@
},
"loRequestNum": 100,
"ekgRequestFreq": 2,
"hasEKG": [
{
"epHost": "127.0.0.1",
"epPort": 3100
},
{
"epHost": "127.0.0.1",
"epPort": 3101
}
],
"hasEKG": {
"epHost": "127.0.0.1",
"epPort": 3100
},
"hasPrometheus": {
"epHost": "127.0.0.1",
"epPort": 3000
4 changes: 1 addition & 3 deletions cardano-tracer/configuration/complete-example.yaml
@@ -7,10 +7,8 @@ network:
loRequestNum: 100
ekgRequestFreq: 2
hasEKG:
- epHost: 127.0.0.1
epHost: 127.0.0.1
epPort: 3100
- epHost: 127.0.0.1
epPort: 3101
hasPrometheus:
epHost: 127.0.0.1
epPort: 3000
10 changes: 2 additions & 8 deletions cardano-tracer/demo/multi/active-tracer-config.json
@@ -8,16 +8,10 @@
"/run/user/1000/cardano-tracer-demo-3.sock"
]
},
"hasEKG": [
{
"hasEKG": {
"epHost": "127.0.0.1",
"epPort": 3100
},
{
"epHost": "127.0.0.1",
"epPort": 3101
}
],
},
"hasPrometheus": {
"epHost": "127.0.0.1",
"epPort": 3000
14 changes: 4 additions & 10 deletions cardano-tracer/demo/multi/passive-tracer-config.json
@@ -4,16 +4,10 @@
"tag": "AcceptAt",
"contents": "/run/user/1000/cardano-tracer-demo-1.sock"
},
"hasEKG": [
{
"epHost": "127.0.0.1",
"epPort": 3100
},
{
"epHost": "127.0.0.1",
"epPort": 3101
}
],
"hasEKG": {
"epHost": "127.0.0.1",
"epPort": 3100
},
"hasPrometheus": {
"epHost": "127.0.0.1",
"epPort": 3000
137 changes: 100 additions & 37 deletions cardano-tracer/docs/cardano-tracer.md
@@ -337,72 +337,135 @@ The fields `rpMaxAgeMinutes`, `rpMaxAgeHours` specify the lifetime of the log fi

## Prometheus

The optional field `hasPrometheus` specifies the host and port of the web page with metrics. For example:
At the top-level route `/`, Prometheus gives a list of connected nodes.

The responses are either human-readable names (HTML) with clickable
links, or JSON mapping from connected node names to relative URLs,
depending on desired content type (`Accept:` header of the request).
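
The choice between the two response formats can be sketched as a pure function. This is a minimal illustration and not cardano-tracer's actual handler: `ListingFormat` and `chooseFormat` are hypothetical names, and the check is a plain substring match that ignores quality values and wildcards. In a `wai` handler the argument would typically come from `lookup hAccept (requestHeaders request)`.

```haskell
import qualified Data.ByteString.Char8 as BS

-- Hypothetical type: the two listing formats described above.
data ListingFormat = AsHtml | AsJson
  deriving (Eq, Show)

-- Pick the response format from the raw value of the Accept header.
-- Assumption: anything that does not mention application/json
-- (including a missing header) falls back to the HTML listing.
chooseFormat :: Maybe BS.ByteString -> ListingFormat
chooseFormat (Just accept)
  | BS.pack "application/json" `BS.isInfixOf` accept = AsJson
chooseFormat _ = AsHtml
```

A browser sending `text/html,...` would get the clickable list under this sketch, while `curl -H "Accept: application/json"` would get the JSON mapping.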

The routes depend dynamically on the connected nodes; the node names
are [slugified](https://hackage.haskell.org/package/slugify).

The optional field `hasPrometheus` specifies the host and port of the
web page with Prometheus metrics. For example:

```
"hasPrometheus": {
"epHost": "127.0.0.1",
"epPort": 3000
"epPort": 3200
}
```

Here the web page is available at `http://127.0.0.1:3000`. Please note that if you skip this field, the web page will not be available.
With this example, the list of clickable identifiers of connected
nodes will be available at `http://127.0.0.1:3200`, such as:

```
* 127.0.0.1:30004
* 127.0.0.1:30001
* 127.0.0.1:30005
* 127.0.0.1:30000
* 127.0.0.1:30003
* 127.0.0.1:30002
* TxGenerator
```

Clicking an identifier will take you to its monitoring page. For
example, clicking on `127.0.0.1:30004` displays the monitoring metrics
at `http://localhost:3200/12700130004`.

After you open `http://127.0.0.1:3000` in your browser, you will see the list of identifiers of connected nodes (or the warning message, if there are no connected nodes yet), for example:
Sending an HTTP GET request with a JSON `Accept` header returns the
content of the top-level route, or of an identifier's route, as JSON.
`jq '.'` pretty-prints the JSON object.

```
* tmp-forwarder.sock@0
* tmp-forwarder.sock@1
* tmp-forwarder.sock@2
$ curl --silent -H "Accept: application/json" '127.0.0.1:3200' | jq '.'
{
"127.0.0.1:30000": "/12700130000",
"127.0.0.1:30001": "/12700130001",
"127.0.0.1:30002": "/12700130002",
"127.0.0.1:30003": "/12700130003",
"127.0.0.1:30004": "/12700130004",
"127.0.0.1:30005": "/12700130005",
"TxGenerator": "/txgenerator"
}
```
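
The mapping from node names to routes shown above can be approximated in a few lines. This is a hedged sketch: `slugLike` and `routeFor` are hypothetical helpers that merely reproduce the examples in this document (lowercase, drop everything that is not alphanumeric); they are not the `slugify` package's algorithm, which may treat whitespace and other characters differently.

```haskell
import Data.Char (isAlphaNum, toLower)

-- Hypothetical approximation of slugification: lowercase the name
-- and keep only alphanumeric characters.
slugLike :: String -> String
slugLike = filter isAlphaNum . map toLower

-- Relative URL under which a node's metrics would be served,
-- assuming the route is simply "/" followed by the slug.
routeFor :: String -> String
routeFor name = '/' : slugLike name
```

Under these assumptions `routeFor "127.0.0.1:30004"` gives `"/12700130004"` and `routeFor "TxGenerator"` gives `"/txgenerator"`. Note that a scheme like this can map distinct names to the same route, so node names should differ in more than punctuation.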

Each identifier is a hyperlink to the page where you will see the **current** list of metrics received from the corresponding node, in such a format:
The Prometheus output is a map from Prometheus metric to value:

```
$ curl '127.0.0.1:3200/12700130004'
blockNum_int 35
rts_gc_init_cpu_ms 5
rts_gc_par_tot_bytes_copied 0
rts_gc_num_gcs 2
rts_gc_max_bytes_slop 15880
rts_gc_num_bytes_usage_samples 1
rts_gc_wall_ms 4005
...
rts_gc_par_max_bytes_copied 0
rts_gc_mutator_cpu_ms 57
rts_gc_mutator_wall_ms 4004
rts_gc_gc_cpu_ms 1
rts_gc_cumulative_bytes_used 184824
served_block_counter 31
submissions_accepted_counter 2771
density_real 5.7692307692307696e-2
blocksForged_int 6

```
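
A client could turn this plain-text output into a lookup table. The sketch below assumes each relevant line has exactly the shape `name value` as in the sample above; `parseMetrics` is a hypothetical helper, and lines that do not fit (blank lines, the `...` elision, comments) are silently skipped.

```haskell
import qualified Data.Map.Strict as M
import Text.Read (readMaybe)

-- Parse "name value" lines into a map from metric name to value.
-- Lines that are not exactly two whitespace-separated fields, or
-- whose second field is not a number, are ignored.
parseMetrics :: String -> M.Map String Double
parseMetrics body =
  M.fromList
    [ (name, value)
    | line <- lines body
    , [name, raw] <- [words line]
    , Just value <- [readMaybe raw]
    ]
```

For instance, `M.lookup "blockNum_int" (parseMetrics output)` would yield the current block number, if that metric is present in the response.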

## EKG Monitoring

The optional field `hasEKG` specifies the hosts and ports of two web pages:
At the top-level route `/`, EKG gives a list of connected nodes.

The responses are either human-readable names (HTML) with clickable
links, or JSON mapping from connected node names to relative URLs,
depending on desired content type (`Accept:` header of the request).

1. the list of identifiers of connected nodes,
2. EKG monitoring page.
The routes depend dynamically on the connected nodes; the node names
are [slugified](https://hackage.haskell.org/package/slugify).

For example, if you use JSON configuration file:
The optional field `hasEKG` specifies the host and port of the web
page with EKG metrics. For example:

```
"hasEKG": [
{
"epHost": "127.0.0.1",
"epPort": 3100
},
{
"epHost": "127.0.0.1",
"epPort": 3101
}
]
"hasEKG": {
"epHost": "127.0.0.1",
"epPort": 3100
}
```

The page with the list of identifiers of connected nodes will be available at `http://127.0.0.1:3100`, for example:
With this example, the list of clickable identifiers of connected
nodes will be available at `http://127.0.0.1:3100`, such as:

```
* tmp-forwarder.sock@0
* tmp-forwarder.sock@1
* tmp-forwarder.sock@2
* 127.0.0.1:30004
* 127.0.0.1:30001
* 127.0.0.1:30005
* 127.0.0.1:30000
* 127.0.0.1:30003
* 127.0.0.1:30002
* TxGenerator
```

Each identifier is a hyperlink, after clicking to it you will be redirected to `http://127.0.0.1:3101` where you will see EKG monitoring page for corresponding node.
Clicking an identifier will take you to its monitoring page. For
example, clicking on `127.0.0.1:30004` displays the monitoring metrics
at `http://localhost:3100/12700130004`.

Sending an HTTP GET request with a JSON `Accept` header gives the metrics
of an identifier as JSON. `jq '.'` pretty-prints the JSON object.

```
$ curl --silent -H 'Accept: application/json' '127.0.0.1:3100/12700130004' | jq '.'
{
"ChainSync": {
"HeadersServed_counter": {
"type": "c",
"val": 24
}
},
"Mem": {
"resident_int": {
"type": "g",
"val": 91877376
}
},
"RTS": {
"alloc_int": {
"type": "g",
"val": 1014189896
},
```

## Verbosity

26 changes: 20 additions & 6 deletions cardano-tracer/src/Cardano/Tracer/Acceptors/Utils.hs
@@ -1,7 +1,6 @@
{-# LANGUAGE NamedFieldPuns #-}
#if RTVIEW
{-# LANGUAGE OverloadedStrings #-}
#endif
{-# LANGUAGE TupleSections #-}

module Cardano.Tracer.Acceptors.Utils
( prepareDataPointRequestor
@@ -26,6 +25,7 @@ import Control.Concurrent.STM.TVar (TVar, modifyTVar', newTVarIO)
import qualified Data.Bimap as BM
import qualified Data.Map.Strict as M
import qualified Data.Set as S
import Data.Time.Clock.POSIX (getPOSIXTime)
#if RTVIEW
import Data.Time.Clock.System (getSystemTime, systemToUTCTime)
#endif
@@ -51,12 +51,26 @@ prepareMetricsStores
-> IO (EKG.Store, TVar MetricsLocalStore)
prepareMetricsStores TracerEnv{teConnectedNodes, teAcceptedMetrics} connId = do
addConnectedNode teConnectedNodes connId
storesForNewNode <- (,) <$> EKG.newStore
<*> newTVarIO emptyMetricsLocalStore
atomically $
modifyTVar' teAcceptedMetrics $ M.insert (connIdToNodeId connId) storesForNewNode
store <- EKG.newStore

EKG.registerCounter "ekg.server_timestamp_ms" getTimeMs store
storesForNewNode <- (store ,) <$> newTVarIO emptyMetricsLocalStore

atomically do
modifyTVar' teAcceptedMetrics do
M.insert (connIdToNodeId connId) storesForNewNode

return storesForNewNode

where
-- forkServer's definition of `getTimeMs'. The ekg frontend relies
-- on the "ekg.server_timestamp_ms" metric being in every
-- store. While forkServer adds it automatically, we must
-- add it manually.
-- url
-- + https://github.com/tvh/ekg-wai/blob/master/System/Remote/Monitoring/Wai.hs#L237-L238
getTimeMs = (round . (* 1000)) `fmap` getPOSIXTime

addConnectedNode
:: ConnectedNodes
-> ConnectionId LocalAddress