ClawNetwork is a tool to simulate a network and evaluate the impact of failures on Top of Rack switches.
It has been specially crafted for Clos matrix networks. For now, cyclic graphs are not supported; only trees are.
ClawNetwork is in active development.
The main features are implemented, but endpoints, structures and functions may change.
It is nonetheless usable, and you are free to play with it :)
The main use case is to evaluate whether an operation on a device in your core network will impact a Top of Rack.
Such operations include upgrades, reboots, risky maintenance, etc.
ClawNetwork can also be leveraged to detect SPOFs or anomalies such as a spine without downlinks.
Simply run the ClawNetwork app using go run .
Alternatively, build the binary via go build and run it.
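For example (the binary name below is only an illustration, not the project's official artifact name):

$ go run .
# or build a binary and run it
$ go build -o clawnetwork .
$ ./clawnetwork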
Run ClawNetwork with the default backend (FileRepository):
docker-compose -f compose/docker-compose.yml up -d
FileRepository stores the topologies in dedicated JSON files on disk.
By default, this uses the examples/ directory provided in this repository.
At the moment this is not customizable, but it will be soon.
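Once the stack is up, you can check that the example topologies are available (assuming the default listen address and port):

$ curl http://127.0.0.1:8080/topology | jq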
To run ClawNetwork with an alternative backend (recommended for production if you need to store topologies):
docker-compose -f compose/docker-compose.yml -f <backend>.yml up -d
At the moment, RedisJSON is the only alternative backend:
docker-compose -f compose/docker-compose.yml -f redisjson.yml up -d
This backend leverages the RedisJSON module to store pure JSON in Redis. Persistence is enabled and forced on each change (ADD/DELETE) by ClawNetwork.
ClawNetwork can be configured either via environment variables or a YAML file (settings.yaml).
List of available parameters (varenv format | YAML format):
- CLAW_LISTENADDRESS | ListenAddress: ClawNetwork API listen address (default: "0.0.0.0")
- CLAW_LISTENPORT | ListenPort: ClawNetwork API listen port (default: "8080")
- CLAW_TOPDEVICEROLE | TopDeviceRole: Role of device at the top of the topology graph (default: "edge")
- CLAW_BOTTOMDEVICEROLE | BottomDeviceRole: Role of device at the bottom of the topology graph (default: "tor")
- CLAW_BACKEND | Backend: Backend used to store topologies (choices: "file", "redis", default: "file")
- CLAW_BACKENDS_FILE_PATH | Backends.File.Path: Directory where the file backend stores topology JSON files (default: "./topologies/")
- CLAW_BACKENDS_REDIS_HOST | Backends.Redis.Host: Redis server address (default: "localhost")
- CLAW_BACKENDS_REDIS_PORT | Backends.Redis.Port: Redis server port (default: "6379")
- CLAW_BACKENDS_REDIS_PASSWORD | Backends.Redis.Password: Redis password (default: "")
- CLAW_BACKENDS_REDIS_DB | Backends.Redis.DB: Redis DB to use (default: 0)
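As an illustration, the same parameters can be exported as environment variables before starting ClawNetwork (the values below are arbitrary examples):

$ export CLAW_LISTENADDRESS="0.0.0.0"
$ export CLAW_LISTENPORT="8080"
$ export CLAW_BACKEND="redis"
$ export CLAW_BACKENDS_REDIS_HOST="localhost"
$ export CLAW_BACKENDS_REDIS_PORT="6379"
$ export CLAW_BACKENDS_REDIS_DB=0
$ go run .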
- GET /topology: list stored topologies
- GET /topology/:topology_name: get topology definition
- POST /topology/:topology_name: create a new topology
- DELETE /topology/:topology_name: delete a topology
- GET /topology/details: list stored topologies with some stats
- GET /topology/:topology_name/details: get topology stats
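For example, assuming the default listen address and port and that a topology named full_topology_with_issues is stored:

$ curl http://127.0.0.1:8080/topology/full_topology_with_issues | jq            # get the topology definition
$ curl http://127.0.0.1:8080/topology/full_topology_with_issues/details | jq    # get the topology stats
$ curl -X DELETE http://127.0.0.1:8080/topology/full_topology_with_issues       # delete the topology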
Simulation endpoints:
- GET /topology/:topology_name/device/:device/down/impact: run simulations on an existing topology
- POST /topology/custom/device/:device/down/impact: run simulations on the topology provided in the request body
The GET endpoint runs a simulation on a stored topology.
If :device is set to each, it simulates the failure impact of each device, excluding Top of Racks.
- GET /topology/:topology_name/anomalies: get topology anomalies
This lists all anomalies in the topology graph, i.e. nodes that are not properly connected to the graph.
For example:
- a ToR does not have any uplinks
- a spine does not have any downlinks or any uplinks
- an edge does not have any downlinks
This does not consider the status of the links; it only checks whether a link exists.
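A quick example, reusing the topology name and default address used elsewhere in this README:

$ curl http://127.0.0.1:8080/topology/full_topology_with_issues/anomalies | jq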
The topology to provide looks like this in JSON:
{
"nodes": [
{
"hostname": "tor-01-01",
"role": "tor",
"status": true,
"layer": 1
},
{
"hostname": "fabric-1-01",
"role": "fabric",
"status": true,
"layer": 2
}
],
"links": [
{
"south_node": "tor-01-01",
"north_node": "fabric-1-01",
"status": true,
"uid": "10.0.0.0->10.0.0.1"
}
]
}
This structure is subject to change, as the API is not considered stable at the moment.
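As a sketch, assuming the request body is exactly the topology JSON shown above (saved locally as topology.json, a hypothetical file name), a topology can be stored under a name of your choice, or submitted directly for a custom simulation:

$ curl -X POST -H "Content-Type: application/json" -d @topology.json http://127.0.0.1:8080/topology/my_topology
$ curl -X POST -H "Content-Type: application/json" -d @topology.json http://127.0.0.1:8080/topology/custom/device/fabric-1-01/down/impact | jq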
Topology: 4 healthy fabric nodes + 4 healthy ToRs.
Simulations:
- first simulation: the first fabric node is considered down
- second simulation: the second fabric node is considered down, with the first back up
- ...
You can query the following endpoint to simulate the down impact of each device. It gets the topology example from example/full_topology_with_issues.json.
$ curl http://127.0.0.1:8080/topology/full_topology_with_issues/device/each/down/impact | jq
{
"scenarios_result": {
"edge-0": {
"impacts": null,
"parameters": {
"devices_down": [
"edge-0"
],
"links_down": null
}
},
"edge-1": {
"impacts": null,
"parameters": {
"devices_down": [
"edge-1"
],
"links_down": null
}
},
"fabric-1-01": {
"impacts": [
"tor-01-01"
],
"parameters": {
"devices_down": [
"fabric-1-01"
],
"links_down": null
}
},
...,
"compute_time": "89 ms"
}
As you can see, tor-01-01 would be down if we shut down fabric-1-01.
The topology defined in example/full_topology_with_issues.json also specifies some devices as down: all the fabric nodes of pod 01 have been set to down except fabric-1-01. This is why a failure on this device impacts tor-01-01, as this ToR only had one healthy uplink left.
Note: more advanced examples will be provided soon, with more complex scenarios.
Below are some ideas of possible integrations:
- the client pushes the topology with the simulation request; the topology is not stored
+-------------------------+
| Observability metrics |
| example: Prometheus |
+-------------------------+
^
|
| get metrics
|
|
|
+-----------------------+
| | get impact
| Client | on custom topology +---------------+
| => convert metrics |--------------------------------->| ClawNetwork |
| to topology | +---------------+
+-----------------------+
- the client provides the topologies and they are stored
+-------------------------+
| Observability metrics |
| example: Prometheus |
+-------------------------+
^
|
| get metrics
|
|
|
+-----------------------+
| Client | push topology +---------------+ save topology +-------------------------+
| => convert metrics |------------------------->| ClawNetwork |<------------------------>| Storage (FS, redis,...) |
| to topology | get impact +---------------+ get topology +-------------------------+
+-----------------------+
- dedicated topology provider
+---------------------+
+-------------------------+ | Topology provider |
| Observability metrics | <------------------- | => convert metrics |
+-------------------------+ | to topology |
+---------------------+
|
|
| push topology
|
|
|
v
+-----------------------+ get impact +---------------+ save topology +-------------------------+
| Client |------------------------->| ClawNetwork |<------------------------>| Storage (FS, redis,...) |
+-----------------------+ +---------------+ get topology +-------------------------+