Data Aggregation | Report Exemplum

This documentation describes the data flow stages and how data is aggregated in ROS-OCP.

Stage 1 - Raw metrics

When the cost-management operator first collects metrics from OpenShift, the CSV file looks like the example below.

report_period_start	report_period_end	interval_start	interval_end	container_name	pod	owner_name	owner_kind	workload	workload_type	namespace	image_name	node	resource_id	cpu_request_container_avg	cpu_request_container_sum	cpu_limit_container_avg	cpu_limit_container_sum	cpu_usage_container_avg	cpu_usage_container_min	cpu_usage_container_max	cpu_usage_container_sum	cpu_throttle_container_avg	cpu_throttle_container_max	cpu_throttle_container_sum	memory_request_container_avg	memory_request_container_sum	memory_limit_container_avg	memory_limit_container_sum	memory_usage_container_avg	memory_usage_container_min	memory_usage_container_max	memory_usage_container_sum	memory_rss_usage_container_avg	memory_rss_usage_container_min	memory_rss_usage_container_max	memory_rss_usage_container_sum
2023-02-01 00:00:00 +0000 UTC	2023-03-01 00:00:00 +0000 UTC	2023-02-22 05:00:01 +0000 UTC	2023-02-22 05:15:00 +0000 UTC	Yuptoo-service	Yuptoo-service-999-1	Yuptoo-service-999	ReplicaSet	Yuptoo-service	deployment	Yuptoo-prod	quay.io/cloudservices/yuptoo	ip-10-0-176-227.us-east-2.compute.internal	i-0dfbb3fa4d0e8fc94	1	1	1	1	3	1	6	2	0	0	0	1073741824	1073741824	1073741824	1073741824	513587266.064516	510009344	513900544	513587266.064516	493311537.548387	493293568	493371392	493311537.548387
2023-02-01 00:00:00 +0000 UTC	2023-03-01 00:00:00 +0000 UTC	2023-02-22 05:00:01 +0000 UTC	2023-02-22 05:15:00 +0000 UTC	Yuptoo-service	Yuptoo-service-999-2	Yuptoo-service-999	ReplicaSet	Yuptoo-service	deployment	Yuptoo-prod	quay.io/cloudservices/yuptoo	ip-10-0-176-227.us-east-2.compute.internal	i-0dfbb3fa4d0e8fc94	1	1	1	1	5	2	7	2	0	0	0	1073741824	1073741824	1073741824	1073741824	513587266.064516	510009344	513900544	513587266.064516	493311537.548387	493293568	493371392	493311537.548387
2023-02-01 00:00:00 +0000 UTC	2023-03-01 00:00:00 +0000 UTC	2023-02-22 05:00:01 +0000 UTC	2023-02-22 05:15:00 +0000 UTC	Yuptoo-service	Yuptoo-app-standalone-1	Yuptoo-app	ReplicaSet	none	deployment	Yuptoo-prod	quay.io/cloudservices/yuptoo	ip-10-0-176-227.us-east-2.compute.internal	i-0dfbb3fa4d0e8fc94	1	1	1	1	0.047932	0.031571	0.064131	0.047932	0	0	0	1073741824	1073741824	1073741824	1073741824	513587266.064516	510009344	513900544	513587266.064516	493311537.548387	493293568	493371392	493311537.548387
2023-02-01 00:00:00 +0000 UTC	2023-03-01 00:00:00 +0000 UTC	2023-02-22 05:00:01 +0000 UTC	2023-02-22 05:15:00 +0000 UTC	server-one	Servers-999-1	servers-999	ReplicaSet	servers	deployment	Yuptoo-prod	quay.io/cloudservices/yuptoo	ip-10-0-176-227.us-east-2.compute.internal	i-0dfbb3fa4d0e8fc94	1	1	1	1	0.047932	0.031571	0.064131	0.047932	0	0	0	1073741824	1073741824	1073741824	1073741824	513587266.064516	510009344	513900544	513587266.064516	493311537.548387	493293568	493371392	493311537.548387
2023-02-01 00:00:00 +0000 UTC	2023-03-01 00:00:00 +0000 UTC	2023-02-22 05:00:01 +0000 UTC	2023-02-22 05:15:00 +0000 UTC	Server-two	Servers-999-1	servers-999	ReplicaSet	servers	deployment	Yuptoo-prod	quay.io/cloudservices/yuptoo	ip-10-0-176-227.us-east-2.compute.internal	i-0dfbb3fa4d0e8fc94	1	1	1	1	0.047932	0.031571	0.064131	0.047932	0	0	0	1073741824	1073741824	1073741824	1073741824	513587266.064516	510009344	513900544	513587266.064516	493311537.548387	493293568	493371392	493311537.548387
2023-02-01 00:00:00 +0000 UTC	2023-03-01 00:00:00 +0000 UTC	2023-02-22 05:00:01 +0000 UTC	2023-02-22 05:15:00 +0000 UTC	Server-three	Servers-999-1	servers-999	ReplicaSet	servers	deployment	Yuptoo-prod	quay.io/cloudservices/yuptoo	ip-10-0-176-227.us-east-2.compute.internal	i-0dfbb3fa4d0e8fc94	1	1	1	1	0.047932	0.031571	0.064131	0.047932	0	0	0	1073741824	1073741824	1073741824	1073741824	513587266.064516	510009344	513900544	513587266.064516	493311537.548387	493293568	493371392	493311537.548387
2023-02-01 00:00:00 +0000 UTC	2023-03-01 00:00:00 +0000 UTC	2023-02-22 05:15:01 +0000 UTC	2023-02-22 05:30:00 +0000 UTC	volume-shell	volume-shell					koku-metrics-operator-2	docker.io/library/busybox:latest	ip-10-0-147-202.us-east-2.compute.internal	i-080d71220c1e835ec					0.000018	0	0.000554	0.000018								1835140.129032	1323008	2256896	1835140.129032	106496	106496	106496	106496

Each row of the above table/CSV represents a container running in OpenShift.

Stage 2 - Creating Dataframe

When ros-ocp receives this CSV, it creates a dataframe from it. We use the go-gota/gota Go module for data analysis and manipulation.

df := dataframe.LoadRecords(data)
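
As a rough illustration of what data might hold, the sketch below parses CSV text into [][]string records with the standard encoding/csv package; the first record is the header row, which is the shape dataframe.LoadRecords accepts. The parseRecords helper and the shortened sample are hypothetical and not taken from ros-ocp.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseRecords is a hypothetical helper: it reads CSV text into records,
// where records[0] is the header row.
func parseRecords(text string) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(text))
	return r.ReadAll()
}

func main() {
	// Shortened, made-up sample with only a few of the real columns.
	sample := "container_name,pod,owner_kind,workload\n" +
		"Yuptoo-service,Yuptoo-service-999-1,ReplicaSet,Yuptoo-service\n" +
		"volume-shell,volume-shell,,\n"

	records, err := parseRecords(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(records)-1, "data rows,", len(records[0]), "columns")
}
```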

Stage 3 - Cleaning Data

We clean the data by dropping every row in which any of the columns owner_kind, owner_name, workload, or workload_type is empty.

df = df.FilterAggregation(
		dataframe.And,
		dataframe.F{Colname: "owner_kind", Comparator: series.Neq, Comparando: ""},
		dataframe.F{Colname: "owner_name", Comparator: series.Neq, Comparando: ""},
		dataframe.F{Colname: "workload", Comparator: series.Neq, Comparando: ""},
		dataframe.F{Colname: "workload_type", Comparator: series.Neq, Comparando: ""},
	)

container_name	pod	owner_name	owner_kind	workload	workload_type	namespace	image_name	node	resource_id
Yuptoo-service	Yuptoo-service-999-1	Yuptoo-service-999	ReplicaSet	Yuptoo-service	deployment	Yuptoo-prod	quay.io/cloudservices/yuptoo	ip-10-0-176-227.us-east-2.compute.internal	i-0dfbb3fa4d0e8fc94
Yuptoo-service	Yuptoo-service-999-2	Yuptoo-service-999	ReplicaSet	Yuptoo-service	deployment	Yuptoo-prod	quay.io/cloudservices/yuptoo	ip-10-0-176-227.us-east-2.compute.internal	i-0dfbb3fa4d0e8fc94
Yuptoo-service	Yuptoo-app-standalone-1	Yuptoo-app	ReplicaSet	none	deployment	Yuptoo-prod	quay.io/cloudservices/yuptoo	ip-10-0-176-227.us-east-2.compute.internal	i-0dfbb3fa4d0e8fc94
server-one	Servers-999-1	servers-999	ReplicaSet	servers	deployment	Yuptoo-prod	quay.io/cloudservices/yuptoo	ip-10-0-176-227.us-east-2.compute.internal	i-0dfbb3fa4d0e8fc94
Server-two	Servers-999-1	servers-999	ReplicaSet	servers	deployment	Yuptoo-prod	quay.io/cloudservices/yuptoo	ip-10-0-176-227.us-east-2.compute.internal	i-0dfbb3fa4d0e8fc94
Server-three	Servers-999-1	servers-999	ReplicaSet	servers	deployment	Yuptoo-prod	quay.io/cloudservices/yuptoo	ip-10-0-176-227.us-east-2.compute.internal	i-0dfbb3fa4d0e8fc94
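
The same cleaning rule can be sketched in plain Go without gota. The row type, the dropIncomplete helper, and the sample values are assumptions made for illustration; only the four required column names come from the step above.

```go
package main

import "fmt"

// row maps a column name to its raw string value.
type row map[string]string

// requiredColumns mirrors the four columns checked in Stage 3.
var requiredColumns = []string{"owner_kind", "owner_name", "workload", "workload_type"}

// dropIncomplete keeps only rows where every required column is non-empty.
func dropIncomplete(rows []row) []row {
	kept := make([]row, 0, len(rows))
	for _, r := range rows {
		complete := true
		for _, col := range requiredColumns {
			if r[col] == "" {
				complete = false
				break
			}
		}
		if complete {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	rows := []row{
		{"container_name": "server-one", "owner_kind": "ReplicaSet", "owner_name": "servers-999", "workload": "servers", "workload_type": "deployment"},
		{"container_name": "volume-shell", "owner_kind": "", "owner_name": "", "workload": "", "workload_type": ""},
	}
	for _, r := range dropIncomplete(rows) {
		fmt.Println(r["container_name"]) // only server-one survives
	}
}
```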

Stage 4 - Adding new columns

Depending on the values of the owner_kind and workload columns, we create the new columns k8s_object_type and k8s_object_name for each row in the dataframe.

For each row, if owner_kind == "ReplicaSet" && workload == "<none>", we set k8s_object_type = "replicaset" and k8s_object_name = owner_name.

Otherwise, if owner_kind == "ReplicationController" && workload == "<none>", we set k8s_object_type = "replicationcontroller" and k8s_object_name = owner_name.

Otherwise, we set k8s_object_type = workload_type and k8s_object_name = workload.

s := df.Rapply(func(s series.Series) series.Series {
		owner_name := s.Elem(index_of_owner_name).String()
		owner_kind := s.Elem(index_of_owner_kind).String()
		workload := s.Elem(index_of_workload).String()
		workload_type := s.Elem(index_of_workload_type).String()
		if owner_kind == "ReplicaSet" && workload == "<none>" {
			return series.Strings([]string{"replicaset", owner_name})
		} else if owner_kind == "ReplicationController" && workload == "<none>" {
			return series.Strings([]string{"replicationcontroller", owner_name})
		} else {
			return series.Strings([]string{workload_type, workload})
		}
	})

	df = df.Mutate(s.Col("X0")).Rename("k8s_object_type", "X0")
	df = df.Mutate(s.Col("X1")).Rename("k8s_object_name", "X1")

container_name	pod	owner_name	owner_kind	workload	workload_type	namespace	image_name	node	resource_id	k8s_object_type	k8s_object_name
Yuptoo-service	Yuptoo-service-999-1	Yuptoo-service-999	ReplicaSet	Yuptoo-service	deployment	Yuptoo-prod	quay.io/cloudservices/yuptoo	ip-10-0-176-227.us-east-2.compute.internal	i-0dfbb3fa4d0e8fc94	deployment	Yuptoo-service
Yuptoo-service	Yuptoo-service-999-2	Yuptoo-service-999	ReplicaSet	Yuptoo-service	deployment	Yuptoo-prod	quay.io/cloudservices/yuptoo	ip-10-0-176-227.us-east-2.compute.internal	i-0dfbb3fa4d0e8fc94	deployment	Yuptoo-service
Yuptoo-service	Yuptoo-app-standalone-1	Yuptoo-app	ReplicaSet	none	deployment	Yuptoo-prod	quay.io/cloudservices/yuptoo	ip-10-0-176-227.us-east-2.compute.internal	i-0dfbb3fa4d0e8fc94	replicaset	Yuptoo-app
server-one	Servers-999-1	servers-999	ReplicaSet	servers	deployment	Yuptoo-prod	quay.io/cloudservices/yuptoo	ip-10-0-176-227.us-east-2.compute.internal	i-0dfbb3fa4d0e8fc94	deployment	servers
Server-two	Servers-999-1	servers-999	ReplicaSet	servers	deployment	Yuptoo-prod	quay.io/cloudservices/yuptoo	ip-10-0-176-227.us-east-2.compute.internal	i-0dfbb3fa4d0e8fc94	deployment	servers
Server-three	Servers-999-1	servers-999	ReplicaSet	servers	deployment	Yuptoo-prod	quay.io/cloudservices/yuptoo	ip-10-0-176-227.us-east-2.compute.internal	i-0dfbb3fa4d0e8fc94	deployment	servers
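
The three rules can also be written as a small standalone function, which may be easier to test than the Rapply closure. This is only a sketch: the k8sObject name and its signature are hypothetical, and it compares against "<none>" exactly as the code above does.

```go
package main

import "fmt"

// noWorkload is the placeholder value the Stage 4 code compares against.
const noWorkload = "<none>"

// k8sObject derives k8s_object_type and k8s_object_name for one row.
func k8sObject(ownerKind, ownerName, workload, workloadType string) (objType, objName string) {
	switch {
	case ownerKind == "ReplicaSet" && workload == noWorkload:
		return "replicaset", ownerName
	case ownerKind == "ReplicationController" && workload == noWorkload:
		return "replicationcontroller", ownerName
	default:
		return workloadType, workload
	}
}

func main() {
	// A pod owned by a ReplicaSet that has no parent workload.
	fmt.Println(k8sObject("ReplicaSet", "Yuptoo-app", "<none>", "deployment"))
	// A pod whose ReplicaSet belongs to a deployment.
	fmt.Println(k8sObject("ReplicaSet", "servers-999", "servers", "deployment"))
}
```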

Stage 5 - Filter valid k8s object types

Currently we support only the Kubernetes object types Daemonset, Deployment, Deploymentconfig, Replicaset, Replicationcontroller, and Statefulset. Other object types, such as Jobs, are not supported, so in this step we remove rows for unsupported objects from the CSV/dataframe.

func filter_valid_k8s_object_types(df dataframe.DataFrame) dataframe.DataFrame {
	return df.Filter(
		dataframe.F{
			Colname:    "k8s_object_type",
			Comparator: series.In,
			Comparando: []string{
				w.Daemonset.String(),
				w.Deployment.String(),
				w.Deploymentconfig.String(),
				w.Replicaset.String(),
				w.Replicationcontroller.String(),
				w.Statefulset.String(),
			}},
	)
}
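
A minimal plain-Go sketch of the same allow-list filter follows. It assumes that the String() values of the workload types are the lowercase names seen in the k8s_object_type column (for example "deployment"); the workloadRow type, the filterSupported helper, and the "job" sample row are hypothetical.

```go
package main

import "fmt"

// supported is the allow-list of object types.
// Assumption: the type names are lowercase, as in the k8s_object_type column.
var supported = map[string]bool{
	"daemonset":             true,
	"deployment":            true,
	"deploymentconfig":      true,
	"replicaset":            true,
	"replicationcontroller": true,
	"statefulset":           true,
}

// workloadRow holds just the fields needed for this illustration.
type workloadRow struct {
	ContainerName string
	K8sObjectType string
}

// filterSupported keeps rows whose object type is in the allow-list.
func filterSupported(rows []workloadRow) []workloadRow {
	kept := make([]workloadRow, 0, len(rows))
	for _, r := range rows {
		if supported[r.K8sObjectType] {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	rows := []workloadRow{
		{ContainerName: "server-one", K8sObjectType: "deployment"},
		{ContainerName: "nightly-cleanup", K8sObjectType: "job"}, // made-up unsupported row
	}
	for _, r := range filterSupported(rows) {
		fmt.Println(r.ContainerName, r.K8sObjectType)
	}
}
```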

Stage 6 - Filter valid csv records

The operator may generate a CSV file in which the usage metrics for some containers are missing (blank). We need to drop such rows from the CSV/dataframe.

func filter_valid_csv_records(main_df dataframe.DataFrame) (dataframe.DataFrame, int) {
	df := main_df.FilterAggregation(
		dataframe.And,
		dataframe.F{Colname: "memory_rss_usage_container_sum", Comparator: series.GreaterEq, Comparando: 0},
		dataframe.F{Colname: "memory_rss_usage_container_max", Comparator: series.GreaterEq, Comparando: 0},
		dataframe.F{Colname: "memory_rss_usage_container_min", Comparator: series.GreaterEq, Comparando: 0},
		dataframe.F{Colname: "memory_rss_usage_container_avg", Comparator: series.GreaterEq, Comparando: 0},
		dataframe.F{Colname: "memory_usage_container_sum", Comparator: series.GreaterEq, Comparando: 0},
		dataframe.F{Colname: "memory_usage_container_max", Comparator: series.GreaterEq, Comparando: 0},
		dataframe.F{Colname: "memory_usage_container_min", Comparator: series.GreaterEq, Comparando: 0},
		dataframe.F{Colname: "memory_usage_container_avg", Comparator: series.GreaterEq, Comparando: 0},
		dataframe.F{Colname: "cpu_usage_container_sum", Comparator: series.GreaterEq, Comparando: 0},
		dataframe.F{Colname: "cpu_usage_container_max", Comparator: series.GreaterEq, Comparando: 0},
		dataframe.F{Colname: "cpu_usage_container_min", Comparator: series.GreaterEq, Comparando: 0},
		dataframe.F{Colname: "cpu_usage_container_avg", Comparator: series.GreaterEq, Comparando: 0},
	)

	no_of_dropped_records := main_df.Nrow() - df.Nrow()

	return df, no_of_dropped_records
}
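
The sketch below applies the same idea to raw string cells in plain Go: a row is kept only if every usage metric parses as a number greater than or equal to 0, so blank cells cause the row to be dropped. The helper names (hasValidUsage, filterValid, rowWith) and the sample values are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// usageColumns lists the usage metrics checked in Stage 6.
var usageColumns = []string{
	"cpu_usage_container_avg", "cpu_usage_container_min",
	"cpu_usage_container_max", "cpu_usage_container_sum",
	"memory_usage_container_avg", "memory_usage_container_min",
	"memory_usage_container_max", "memory_usage_container_sum",
	"memory_rss_usage_container_avg", "memory_rss_usage_container_min",
	"memory_rss_usage_container_max", "memory_rss_usage_container_sum",
}

// hasValidUsage reports whether every usage metric parses as a number >= 0.
// Blank cells fail to parse, so they count as invalid.
func hasValidUsage(r map[string]string) bool {
	for _, col := range usageColumns {
		v, err := strconv.ParseFloat(r[col], 64)
		if err != nil || !(v >= 0) { // !(v >= 0) also rejects NaN
			return false
		}
	}
	return true
}

// filterValid returns the valid rows and the number of dropped rows.
func filterValid(rows []map[string]string) ([]map[string]string, int) {
	kept := make([]map[string]string, 0, len(rows))
	for _, r := range rows {
		if hasValidUsage(r) {
			kept = append(kept, r)
		}
	}
	return kept, len(rows) - len(kept)
}

// rowWith builds a made-up row with every usage metric set to value.
func rowWith(value string) map[string]string {
	r := make(map[string]string, len(usageColumns))
	for _, col := range usageColumns {
		r[col] = value
	}
	return r
}

func main() {
	complete := rowWith("0.5")
	blankCPU := rowWith("0.5")
	blankCPU["cpu_usage_container_avg"] = "" // usage metric missing

	kept, dropped := filterValid([]map[string]string{complete, blankCPU})
	fmt.Println(len(kept), "kept,", dropped, "dropped")
}
```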

Stage 7 - Grouping

We group the rows by container so that, if a deployment has multiple pod replicas, we can aggregate the metrics of the containers inside them.

dfGroups := df.GroupBy(
		"namespace",
		"k8s_object_type",
		"k8s_object_name",
		"workload",
		"container_name",
		"image_name",
		"interval_start",
		"interval_end",
	)

Group 1

container_name	image_name	k8s_object_name	k8s_object_type	namespace	workload	cpu_usage_container_avg	cpu_usage_container_max	cpu_usage_container_min	cpu_usage_container_sum
Yuptoo-service	quay.io/cloudservices/yuptoo	Yuptoo-service	deployment	Yuptoo-prod	Yuptoo-service	3	6	1	2
Yuptoo-service	quay.io/cloudservices/yuptoo	Yuptoo-service	deployment	Yuptoo-prod	Yuptoo-service	5	7	2	2

Group 2

container_name	image_name	k8s_object_name	k8s_object_type	namespace	workload	cpu_usage_container_avg	cpu_usage_container_max	cpu_usage_container_min	cpu_usage_container_sum
Yuptoo-service	quay.io/cloudservices/yuptoo	Yuptoo-app	replicaset	Yuptoo-prod	none	0.047932	0.064131	0.031571	0.047932

.

Group N
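
To make the grouping concrete, here is a plain-Go sketch that buckets records by the same eight columns using a comparable struct as the map key. The groupKey and record types and the groupRecords helper are hypothetical; gota's GroupBy does this work in the actual pipeline.

```go
package main

import "fmt"

// groupKey holds the eight columns used for grouping in Stage 7.
type groupKey struct {
	Namespace, K8sObjectType, K8sObjectName, Workload    string
	ContainerName, ImageName, IntervalStart, IntervalEnd string
}

// record is one row; Pod is not part of the key, so replicas share a group.
type record struct {
	Key groupKey
	Pod string
}

// groupRecords buckets records that share the same key.
func groupRecords(records []record) map[groupKey][]record {
	groups := make(map[groupKey][]record)
	for _, r := range records {
		groups[r.Key] = append(groups[r.Key], r)
	}
	return groups
}

func main() {
	service := groupKey{
		Namespace: "Yuptoo-prod", K8sObjectType: "deployment",
		K8sObjectName: "Yuptoo-service", Workload: "Yuptoo-service",
		ContainerName: "Yuptoo-service", ImageName: "quay.io/cloudservices/yuptoo",
		IntervalStart: "2023-02-22 05:00:01", IntervalEnd: "2023-02-22 05:15:00",
	}
	standalone := service
	standalone.K8sObjectType = "replicaset"
	standalone.K8sObjectName = "Yuptoo-app"
	standalone.Workload = "none"

	groups := groupRecords([]record{
		{Key: service, Pod: "Yuptoo-service-999-1"},
		{Key: service, Pod: "Yuptoo-service-999-2"},
		{Key: standalone, Pod: "Yuptoo-app-standalone-1"},
	})
	fmt.Println(len(groups), "groups;", len(groups[service]), "rows in the Yuptoo-service group")
}
```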

Stage 8 - Data Aggregation

aggregationMapping := map[string]dataframe.AggregationType{
		"cpu_request_container_avg":      dataframe.Aggregation_MEAN,
		"cpu_request_container_sum":      dataframe.Aggregation_SUM,
		"cpu_limit_container_avg":        dataframe.Aggregation_MEAN,
		"cpu_limit_container_sum":        dataframe.Aggregation_SUM,
		"cpu_usage_container_avg":        dataframe.Aggregation_MEAN,
		"cpu_usage_container_min":        dataframe.Aggregation_MIN,
		"cpu_usage_container_max":        dataframe.Aggregation_MAX,
		"cpu_usage_container_sum":        dataframe.Aggregation_SUM,
		"cpu_throttle_container_avg":     dataframe.Aggregation_MEAN,
		"cpu_throttle_container_max":     dataframe.Aggregation_MAX,
		"cpu_throttle_container_sum":     dataframe.Aggregation_SUM,
		"memory_request_container_avg":   dataframe.Aggregation_MEAN,
		"memory_request_container_sum":   dataframe.Aggregation_SUM,
		"memory_limit_container_avg":     dataframe.Aggregation_MEAN,
		"memory_limit_container_sum":     dataframe.Aggregation_SUM,
		"memory_usage_container_avg":     dataframe.Aggregation_MEAN,
		"memory_usage_container_min":     dataframe.Aggregation_MIN,
		"memory_usage_container_max":     dataframe.Aggregation_MAX,
		"memory_usage_container_sum":     dataframe.Aggregation_SUM,
		"memory_rss_usage_container_avg": dataframe.Aggregation_MEAN,
		"memory_rss_usage_container_min": dataframe.Aggregation_MIN,
		"memory_rss_usage_container_max": dataframe.Aggregation_MAX,
		"memory_rss_usage_container_sum": dataframe.Aggregation_SUM,
	}

	columnsToAggregate := []string{}
	columnsAggregationType := []dataframe.AggregationType{}
	for k, v := range aggregationMapping {
		columnsToAggregate = append(columnsToAggregate, k)
		columnsAggregationType = append(columnsAggregationType, v)
	}

	df = dfGroups.Aggregation(columnsAggregationType, columnsToAggregate)

container_name	image_name	k8s_object_name	k8s_object_type	namespace	workload	cpu_usage_container_avg_MEAN	cpu_usage_container_max_MAX	cpu_usage_container_min_MIN	cpu_usage_container_sum_SUM
Yuptoo-service	quay.io/cloudservices/yuptoo	Yuptoo-service	deployment	Yuptoo-prod	Yuptoo-service	4	7	1	4
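
The aggregated row above can be checked by hand against Group 1: the mean of the averages is (3 + 5) / 2 = 4, the maximum is max(6, 7) = 7, the minimum is min(1, 2) = 1, and the sum is 2 + 2 = 4. The sketch below reproduces that arithmetic in plain Go; the cpuUsage type and the aggregate helper are hypothetical and stand in for gota's Aggregation call.

```go
package main

import (
	"fmt"
	"math"
)

// cpuUsage holds the four cpu_usage_container_* metrics of one row.
type cpuUsage struct {
	Avg, Min, Max, Sum float64
}

// aggregate collapses one group into a single row:
// mean of Avg, minimum of Min, maximum of Max, and sum of Sum.
func aggregate(rows []cpuUsage) cpuUsage {
	if len(rows) == 0 {
		return cpuUsage{}
	}
	out := cpuUsage{Min: math.Inf(1), Max: math.Inf(-1)}
	for _, r := range rows {
		out.Avg += r.Avg
		out.Min = math.Min(out.Min, r.Min)
		out.Max = math.Max(out.Max, r.Max)
		out.Sum += r.Sum
	}
	out.Avg /= float64(len(rows))
	return out
}

func main() {
	// The two Yuptoo-service replicas from Group 1.
	group1 := []cpuUsage{
		{Avg: 3, Min: 1, Max: 6, Sum: 2},
		{Avg: 5, Min: 2, Max: 7, Sum: 2},
	}
	fmt.Printf("%+v\n", aggregate(group1)) // prints {Avg:4 Min:1 Max:7 Sum:4}
}
```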

Final Output

container_name	image_name	k8s_object_name	k8s_object_type	namespace	workload	cpu_usage_container_avg_MEAN	cpu_usage_container_max_MAX	cpu_usage_container_min_MIN	cpu_usage_container_sum_SUM
Yuptoo-service	quay.io/cloudservices/yuptoo	Yuptoo-app	replicaset	Yuptoo-prod	none	0.047932	0.064131	0.031571	0.047932
server-one	quay.io/cloudservices/yuptoo	servers	deployment	Yuptoo-prod	servers	0.047932	0.064131	0.031571	0.047932
Server-two	quay.io/cloudservices/yuptoo	servers	deployment	Yuptoo-prod	servers	0.047932	0.064131	0.031571	0.047932
Server-three	quay.io/cloudservices/yuptoo	servers	deployment	Yuptoo-prod	servers	0.047932	0.064131	0.031571	0.047932
Yuptoo-service	quay.io/cloudservices/yuptoo	Yuptoo-service	deployment	Yuptoo-prod	Yuptoo-service	4	7	1	4
