Refactored parquet exporters, dynamic selection of output columns #241

Merged 1 commit on Aug 22, 2024

Contributor

@T0mexX T0mexX commented Aug 12, 2024

Summary

Added an output configuration, defined in the scenario .json file, that selects which columns are included in the raw output files host.parquet, server.parquet and service.parquet.

Implementation Notes ⚒️

Columns

The 'default' columns are defined in DfltHostExportColumns, DfltServerExportColumns and DfltServiceExportColumns. Any number of additional columns can be defined anywhere (as ExportColumn<Exportable>), and they are deserializable as long as they are loaded by the JVM.
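To illustrate why a column has to be loaded by the JVM before it can be deserialized, here is a minimal sketch of a self-registering column. It is an assumption about how such a mechanism could work, not the code from this PR: the `ExportColumn` class below is a simplified stand-in, and `MyExtraColumns`, `loadedNames` and `custom_metric` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnRegistrySketch {
    // Simplified stand-in for an export column (assumption, not the PR's class):
    // every instance adds itself to a shared registry when it is constructed.
    static final class ExportColumn {
        private static final List<ExportColumn> ALL = new ArrayList<>();
        final String name;

        ExportColumn(String name) {
            this.name = name;
            ALL.add(this);
        }

        // Names of all columns constructed so far.
        static List<String> loadedNames() {
            List<String> names = new ArrayList<>();
            for (ExportColumn column : ALL) {
                names.add(column.name);
            }
            return names;
        }
    }

    // Hypothetical holder of additional columns. Its static field is only
    // initialized when the JVM initializes this class, i.e. on first access.
    static final class MyExtraColumns {
        static final ExportColumn CUSTOM = new ExportColumn("custom_metric");
    }

    public static void main(String[] args) {
        // MyExtraColumns has not been touched yet, so its column is not registered.
        System.out.println("before: " + ExportColumn.loadedNames());
        // Touching the holder triggers class initialization and registration.
        ExportColumn custom = MyExtraColumns.CUSTOM;
        System.out.println("after:  " + ExportColumn.loadedNames() + " (added " + custom.name + ")");
    }
}
```

Under this assumption, a column declared in a class that is never initialized would not be visible to the deserializer, which would explain the "loaded by the JVM" condition.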

Deserialization

Each ExportColumn has a Regex, used for deserialization. If no custom regex is provided, the default one is used. The default regex matches the column name case-insensitively, with either _ (as in the name) or a blank space as the separator.

E.g.:
column name = "cpu_count"
default column regex = "\\s*(?:cpu_count|cpu count)\\s*" (case-insensitive)
matches = "cpu_count", "cpu count", "CpU_cOuNt" etc.
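The default pattern from the example above can be reproduced with a few lines of Java. This is only a sketch of the described behaviour: `defaultRegex` is a hypothetical helper name, and the way the PR actually builds the pattern may differ.

```java
import java.util.regex.Pattern;

public class DefaultColumnRegexSketch {
    // Hypothetical helper: builds the default pattern described above from a column name.
    // It accepts the name as written (with '_') or with blank spaces instead,
    // ignores case, and tolerates surrounding whitespace.
    static Pattern defaultRegex(String columnName) {
        String withUnderscore = Pattern.quote(columnName);
        String withSpace = Pattern.quote(columnName.replace('_', ' '));
        return Pattern.compile(
                "\\s*(?:" + withUnderscore + "|" + withSpace + ")\\s*",
                Pattern.CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        Pattern pattern = defaultRegex("cpu_count");
        String[] candidates = {"cpu_count", "cpu count", " CpU_cOuNt ", "cpu-count"};
        for (String candidate : candidates) {
            System.out.println("\"" + candidate + "\" -> " + pattern.matcher(candidate).matches());
        }
    }
}
```

`Pattern.quote` is used so that a column name containing regex metacharacters is still matched literally; the case-insensitive flag applies to quoted text as well.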

JSON Schema

// scenario.json
{
	...
	"computeExportConfig": {
		"type": "object",
		"properties": {
			"hostExportColumns": { "type": "array" },
			"serverExportColumns": { "type": "array" },
			"serviceExportColumns": { "type": "array" }
		},
		"required": [ /* NONE REQUIRED */ ]
	},
	...
	"required": [
		...
		// NOT REQUIRED
	]
}

Bad Formatting Cases

  • If a column name (and type) does not match any deserializable column, the entry is ignored and an error message is logged.
  • If an empty list of columns is provided, or none of the provided entries is deserializable, then all loaded columns for that Exportable are used and a warning message is logged.
  • If no list is provided, then all loaded columns for that Exportable are used.
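The three cases above can be sketched as a single resolution function. This is an illustration under stated assumptions, not the PR's implementation: `resolve` and `loadedColumns` are hypothetical names, a `null` list stands for "no list provided", logging is reduced to `System.err`, and duplicate entries are not handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class ColumnSelectionSketch {
    // Hypothetical registry: column name -> default pattern ('_' or blank space, case-insensitive).
    static Map<String, Pattern> loadedColumns(String... names) {
        Map<String, Pattern> loaded = new LinkedHashMap<>();
        for (String name : names) {
            String alternatives = Pattern.quote(name) + "|" + Pattern.quote(name.replace('_', ' '));
            loaded.put(name, Pattern.compile("\\s*(?:" + alternatives + ")\\s*", Pattern.CASE_INSENSITIVE));
        }
        return loaded;
    }

    // Resolves the requested entries against the loaded columns.
    // requested == null models "no list provided" (assumption).
    static List<String> resolve(List<String> requested, Map<String, Pattern> loaded) {
        if (requested == null) {
            // No list: use every loaded column, without a warning.
            return new ArrayList<>(loaded.keySet());
        }
        List<String> selected = new ArrayList<>();
        for (String entry : requested) {
            String match = null;
            for (Map.Entry<String, Pattern> column : loaded.entrySet()) {
                if (column.getValue().matcher(entry).matches()) {
                    match = column.getKey();
                    break;
                }
            }
            if (match == null) {
                // Unknown entry: ignore it and report an error.
                System.err.println("[ERROR] no match found for column \"" + entry + "\", ignoring...");
            } else {
                selected.add(match);
            }
        }
        if (selected.isEmpty()) {
            // Empty list, or nothing deserializable: fall back to all loaded columns with a warning.
            System.err.println("[WARN] produced empty list, falling back to all loaded columns");
            return new ArrayList<>(loaded.keySet());
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, Pattern> loaded = loadedColumns("timestamp", "cpu_count", "mem_capacity");
        System.out.println(resolve(List.of("timestamp", "invalid-entry"), loaded));
        System.out.println(resolve(List.of("invalid-entry"), loaded));
        System.out.println(resolve(null, loaded));
    }
}
```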

Example

// scenario.json
{
	...
	"computeExportConfig": {
		"hostExportColumns": ["timestamp", "timestamp_absolute", "invalid-entry", "guests_invalid"],
		"serverExportColumns": ["invalid-entry"],
		"serviceExportColumns": ["timestamp", "servers_active", "servers_pending"]
	},
	...
}

// console output
10:51:56.561 [ERROR] ColListSerializer - no match found for column "invalid-entry", ignoring...
10:51:56.563 [ERROR] ColListSerializer - no match found for column "invalid-entry", ignoring...
10:51:56.564 [WARN] ComputeExportConfig - deserialized list of export columns for exportable ServerTableReader produced empty list, falling back to all loaded columns
10:51:56.584 [INFO] ScenariosSpec - 
| === Compute Export Config ===
| Host columns    : timestamp, timestamp_absolute, guests_invalid
| Server columns  : timestamp, timestamp_absolute, server_id, server_name, cpu_count, mem_capacity, cpu_limit, cpu_time_active, cpu_time_idle, cpu_time_steal, cpu_time_lost, uptime, downtime, provision_time, boot_time, boot_time_absolute
| Service columns : timestamp, servers_active, servers_pending

host.parquet

[Images: host_schema, host_parquet]

server.parquet

Not included because it is too large.

service.parquet

[Images: service_schema, service_parquet]

External Dependencies 🍀

  • N/A

Breaking API Changes ⚠️

  • N/A

Contributor Author

T0mexX commented Aug 12, 2024

I force-pushed to change the commit messages to all use the past tense, since I saw that most of the commits use it.

@T0mexX T0mexX changed the title from "Refactored parquet exporters, allowing to dynamically select columns to include in output" to "Refactored parquet exporters, dynamic selection of output columns" on Aug 15, 2024
@T0mexX T0mexX force-pushed the master branch 6 times, most recently from 76baea8 to ecaa44c, on August 21, 2024 10:26
T0mexX added a commit to t0m3x-org/opendc-parquet-exporter that referenced this pull request Aug 22, 2024
@DanteNiewenhuis DanteNiewenhuis merged commit f9ffdfb into atlarge-research:master Aug 22, 2024
4 checks passed