Define the Topics and Messages sent to Kafka #28
For collectors, we need to use one topic per collector, named after the collector that is going to use it for sending messages. The topics we have right now are:
This way the spark task sender can receive messages from a certain topic and know from its config file which scripts to launch. The messages will contain the data that will be passed to the scripts, sent as a JSON-formatted string, which for collectors would be the marshalled JSON of a Golang struct.
With this setup, each collector performs its task, then signals on its topic that it has the data ready, and the spark task sender can react accordingly by sending the tasks to the Spark cluster and providing the necessary arguments to the scripts.
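As a rough sketch of that flow on the collector side (assuming the Shopify/sarama client; the broker address, topic name, and payload fields here are placeholders, and the final field set is defined later in this thread):

```go
package main

import (
	"encoding/json"
	"log"

	"github.com/Shopify/sarama"
)

// signal is an illustrative payload; the final field set is
// defined later in this issue.
type signal struct {
	MinioKey string `json:"minio_key"`
	TrialID  string `json:"trial_id"`
}

func main() {
	cfg := sarama.NewConfig()
	cfg.Producer.Return.Successes = true // required by SyncProducer

	producer, err := sarama.NewSyncProducer([]string{"kafka:9092"}, cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer producer.Close()

	payload, err := json.Marshal(signal{MinioKey: "MINIO_LOCATION", TrialID: "TRIAL_ID"})
	if err != nil {
		log.Fatal(err)
	}

	// Each collector signals on its own topic (here "stats", as an example)
	// that its data is ready for the spark task sender.
	if _, _, err := producer.SendMessage(&sarama.ProducerMessage{
		Topic: "stats",
		Value: sarama.ByteEncoder(payload),
	}); err != nil {
		log.Fatal(err)
	}
}
```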
@Cerfoglg ok, go for it and update the collectors accordingly. I would always use underscore-separated, lowercase names for Kafka topics, to be consistent. If you agree, please update benchflow/benchflow#2. At some point we will also probably need to specify the SUT information somewhere, so that the spark-tasks-sender has all the information to instantiate the correct data-transformer and, consequently, the correct analysers.
@VincenzoFerme I altered the description a bit. Basically, instead of passing the experiment ID and replication number I just pass the trial ID, since it's a composite of the two anyway. I'm also passing the container ID, as we need to store that in the database, so we should have the container report it. The message format is a single string containing the 3 values, separated by commas.
@Cerfoglg ok for just using trial_ID.
That second point about the commons is very true. Alright, I'll change it to send a JSON object instead of a comma-separated message. Unmarshalling JSON into a data structure in Golang is really quick, so it should be easy to deal with.
Now when a collector signals on Kafka, it will send a JSON with this structure:

```json
{
  "minio_key": "MINIO_LOCATION",
  "trial_id": "TRIAL_ID",
  "experiment_id": "EXPERIMENT_ID",
  "container_id": "CONTAINER_ID",
  "host_id": "HOST_ID",
  "collector_name": "COLLECTOR_NAME"
}
```

Here the Minio location is the key of the stored data, and trial_id is the trial associated with it.
@Cerfoglg discuss why it is the right choice to have a unique key for each "container folder", and to use multiple comma-separated keys to represent information coming from different containers.
@VincenzoFerme It's acceptable to send a single key containing the container folder because, with the Minio API, we can obtain a list of all the files in that "folder", essentially all the keys with that prefix. We separate keys belonging to different containers by commas, since those need to be handled separately by the scripts. This way we don't end up with large Kafka messages when too many files were collected. We send the container IDs the same way as the Minio keys: as a comma-separated list, in the same order as the Minio keys.
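A minimal sketch of that listing step (using the minio-go v7 client; the endpoint, credentials, bucket name, and key layout are placeholders):

```go
package main

import (
	"context"
	"log"
	"strings"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
	client, err := minio.New("minio:9000", &minio.Options{
		Creds: credentials.NewStaticV4("ACCESS_KEY", "SECRET_KEY", ""),
	})
	if err != nil {
		log.Fatal(err)
	}

	// One key per container "folder", comma-separated in the Kafka message.
	minioKeys := "TRIAL_ID/container_a,TRIAL_ID/container_b"

	for _, prefix := range strings.Split(minioKeys, ",") {
		// Expand each container folder into the full list of collected files.
		for obj := range client.ListObjects(context.Background(), "runs",
			minio.ListObjectsOptions{Prefix: prefix, Recursive: true}) {
			if obj.Err != nil {
				log.Fatal(obj.Err)
			}
			log.Println(obj.Key)
		}
	}
}
```

This keeps the Kafka message small: it carries one prefix per container rather than every individual file key.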
The current Kafka messages are sent as a JSON marshalling of this Go structure:
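A sketch reconstructed from the JSON fields above (the struct name and Go field names are assumptions; the json tags follow the agreed message format):

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// KafkaMessage mirrors the JSON structure the collectors signal with.
type KafkaMessage struct {
	MinioKey      string `json:"minio_key"`      // comma-separated when data spans containers
	TrialID       string `json:"trial_id"`
	ExperimentID  string `json:"experiment_id"`
	ContainerID   string `json:"container_id"`   // comma-separated, same order as the Minio keys
	HostID        string `json:"host_id"`
	CollectorName string `json:"collector_name"`
}

func main() {
	raw := []byte(`{"minio_key":"MINIO_LOCATION","trial_id":"TRIAL_ID",` +
		`"experiment_id":"EXPERIMENT_ID","container_id":"CONTAINER_ID",` +
		`"host_id":"HOST_ID","collector_name":"COLLECTOR_NAME"}`)

	// Unmarshalling on the spark-tasks-sender side is a single call.
	var m KafkaMessage
	if err := json.Unmarshal(raw, &m); err != nil {
		log.Fatal(err)
	}
	fmt.Println(m.TrialID, m.MinioKey)
}
```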
Here the Minio keys and container IDs can be sent as comma-separated lists when dealing with multiple containers to collect data from, such as with stats.
@VincenzoFerme This definition should be final.
Evaluate the following:
Define and Implement the Topic and the structure of the Messages sent from the Collectors to Kafka