
Define the Topics and Messages sent to Kafka #28

Open
VincenzoFerme opened this issue Dec 10, 2015 · 14 comments

@VincenzoFerme
Member

Define and implement the topics and the structure of the messages sent from the collectors to Kafka.

@Cerfoglg
Contributor

Cerfoglg commented Dec 11, 2015

@VincenzoFerme

For collectors, we need to use one topic per collector, named after the collector that is going to use it for sending messages. The topics we have right now are:

  • mysql
  • stats
  • properties
  • faban
  • logs

This way the spark task sender can receive messages from a given topic and know from its config file which scripts to launch.
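
The config file format isn't shown in this thread; purely as a hypothetical illustration, it could map each topic to the scripts to launch (all script names below are made up):

{
    "mysql": ["analyse_mysql.py"],
    "stats": ["analyse_stats.py"],
    "properties": ["analyse_properties.py"],
    "faban": ["analyse_faban.py"],
    "logs": ["analyse_logs.py"]
}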

The messages will contain the data to be passed to the scripts, sent as a JSON-formatted string, which for collectors would be:

  • The location of the stored data on minio
  • The trial ID
  • The experiment ID
  • The container ID
  • The host ID
  • The collector name

So the messages would be the marshalled JSON from the Golang struct:

type KafkaMessage struct {
    Minio_key      string `json:"minio_key"`
    Trial_id       string `json:"trial_id"`
    Experiment_id  string `json:"experiment_id"`
    Container_id   string `json:"container_id"`
    Host_id        string `json:"host_id"`
    Collector_name string `json:"collector_name"`
}

This way each collector performs its task, then signals on its topic that it has the data ready, and the spark task sender can react accordingly by sending the tasks to the Spark cluster and providing the necessary arguments to the scripts.
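
A minimal sketch of this flow on the collector side, assuming the sarama Kafka client and placeholder broker address and values (none of which are specified in this thread):

package main

import (
    "encoding/json"
    "log"

    "github.com/Shopify/sarama"
)

// KafkaMessage is the struct defined above.
type KafkaMessage struct {
    Minio_key      string `json:"minio_key"`
    Trial_id       string `json:"trial_id"`
    Experiment_id  string `json:"experiment_id"`
    Container_id   string `json:"container_id"`
    Host_id        string `json:"host_id"`
    Collector_name string `json:"collector_name"`
}

func main() {
    // Build and marshal the message (placeholder values).
    msg := KafkaMessage{
        Minio_key:      "trials/trial_1/stats",
        Trial_id:       "trial_1",
        Experiment_id:  "experiment_1",
        Container_id:   "container_1",
        Host_id:        "host_1",
        Collector_name: "stats",
    }
    payload, err := json.Marshal(msg)
    if err != nil {
        log.Fatal(err)
    }

    // The sync producer requires successes to be returned.
    config := sarama.NewConfig()
    config.Producer.Return.Successes = true
    producer, err := sarama.NewSyncProducer([]string{"localhost:9092"}, config)
    if err != nil {
        log.Fatal(err)
    }
    defer producer.Close()

    // Each collector signals on its own topic; "stats" in this example.
    _, _, err = producer.SendMessage(&sarama.ProducerMessage{
        Topic: "stats",
        Value: sarama.ByteEncoder(payload),
    })
    if err != nil {
        log.Fatal(err)
    }
}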

@VincenzoFerme
Member Author

@Cerfoglg ok, go for it and update the collectors accordingly.

I would always use underscore-separated, lowercase names for Kafka topics, to be consistent. If you agree, please update benchflow/benchflow#2.

At some point we will probably also need to specify the SUT information somewhere, so that the spark-tasks-sender has all the information to instantiate the correct data-transformer and, consequently, the correct analysers.

@Cerfoglg
Contributor

@VincenzoFerme I altered the description a bit. Basically, instead of passing the experiment id and the replication number, I just pass the trial ID, since it's a composite of the two anyway. Also, I'm passing the container ID: we need to store it in the database, so the container should report it.

Also, the message format is a single string containing the three values, separated by commas.

@VincenzoFerme
Member Author

VincenzoFerme commented Dec 15, 2015

@Cerfoglg ok for just using trial_ID.

  1. Why do you need the container_ID for all the collectors? You should need it only for the collectors sending data that references the container table. Moreover, in the future we should move the mapping between the trial_ID and the container_ID somewhere else in the flow, so that we only need to pass around the trial_ID. NOTE: currently we always pass all of them to dynamically manage the containers from which the collectors collect data.
  2. Why did you switch to a 3-value comma-separated message instead of a structured message? Apart from communication performance, are there other advantages?

@Cerfoglg
Contributor

@VincenzoFerme

  1. You're right, maybe it is best to only send the trial id and use it to obtain other information, like the container id, instead of sending too much information via Kafka. I'll change it back to only the file location and trial id in the message.
  2. It's mostly for performance. We are not sending much data through Kafka, so we can keep it compact and use a simple comma-separated message, which is also easier to process when received, by just splitting the message. Of course, there's always the option to send a structured message in JSON format and unmarshal it once received, but the message will be bigger.

@VincenzoFerme
Member Author

@Cerfoglg

  1. ok
  2. ok. I would prefer the option in which we send a bigger, but self-descriptive, message. This is because:
    • We are not sending big messages, so we can spend some more bytes to add metadata;
    • We can define the data structure of these messages in commons and share it in our projects so that we are sure to always be compliant to the defined structure.

@Cerfoglg
Contributor

@VincenzoFerme

That second point about the commons is very true. Alright, I'll change it to send a JSON object instead of a comma-separated message. Unmarshalling JSON into a data structure in Golang is really quick, so it should be easy to deal with.
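
For instance, a minimal consumer-side sketch of that unmarshalling, using only the standard library (placeholder values; at this point in the thread the message carries just the Minio location and the trial id):

package main

import (
    "encoding/json"
    "fmt"
    "log"
)

type KafkaMessage struct {
    Minio_key string `json:"minio_key"`
    Trial_id  string `json:"trial_id"`
}

func main() {
    // A received message body (placeholder values).
    raw := []byte(`{"minio_key":"trials/trial_1/stats","trial_id":"trial_1"}`)

    var msg KafkaMessage
    if err := json.Unmarshal(raw, &msg); err != nil {
        log.Fatal(err)
    }
    fmt.Println(msg.Minio_key, msg.Trial_id)
}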

@Cerfoglg
Contributor

Cerfoglg commented Jan 5, 2016

@VincenzoFerme

Now, when a collector signals on Kafka, it sends a JSON message with this structure:

{
    "minio_key": "MINIO_LOCATION",
    "trial_id": "TRIAL_ID",
    "experiment_id": "EXPERIMENT_ID",
    "container_id": "CONTAINER_ID",
    "host_id": "HOST_ID",
    "collector_name": "COLLECTOR_NAME"
}

Here minio_key is the key of the stored data on Minio, and trial_id is the trial associated with it.

@VincenzoFerme
Member Author

@Cerfoglg Update the structure of the message, so that the names match the ones we defined in the following issue: #38

@VincenzoFerme
Member Author

@Cerfoglg discuss why it is the right choice to have a unique key for each "container folder", and to use multiple comma-separated keys to represent information coming from different containers.

@Cerfoglg
Contributor

Cerfoglg commented May 27, 2016

@VincenzoFerme It's acceptable to send a single key containing the container folder because, with the Minio API, we can obtain a list of all the files in that "folder", essentially all the keys with that prefix. We can comma-separate keys belonging to different containers, which need to be processed separately by the scripts. This way we don't end up with large Kafka messages when many files have been collected. We send the container ids the same way as the Minio keys: as a comma-separated list, in the same order as the Minio keys.
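
A small sketch of how a consumer could pair the two comma-separated lists, assuming the ordering convention described above (standard library only, placeholder values):

package main

import (
    "fmt"
    "log"
    "strings"
)

func main() {
    // Comma-separated lists from a received message, in matching order:
    // keys[i] belongs to ids[i].
    minioKey := "trials/trial_1/container_1,trials/trial_1/container_2"
    containerID := "container_1,container_2"

    keys := strings.Split(minioKey, ",")
    ids := strings.Split(containerID, ",")
    if len(keys) != len(ids) {
        log.Fatal("mismatched minio_key and container_id lists")
    }

    // Each key is a Minio "folder" prefix: listing all the objects under
    // it yields every file collected for that container.
    for i, key := range keys {
        fmt.Printf("container %s -> prefix %s\n", ids[i], key)
    }
}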

@Cerfoglg
Contributor

The current Kafka messages are sent as the JSON marshalling of this Go structure:

type KafkaMessage struct {
    Minio_key      string `json:"minio_key"`
    Trial_id       string `json:"trial_id"`
    Experiment_id  string `json:"experiment_id"`
    Container_id   string `json:"container_id"`
    Host_id        string `json:"host_id"`
    Collector_name string `json:"collector_name"`
}

where the minio_key and container_id fields can hold comma-separated lists when dealing with multiple containers to collect data from, as with stats.
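
For example, a stats message covering two containers might look like this (all values are placeholders):

{
    "minio_key": "trials/trial_1/container_1,trials/trial_1/container_2",
    "trial_id": "trial_1",
    "experiment_id": "experiment_1",
    "container_id": "container_1,container_2",
    "host_id": "host_1",
    "collector_name": "stats"
}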

@Cerfoglg
Contributor

@VincenzoFerme This definition should be final.

@VincenzoFerme
Member Author

Evaluate the following:

  • Decide whether to use one topic for the same type of data from multiple sources (e.g., different DBMSs), or different topics as it is now.
