Performance and Scale Team, Red Hat
This tool gathers all the data related to a consumer-id and analyses data in Satellite logs.
We can print all the data related to all the consumer-ids:

```shell
python3 main.py --all
```

Or print the data related to a particular consumer-id:

```shell
python3 main.py --consumer-id xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
```

To print the analysed data:

```shell
python3 main.py --analyse
```

Or to save it as JSON in analyse.json:

```shell
python3 main.py --analyse json
```

To print trace records:

```shell
python3 main.py --trace
```

Each command also accepts an optional time range (start and end timestamps):

```shell
python3 main.py --all 2018-07-02T03:32:27 2018-07-13T16:13:36
python3 main.py --consumer-id xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx 2018-07-02T03:32:27 2018-07-13T16:13:36
python3 main.py --analyse 2018-07-02T03:32:27 2018-07-13T16:13:36
python3 main.py --trace 2018-07-02T03:32:27 2018-07-13T16:13:36
```

Run the tests with:

```shell
py.test Tests.py
```
Install the Elasticsearch Python client:

```shell
pip install elasticsearch
```

Install tqdm for the progress bar:

```shell
pip install tqdm
```
The tool can be used after setting up a few things:

Elasticsearch is needed, with the log data indexed in it. This can be done by using this example or by following the installation guide.

Filebeat is used to read the data from the log files, rather than reading directly from a file. The data read by Filebeat is passed to Logstash and then to Elasticsearch, where it is indexed as JSON.
The indexed data is read from Elasticsearch using the elasticsearch-py API. A single search request is limited to returning a maximum of 10,000 hits. To overcome this limit, the scroll API is used to retrieve large numbers of results from a single search request. The consumer-id tool is configured to fetch 10,000 lines per scroll, with a 10-minute scroll timeout.
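The scrolling described above can be sketched as a small generator around the elasticsearch-py client. The function below is illustrative, not the tool's actual code; the index name, query, and page size are placeholders (elasticsearch-py also ships a `helpers.scan` convenience wrapper that does the same job):

```python
def scroll_all(client, index, query, page_size=10000, scroll="10m"):
    """Yield every hit for `query`, paging past Elasticsearch's
    10,000-hit search limit with the scroll API."""
    # Initial search opens the scroll context and returns the first page.
    resp = client.search(index=index, body={"query": query},
                         size=page_size, scroll=scroll)
    hits = resp["hits"]["hits"]
    while hits:
        yield from hits
        # Each scroll call returns the next page until the results run out.
        resp = client.scroll(scroll_id=resp["_scroll_id"], scroll=scroll)
        hits = resp["hits"]["hits"]
```

With a real server this would be called as, e.g., `scroll_all(Elasticsearch(), "logstash-*", {"match_all": {}})` (index pattern assumed).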
The flow of data is from a consumer-id in production.log to candlepin.log in the Satellite logs.

This image shows the production.log file of the Satellite logs. A consumer-id is highlighted; it is 36 characters long, in the form xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.

This consumer-id is then extracted, and the lines related to it are looked up in candlepin.log.

This image shows the candlepin.log file of the Satellite logs. This file contains the logs related to consumer-ids, and every log line carries a csid. A csid is highlighted; it is 8 characters long, in the form xxxxxxxx.

After finding the lines with csids, all the csids are collected and the log lines belonging to a particular csid are grouped together.
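The extraction and grouping above can be sketched with two regexes: one for the 36-character consumer-id (a UUID) and one for the 8-character csid. The exact log layouts are assumptions (in particular, the bracketed csid), not the tool's actual patterns:

```python
import re
from collections import defaultdict

# 36-character consumer-id: 8-4-4-4-12 hex groups, i.e. a UUID.
CONSUMER_ID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")

# 8-character hex csid; a bracketed prefix is assumed for illustration.
CSID_RE = re.compile(r"\[([0-9a-f]{8})\]")

def group_by_csid(candlepin_lines):
    """Group candlepin.log lines by the csid each line carries."""
    groups = defaultdict(list)
    for line in candlepin_lines:
        m = CSID_RE.search(line)
        if m:
            groups[m.group(1)].append(line)
    return dict(groups)
```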
This part of the tool fetches particular fields such as ActiveRecord, Views, total time, ID, etc. from the production.log data indexed in Elasticsearch. The data is read directly from Elasticsearch instead of from a log file.

After fetching the required fields, the data is formatted as JSON so it can later be used by a script to create a new index in Elasticsearch, which is then displayed as visualizations in Kibana. production_es.json contains the output JSON data.

Once all the data is in JSON format, a new index is created in Elasticsearch, which can be viewed in Kibana.
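The field extraction can be sketched against the typical Rails "Completed" line found in production.log. The line format and field names below are assumptions for illustration, not the tool's actual schema:

```python
import re

# Typical Rails completion line in production.log (format assumed):
#   Completed 200 OK in 123ms (Views: 40.1ms | ActiveRecord: 12.3ms)
COMPLETED_RE = re.compile(
    r"Completed (?P<status>\d{3}) .* in (?P<total>[\d.]+)ms"
    r"(?: \(Views: (?P<views>[\d.]+)ms \| ActiveRecord: (?P<ar>[\d.]+)ms\))?")

def extract_timings(line):
    """Return a JSON-ready dict of timing fields, or None if no match."""
    m = COMPLETED_RE.search(line)
    if not m:
        return None
    return {
        "status": int(m.group("status")),
        "totaltime_ms": float(m.group("total")),
        "views_ms": float(m.group("views")) if m.group("views") else None,
        "activerecord_ms": float(m.group("ar")) if m.group("ar") else None,
    }
```

Each resulting dict can then be bulk-indexed into the new Elasticsearch index for Kibana to visualize.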
The tool can display the trace records for stack traces that may appear in production.log, and it handles multiline logs as well. Warnings and errors are multiline logs; by default, Filebeat and Logstash cannot read a multiline log as a single log message and instead split it into separate messages.
### Multiline options
```yaml
# Multiline can be used for log messages spanning multiple lines. This is common
# for Java stack traces or C line continuations.
# The regexp pattern that has to be matched. This example pattern matches all lines starting with /
multiline.pattern: '^\/'
# Defines whether the pattern set under pattern should be negated or not. Default is false.
multiline.negate: false
# Match can be set to "after" or "before". It defines whether lines should be appended to a line
# that was (not) matched before or after, or as long as a pattern is not matched, based on negate.
# Note: "after" is the equivalent of "previous" and "before" is the equivalent of "next" in Logstash.
multiline.match: after
```
The equivalent Logstash multiline codec configuration:

```
input {
  stdin {
    codec => multiline {
      pattern => "pattern, a regexp"
      negate => "true" or "false"
      what => "previous" or "next"
    }
  }
}
```
ritwik12/Satellite-Log-Data-Analysis is licensed under the GNU General Public License v3.0
Permissions of this strong copyleft license are conditioned on making available complete source code of licensed works and modifications, which include larger works using a licensed work, under the same license. Copyright and license notices must be preserved. Contributors provide an express grant of patent rights.
- The origin of this software must not be misrepresented; you must not claim that you wrote the original software. If you use this software in a product, an acknowledgment in the product documentation would be appreciated but is not required.
- Altered source versions must be plainly marked as such, and must not be misrepresented as being the original software.