Automatic generation of graphs from log files
First, we describe some log files of interest, then describe the tool.
The graph engine Giraph ingests graphs in an adjacency-list format: [source_id,source_value,[[dest_id,edge_value],...]].
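For instance, a vertex with id 0 and value 0, with edges of weight 3 and 1 to vertices 1 and 2 (values here purely illustrative), would be written as:

[0,0,[[1,3],[2,1]]]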
Consider the use case of analyzing HDFS logs. Apache HDFS is a distributed file system designed to handle large data. A distributed system running algorithms on Hadoop with HDFS has an inherent graph structure produced by the block (data) transfers between nodes. For example, the first line in the figure above has source (src) and destination (dest) IP/port fields. A Python script scans each line for src and dest, and each such pair is treated as an edge from source to destination. Multiple occurrences of an edge in the log file result in a higher edge weight.
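A minimal sketch of this extraction, assuming HDFS DataNode lines containing fields such as "src: /10.250.19.102:54106 dest: /10.250.19.102:50010" (the actual hdfs-graph.py script may differ in its details):

import re
import sys
from collections import Counter

# Matches the src/dest IP:port fields in an HDFS DataNode log line.
PATTERN = re.compile(r"src:\s*/([\d.:]+)\s*,?\s*dest:\s*/([\d.:]+)")

edges = Counter()
with open(sys.argv[1]) as log:
    for line in log:
        match = PATTERN.search(line)
        if match:
            # Repeated (src, dest) pairs accumulate into higher edge weights.
            edges[(match.group(1), match.group(2))] += 1

for (src, dest), weight in edges.items():
    print(src, dest, weight)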
As another example, consider pcap logs. Pcap captures network traffic and saves it to a .pcap file. The IP layer of each captured packet can be used to create a meaningful graph of source IPs and destination IPs. As with HDFS logs, we assign higher edge weights to edges that occur multiple times.
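The same idea can be sketched with the scapy library (an assumption for illustration; any pcap reader would do), counting each source/destination IP pair:

from collections import Counter
from scapy.all import IP, rdpcap

packets = rdpcap("capture.pcap")  # hypothetical capture file

edges = Counter()
for packet in packets:
    if IP in packet:
        # Each captured packet adds weight 1 to its (src, dst) edge.
        edges[(packet[IP].src, packet[IP].dst)] += 1

for (src, dst), weight in edges.items():
    print(src, dst, weight)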
To generate a random log for testing, we can leverage Python's logging facility. For example, the figure above shows a snippet of a random log file, which documents the day, time, user, type of error, and the IPs in question. A random log file might use any delimiter, but certain ones are more common.
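A sketch of such a generator using the standard logging module; the field choices and comma delimiter here are hypothetical:

import logging
import random

# asctime supplies the day and time; the message carries user, error
# type, and the source/destination IPs, comma-delimited.
logging.basicConfig(filename="random.log", level=logging.INFO,
                    format="%(asctime)s,%(message)s")

users = ["alice", "bob", "carol"]
errors = ["TIMEOUT", "REFUSED", "RESET"]

for _ in range(1000):
    src = "10.0.0.%d" % random.randint(1, 20)
    dest = "10.0.0.%d" % random.randint(1, 20)
    logging.info("%s,%s,%s,%s",
                 random.choice(users), random.choice(errors), src, dest)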
We desire an approach that works on all of the above types of logs. We first developed a tool that lets the user specify which columns correspond to the nodes (the tool then infers the edges). While this yields good results, it is a far cry from our goal of automation.

We therefore also developed a rules-based approach. It first predicts which delimiter a log file uses by checking for ',', ' ', '\t', and ';', in that order. After the delimiter is predicted, we look for matching columns as follows, leveraging the fact that all nodes must be of the same "type." We compare the set of values in each column with the set of values in every other column, and check whether their set intersection, i.e., the unique elements common to the two columns, is larger than a pre-selected threshold. If it is, we select that pair of columns as nodes and generate a Giraph graph from them; we do this for every combination of columns whose intersection exceeds the threshold. In testing, this approach perfectly handled all of the log files described above, including random log files generated for IP addresses, call records, and more. The tool is packaged as a container; the commands for building and running it follow the sketch below.
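A condensed sketch of the rules-based detection, assuming a normalized intersection and a threshold of 0.5 (both illustrative; the actual implementation may differ):

import itertools
import sys

def predict_delimiter(lines):
    # Try candidate delimiters in order; pick the first one that
    # consistently splits every line into more than one column.
    for delim in [",", " ", "\t", ";"]:
        if all(len(line.split(delim)) > 1 for line in lines):
            return delim
    return None

def node_column_pairs(lines, delim, threshold=0.5):
    rows = [line.rstrip("\n").split(delim) for line in lines]
    columns = list(zip(*rows))
    pairs = []
    for i, j in itertools.combinations(range(len(columns)), 2):
        a, b = set(columns[i]), set(columns[j])
        # Node columns hold values of the same "type" (e.g. IPs), so a
        # large set intersection marks (i, j) as a candidate node pair.
        overlap = len(a & b) / min(len(a), len(b))
        if overlap > threshold:
            pairs.append((i, j))
    return pairs

lines = open(sys.argv[1]).readlines()
delim = predict_delimiter(lines)
print(delim, node_column_pairs(lines, delim))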
# Build the container and open a shell in it:
docker build -t logtogiraph .
docker run -it --entrypoint="/bin/bash" logtogiraph

# Convert an HDFS log into Giraph graphs:
python3 hdfs-graph.py <hdfs_log_file>

# Generate a random log, then convert it:
cd random-log
python3 logexample.py
python3 log-graph.py <random_log_filename>
The resulting graphs are stored in JSON files numbered from 0 to n, e.g. 0.json, 1.json, 2.json, etc.