LogProcessing Map-Reduce is a collection of Map-Reduce programs that process log files and
extract information. The project can be configured to work with log files of multiple
formats without any changes to the code base, just by modifying the applications.config
file.
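For example, applications.config might contain entries along these lines (the key names below are illustrative guesses based on the project description, not the project's actual schema):

```hocon
# Illustrative sketch only -- the real applications.config schema may differ.
logprocessing {
  # Regex patterns that jobs can look up by key
  patterns {
    pattern1 = ".*"        # default: match the entire string
    pattern4 = "[\\d]+"    # consecutive digits
  }
  # Bucket size for the interval-based jobs
  timeIntervalMinutes = 10
}
```

Typesafe Config reads such files as HOCON, so jobs can look up patterns at runtime without recompiling.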
- Author: Lakshmanan Meiyappan
- Email: lmeiya2@uic.edu
- Scala 3.0.2
- SBT 1.5.2
- hadoop-core 1.2.1
- slf4j-api 2.0.0
- typesafe config 1.4.1
LogProcessing Map-Reduce comprises four Map-Reduce tasks:
- LogLevel Frequency: computes the count of each log level across all input files.
- Most Error in TimeInterval: finds the time intervals with the most errors, sorted in descending order.
- Longest Substring matching Regex: computes the length of the longest substring that matches a regular expression.
- LogLevel Frequency Distribution in TimeIntervals: computes the distribution of log levels within the specified time intervals.
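Stripped of the Hadoop plumbing, the first task boils down to a classic word-count shape; a minimal plain-Scala sketch (the regex and helper names here are illustrative, not the project's actual code):

```scala
// Map phase: extract the log level from each line, emitting (level, 1).
// Reduce phase: sum the counts per level.
val LogLine = raw".*\b(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\b.*".r

def mapPhase(lines: Seq[String]): Seq[(String, Int)] =
  lines.collect { case LogLine(level) => (level, 1) }

def reducePhase(pairs: Seq[(String, Int)]): Map[String, Int] =
  pairs.groupMapReduce(_._1)(_._2)(_ + _)

val logs = Seq(
  "12:01:05.311 INFO  app - started",
  "12:01:06.002 ERROR app - failed to open file",
  "12:01:07.910 INFO  app - retrying"
)
val counts = reducePhase(mapPhase(logs))
// counts: Map("INFO" -> 2, "ERROR" -> 1)
```

In the real job the two phases run as Hadoop Mapper and Reducer classes, with the framework handling the shuffle between them.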
Users can inject regex patterns through the config file, and the Map-Reduce jobs will search the log files for those patterns and produce results for the requested pattern.
See the How to Run LogProcessing MapReduce section for instructions on how to execute this program.
The documentation for this project is hosted on GitHub Pages: LogProcessing Documentation
Demo and Walk-through Video:
Running LogProcessing Map-Reduce on AWS EMR
A detailed report with the results of executing the Map-Reduce tasks can be found here: LogProcessing MapReduce Report.
git clone https://github.com/laxmena/LogProcessing-MapReduce.git
cd LogProcessing-MapReduce
sbt clean compile
sbt test
sbt assembly
This command generates a jar file at target/scala-3.0.2/LogProcessing-MapReduce-assembly-0.1.jar
For a detailed step-by-step guide on executing LogProcessing-MapReduce jobs on AWS or the Hortonworks Sandbox, refer to this guide: Deploying on AWS/Hortonworks Guide
- Connect to the remote Hadoop master using PuTTY or the command line.
- Transfer the input log files and the JAR file to the remote machine, then copy the input files to an HDFS directory. (See commands 1, 3 and 4 in the Useful commands section below.)
- Run the Hadoop Map-Reduce job by executing the following command:
hadoop jar LogProcessing-MapReduce-assembly-0.1.jar [input-path] [output-path] [job-key] [pattern-key]
- On successful completion of the Map-Reduce task, the results are written to [output-path]. See commands 5 and 6 in the Useful commands section below to read the output.
List of available [job-key] values and their associated Map-Reduce tasks:
job-key | Map-Reduce Task | Supports Regex Search? |
---|---|---|
log-frequency | LogLevel Frequency | ✘ |
most-error | Most Error in TimeInterval | ✔ |
longest-regex | Longest Substring matching Regex | ✔ |
log-freq-dist | LogLevel Distribution in TimeIntervals | ✔ |
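To make the longest-regex task concrete, here is a per-line sketch in plain Scala (the helper name is mine; the actual job aggregates the maximum across all lines and files):

```scala
import scala.util.matching.Regex

// For one log line: the length of the longest non-overlapping match of the
// regex, or 0 when nothing matches.
def longestMatch(line: String, pattern: Regex): Int =
  pattern.findAllIn(line).map(_.length).maxOption.getOrElse(0)

val longestRun = longestMatch("id=1234 retry=56", raw"\d+".r)
// longestRun == 4, from the digit run "1234"
```

In Map-Reduce terms, the mapper emits this per-line maximum and the reducer keeps the global maximum.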
List of pattern-key values available by default:
key | pattern | Description |
---|---|---|
pattern1 | .* | (Default) Matches Entire String |
pattern2 | \([^)\\n]*\) | String enclosed within parentheses |
pattern3 | [^\\s]+ | String without any spaces |
pattern4 | [\d]+ | Consecutive digits |
pattern5 | ([a-c][e-g][0-3]\|[A-Z][5-9][f-w]){5,15} | Either alternative, repeated 5 to 15 times, inclusive |
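These defaults are ordinary Java/Scala regexes and can be sanity-checked in a REPL; a quick sketch, assuming the pattern strings mirror the table above (the exact config values may be escaped differently):

```scala
import scala.util.matching.Regex

// Hypothetical pattern set mirroring the table above.
val patterns: Map[String, Regex] = Map(
  "pattern1" -> ".*".r,
  "pattern2" -> raw"\([^)\n]*\)".r,
  "pattern3" -> raw"[^\s]+".r,
  "pattern4" -> raw"[\d]+".r,
  "pattern5" -> raw"([a-c][e-g][0-3]|[A-Z][5-9][f-w]){5,15}".r
)

val line = "ERROR 2021 (disk full) ae1bf2X7g"
val digits = patterns("pattern4").findFirstIn(line)   // Some("2021")
val parens = patterns("pattern2").findFirstIn(line)   // Some("(disk full)")
```

Any pattern-key passed on the command line selects one of these regexes for the job to search with.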
Different combinations of job-key and pattern-key can be used to execute the Map-Reduce tasks.
Examples:
hadoop jar LogProcessing-MapReduce-assembly-0.1.jar logprocess/input logprocess/longest-regex-1 longest-regex pattern1
hadoop jar LogProcessing-MapReduce-assembly-0.1.jar logprocess/input logprocess/log-freq-dist-3 log-freq-dist pattern3
hadoop jar LogProcessing-MapReduce-assembly-0.1.jar logprocess/input logprocess/logfrequency
hadoop jar LogProcessing-MapReduce-assembly-0.1.jar logprocess/input logprocess/mosterror most-error pattern5
- Transfer file from Local Machine to a Remote machine
scp -P 2222 <path/to/local/file> <username@remote_machine_ip>:<path/to/save/files>
- Transfer directory from Local Machine to Remote machine
scp -P 2222 -r <path/to/local/directory> <username@remote_machine_ip>:<path/to/save/files>
- Create HDFS Directory
hadoop fs -mkdir <directory_name>
- Add Files to HDFS
hadoop fs -put <path/to/files> <hdfs/directory/path>
- Reading Hadoop Map-Reduce Output
hadoop fs -cat <hdfs/output/directory>/*
- Save Hadoop Map-Reduce output to Local file
hadoop fs -cat <hdfs/output/directory>/* > filename.extension
- Running JAR with multiple main classes
hadoop jar <name-of-jar> <full-class-name> <input-hdfs-directory> <output-hdfs-directory>
- List files in HDFS Directory
hdfs dfs -ls <directory/path>
- Remove file or directory in HDFS
hdfs dfs -rm -r <path/to/directory>
hdfs dfs -rm <path/to/file>