Spark version 2.3.0 and Scala version 2.11 are used for the development of this project.
- Logs Generator:
src/main/scala/com/whiletruecurious/generate/LogsGenerator.scala
- Logs Aggregator:
src/main/scala/com/whiletruecurious/analysis/LogAnalysis.scala
- Generated Logs:
src/main/resources/logs/
- Aggregated Logs:
src/main/resources/output/
Columns:
- user_id -> Unique Id for each user.
- timestamp -> Timestamp in epoch (milliseconds)
- session_id -> Unique session Id for each session. A new session starts whenever there is a time interval of more than 4 hours.
- session_count -> Contains the count of unique session ids in the sliding window (20 days)
- engagement_tag -> {upu, pu, eu, au} depending on the session_count.
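
The session_id and session_count rules above can be sketched in plain Scala collections. This is only an illustration, not the project's Spark code. The names `Event`, `Sessionized`, `SessionSketch` and `sessionize` are hypothetical, and so is the `<user_id>-<n>` session id format. It also assumes that the 4-hour interval is measured between consecutive events of the same user, and that the 20-day window ends at the current event (inclusive). The engagement_tag is left out because its thresholds are not stated here.

```scala
// Hypothetical types for illustration; they are not taken from the project.
final case class Event(userId: String, timestamp: Long)
final case class Sessionized(userId: String, timestamp: Long, sessionId: String, sessionCount: Int)

object SessionSketch {
  val SessionGapMs: Long = 4L * 60 * 60 * 1000    // 4 hours, from the session_id description
  val WindowMs: Long = 20L * 24 * 60 * 60 * 1000  // 20 days, from the session_count description

  def sessionize(events: Seq[Event]): Seq[Sessionized] =
    events.groupBy(_.userId).toSeq.sortBy(_._1).flatMap { case (userId, userEvents) =>
      val sorted = userEvents.sortBy(_.timestamp)

      // Gap between each event and the previous event of the same user.
      val gaps = sorted.zip(sorted.drop(1)).map { case (prev, cur) => cur.timestamp - prev.timestamp }

      // Running session index: incremented whenever a gap exceeds 4 hours.
      val sessionIdx = gaps.scanLeft(1)((idx, gap) => if (gap > SessionGapMs) idx + 1 else idx)

      // Assumed id format "<user_id>-<n>"; any unique id would do.
      val withIds = sorted.zip(sessionIdx).map { case (e, idx) => (e, s"$userId-$idx") }

      withIds.map { case (e, sid) =>
        // Distinct session ids seen in the 20 days up to and including this event.
        val count = withIds.collect {
          case (other, otherSid)
              if other.timestamp <= e.timestamp && other.timestamp > e.timestamp - WindowMs =>
            otherSid
        }.distinct.size
        Sessionized(userId, e.timestamp, sid, count)
      }
    }

  def main(args: Array[String]): Unit = {
    val hour = 60L * 60 * 1000
    val events = Seq(
      Event("u1", 0L),
      Event("u1", 1 * hour),
      Event("u1", 6 * hour), // more than 4 hours after the previous event, so a new session
      Event("u2", 2 * hour)
    )
    sessionize(events).foreach(println)
  }
}
```

In Spark the same idea would typically be expressed with window functions partitioned by user_id and ordered by timestamp. The quadratic scan used here for session_count is only meant to keep the sketch short.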