This coding challenge asks us to read the log file line by line and by using a predefined inactivity time to determine how long a session is and how many documents were accessed by the user (by their IP address) in that session.
Since we are not allowed to read the whole file at once, I read in one line at a time. After getting the indexes of ip, date, time, I then used those indexes to get the corresponding values from each line. I decided to use a dictionary to hold all the information.
The algorithm is very simple:
- Check if ip is already present in the dictionary, if not, add a new entry
- In the meantime, check to see if the specified inactivity time has passed for all entries
- If yes, write those to output and delete those entries
- If ip is already present, then compare time difference to the length of inactivity time
- If less or equal, then increment the document count by 1 since it's considered one session
- If more, then write the previous entry to output, delete it, and enter the new session
- When reaching EOF, simply write all remaining entries into the output as end of all sessions
Simply execute run.sh at terminal and output file will be produced in the output folder.