This project was conceived to connect to Gmail, fetch emails and parse their contents to Excel. Although initially intended to parse alert and notification emails, other use cases can also be fulfilled via the utilities.
In order for the program to access your Gmail messages, you will need to grant permissions as described in the documentation.
- Create a Google Cloud account at https://console.cloud.google.com
- Create a new project (I called mine Python-Gmail-Worker-24), see here
- Enable the Gmail API for the project, see link
- Configure the OAuth consent screen: go to Menu > APIs & Services > OAuth consent screen. Complete the app registration and save.
- Generate a credentials file: go to Menu > APIs & Services > Credentials
- Click Create Credentials > OAuth client ID
- Click Application type > Desktop app
- Enter a name and click Create
- Download the credentials file to ./src/main/resources/credentials.json
The project is driven mainly by the properties file found at: ./src/main/resources/configuration.properties
The file contents look as follows:
writeMessagesToFile=false
fromEmailFilter=LinkedIn Job Alerts <jobalerts-noreply@linkedin.com>
subjectEmailFilter=
messageSearchQuery=from:jobalerts-noreply@linkedin.com newer_than:1d
messageSearchQueryLimit=50
- writeMessagesToFile - flag that controls whether the message body is written to a file called messages.txt
- fromEmailFilter - secondary message filter that filters emails by sender string
- subjectEmailFilter - secondary message filter that filters emails by subject string
- messageSearchQuery - primary query string that controls which messages are fetched. Uses the Gmail search query syntax
- messageSearchQueryLimit - maximum number of messages to fetch, which helps avoid exceeding the API daily quota
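As a rough illustration of how these keys might be read, the sketch below loads them with the standard java.util.Properties class. The class name ConfigSketch, the Settings record and the fallback defaults are hypothetical; the project's actual loading code may differ.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

public class ConfigSketch {

    // Hypothetical holder for the five documented keys.
    public record Settings(boolean writeMessagesToFile,
                           String fromEmailFilter,
                           String subjectEmailFilter,
                           String messageSearchQuery,
                           int messageSearchQueryLimit) {}

    // Reads the documented keys; the defaults used here are assumptions.
    public static Settings load(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        return new Settings(
                Boolean.parseBoolean(props.getProperty("writeMessagesToFile", "false")),
                props.getProperty("fromEmailFilter", ""),
                props.getProperty("subjectEmailFilter", ""),
                props.getProperty("messageSearchQuery", ""),
                Integer.parseInt(props.getProperty("messageSearchQueryLimit", "50").trim()));
    }

    public static void main(String[] args) throws IOException {
        // Same content as the sample file shown above.
        String sample = String.join("\n",
                "writeMessagesToFile=false",
                "fromEmailFilter=LinkedIn Job Alerts <jobalerts-noreply@linkedin.com>",
                "subjectEmailFilter=",
                "messageSearchQuery=from:jobalerts-noreply@linkedin.com newer_than:1d",
                "messageSearchQueryLimit=50");
        Settings settings = load(new StringReader(sample));
        System.out.println(settings.messageSearchQuery());
        System.out.println(settings.messageSearchQueryLimit());
    }
}
```

In a real run the Reader would come from the properties file itself, for example via Files.newBufferedReader on its path.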
To build the project you will need Java 21 and Gradle 8 on your system. Build it with:
gradle clean build
To run the main LinkedIn alerts parser:
gradle -x test run
The program can also be run from the jar file as follows:
java -jar ./build/libs/gmail-worker-client-1.0.jar
To list the labels defined in Gmail (a basic command to confirm that setup succeeded), run:
gradle -x test run --args="labels"
To list the emails, run:
gradle -x test run --args="list"
When the program is first run, a URL will be output to the console. Click this URL to open it in a browser, then select your Google account and select Continue on the dialog windows that follow. Once done, an access token will be returned to the running process, allowing it to continue (press Return in the console if it does not move on).
The token expires after around a week; when this happens the program exits with an error. In that case, remove the expired token with the command below and re-run the program, which will output the URL again.
rm tokens/StoredCredential
The project can be built and run with Docker. To build it, run the following command from the project folder:
docker build -t klairtech/gmail-parser .
To run the program:
docker run -it --rm -P klairtech/gmail-parser
The project contains a Makefile with targets for every major step, from setting up the project to running the program.
Target | Description |
---|---|
all | runs the program to fetch and export emails to csv |
build | builds the java project and runs the tests |
removetoken | clears the expired gmail access token |
initpython | sets up the Python environment and installs dependencies |
clean | removes the log files |
The project contains a Python script that removes duplicate entries from the generated CSV files. I chose Python over Java for this because I wanted to use pandas, an excellent library I was already familiar with. To use the script, first set up the environment and install the dependencies:
cd ./src/main/python
python -m venv env
source env/bin/activate
pip install -r ./requirements.txt
To run the script, pass it the source CSV file. The script writes the results to a file of the same name with a -filtered suffix.
cd ./src/main/python
source env/bin/activate
python ./filter.py ../../../linkedInAlerts-`date '+%d%m%y'`.csv
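The script itself relies on pandas and is not reproduced here. Purely to illustrate the idea in the project's main language, the following Java sketch drops repeated lines from a CSV file, keeps first-seen order and writes the result next to the source. The class name, the exact-line notion of a duplicate and the placement of -filtered before the .csv extension are all assumptions; filter.py may handle each of these differently.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DedupeSketch {

    // Assumed naming: "report.csv" becomes "report-filtered.csv".
    public static Path filteredPath(Path source) {
        String name = source.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String filtered = dot < 0
                ? name + "-filtered"
                : name.substring(0, dot) + "-filtered" + name.substring(dot);
        return source.resolveSibling(filtered);
    }

    // Keeps the first occurrence of each line (header included) in original order.
    public static Path removeDuplicateLines(Path source) throws IOException {
        List<String> lines = Files.readAllLines(source);
        Set<String> unique = new LinkedHashSet<>(lines);
        Path target = filteredPath(source);
        Files.write(target, unique);
        return target;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DedupeSketch <file.csv>");
            return;
        }
        System.out.println(removeDuplicateLines(Path.of(args[0])));
    }
}
```

A LinkedHashSet is used because it discards repeats while preserving insertion order, so the header row stays first.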
You can easily extend the functionality by adding your own parser and targeting whatever subset of emails you wish. Simply implement the following interfaces:
src/main/java/parser/MessageParser.java
src/main/java/parser/CSVRecord.java
The parser should have a no-args constructor, and you add your own logic in the parse method. The parser returns a list of data-holding objects from a single message. This design handles a LinkedIn job alert email that contains many jobs, which are returned as a list of parsed jobs. The CSVRecord interface allows a parsed record to be written to a CSV file. The parser also has a method that determines the name of the output file.
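The shape of such an extension could look roughly like the sketch below. The two nested interfaces are hypothetical stand-ins written only so the example compiles on its own; their method names (parse, outputFileName, toCsvLine) and signatures are assumptions, so check the real MessageParser.java and CSVRecord.java before copying anything. The order-status email format is invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ParserSketch {

    // Hypothetical stand-in for src/main/java/parser/CSVRecord.java.
    public interface CSVRecord {
        String toCsvLine();
    }

    // Hypothetical stand-in for src/main/java/parser/MessageParser.java.
    public interface MessageParser<T extends CSVRecord> {
        List<T> parse(String messageBody);
        String outputFileName();
    }

    // One parsed row; the fields are invented for illustration.
    public record OrderRecord(String orderId, String status) implements CSVRecord {
        @Override
        public String toCsvLine() {
            return orderId + "," + status;
        }
    }

    // Example parser; it relies on the implicit no-args constructor.
    public static class OrderStatusParser implements MessageParser<OrderRecord> {
        @Override
        public List<OrderRecord> parse(String messageBody) {
            List<OrderRecord> records = new ArrayList<>();
            for (String line : messageBody.split("\\R")) {
                // Assumed line format: "Order <id>: <status>"
                if (line.startsWith("Order ") && line.contains(":")) {
                    int colon = line.indexOf(':');
                    String id = line.substring("Order ".length(), colon).trim();
                    String status = line.substring(colon + 1).trim();
                    records.add(new OrderRecord(id, status));
                }
            }
            return records;
        }

        @Override
        public String outputFileName() {
            return "orderStatus.csv";
        }
    }

    public static void main(String[] args) {
        MessageParser<OrderRecord> parser = new OrderStatusParser();
        String body = "Hello\nOrder 1001: shipped\nOrder 1002: delayed\nThanks";
        // One message yields several records, mirroring the many-jobs-per-alert design.
        for (OrderRecord record : parser.parse(body)) {
            System.out.println(record.toCsvLine());
        }
    }
}
```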