I wanted a way to quickly analyze my SSH access logs, as I was seeing quite a bit of traffic. Inspired by this (with this underlying GH repo) and this, I built some more tools to use in a notebook and turned them into a Streamlit app.
Logs are read into a Protobuf object (SSHLogs), with each IP address being sent to a GeoIP DB and the returned metadata stored as IPLookupData in each SSHLog (each SSHLog is stored as an element of SSHLogs). This SSHLogs object is then transformed into a dataframe for further analysis.
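As a rough illustration of the log-to-rows step, here is a minimal sketch that uses a plain dataclass in place of the Protobuf messages. The SSHLogRecord class, its fields, the regular expression, and the sample lines are all assumptions made for this example, not the project's actual schema or parser.

```python
import re
from dataclasses import asdict, dataclass
from typing import Dict, Iterable, List

# Assumed pattern for the common sshd "Failed password" / "Accepted <method>" lines.
LINE_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s[\d:]{8})\s"
    r"(?P<host>\S+)\ssshd\[\d+\]:\s"
    r"(?P<event>Failed password|Accepted \w+)\sfor\s"
    r"(?:invalid user\s)?(?P<user>\S+)\sfrom\s"
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\sport\s(?P<port>\d+)"
)


@dataclass
class SSHLogRecord:
    """Hypothetical stand-in for a single SSHLog Protobuf message."""
    timestamp: str
    host: str
    event: str
    user: str
    ip: str
    port: int


def parse_auth_lines(lines: Iterable[str]) -> List[SSHLogRecord]:
    """Keep only lines that look like sshd login attempts."""
    records = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a password/key login event, skip it
        fields = match.groupdict()
        fields["port"] = int(fields["port"])
        records.append(SSHLogRecord(**fields))
    return records


def to_rows(records: Iterable[SSHLogRecord]) -> List[Dict]:
    """Flatten records into dicts; pandas.DataFrame(rows) accepts this shape."""
    return [asdict(record) for record in records]


# Made-up sample lines using documentation-only IP ranges.
SAMPLE = [
    "Mar  3 06:25:12 myhost sshd[1234]: Failed password for invalid user admin from 203.0.113.7 port 52314 ssh2",
    "Mar  3 06:25:14 myhost sshd[1234]: Failed password for root from 198.51.100.23 port 40022 ssh2",
    "Mar  3 06:26:01 myhost CRON[1300]: pam_unix(cron:session): session opened for user root",
]
rows = to_rows(parse_auth_lines(SAMPLE))
```

The CRON line is dropped because it is not an sshd login event, so rows ends up with one dict per login attempt.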
Metadata associated with an IP address can be queried from online geo-coding services or from geolite2 DBs. This project uses an older, self-contained geolite2 package. You could easily adapt this example to a more accurate geoIP DB by changing this function, which maps an IP address to an IPLookupData object.
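One way to keep the geoIP backend swappable is to pass the raw lookup in as a callable. The sketch below is an assumption about how that could look, not the project's actual function: the IPLookupData fields are hypothetical, the raw record is assumed to follow the MaxMind-style nested layout (country.iso_code, location.latitude, location.longitude), and FAKE_DB is made-up data standing in for a real database.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass
class IPLookupData:
    """Hypothetical stand-in for the IPLookupData Protobuf message."""
    ip: str
    country: Optional[str] = None
    latitude: Optional[float] = None
    longitude: Optional[float] = None


# Any callable that takes an IP string and returns a raw record (or None).
RawLookup = Callable[[str], Optional[Mapping]]


def lookup_ip(ip: str, raw_lookup: RawLookup) -> IPLookupData:
    """Map an IP address to IPLookupData using whichever backend is supplied."""
    raw = raw_lookup(ip) or {}  # unknown IPs yield an empty record
    location = raw.get("location", {})
    return IPLookupData(
        ip=ip,
        country=raw.get("country", {}).get("iso_code"),
        latitude=location.get("latitude"),
        longitude=location.get("longitude"),
    )


# Fake backend for illustration only; a real one would query a geoIP database.
FAKE_DB = {
    "203.0.113.7": {
        "country": {"iso_code": "US"},
        "location": {"latitude": 37.751, "longitude": -97.822},
    }
}

known = lookup_ip("203.0.113.7", FAKE_DB.get)
unknown = lookup_ip("198.51.100.23", FAKE_DB.get)
```

Swapping in a different database then only means supplying a different raw_lookup callable, as long as it returns records in the assumed layout.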
Since most attacks came from the US, you may want to better understand where in the US they are coming from. Getting FIPS codes from lat/lon requires the FCC Census API, which can take a very long time when processing thousands of lookup requests. It is therefore worth caching the results; this has already been done for the included datasets (data/*df_us.csv).
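A minimal sketch of such a cache, assuming a JSON file on disk keyed by rounded coordinates. The fetch_fips callable is a hypothetical placeholder for whatever actually calls the FCC Census API, and the fake fetcher, file name, and rounding precision are illustrative choices rather than the project's implementation.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def make_cached_fips_lookup(
    fetch_fips: Callable[[float, float], str],
    cache_path: Path,
    precision: int = 3,
) -> Callable[[float, float], str]:
    """Wrap a slow lat/lon -> FIPS fetcher with a JSON-file-backed cache."""
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}

    def lookup(lat: float, lon: float) -> str:
        # Nearby points share a key, so near-duplicate coordinates hit the cache.
        key = f"{round(lat, precision)},{round(lon, precision)}"
        if key not in cache:
            cache[key] = fetch_fips(lat, lon)  # the only slow call
            cache_path.write_text(json.dumps(cache))  # persist after each miss
        return cache[key]

    return lookup


# Fake fetcher for illustration: counts calls instead of hitting the network.
calls = []


def fake_fetch_fips(lat: float, lon: float) -> str:
    calls.append((lat, lon))
    return "06037"  # placeholder county FIPS code


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "fips_cache.json"
    lookup = make_cached_fips_lookup(fake_fetch_fips, path)
    first = lookup(34.0522, -118.2437)
    second = lookup(34.05221, -118.24371)  # rounds to the same key
    # A fresh lookup built from the same file reuses the persisted results.
    reloaded = make_cached_fips_lookup(fake_fetch_fips, path)(34.0522, -118.2437)
    call_count = len(calls)
```

Rewriting the whole file on every miss is simple but slow for very large caches; batching the writes would be a reasonable refinement.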
I use poetry for dependency management and installation. You can install it easily by following these instructions.
- If you plan to use VS Code for running Jupyter notebooks, you will need to set the venv location to the project directory so that VS Code can find the venv easily.
- If you plan to use Jupyter for running notebooks, follow this guide if you want to be able to run the pyDeck visualizations.
Once you have poetry installed and on your path, install via:
git clone ~/git/ssh_analysis
cd ~/git/ssh_analysis
# ONLY IF USING VS Code - changing where venvs are stored for VSCode
poetry config virtualenvs.in-project true
poetry install
If you wish to re-compile the Python protobuf definitions, you will also need to install the protobuf compiler.
Before you can run the analysis, you need to prep your SSH log data. You can do this via:
ll /var/log/auth.*
sudo gzip -d /var/log/auth.log.*.gz
sudo cat /var/log/auth.* > ~/Desktop/my_auth.log
You should now have all of your /var/log/auth.log files concatenated into one file, ~/Desktop/my_auth.log. Next, move this file into the data directory of ssh_analysis. Assuming you cloned ssh_analysis to ~/git:
mv ~/Desktop/my_auth.log ~/git/ssh_analysis/data/my_auth.log
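If you would rather not decompress the rotated logs in place, the same concatenation can be sketched in Python by reading plain and gzipped files alike. This is an alternative under stated assumptions, not part of the project: the read_auth_logs helper is hypothetical, the demo uses temporary files instead of /var/log, and reading the real logs would still require sufficient permissions.

```python
import gzip
import tempfile
from pathlib import Path
from typing import Iterable, List


def read_auth_logs(paths: Iterable[Path]) -> List[str]:
    """Read plain and .gz log files into one list of lines."""
    lines: List[str] = []
    for path in paths:
        # gzip.open and open both support text mode with an errors argument.
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rt", errors="replace") as handle:
            lines.extend(line.rstrip("\n") for line in handle)
    return lines


# Demo with temporary files standing in for /var/log/auth.*
with tempfile.TemporaryDirectory() as tmp:
    log_dir = Path(tmp)
    (log_dir / "auth.log").write_text("line-new\n")
    with gzip.open(log_dir / "auth.log.1.gz", "wt") as handle:
        handle.write("line-old\n")
    # Sort for a deterministic order; glob order is otherwise unspecified.
    lines = read_auth_logs(sorted(log_dir.glob("auth.log*")))
```

The resulting lines could then be written to data/my_auth.log with Path.write_text("\n".join(lines)).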
The package powers a streamlit app which can be easily run via:
streamlit run streamlit_app/SSH_Analysis_Home.py
To run the example, all you need to do is open the notebook ./ssh_analysis/auth_log_analysis.ipynb and run it with the Python kernel you installed earlier. This can be done easily with Jupyter Lab or VS Code.
cd ~/git/ssh_analysis
poetry shell
jupyter lab