This repository contains the basic tooling I used to track my writing progress on my PhD thesis when I started focusing on the write-up in 2023.
I am not the first person to have this idea, and this letter explains the how and why.
I found another tool which does something similar; however, the setup seemed a bit complex for something that I could easily do myself, especially if I wanted to hack on it and add extra parts.
Thus, this repository was born. Feel free to use it as is, or adapt it for your own needs; it's under the MIT license. I've built it under the assumption that you're using a Linux-like environment with Python.
The system is enabled by other free open tools:
- opendetex: for getting approximate plaintext from a LaTeX project.
- GitPython: for automatically pulling new project updates.
- Add word count
- Add conservative logging (only log on update)
- Add automatic git pull
- Add automatic pdflatex build
- Extract page count
- Extract number of references
- Extract number of figures
- Add handling of stuck or crashing latex compilation
- Add automatic processing of log files into a CSV for plotting
- Give more info with breakdown (e.g., number of sections/chapters, words per section/chapter)
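The "only log on update" item can be illustrated with a minimal sketch. This is not the repository's actual code: the function name log_if_changed, the JSON-lines file layout, and the shape of the stats dictionary are all assumptions made for the example.

```python
import json
import time
from pathlib import Path


def log_if_changed(log_file, stats):
    """Append stats to a JSON-lines log only if they differ from the last entry.

    Hypothetical helper: returns True if a new entry was written, else False.
    """
    log_file = Path(log_file)
    last_stats = None
    if log_file.exists():
        lines = [ln for ln in log_file.read_text().splitlines() if ln.strip()]
        if lines:
            last_stats = json.loads(lines[-1])["stats"]
    if last_stats == stats:
        return False  # nothing changed since the last run, so skip logging
    entry = {"timestamp": time.time(), "stats": stats}
    with log_file.open("a") as fh:
        fh.write(json.dumps(entry) + "\n")
    return True
```

Calling it twice with the same stats writes only one entry, which keeps the log small when the script runs hourly but the thesis changes rarely.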
Features I like about my system:
- will only save data if there has been a change since the last update
- some of the code was written by code-davinci-002 (e.g., get_timestamp() and get_most_recent_file()): why should easily verified, non-critical-path code be written by a human?
- the method to count instances of TeX commands (using process_project()) is pretty extensible
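To show why counting TeX commands extends easily, here is a minimal sketch of the general idea. It is not the actual process_project() implementation: the function name count_commands and the regex-based approach over raw source are assumptions, and it does not skip commented-out lines.

```python
import re


def count_commands(tex_source, commands):
    """Count occurrences of each TeX command name in a string of LaTeX source.

    Hypothetical helper for illustration only.
    """
    counts = {}
    for name in commands:
        # Match \name not followed by another letter, so \cite does not match \citep.
        pattern = re.compile(r"\\" + re.escape(name) + r"(?![A-Za-z])")
        counts[name] = len(pattern.findall(tex_source))
    return counts


sample = r"""
\section{Intro}
See \cite{a} and \citep{b}.
\begin{figure}\includegraphics{x.png}\end{figure}
\section{Method}
"""
print(count_commands(sample, ["section", "cite", "includegraphics"]))
```

Tracking a new quantity (say, tables or footnotes) then only means adding another command name to the list.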
Features I don't like about my system:
- It relies on using non-Python shell packages, called via subprocess, which has some issues:
  - Is not very "pythonic", and less portable
  - Does not take advantage of Python's package manager, and I'm not pinning versions
  - Thus, outputs and invocation of the tools used may change and the tool will throw errors
- The documentation is still pretty spotty
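One way to soften the subprocess issues is to fail early with a clear message when a shell tool is missing or misbehaves. The sketch below assumes detex is invoked with just the file name and prints plain text to stdout; the function names and the timeout value are hypothetical, not taken from the repository.

```python
import shutil
import subprocess


def count_words(plain_text):
    """Count whitespace-separated tokens in plain text."""
    return len(plain_text.split())


def detex_word_count(main_tex, detex_cmd="detex", timeout_s=60):
    """Run detex on a LaTeX file and return an approximate word count.

    Hypothetical wrapper: assumes `detex <file>` writes plain text to stdout.
    """
    if shutil.which(detex_cmd) is None:
        raise RuntimeError(f"'{detex_cmd}' not found on PATH; is opendetex installed?")
    result = subprocess.run(
        [detex_cmd, main_tex],
        capture_output=True,
        text=True,
        errors="replace",  # tolerate odd bytes in the tool's output
        timeout=timeout_s,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return count_words(result.stdout)
```

Keeping the word counting separate from the subprocess call means the pure part can be tested without the shell tool installed.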
Why are you even using Python for this?
Sure, maybe a shell scripting language like bash makes more sense for this sort of thing, but Python is my daily driver, and I want to minimise the time I spend working with this system.
Set up opendetex, which is used to get a word count:
git clone https://github.com/pkubowicz/opendetex third_party/opendetex
sudo apt-get install make gcc flex
cd third_party/opendetex
make
sudo make install
The code assumes your LaTeX project is a git repo (if it is not, it probably should be). If you are using Overleaf, you can find the option to clone with git in the settings in the menu on the left.
For this tool to be automated, you should ensure that the git credentials for your repo are stored, so you do not need to enter them every time. If using Overleaf or HTTPS, you could run the following from within your thesis directory:
git config --local credential.helper store
Then run git pull and enter your details one more time.
However, there are some security implications to doing this, as it will store your credentials in clear text in a local file (by default ~/.git-credentials in your home directory, not under your project directory).
If you are using GitHub, GitLab, or similar, you could create a deploy key, which can mitigate a lot of the security risks.
Alternatively, you can use a credential manager to store your credentials; see this StackOverflow post.
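Once credentials are stored, the pull can run without prompting. A minimal sketch using GitPython follows; it assumes the remote is named origin and that GitPython is installed, and the function names are hypothetical rather than the repository's own.

```python
def pull_updates(repo_dir):
    """Pull from the 'origin' remote and return (old_sha, new_sha).

    Hypothetical helper; assumes a remote called 'origin' exists.
    """
    # GitPython is imported lazily so this module loads even without it installed.
    from git import Repo

    repo = Repo(repo_dir)
    old_sha = repo.head.commit.hexsha
    repo.remotes.origin.pull()
    new_sha = repo.head.commit.hexsha
    return old_sha, new_sha


def has_new_commits(old_sha, new_sha):
    """Return True if the pull moved HEAD to a different commit."""
    return old_sha != new_sha
```

Comparing the commit hash before and after the pull is one cheap way to decide whether the rest of the pipeline needs to run at all.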
Some extensions may require compiling your document with LaTeX to extract useful info (e.g., number of pages). If you're using Overleaf, it may be irritating to reproduce exactly the same compilation pipeline on the machine where you're running this tool. If you accept that your numbers will be an estimate, you can save yourself some headaches.
TeX Live can be pretty large, so you may want to install a stripped-down version (~200MiB):
sudo apt-get install texlive-latex-base
You can check if you need any other packages by running pdflatex $MAIN_TEX_FILE and noting what failed.
For me, I had to install texlive-fonts-recommended too, and then, after chasing another missing package for a minute, realised that I was fine using 6GiB with the full texlive-full package if it meant I could get back to work.
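For an unattended build, a compile that hangs or crashes should not block the whole run. The sketch below is one possible approach, not the repository's code: it runs pdflatex once with a timeout and reads the page count from the "Output written on ..." line that pdflatex prints. The function names and the 300-second timeout are assumptions.

```python
import re
import subprocess


def parse_page_count(log_text):
    """Extract the page count from pdflatex output, or None if not found."""
    match = re.search(r"Output written on .*?\((\d+) pages?", log_text, re.DOTALL)
    return int(match.group(1)) if match else None


def compile_and_count_pages(main_tex, cwd, timeout_s=300):
    """Run pdflatex once; return the page count, or None on failure or timeout.

    Hypothetical helper. Raises FileNotFoundError if pdflatex is not installed.
    """
    cmd = ["pdflatex", "-interaction=nonstopmode", "-halt-on-error", main_tex]
    try:
        result = subprocess.run(
            cmd,
            cwd=cwd,
            capture_output=True,
            text=True,
            errors="replace",  # pdflatex output is not always valid UTF-8
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None  # compilation appears stuck
    if result.returncode != 0:
        return None  # compilation crashed or hit an error
    return parse_page_count(result.stdout)
```

A single pdflatex pass will not resolve all references, which is fine if the page count is treated as an estimate.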
For a basic run, the script can be invoked with:
python3 thesis-o-meter.py \
--git_dir $YOUR_THESIS_DIR \
--main_tex $YOUR_MAIN_FILE \
--log_dir $YOUR_CHOSEN_LOG_DIR
However, since this is about longer-term measurement, you will probably want to have it run as a cronjob (i.e., automatically, periodically).
Run crontab -e and add an entry that invokes the bash script, for example to run the script every hour:
0 */1 * * * bash $SOME_DIR/crontab/$SOME_SCRIPT
- opendetex: approximate word count from LaTeX