word2grattex
is an R
packaged deisgned to make Word to LaTeX conversion faster and more accurate. It is relatively flexible but is most effective when used within the wider Grattan report production ecosystem.
Importantly: because of unpredictable (and usually accidental) Word formatting choices, this tool is certainly more of an art than it is a science. For example, there may be loose brackets or braces in your post-conversion document that you'll need to tidy yourself (sorry!).
There may also be incorrect conversions of references when using bib2grattex
(via word2grattex(bibReplace = TRUE)
, the default), because of 'Not Quite Twins' see here.
That being said: it will do most of the work for you, and the problems it can create are -- in 25 full conversions and counting -- fewer than the problems it solves.
It requires:
R
andpandoc
(which usually ships withR
, otherwise download here)- A Word document in the Grattan report template.
- Eg Mapping Australian higher education 2018.docx
It can do the most if it also has:
- The
.bib
file associated with the report.- Eg mapping2018.bib.
- In-text citations in Word, formatted as The Style of Quiet Achievers or Harvard Reference Format 1.
- Eg Norton & Cherastidtham (2018);
- or (Norton & Cherastidtham 2018).
- The
.pdf
chart deck in order of appearance in the report.- Eg mapping2018.pdf
- Note that if this is missing, the figure environments will default to calling chartdeck.pdf, which can easily be manually fixed using find-replace.
- Linked or manual cross-references (not broken).
- Eg See figure 14 in section 3.4.
Before you use the function, follow the steps described under Starting a new report on the Grattex
homepage.
Once you have created a repository for your new report, add your .docx
, .bib
and .pdf
files. Your repository should look something like this, using the Mapping Australian higher education 2018 example:
./he-mapping-2018/
- atlas
- bib
- doc
- logos
- tests
- travis
- Report.tex
- mapping2018.docx
- mapping2018.bib
- mapping2018.pdf
etc
Note that you can run the function before you set up a report repository. Just point the word2grattex
function to any folder that contains your .docx
(and .bib
, .pdf
) files.
The package is run through R
and can be installed using remotes::github_install
:
# Install devtools if you haven't already (remove the comment):
# install.packages(remotes)
remotes::install_github("wfmackey/word2grattex")
library(word2grattex)
The package contains two large functions: word2grattex
and bib2grattex
. Note that, by default, word2grattex
runs bib2grattex
. To run all features described in the what it does section below, point word2grattex
to a folder that contains a .docx
document and a .bib
file (and, preferably, a .pdf
file containing charts. If this is absent, figure environments will be built using the default chartpack.pdf
).
For our Mapping Australian higher education 2018 example, we can then run:
word2grattex(path = "Dropbox (Grattan Institute)/Apps/Overleaf/mapping2018")
Note: if you do not have a bib file, set bibReplace = F:
word2grattex(path = "Dropbox (Grattan Institute)/Apps/Overleaf/mapping2018",
bibReplace = F)
Also, there is a yet-to-be-addressed issue with Tables that pops up. If you receive an error about tables, please set buildTables = F
:
word2grattex(path = "Dropbox (Grattan Institute)/Apps/Overleaf/mapping2018",
bibReplace = F, buildTables = F)
This will produce a .tex
file.
If there are any issues, please get in touch with Will.
The function takes a Word document in the Grattan template and:
- Converts it to LaTeX using
pandoc
. - Adds the current LaTeX preamble of the Grattan report template.
- Cleans up after the
pandoc
conversion:- removes
\protect
,\hyperlink
,\hypertarget
,\texorpdfstring
; - removes empty headings.
- removes
- Adds graphics where an image was found in the Word document:
- builds Figure environment;
- converts Word metadata to LaTeX
\caption
,\unit
,\noteswithsource
(or\notes
/\source
only, or\noteswithsources
, etc); - creates
\insertgraphics
for a standard Grattan chart size and inserts the nth page of the name of your PDF chart deck, or defaults to the nth page ofchartdeck.pdf
if a PDF is not provided (this can be quickly fixed afterwards with find/replace); - labels charts with
fig:figure-caption
.
- Builds Table environments.
- It doesn't build your tables sorry :(.
- But, look, it has a proper crack at it. It will make the rows (kind of) but will use a
\longtable
format by default. i.e. it does a bit. The table will be commented out for compiling convenience, but if it's an easy table it might work (kind of) out-of-the-box. - See the
kableExtra
package in R, orExcel2LaTeX
Excel plugin for assistance.
- Applies appropriate labels to chapters, sections, subsections, etc.:
chap:
,sec:
,subsec:
, etc.
- Replaces cross-references with appropriate labels.
- "See Section 2.2" →
See \Cref{subsec:section-name}
.
- "See Section 2.2" →
- Replaces figure-references with appropriate labels.
- "Figure 19 shows" →
\Cref{fig:figure-caption} shows
.
- "Figure 19 shows" →
- Replaces in-text citations with appropriate
cite
commands.- Uses
bib2grattex
function. .bib
keys are automatically generated in the formatAuthorYearTitle
(default is to cap the title to 20 characters).- Handles (all-but-one) citation complications:
- 'Norton et al. (2018a).' →
\footcite{Norton2018droppingoutthecostsa}
- 'See discussion in Terrill (2018), p. 10.' →
\footnote{See discussion in \textcite[][10]{Terrill2018unfreezingdiscountra}.}
. - 'Daley and Wood (2016), chapter 1; Smith (1776e), chapter 3.' →
\footcites[][chapter~3]{Daley2018hotproperty}[][chapter~3]{Smith1776thewealthofnat}.}
.
- Uses
Note: as our Style of Quiet Achievers fails to distinguish between 'not-quite twins', the bib2grattex
conversion can't tell the difference. A solution to this problem is being considered. For now, manual identification of not-quite twins is required. Mainly: check your Daley et al. references.
When word2grattex
is finished, it will produce a .tex
file that can be built out of the box. Some things need to be done manually:
- Check for errors.
- Add Tables.
- Add Box environments.
- This feature can't be added because there is no way to tell when a box ends.
- Add Overview and Recommendations. Update Acknowledgements, ISBN, report number, FrontPage.
- Optimise figure placement.
- Use
FigurePlacementScore
to help.
- Use
- Proofread lots.
Then, run through checkGrattan
/Travis
and release.
This is a work-in-progress. You will notice errors or think something could be improved (these ideas usually come when you're doing something repetively after the conversion and think "heck I wish this could be done automatically").
If you do, please get in touch. This can be done by raising an issue on the word2grattex
Github page (my preference -- it helps keep everything in one place!), or by emailing william.mackey@grattaninstitute.edu.au.