[INFRA] Auto adjust table fences before PDF conversion #560

Merged
Changes from 25 commits
41 commits
241d6ce
Update version and date in header (from pulling latest version of ups…
sebastientourbier Jul 31, 2020
5321d6a
Updated my contributions
sebastientourbier Jul 31, 2020
a68ed8d
Update version and date in header
sebastientourbier Aug 4, 2020
0ddd609
Added functions to correct tables before pdf generation
sebastientourbier Aug 4, 2020
89fe1ab
DOC: update docstring
sebastientourbier Aug 4, 2020
b22a65e
FIX: correct most all tables automatically (number of dashes and alig…
sebastientourbier Aug 4, 2020
d17fb62
FIX: install numpy in build_docs_pdf job
sebastientourbier Aug 5, 2020
4bc42ed
MAINT: code refactoring
sebastientourbier Aug 5, 2020
66f1fc8
DOC: updated docstring and comments
sebastientourbier Aug 5, 2020
d70d48a
MAINT: code refactoring
sebastientourbier Aug 5, 2020
d981443
FIX: attempt to fix installation of numpy in circleci
sebastientourbier Aug 5, 2020
fec1670
FIX: 3rd attempt to fix installation of numpy in circleci
sebastientourbier Aug 5, 2020
dc6c4a6
An other attempt to install numpy
sebastientourbier Aug 5, 2020
f75568e
An other attempt to install numpy via pip3 by upgrading the texlive d…
sebastientourbier Aug 5, 2020
5613e03
An other attempt to install numpy via pip3 by upgrading the texlive d…
sebastientourbier Aug 5, 2020
efd9417
FIX: add offset to second column as an attempt ot fix table p28
sebastientourbier Aug 5, 2020
ca985ea
FIX: tables with `` such as `contrast_label` p.29
sebastientourbier Aug 5, 2020
8783a0a
MAINT: Applied suggestions from PR review
sebastientourbier Aug 5, 2020
3a34542
FIX: better handles :--: delimiters
sebastientourbier Aug 5, 2020
89dc17f
MAINT: apply suggestions from review of docstrings
sebastientourbier Aug 5, 2020
6d942ae
FIX: detection of tables
sebastientourbier Aug 5, 2020
acb2f70
FIX: review offset and correction of first column width
sebastientourbier Aug 5, 2020
0546d63
MAINT: added blank new line
sebastientourbier Aug 5, 2020
738d293
MAINT: reviewed blank line at end of file
sebastientourbier Aug 5, 2020
01efbd3
MAINT: should revert the cover.tex and header.tex as they are on master
sebastientourbier Aug 5, 2020
1d4c529
Update pdf_build_src/process_markdowns.py
sebastientourbier Aug 5, 2020
d57c896
Update pdf_build_src/process_markdowns.py
sebastientourbier Aug 5, 2020
3b509ec
Update pdf_build_src/process_markdowns.py
sebastientourbier Aug 5, 2020
2bca21d
Update pdf_build_src/process_markdowns.py
sebastientourbier Aug 5, 2020
da58081
Update pdf_build_src/process_markdowns.py
sebastientourbier Aug 5, 2020
a169303
Update pdf_build_src/process_markdowns.py
sebastientourbier Aug 5, 2020
dd6545e
MAINT: refactored code as suggested and include the entity table
sebastientourbier Aug 7, 2020
0bcba77
MAINT: code and prints cleaning
sebastientourbier Aug 7, 2020
5d0aad7
MAINT: simplify rule to correct the tables and get rid of NB_CHARS_LI…
sebastientourbier Aug 7, 2020
202619d
MAINT: missing changed lines related to previous commit
sebastientourbier Aug 7, 2020
01864d1
MAINT: add comment
sebastientourbier Aug 7, 2020
e5987a9
MAINT: add debug bool parameter to _contains_table_start()
sebastientourbier Aug 7, 2020
1d50c1e
MAINT: updated contributor list
sebastientourbier Aug 7, 2020
55d1f71
Fix spaces around ``=``
sebastientourbier Aug 10, 2020
cab3b00
Fix spaces around ``=``
sebastientourbier Aug 10, 2020
1dbbab3
Fix typo in docstring
sebastientourbier Aug 10, 2020
5 changes: 4 additions & 1 deletion .circleci/config.yml
@@ -42,10 +42,13 @@ jobs:
   build_docs_pdf:
     working_directory: ~/bids-specification/pdf_build_src
     docker:
-      - image: danteev/texlive:TL2017
+      - image: danteev/texlive:latest
     steps:
       - checkout:
           path: ~/bids-specification
+      - run:
+          command: |
+            pip3 install numpy
       - run:
           name: generate pdf version docs
           command: sh build_pdf.sh
5 changes: 2 additions & 3 deletions pdf_build_src/pandoc_script.py
@@ -4,8 +4,7 @@
"""
import os
import subprocess



def build_pdf(filename):
"""Construct command with required pandoc flags and run using subprocess.

@@ -40,4 +39,4 @@ def build_pdf(filename):
 
 
 if __name__ == "__main__":
-    build_pdf('bids-spec.pdf')
+    build_pdf('bids-spec.pdf')
194 changes: 193 additions & 1 deletion pdf_build_src/process_markdowns.py
@@ -10,6 +10,7 @@
import subprocess
import re
from datetime import datetime
import numpy as np


def run_shell_cmd(command):
@@ -141,6 +142,194 @@ def modify_changelog():
file.writelines(data)


# Maximum number of characters in one line, approximated from a line of the PDF
NB_CHARS_LINE_PDF = 100

def correct_table(table, offset=[20, 80], debug=False):
    """Create the corrected table.

    Compute the maximum number of characters in each column and reformat each line so that
    the first and second lines have enough dashes (in proportion) and the fences are correctly
    aligned for correct rendering in the generated PDF.

    Parameters
    ----------
    table : list of list of str
        Table content extracted from the markdown file.
    offset : [x, y]
        Offsets added to the corrected number of dashes in the first (x) and
        second (y) columns.
    debug : bool
        If True, print debugging information (default: False).

    Returns
    -------
    new_table : list of list of str
        Corrected rows of the input table, with the corrected number of dashes and aligned fences.
        To be joined later with |'s.
    """
    nb_of_rows = len(table)
    nb_of_cols = len(table[0])

    nb_of_chars = []
    for i, row in enumerate(table):
        # Ignore number of dashes in the count of characters
        if i != 1:
            nb_of_chars.append([len(elem) for elem in row])

    # Convert the list to a numpy array and compute the maximum number of chars for each column
    nb_of_chars_arr = np.array(nb_of_chars)
    max_chars_in_cols = nb_of_chars_arr.max(axis=0)

    # Compute the number of dashes based on the maximum number of characters in each column
    nb_of_dashes = max_chars_in_cols
    prop_of_dashes = nb_of_dashes / nb_of_dashes.sum()
    nb_of_chars_in_pdf = prop_of_dashes * int(NB_CHARS_LINE_PDF)

    # Compute the corrected number of dashes. An offset can be used to extend the number of dashes.
    # Rows come from splitting on '|', so index 0 is the (empty) cell before the leading fence
    # and index 1 is the first visible column.
    for i, (value, prop) in enumerate(zip(max_chars_in_cols, prop_of_dashes)):
        # Correction for first column (rules could be changed here, for instance)
        if i == 1:
            if int(value) < int(NB_CHARS_LINE_PDF) and prop < 0.2 and max_chars_in_cols[2] > 2 * NB_CHARS_LINE_PDF:
                first_column_width = int(nb_of_dashes.sum() * (value / int(NB_CHARS_LINE_PDF)) + 6 * offset[0])
            elif int(value) < int(NB_CHARS_LINE_PDF) and prop < 0.2 and max_chars_in_cols[2] <= 2 * NB_CHARS_LINE_PDF:
                first_column_width = int(nb_of_dashes.sum() * (value / int(NB_CHARS_LINE_PDF)) + offset[0])
            else:
                first_column_width = int(value)
        # Correction for second column
        elif i == 2:
            if int(value) < int(NB_CHARS_LINE_PDF) and prop < 0.2:
                second_column_width = int(nb_of_dashes.sum() * (value / int(NB_CHARS_LINE_PDF)) + offset[1])
            else:
                second_column_width = int(value)

    if debug:
        print(' - Number of chars in table cells: {}'.format(max_chars_in_cols))
        print(' - Number of dashes (per column): {}'.format(nb_of_dashes))
        print(' - Proportion of dashes (per column): {}'.format(prop_of_dashes))
        print(' - Number of chars max in column (PDF): {}'.format(nb_of_chars_in_pdf))
        print(' - Final number of chars in first column: {}'.format(first_column_width))
        print(' - Final number of chars in second column: {}'.format(second_column_width))

    # Format the lines with the correct number of dashes or whitespaces and
    # correct alignment of fences, and populate the new table (a list of list of str)
    new_table = []
    for i, row in enumerate(table):

        if i == 1:
            str_format = ' {:-{align}{width}} '
        else:
            str_format = ' {:{align}{width}} '

        row_content = []
        for j, elem in enumerate(row):
            # Set the column width
            column_width = max_chars_in_cols[j]
            if j == 1:
                column_width = first_column_width
            elif j == 2:
                column_width = second_column_width

            if j == 0 or j == len(row) - 1:
                row_content.append(elem)
            else:
                if '`' in elem:
                    str_format = ' {:{align}{width}} '
                    row_content.append(str_format.format(elem, align='<', width=(column_width)))
                elif '-:' in elem and ':-' in elem:
                    str_format = ' {:-{align}{width}}: '
                    row_content.append(str_format.format(':-', align='<', width=(column_width)))
                elif '-:' not in elem and ':-' in elem:
                    str_format = ' {:-{align}{width}} '
                    row_content.append(str_format.format(':-', align='<', width=(column_width)))
                elif '-:' in elem and ':-' not in elem:
                    str_format = ' {:-{align}{width}}: '
                    row_content.append(str_format.format('-', align='<', width=(column_width)))
                elif i == 1 and '-:' not in elem and ':-' not in elem:
                    str_format = ' {:-{align}{width}} '
                    row_content.append(str_format.format('-', align='<', width=(column_width)))
                else:
                    row_content.append(str_format.format(elem, align='<', width=(column_width)))
        if debug:
            print(row_content)

        new_table.append(row_content)

    return new_table
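
Review note: the proportional-width step of `correct_table` can be tried in isolation. The sketch below is a hypothetical plain-Python restatement, not code from this PR: the helper names `column_widths` and `dash_proportions` are made up for illustration, lists stand in for the numpy array used in the diff, and every row is assumed to have the same number of cells.

```python
def column_widths(table):
    """Return the longest cell length per column, skipping the dashes row (index 1)."""
    rows = [row for i, row in enumerate(table) if i != 1]
    return [max(len(cell) for cell in column) for column in zip(*rows)]


def dash_proportions(widths):
    """Return each column's share of the summed column widths."""
    total = sum(widths)
    return [width / total for width in widths]


# Rows as produced by splitting '| Key | Description |' on '|':
# the first and last cells are empty strings.
table = [
    ['', 'Key', 'Description', ''],
    ['', '---', '---', ''],
    ['', 'task', 'Name of the task', ''],
]

widths = column_widths(table)
proportions = dash_proportions(widths)
print(widths)       # [0, 4, 16, 0]
print(proportions)
```

The two empty outer cells are why the diff treats indices 1 and 2 as the first and second visible columns.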


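Review note: the delimiter-row branches above rely on Python's format mini-language, where a spec such as `-<12` means "pad with `-` on the right up to 12 characters". A minimal sketch, assuming a hypothetical helper `delimiter_cell` (not part of this PR) that mirrors the four alignment cases handled in `correct_table`:

```python
def delimiter_cell(elem, width):
    """Rebuild one cell of the dashes row, keeping its alignment colons.

    `elem` is the stripped original cell (for example ':--', '--:', ':-:' or '---')
    and `width` is the number of characters to fill before any trailing colon.
    """
    left_aligned = ':-' in elem
    right_aligned = '-:' in elem
    # Fill character '-', left alignment '<', and a nested width field.
    body = '{:-<{width}}'.format(':-' if left_aligned else '-', width=width)
    return ' ' + body + (': ' if right_aligned else ' ')


print(repr(delimiter_cell('---', 4)))   # ' ---- '
print(repr(delimiter_cell(':--', 4)))   # ' :--- '
print(repr(delimiter_cell('--:', 4)))   # ' ----: '
print(repr(delimiter_cell(':-:', 6)))   # ' :-----: '
```

As in the diff, a right-aligned or centered cell ends up one character wider than `width` because the closing colon is appended after the padding.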
def correct_tables(root_path):
    """Change tables in markdown files for correct rendering in PDF.

    This modification makes sure that the proportion and number of dashes (---) are
    sufficient for correct PDF rendering and that fences (|) are correctly aligned.

    Parameters
    ----------
    root_path : str
        Path to the root directory containing the markdown files

    """
    markdown_list = []
    for root, dirs, files in os.walk(root_path):
        for file in files:
            if file.endswith(".md") and file != 'index.md' and file != '01-contributors.md' and file != '04-entity-table.md':
                print('Check tables in {}'.format(os.path.join(root, file)))
                markdown_list.append(os.path.join(root, file))
                with open(os.path.join(root, file),'r') as f:
                    content = f.readlines()
                tables = []
                table_mode = False
                start_line = 0
                new_content = []
                for line_nb, line in enumerate(content):
                    if line:
                        # Use dashes to detect where a table starts and
                        # extract the header and the dashes lines
                        if '--' in line and '|' in line and not table_mode:
                            table_mode = True
                            start_line = line_nb-1
                            print(' * Detected table starting line {}'.format(start_line))
                            table = []
                            header_row = [c.strip() for c in content[line_nb-1].split('|')]
                            row = [c.strip() for c in line.split('|')]
                            table.append(header_row)
                            table.append(row)
                        elif table_mode:
                            row = [c.strip() for c in line.split('|')]
                            # Add row to table if this is not the end of the table
                            if row != ['']:
                                table.append(row)
                            else:
                                end_line = line_nb-1
                                table_mode = False

                                # Correct the given table
                                table = correct_table(table, debug=True)

                                # Update the corresponding lines in
                                # the markdown with the corrected table
                                count = 0
                                for i, new_line in enumerate(content):
                                    if i == start_line:
                                        # The header row was already copied; drop it before rewriting
                                        new_content.pop()
                                    if i >= start_line and i < end_line:
                                        new_content.append('|'.join(table[count])+' \n')
                                        count += 1
                                    elif i == end_line:
                                        new_content.append('|'.join(table[count])+' \n\n')
                                        count += 1
                        else:
                            new_content.append(line)

                    line_nb += 1

                # Overwrite with the new markdown content
                with open(os.path.join(root, file),'w') as f:
                    f.writelines(new_content)
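
Review note: the table detection in `correct_tables` hinges on how a line splits on `|`. The sketch below is a hypothetical standalone version, not code from this PR: the helpers `split_row`, `is_delimiter_row` and `find_tables` are illustrative names, and the end-of-file handling is an added assumption that the diff itself does not implement.

```python
def split_row(line):
    """Split a markdown line on fences and strip each cell."""
    return [cell.strip() for cell in line.split('|')]


def is_delimiter_row(line):
    """Same heuristic as the diff: a dashes row contains '--' and a fence."""
    return '--' in line and '|' in line


def find_tables(lines):
    """Return (header_index, last_row_index) pairs for each table in `lines`."""
    spans = []
    start = None
    for i, line in enumerate(lines):
        if start is None:
            # The header row sits on the line just before the dashes row.
            if i > 0 and is_delimiter_row(line):
                start = i - 1
        elif split_row(line) == ['']:
            # A blank line splits to [''] and closes the table.
            spans.append((start, i - 1))
            start = None
    if start is not None:
        # Assumption: a table that runs to the end of the file is still reported.
        spans.append((start, len(lines) - 1))
    return spans


lines = [
    "Intro\n",
    "\n",
    "| Key | Value |\n",
    "| --- | --- |\n",
    "| a | b |\n",
    "\n",
    "Outro\n",
]
print(split_row(lines[2]))  # ['', 'Key', 'Value', '']
print(find_tables(lines))   # [(2, 4)]
```

Returning index pairs instead of rewriting lines in place keeps detection separate from formatting, which makes the heuristic easier to test against edge cases such as prose lines that happen to contain both `--` and `|`.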


def edit_titlepage():
"""Add title and version number of the specification to the titlepage."""
title, version_number, build_date = extract_header_string()
@@ -188,4 +377,7 @@ def edit_titlepage():
 
     # Step 6: remove all internal links
     remove_internal_links(duplicated_src_dir_path, 'cross')
-    remove_internal_links(duplicated_src_dir_path, 'same')
+    remove_internal_links(duplicated_src_dir_path, 'same')
+
+    # Step 7: correct number of dashes and fences alignment for rendering tables in PDF
+    correct_tables(duplicated_src_dir_path)
2 changes: 1 addition & 1 deletion src/99-appendices/01-contributors.md
@@ -150,7 +150,7 @@ your name is not listed, please add it.
 - Nicole C. Swann 📖
 - François Tadel 📖🔌💡
 - Roberto Toro 🔧
-- Sébastien Tourbier 🤔👀📢
+- Sébastien Tourbier 🤔👀📢🐛📖
 - William Triplett 📖
 - Jessica A. Turner 📖
 - Bradley Voytek 📖