pdf version of spec #135

Closed
sappelhoff opened this issue Jan 24, 2019 · 35 comments
Labels
enhancement New feature or request help wanted Extra attention is needed infrastructure

Comments

@sappelhoff
Member

https://bids.neuroimaging.io/bids_spec.pdf currently does not link to the most recent version of the spec, which, as @KirstieJane pointed out, is 1.1.2

This should be updated and I think we should automate this in our release cycles.

@sappelhoff sappelhoff added the enhancement New feature or request label Jan 24, 2019
@franklin-feingold
Collaborator

Hi @sappelhoff,

On the website, under The BIDS Specification, it currently links out to the latest stable version of the specification on readthedocs. Previously the pdf spec was on the website at that location, but I am not aware of where the pdf link currently is promoted. Could you please direct me to where it may be and I will update it?

@sappelhoff
Member Author

Previously the pdf spec was on the website at that location, but I am not aware of where the pdf link currently is promoted.

right! I forgot about this. So currently we do not have a pdf spec. Correct?

@franklin-feingold
Collaborator

That is right as far as I know. There is a nice plug-in that exports all the markdown files into pdfs and there is an option to combine them and set the path (default is the same directory as the md files). I can investigate and see how this works so we can get a built pdf into the repo

@sappelhoff
Member Author

thanks @franklin-feingold, this sounds perhaps worth an experiment. A portable version (i.e., offline and in a single file) of the specification would perhaps be good to have.

I'll close this issue however, because its main purpose is "solved"

@KirstieJane
Member

Thanks @sappelhoff & @franklin-feingold.

Just commenting that I had the link in the documentation for BEP001, it isn’t linked from the website at the moment (as Franklin said). No need to track down an incorrect link 🕵🏼‍♀️😁

@franklin-feingold
Collaborator

I think the portable (pdf) specification should be of the latest stable version (versus latest). We would want the portable version to be something that we support, and the latest stable version is what we support. This can be part of the release protocol. WDYT?

In the interest of keeping PRs to small manageable reviews and additions, I will wait until we have worked through #137 before adding in how the pdf generation will work. Though, in the meantime I will PR the latest stable version pdf (v1.1.2) before fleshing out that step in the release protocol (if you agree) so at least we have it in the repo.

@franklin-feingold
Collaborator

this is how it will roughly look - https://github.com/franklin-feingold/bids-specification/blob/pdf_spac/BIDS-Specification_v1.1.2.pdf - still a WIP, I want to add a TOC and brief title to more resemble the previous pdfs.

@sappelhoff
Member Author

sounds like a great plan @franklin-feingold thanks for tackling it.

this is how it will roughly look

looks good to me. Some points:

  1. Can we copy paste from the pdf?
  2. Can we have "links" in the pdf?
  3. Can we have page numbers?
  4. Can we get rid of "Documentation built with MkDocs..." on the second page?

@franklin-feingold
Collaborator

To answer your point -

  1. We are able to copy paste from the pdf (i.e., into a web browser or a Word doc)
  2. The links that are present in the pdf are clickable (and direct to the URL listed)
  3. One is able to add page numbers post hoc online (like here)
  4. I believe this is hard coded into the tool I am using (I will see if there is a way around it)

Just a note - I am going to reopen this issue and broaden the name out to pdf version of the specification since this issue has broadened a bit and I think this issue is fine to track this pdf development

@franklin-feingold franklin-feingold changed the title pdf version of spec should point to most recent version pdf version of spec Jan 28, 2019
@franklin-feingold
Collaborator

franklin-feingold commented Feb 1, 2019

small update - still WIP, but have another iteration of the pdf.

There are still 3 outstanding issues I see:

  1. Indenting in the TOC. This is rendered properly in my markdown files, but does not render properly in the pdf. I have raised this issue in the pdf-exporter repo
  2. Removing the 'Documentation built with MkDocs'. Possible solution is being discussed in pdf-export repo (editing html's before pdf generation)
  3. The conversion is currently removing all the internal links to headers that are properly linked in the markdown file.

The pdf-exporter repo is https://github.com/zhaoterryy/mkdocs-pdf-export-plugin

I am also going to explore possible alternatives

@franklin-feingold franklin-feingold added the help wanted Extra attention is needed label Mar 8, 2019
@sappelhoff sappelhoff pinned this issue Mar 22, 2019
@franklin-feingold franklin-feingold removed their assignment Mar 22, 2019
@sappelhoff sappelhoff unpinned this issue Mar 22, 2019
@nicholst
Collaborator

Hi folks, Just wanted to ping this issue.

I work with industrial collaborators, and they love BIDS: Every time a pharma company transfers data they require a Data Transfer Specification (DTS), and now lots of the DTS text can be replaced with "We use BIDS". However, for record keeping / due diligence they want to be able to include a fixed copy of the standard, i.e. a PDF.

Having viewed the PDFs, here are some minor/major things to consider:

  • It would be good if there was a footer or header that gave the current version of the Spec and its date.
  • In many places (e.g. Example or Template text), the text is super long (e.g. see Example in 03-modality-agnostic-files.md), and in the PDF this runs right off the page. Digging around I found that a CSS property of white-space: pre-wrap (see here and here) might fix this, but I am not sure this is compatible with RTD.

@franklin-feingold franklin-feingold pinned this issue Jul 22, 2019
@franklin-feingold franklin-feingold unpinned this issue Sep 25, 2019
@yarikoptic
Collaborator

I wonder if meanwhile it is better to keep a PDF which gets more and more severely outdated, or to add a redirect for https://bids.neuroimaging.io/bids_spec.pdf to just go to https://bids-specification.readthedocs.io? Or is a PDF version feasible to achieve in a short time? (it seemed to work, although the last version posted is no longer available for preview @franklin-feingold)

@nicholst
Collaborator

Just wanted to newly +1 this... A PDF is still needed and important for industrial users and, e.g., other standards efforts that want a single fixed document to study.

@effigies
Collaborator

I agree a fixed PDF is very useful, and this should be a priority.

The annoying thing is that RTD can usually do PDF builds alongside HTML builds. I guess this is a mkdocs vs Sphinx thing. Franklin found a plugin that seems to do what we want (though his link is now dead...), but I don't know if activating it on RTD will get us the PDF build for free, or if we would need to upload it somewhere else.

It wouldn't be too hard to have a gh-pages branch that just serves versioned PDFs, and add the newly rendered PDF with each tag.
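
A minimal sketch of the naming step for that idea, assuming release tags look like `v1.1.2` and that the PDFs sit in a `pdf/` folder on the gh-pages branch. The function name, the folder, and the file-name pattern are illustrative assumptions, not something agreed on in this thread.

```python
import re
from pathlib import PurePosixPath


def versioned_pdf_path(tag: str, out_dir: str = "pdf") -> PurePosixPath:
    """Map a release tag such as 'v1.1.2' to a versioned PDF path.

    The 'bids-specification-v<version>.pdf' pattern is a hypothetical
    naming scheme chosen for this example.
    """
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", tag)
    if match is None:
        # Branch names or pre-release tags are rejected on purpose.
        raise ValueError(f"not a release tag: {tag!r}")
    version = ".".join(match.groups())
    return PurePosixPath(out_dir) / f"bids-specification-v{version}.pdf"


print(versioned_pdf_path("v1.1.2"))
```

A CI job that runs on each tag could call this to decide where to copy the freshly rendered PDF before committing it to the branch.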

@nicholst
Collaborator

Indeed, a to-the-minute PDF is not what's important, but rather a PDF for each major release.

@effigies
Collaborator

effigies commented Nov 6, 2019

Here is an example combined PDF generated by mkdocs-pdf-export-plugin: combined.pdf

It lacks a ToC, and any versioning information.

@effigies
Collaborator

effigies commented Nov 6, 2019

Oh, nice. That looks better. Though I don't see a version...

@yarikoptic
Collaborator

NICE! Thank you for looking back into it!
note: appendix IV: Entity table, the actual table is cut on the right -- since it is so wide. not sure what we could/should do about that. Partially (and actually fully for some other tables, e.g. search for one with AnatomicalLandmarkCoordinates) this could be mitigated by making the PDF version useful for reading, not for making notes on the sides -- so reducing all the margins. Now it is a bit too sparse since by default they seem to be quite generous.

Later might be worth enhancing PDF version with URLs for every section to make it easy to get to the updated version

PS glancing over PDF helped already to identify #361 -- there is a good value of having a "hard copy"! ;-)

@yarikoptic
Collaborator

oh - vertically long tables seem to be cut as well :-( see the one for Units

@Arshitha
Contributor

Hi everyone! I spent some time working on this issue. I wanted to explore a new package to convert from markdown to pdf. I went with pandoc after looking up sphinx and pandoc.

Here is the pdf I was able to generate:
bids-specs.pdf

Changes made:

  • ToC with proper indentation
  • Tables formatted to be within page margins
    • Had to increase the width of the columns of the pipe tables manually within the markdown files. It was a one-time edit; however, I'm not sure if this might lead to problems while rendering them with RTD.
    • Couldn't fix the entity table since it has more than 10 columns. I believe this can be fixed by converting just that markdown document to landscape and then integrate it with the rest of the markdowns with pandoc. Working on it.
  • Long lines of code fit within the margin and don't get cut-off.

Problems with this solution:

  • The contributors page has emojis, which don't show up in the pdf. Working on this too.
  • Entity table still not fixed
  • Under table of contents, the title 'Modality Specific Files' doesn't show up even though all of the subsections are listed. (content not compromised)

Most of the above problems with the solution can be potentially solved. However, would appreciate thoughts on how much of a priority these are and which takes priority over the other.

More details on the PR: #375

@nicholst
Collaborator

nicholst commented Nov 22, 2019

Thank you @Arshitha ! This seems to be some progress.

Here are a few more snagging issues with the PDF:

  • Many hyperlinks referencing within the document don't work; e.g. on page 5, "Extending the BIDS proposal" points to a local file; or "Tabular Files" link on pg 13). There are others, but, oddly, some do work, like the link to "Common Metadata fields" on pg 21.
  • Some non-tabular text runs off the page; e.g. "For example: derivatives/fmriprep/sub-01/..." in Common Principles -> Source vs. raw vs. derived data (page 7; some on page 8 too; also Page 20, 21, 24, 27, 28, etc.).
  • Superscripts on page 18 seem to have been lost.
  • Link on page 20 following "Quantitative T1rho brain imaging" has no space before URL.
  • Somehow, opening with the ToC isn't very polished or professional; I don't know if it's feasible, but having some sort of title / splash page would be desirable. Maybe a hack is to prepend some text to the "Table of Contents" title?
  • Ideally current version number/date would appear in a footer or header.
  • And, most trivial & subjective, with the default LaTeX font (Computer Modern), it has a very stodgy feel. It'd be nice to have a sans serif font that roughly matches how the RTD renders on the web.

@Arshitha
Contributor

Hi folks, I had the chance to work on some of the bugs pointed out by @nicholst.

Some background on pandoc: it's a powerful Haskell library that converts files from one markup format to another. To create a pdf from multiple markdown files, it creates one large LaTeX file, which is then converted to a pdf using the xelatex pdf-engine.
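
As a rough illustration of that pipeline (not the exact invocation used for the PDFs below), the whole build can be a single pandoc call over the ordered list of markdown files. The helper name, the input file names, and the default output name are assumptions made for this sketch; `--toc`, `--pdf-engine` and `-o` are standard pandoc options.

```python
def build_pandoc_command(md_files, output="bids-spec.pdf"):
    """Return the argument list for one pandoc run over all Markdown files.

    pandoc concatenates the inputs in the order given, writes an
    intermediate LaTeX document, and hands it to the chosen PDF engine.
    """
    if not md_files:
        raise ValueError("no Markdown files given")
    return [
        "pandoc",
        *[str(f) for f in md_files],   # order defines the chapter order
        "--toc",                       # generate a table of contents
        "--pdf-engine=xelatex",        # LaTeX engine with Unicode support
        "-o", output,
    ]


# Hypothetical input files, listed in reading order.
cmd = build_pandoc_command(["src/index.md", "src/03-modality-agnostic-files.md"])
print(" ".join(cmd))
# To actually build (requires pandoc and xelatex to be installed):
# import subprocess; subprocess.run(cmd, check=True)
```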

Here are two versions of the pdf that I was able to generate using pandoc with different options:
version 1:
bids-specs-version1.pdf
version 2:
bids-specs-version2.pdf

  • Many hyperlinks referencing within the document don't work; e.g. on page 5, "Extending the BIDS proposal" points to a local file; or "Tabular Files" link on pg 13). There are others, but, oddly, some do work, like the link to "Common Metadata fields" on pg 21.

    • Internal links that reference another .md file don't work. However, as in the case of "Common Metadata fields", internal links that reference a section within the same markdown file do work. With my brief experience with pandoc and LaTeX, I wasn't able to figure out a hack for this.
    • One suggestion by @agt24 was to remove these internal links since the pdf comes with a side panel of TOC so losing internal links shouldn't matter for the pdf version?
  • Some non-tabular text runs off the page; e.g. "For example: derivatives/fmriprep/sub-01/..." in Common Principles -> Source vs. raw vs. derived data (page 7; some on page 8 too; also Page 20, 21, 24, 27, 28, etc.).

    • Fixed this. No text wrapping issues with the current document.
  • Superscripts on page 18 seem to have been lost.

    • with pandoc, I couldn't find an automated fix for superscripts.
  • Link on page 20 following "Quantitative T1rho brain imaging" has no space before URL.

    • this was because of improper spacing in the original document. Fixing the original .md fixed the pdf.
  • Somehow, opening with the ToC isn't very polished or professional; I don't know if it's feasible, but having some sort of title / splash page would be desirable. Maybe a hack is to prepend some text to the "Table of Contents" title?

    • added a cover page with bids logo.
  • Ideally current version number/date would appear in a footer or header.

    • added a header page, however not fully automated yet. But can be done. Working on this.
  • And, most trivial & subjective, with the default LaTeX font (Computer Modern), it has a very stodgy feel. It'd be nice to have a sans serif font that roughly matches how the RTD renders on the web.

    • pandoc comes with a font-change option. However, it requires the fonts to be available on the system; if they are not, they need to be installed. I tried out a couple of sans serif fonts like Open Sans, Roboto, etc., but these didn't render very well in the pdf.

Other things to note:
I've uploaded two versions of the pdf.

  • version 1
    • Has chapter x marked for each .md file that's concatenated to form the pdf
    • This isn't ideal but is the most stable option (comes with pandoc library) to include section breaks
  • version 2
    • doesn't include chapter numbers and section breaks here are implemented using latex commands
    • but on page 18, "Sequence Specifics" appears after the table instead of before the table. This error in formatting propagates to the rest of the document starting on pg 18.

Also, in both versions of the pdf, I was able to fix the tabular text overflow:

  • by increasing column widths of the pipe tables manually within the markdown files. It was a one time edit, however, not sure if this might lead to problems while rendering them with RTD.
  • Couldn't fix the entity table since it has more than 10 columns. Trying to convert it to landscape but so far haven't found a hack. Would replacing the entity table with an external link to the RTD version of the specs be a good temporary fix?

I'd appreciate feedback on these changes before I proceed to create a PR with them.

@sappelhoff
Member Author

awesome work @Arshitha!

One suggestion by @agt24 was to remove these internal links since the pdf comes with a side panel of TOC so losing internal links shouldn't matter for the pdf version?

yes I think removing all internal links is preferable to having some work and some others not. Is it possible to remove the links during the build of the doc? Because we cannot remove the links in the spec itself just for the sake of the pdf.

Another question: Internal links in the same .md seem to work --> would it be possible to just remove cross .md links?
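
To make the question concrete, here is a hedged sketch of such a build-time filter: it runs on a copy of the markdown text, replaces links that point to another .md file with their plain link text, and leaves same-file anchors and external URLs alone. The regular expression and the function name are assumptions for illustration and only cover simple inline links.

```python
import re

# Inline Markdown link: [text](target). The negative lookbehind skips
# images, which look like ![alt](src).
LINK = re.compile(r"(?<!!)\[([^\]]+)\]\(([^)\s]+)\)")


def strip_cross_file_links(markdown: str) -> str:
    """Replace links to other .md files with their plain link text."""
    def replace(match):
        text, target = match.group(1), match.group(2)
        path = target.split("#", 1)[0]      # drop any '#section' anchor
        is_external = "://" in target       # e.g. https://... stays a link
        if path.endswith(".md") and not is_external:
            return text                     # cross-file link: keep text only
        return match.group(0)               # same-file or external: unchanged
    return LINK.sub(replace, markdown)


# The file name below is made up for the example.
sample = (
    "See [Tabular Files](02-common-principles.md#tabular-files) and "
    "[Common Metadata fields](#common-metadata-fields)."
)
print(strip_cross_file_links(sample))
```

Because the function only transforms strings, the files in the spec itself stay untouched; the build would write the filtered text to a temporary folder and run pandoc there.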

added a header page, however not fully automated yet. But can be done. Working on this.

Nice, IMO this is a priority.

PS: tagging @yarikoptic to make sure he sees this after his recent post: #375 (comment)

@Arshitha
Contributor

@sappelhoff Yes, it's possible to remove only cross .md links without affecting the original .md files.
will update on header automation soon as well.
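
One possible way to automate the header, sketched under assumptions: generate a small LaTeX preamble with the `fancyhdr` package at build time and pass it to pandoc with `--include-in-header`. The function name, the header wording, and the placeholder date below are hypothetical; this is not necessarily how the PR ends up doing it.

```python
from datetime import date


def latex_version_header(version: str, released: date) -> str:
    """Return LaTeX preamble lines that print the spec version and date
    in the page header and the page number in the footer."""
    label = f"BIDS Specification v{version} ({released.isoformat()})"
    return "\n".join([
        r"\usepackage{fancyhdr}",
        r"\pagestyle{fancy}",
        r"\fancyhf{}",                      # clear default header/footer
        rf"\fancyhead[L]{{{label}}}",       # version and date, top left
        r"\fancyfoot[C]{\thepage}",         # page number, bottom centre
    ])


# Placeholder date, not the real release date of 1.1.2.
print(latex_version_header("1.1.2", date(2019, 1, 1)))
```

The returned text would be written to a file (say `header.tex`, a made-up name) and added to the pandoc call as `--include-in-header=header.tex`, with the version read from the release tag.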

@nicholst
Collaborator

Amazing work @Arshitha ! Thanks for the push!

I didn't have time to crawl through each PDF, but one quick thought on Appendix IV, Entity Table... there is simply no way that a wide table like this will ever render... I mean you could try to do a landscape page, but if some day we add more entities it'll again run out of space.

So! I would propose that we simply need to have a reasonable limit and that after a table gets 'so wide' (how wide? dunno) it needs to be broken into successive stacked tables.
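
A small sketch of what "successive stacked tables" could mean in practice, assuming a simple pipe table whose first column is a row label that should be repeated in every part. The function name, the `max_cols` limit, and the toy table are assumptions; escaped pipes inside cells and ragged rows are not handled.

```python
def split_wide_table(table: str, max_cols: int = 6) -> list[str]:
    """Split a pipe table into stacked tables of at most max_cols columns.

    The first column is kept in every part so each row stays labelled.
    """
    if max_cols < 2:
        raise ValueError("max_cols must be at least 2")
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in table.strip().splitlines()
    ]
    header, body = rows[0], rows[2:]        # rows[1] is the |---| separator
    step = max_cols - 1                     # one slot is taken by column 0
    parts = []
    for start in range(1, len(header), step):
        keep = [0, *range(start, min(start + step, len(header)))]
        out = [header, ["---"] * len(header), *body]
        parts.append("\n".join(
            "| " + " | ".join(row[i] for i in keep) + " |" for row in out
        ))
    return parts


# Toy data, not the real entity table.
toy = """
| Entity | anat | func | dwi | fmap |
|--------|------|------|-----|------|
| sub    | req  | req  | req | req  |
| ses    | opt  | opt  | opt | opt  |
"""
print("\n\n".join(split_wide_table(toy, max_cols=3)))
```

With a fixed `max_cols`, adding more entities later only adds another stacked part instead of pushing the table off the page again.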

@yarikoptic
Collaborator

  • One suggestion by @agt24 was to remove these internal links since the pdf comes with a side panel of TOC so losing internal links shouldn't matter for the pdf version?
    ...
    @sappelhoff Yes, it's possible to remove only cross .md links without affecting the original .md files.
    will update on header automation soon as well.

could you please also open some kind of a "TODO: fix/reenable cross .md links in .pdf" issue, so there would be a note to have it addressed properly in the future?

@yarikoptic
Collaborator

version 2 is indeed better with not polluting with all the chapter numbers etc

  • but on page 18, "Sequence Specifics" appears after the table instead of before the table. This error in formatting propagates to the rest of the document starting on pg 18.

even more fun:

  • "Anatomical landmarks" appears as a widow at the end of the page BEFORE the table. So I guess there is some layouting going on; maybe there is an option to restrict it to always be before...

Overall -- so nice and so worth merging and addressing any of these or detected later issues later and one at a time ;)

@yarikoptic
Collaborator

version 2 is indeed better with not polluting with all the chapter numbers etc

but indeed version 1 provides most stable and less buggy rendering, so I would vote for it instead ;)

@Arshitha
Contributor

Great! Thanks, everyone for the feedback. Will update the PR soon.

@nicholst
Collaborator

Again, congrats to @Arshitha and the whole team for getting the PDF version up and running so quickly.

I think this Issue can be closed now, but there's one last thing: How is the BIDS community supposed to find the PDF version? I searched the Read The Docs site and the BIDS website... I couldn't find any links to the PDF version.

Presumably there should be reference to it on the webpage here: https://bids.neuroimaging.io/specification.html and, if not implicitly within RTD, in the landing page src/index.md... right? Or am I missing this? (I searched the floating RTD menu but didn't find it there either).

@sappelhoff @franklin-feingold

@sappelhoff
Member Author

Or am I missing this?

nope, you are right, this is still on the To Do list.

Presumably there should be reference to it on the webpage here: https://bids.neuroimaging.io/specification.html and, if not implicitly within RTD, in the landing page src/index.md... right?

Yes, I would put a short paragraph on each: (1) the website, and (2) the landing page of the spec.

@nicholst
Collaborator

I can propose a PR... but what's the target URL I should use for the PDF?

@sappelhoff
Member Author

sappelhoff commented Apr 16, 2020

cool, would be much appreciated 👍

on the zenodo page, you can see this small snippet saying:

Cite all versions? You can cite all versions by using the DOI 10.5281/zenodo.3686061. This DOI represents all versions, and will always resolve to the latest one. Read more.

so I think we should use this URL: https://doi.org/10.5281/zenodo.3686061, which will lead to the general zenodo entry, resolving to the "latest" newest stable PDF (but previous versions are accessible from a side panel)

@sappelhoff
Member Author

I am closing this now. We only need to upload the PDFs between versions 1.0.0 and 1.3.0 to Zenodo to finish this process, and that is tracked in #407


8 participants