Dataset READMEs and Data Dictionaries from metadata #561

Open
1 of 3 tasks
alexrichey opened this issue Jan 29, 2024 · 9 comments

@alexrichey
Contributor

alexrichey commented Jan 29, 2024

designs in Figma

weasyprint (the python package we're using to convert html to pdf) has a helpful example here

  • format data dictionary pdfs
  • add readme metadata to the metadata model and yml
  • generate readme pdfs

screenshots of figma designs

the text in these designs was for a modified version of PLUTO's README

Image

@damonmcc
Member

found an example of our code converting a markdown file to a pdf here

@alexrichey alexrichey changed the title DE<>GIS: Generate product READMEs from metadata DE<>GIS: Generate product READMEs and DataDictionaries from metadata Sep 9, 2024
@alexrichey
Contributor Author

@damonmcc Is this the correct issue, or is there another?

@alexrichey alexrichey changed the title DE<>GIS: Generate product READMEs and DataDictionaries from metadata Dataset READMEs and Data Dictionaries from metadata Sep 9, 2024
@damonmcc
Member

looking into using weasyprint (already the pdf engine in our pandoc setup) to convert html + css to pdf

weasyprint samples here
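
A minimal sketch of the html + css = pdf idea, assuming WeasyPrint's `HTML(string=...).write_pdf(...)` entry point with a `stylesheets` argument. The helper names `build_html` and `write_pdf`, the sample content, and the output path are hypothetical and not taken from the repo.

```python
from html import escape
from pathlib import Path


def build_html(title: str, body_html: str) -> str:
    """Wrap an HTML body fragment in a minimal, complete HTML document."""
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        f'<head><meta charset="utf-8"><title>{escape(title)}</title></head>\n'
        f"<body>{body_html}</body>\n"
        "</html>\n"
    )


def write_pdf(html: str, css: str, out_path: Path) -> None:
    """Render an HTML string plus a CSS string to a PDF file."""
    # Imported lazily so this module still loads where WeasyPrint is missing.
    from weasyprint import CSS, HTML

    HTML(string=html).write_pdf(str(out_path), stylesheets=[CSS(string=css)])
```

Calling `write_pdf(build_html("Data Dictionary", "<h1>Data Dictionary</h1>"), "h1 { color: navy; }", Path("data_dictionary.pdf"))` would be one way to try this on a hand-written page before any templating exists.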

@alexrichey
Contributor Author

alexrichey commented Sep 30, 2024

@damonmcc Here's how I'd go about the remainder of this task. Let's chat when you have a minute. In terms of timeline, the next check-in with Amanda+Matt+myself is Thursday next week, and I'd like to show progress on this item. POC/WIP is fine 🙂

Data Dictionary (All fields come from metadata)

  • POC of approach
    basic and ugly, but with some colors, and the DCP logo. Generated from manually created HTML with inlined CSS, fed into weasyprint. Goal is to prove out the approach, and get a sense of the difficulties.
  • Create skeleton of Jinja HTML template. Probably start from scratch, but take inspo from Heng's work. Use real product metadata, but don't focus effort on transforming/testing the data.
    • get team feedback on template via draft PR
  • Productionize:
    • Make the template look good
    • CSS should probably be in a separate file, and inlined*
    • tests, if applicable
  • Implement automatic asset generation (similar to OTI data dict), and implement in TemplateDB

README

Most fields from metadata, except changelog. Potentially other fields? TBD. e.g. rich text with images.

  • POC of approach
    basic and ugly, but with some colors, and the DCP logo. Generated from manually created HTML with inlined CSS, fed into weasyprint. Goal is to prove out the approach, and get a sense of the difficulties.
  • Create skeleton of Jinja HTML template with mix of real and mock data (mock the changelog).
    • get team feedback on template via draft PR
  • Productionize:
    • Make the template look good
    • CSS should probably be in a separate file, and inlined*
    • tests, if applicable
  • Implement automatic asset generation (similar to OTI data dict), and implement in TemplateDB
  • Figure out where changelog lives (can get going on this whenever, but it's a trivial/minor implementation detail, and it shouldn't block us)

For inlining, potentially use something like this: https://pypi.org/project/css-inline/
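
A rough sketch of the "CSS in a separate file, and inlined" step, assuming the `css-inline` package's `css_inline.inline(html)` function. The helpers `embed_stylesheet` and `inline_css` are hypothetical names for illustration, and the approach of injecting a `<style>` tag before inlining is an assumption, not something decided in this thread.

```python
from pathlib import Path


def embed_stylesheet(html: str, css: str) -> str:
    """Insert CSS as a <style> tag just before </head> so an inliner can see it."""
    if "</head>" not in html:
        raise ValueError("expected a complete HTML document with a <head>")
    return html.replace("</head>", f"<style>{css}</style></head>", 1)


def inline_css(html: str, css_path: Path) -> str:
    """Read a standalone CSS file and inline its rules into style attributes."""
    # Imported lazily so this module still loads where css-inline is missing.
    from css_inline import inline

    css = css_path.read_text(encoding="utf-8")
    return inline(embed_stylesheet(html, css))
```

Worth noting that WeasyPrint can also take stylesheets directly at render time, so inlining may turn out to be optional for the PDF path.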

@damonmcc
Member

damonmcc commented Nov 20, 2024

notes from 11/20 chat

To generate PDFs that align with the designs in figma we'd like to:

  • use a generalized jinja template for both Data Dictionary and README pdfs
  • construct complete html files from smaller jinja snippets (e.g. header, sections)
  • add styling to the complete html file before it's converted to a PDF

notes on scope:

  • will use the existing Data Dictionary yaml/model for this initial work, will not add README yaml/model
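
One way the 11/20 notes could look in code: a single generalized base template that pulls in smaller snippets with Jinja's `{% include %}`. This is only a sketch. The snippet names, the `title`/`sections` context shape, and `render_document` are hypothetical, and `DictLoader` is used here just to keep the example in one file (a real setup would more likely load templates from disk).

```python
# In-memory stand-ins for template files; names and markup are illustrative.
SNIPPETS = {
    "base.html": (
        "<!DOCTYPE html><html><head><title>{{ title }}</title></head>"
        "<body>{% include 'header.html' %}"
        "{% for section in sections %}{% include 'section.html' %}{% endfor %}"
        "</body></html>"
    ),
    "header.html": "<header><h1>{{ title }}</h1></header>",
    "section.html": (
        "<section><h2>{{ section.heading }}</h2><p>{{ section.body }}</p></section>"
    ),
}


def render_document(title: str, sections: list[dict]) -> str:
    """Render the shared base template with a title and a list of sections."""
    # Imported lazily so this module still loads where Jinja2 is missing.
    from jinja2 import DictLoader, Environment, select_autoescape

    env = Environment(
        loader=DictLoader(SNIPPETS),
        autoescape=select_autoescape(["html"]),
    )
    return env.get_template("base.html").render(title=title, sections=sections)
```

Because the base template only knows about a title and a list of sections, the same function could serve a Data Dictionary or a README by passing different sections. The rendered string is the "complete html file" that styling would be applied to before conversion.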

@damonmcc
Member

damonmcc commented Dec 2, 2024

having trouble writing html+css to produce the exact format in the figma design, specifically the inline sections for a name and values of a metadata field (e.g. "Abstract" next to all the content)

gonna focus on the content and then improve details of the layouts
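
For the name-next-to-value layout, one option that might be simpler than inline or floated elements is a two-column table with a fixed-width label column. The sketch below is an assumption about what could work, not the approach used in the repo. `render_fields` and `FIELDS_CSS` are hypothetical names and the widths are placeholders.

```python
from html import escape

# Placeholder styling: label column pinned to the left, content fills the rest.
FIELDS_CSS = """
table.fields { width: 100%; border-collapse: collapse; }
table.fields th { width: 25%; text-align: left; vertical-align: top; padding-right: 1em; }
table.fields td { vertical-align: top; }
"""


def render_fields(fields: list[tuple[str, str]]) -> str:
    """Render (label, value) pairs as table rows, label left and value right."""
    rows = "".join(
        f'<tr><th scope="row">{escape(label)}</th><td>{escape(value)}</td></tr>'
        for label, value in fields
    )
    return f'<table class="fields">{rows}</table>'
```

A call like `render_fields([("Abstract", "...")])` gives one row per metadata field, and long values wrap inside the right-hand cell while the label stays aligned to the top.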

@damonmcc
Member

damonmcc commented Dec 2, 2024

just noting that it seems like we'll still have to do some of the styling in between html -> pdf in order to do things like page numbers

from the CSS perspective, a PDF is paged media
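
Since a PDF is paged media, page numbers can likely be expressed with CSS `@page` margin boxes and the `page`/`pages` counters, which WeasyPrint supports. A small sketch follows. `page_css` is a hypothetical helper, and the page size, margin, and footer wording are placeholder assumptions rather than values from the Figma designs.

```python
def page_css(size: str = "Letter", margin: str = "2cm") -> str:
    """Build an @page rule that prints 'Page N of M' in the bottom margin."""
    return (
        "@page {\n"
        f"  size: {size};\n"
        f"  margin: {margin};\n"
        "  @bottom-center {\n"
        '    content: "Page " counter(page) " of " counter(pages);\n'
        "  }\n"
        "}\n"
    )
```

The returned string could be passed to WeasyPrint as an extra stylesheet at conversion time, which would keep print-only rules separate from the template's own styling.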

@alexrichey
Contributor Author

> just noting that it seems like we'll still have to do some of the styling in between html -> pdf in order to do things like page numbers
>
> from the CSS perspective, a PDF is paged media

Gotcha - good callout

@damonmcc
Member

milestone ideas

  • data dictionary sections look identical to design (done next week)
  • build version is in data dictionary
  • generate data dictionary pdf in all builds

Projects
Status: 🏗 In progress

2 participants