Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assembly paging prototype #273

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

carlosmunoz
Copy link
Contributor

@carlosmunoz carlosmunoz commented Apr 17, 2020

Create a prototype for a potential assembly paging mechanism.
Using the module infrastructure this revision creates a new api endpoint at
/api/assembly?id=<identifier>&page=<page num>

where id is a module id (eventually an assembly id) and page num is the a page to fetch from the assembly (it's optional and defaults to the first page)

The api returns the html for a single page, determined by section tags. These tags are the result of asciidoctor's processing and they are top level tags in the body of the assembly's html.

This PR also fixes a bug in the SlingResourceIncludeProcessor which causes invalid includes for includes at a third level or below.

Update: Enabled to assembly "paging" strategies (to see each in action the code must be modified). The second strategy inserts html tags when including any modules. The splits will happen at the start of the new module and will end before the start of the next module include.

@carlosmunoz carlosmunoz added the not ready Not ready to be reviewed or merged label Apr 17, 2020
@codecov-io
Copy link

Codecov Report

Merging #273 into master will decrease coverage by 0.64%.
The diff coverage is 6.66%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #273      +/-   ##
============================================
- Coverage     63.00%   62.35%   -0.65%     
- Complexity      337      338       +1     
============================================
  Files            80       81       +1     
  Lines          2638     2667      +29     
  Branches        397      398       +1     
============================================
+ Hits           1662     1663       +1     
- Misses          907      935      +28     
  Partials         69       69              
Flag Coverage Δ Complexity Δ
#java 62.35% <6.66%> (-0.65%) 338.00 <0.00> (+1.00) ⬇️
Impacted Files Coverage Δ Complexity Δ
...n/java/com/redhat/pantheon/jcr/JcrQueryHelper.java 77.77% <0.00%> (-15.56%) 9.00 <0.00> (ø)
...servlet/assembly/AssemblyPageRenderingServlet.java 0.00% <0.00%> (ø) 0.00 <0.00> (?)
...octor/extension/SlingResourceIncludeProcessor.java 77.77% <100.00%> (+0.50%) 10.00 <0.00> (+1.00)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0254734...5a5564f. Read the comment docs.

Comment on lines 59 to 64
String pageHtml = Html.parse(Charsets.UTF_8.name())
.andThen(Document::body)
.andThen(element -> element.select("section.sect1"))
.andThen(elements -> elements.listIterator(pageNum.intValue()-1).next())
.andThen(element -> element.outerHtml())
.apply(fullHtmlContent);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting technique. This is different from what we discussed, but seems like it accomplishes the same goal. Are you confident that this "sect1" construct is reliable? I would worry that it only exists because of our haml, and that some future unrelated haml change would unexpectedly break this functionality.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After additional testing, I found out that "sect1" does show up with vanilla asciidoctor (which is probably a good sign) BUT it is tightly tied to the module having a title. (I tested by using == Some Title syntax). If the title is at a deeper level, such as === Some Subtitle, or if the module has no title at all, this code causes the assembly "pages" to span multiple modules.

Copy link
Contributor Author

@carlosmunoz carlosmunoz Apr 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The section.sect1 selector happens to be the default rendering tag and style for vanilla asciidoctor, and one which we are not overriding in our haml templates. It can however be overriden, in which case we would have to change our splitting algorithm (e.g. instead of section tags we may decide to use p tags). The benefit of this approach is that these sections appear at the root of the body, so the splits are very clean. As for the technique you and I discussed, I did try it, but the problem is that the breaks may not happen cleanly (e.g. breaking in the middle of a p tag, or worse, a deeply nested structure).

We could explore if Jsoup allows us to "clean up" the html, but I did not go that far.

The scenario you described with the titles is an interesting one for sure. I guess we would need to explore the implications more deeply.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jsoup in theory corrects badly broken html, so we could try to see how it does in this scenario.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carlosmunoz I can't remember where we left off in our discussions, but let me know if you'd like to merge this in as-is - or if you intend to take it further. Even though there is certainly more research to be done, I think this code already could serve as a good starting point for another dev to start looking at in earnest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I added a bit more code to try different assembly paging strategies. I still like the one by section better (the original one). Have a look and let me know what you think. I would like to keep this out of the code for a little while longer still, but we can discuss.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was finally able to take a look at this. I see where you're going with the code, and I like it.

I don't know specifically what technique would be more robust - keying off an html comment, or an html tag (that we would add in either case), rather than something that we "hope" is present in the doc, feels like the right way to do it.

When I checked out the code and tried to run it locally, it didn't work. I don't see the <!-- pantheon-module-whatever --> tags being represented in the cachedHtml data, so there's nothing for the assembly servlet to split. Am I correct in thinking I ought to see the comments there? I see the include processor adding them at the code level, so it's quite odd that they would simply disappear...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think you would still see them there. They get inserted at every include of a module, so perhaps you need to make sure in your sample you are including modules (as opposed to just resources).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
not ready Not ready to be reviewed or merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants