Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrating Nuclia Widget #792

Draft
wants to merge 40 commits into
base: nuclia
Choose a base branch
from
Draft

Integrating Nuclia Widget #792

wants to merge 40 commits into from

Conversation

justdaksh
Copy link

@justdaksh justdaksh commented Sep 3, 2023

CC: @stevepiercy
Here is a draft for Nuclia widget integration into training.

Some of the issues to be solved:

  • Null and Undefined results
  • Loading of widget w.r.t page and Breadcrumbs w.r.t results in Offset.

Pull request preview URL:

https://plone-training--792.org.readthedocs.build/

docs/_static/searchtester.js Outdated Show resolved Hide resolved
upload.py Show resolved Hide resolved
@stevepiercy
Copy link
Contributor

Here's that git command from Volto that builds only when docs change:

https://github.com/plone/volto/blob/master/netlify.toml#L5

__pycache__/upload.cpython-310.pyc Outdated Show resolved Hide resolved
git config --global user.email github-actions@github.com
git add .
git commit -m "Nuclia Sync: Updated docs" --allow-empty
git push
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So everytime there is a commit on main, we run the job, which makes a commit on main.
It looks like an infinite loop :)

Maybe the job should have a condition like if the last changes are only about our 2 generated json files, we stop.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ebrehault I have put the check for nuclia_sync.json only. This should work, right?

upload.py Show resolved Hide resolved
@stevepiercy
Copy link
Contributor

Final items to work on:

  • Increase search item's title font size.
  • Increase space between each search result item.
  • Keep width of search results to 896px.

@stevepiercy
Copy link
Contributor

This is much easier to read. Thank you!

I found a couple issues after make livehtml and previewing locally.

  • The title link needs to have a hover state. An underline would be sufficient, and is typical. Here's what PyData theme does:
    a:hover {
      text-decoration-skip: none;
      color: var(--pst-color-link-hover);
      text-decoration-skip-ink: none;
      text-decoration-thickness: max(3px,0.1875rem,0.12em);
    }
  • In dark mode, search for schema, then hover over the "Display all results" link. It turns black. This should be readable. I'd suggest a grey that has sufficient contrast in both light and dark mode. Screenshot 2023-10-01 at 6 22 32 PM
  • When clicking a search result, it scrolls to the location of that result in the page, and highlights it with pink, which is a reserved color for inline code. Here's what the PyData theme uses:
    dt:target, span.highlighted {
      background-color: var(--pst-color-target);
    }

@stevepiercy
Copy link
Contributor

I've just updated this branch. I tried to get it to work with Sphinx 8.1.2, but had no luck in local development. I have no idea what to do to get it to work. @justdaksh I know it's been a long time, and I understand if you've moved on. Is this something you would like to work on? Also pinging @ebrehault to discuss.

@ebrehault
Copy link
Member

Hi @stevepiercy !
I see we have errors in the sync script (probably a minor issue), but what is the problem you get with Sphinx exactly?

# Conflicts:
#	requirements.txt
@stevepiercy
Copy link
Contributor

@ebrehault I updated again with the Plone Sphinx Theme and recent merges to main, falling back to Sphinx 7.4.7. The Nuclia search widget does not appear.

I could post the JavaScript console with its 10 errors, but nothing is obvious to me, so I reckon it goes deeper.

@ebrehault
Copy link
Member

ebrehault commented Nov 9, 2024

@stevepiercy I have pushed a fix, the widget is now displayed as expected.
I have also updated the options of the Nuclia widget.

…ed div to prevent overlap with preceding text
@stevepiercy
Copy link
Contributor

@ebrehault thanks for the assist. It works for me now.

I restarted styling things, too.

Looking at the search results, I see a lot of MyST markup in the results. This is not good. I was checking out the docs for a better option, and found Index a full website. It looks like I can tell Nuclia to index only the content in the CSS selector article.bd-article section. Before I go that route, which method yields better search results, indexing MyST or HTML?

If it turns out to be better to index HTML, then we can add a CI step to build the site first.

Please let me know your thoughts. Thank you!

@ebrehault
Copy link
Member

Indexing MyST or HTML will make no difference in terms of quality of the search results (unless the HTML contains side contents like the navigation menu, the footer, etc., that might produce unwanted matches).

We have 3 possibilities:

  • index the MyST contents (as of now), and if we want to avoid MyST markup in the results, we would need to convert it to pure Markdown (not sure if that's possible)
  • index HTML contents after the build, and to avoid unwanted side content, we will have to parse the files with BeautifulSoup or equivalent to get the core content
  • index the online web pages after deployment, in that case the Nuclia API lets us define a CSS selector to restrict the indexation to the core content).

@stevepiercy
Copy link
Contributor

stevepiercy commented Nov 10, 2024

  • MyST contains both reStructuredText and Commonmark+, and there is no reliable converter to plain old Markdown. This is probably not the best choice.
  • Build HTML, then index, sounds good. It sounds like Nuclia does not support directly indexing HTML from files that are not hosted on a web server, and that an intermediary parser is required. Is that correct? If so, then BS is fine to scrape the content from <article.bd-article><section ...> (this is the content of the page, excluding all navs, header, and footer), and I can figure out the details of generating a file that Nuclia can consume.
  • Index online is too late, but sounds like the easiest option.

@ebrehault
Copy link
Member

Nuclia does process local HTML files, but then we do not have the CSS selector option to narrow down to a part of the page, so we need to do that manually.

@justdaksh
Copy link
Author

hey @stevepiercy, apologies for the delay I was out of town for two weeks and will be back home the day after tomorrow. I'll get to this as soon as I reach back.

@stevepiercy
Copy link
Contributor

Hi @justdaksh, I'm glad you're still around!

I would like to go with Option 2, build HTML then index content.

I will be pushing a few commits to this branch, to bring it up to date with the latest Plone Sphinx Theme.

Ping me on Discord, if you want to chat about anything.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

3 participants