Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fulltext indexing for nodes "hidden through parent" is not correct #214

Open
kdambekalns opened this issue Jun 2, 2017 · 6 comments
Open

Comments

@kdambekalns
Copy link
Member

Given:

  • a node that is contained in a container, e.g. a multi-column element, containing text "Foobar"
  • that container is marked as hidden
  • the fulltext index is built

Problem:

If the document that contains this node is found using the FLowpack.SearchPlugin, the content excerpt contains the "Foobar" string, even though it's not visible in the rendered page

Conclusion:

The indexer checks each node for it's hidden state, but since the node containing "Foobar" is not hidden by itself, but only invisible because it's container is hidden, it is indexed and the extracted fulltext parts are added to the fulltext root (document node).

@kdambekalns
Copy link
Member Author

Now, fixing this seems hard. It would mean we need to check the nodes on the path to the site node for every node we index. Given the performance issues with indexing we have already, this seems not to be a viable solution.

Any other ideas or clever solutions highly welcome!

@dfeyer
Copy link
Contributor

dfeyer commented Jul 15, 2017

That's something we can not solve before the new CR unfortunatly ...

@kdambekalns
Copy link
Member Author

Not sure how the new CR would solve that!?

@PRGfx
Copy link

PRGfx commented Mar 30, 2021

For updating the complete index I tried using an aspect memoizing node visibility per context-path as a workaround for a small project. That did work quite nicely however ES is probably mostly used for larger projects...

I think there is a second notable problem in updating the index for a sub-tree e.g. when hiding a document with real-time indexing. Especially toggling visibility for a large sub-tree would be quite intense I imagine. Adding and removing the nodes for every change doesn't seem viable. Maybe one could work with some in_visible_path property*. (Using something like an SQL join with a (contextpath, visible) doesn't seem to be the elasticsearch way.)

I currently see these three scenarios:

  • real-time indexing; a node is set to invisible:
    this state could be applied to the whole sub-tree (would something like that be possible? update where path like ...%?)
  • real-time indexing; a node is set to visible:
    the sub-tree needs to be traversed from the "top" down to the first invisible node to reset in_visible_path
  • building the index
    the nodes are indexed "from the bottom" (or anywhere in between) and the parents would need to be traversed (at least once)

Probably this whole thing should be tied to some other configurable property of a node, as one maybe wants to exclude an otherwise visible node from search-results (i.e. some "internal" part of the website) without setting a checkbox on each sub-page, while still indexing the nodes to maybe populate list elements?

Writing this and thinking about it made me rewrite this multiple times, so I think it would be helpful to summarize the facets of the problem somewhere in this issue.. Are there additional problems to think about here?


* since the new CR has the comment on node-visibility "[the NodeHiddenState] can NOT answer the question whether a Node is hidden because some node above it has been hidden - for that, use the Content Subgraph.", I assume there will be some aggregation of that sort built into the CR in the future..

@kdambekalns
Copy link
Member Author

I think this is not something that can only be solved with the new CR, it did not happen earlier, see #377 (comment)

@mficzel
Copy link
Member

mficzel commented Nov 10, 2022

I am suspecting this is somehow caused by the cr. It is not new that hidden nodes were returned by elastic but previously they were removed during convertHitsToNodes https://github.com/Flowpack/Flowpack.ElasticSearch.ContentRepositoryAdaptor/blob/master/Classes/Eel/ElasticSearchQueryBuilder.php#L897-L950 which mainly uses $contextNode->getNode($nodePathFromElasticSearch) ... might be a cr issue after all

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants