[4.x]: Issue with Search Indexing and Invisible Characters in Multi-Site Craft CMS Project #16430

Closed
romainpoirier opened this issue Jan 14, 2025 · 1 comment

@romainpoirier

What happened?

Description

In a multi-site Craft CMS Pro project using the "craftcms/redactor": "3.1.0" plugin, I am running into a problem with search indexing. It appears to be caused by invisible characters, and it affects search results in both the admin panel and the frontend.

Example Content

If I click the Source button in the Redactor field, the content is displayed as follows:

<p>More infor­ma­tion on the projects sub­mis­sion can be found in the <strong>TOOL­BOX</strong>.</p>

However, when I search for the word toolbox in the admin, this page does not appear in the results. Conversely, if I search for TOOL%C2%ADBOX, the page is found (%C2%AD is the percent-encoded UTF-8 form of U+00AD SOFT HYPHEN). The same behavior occurs in the frontend when using .search().
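To see what the matching query actually contains, the percent-encoded term can be decoded. This is a small Python illustration using only the standard library's urllib.parse.unquote; it is not part of Craft or Redactor.

```python
from urllib.parse import unquote

# The search term that did match, as it appears percent-encoded.
encoded = "TOOL%C2%ADBOX"

# unquote() decodes %XX escapes as UTF-8 by default; the bytes C2 AD are the
# UTF-8 encoding of U+00AD SOFT HYPHEN.
decoded = unquote(encoded)

print(repr(decoded))          # 'TOOL\xadBOX'
print(len(decoded))           # 8
print(hex(ord(decoded[4])))   # 0xad

# A plain "toolbox" query cannot equal this keyword, even case-insensitively,
# because the soft hyphen is an extra (invisible) code point.
print(decoded.lower() == "toolbox")  # False
```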

Field Configuration

  • Clean up HTML: Remove inline styles, Remove empty tags, Replace non-breaking spaces with regular spaces
  • Purify HTML: Enabled
  • HTML Purifier Config: Default
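The "Replace non-breaking spaces with regular spaces" option would not be expected to catch this character. A minimal sketch, assuming that option only rewrites U+00A0 NO-BREAK SPACE (replace_nbsp below is a hypothetical stand-in, not Redactor's code): both characters share the UTF-8 lead byte C2, which makes them easy to confuse in percent-encoded form, but they are different code points.

```python
# Hypothetical stand-in for the "Replace non-breaking spaces with regular
# spaces" cleanup; assumes it only targets U+00A0 NO-BREAK SPACE.
def replace_nbsp(text: str) -> str:
    return text.replace("\u00a0", " ")


html = "<strong>TOOL\u00adBOX</strong>"
cleaned = replace_nbsp(html)

# The soft hyphen (U+00AD) is not a non-breaking space, so it survives.
print("\u00ad" in cleaned)  # True

# NBSP encodes as C2 A0, the soft hyphen as C2 AD.
print("\u00a0".encode("utf-8"), "\u00ad".encode("utf-8"))  # b'\xc2\xa0' b'\xc2\xad'
```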

It seems that invisible characters (here, soft hyphens) are being introduced and retained when the entry is saved, and that they interfere with the indexing process.
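One way to make indexing robust to such characters is to normalize text before extracting keywords. The following is a hedged Python sketch of that idea, not Craft's search service and not necessarily how the fix in 4.13.10/5.5.10 works; the normalize and keywords helpers are hypothetical.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Prepare text for keyword matching.

    NFKC folds compatibility forms, dropping Unicode general category "Cf"
    (format) removes invisible characters such as U+00AD SOFT HYPHEN,
    U+200B ZERO WIDTH SPACE and U+FEFF, and casefold() makes matching
    case-insensitive.
    """
    text = unicodedata.normalize("NFKC", text)
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    return text.casefold()


def keywords(text: str) -> list[str]:
    # Normalize *before* tokenizing: \w does not match U+00AD, so splitting
    # the raw text would turn "TOOL<soft hyphen>BOX" into "TOOL" and "BOX".
    return re.findall(r"\w+", normalize(text))


stored = ("More infor\u00adma\u00adtion on the projects sub\u00admis\u00adsion "
          "can be found in the TOOL\u00adBOX.")
index = set(keywords(stored))

print("toolbox" in index)      # True
print("information" in index)  # True
print(normalize("TOOL\u00adBOX") == normalize("toolbox"))  # True
```

Stripping every "Cf" character is a blunt choice: it also removes U+200C/U+200D joiners, which carry meaning in some scripts and emoji sequences, so a real index might strip only a narrower list such as the soft hyphen and zero-width space.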

Steps to Reproduce

  1. Create a field using the Redactor plugin with the configuration listed above.
  2. Add the following content to the field:
    <p>More infor­ma­tion on the projects sub­mis­sion can be found in the <strong>TOOL­BOX</strong>.</p>
  3. Save the entry and perform a search for the word toolbox in the admin or frontend.

Expected Behavior

The page containing the word TOOLBOX should appear in the search results without requiring the exact invisible character sequence (TOOL%C2%ADBOX).

Actual Behavior

The page only appears in the search results if the invisible character sequence is included in the search query. Regular searches for toolbox do not return the expected result.

Additional Questions

  1. Why are these invisible characters added and retained after saving the content?
  2. How can I prevent such characters from being saved in the first place?
  3. What is the recommended approach for cleaning these characters out of all existing content before re-indexing with --update-search-index?
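For question 3, one possible approach is to strip soft hyphens from the stored HTML and then rebuild the index. The Python sketch below shows only the string cleanup; strip_soft_hyphens is a hypothetical helper, and how it would be applied to Craft content (for example from a content migration) is left open. After cleaning, entries could be re-indexed with php craft resave/entries --update-search-index.

```python
import re

# Matches a literal U+00AD SOFT HYPHEN and its common HTML spellings:
# the named entity &shy;, decimal &#173;, and hexadecimal &#xAD;
# (x/X and hex digits in either case, optional leading zeros).
SOFT_HYPHEN = re.compile(r"\u00ad|&shy;|&#0*173;|&#[xX]0*[aA][dD];")


def strip_soft_hyphens(html: str) -> str:
    """Return html with every soft hyphen removed; other markup is untouched."""
    return SOFT_HYPHEN.sub("", html)


before = "<p>More infor&shy;ma&#173;tion in the <strong>TOOL\u00adBOX</strong>.</p>"
after = strip_soft_hyphens(before)
print(after)  # <p>More information in the <strong>TOOLBOX</strong>.</p>
```

Note that soft hyphens are sometimes added on purpose as line-break hints for long words, so removing them from stored content also removes that layout behavior; normalizing only at indexing time keeps them.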

Craft CMS version

4.13.8

PHP version

No response

Operating system and version

No response

Database type and version

No response

Image driver and version

No response

Installed plugins and versions

  • "craftcms/redactor": "3.1.0"
@brandonkelly
Member

Thanks for reporting that! Craft 4.13.10 and 5.5.10 are out with a fix.
