Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] Message search breaking URL and emojis, and missing results [WIP] #18913

Closed
wants to merge 0 commits into from

Conversation

TBG-FR
Copy link
Contributor

@TBG-FR TBG-FR commented Sep 15, 2020

Proposed changes

It has been reported that when searching for a message, the results were displaying with bugs : URLs where broken, emojis title too, and some results are missing. This PR aims to fix these issues.

Elements resolved:

  • URLs are broken by search <mark>
  • Emojis are broken by search <mark>
  • mailto: links are broken by search <mark>
  • Some search results are missing (is it really a bug ? See comment below)

Issue(s)

#18770 => reports that highlighted links, emojis (and mentions ?) are broken, as well as missing results
#18696 => reports that highlighted links are broken
#18456 => report that highlighted links are broken

How to test or reproduce

Go into a channel, and post these messages :

smile
😄
smile.com
smile-not.com
smilenot.com

Then search for the word smile in that channel. You'll notice that:

  • you can't click on links anymore (they're broken by <mark> tags)
  • if you hover on the smile emoji, its title is broken by <mark> tags too
  • the smilenot.com link isn't part of the results

Screenshots

See issues for screenshots

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • Improvement (non-breaking change which improves a current function)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Hotfix (a major bugfix that has to be merged asap)
  • Documentation Update (if none of the other choices apply)

Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA
  • Lint and unit tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works (if applicable)
  • I have added necessary documentation (if applicable)
  • Any dependent changes have been merged and published in downstream modules

Changelog

Further comments

Broken links/emojis

I fixed the <mark> tag issue by doing three replace steps

  1. Using a (bit tricky) REGEX, we find every occurence of searchedText which is inside a title="..." or href="..." and replace searchedText in them by an unique UUID
  2. Using the previous line of code, we find every occurence of searchedText left and we mark it
  3. We find every occurence of the UUID and replace them with searchedText

I did this because I couldn't make a better REGEX, which would exclude 1 and find only 2. If you are able to do it, or if you see a better solution, feel free to comment ! Also, you may wonder "Why an UUID ?" : I used it because I thought that any arbitrary set of characters could end up being part of the initial search, and I think we're safe about that with a V4 UUID.

Missing results

Considering the other issue, which we'll call "missing results", I searched in the whole project and found that it uses a Meteor function messageSearch. So I suspect Meteor for being responsible of that bug, but we need more investigation to be sure.

Related PR

Results highlighting has been implemented in #16166

@TBG-FR
Copy link
Contributor Author

TBG-FR commented Oct 26, 2020

Alright, I fixed the part about URLs and emojis broken by the <mark> tag

Advancements on search results :

  • Used/called method is located here server/methods/messageSearch.js
  • As far as I understand, on line 203 there's a check, to determine if the search text is a REGEX, or a simple text
  • If I type a "simple text" in the search input, like "smile", words or elements containing that word won't show up (smilestep, smallsmile, smilenot, smilenot.com, :slight_smile)
  • However, I can get them with a REGEX like /.*smile.*/i in the search input

See following screenshots for results with smile and results with /.*smile.*/i.

I'd like to know if it's a normal behavior ? In my opinion (and not only mine #18770), typing smile in search input should show all results containing that word, even if smile is part of a longer word like smallsmile. However, I ask myself if the philosophy here is "use a regex if you want more precise results" ? (which isn't really easy for non-IT users)

@MartinSchoeler what do you think ? Is this normal, or should I try to fix this ?

@TBG-FR

This comment has been minimized.

@TBG-FR
Copy link
Contributor Author

TBG-FR commented Nov 26, 2020

@sampaiodiego Sorry to bother you. I got yet another feedback today from a user of our RocketChat instance, complaining about this search behavior (TLDR: some search results won't show up, and it forces people to use regex). I just need to know if this is a wanted behavior, or if I need to investigate on that to complete that PR.

Many thanks in advance for your answer !

@sampaiodiego
Copy link
Member

we might need to enhance the documentation to mention that the message search uses MongoDB text search (https://docs.mongodb.com/manual/text-search/) (@faria-techwrite what do you think?) and because of that (I guess specially depending on the language) the results might really not be as the expected. it has a lot of benefits, but I can see it being frustrating some times. also it is worth mention that it supports exact search between quotes, that might help getting some expected results.

@TBG-FR
Copy link
Contributor Author

TBG-FR commented Nov 27, 2020

we might need to enhance the documentation to mention that the message search uses MongoDB text search (https://docs.mongodb.com/manual/text-search/) (@faria-techwrite what do you think?) and because of that (I guess specially depending on the language) the results might really not be as the expected. it has a lot of benefits, but I can see it being frustrating some times. also it is worth mention that it supports exact search between quotes, that might help getting some expected results.

You're right, I didn't know it was using MongoDB text search ! And I can understand that choice and its benefits !
The main issue here, I think, is that it seems to be always in "exact search" mode... I tried yesterday again, and if I have some messages containing permission word, the search perm will return nothing, which should be the result of a search with "perm", no ?

However permission and "permission" gave the same result (without the highlighting color on the second one, I'll look into it), as expected, and I'm able to get the wantedresult by typing /.*perm.*/i.

I'll try to do the same search operations on MongoDB directly, to see if there is any difference

@sampaiodiego
Copy link
Member

The main issue here, I think, is that it seems to be always in "exact search" mode... I tried yesterday again, and if I have some messages containing permission word, the search perm will return nothing, which should be the result of a search with "perm", no ?

that really depends.. I think (and that's only my personal opinion) it might be the expected result if you have some tech background (like you may think it uses SQL LIKE so you expect it behaves like that).. an example, what is your expected result if you search for perm on google? you won't get results for permission there.

the mongodb text index isn't working only for exact searches.. it works for plurals or typos for example, if you search for permision or permissions you'll probably get results containing the permission word. =)

@TBG-FR
Copy link
Contributor Author

TBG-FR commented Nov 27, 2020

The main issue here, I think, is that it seems to be always in "exact search" mode... I tried yesterday again, and if I have some messages containing permission word, the search perm will return nothing, which should be the result of a search with "perm", no ?

that really depends.. I think (and that's only my personal opinion) it might be the expected result if you have some tech background (like you may think it uses SQL LIKE so you expect it behaves like that).. an example, what is your expected result if you search for perm on google? you won't get results for permission there.

the mongodb text index isn't working only for exact searches.. it works for plurals or typos for example, if you search for permision or permissions you'll probably get results containing the permission word. =)

I totally understand your opinion, and I think it's kinda an history of habits, even with tech background, as a lot of search engines differ ! Do you think it would be worth mentioning that to the user (for example, just above the "you can search using a regex" text, which is above the search field) ?

That behaviour is nice to know, I just tried that with a french word, and it handled well the plural (but not the typo, maybe because it's in french). However, these results will be in the result list but won't be highlighted, but it's kinda "normal"

@sampaiodiego
Copy link
Member

Do you think it would be worth mentioning that to the user (for example, just above the "you can search using a regex" text, which is above the search field) ?

I do, not sure how to that though (pinging @faria-techwrite again, maybe she has an idea).

That behaviour is nice to know, I just tried that with a french word, and it handled well the plural (but not the typo, maybe because it's in french). However, these results will be in the result list but won't be highlighted, but it's kinda "normal"

Looks like I was mistaken about the "typo feature", sry about that, but it has few other interesting features.

@TBG-FR
Copy link
Contributor Author

TBG-FR commented Feb 12, 2021

I pulled last modifications from develop and added a slight modification to my code to handle multiple matches (e.g. smile-smile@something.com where only the first occurence would be taken previously). Now all links should work correctly, especially mailto: and other urls which were broken (unable to click on them or containing wrong data).

image

I also fixed <mark> highlighting for usernames

@TBG-FR TBG-FR marked this pull request as ready for review February 12, 2021 12:10
@TBG-FR
Copy link
Contributor Author

TBG-FR commented Feb 22, 2021

@sampaiodiego I may have done a rebase instead of a merge before my last modifications on this PR, which included a lot of commits from the repo, is this an issue ? If yes, I should I make a new "clean" one ? If not, this PR is ready for review ;)

@sampaiodiego
Copy link
Member

@sampaiodiego I may have done a rebase instead of a merge before my last modifications on this PR, which included a lot of commits from the repo, is this an issue ? If yes, I should I make a new "clean" one ? If not, this PR is ready for review ;)

@TBG-FR yes, that's an issue 😬 we can't see the real diff anymore.. can you please clean it up so we can see only the changes you've made? thx

@TBG-FR
Copy link
Contributor Author

TBG-FR commented Feb 23, 2021

@TBG-FR yes, that's an issue 😬 we can't see the real diff anymore.. can you please clean it up so we can see only the changes you've made? thx

@sampaiodiego I fixed it, that should be good now 😉

EDIT : Looks like one of my git commands closed the PR... Can you reopen it, or should I try something else ? Sorry for that fail 🤕

@sampaiodiego
Copy link
Member

looks like it closed because there are no more changes 🤔

@TBG-FR
Copy link
Contributor Author

TBG-FR commented Feb 23, 2021

Yeah, I had one commits with the changes, with I force pushed onto my branch to erase all the messed up stuff, but then I did another force push which took the develop branch as a "source", it kinda erased the changes and looks like I can't revert it, even with another force push.

Anyway, I've reopen a separate PR (#20878), so we'll finally be good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants