-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs: improve search for documentation #6952
Conversation
@tmair, this is great PR, thank you for this one. I will review it as soon as I find some time - I should be able to do it in the next couple of days. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great PR, thank you a lot for making it. It took me a while until I got into this stuff so I could do the proper review. Changes look good to me, however, I've got some questions/comments.
One of them is related to the duplicated results (not related to this PR, but could be handy if we fixed the issue). For example, in this picture:
You can see that we have repeatWhen
appearing two times (the on is the link to this path: /api/index/function/repeatWhen
, and the other one to this path: /api/operators/repeatWhen
). Is it possible to remove docs with /api/operators/
path (since this path is deprecated)?
Other questions/comments in files.
docs_app/tools/transforms/angular-base-package/processors/generateKeywords.js
Show resolved
Hide resolved
docs_app/tools/transforms/angular-base-package/ignore-words.json
Outdated
Show resolved
Hide resolved
"forth", | ||
"found", | ||
"four", | ||
"from", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this line is breaking existing functionality of the generated keywords. Please take a look at this gif (where I compared changes from this PR and official site):
However, removing this line won't make it much better:
Querying "from"
will find 244 results instead of only one (like in the gif). So, I'm not sure what would be the best approach here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see the explanation of this issue in my comment below.
'we', | ||
'were', | ||
'what', | ||
'when', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Searching for "when"
will result in some results (because of repeatWhen
and others).
How do these stop words work? If "when"
is declared as a stop word, shouldn't there be no results in that case?
Anyways, I think that "when"
should be removed from stop words as there are some operators that have this word in their name. "with"
and "while"
as well (e.g. withLatestFrom
and skipWhile
) and possibly others (I haven't checked the entire list).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stop words will match only the stop word. So when the word when
is included within a term that term is not filtered.
thanks for the review. I will look at your questions but right now I don't have the time for it. It will take me 3 to 4 weeks to get back to it, but I will continue the work on it |
I rebased the branch on the current master and tried to incorporate your review feedback into my branch. Since the search is somewhat difficult to understand I have made some notes on how the search works. I hope this helps to understand the code base better and should also serve as a reference for future work in this area. How does the search work?The search is split into a preprocessing step at build time and a index creation step (inside a web worker) on runtime. The preprocessing happens in
The service worker performs the following steps, before it can be used to answer any search results:
During the preprocessing and indexing of the search data there are two places where words are ignored. The ignore word list, will be used to filter common english terms during the preprocessing phase (not used with member tokens). The stop word filter will perform the same operation when building the index on the service worker. Improvements to the searching process done in this PRIgnore words and stop words filteringBecause it is strange to have two ignore word lists and filtering steps, I have decided to remove the stop word filter from the index builder pipeline completly, as it is does the work twice and is quite confusing. However I have left all words (including from, every, of, with) included in the ignore word filter, as the search results are not great when those words are not ignored. For example Searching for
|
@jakovljevic-mladen is this still something you're considering merging? |
@jakovljevic-mladen #6913 is approved and ready to merge. Sorry for the delay. |
ecabaed
to
82436e0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tmair, thank you a lot for making these changes. I read your exhaustive explanation of how search works and I agree with these changes. During my testing phase couple of months ago, I had issues when I was searching term from
where my results were different to what you were describing. So, I assumed this is due to conflicting dependencies. Now that #6913 is merged, I rebased this PR against master, did npm install
and tried search and it worked the way you described it.
Since I did rebase, I'd like you to review this PR to check if merge conflicts are resolved correctly. Also, we have issues with merging to master due to issue with Node 14 which is now removed, but pipelines still show this as a required step.
@jakovljevic-mladen I reviewed the PR and also did a short local test of the new search implementation. So far everything looks good. The only thing that was different for me was a |
It shouldn't be an issue. Doing an |
* docs: update search to aio search * chore: use custom stop word filter for lunr * docs: applay review suggestions * docs: remove stop word filtering during search * docs: improve first search query to search only in title * docs: remove deprecated import paths from search results * chore: commit package-lock.json --------- Co-authored-by: Mladen Jakovljević <jakovljevic.mladen@gmail.com> (cherry picked from commit 0175187)
Description:
This PR ports the search web worker and its integration from angular.io to the RxJS documentation site. The main changes are outlined on the original commit in the angular repository: angular/angular@fccffc6
In Addition to the changes ported from the angular documentation the following changes where made:
from
,of
andevery
as a stop word (this allows for a search of theof
function)disabling of the ignore word list for class and interface members (this allows to find results forThis is now also fixed on angular.ionext
asnext
is a member of theObservable
interface)Note:
This PR needs to be rebased after the dependency issues in #6913 are merged.
Related issue (if exists):
This should fix #6500
The should be no regression for #4536