Kubernetes Documentation - New search is borderline unusable. #47108

Closed
YoNoSoyVictor opened this issue Jul 7, 2024 · 18 comments
Assignees
Labels
area/web-development Issues or PRs related to the kubernetes.io's infrastructure, design, or build processes priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@YoNoSoyVictor

YoNoSoyVictor commented Jul 7, 2024

I just noticed today that the functionality of kubernetes.io/search has changed and it is extremely bad. I have to use "site:kubernetes.io {search term}" now in Google whenever I want to search for something.

As a quick example, let's say you search for: "Service".

Google -> First result: Service overview, Second result: Example of service usage, Third result: Debugging services. This is in line with what you would expect.

In the docs -> First result: Ingresses, Second result: Tailing traffic with Stern, Third result: Define Dependent Environment Variables. Literally none of the results are about Services. Only the first result is somewhat related.

The previous search was not the best, but this is a considerable downgrade. What's the rationale behind these changes?


Edit: After some superficial debugging it seems like this change was meant for China only, and is caused by the is_china cookie being set to true. You can manually change this using your browser's developer tools (in Firefox, right-click anywhere on the page -> Inspect Accessibility Properties -> Storage tab).

There's been a shift in the issue: it appears that multiple users outside of China are also encountering this. Some sample Reddit threads 1 2. This should be addressed.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jul 7, 2024
@dipesh-rawat
Member

Thank you for the feedback. It appears you are in China, as a new search method has been deployed there. The recent switch to a new search solution (Pagefind) was intended to reduce costs for CNCF caused by the previous search setup for China. For more details, please refer to issue #44475 and PR #46768

@YoNoSoyVictor
Author

> Thank you for the feedback. It appears you are in China, as a new search method has been deployed there. The recent switch to a new search solution (Pagefind) was intended to reduce costs for CNCF caused by the previous search setup for China. For more details, please refer to issue #44475 and PR #46768

Thank you for the feedback Dipesh. As you say, I inspected the cookies through the browser developer tools and found that is_china was set to true. I manually set it to false and the old search function came back.

It seems like I am not the only one experiencing this, and based on this Reddit thread many users are being detected as if they were in China. I am in Spain so I don't know why that cookie was set to true.

@dipesh-rawat
Member

> I am in Spain so I don't know why that cookie was set to true.

In that case this sounds like it might be a bug. It would be helpful to add this information to the issue description, especially if you're outside of China and seeing the new search option.
@YoNoSoyVictor By the way, do you happen to be using any kind of VPN? It's possible that this could be causing the issue.
/area web-development

@k8s-ci-robot k8s-ci-robot added the area/web-development Issues or PRs related to the kubernetes.io's infrastructure, design, or build processes label Jul 8, 2024
@YoNoSoyVictor
Author

> > I am in Spain so I don't know why that cookie was set to true.
>
> do you happen to be using any kind of VPN? It's possible that this could be causing the issue.
> /area web-development

Nope. No VPNs or anything. Tested on multiple devices as well. As per the threads I shared in my last message it seems like this is a widespread issue.

I'll update the issue description with this information. Thanks for the assistance 🙏

@TPXP
Contributor

TPXP commented Jul 8, 2024

I'm using the uBlock Origin ad blocker and it blocks connections to ipinfo.io with default settings. We set is_china=true when failing to get this address, which is probably how it was set on your side

document.cookie = "is_china=true;" + path + expires;

To change it, paste `document.cookie="is_china=false;expires=Sun, 09 Nov 3023 09:39:28 GMT;path=/"` in a JS console on kubernetes.io.
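
For reference, a minimal sketch of reading and overriding that cookie. `buildCookie` and `readCookie` are hypothetical helper names, not functions from the kubernetes.io source; only the `is_china` cookie name and the `path=/` attribute come from the snippets above.

```javascript
// Hypothetical helpers for illustration; not taken from the site's code.

// Build a cookie string such as "is_china=false;expires=...;path=/".
// `expires` is a Date object.
function buildCookie(name, value, expires) {
  return `${name}=${value};expires=${expires.toUTCString()};path=/`;
}

// Read one cookie value out of a "k1=v1; k2=v2" string,
// which is the shape document.cookie returns in a browser.
function readCookie(cookieString, name) {
  for (const part of cookieString.split(";")) {
    const [key, ...rest] = part.trim().split("=");
    if (key === name) return rest.join("=");
  }
  return null; // cookie not present
}

// In a browser console on kubernetes.io:
//   readCookie(document.cookie, "is_china");
//   document.cookie = buildCookie("is_china", "false",
//     new Date(Date.now() + 365 * 24 * 60 * 60 * 1000));
```

Checking the value first is a quick way to tell whether the wrong search is coming from this cookie or from something else.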

@natalisucks
Contributor

/triage accepted
/priority important-soon

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 9, 2024
@natalisucks
Contributor

/assign @nate-double-u @cjyabraham

Given the recent work you've just completed on the CNCF side regarding PageFind, it would be great for the two of you to look into this. We can make sure SIG Docs prioritises review of this work as well, thank you 🙏

@k8s-ci-robot
Contributor

@natalisucks: GitHub didn't allow me to assign the following users: cjyabraham.

Note that only kubernetes members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide


@cjyabraham
Contributor

cjyabraham commented Jul 9, 2024

We've accepted that the new PageFind search is likely not going to be as good as the Bing search that it replaces. Bing, however, was costing CNCF thousands of dollars per month and so was no longer a reasonable option. After reviewing several replacements, PageFind was the best we could find.

The intention is for PageFind to serve users in China, where Google is blocked, and Google Programmable Search to serve everyone else. We detect the location of a user with the ipinfo.io service. It's likely that this service is blocked for users in China, so if an error occurs, we assume the person is in China. Unfortunately, if people use ad-blocking software that blocks ipinfo.io, then they too will be assumed to be in China.
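
The fallback behaviour described above can be sketched roughly as follows. This is an illustration under assumptions, not the site's actual code: `chooseSearch`, `detectSearch`, and the `lookupCountry` callback are hypothetical names, and the lookup is assumed to yield an ISO country code such as "CN".

```javascript
// Sketch only: names and the lookup shape are assumptions.

// Decide which search backend to use from the outcome of a geo lookup.
// `country` is a two-letter country code, or null when the lookup failed.
function chooseSearch(country) {
  if (country === null) return "pagefind"; // lookup failed: assume China
  return country === "CN" ? "pagefind" : "google";
}

// Run a lookup (any function returning a Promise of a country code) and
// treat every failure, including a request blocked by an ad blocker,
// the same way as a failed lookup from inside China.
async function detectSearch(lookupCountry) {
  try {
    return chooseSearch(await lookupCountry());
  } catch (err) {
    return chooseSearch(null);
  }
}
```

A request blocked by an ad blocker lands in the `catch` branch, which is how a visitor in Spain can end up with PageFind.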

Where can we go from here? Here are some suggestions:

  1. More work could be done to tune the PageFind results.
  2. If PageFind results are still not acceptable, we could go back to the drawing board and consider other search solutions.
  3. For detecting whether a user is in China, perhaps there are other methods for doing this that won't be blocked by common ad-blockers. Or maybe we could set up a proxy so that the ipinfo script isn't recognized by ad-blockers.

@mbianchidev
Member

mbianchidev commented Jul 9, 2024

After the SIG Docs meeting, an alternative came to my mind:

OramaSearch

Good performance, probably we can give it a try?

Also, could we check browser language (or doc language) directly?
Thanks @natalisucks for clarifying my question on the call.

@nate-double-u
Contributor

Another idea that came up during the weekly meeting: we may be able to check the language currently being served by the site when the search is made to help us determine which search to provide.
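
One way this idea could look, as a rough sketch: `searchForLanguage` is a hypothetical name, and the rule that Chinese-language pages get PageFind is an assumption made for illustration, not something decided in this thread.

```javascript
// Sketch only: the mapping below is an assumed rule, not the site's logic.

// Map the language of the page being served to a search backend.
// Accepts tags such as "zh-cn", "zh-CN", "en", or an empty value.
function searchForLanguage(lang) {
  const primary = (lang || "").toLowerCase().split("-")[0];
  return primary === "zh" ? "pagefind" : "google";
}

// In a browser this could be fed from the page itself:
//   searchForLanguage(document.documentElement.lang);
```

This needs no third-party request, so an ad blocker cannot interfere with it. The trade-off is that language is only a proxy for location: under this rule, someone reading the English pages from inside China would still be sent to Google.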

@cjyabraham
Contributor

cjyabraham commented Jul 10, 2024

I've submitted a PR to improve the way we determine which search results to serve.

As for improving the PageFind search results, rather than focusing on one particular query, it'd be good to have a much broader set of tests and a way of measuring the quality of the results across all of them at once. Then we can start playing with search parameters and know when we're improving things.

We have already tried to tweak the PageFind search results by increasing the weight on the page title. There are some other methods of tweaking the ranking that we can play with once we have a way of measuring the impact.

Is it possible to look at the search history of the Google Programmable Search engine to find the most common searches? We could start with the most common 20 search terms and grade the results we get from PageFind?
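
To make the grading idea concrete, here is a minimal sketch that scores result lists with mean reciprocal rank. The metric is one possible choice, not something agreed in this thread, and the function names are hypothetical. The sample data is the "Service" query from the opening comment, with "Service overview" assumed to be the page a reader wants.

```javascript
// Sketch only: the metric and the names are assumptions for illustration.

// Reciprocal rank: 1 / position of the expected result, or 0 if absent.
function reciprocalRank(results, expected) {
  const index = results.indexOf(expected);
  return index === -1 ? 0 : 1 / (index + 1);
}

// Mean reciprocal rank over graded queries: [{ query, expected, results }].
function meanReciprocalRank(cases) {
  if (cases.length === 0) return 0;
  const total = cases.reduce(
    (sum, c) => sum + reciprocalRank(c.results, c.expected),
    0
  );
  return total / cases.length;
}

// Top three results for "Service", as reported in the opening comment.
const googleCases = [{
  query: "Service",
  expected: "Service overview",
  results: ["Service overview", "Example of service usage", "Debugging services"],
}];
const pagefindCases = [{
  query: "Service",
  expected: "Service overview",
  results: ["Ingresses", "Tailing traffic with Stern", "Define Dependent Environment Variables"],
}];

console.log(meanReciprocalRank(googleCases));   // 1
console.log(meanReciprocalRank(pagefindCases)); // 0
```

With the 20 most common search terms and an agreed expected page for each, a single number like this would show whether a ranking tweak helps across the whole set.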

@nate-double-u
Contributor

nate-double-u commented Jul 10, 2024

> Is it possible to look at the search history of the Google Programmable Search engine to find the most common searches? We could start with the most common 20 search terms and grade the results we get from PageFind?

@a-mccarthy, is this something we can see with our analytics tools?

@dipesh-rawat
Member

> Is it possible to look at the search history of the Google Programmable Search engine to find the most common searches?

Based on the information available in this page (here), the analytics dashboard for kubernetes.io appears to be located here: Click to view the dashboard. The ‘Top search terms’ dashboard might be useful for finding the data we’re interested in.

@nate-double-u
Contributor

nate-double-u commented Jul 11, 2024

Following up on #47108 (comment), it'd be great to get a lot of help testing the proposed fix for the wrong search being served outside of China. (From both folks inside and outside of China)

Here's the deploy preview:
https://deploy-preview-47128--kubernetes-io-main-staging.netlify.app/

@cjyabraham
Contributor

While we're waiting for #47128 to be approved, I've split out an independent issue to improve the quality of the PageFind results: #47137. It needs to be assigned to someone familiar with the Kubernetes docs who can more systematically assess the quality of the PageFind results across a broad range of searches.

@nate-double-u
Contributor

Fixed in #47128

/close

@k8s-ci-robot
Contributor

@nate-double-u: Closing this issue.

