Malformed document page when URL query contains slash #2214
Comments
Thanks, @aplhk! 🙇🏻 I can reproduce the error you're seeing. Are you able to share the Google search and/or pages that directed you to that URL? They seem malformed in the first place, so in addition to fixing the behavior, I'd like to fix the URLs at the source if that's something we have control of. |
I think this Google dork covers some of the URLs: https://www.google.com/search?q=site%3Awww.elastic.co%2Fguide+inurl%3Aref |
Thanks again, @aplhk! @AnneB-SEO Do you know where these URLs might be coming from? I don't think we use the |
I'll need to look into it, but at a quick glance it looks like the links could be coming from 3rd-party sites, like hackermoon.co and driverlayer.com
Likely not
Yes, but only when we are adding the parameters. If they are coming from a 3rd party, then we can't instruct Google to ignore them. Let me look into it and also loop in @brianjolly for good measure : ) |
It looks like Google's https://support.google.com/webmasters/answer/6080548 It says the requirements for using the tool are:
Would you say this issue falls in that category? |
Thanks, @brianjolly , that looks promising. I'd want to first confirm that the equivalent pages are getting indexed without the |
@brianjolly & @gtback - The parameter exclusion only applies to pages we create versus pages created by others. Even so I added the |
This problem is more extensive than it first appeared, and it is expanding. When this was originally raised there were ~7 URLs from 2 different sites (hackermoon.co and driverlayer.com). Today there are over 80, and more than just the docs are being targeted, including Elasticon. We'll need to file a DMCA takedown notice with Google through Legal based on: Thanks for finding and raising this, @aplhk. Let's leave this one open until we file. Thanks all!!! |
I came across a few links from Google search and found out that the presence of a slash (/) in the URL query string leads to a malformed / unresponsive document page.
Example of malformed page: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html?example.com/a
I believe the root cause is in the TOC fetching script:
docs/resources/web/docs_js/index.js
Lines 253 to 260 in 5b6ac79
In this case, location.href is https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html?example.com/a, and after replacing the string it will fetch and append https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html?example.com/toc.html, which causes an infinite loop and an unresponsive page.
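To illustrate the failure mode described above, here is a minimal sketch. It is not the actual code from index.js; it assumes the script builds the TOC URL by replacing everything after the last slash in location.href. The function names naiveTocUrl and safeTocUrl are hypothetical and only for illustration. The second function shows one possible fix: resolving toc.html as a relative reference, which uses only the path of the base URL and discards its query string.

```javascript
// Hypothetical reconstruction of the reported behavior (assumption: the
// script swaps the text after the last "/" in the full href for "toc.html").
function naiveTocUrl(href) {
  // The last "/" may be inside the query string, not the path.
  return href.replace(/[^/]+$/, "toc.html");
}

// Hypothetical safer variant: resolve "toc.html" relative to the page URL.
// Relative resolution works on the base URL's path and drops its query.
function safeTocUrl(href) {
  return new URL("toc.html", href).href;
}

const page =
  "https://www.elastic.co/guide/en/elasticsearch/reference/current/" +
  "indices-create-index.html?example.com/a";

console.log(naiveTocUrl(page));
// https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html?example.com/toc.html

console.log(safeTocUrl(page));
// https://www.elastic.co/guide/en/elasticsearch/reference/current/toc.html
```

With the naive replacement, the request still targets the document page itself (the server sees the same path with a different query), which is consistent with the page being fetched and appended repeatedly as reported.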