Recommend a redirect strategy for docs #1820

Closed
dedemorton opened this issue Apr 28, 2020 · 16 comments
Labels
link-checking Link Checking & Redirects team-discuss

Comments

@dedemorton
Contributor

We will be refactoring a lot of content in the coming months (beats, cloud, etc). Right now, our strategy for handling moved content (changed URLs) is not clear.

In the past, we've requested that the website team create redirects. Here's a random example of a request.

We also have a legacy redirects page that we planned to use in the future to manage redirects, but I don't think that page is being updated, and I don't think it's actually used by the build.

Some teams maintain a Deleted pages appendix and use that to redirect users manually to a page that's moved.

We need a clear strategy going forward, and I'm not sure whether redirects are the right way to go.

According to our internal wiki (copied from there):

  • Web team says it's best practice to update links on the site that reference the updated URL - search engines will see the old link if we don't update it, and it's not best practice to rely on the redirects.
  • From an SEO perspective, it's best to replace URLs where possible / worth the effort since the positive impact to SEO is diminished if the crawlers see the old URL first. When crawlers hit the old URL, this will pass along some information about the new URL but is read as coming from a second-hand source. It's better overall for SEO if the crawler hits the live page first instead of coming to the live page from a redirect.
  • Another consideration for the Web team is the monitoring and maintenance of redirects. After too many chained redirects (more than 4?), the page will not load (the redirect gets stuck in a loop), and at that point the page stops being indexed. If we replace the URLs instead of redirecting, we save the team from having to monitor / remember some of these.
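
The chain and loop problem in the last bullet can be checked mechanically if the redirects are available as a simple old-to-new map. Below is a minimal Python sketch under that assumption. The map format, the function names, and the four-hop limit (taken from the "more than 4?" guess above) are hypothetical and do not describe the web team's actual tooling.

```python
from typing import Dict, List, Tuple


def follow_redirects(
    start: str, redirects: Dict[str, str], max_hops: int = 4
) -> Tuple[str, List[str]]:
    """Follow `start` through the redirect map.

    Returns (status, path), where status is "ok", "loop", or "too_long"
    and path lists every URL visited, starting with `start`.
    """
    path = [start]
    seen = {start}
    current = start
    while current in redirects:
        current = redirects[current]
        path.append(current)
        if current in seen:
            # We came back to a URL we already visited: circular redirect.
            return "loop", path
        seen.add(current)
        if len(path) - 1 > max_hops:
            # More hops than the (assumed) limit a client will tolerate.
            return "too_long", path
    return "ok", path


def flatten(redirects: Dict[str, str]) -> Dict[str, str]:
    """Point every old URL straight at its final target, skipping loops."""
    flat = {}
    for old in redirects:
        # A loop-free chain can never be longer than the map itself.
        status, path = follow_redirects(old, redirects, max_hops=len(redirects))
        if status == "ok":
            flat[old] = path[-1]
    return flat
```

Running `flatten` before publishing a redirect map would collapse every chain to a single hop, and anything reported as `"loop"` would need a human decision.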
@alaudazzi
Contributor

@dedemorton thank you for raising this issue. This is going to impact a major doc refactoring that is currently ongoing on the Cloud ECE docs.

@dedemorton
Contributor Author

Also see the related issue: #1357

@debadair
Contributor

Yes, this is a major issue for restructuring content. The current process (request redirects from the web team) has led to pain because we have no insight into what redirects already exist. We've ended up with circular redirects and assorted broken-ness. The redirects file in the docs repo was Nik's attempt at bringing some order to the chaos, but was never adopted by the web team & the folks at RAW who handle the actual infra/deployment of the site.

Website redirects also only work at the page level, so if the chunking of content changes, they don't really solve the problem.

The redirects appendices feel like a very hacky solution--but it's one we can control pretty easily. On the ES side, @jrodewig and I discussed a strategy where we would clean up old entries when a new version was released, rather than keeping them around indefinitely. (Knowing that there are likely to be a number of necessary redirects between the last minor and a new major, so it's not just a matter of deleting all of them and starting over.)

One motivation for the redirect appendices was to minimize surprise cross doc links that caused chaos on release days. Now that we have the CI checks in place and have improved the process around releases, it's probably worth enforcing that if you move/remove a topic and break a link from somewhere else in the docs, you need to fix it, not just add an entry to the redirects appendix. The redirects appendix should be used to keep external links (like Google search results) from 404-ing.

Ideally, it's best to mark pages that are being removed with a noindex tag and request a reindex from Google before they disappear. I think that we could basically accomplish that by adding a noindex tag to the redirect appendices and requesting a reindex when we roll out big changes or before we clean up old entries.

@AnneB-SEO might have other insight into how to minimize the SEO disruption as we reorg the docs.

@debadair
Contributor

Also, simply keeping track of everything that has moved around is a chore. It would be really helpful to have a new & deleted anchor report generated for each PR.
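
As a rough illustration of what such a report could look like, here is a minimal Python sketch that diffs the anchors found in two versions of an AsciiDoc source. It only recognizes the `[[id]]`, `[[id,text]]`, and `[#id]` forms, and the function names are hypothetical. A real report would more likely work from the built output or from Asciidoctor's own parse.

```python
import re
from typing import Dict, List, Set

# Matches [[id]] / [[id,reference text]] or the [#id] shorthand.
ANCHOR_RE = re.compile(r"\[\[([\w.:-]+)(?:,[^\]]*)?\]\]|\[#([\w-]+)")


def extract_anchors(asciidoc_text: str) -> Set[str]:
    """Collect every anchor ID declared in the given AsciiDoc text."""
    anchors = set()
    for match in ANCHOR_RE.finditer(asciidoc_text):
        anchors.add(match.group(1) or match.group(2))
    return anchors


def anchor_report(old_text: str, new_text: str) -> Dict[str, List[str]]:
    """List anchors that a change adds and anchors that it deletes."""
    old = extract_anchors(old_text)
    new = extract_anchors(new_text)
    return {"added": sorted(new - old), "deleted": sorted(old - new)}
```

Every entry under `"deleted"` is a candidate for a redirect or a link fix elsewhere in the docs.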

@gtback
Member

gtback commented Apr 29, 2020

> Also, simply keeping track of everything that has moved around is a chore. It would be really helpful to have a new & deleted anchor report generated for each PR.

This is exactly the type of feedback which helps me prioritize what I'm working on!

@bmorelli25
Member

I hate to link to a private Slack conversation in a public repository, but I think it's necessary to illustrate how big of a problem this is: Slack 🧵. We should remember to clean up the broken links that already exist with whatever solution we choose to move forward with.

@jrodewig
Contributor

> The redirects appendix should be used to keep external links (like Google search results) from 404-ing.

From an SEO perspective, a manual redirect done using a redirect appendix is inferior to a server-side 301 or 302 redirect.

Those manual redirect pages still respond with a 200 HTTP status code, which indicates to search engines that the old page is still alive. This means our new page is competing with the old (redirect) page. As older pages typically have more link juice, the redirect pages may be returned in SERPs before actual content pages.

The best case is a 301/302 that passes that link juice on to the new page. However, even a 404 would at least let the old page die. Right now, the redirect appendices are keeping old, zombie pages alive.
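
The distinction between these outcomes can be summarized in a few lines of Python. This is only a sketch of the reasoning in this comment: the function name, the labels, and the `is_redirect_stub` flag are hypothetical, and it covers only the status codes mentioned here.

```python
def classify_old_url(status: int, is_redirect_stub: bool = False) -> str:
    """Label what an old docs URL currently tells a crawler.

    `is_redirect_stub` marks a page that exists only to point readers at
    the new location, such as an entry in a redirects appendix.
    """
    if status == 301:
        return "permanent-redirect"  # old URL hands over to the new page
    if status == 302:
        return "temporary-redirect"  # old URL is still treated as current
    if status == 404:
        return "not-found"  # old page is allowed to die
    if status == 200 and is_redirect_stub:
        return "zombie"  # looks alive to crawlers, competes with the new page
    if status == 200:
        return "live"
    return "other"
```

Under this labeling, every entry in a redirects appendix comes out as `"zombie"`, which is the problem described above.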

I also don't think the redirect appendix is the best experience for users.

I would love to abolish redirect appendices entirely, except maybe in cases where there is no good redirect. Better control and visibility of server-side redirects would be my preferred path forward.

@debadair
Contributor

Agreed. The redirect appendices are a patch for a broken process. Beyond the issue of ending up with zombie pages that never go away, the manual process simply doesn't scale for major reorganization of existing content. We need to be able to automatically detect changes that require redirects, and manage the redirects in a way that doesn't require multiple spreadsheets and teams.

@dedemorton
Contributor Author

dedemorton commented Jul 2, 2020

After spending several hours today updating links throughout all the docs, I had a thought about how we can approach linking with the tools and processes that we have now. My solution isn't ideal. Our tools should really maintain the link and link text for us. But having to manually go through a dozen repos to update links (even for a handful of topics) is a major PITA.

What if we create one or more shared link files in the docs repo? Each team would externalize links for the other teams to use. We can anticipate a lot of the links, then add more when people need to add links.

So we might have something like:

links.asciidoc (or maybe beats-links.asciidoc) that contains attributes like:

:metricbeat-quick-start-link: {metricbeat-ref}/metricbeat-installation-configuration.html[{metricbeat} quick start]

Writers could use {metricbeat-quick-start-link} instead of hard coding all the links in their books.

If we want to provide writers with more control over the link text, we could use two attributes:

:metricbeat-quick-start-link: {metricbeat-ref}/metricbeat-installation-configuration.html
:metricbeat-quick-start-text: {metricbeat} quick start

Then writers would resolve the link by using:

{metricbeat-quick-start-link}[{metricbeat-quick-start-text}]
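
To make the two-attribute form concrete, here is a minimal Python sketch of how such a shared file could resolve. It is not how Asciidoctor itself works: Asciidoctor substitutes attributes as it reads each definition, whereas this sketch expands them lazily. The values given for `metricbeat` and `metricbeat-ref` are placeholders, and the function names are hypothetical.

```python
import re
from typing import Dict

ATTR_DEF_RE = re.compile(r"^:([\w-]+):\s*(.*)$")  # :name: value
ATTR_REF_RE = re.compile(r"\{([\w-]+)\}")         # {name}

# Placeholder values; the real ones live in the shared docs attributes.
SHARED_LINKS = """
:metricbeat: Metricbeat
:metricbeat-ref: https://example.invalid/metricbeat/current
:metricbeat-quick-start-link: {metricbeat-ref}/metricbeat-installation-configuration.html
:metricbeat-quick-start-text: {metricbeat} quick start
"""


def parse_attributes(text: str) -> Dict[str, str]:
    """Read `:name: value` lines into a dictionary."""
    attrs = {}
    for line in text.splitlines():
        match = ATTR_DEF_RE.match(line.strip())
        if match:
            attrs[match.group(1)] = match.group(2).strip()
    return attrs


def expand(text: str, attrs: Dict[str, str], max_depth: int = 10) -> str:
    """Replace {name} references until nothing changes; unknown names stay as-is."""
    for _ in range(max_depth):
        expanded = ATTR_REF_RE.sub(
            lambda m: attrs.get(m.group(1), m.group(0)), text
        )
        if expanded == text:
            break
        text = expanded
    return text
```

Both attribute references need braces. A bare `metricbeat-quick-start-text` inside the square brackets would be emitted as literal link text rather than resolved.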

I know this is hacky, but I've had a long day of monkey work and feel like I'm stuck in 1985. (I guess I don't have to use carbon paper or leave enough space for footnotes, but seriously, all this manual monkey work is a time sink.)

Hmm...but then we'd also need some kind of versioning, maybe similar to what we do for the versions file?

@gtback
Member

gtback commented Jul 2, 2020

@dedemorton 💯 for this approach for any links that are used more than 2 or 3 times in a book. It's obviously not a complete solution, but hopefully reduces some of the pain.

If you're trying to link to the same version, could you just throw a {branch} in the URL?

@lcawl
Contributor

lcawl commented Jul 2, 2020 via email

@AnneB-SEO

Hi @jrodewig - missed this post from a few months ago - so sorry. Great summary! Adding a couple of notes.

> From an SEO perspective, a manual redirect done using a redirect appendix is inferior to a server-side 301 or 302 redirect.

Yes and no. A server-side redirect is always preferred, but only a 301. 302s can still be a bit problematic for search engines and are really meant for temporary redirects, such as a login URL that performs language detection before assigning a destination URL.

> Those manual redirect pages still respond with a 200 HTTP status code, which indicates to search engines that the old page is still alive. This means our new page is competing with the old (redirect) page. As older pages typically have more link juice, the redirect pages may be returned in SERPs before actual content pages.

Finally, an explanation of how docs generates all those "soft 404s" (a 404 that returns a 200). Thank you!

> The best case is a 301/302 that passes that link juice on to the new page. However, even a 404 would at least let the old page die. Right now, the redirect appendices are keeping old, zombie pages alive.

Link juice will only get passed with a 301. Even if we were to redirect with a 302 and then change to a 301, all the link authority would be lost.

> I also don't think the redirect appendix is the best experience for users.

Sounds like it's a poor experience for both users and search engines!

> I would love to abolish redirect appendices entirely, except maybe in cases where there is no good redirect. Better control and visibility of server-side redirects would be my preferred path forward.

YES!

Thanks again for the write-up and background on the soft 404s!

@dedemorton
Contributor Author

dedemorton commented Jul 2, 2020

@gtback RE your comment:

> If you're trying to link to the same version, could you just throw a {branch} in the URL?

The {metricbeat-ref} attribute would take care of resolving the correct branch. I'm thinking more about the situation where we change the HTML filename (maybe to improve SEO) but the change only applies to a specific version and later. The lack of branches in the docs repo makes it hard to version attributes that might change over time. The way we handle versions of shared attributes right now is a little hacky, so I think this needs a bit more thought, especially as we're on the cusp of some big refactoring.

@lcawl RE your comment:

> Should we consider using external links at all times or should we use citation maps for all links in each book/context (and define the URL attribute and whether it is an external or internal link appropriately for each book)?

I wouldn't want to use external links everywhere because we'd lose out on link validation in local builds and that would make it harder to diagnose some build problems before we push to GitHub. Plus we'd have to maintain all the link text manually.

Hmmm...it would be cool if we could somehow harness the logic that asciidoctor uses when it creates links and use it to generate a file that's populated with external links that other books can use. I guess we'd need logic so that once an attribute is defined in the link file, only the filename and link text would get updated. (Just trying to think of ways to automate the creation and maintenance of this file so that it doesn't become yet another time sink.)
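
A generated link file along these lines could be sketched as follows. This is a minimal Python illustration only: the `Topic` record and `render_link_file` are hypothetical, and the topic list is assumed to come from somewhere else in the build, since extracting it from Asciidoctor is the hard part and is not shown.

```python
from typing import Iterable, NamedTuple


class Topic(NamedTuple):
    attr_name: str  # stable name other books use, e.g. "metricbeat-quick-start"
    book_ref: str   # attribute holding the book's base URL, e.g. "metricbeat-ref"
    filename: str   # current HTML filename; may change between builds
    link_text: str  # current link text; may change between builds


def render_link_file(topics: Iterable[Topic]) -> str:
    """Render one `:<name>-link:` attribute per topic, sorted by name."""
    lines = []
    for topic in sorted(topics, key=lambda t: t.attr_name):
        lines.append(
            f":{topic.attr_name}-link: "
            f"{{{topic.book_ref}}}/{topic.filename}[{topic.link_text}]"
        )
    return "\n".join(lines) + "\n"
```

Because other books refer only to the attribute name, regenerating the file after a rename would update the filename and link text everywhere without touching those books.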

EDITED: As a first step, we could manually create files that capture the high traffic links (like getting started and installation topics).

@gtback
Member

gtback commented Jul 2, 2020

@AnneB-SEO @jrodewig

> Better control and visibility of server-side redirects

💯 for this. I'm looking forward to hosting the docs ourselves, and a big part is exactly for that reason.

@gtback
Member

gtback commented Jul 2, 2020

> The lack of branches in the docs repo makes it hard to version attributes that might change over time

@dedemorton That makes sense, thanks. I'll have to think more about it. @benskelker was asking me a similar question this morning.

@ollyhowell pinned this issue Sep 30, 2021

@dedemorton
Contributor Author

Would be nice to get this fixed for Next Docs, but probably not worth changing the process in the current doc system...so I'm closing.

8 participants