Canonical tag contains duplicated base URL when trailingSlash is false #6315

Closed
2 of 7 tasks
ltm opened this issue Jan 11, 2022 · 9 comments · Fixed by #6993
Labels
bug An error in the Docusaurus core causing instability or issues with its execution

Comments

@ltm
Contributor

ltm commented Jan 11, 2022

Have you read the Contributing Guidelines on issues?

Prerequisites

  • I'm using the latest version of Docusaurus.
  • I have tried the npm run clear or yarn clear command.
  • I have tried rm -rf node_modules yarn.lock package-lock.json and re-installing packages.
  • I have tried creating a repro with https://new.docusaurus.io.
  • I have read the console error message carefully (if applicable).

Description

If baseUrl is set to a value other than '/' (e.g. '/docusaurus-canonical/') and trailingSlash is set to false, the canonical tag of the main index page will contain a duplicated base URL. Note that this only applies to the canonical tag set by JavaScript at runtime, not to the one in the static HTML.

I suspect this is caused by the shouldAddBaseUrl check in addBaseUrl():

const shouldAddBaseUrl = !url.startsWith(baseUrl);

Assuming baseUrl is '/docusaurus-canonical/' the URL of the main index page is '/docusaurus-canonical', so url does not start with baseUrl.
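
To illustrate the suspicion above, here is a minimal sketch of how such a prefix check can duplicate the base URL. The function addBaseUrlSketch is a hypothetical, simplified stand-in built around the one quoted line; it is not the actual Docusaurus implementation, which does more than this.

```typescript
// Simplified stand-in built around the check quoted above.
// NOT the actual Docusaurus source; the name and the join logic are assumptions.
function addBaseUrlSketch(url: string, baseUrl: string): string {
  const shouldAddBaseUrl = !url.startsWith(baseUrl);
  // Drop the leading slash so that joining with baseUrl does not produce "//".
  return shouldAddBaseUrl ? baseUrl + url.replace(/^\//, "") : url;
}

const baseUrl = "/docusaurus-canonical/";

// Homepage reached with the trailing slash: already prefixed, left untouched.
const withSlash = addBaseUrlSketch("/docusaurus-canonical/", baseUrl);

// Homepage reached without the trailing slash: the prefix check fails,
// so the base URL is prepended a second time.
const withoutSlash = addBaseUrlSketch("/docusaurus-canonical", baseUrl);

console.log(withSlash); // "/docusaurus-canonical/"
console.log(withoutSlash); // "/docusaurus-canonical/docusaurus-canonical"
```

The second result has the same shape as the duplicated path reported under "Actual behavior".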

Steps to reproduce

  1. Set baseUrl to a value other than '/', e.g. '/docusaurus-canonical/'
  2. Set trailingSlash to false
  3. Serve the site from a web host that doesn't use trailing slashes (e.g. Vercel with trailingSlash=false)
  4. Open the main index page in a browser
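
For steps 1 and 2, the relevant settings might look like the fragment below. This is only a sketch of the fields involved: the url value is a placeholder, and the other fields a real docusaurus.config.js needs are left out.

```typescript
// Sketch of the relevant site config fields only; a real config needs more.
const config = {
  url: "https://example.com", // placeholder site URL (assumption)
  baseUrl: "/docusaurus-canonical/", // any value other than "/"
  trailingSlash: false,
};
```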

Expected behavior

The canonical tag should contain the correct URL of the page, i.e. ${siteUrl}/docusaurus-canonical.

Actual behavior

The canonical tag contains a duplicated base URL, i.e. ${siteUrl}/docusaurus-canonical/docusaurus-canonical:
[Screenshot]

Your environment

Reproducible demo

https://github.com/ltm/docusaurus-canonical

Self-service

  • I'd be willing to fix this bug myself.
ltm added the "bug" and "status: needs triage" labels Jan 11, 2022
Josh-Cena removed the "status: needs triage" label Jan 12, 2022
@Josh-Cena
Collaborator

Oh... yeah. Thanks for reporting! Very clear repro 👍

@slorber
Collaborator

slorber commented Jan 12, 2022

There's also a weird duplicate // in the hreflang links 😅

@markharrison

The title of this issue seems akin to my problem ...

I'm finding that Google discovers my doc pages but is "Excluding" all of them other than the root.

I notice that when I browse to a doc page there is a redirect (301) that adds a / onto the end of the URL.

Does this mean the URL that returns a 200 conflicts with what is in sitemap.xml?

Is this the issue, or is there another reason Google is excluding doc pages?

As an aside, when I access a doc using the side menu, the URL in the address bar doesn't have a trailing /, which seems kind of inconsistent.

@Josh-Cena
Collaborator

@markharrison Did you set the trailingSlash config? You should, in order to ensure consistency in your URL. Your issue is unrelated to this one, which is about base URL, not about trailing slashes.

@kochis

kochis commented Feb 8, 2022

It looks like there was a change to the default canonical tag that introduced this behavior: https://github.com/facebook/docusaurus/pull/4109/files#diff-5a9766c1abdd0dbd7f0ebafa788297edbaf531835e8b55c6d542e727e4507affL94-L95

From what I can tell, this only occurs on the root page when using baseUrl (at least it does in our case, pages with a permalink have the correct canonical tag). This is likely due to the fallback when a permalink isn't present: https://github.com/facebook/docusaurus/pull/4109/files#diff-5a9766c1abdd0dbd7f0ebafa788297edbaf531835e8b55c6d542e727e4507affR61-R63

Not sure what the expected behavior is, but it is presenting as a bug for our use case as well.

Edit
Looks like this only applies when the trailing slash is omitted:
https://radar.com/documentation
image

https://radar.com/documentation/
image

@slorber
Collaborator

slorber commented Feb 10, 2022

I see

The misunderstanding here is that trailingSlash: false has no impact on the baseUrl.

The baseUrl always contains a trailing slash.

If you set baseUrl: /documentation/, then you MUST access your site through mysite.com/documentation/, not mysite.com/documentation

Note that it's bad SEO practice to make your site available on both URLs; you should instead redirect /documentation to /documentation/ with a server-side redirect.

Many hosts do this for you automatically. Unfortunately, your host may behave differently, in which case you must configure the redirect explicitly.
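
As a rough sketch of the redirect rule described above, the helper below decides whether a request path should be redirected to the base URL. The name trailingSlashRedirect is hypothetical, and how the result gets wired into a particular host or server is left open.

```typescript
// Hypothetical helper: returns the redirect target, or null when no redirect is needed.
function trailingSlashRedirect(pathname: string, baseUrl: string): string | null {
  const baseWithoutSlash = baseUrl.replace(/\/$/, "");
  // Only the exact base path without its slash is redirected.
  // For baseUrl "/" there is no slash-less variant, so nothing matches.
  if (baseWithoutSlash !== "" && pathname === baseWithoutSlash) {
    return baseUrl;
  }
  return null;
}

console.log(trailingSlashRedirect("/documentation", "/documentation/")); // "/documentation/"
console.log(trailingSlashRedirect("/documentation/", "/documentation/")); // null
console.log(trailingSlashRedirect("/documentation/blog", "/documentation/")); // null
```

A server using this rule would answer a non-null result with a permanent redirect (for example a 301) whose Location is the returned path.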

Note that when using baseUrl, your deployment might look like this:

- build/index.html
- build/blog.html

It's not possible for Docusaurus to remove the trailing slash of a baseUrl reliably (i.e. in a way that would work with the most popular hosting platforms) because it means Docusaurus would have to emit HTML files outside of its build folder:

- documentation.html
- build/blog.html

Also worth mentioning: Docusaurus assumes that it is served from the configured baseUrl (and NOT the baseUrl without trailing slash).

This means that if you use relative links like <Link to="xyz"> on your homepage, then depending on how you access that page, the link might target either /documentation/xyz or /xyz => nondeterministic behavior that you really want to avoid as it can link to pages that do not exist and produce 404 errors.
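
The ambiguity can be reproduced with the standard URL constructor, used here as a stand-in for how a browser resolves a relative href against the current page. This is only an illustration; the resolution done by Docusaurus' own Link component may differ in detail.

```typescript
// Resolve the same relative link against the two ways of reaching the homepage.
const fromBaseUrl = new URL("xyz", "https://mysite.com/documentation/").pathname;
const fromNoSlash = new URL("xyz", "https://mysite.com/documentation").pathname;

console.log(fromBaseUrl); // "/documentation/xyz"
console.log(fromNoSlash); // "/xyz"
```

Without the trailing slash, "documentation" is treated as the last path segment and gets replaced rather than extended.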


We could eventually fix the canonical URL problem, but IMHO that would just hide a deeper issue that you have.

We might even want to make your site explode when accessed through /documentation (fail-fast), so that you are forced to do what is right.

@slorber
Collaborator

slorber commented Feb 10, 2022

Note:

  • at build time, Docusaurus outputs the correct canonical URL in the static HTML file
  • this URL becomes wrong only at runtime, after React hydration, because the code reads the current history location, which is not the expected one.

We have many other places where we read current browser location.

Accessing your site through /baseUrl is likely to cause other problems in your site (now or later) that may stay unnoticed.

Maybe we want to add a warning when the site homepage is served from the wrong location

Or do a redirect to add the trailing slash automatically? (I don't like that much because it remains bad for SEO and we might break the site served from the wrong location in the future without even noticing)

Would it be a good solution for your use-case?

@guerrero

It seems that if trailingSlash is undefined, the problem is still present and the base URL is duplicated in the canonical and hreflang meta tags.

Here's a link to a repo to reproduce the bug:
https://github.com/tinybirdco/clickhouse_knowledge_base/tree/332fa84e6b9a22f984e64415b7ccabfa1f5d2669

Once trailingSlash is set to false, everything works as expected.

Sorry in advance if commenting on this closed issue is not the right way to report this. If you prefer, I can create a new issue.

@slorber
Collaborator

slorber commented Oct 28, 2022

@guerrero it would help me review this faster if you opened a dedicated issue with a repro that has a deploy preview showing the problem, so that I can inspect it without checking out your repo.

Also, if you have similar problems in the hreflang headers, that's worth reporting too.

6 participants