Means to exclude content from sitemap.xml #653

Closed
jameslai opened this issue Nov 18, 2014 · 38 comments · Fixed by #12329

Comments

@jameslai

I've looked through the existing documentation on the "Front Matter" and have not been able to find an option to exclude content from appearing in the sitemap.

This use case arises when we want content to be included in "master" pages. For example, our careers page lists all careers by ranging over .Data.Pages and displaying the title and content (which amounts to a brief description), though we technically do not want each individual career to have its own landing page.

Normally this isn't an issue, but it becomes one on the sitemap.xml as the pages, which shouldn't really have their own URL, are included in the sitemap. As these pieces of content don't have dates associated with them, Google Webmaster Tools reports an invalid date. It also produces a fairly pointless URL with negligible content.

@derekperkins
Contributor

This is more important once you start building PPC landing pages. You'll typically create a ton of duplicate content with slight changes for A/B testing, and you won't want any of that indexed.

It would be nice to standardize on a lot of these things across themes. This is shamelessly stolen from the WordPress SEO settings page:
[image: screenshot of the WordPress SEO settings page]

It would be great to have these options as first-class Hugo settings that can be mapped at the site, section, type, or page level, with the same kind of inheritance that everything else has.

@bep
Member

bep commented Mar 1, 2017

Note/Update: This issue is marked as stale, and I may have said something earlier about "opening a thread on the discussion forum". Please don't.

If this is a bug and you can still reproduce this error on the latest release or the master branch, please reply with all of the information you have about it in order to keep the issue open.

If this is a feature request, and you feel that it is still relevant and valuable, please tell us why.

@bep bep closed this as completed Mar 27, 2017
@XhmikosR
Contributor

XhmikosR commented Jun 5, 2017

This is a valid request. For example, it makes no sense to have the error pages like 404.html in sitemap.xml.

Please, reconsider this.

A temporary solution would be to provide your own sitemap.xml template and specify sitemap_exclude: true in the front matter:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:xhtml="http://www.w3.org/1999/xhtml">
  {{ range .Data.Pages }}
  {{ if not .Draft -}}
  {{ if ne .Params.sitemap_exclude true -}}
  <url>
    <loc>{{ .Permalink }}</loc>{{ if not .Lastmod.IsZero }}
    <lastmod>{{ safeHTML ( .Lastmod.Format "2006-01-02T15:04:05-07:00" ) }}</lastmod>{{ end }}{{ with .Sitemap.ChangeFreq }}
    <changefreq>{{ . }}</changefreq>{{ end }}{{ if ge .Sitemap.Priority 0.0 }}
    <priority>{{ .Sitemap.Priority }}</priority>{{ end }}{{ if .IsTranslated }}{{ range .Translations }}
    <xhtml:link rel="alternate" hreflang="{{ .Lang }}" href="{{ .Permalink }}"/>{{ end }}
    <xhtml:link rel="alternate" hreflang="{{ .Lang }}" href="{{ .Permalink }}"/>{{ end }}
  </url>
  {{- end }}
  {{- end }}
  {{- end }}
</urlset>

EDIT for newer Hugo versions:

{{ printf "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"yes\"?>" | safeHTML }}
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">
  {{- range .Data.Pages -}}{{ if and .Permalink (ne .Params.sitemap_exclude true) }}
  <url>
    <loc>{{ .Permalink }}</loc>{{ if not .Lastmod.IsZero }}
    <lastmod>{{ safeHTML (.Lastmod.Format "2006-01-02T15:04:05-07:00") }}</lastmod>{{ end }}{{ with .Sitemap.ChangeFreq }}
    <changefreq>{{ . }}</changefreq>{{ end }}{{ if ge .Sitemap.Priority 0.0 }}
    <priority>{{ .Sitemap.Priority }}</priority>{{ end }}{{ if .IsTranslated }}{{ range .Translations }}
    <xhtml:link rel="alternate" hreflang="{{ .Language.Lang }}" href="{{ .Permalink }}"/>{{ end }}
    <xhtml:link rel="alternate" hreflang="{{ .Language.Lang }}" href="{{ .Permalink }}"/>{{ end }}
  </url>{{ end }}{{ end }}
</urlset>
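
A minimal sketch of the front matter side of this workaround, in TOML. sitemap_exclude is a custom page parameter that the template above reads through .Params.sitemap_exclude; it is not a built-in Hugo setting, and the title is a placeholder.

```toml
+++
title = "Landing page variant B"  # placeholder title
sitemap_exclude = true            # custom param checked by the overridden sitemap.xml template
+++
```

In YAML front matter the equivalent line would be sitemap_exclude: true. Pages that omit the parameter stay in the sitemap, because the template only skips pages where the value is exactly true.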

@bep bep reopened this Jun 5, 2017
@bep
Member

bep commented Jun 5, 2017

I assume you can either:

  1. Provide your own sitemap.xml template?
  2. Disable the built-in sitemap and create your own custom output format?

And being an "Old and stale" issue shows that no one has shown enough interest in it to do something about it. Prove me wrong.
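
Option 2 might look roughly like the following site configuration. This is a sketch under assumptions: the output format name CustomSitemap is hypothetical, and a matching home template (for example layouts/index.customsitemap.xml, with the exact file name depending on the Hugo version's template lookup rules) would still need to be written. That template would probably range over .Site.Pages, since a home template does not receive every page in .Data.Pages the way the built-in sitemap template does.

```toml
# config.toml (sketch)
disableKinds = ["sitemap"]          # turn off Hugo's built-in sitemap

[outputFormats.CustomSitemap]       # hypothetical name for the custom output format
  mediaType = "application/xml"
  baseName = "sitemap"              # the home page output is written as sitemap.xml

[outputs]
  home = ["HTML", "RSS", "CustomSitemap"]
```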

@XhmikosR
Contributor

XhmikosR commented Jun 5, 2017

@bep: as I said above:

A temporary solution would be to provide your own sitemap.xml template and specify sitemap_exclude: true in the front matter...

I'm already doing this, but it's a functionality which can be very handy if it exists and will reduce custom code all around.

It's a minor feature request, which will make Hugo even cooler :)

I'm still in the process of using Hugo for the first time, so I notice some things coming from another generator.

@XhmikosR
Contributor

XhmikosR commented Jun 5, 2017

Related, Hugo should exclude the 404 page from sitemap.xml by default. It gives no value to have it in.

@moorereason
Contributor

@XhmikosR, please open a separate issue for removing the 404 template by default.

@ecow

ecow commented Mar 31, 2018

I confirm interest in this topic. Landing pages do not need to be indexed by robots. If you exclude them in robots.txt but include them in the sitemap, Google's bots complain.

@XhmikosR
Contributor

Any updates on this? Maybe just respect a sitemap: false setting in the front matter?

@bep
Member

bep commented Aug 10, 2018

@XhmikosR no updates on this, but there are several similar requests floating around, making me think that this needs a general design and not loads of ad-hoc flags.

@bep
Member

bep commented Aug 10, 2018

Also note that if you really want this now, it is possible to override Hugo's sitemap template(s):

https://gohugo.io/templates/sitemap-template/#sitemap-templates

@XhmikosR
Contributor

@bep: I know, I'm already using a custom sitemap template :) I was doing some cleanup in my templates and thought I'd try sitemap: false in my front matter, but it didn't work, so I searched the issues and saw this. I had totally forgotten about this until now.

@willertrombix

Hi,

No coding skills here, but I'm trying to launch a website with Hugo. I have the same problem: I want to include only the website's posts in sitemap.xml, excluding pages like "Search" or "Contact". Unfortunately, I couldn't find or understand how to do it with Hugo's sitemap template (https://gohugo.io/templates/sitemap-template/#sitemap-templates).

Any idea?

@coliff
Member

coliff commented Sep 20, 2018

@willertrombix - I recommend the method @XhmikosR provided. I've been using it on a number of sites and it works great.

Simply add a sitemap.xml to your layouts folder. The sitemap.xml should be:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:xhtml="http://www.w3.org/1999/xhtml">
  {{ range .Data.Pages }}{{ if ne .Params.sitemap_exclude true }}
  <url>
    <loc>{{ .Permalink }}</loc>{{ if not .Lastmod.IsZero }}
    <lastmod>{{ safeHTML ( .Lastmod.Format "2006-01-02T15:04:05-07:00" ) }}</lastmod>{{ end }}{{ with .Sitemap.ChangeFreq }}
    <changefreq>{{ . }}</changefreq>{{ end }}{{ if ge .Sitemap.Priority 0.0 }}
    <priority>{{ .Sitemap.Priority }}</priority>{{ end }}{{ if .IsTranslated }}{{ range .Translations }}
    <xhtml:link
                rel="alternate"
                hreflang="{{ .Lang }}"
                href="{{ .Permalink }}"
                />{{ end }}
    <xhtml:link
                rel="alternate"
                hreflang="{{ .Lang }}"
                href="{{ .Permalink }}"
                />{{ end }}
  </url>
  {{ end }}{{ end }}
</urlset>

Then for your Search and Contact pages front matter add sitemap_exclude: true and they will be excluded from the sitemap.

@XhmikosR
Contributor

^^ sitemap_exclude: true

@coliff
Member

coliff commented Sep 20, 2018

oops ok - I edited my post.

@willertrombix

willertrombix commented Sep 20, 2018

Hi guys!

First of all, many thanks for your quick support, you rule!

I've tried your solution, putting sitemap.xml in the static folder with this code:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:xhtml="http://www.w3.org/1999/xhtml">
  {{ range .Data.Pages }}{{ if ne .Params.sitemap_exclude true }}
  <url>
    <loc>{{ .Permalink }}</loc>{{ if not .Lastmod.IsZero }}
    <lastmod>{{ safeHTML ( .Lastmod.Format "2006-01-02T15:04:05-07:00" ) }}</lastmod>{{ end }}{{ with .Sitemap.ChangeFreq }}
    <changefreq>{{ . }}</changefreq>{{ end }}{{ if ge .Sitemap.Priority 0.0 }}
    <priority>{{ .Sitemap.Priority }}</priority>{{ end }}{{ if .IsTranslated }}{{ range .Translations }}
    <xhtml:link
                rel="alternate"
                hreflang="{{ .Lang }}"
                href="{{ .Permalink }}"
                />{{ end }}
    <xhtml:link
                rel="alternate"
                hreflang="{{ .Lang }}"
                href="{{ .Permalink }}"
                />{{ end }}
  </url>
  {{ end }}{{ end }}
</urlset>

So I deleted the sitemap.xml I used to have in layout > _default folder.
Then I configured Contact page with this front matter:

+++
title = "Contact"
date = 2018-09-06
description = "Please email me with any questions or concerns"
draft = false
toc = false
sitemap_exclude = false
+++  

And I ran hugo server, but on localhost I still see the Contact page listed in sitemap.xml.

Maybe I'm doing something wrong?

PS: I also tried sitemap_exclude = true in the front matter just in case, but it didn't work, hehe

@XhmikosR
Contributor

sitemap.xml needs to be in the layouts folder.
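
The distinction matters because files under static are copied to the output unchanged, while files under layouts are executed as templates. A sketch of the expected placement, where the project name my-site is a placeholder:

```text
my-site/
├── config.toml
├── layouts/
│   └── sitemap.xml    # rendered as a template; overrides Hugo's built-in sitemap
└── static/            # copied verbatim; template actions in files here are never evaluated
```

Placing the file at layouts/_default/sitemap.xml, where the earlier copy lived, should work as well.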

@willertrombix

It works!!

I really appreciate your help guys!

@willertrombix

Sorry for bothering you again... Any solution to also exclude tags from sitemap.xml?

I mean tags are automatically created in public folder when adding them in front matter (something like: tags = ["hello", "world"]) but I want to exclude them from sitemap.xml...

Any idea?

@willertrombix

Hi all,

I finally excluded tags from sitemap by putting this in config.toml:

[taxonomies]
  tag = ""

Hope it helps if you guys have the same problem :)

@onedrawingperday
Contributor

@willertrombix No. The above empty tag configuration introduces ambiguity to a Hugo project and sooner or later you will encounter unexpected behavior.

However I would like to point out that GitHub issues are not meant for support.

If you want, please open a support topic in the Hugo Discussion Forum.
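
One way to keep tag pages out of the sitemap without touching the taxonomy configuration is to filter by page kind in an overridden sitemap template. A minimal sketch, assuming a Hugo version where these kinds are named taxonomy and term (older releases used different names), and showing only the loc element:

```go-html-template
{{- /* Fragment of layouts/sitemap.xml: skip taxonomy list pages and individual term pages. */ -}}
{{ range where .Data.Pages "Kind" "not in" (slice "taxonomy" "term") }}
  <url>
    <loc>{{ .Permalink }}</loc>
  </url>
{{ end }}
```

With this filter the tag pages are still generated under public; they are only left out of sitemap.xml.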

@FelicianoTech
Contributor

I'm working on this.

@XhmikosR can you confirm if the 404 page showing up in the sitemap is still an issue for you as of Hugo v0.49? My site has a custom 404 page and it's not in my sitemap.

FelicianoTech added a commit to FelicianoTech/hugo that referenced this issue Nov 18, 2018
This PR adds a new key to .Sitemap called `Exclude`. `Exclude` is a
boolean that is set to false by default meaning all available pages are
included. This is the behavior currently. When `Exclude` is set to true,
it will not appear in any `sitemap.xml` files that Hugo may generate.

`Exclude` can be set to true in the Hugo config turning `sitemap.xml`
into an opt-in rather than an opt-out.

Fixes gohugoio#653
@zxkane

zxkane commented Feb 1, 2019

The issue (404 page included in sitemap.xml) still exists when generating a site with Hugo 0.53.

You can reproduce it via the theme dream plus.

FelicianoTech added a commit to FelicianoTech/hugo that referenced this issue Sep 26, 2019
FelicianoTech added a commit to FelicianoTech/hugo that referenced this issue Apr 26, 2020

@jehoshua7

I have used hidden = true to hide some content, yet the URIs for the (hidden) content still appear in sitemap.xml.

I'm also trying to see where the sitemap is built.

@jwflory

jwflory commented Nov 21, 2021

It would be great to see @FelicianoTech's changes in PR #6370 reviewed! Seems like it would successfully implement one of the oldest feature requests in Hugo too. 😀

@sk33lz

sk33lz commented Feb 8, 2022

I was surprised this didn't already work with the draft=true front matter setting out of the box, but after some tinkering I found that this functionality apparently already exists. The default Hugo sitemap.xml template just isn't using the existing Draft variable to filter the pages included in the range.

I was able to work around this issue by simply adding an if conditional for the existing Draft front matter variable to an overridden default Hugo sitemap.xml template, inside range .Data.Pages.

{{ range .Data.Pages }}
    {{ if not .Draft }}
    <url>
      <loc>{{ .Permalink }}</loc>{{ if not .Lastmod.IsZero }}
      <lastmod>{{ safeHTML ( .Lastmod.Format "2006-01-02T15:04:05-07:00" ) }}</lastmod>{{ end }}{{ with .Sitemap.ChangeFreq }}
      <changefreq>{{ . }}</changefreq>{{ end }}{{ if ge .Sitemap.Priority 0.0 }}
      <priority>{{ .Sitemap.Priority }}</priority>{{ end }}{{ if .IsTranslated }}{{ range .Translations }}
      <xhtml:link
                rel="alternate"
                hreflang="{{ .Lang }}"
                href="{{ .Permalink }}"
                />{{ end }}
      <xhtml:link
                rel="alternate"
                hreflang="{{ .Lang }}"
                href="{{ .Permalink }}"
                />{{ end }}
    </url>
    {{ end }}
  {{ end }}

This two-line fix is working great for me without any additional front matter variables required.

@jmooring
Member

See #6370 (comment).

@kolappannathan
Contributor

@jmooring I have created a PR that only makes the suggested one line change. Will this be considered?

@jmooring
Member

Yes.

jmooring added a commit to jmooring/hugo that referenced this issue Apr 1, 2024
Define global inclusion/exclusion in site configuration, and override
via front matter. For example, to exclude a page from the sitemap:

    [sitemap]
    disable = true # default is false

Closes gohugoio#653
Closes gohugoio#12282

Co-authored-by: kolappannathan <kolappannathan@users.noreply.github.com>
Co-authored-by: felicianotech <FelicianoTech@gmail.com>
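
As a sketch of the front matter override described in this commit message, a single page might opt out as follows. The title is a placeholder, and the setting is only available in Hugo releases that include this change.

```toml
+++
title = "Contact"  # placeholder title

[sitemap]
  disable = true   # exclude only this page from sitemap.xml
+++
```

Unlike the sitemap_exclude workaround earlier in the thread, this should not need a custom sitemap template.
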
@bep bep closed this as completed in #12329 Apr 2, 2024
bep pushed a commit that referenced this issue Apr 2, 2024

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 24, 2024