multilingual multihost: Branch bundle resource inadvertently published to other language when content and resource have the same file name #12320

Closed
ov opened this issue Mar 31, 2024 · 11 comments · Fixed by #12331

ov commented Mar 31, 2024

I have a multilingual multihost site, so each language is rendered to its own domain (and therefore to its own folder in public). The content for the different languages is completely independent; only some layouts, partials, and shortcodes are shared.

Starting from v0.123 I see that some content is "leaking" from one language to another. Say I have this:

content
├── en
│   └── folder
│       ├── subfolder
│       │   └── file.php
│       └── xxx.php
└── ru
    └── folder
        └── subfolder
            └── file.php

The config is:

defaultContentLanguage = 'en'
defaultContentLanguageInSubdir = true

[languages]
	[languages.en]
		baseURL = "https://site.com/"
		languageName = "English"
		contentDir = "content/en"
		staticDir = "static/en"
	[languages.ru]
		baseURL = "https://site.ru/"
		languageName = "Russian"
		contentDir = "content/ru"
		staticDir = "static/ru"

The generated sites structure looks like this:

public
├── en
│   └── folder
│       ├── subfolder
│       │   └── file.php
│       └── xxx.php
└── ru
    └── folder
        ├── subfolder
        │   └── file.php
        └── xxx.php

Note that xxx.php has leaked into the other language's site. This didn't happen in v0.122.0; it started in v0.123.0 and can be reproduced in the latest release.
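
A leak like this can be confirmed mechanically by comparing the published tree for a language against that language's content tree. The following is a minimal Go sketch, not part of Hugo: the `leaked` helper is hypothetical, and it assumes resources are published at the same relative path they have under contentDir. Rendered pages such as index.html, and anything coming from staticDir, would need to be filtered out of the input first.

```go
package main

import (
	"fmt"
	"sort"
)

// leaked returns every path in published that is absent from sources.
// Both lists hold slash-separated paths relative to a language root.
// Hypothetical helper for illustration; it is not a Hugo API.
func leaked(published, sources []string) []string {
	have := make(map[string]bool, len(sources))
	for _, s := range sources {
		have[s] = true
	}
	var out []string
	for _, p := range published {
		if !have[p] {
			out = append(out, p)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Trees copied from the report above (ru side only).
	contentRu := []string{"folder/subfolder/file.php"}
	publicRu := []string{"folder/subfolder/file.php", "folder/xxx.php"}

	fmt.Println(leaked(publicRu, contentRu)) // prints [folder/xxx.php]
}
```

Keeping the comparison a pure function over path lists makes it easy to feed it from any directory walker.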

The sites have multiple pages, but this only happens to "resources", not to HTML or Markdown files. In v0.123.0 I see it happen to PHP files, but after switching to the latest release it also affects images in folders without HTML or Markdown files. It looks like it's getting worse.

I see some similar issues around, but they mostly relate to "shared" translations where the resources are mixed with multilingual HTML content. In my case the content is completely separate, so I assume no "leaking" should ever happen.

Any ideas?

Thank you!

What version of Hugo are you using (hugo version)?

This started with v0.123.0 and still happens in latest. Works fine in v0.122.0

Does this issue reproduce with the latest release?

Yes

ov changed the title from 'resource "leaking" im multilingual multiple-host site since 0.123' to 'resource "leaking" in multilingual multiple-host site since 0.123' on Mar 31, 2024
jmooring (Member) commented Mar 31, 2024

TLDR

This behavior is expected and desired with v0.123.0 and later, related to these announced breaking changes:

Explanation

Let's change the example to use JPEG files instead of PHP files. This makes no functional difference, but removes any conceptual confusion regarding content vs. template vs. resource.

content/
├── en/
│   └── folder/
│       ├── subfolder/
│       │   └── a.jpg
│       └── b.jpg
└── ru/
    └── folder/
        └── subfolder/
            └── a.jpg

This site has one page... the home page.

Due to the changes noted above, each of these files is a page resource for the en home page:

  • content/en/folder/subfolder/a.jpg
  • content/en/folder/b.jpg

Page resources are shared across languages. In a multihost configuration this produces:

public/
├── en/
│   ├── folder/
│   │   ├── subfolder/
│   │   │   └── a.jpg  <-- copied from content/en/folder/subfolder/a.jpg
│   │   └── b.jpg      <-- copied from content/en/folder/b.jpg
│   └── index.html
└── ru/
    ├── folder/
    │   ├── subfolder/
    │   │   └── a.jpg  <-- copied from content/ru/folder/subfolder/a.jpg
    │   └── b.jpg      <-- copied from content/en/folder/b.jpg
    └── index.html

Again, this is the expected behavior now that a branch bundle's resources may reside in subfolders.
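
The layout above can be reproduced with a small model. This is only a sketch of the behavior described in this comment, not Hugo's implementation: the `publish` function is hypothetical, and the fallback order when more than two languages are configured is an assumption.

```go
package main

import "fmt"

// publish maps each resource path to the language whose file is copied into
// the output for lang: the language's own file if it has one, otherwise the
// first language in langs order that does.
// Hypothetical model for illustration; it is not a Hugo API.
func publish(lang string, langs []string, files map[string][]string) map[string]string {
	out := map[string]string{}
	// A language's own copy always wins.
	for _, p := range files[lang] {
		out[p] = lang
	}
	// Fill the gaps with resources shared from the other languages.
	for _, other := range langs {
		for _, p := range files[other] {
			if _, ok := out[p]; !ok {
				out[p] = other
			}
		}
	}
	return out
}

func main() {
	langs := []string{"en", "ru"}
	files := map[string][]string{
		"en": {"folder/subfolder/a.jpg", "folder/b.jpg"},
		"ru": {"folder/subfolder/a.jpg"},
	}
	for _, lang := range langs {
		src := publish(lang, langs, files)
		for _, p := range []string{"folder/subfolder/a.jpg", "folder/b.jpg"} {
			fmt.Printf("public/%s/%s <- content/%s/%s\n", lang, p, src[p], p)
		}
	}
}
```

In this model, public/ru/folder/b.jpg is taken from content/en because ru has no file of its own at that path, which matches the annotated tree.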

ov (Author) commented Mar 31, 2024

@jmooring thanks for the prompt reply and sorry that I missed that this is now by design.

Is there a chance you could consider making this sharing optional?

I see where it helps, but in my case I end up with hundreds of images copied to another website, although that page does not exist on that website at all.

Maybe we could have a flag in the front matter? Or maybe using a different translationKey could stop sharing?

jmooring reopened this Mar 31, 2024
jmooring (Member) commented Mar 31, 2024

> although that page does not exist on that website at all

In the example above, there is only one page (the home page), and it obviously exists on both sites. All of the JPEG files are page resources, belonging to the home page branch bundle. We have no way of knowing that you don't want to share b.jpg between the two languages.

It sounds to me like you need to change where some of these resources live. I suspect you have images in branch bundles that should instead live in page bundles.

ov (Author) commented Mar 31, 2024

I have a few problems after the migration and the example above is a simple one. I could probably survive by moving those files to the language-specific subfolders of static. Not the best solution, but at least they stay together there.

The other example I mentioned is slightly different. Here's what it looks like:

content
└── en
    └── gallery
        ├── _index.html
        ├── set1
        │   ├── page1.html
        │   ├── thumbnail1.png
        │   ├── page2.html
        │   └── thumbnail2.png
        └── set2
            ├── page3.html
            ├── thumbnail3.png
            ├── page4.html
            └── thumbnail4.png

The /gallery/ page on the en website displays all the thumbnails and each thumbnail is a link to the corresponding page.

I might have abused the leaf/branch structure here a bit, but it works within en, so I suppose it is more or less OK. Also, those files are generated by third-party software, so I would rather avoid changing that structure for now. Finally, there are of course many more sets, and each has quite a lot of pages and thumbnails.

Anyways, the other translation of the website does not have that gallery folder at all. There's no /content/ru/gallery/ folder and I expect that as there's no "sharing" between the translations, the resources should not be copied.

Here's what I get instead:

public
├── en
│   └── gallery
│       ├── set1
│       │   ├── page1
│       │   │   └── index.html
│       │   ├── thumbnail1.png
│       │   ├── page2
│       │   │   └── index.html
│       │   └── thumbnail2.png
│       └── set2
│           ├── page3
│           │   └── index.html
│           ├── thumbnail3.png
│           ├── page4
│           │   └── index.html
│           └── thumbnail4.png
└── ru
    └── gallery
        ├── set1
        │   ├── thumbnail1.png
        │   └── thumbnail2.png
        └── set2
            ├── thumbnail3.png
            └── thumbnail4.png

The question is why do I get a copy of the thumbnails in ru?

> We have no way of knowing that you don't want to share b.jpg between the two languages.

I believe that there are some cases when we actually have a way. Besides the specific "no share" flag, here's what comes to mind from the above.

I don't want to share b.jpg between two languages if both translations have contentDir specified (not a filename-based translation) and:

  • if there is no such page bundle in that other translation; or
  • if there is such a page, but their translationKeys are different

I tried the latter by creating a dummy /content/ru/gallery/ folder, placing index.md there, and setting different translationKey values in the two files. I see that this breaks the .Translations list but doesn't prevent resource sharing (which feels like an inconsistency bug to me).

What do you think?

jmooring (Member) commented Mar 31, 2024

> why do I get a copy of the thumbnails?

Does the ru folder contain a gallery page or section? I ask because this content structure:

content/
├── en/
│   ├── gallery/
│   │   ├── set1/
│   │   │   ├── page1/
│   │   │   │   └── index.md
│   │   │   ├── page2/
│   │   │   │   └── index.md
│   │   │   ├── thumbnail1.jpg
│   │   │   └── thumbnail2.jpg
│   │   ├── set2/
│   │   │   ├── page1/
│   │   │   │   └── index.md
│   │   │   ├── page2/
│   │   │   │   └── index.md
│   │   │   ├── thumbnail3.jpg
│   │   │   └── thumbnail4.jpg
│   │   └── _index.md
│   └── _index.md
└── ru/
    └── _index.md

is published to:

public/
├── en/
│   ├── gallery/
│   │   ├── set1/
│   │   │   ├── page1/
│   │   │   │   └── index.html
│   │   │   ├── page2/
│   │   │   │   └── index.html
│   │   │   ├── thumbnail1.jpg
│   │   │   └── thumbnail2.jpg
│   │   ├── set2/
│   │   │   ├── page1/
│   │   │   │   └── index.html
│   │   │   ├── page2/
│   │   │   │   └── index.html
│   │   │   ├── thumbnail3.jpg
│   │   │   └── thumbnail4.jpg
│   │   └── index.html
│   └── index.html
└── ru/
    └── index.html

If you have a gallery section on the ru site, the images will be copied to the ru site because they are resources of the gallery branch bundle. Perhaps you should make the set directories into branch bundles, or place the images within the leaf bundles.

ov (Author) commented Mar 31, 2024

> these files are resources of the home page which exists in both languages.

The content/en/gallery folder has _index.md, so I suppose the images are the resources of gallery, not the home page.

The home page has its own _index.html, I just didn't mention it above for simplicity.

> Does the ru folder contain a gallery page or section?

No, I mentioned that above. There is no content/ru/gallery folder there at all.

jmooring (Member) commented Mar 31, 2024

Please try this site and tell me what's different:

git clone --single-branch -b hugo-github-issue-12320 https://github.com/jmooring/hugo-testing hugo-github-issue-12320
cd hugo-github-issue-12320
rm -rf public/ && hugo && tree public

EDIT: I just force pushed a change. 2024-03-31T14:20:05-07:00.

ov (Author) commented Mar 31, 2024

OK, my bad, I didn't provide all the information. Your repo works well and there's no gallery folder in ru after cloning.

The thing is that in the real website, pageN.html and thumbnailN.png actually share the same base name. I didn't think it mattered, but it looks like it does.

Try renaming content/en/gallery/set1/thumbnail1.jpg in your repo to page1.jpg and rebuild the site. You will see the problem.
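
To find every place where this applies in a larger site, the content tree can be scanned for resources that sit next to a content file with the same base name. The sketch below is a hypothetical helper, not part of Hugo: `sameNameResources` and the `contentExts` set are made up for illustration, the extension list is an assumed and incomplete subset of what Hugo treats as content, and the default root `content/en` is taken from the config earlier in this thread.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path"
	"path/filepath"
	"sort"
	"strings"
)

// contentExts is an assumed, incomplete set of extensions treated as page content.
var contentExts = map[string]bool{".md": true, ".markdown": true, ".html": true, ".htm": true}

// sameNameResources returns the non-content files that share a directory and
// base name with a content file, e.g. s1/p1.jpg next to s1/p1.md.
// Paths are slash-separated and relative to the content directory.
func sameNameResources(paths []string) []string {
	pages := map[string]bool{} // "dir/base" keys of content files
	for _, p := range paths {
		ext := path.Ext(p)
		if contentExts[strings.ToLower(ext)] {
			pages[strings.TrimSuffix(p, ext)] = true
		}
	}
	var hits []string
	for _, p := range paths {
		ext := path.Ext(p)
		if !contentExts[strings.ToLower(ext)] && pages[strings.TrimSuffix(p, ext)] {
			hits = append(hits, p)
		}
	}
	sort.Strings(hits)
	return hits
}

func main() {
	root := "content/en" // assumed content directory; override with the first argument
	if len(os.Args) > 1 {
		root = os.Args[1]
	}
	var paths []string
	err := filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.IsDir() {
			rel, err := filepath.Rel(root, p)
			if err != nil {
				return err
			}
			paths = append(paths, filepath.ToSlash(rel))
		}
		return nil
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, hit := range sameNameResources(paths) {
		fmt.Println(hit)
	}
}
```

Run against the renamed test repo, this should list content/en/gallery/set1/page1.jpg (relative to the root) because page1 also exists there as a content file.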

jmooring (Member) commented Mar 31, 2024

A minimal example looks like this:

Configuration:

defaultContentLanguage = 'en'
defaultContentLanguageInSubdir = true

[languages.en]
baseURL = "https://site.com/"
contentDir = "content/en"

[languages.ru]
baseURL = "https://site.ru/"
contentDir = "content/ru"

Content:

content/
├── en/
│   └── s1/
│       ├── p1.jpg
│       └── p1.md
└── ru/

Expected (v0.122.0):

public/
├── en/
│   ├── s1/
│   │   ├── p1/
│   │   │   └── index.html
│   │   ├── index.html
│   │   └── p1.jpg
│   └── index.html
└── ru/
    └── index.html

Actual (v0.123.0 and later):

public/
├── en/
│   ├── s1/
│   │   ├── p1/
│   │   │   └── index.html
│   │   ├── index.html
│   │   └── p1.jpg
│   └── index.html
└── ru/
    ├── s1/
    │   └── p1.jpg
    └── index.html

This may be related to #12198.

Test case

func TestFoo(t *testing.T) {
	t.Parallel()

	files := `
-- hugo.toml --
disableKinds = ['rss','sitemap','taxonomy','term']
defaultContentLanguage = 'en'
defaultContentLanguageInSubdir = true
[languages.en]
baseURL = "https://en.example.org/"
contentDir = "content/en"
[languages.fr]
baseURL = "https://fr.example.org/"
contentDir = "content/fr"
-- content/en/s1/p1.md --
---
title: p1
---
-- content/en/s1/p1.txt --
---
p1.txt
---
-- layouts/_default/single.html --
{{ .Title }}|
-- layouts/_default/list.html --
{{ .Title }}|
`

	b := hugolib.Test(t, files)

	b.AssertFileExists("public/en/s1/index.html", true)
	b.AssertFileExists("public/en/s1/p1/index.html", true)
	b.AssertFileExists("public/en/s1/p1.txt", true)

	b.AssertFileExists("public/fr/s1/index.html", false)
	b.AssertFileExists("public/fr/s1/p1/index.html", false)
	b.AssertFileExists("public/fr/s1/p1.txt", false) // failing test
}

jmooring changed the title from 'resource "leaking" in multilingual multiple-host site since 0.123' to 'multilingual multihost: Branch bundle unintentionally published to other language when markdown and resource have the same file name' on Apr 1, 2024
jmooring changed the title from 'multilingual multihost: Branch bundle unintentionally published to other language when markdown and resource have the same file name' to 'multilingual multihost: Branch bundle resource inadvertently published to other language when content and resource have the same file name' on Apr 1, 2024
bep self-assigned this Apr 2, 2024
bep added a commit to bep/hugo that referenced this issue Apr 2, 2024
bep added a commit to bep/hugo that referenced this issue Apr 2, 2024
bep closed this as completed in #12331 Apr 2, 2024
ov (Author) commented Apr 2, 2024

I can confirm the problem is fixed, thank you.

github-actions bot commented

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

github-actions bot locked as resolved and limited conversation to collaborators Apr 24, 2024