Gitea dump duplicates repositories on windows #21662

Open
eeyrjmr opened this issue Nov 2, 2022 · 11 comments

@eeyrjmr
Contributor

eeyrjmr commented Nov 2, 2022

Description

When I compare the archives produced by "gitea dump" on Windows and on Linux, the Windows archive is twice as large.

It appears the bare repositories are duplicated in two locations:

gitea-dump-####.zip
    custom
    data
        gitea-repositories
            repo1
            repo2
    repos
        repo1
        repo2

Gitea Version

1.17.2

Can you reproduce the bug on the Gitea demo site?

No

Log Gist

No response

Screenshots

No response

Git Version

2.34

Operating System

windows, linux

How are you running Gitea?

service

Database

MySQL

@lunny
Member

lunny commented Nov 2, 2022

So which one is not in the right place?

@wxiaoguang
Contributor

Maybe it's related to a long-standing bug, you shouldn't run gitea dump in gitea directory.

@eeyrjmr
Contributor Author

eeyrjmr commented Nov 21, 2022

> Maybe it's related to a long-standing bug, you shouldn't run gitea dump in gitea directory.

Apologies for the delay... Interesting bug. I have just tried this and the result is the same

> So which one is not in the right place?

Very good question :) I suspect it is linux but it is likely due to a subtle difference in on-disk file structure. Looking at the restore part of the docs: https://docs.gitea.io/en-us/backup-and-restore/#restore-command-restore

unzip gitea-dump-1610949662.zip
cd gitea-dump-1610949662
mv data/conf/app.ini /etc/gitea/conf/app.ini
mv data/* /var/lib/gitea/data/
mv log/* /var/lib/gitea/log/
mv repos/* /var/lib/gitea/repositories/
chown -R gitea:gitea /etc/gitea/conf/app.ini /var/lib/gitea 

The repositories are meant to be in the root of the Gitea working directory, since that is where the restore sequence tells the user to put them.

Looking at the dump generated from gitea running in an Alpine VE I see the structure aligns with this

  1. repos directory in the root of the zip containing the repos/orgs
  2. no additional repos stored within the data directory of the zip

Looking at the dump generated from a gitea running in a windows MS I see a subtle difference

  1. repos directory in the root of the zip containing the repos/org
  2. a gitea-repositories directory under the data directory of the zip.

I noticed this oddity some months ago where the backup zip was larger than the on-disk structure but I didn't look into it. I recently pushed some older git repos to the instance running on windows and the recent backups are growing

on-disk = 708Meg
gitea-dump-1668736800.zip = 1,411Meg

The SQL dump (the only thing that should be different) is 1Meg in size. I spent a bit of time looking over the dump code, but I haven't managed to get my head around how it works or what it is trying to dump, let alone why it creates this additional directory, and only on Windows.
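
To see where the extra size comes from, one option is to tally the uncompressed bytes under each top-level entry of the dump. The following is a minimal sketch, not part of Gitea; the helper name `size_by_top_level` is hypothetical, and it assumes the archive is an ordinary zip file readable by Python's `zipfile` module.

```python
import zipfile
from collections import Counter


def size_by_top_level(zip_path):
    """Return uncompressed bytes per top-level entry of a zip archive."""
    totals = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Zip member names normally use "/" as the separator.
            top = info.filename.split("/", 1)[0]
            totals[top] += info.file_size
    return dict(totals)
```

If the repositories really are stored twice, the totals for `data` and `repos` should each be roughly the size of the repositories on disk.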

@eeyrjmr
Contributor Author

eeyrjmr commented Nov 21, 2022

I do have this in my app.ini

[repository]
ROOT = D:/gitea/data/gitea-repositories

Now thinking about this... could this be related? Looking at:
https://docs.gitea.io/en-us/config-cheat-sheet/#repository-repository
ROOT: %(APP_DATA_PATH)s/gitea-repositories: Root path for storing all repository data. A relative path is interpreted as AppWorkPath/%(ROOT)s.

I set this "just in case", following the "Windows as a service" docs, so that it includes the full path:
https://docs.gitea.io/en-us/windows-service/

So a running gitea is correctly reading this location. Now the backup... the backup code does two things

  1. copies the repositories
  2. backs up ./data

since I have repositories in the data subdirectory it is getting archived twice.
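
The overlap can be checked mechanically. This is a minimal sketch, not Gitea code: `is_inside` is a hypothetical helper, and `D:/gitea/data` is assumed to be the data directory (it is inferred from the ROOT value, not read from app.ini).

```python
from pathlib import PureWindowsPath


def is_inside(child, parent):
    """True if `child` equals `parent` or lies somewhere beneath it."""
    child_path = PureWindowsPath(child)
    parent_path = PureWindowsPath(parent)
    return child_path == parent_path or parent_path in child_path.parents


repo_root = "D:/gitea/data/gitea-repositories"  # [repository] ROOT from app.ini
data_dir = "D:/gitea/data"                      # assumed data directory

print(is_inside(repo_root, data_dir))                      # True
print(is_inside("D:/gitea/gitea-repositories", data_dir))  # False
```

`PureWindowsPath` only manipulates the path strings, so the check behaves the same on Linux and Windows and never touches the disk.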

So in theory I should be able to comment out the [repository] section, move the D:/gitea/data/gitea-repositories to D:/gitea/gitea-repositories and gitea should keep working but also the gitea dump will be ~ the on-disk size

@lunny
Member

lunny commented Nov 21, 2022

So should you move repositories out of data, or should Gitea check whether the repositories directory is under ./data?

@eeyrjmr
Contributor Author

eeyrjmr commented Nov 21, 2022

> So should you move repositories out of data, or should Gitea check whether the repositories directory is under ./data?

Good question :)
For consistency I should move the repositories out of data, because that way the documented restore-from-backup steps make sense.

Should Gitea check whether the repositories are under ./data? Looking at the issue @wxiaoguang linked, there is some commonality, as the migration also put the repositories under ./data. It's extra logic to check and skip.

@eeyrjmr
Contributor Author

eeyrjmr commented Nov 22, 2022

OK, it's a bit more involved than that...

I commented out the [repository] entry and ran gitea dump to test:

2022/11/22 08:41:00 ...les/storage/local.go:46:NewLocalStorage() [I] Creating new Local Storage at D:\gitea\data\packages
Failed to include repositories: open D:\gitea\data\gitea-repositories: The system cannot find the file specified.
2022/11/22 08:41:00 cmd/dump.go:241:runDump() [I] Dumping local repositories... D:\gitea\data\gitea-repositories
2022/11/22 08:41:00 cmd/dump.go:159:fatal() [F] Failed to include repositories: open D:\gitea\data\gitea-repositories: The system cannot find the file specified.

that aside, the archive is back to an expected size

eeyrjmr closed this as completed Nov 23, 2022
go-gitea locked and limited conversation to collaborators May 3, 2023

@techknowlogick
Member

re-opening as we've received a similar report via chat

go-gitea unlocked this conversation Jul 25, 2023

@Kalyxt

Kalyxt commented Jul 30, 2023

I'll post some additional info here.

[image: giteasize]

The first line is the zipped gitea folder, which contains all the data; the second line is the dump created by the Gitea CLI (1.20.1).

I browsed the dump file and there are duplicated repositories at gitea-dump-1690312222.zip\data\gitea-repositories and gitea-dump-1690312222.zip\repos.

@hesseldijk

Hi,

Any more information on this? I'm having the same problem (1.20.2)

@wxiaoguang
Contributor

While writing #30240, I think I now understand the problems better (the "dump" code wasn't written by me, so it really takes a lot of time to understand what it is doing...).

The root problem is that some directories overlap. For example: Gitea expects to back up PathA and PathB. But if PathA=C:\git\data and PathB=C:\git\data\sub, then the dumped file contains duplicate files.
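
One way to avoid the duplication is to skip PathB while walking PathA. This is only a sketch of the idea and does not reproduce Gitea's dump code; `files_to_archive` is a hypothetical helper.

```python
import os


def files_to_archive(root, excluded=()):
    """Yield files under `root`, skipping any subtree listed in `excluded`."""
    skip = {os.path.normcase(os.path.abspath(p)) for p in excluded}
    for current, dirs, files in os.walk(root):
        # Prune in place so os.walk never descends into an excluded directory.
        dirs[:] = [
            d for d in dirs
            if os.path.normcase(os.path.abspath(os.path.join(current, d))) not in skip
        ]
        for name in files:
            yield os.path.join(current, name)
```

With PathA as `root` and PathB in `excluded`, the files under PathB are left for the separate pass that archives PathB, so each file is written once.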

At the moment I don't have a clear plan for a complete rewrite, and I can see that the "dump" command has a lot of problems. So a workaround could be to manually copy the data directory and dump the database yourself; that is more flexible and controllable.
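
A rough sketch of that manual workaround follows. It is not an official procedure. Both helper names are hypothetical, the database is assumed to be MySQL as in the original report, and the command is only built, not executed.

```python
import os
import zipfile


def zip_directory(src_dir, zip_path):
    """Write every file under `src_dir` into `zip_path` exactly once.

    `zip_path` should live outside `src_dir`, otherwise the archive
    would try to include itself.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for current, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(current, name)
                archive.write(full, os.path.relpath(full, src_dir))


def mysqldump_command(database, user, out_file):
    """Build (but do not run) a mysqldump invocation for the database."""
    return [
        "mysqldump",
        "--single-transaction",
        f"--user={user}",
        f"--result-file={out_file}",
        database,
    ]
```

The returned list could be passed to `subprocess.run(..., check=True)`. The database name, user, and how the password is supplied depend on the installation. Stopping Gitea before copying the data directory is probably the safer choice.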
