Some hubic chunks exist but are not listed #290

Open
jer-sen opened this issue Dec 6, 2017 · 15 comments

@jer-sen
Contributor

jer-sen commented Dec 6, 2017

After a backup and a check of a repository on hubic, I get this kind of result:

2017-12-03 16:56:45.270 INFO perso Snapshot SNAPSHOT-ID revision 1 with 210263 chunks
2017-12-03 16:56:45.325 WARN SNAPHOST_VALIDATE Chunk 913505278db300622dbd730ddb38ef134360ef056a56c6a6368111389311b772 referenced by snapshot SNAPSHOT-ID at revision 1 does not exist
2017-12-03 16:56:45.332 WARN SNAPHOST_VALIDATE Chunk 4dde4e9410cc4f2c330a0af630a78c443306b7cafb8f00dfab9c18bf3c3f2c0c referenced by snapshot SNAPSHOT-ID at revision 1 does not exist
2017-12-03 16:56:45.335 WARN SNAPHOST_VALIDATE Chunk 884c2534fc7e4ac974e6d84beaa11685eba9253a1c40d81522943e9460323f8f referenced by snapshot SNAPSHOT-ID at revision 1 does not exist
2017-12-03 16:56:45.345 WARN SNAPHOST_VALIDATE Chunk ba2b2caacdacf3a51a43670b80af3c307a286c707fd7cee3530a8b1ecb55d0c5 referenced by snapshot SNAPSHOT-ID at revision 1 does not exist
2017-12-03 16:56:45.347 WARN SNAPHOST_VALIDATE Chunk 81f33e9fcefa7516b0a27671937faf4a1b8c9eec5b21835474cf3e5d4d40ddc0 referenced by snapshot SNAPSHOT-ID at revision 1 does not exist
2017-12-03 16:56:45.353 WARN SNAPHOST_VALIDATE Chunk 993cffc905a8c2d476ad59deb24e13577659750dac9be6424a0a1946026b565d referenced by snapshot SNAPSHOT-ID at revision 1 does not exist
2017-12-03 16:56:45.359 WARN SNAPHOST_VALIDATE Chunk c6f2619e93317318f97a663894d1c47b05d92d1b91b11aca88909a5224d378b1 referenced by snapshot SNAPSHOT-ID at revision 1 does not exist
2017-12-03 16:56:45.389 WARN SNAPHOST_VALIDATE Chunk 91c8048ab89dc8fc063abb03b9a8675689f92ee844959d5243ec524ab1779f3e referenced by snapshot SNAPSHOT-ID at revision 1 does not exist
2017-12-03 16:56:45.391 WARN SNAPHOST_VALIDATE Chunk c82b399d16e0cfc5a4fc7561efef479dcd9ab10d0e2ee3782da0c9e19c62061f referenced by snapshot SNAPSHOT-ID at revision 1 does not exist
2017-12-03 16:56:45.403 WARN SNAPHOST_VALIDATE Chunk de074a8a818c686470d4b26b689eae40d6fc4047f8a4fa894c2ad6c5538aba22 referenced by snapshot SNAPSHOT-ID at revision 1 does not exist
2017-12-03 16:56:45.407 WARN SNAPHOST_VALIDATE Chunk 935bfcd81a790c115a17f8485dca6c2acb36cb82505168ae33ab962c747e1c74 referenced by snapshot SNAPSHOT-ID at revision 1 does not exist
2017-12-03 16:56:45.407 WARN SNAPHOST_VALIDATE Chunk 91811c0fad624fe585c6c969e46362081dc56ba5d19c0c056080027039be74ca referenced by snapshot SNAPSHOT-ID at revision 1 does not exist
2017-12-03 16:56:45.408 WARN SNAPHOST_VALIDATE Chunk 70d1dfe37f6981661d9840b96361776fe1d6b6e19eca03a9b2301b88a70766a9 referenced by snapshot SNAPSHOT-ID at revision 1 does not exist
2017-12-03 16:56:45.410 WARN SNAPHOST_VALIDATE Chunk 3e1cc802cfffb19c86fce8ceb0af05898302f98daf48b9b1bf81bf4756790040 referenced by snapshot SNAPSHOT-ID at revision 1 does not exist
2017-12-03 16:56:45.431 WARN SNAPHOST_VALIDATE Chunk 96383d284915b85ce8990246807528c176d6b356566ab2ecf0fb0dd945cc41bd referenced by snapshot SNAPSHOT-ID at revision 1 does not exist
2017-12-03 16:56:45.461 WARN SNAPHOST_VALIDATE Chunk bb204d4bfecff104cc6926ef67055d30765d5f113012fd5e39175a702c40b703 referenced by snapshot SNAPSHOT-ID at revision 1 does not exist
2017-12-03 16:56:45.470 WARN SNAPHOST_VALIDATE Chunk cf5d5b8e2fb9427c7b487104a0943c15ebed17f3fed288f10aa0331f6f32bfbc referenced by snapshot SNAPSHOT-ID at revision 1 does not exist
2017-12-03 16:56:45.474 WARN SNAPHOST_VALIDATE Chunk 1dec4723e791952382282cd3df0edf255b76002fad8458f18fdacb37bed5b9ed referenced by snapshot SNAPSHOT-ID at revision 1 does not exist
2017-12-03 16:56:45.477 WARN SNAPHOST_VALIDATE Chunk b6abf5813e3baaede604ea6b2d8f48a9417c95d5c2668f8759c2cca264b19f53 referenced by snapshot SNAPSHOT-ID at revision 1 does not exist
2017-12-03 16:56:45.488 WARN SNAPHOST_VALIDATE Chunk cd57a13468105a0d68eabf8162a0dcbd5197ebe899595885d02c697c7b4b86b6 referenced by snapshot SNAPSHOT-ID at revision 1 does not exist
2017-12-03 16:56:45.493 WARN SNAPHOST_VALIDATE Chunk 919f2cd6f7cd4d1068595b38b132e1c6426f992281ff29d91faefe54daca7add referenced by snapshot SNAPSHOT-ID at revision 1 does not exist
2017-12-03 16:56:45.495 WARN SNAPHOST_VALIDATE Chunk 986def29dffbba7a83ff339843f350274ba05e6aacf84739a61f8ee939ad5261 referenced by snapshot SNAPSHOT-ID at revision 1 does not exist
2017-12-03 16:56:45.495 WARN SNAPHOST_VALIDATE Chunk 9b293582ea0d5d0f8e65680e0b6344d5d7debadb087f9c6e203ce37e66cca540 referenced by snapshot SNAPSHOT-ID at revision 1 does not exist
2017-12-03 16:56:45.506 ERROR SNAPSHOT_CHECK Some chunks referenced by snapshot SNAPSHOT-ID at revision 1 are missing

It seems that an HTTP HEAD request on each of these files returns OK (status 200), but an HTTP GET listing of the parent directory does not include them.

I tried several times to delete the local cache and the remote snapshot, then run a new backup and check. Each time, HTTP HEAD returns OK during the backup for the chunks that the next check then flags as missing. However, fewer and fewer files are flagged each time.

I don't think it's a big issue for chunks (just a small performance issue), but if some snapshot revisions are not correctly listed by duplicacy, some chunks could be considered unused and then deleted by a prune, resulting in data loss.
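To make the prune concern concrete, here is a minimal sketch of how an incomplete revision listing could make live chunks look unused. It is a simplified model, not Duplicacy's actual prune implementation; the function name `unreferencedChunks` and the short chunk IDs are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// unreferencedChunks returns the stored chunks that no listed revision refers to.
// Hypothetical helper: a simplified model of "which chunks look unused".
func unreferencedChunks(stored []string, revisions map[int][]string) []string {
	referenced := make(map[string]bool)
	for _, chunks := range revisions {
		for _, c := range chunks {
			referenced[c] = true
		}
	}
	var unused []string
	for _, c := range stored {
		if !referenced[c] {
			unused = append(unused, c)
		}
	}
	sort.Strings(unused)
	return unused
}

func main() {
	stored := []string{"a", "b", "c"}

	// Complete listing: both revisions are visible.
	complete := map[int][]string{1: {"a", "b"}, 2: {"b", "c"}}
	// Incomplete listing: revision 2 exists but the storage did not list it.
	incomplete := map[int][]string{1: {"a", "b"}}

	fmt.Println("complete listing, unused:", unreferencedChunks(stored, complete))
	fmt.Println("incomplete listing, unused:", unreferencedChunks(stored, incomplete))
}
```

With the incomplete listing, chunk `c` looks unused even though revision 2 still needs it, which is the data-loss scenario described above.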

Is there a bug in duplicacy? Or an issue with hubic (one that could be worked around in duplicacy)?

@gilbertchen gilbertchen added the bug label Dec 8, 2017
@gilbertchen
Owner

I can reproduce the bug, but not sure if it is the same bug. I'm running the latest version, in which the chunk directory is nested with a nesting level of 1. There is a bug that causes Duplicacy not to go deeper than the first level and therefore all chunks can't be found.

If all chunks are in the same directory (which is the default behavior of 2.0.9 or earlier) and only some chunks are missing, then it is a different bug.

@jer-sen
Contributor Author

jer-sen commented Dec 10, 2017

@gilbertchen It may be the same bug. I use my own build that should be >2.0.10.
Link your fix to this issue and I will give it a try.

@gilbertchen
Owner

Here is the fix: 612f6e2

@jer-sen
Contributor Author

jer-sen commented Dec 11, 2017

With your fix I still have missing chunks after a backup and a check (these files were already missing before):

Fetching chunk 85e1b61b84aa74818dfb8a63e5412297ec4a3b5459760093907679e0b61baa2d
Chunk 85e1b61b84aa74818dfb8a63e5412297ec4a3b5459760093907679e0b61baa2d has been loaded from the snapshot cache
Chunk 9b293582ea0d5d0f8e65680e0b6344d5d7debadb087f9c6e203ce37e66cca540 referenced by snapshot repo-id at revision 26 does not exist
Chunk 96383d284915b85ce8990246807528c176d6b356566ab2ecf0fb0dd945cc41bd referenced by snapshot repo-id at revision 26 does not exist
Chunk ba2b2caacdacf3a51a43670b80af3c307a286c707fd7cee3530a8b1ecb55d0c5 referenced by snapshot repo-id at revision 26 does not exist
Some chunks referenced by snapshot repo-id at revision 26 are missing

Note that I've not tried to delete all revisions and cache to see the result of each HEAD request.

Any other ideas?

@gilbertchen
Owner

Can you add some debugging logs here?

for _, entry := range entries {
	if entry.Type == "application/directory" {
		files = append(files, entry.Name+"/")
		sizes = append(sizes, 0)
	} else {
		files = append(files, entry.Name)
		sizes = append(sizes, entry.Size)
	}
}

For example:

		for _, entry := range entries {
			LOG_INFO("debug", "%s -> %s %s", dir, entry.Name, entry.Type)
			if entry.Type == "application/directory" {
				files = append(files, entry.Name+"/")
				sizes = append(sizes, 0)
			} else {
				files = append(files, entry.Name)
				sizes = append(sizes, entry.Size)
			}
		}

@jer-sen
Contributor Author

jer-sen commented Dec 12, 2017

I ran a check with your logs:

2017-12-12 16:05:26.894 INFO STORAGE_SET Storage set to hubic://HubicDir
2017-12-12 16:05:28.585 TRACE CONFIG_FORMAT Using a static salt and 16384 iterations for key derivation
2017-12-12 16:05:28.592 INFO SNAPSHOT_CHECK Listing all chunks
2017-12-12 16:05:28.592 TRACE LIST_FILES Listing chunks/
2017-12-12 16:06:10.151 INFO debug chunks -> 0000248406c49c17a1d811c065085d01dae6bd9a668c4bea5d7d8bcbaa7cdabf application/octet-stream
2017-12-12 16:06:10.151 INFO debug chunks -> 00009082f23360c4ed6054c267812005da5add43d37c98898770dbac18172ae5 application/octet-stream
2017-12-12 16:06:10.151 INFO debug chunks -> 0000f89877bb5407b37805a28d4339fc86af6b7f6e81c9eeff8d03d4f32d6d2e application/octet-stream
2017-12-12 16:06:10.151 INFO debug chunks -> 00016eb0162817db527dcb0d5ebe554da15105e097d046e97d20f4e5caf7f66b application/octet-stream
2017-12-12 16:06:10.151 INFO debug chunks -> 00017a6f52ccb14c84f584221ed3bb864305f56185f1444f6cfdb9352fe99fc2 application/octet-stream
2017-12-12 16:06:10.151 INFO debug chunks -> 00019580dded16ef48229e3f8125bf162f3eea69302762300b0048c46e675159 application/octet-stream
2017-12-12 16:06:10.151 INFO debug chunks -> 0001fd41b266015a8ee43809464fc8ecd3b723396b041d140fe0b810ed60aa3a application/octet-stream
...
2017-12-12 16:06:11.081 INFO debug chunks -> ffff1fe2e3a5f9c020cbf99a27b35c6f07d6725b9f200bcddc9c77245deb971b application/octet-stream
2017-12-12 16:06:11.081 INFO debug chunks -> ffffe245254e68f8b5f0953fe721b947715cd7dcebb10ddbf9e3b64563141530 application/octet-stream
2017-12-12 16:06:11.081 INFO debug chunks -> ffffed87f70f3a565498a66523531963b967796e1cfbbc9a208790cd322da17d application/octet-stream
2017-12-12 16:07:17.395 WARN SNAPHOST_VALIDATE Chunk ba2b2caacdacf3a51a43670b80af3c307a286c707fd7cee3530a8b1ecb55d0c5 referenced by snapshot SNAPSHOT-ID at revision 71 does not exist
2017-12-12 16:07:17.411 WARN SNAPHOST_VALIDATE Chunk 96383d284915b85ce8990246807528c176d6b356566ab2ecf0fb0dd945cc41bd referenced by snapshot SNAPSHOT-ID at revision 71 does not exist
2017-12-12 16:07:17.480 WARN SNAPHOST_VALIDATE Chunk 9b293582ea0d5d0f8e65680e0b6344d5d7debadb087f9c6e203ce37e66cca540 referenced by snapshot SNAPSHOT-ID at revision 71 does not exist
2017-12-12 16:07:17.480 ERROR SNAPSHOT_CHECK Some chunks referenced by snapshot SNAPSHOT-ID at revision 71 are missing

The 3 missing chunks do not appear in the listing; they only show up in the warnings at the end.

@gilbertchen
Owner

I created about 26K chunks in my test and couldn't reproduce this bug. Can you run duplicacy -d check and search for lines like:

GET https://lb9911.hubic.ovh.net/v1/AUTH_41f3e48bfb5acdfa79843c0c82fedb12/default?format=json&limit=100&delimiter=%2f&prefix=storage%2Fchunks%2F&marker=storage%2Fchunks%2Fffd61597431acb53472dd6a18b5c6c9be339add3b7e498813e4aeee4514c0724

These are the calls to list files on the hubic storage. Can you compare those missing chunks with the markers shown in these calls? Are the missing chunks close to markers, or do they happen to be the markers?
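One way to do that comparison is sketched below. It pulls the `marker` query parameter out of each logged `GET` line and reports, for each missing chunk, whether it is itself a marker and which marker sorts just before it. The helper names (`markerFromLogLine`, `precedingMarker`), the host `example.invalid`, and the shortened chunk IDs are illustrative assumptions, not part of Duplicacy.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"sort"
	"strings"
)

// markerFromLogLine extracts the chunk ID used as the "marker" query parameter
// from a logged "GET <url>" line. It returns "" if the line carries no marker.
func markerFromLogLine(line string) string {
	fields := strings.Fields(line)
	for i, f := range fields {
		if f != "GET" || i+1 >= len(fields) {
			continue
		}
		u, err := url.Parse(fields[i+1])
		if err != nil {
			return ""
		}
		marker := u.Query().Get("marker") // Query() percent-decodes the value
		if marker == "" {
			return ""
		}
		return path.Base(marker) // keep only the chunk ID after the last "/"
	}
	return ""
}

// precedingMarker returns the largest marker that sorts at or before id and
// whether id is itself a marker. markers must be sorted in ascending order.
func precedingMarker(markers []string, id string) (marker string, isMarker bool) {
	i := sort.SearchStrings(markers, id) // first index with markers[i] >= id
	if i < len(markers) && markers[i] == id {
		return id, true
	}
	if i == 0 {
		return "", false
	}
	return markers[i-1], false
}

func main() {
	// Hypothetical log lines; real ones come from the debug output of a check.
	base := "GET https://example.invalid/v1/AUTH_x/default?format=json&limit=100" +
		"&delimiter=%2f&prefix=storage%2Fchunks%2F&marker=storage%2Fchunks%2F"
	logLines := []string{base + "3a", base + "7c", base + "ba"}

	var markers []string
	for _, l := range logLines {
		if m := markerFromLogLine(l); m != "" {
			markers = append(markers, m)
		}
	}
	sort.Strings(markers)

	for _, id := range []string{"7c", "96"} { // hypothetical missing chunk IDs
		m, isMarker := precedingMarker(markers, id)
		fmt.Printf("chunk %s: is marker=%v, preceding marker=%q\n", id, isMarker, m)
	}
}
```

If the missing chunks consistently coincide with markers or sit right after them, that would point at the pagination boundary rather than at random listing gaps.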

You can also play with the number of files returned in a call by changing this line:

Another option is to re-initialize the storage with version 2.0.10 and then the chunk directory will become nested. This can significantly reduce the number of files under one directory and hopefully the issue will go away. However, you'll need to re-upload all chunks.
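As a rough illustration of what a nested chunk directory means, the sketch below maps a chunk ID to a path. It assumes that each nesting level turns the next two hex characters of the ID into a subdirectory; that layout and the `chunkPath` helper are assumptions made for illustration, not taken from the Duplicacy source.

```go
package main

import "fmt"

// chunkPath maps a chunk ID to its storage path. nestingLevel is the number of
// two-character prefixes turned into subdirectories (assumed layout).
func chunkPath(chunkID string, nestingLevel int) string {
	p := "chunks/"
	rest := chunkID
	for i := 0; i < nestingLevel && len(rest) > 2; i++ {
		p += rest[:2] + "/"
		rest = rest[2:]
	}
	return p + rest
}

func main() {
	id := "913505278db300622dbd730ddb38ef134360ef056a56c6a6368111389311b772"
	fmt.Println(chunkPath(id, 0)) // flat layout: all chunks in one directory
	fmt.Println(chunkPath(id, 1)) // one level of nesting
}
```

Under that assumption, one level of nesting spreads the chunks over up to 256 subdirectories, so each list call covers a far smaller directory.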

@jer-sen
Copy link
Contributor Author

jer-sen commented Dec 12, 2017

I had already tried changing the count to different numbers (neither multiples nor divisors of 1000), with the same result...

I also tried to start listing from the first missing chunk and got 68 more missing chunks (total 71).

I (re)tried a HEAD on the first missing chunk and it is still ok (and I also get a correct 404 with a wrong chunk id):

a, b, c, d := client.call(client.Credential.Endpoint + "/default/" + "folder/chunks/ba2b2caacdacf3a51a43670b80af3c307a286c707fd7cee3530a8b1ecb55d0c5", "HEAD", 0, nil)
LOG_INFO("debug", "%v %v %v %v", a, b, c, d)

=>

HEAD https://lb1.hubic.ovh.net/v1/AUTH_0749e82ddfd6e997b5b4abd8adf8f467/default/folder/chunks/ba2b2caacdacf3a51a43670b80af3c307a286c707fd7cee3530a8b1ecb55d0c5
{} 1392238 application/octet-stream <nil>

Could it be a permission issue (list vs. get)?
A bug on hubic?
Do you want temporary access to my hubic account to run some tests (without the key to decrypt the data)?

@gilbertchen
Owner

What is the total number of files under the chunks directory? I thought you only had one revision with 210263 chunks, but obviously you have many more than that...

There is a similar bug on OneDrive that only occurs when there are more than 20K chunks: OneDrive/onedrive-api-docs#740.

@jer-sen
Contributor Author

jer-sen commented Dec 13, 2017

I have 203,372 files in my chunks folder, according to the number of lines output by the LOG_INFO("debug", "%s -> %s %s", dir, entry.Name, entry.Type) line.

FYI I get a 405 - Not Allowed error when I try to view my chunks folder (and only this folder) from hubic web site...

So what do we do?

I can try a new backup in another hubic folder...

@gilbertchen
Owner

Your best bet is to start a new backup. If you use the 2.0.10 version to initialize a new storage directory, the chunks directory will have a nesting level of 1.

You can also try using rclone to copy the missing chunks to your local machine to see if they have the same issue.

@jer-sen
Contributor Author

jer-sen commented Dec 16, 2017

I tried rclone but since listing fails to find the chunks I can't do anything.

I'm starting a new backup... I hope it will work.

I think you can close this issue. This is probably a problem on hubic.

@gilbertchen
Owner

A fix is to check the existence of each missing chunk individually if they can't be found by the list call. I started to think that this fix is needed now considering both OneDrive and Hubic have the same issue. And you can't rely on them to fix it -- OneDrive/onedrive-api-docs#740 hasn't been updated in 25 days.

Of course, as you pointed out, the snapshot revisions may still not be completely listed, but at least we hope there won't be too many revisions for this bug to appear.
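The fallback described above could look roughly like the following sketch. The `Storage` interface, its method names, and the in-memory `fakeStorage` are hypothetical stand-ins for illustration; they are not Duplicacy's real storage API.

```go
package main

import "fmt"

// Storage is a hypothetical, minimal view of a backend: a bulk listing that
// may be incomplete, plus a per-chunk existence check (e.g. an HTTP HEAD).
type Storage interface {
	ListChunks() ([]string, error)
	ChunkExists(id string) (bool, error)
}

// findMissing returns the referenced chunks that are absent from the listing
// AND fail the individual existence check.
func findMissing(s Storage, referenced []string) ([]string, error) {
	listed, err := s.ListChunks()
	if err != nil {
		return nil, err
	}
	inListing := make(map[string]struct{}, len(listed))
	for _, id := range listed {
		inListing[id] = struct{}{}
	}
	var missing []string
	for _, id := range referenced {
		if _, ok := inListing[id]; ok {
			continue
		}
		// Not in the listing: confirm with an individual check before reporting.
		exists, err := s.ChunkExists(id)
		if err != nil {
			return nil, err
		}
		if !exists {
			missing = append(missing, id)
		}
	}
	return missing, nil
}

// fakeStorage simulates a backend whose listing omits chunks that do exist.
type fakeStorage struct {
	listed  []string
	present map[string]bool
}

func (f fakeStorage) ListChunks() ([]string, error)       { return f.listed, nil }
func (f fakeStorage) ChunkExists(id string) (bool, error) { return f.present[id], nil }

func main() {
	s := fakeStorage{
		listed:  []string{"a", "b"}, // "c" exists but is not listed
		present: map[string]bool{"a": true, "b": true, "c": true},
	}
	missing, err := findMissing(s, []string{"a", "b", "c", "d"})
	fmt.Println(missing, err)
}
```

Only chunks absent from the listing cost an extra request, so the overhead stays small when the listing is mostly complete.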

@jer-sen
Contributor Author

jer-sen commented Dec 16, 2017

Same for hubic: I sent an email to support 40 days ago and got no answer...

Implementing a HEAD to confirm missing files would be a nice feature.
You could also add an option to check all chunks with HEAD instead of LIST.
And an option to re-upload missing chunks would also be nice.

Up to you! For me it's a nice-to-have; there are more important issues/features.

@gilbertchen
Owner

This issue has been mentioned on Duplicacy Forum. There might be relevant details there:

https://forum.duplicacy.com/t/fix-missing-chunks/1095/1
