When generating multiple checksums, read files only once and compute all checksums simultaneously to speed up checksumming. #127

Open
barkoder opened this issue Oct 24, 2024 · 6 comments

@barkoder

Say I tell Clonezilla to generate md5, sha256 and b2sum checksums of individual files on a drive containing 2 partitions. As of clonezilla-live-3.1.1-27-amd64.iso, this is how generating multiple checksums in Clonezilla works.

  1. Clone Partition 1.
  2. Generate md5sum of files in partition 1 by reading the files.
  3. Generate sha256sum of files in partition 1 by reading the files AGAIN.
  4. Generate b2sum of files in partition 1 by reading the files AGAIN.
  5. Clone Partition 2.
  6. Generate md5sum of files in partition 2 by reading the files.
  7. Generate sha256sum of files in partition 2 by reading the files AGAIN.
  8. Generate b2sum of files in partition 2 by reading the files AGAIN.

Clonezilla reads the same files three times in steps 2, 3, 4 and again in steps 6, 7, 8.

This significantly increases wear on the disk.

The disk could fail at step 2, 3 or 4, in which case steps 5-8 never happen at all.

The way it should work is...

  1. Clone Partition 1.
  2. Clone Partition 2.
  3. Generate md5sum, sha256sum, b2sum of files in partition 1 simultaneously.
  4. Generate md5sum, sha256sum, b2sum of files in partition 2 simultaneously.

Why not just read the files once and generate multiple checksums simultaneously using cat and tee?

First generate a list of the files (with full paths) in a given partition and store it in /tmp/list_of_files_in_dev_sda1.txt.

I'm not a shell expert, but something like this?

# Read each listed file once and feed the same stream to all three tools.
while IFS= read -r i ; do
	cat "$i" | tee >(md5sum >> /tmp/md5sum_of_files_in_dev_sda1.txt) | tee >(sha256sum >> /tmp/sha256sum_of_files_in_dev_sda1.txt) | b2sum >> /tmp/b2sum_of_files_in_dev_sda1.txt ;
done < /tmp/list_of_files_in_dev_sda1.txt

The above command will not put the file names into the checksum lists (the tools read from stdin, so each line ends in "-"). But I'm sure there's a way in shell to get the file names in there as well.
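
A minimal sketch of one way to do that, assuming GNU coreutils (md5sum, sha256sum, b2sum) and mkfifo are available. The helper name checksum_once, the FIFO names and the output file names are hypothetical, chosen for illustration only; this is not Clonezilla's implementation. It uses named pipes and waits for the side consumers, so each file's hashes are written before the next file starts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: checksum_once FILE OUTDIR
# Reads FILE a single time and appends "<hash>  FILE" lines to
# OUTDIR/MD5SUMS, OUTDIR/SHA256SUMS and OUTDIR/B2SUMS (assumed names).
checksum_once() {
    local file="$1" outdir="$2" tmp hash pid_md5 pid_sha
    # Bail out early: with no writer, the FIFO readers below would block forever.
    [ -r "$file" ] || return 1
    tmp=$(mktemp -d) || return 1
    mkfifo "$tmp/md5.fifo" "$tmp/sha256.fifo" || return 1

    # Start the two side consumers first; each blocks until tee opens its FIFO.
    md5sum < "$tmp/md5.fifo" > "$tmp/md5.out" &
    pid_md5=$!
    sha256sum < "$tmp/sha256.fifo" > "$tmp/sha256.out" &
    pid_sha=$!

    # Single read of the file: tee copies the stream to both FIFOs and to b2sum.
    tee "$tmp/md5.fifo" "$tmp/sha256.fifo" < "$file" | b2sum > "$tmp/b2.out"
    wait "$pid_md5" "$pid_sha"

    # The tools print "<hash>  -" for stdin; write the real file name instead.
    read -r hash _ < "$tmp/md5.out"
    printf '%s  %s\n' "$hash" "$file" >> "$outdir/MD5SUMS"
    read -r hash _ < "$tmp/sha256.out"
    printf '%s  %s\n' "$hash" "$file" >> "$outdir/SHA256SUMS"
    read -r hash _ < "$tmp/b2.out"
    printf '%s  %s\n' "$hash" "$file" >> "$outdir/B2SUMS"
    rm -rf "$tmp"
}

# Possible usage with the file list described above:
#   while IFS= read -r f; do checksum_once "$f" /tmp; done < /tmp/list_of_files_in_dev_sda1.txt
```

File names containing newlines or backslashes would need the escaping that the checksum tools apply themselves when given a file argument; the sketch does not handle that.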

This would significantly speed up checksumming and the overall cloning process and most importantly minimize wear on the disk.

Also related #126

Thanks!

@stevenshiau
Owner

Thanks for this idea. However, I believe you really only need to choose one of the checksum methods; b2sum should be good enough.
Of course, this can be improved. We will try to do that in the future.

Steven

stevenshiau added a commit that referenced this issue Dec 11, 2024
Improve the checksums mechanism for Clonezilla image. Make it read once
and pass to multiple checksum programs.
Thanks to barkoder.
Ref: #127
@stevenshiau
Owner

Thanks for your suggestion. This feature has been implemented in Clonezilla live >= 3.2.0-27 or 20241213-*:
https://clonezilla.org/downloads.php
Let us know the results if you test that.
Thanks.

Steven

@barkoder
Author

Tested clonezilla-live-3.2.0-32-amd64.iso.

I selected md5sum and b2sum in expert mode.
After the cloning process completed successfully, it started catting files (including binaries!) into the terminal.

Please fix.
Thanks!

@stevenshiau
Owner

Could you please show the files list in your image dir by running:
ls -lh /home/image/IMAGE
(replace IMAGE with your image name).
Thanks.

Steven

@barkoder
Author

$ ls -lh

total 463G
-rwxrwxrwx+ 1 Administrators Administrators  979 Jan 12 14:08 B2SUMS
-rwxrwxrwx+ 1 Administrators Administrators 1.3K Jan 12 14:08 blkdev.list
-rwxrwxrwx+ 1 Administrators Administrators  943 Jan 12 14:08 blkid.list
-rwxrwxrwx+ 1 Administrators Administrators  222 Jan 12 12:03 dev-fs.list
-rwxrwxrwx+ 1 Administrators Administrators    4 Jan 12 14:08 disk
-rwxrwxrwx+ 1 Administrators Administrators   13 Jan 12 14:08 dmraid.table
-rwxrwxrwx+ 1 Administrators Administrators  307 Jan 12 14:08 MD5SUMS
-rwxrwxrwx+ 1 Administrators Administrators   20 Jan 12 14:08 parts
-rwxrwxrwx+ 1 Administrators Administrators   33 Jan 12 09:39 sda1.info
-rwxrwxrwx+ 1 Administrators Administrators  26M Jan 12 09:39 sda1.ntfs-ptcl-img.uncomp
-rwxrwxrwx+ 1 Administrators Administrators  48G Jan 12 10:06 sda2.ntfs-ptcl-img.uncomp
-rwxrwxrwx+ 1 Administrators Administrators 208G Jan 12 12:03 sda3.ntfs-ptcl-img.uncomp
-rwxrwxrwx+ 1 Administrators Administrators  512 Jan 12 14:08 sda4-ebr
-rwxrwxrwx+ 1 Administrators Administrators 208G Jan 12 14:08 sda5.ntfs-ptcl-img.uncomp
-rwxrwxrwx+ 1 Administrators Administrators   37 Jan 12 14:08 sda-chs.sf
-rwxrwxrwx+ 1 Administrators Administrators 1.0M Jan 12 14:08 sda-hidden-data-after-mbr
-rwxrwxrwx+ 1 Administrators Administrators  512 Jan 12 14:08 sda-mbr
-rwxrwxrwx+ 1 Administrators Administrators  535 Jan 12 14:08 sda-pt.parted
-rwxrwxrwx+ 1 Administrators Administrators  458 Jan 12 14:08 sda-pt.parted.compact
-rwxrwxrwx+ 1 Administrators Administrators  381 Jan 12 14:08 sda-pt.sf

Also

$ cat B2SUMS

bXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX2  blkdev.list
2XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX67  blkid.list
4XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX21  dev-fs.list
7XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXf3  disk
9XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX9  dmraid.table
8XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX8f  parts
6XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXff  sda1.info

$ cat MD5SUMS

8XXXXXXXXXXXXXXXXXXXXXXXXX6  blkdev.list
aXXXXXXXXXXXXXXXXXXXXXXXXX1  blkid.list
3XXXXXXXXXXXXXXXXXXXXXXXXX5  dev-fs.list
8XXXXXXXXXXXXXXXXXXXXXXXXXb  disk
fXXXXXXXXXXXXXXXXXXXXXXXXX6  dmraid.table
kXXXXXXXXXXXXXXXXXXXXXXXXX5  parts
1XXXXXXXXXXXXXXXXXXXXXXXXX7a  sda1.info
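
For reference, lists in this "<hash>  <name>" format can be checked with the tools' own -c mode. A small sketch, assuming GNU coreutils; the helper name verify_image_sums is hypothetical, and only the two list names shown above are handled:

```shell
# Hypothetical helper: verify_image_sums DIR
# Runs the matching tool in check mode for each checksum list present in DIR.
verify_image_sums() {
    (
        cd "$1" || exit 1
        status=0
        # -c re-hashes every file named in the list and compares the result.
        if [ -f MD5SUMS ]; then md5sum -c MD5SUMS || status=1; fi
        if [ -f B2SUMS ]; then b2sum -c B2SUMS || status=1; fi
        exit "$status"
    )
}

# Possible usage (replace IMAGE with the image name):
#   verify_image_sums /home/image/IMAGE
```

Check mode only covers the files named in each list, so a list that stops early says nothing about the files missing from it.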

The binary catting appears to have happened while reading the *uncomp files: both lists stop at sda1.info, and that's when I panicked and powered off the computer.
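
One way file contents can end up on the terminal in a read-once pipeline (an assumption for illustration, not a confirmed diagnosis of Clonezilla's code): tee always copies its input to stdout, so if nothing consumes that stdout, the data itself is printed. A minimal sketch with a throwaway file:

```shell
# Throwaway input file for the demonstration.
demo=$(mktemp)
printf 'payload\n' > "$demo"

# No consumer after tee: its stdout carries the file contents themselves.
leaked=$(tee /dev/null < "$demo")

# With a final consumer on tee's stdout, only that tool's hash line is printed.
clean=$(tee /dev/null < "$demo" | md5sum)

printf 'without consumer: %s\n' "$leaked"
printf 'with consumer:    %s\n' "$clean"
rm -f "$demo"
```

So in a pipeline of this shape, the last checksum tool (or a redirect to /dev/null) has to sit on tee's stdout, however many tools are selected.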

stevenshiau added a commit that referenced this issue Jan 14, 2025
Bug fixed: wrong total number for chosen checksum method.
Thanks to barkoder.
Ref: #127 (comment)
@stevenshiau
Owner

Please try the testing release of Clonezilla live, >= 3.2.0-33 or 20250114-*:
https://clonezilla.org/downloads.php
This issue should have been fixed.
If you test, please let us know the results.
Thanks.

Steven
