Skip to content

binbash23/dupfinder

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

35 Commits
 
 
 
 
 
 

Repository files navigation


NAME
	dupfinder

DESCRIPTION

Use dupfinder to find duplicate files in big archives of files.
Dupfinder searches the given path for files, creates a hash database for
all files and searches for files which have the same hash. So if you
have a picture or video with 2 diffenrent names or in different folders,
you can identify them and create an archive script and a delete script.

Dupfinder can be used to search for duplicate pictures in huge picture
collections for example.

SYNOPSIS

	dupfinder OPTIONS
   
OPTIONS

	-a Search in path for duplicates (requires -r to set the root path). 
	   This option scans the root path for files, creates hashes for them 
	   and searches for duplicate files. The filehash database will also
     be cleaned from files that do not exist any more in the filesystem.
     You can use dupfinder -g 
	   later to generate scripts for moving the duplicates to an archive 
	   folder. So if you have a picture or video with 2 different names or 
	   in different folders, you can identify them and create an archive 
	   script and a delete script.
	   The delete script will only delete duplicates - one version of the
	   duplicate files will be kept in the root path.
	-c Calculate hashes for files in filelist file and add them to 
	   filehash database
	-C Create duplicate files file from hash database
	-D Show contents of duplicates file
	-d Delete not existing files from filehash database
	-f search root path for files and create plain files file (requires -r)
	-g Generate scripts to copy duplicates and to delete duplicates
	-h Show usage information
	-H Show contents of hash database
	-L Show contents of filelist database
	-n Calculate new hash for all files found, even if they exist in hash db
        -N Skip calculating sum of filesizes (size calculation may take some time 
           in big archives)
	-p Create filelist database for root path (requires -r)
	-r [PATH] set root path
	-R Delete filehash database files
	-s Show statistics of database
	-v Verbose mode
	-V Show program version

EXAMPLES

1) To search duplicate files in path /tmp/picture use:

	> ./dupfinder -avr /tmp/Pictures

   3 files will be created:

	> filelist.db - list of files in root path
	> filehash.db - list of all files with every hash value
	> duplicates.db - list of duplicate files in
			
2) To get help for all options use:

	> ./dupfinder -h
	
3) To generate 2 scripts which can be used to archive the duplicates to
   an archive folder and to delete the duplicates use:
   
	> ./dupfinder -g
	
   2 scripts will be generated:
   
	> copy_dups_from_root_path_to_archive.sh 
   
   This script which will copy all duplicates in root path to
   an archive folder in your current directory called "duplicates".
   
   	> delete_dups_from_root_path.sh
	
   This script can be used to delete all duplicate files from the
   root path. If you want to exclude several folders, you can modify
   the script with grep for example.
   The delete script contents only the duplicates. If there are 3
   similar files, the delete script will only have 2 of them deleted.
   
   Just have a look into the delete script:
   
   > tail delete_dups_from_root_path.sh 
     rm -v '/tmp/2015/921.JPG'
     rm -v '/tmp/2015/.thumbs/923.JPG'
     
   If you don't want to delete file from the thumbnail folders, create
   a new script:
   
   > grep -v ".thumbs" > delete_dups_from_root_path_custom.sh

4) To find out about the duplicates statistics use:

   > ./dupfinder -s

     Dupfinder Statistics

     Files in filehash database : 4774	(537 Kb)	modified: 2020-10-03 17:06:59.953627326 +0200
     Files in plain file list   : 4774	(378 Kb)	modified: 2020-10-03 17:06:08.454297780 +0200
     Files in duplicates file   : 1529	(130 Kb)	modified: 2020-10-03 17:07:04.149572704 +0200
     Size of all duplicates     : 1.489 GB     

REPORTING BUGS
	Report bugs to <binbash@gmx.net>

AUTHOR
	dupfinder by Jens Heine <binbash@gmx.net> 2020
	             and Dennis Brossat <dennis.brossat@email.de>
		     and Benjamin Heine <benjaminheine@gmx.net>