Duplicate File Hard Linker (DFHL)

Small command-line tool to reduce the space consumed by duplicate files on one partition. Unlike other known tools, this implementation does not list, archive, or remove the duplicate files; instead, they are hard linked using NTFS hard links. The tool runs on Windows NT 4.0 / 2000 / XP / 2003 Server and requires an NTFS file system.

Features

  • Command-line tool for Windows NT 4.0/2000/XP/2003
  • Links duplicate files in the file system using the NTFS "hard link" facility. For further information, please refer to the Microsoft website.
  • Runs recursively through a tree of folders on one partition
  • Compares all files byte by byte to ensure that only truly identical files are hard linked
  • Check mode (the default) lists all optimization possibilities; in this mode, no changes are made
  • Additional options control which file properties must match before linking, such as attributes and time stamps
  • All read/compare operations are optimized for maximum performance and reach almost the full throughput of the physical disk; the system is loaded only in the background
  • Note: file system short names (8.3 names) can be lost during hard linking, which can cause problems for programs that require 8.3 names
  • Further optimization through file hashes: a hash is now compared before running the byte-by-byte compare (see the sketch after this list)
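To picture the hash shortcut mentioned above, here is a minimal sketch. It is not DFHL's actual code: the names hashFile, compareBytes, and isDuplicate are hypothetical, and the FNV-1a hash is just an illustrative stand-in for whatever hash function the tool really uses.

    // Sketch of the hash-before-byte-compare idea; NOT DFHL's actual code.
    #include <algorithm>
    #include <cstdint>
    #include <fstream>
    #include <string>
    #include <vector>

    // Cheap content hash (FNV-1a). Hashes can collide, which is exactly why
    // a byte-by-byte compare must still follow whenever two hashes match.
    static uint64_t hashFile(const std::string& path) {
        std::ifstream in(path, std::ios::binary);
        uint64_t h = 1469598103934665603ULL;            // FNV-1a offset basis
        char buf[64 * 1024];
        while (in.read(buf, sizeof(buf)) || in.gcount() > 0) {
            for (std::streamsize i = 0; i < in.gcount(); ++i) {
                h ^= static_cast<unsigned char>(buf[i]);
                h *= 1099511628211ULL;                  // FNV-1a prime
            }
        }
        return h;
    }

    // Full byte-by-byte compare: the final authority on equality.
    static bool compareBytes(const std::string& a, const std::string& b) {
        std::ifstream fa(a, std::ios::binary), fb(b, std::ios::binary);
        std::vector<char> ba(64 * 1024), bb(64 * 1024);
        for (;;) {
            fa.read(ba.data(), static_cast<std::streamsize>(ba.size()));
            fb.read(bb.data(), static_cast<std::streamsize>(bb.size()));
            if (fa.gcount() != fb.gcount()) return false;   // length mismatch
            if (fa.gcount() == 0) return true;              // both exhausted
            if (!std::equal(ba.begin(), ba.begin() + fa.gcount(), bb.begin()))
                return false;
        }
    }

    bool isDuplicate(const std::string& a, const std::string& b) {
        if (hashFile(a) != hashFile(b)) return false;   // different hash: done
        return compareBytes(a, b);                      // equal hash: verify bytes
    }

The point of the two-stage design is that the cheap hash check rejects most non-duplicates early, while the byte-by-byte pass guarantees that a hash collision can never cause two different files to be linked.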

Limitations

  • Hard links can only be created on the same partition; creating links across partitions is not possible, regardless of permissions, because of how the file system works (see the sketch after this list)
  • Only NTFS is supported as the file system for hard links.
  • All hard-linked files share the same set of file attributes, security descriptors, and time stamps. A change to the meta attributes of one file is immediately reflected in the other linked files; this behavior comes from the underlying file system.
  • When removing hard-linked files, the disk space is only freed once the last reference to the file is removed.
  • Files that are already linked cannot be linked again to gain further space, of course.
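For illustration, hard links of this kind can be created with the Win32 CreateHardLinkW call. The sketch below is not DFHL's own code, and the file names are made-up examples:

    // Sketch: creating an NTFS hard link via the Win32 API (not DFHL's code).
    #include <windows.h>
    #include <stdio.h>

    int wmain() {
        // Both paths must be on the same NTFS volume; across partitions the
        // call fails (typically with ERROR_NOT_SAME_DEVICE).
        if (!CreateHardLinkW(L"D:\\archive\\copy.dat",      // new link name
                             L"D:\\archive\\original.dat",  // existing file
                             NULL)) {
            wprintf(L"CreateHardLinkW failed, error %lu\n", GetLastError());
            return 1;
        }

        // All links share one data stream and one set of meta data; the
        // current link count can be read back through a file handle:
        HANDLE h = CreateFileW(L"D:\\archive\\original.dat", 0,
                               FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                               OPEN_EXISTING, 0, NULL);
        if (h != INVALID_HANDLE_VALUE) {
            BY_HANDLE_FILE_INFORMATION info;
            if (GetFileInformationByHandle(h, &info))
                wprintf(L"Number of links: %lu\n", info.nNumberOfLinks);
            CloseHandle(h);
        }
        return 0;
    }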

Command Line Parameters

Usage:

dfhl [options] [path] [...]

/? Displays the command line help.
/a File attributes must match for linking. Files with different file attributes will be ignored.
/d Debug Mode, extended console output.
/h Process hidden files. Hidden files are skipped by default.
/j Also follow junctions (reparse points) in the file system. By default, junctions are skipped and not followed.
/l Create hard links for duplicate files. If not specified, the tool will only read (test) for duplicates.
/m Also process small files (< 1024 bytes); they are skipped by default.
/o List duplicate file results to stdout.
/q Silent Mode.
/r Runs recursive through the given folder list.
/s Process system files. System files are skipped by default if this option is omitted.
/t Time and date of files must match before files are compared.
/v Verbose Mode.
/w Print further statistics after processing.
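For example, a typical session might first list duplicates in check mode and then link them; D:\Archive is only a placeholder path:

    dfhl /r /o D:\Archive      (check mode: recursively list duplicate files, changing nothing)
    dfhl /r /l D:\Archive      (linking mode: actually replace the duplicates with hard links)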

Use Cases

  • You have a collection of duplicate files in a disk archive
  • You have an archive with many similar files that consume a large amount of space

System Requirements

  • Windows NT 4.0, 2000, XP, 2003 Server
  • Full control (as file permission) over the analyzed files
  • NTFS-File System

Change Log

  • Changes from Version 1.2 to Version 2.0
    • Changed to long path support: paths can now contain up to 32,000 characters (see the sketch after this change log).
    • Further optimizations: file hashes are now used in addition to the byte-by-byte compare to gain speed. Nevertheless, a byte-by-byte compare is always performed to ensure that file contents match.
    • Refactored a lot of code toward a more object-oriented design.
    • Added further statistics for insight into processing.
  • Changes from Version 1.1 to Version 1.2
    • Bug fix for supplied paths containing a trailing backslash.
    • The program continues to run even if a folder is not accessible.
    • Set the GPL as the software license for the program.
  • Changes from Version 1.0 to Version 1.1
    • Added support for Windows NT 4.0; the missing hard link API was implemented.
    • Fixed typos.
    • A system check prior to execution verifies the system environment.
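The long-path support added in version 2.0 is presumably based on the Win32 "\\?\" path prefix, which raises the usual MAX_PATH limit of 260 characters to roughly 32,767. A minimal sketch, assuming an absolute input path; the helper name and path are hypothetical:

    // Sketch only: the "\\?\" prefix tells Unicode Win32 file APIs to accept
    // paths up to about 32,767 characters instead of MAX_PATH (260). This is
    // illustrative, not DFHL's actual implementation.
    #include <windows.h>
    #include <string>

    HANDLE openLongPath(const std::wstring& absolutePath) {
        // Works for absolute paths such as L"D:\\very\\deep\\tree\\file.dat".
        std::wstring extended = L"\\\\?\\" + absolutePath;
        return CreateFileW(extended.c_str(), GENERIC_READ, FILE_SHARE_READ,
                           NULL, OPEN_EXISTING, 0, NULL);
    }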

Installation

No installation is required. The program consists of a single .EXE file that can be run directly. Simply copy it to a folder on your system PATH.

Authors

Jens Scheffler, Oliver Schneider

Attachments

DFHL 2.0.zip (45.27 KB)
DFHL-Source 2.0.zip (50.14 KB)

Can this handle many files?

Can this handle many files? I tested it on my partition D with dfhl /o D:\

and nothing was printed, as if it were waiting for my input or something (even the version 2.0 header was not printed).

And the disk usage went to 100%, so it seems to be reading the disk.

My partition D contains over 400,000 files, 900 GB in total.

After glancing at your code, I found that the function

void addPath(LPWSTR path)

might be the problem, since it first collects all the file paths, which in my case means over 400,000 invocations of this function before anything is printed.

RE: Can this handle many files?

I think it should be able to handle as many files as memory allows. At least the header should print immediately...

Crash on Windows 7

Running Windows 7 Pro: if I set the program to create hard links, Windows pops up a dialog at the time of hard link creation saying the program has stopped working.

If I omit the /l switch, the program stops at the "Skipping real linking" message and hangs, using 100% of a CPU core (Ctrl+C does quit it, however).

 

Is there any chance that this useful utility could be updated for Windows 7?  :)

This happens when output is redirected

I also ran into this problem, but for me it happens when I either run dfhl.exe in a console window or redirect its output to a file.

If I start it via "Start → Run", Total Commander's command prompt, or the command-line START command, it does not crash and works well, except that none of its final output can be seen, because the window closes immediately.

Very useful program

Very useful program. It is also useful for making full backups (with FastCopy with verify) and then reducing the needed space by hard linking the new full backup against the last full backup with DFHL (with the new options /i and /n). Not very fast, but it supports very long paths and files with different content but the same size and time. Currently there are no updates by Jens Scheffler or Oliver Schneider, so I did some extensions and some bug fixing; see http://hanss.bplaced.net/ (German).

Really nice program, but a little slow

Really nice program, but a little slow on large volumes with large numbers of files (like my terabyte hard disk with 75,000 files). You are using a linked list of size-based groups, while what you really need is a red-black tree (whose elements should themselves be red-black trees keyed by hash). Not counting the file comparison part, the current complexity of your algorithm is O(n^2), while with red-black trees it could be O(n log n).
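For illustration, the commenter's proposal maps naturally onto std::map, which mainstream C++ standard libraries typically implement as a red-black tree. This is only a sketch of the suggestion, not DFHL's code; DuplicateIndex and addFile are hypothetical names, and the content hash is assumed to come from whatever hash the tool computes:

    // Sketch: an ordered map keyed by file size whose values are ordered
    // maps keyed by content hash. std::map is typically a red-black tree,
    // so insertion and lookup cost O(log n) instead of a linear scan
    // through a list of size groups.
    #include <cstdint>
    #include <map>
    #include <string>
    #include <vector>

    // size -> (content hash -> paths of equally sized, equally hashed files)
    using DuplicateIndex =
        std::map<uint64_t, std::map<uint64_t, std::vector<std::wstring>>>;

    void addFile(DuplicateIndex& index, const std::wstring& path,
                 uint64_t size, uint64_t contentHash) {
        // Two O(log n) tree descents replace the linear search; any vector
        // that ends up with more than one entry holds candidates for the
        // final byte-by-byte compare.
        index[size][contentHash].push_back(path);
    }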

Thank you very much for

Thank you very much for creating this useful tool. I expect it to save me 1 hour per week and 15 GB of disk space, as my hard disk space is tight.

What algorithm

What algorithm do you use to recognize that two files are identical?

The name, a checksum, the content, the creation date/time, the modification date/time?