Previously I only looked at WinSxS / System32 hard-link problems.
I have now looked at duplicates in general.
When I deselect all drives other than C:\ the default result is
19 pairs of duplicates on C:\.
15 pairs are very important to the Comodo Internet Security suite, which hopefully protects its own files.
4 pairs are very important to actually booting up Windows 7 :-
1 pair is C:\Boot\...\Boot.sdi
3 pairs are C:\boot\...\fonts\???_boot.ttf
It appears to me that removing such duplicates could have significant consequences.
Unfortunately some users will blindly trust CCleaner's actions regardless,
and will Wipe Free space because they think it will give more free space.
Unchecking the System, Read Only and Hidden check-boxes adds another 9 pairs.
Many of these appear to be some sort of reparse point, e.g.
C:\ProgramData\Microsoft\Network\Downloader\qmgr1.dat
C:\Users\All Users\Microsoft\Network\Downloader\qmgr1.dat
Both instances are seen by Windows Explorer as
4.00 MB (4,194,304 bytes) Modified 27 March 2013, 11:51:11
I searched C:\ for qmgr1.dat using Defraggler version 2.13.
It found the real deal at
C:\ProgramData\Microsoft\Network\Downloader\qmgr1.dat
Quite obviously,
depending upon Access Control Lists, which Windows can fumble and users can meddle with,
deleting either the real file or its counterfeit shadow could destroy both reality and illusion.
If I additionally uncheck the "File Size Under" box it takes much longer to search and produces many results.
It is worth noting that partitions other than C:\ are commonly used for valuable libraries and archives,
and are not always protected as being hidden or system or Read Only,
so if reparse points provide alternative "reality" and "illusion" paths to the same file, removing either duplicate could destroy both.
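
The risk described above can be tested for before anything is offered for deletion. What follows is only a minimal sketch in Python, not a description of how CCleaner works: it assumes the standard-library call os.path.samefile is an acceptable way to tell whether two paths lead to the same file on disk (via a hard link, a symbolic link or a junction). The function names and the file names in the demo are hypothetical.

```python
import os
import shutil
import tempfile


def may_share_storage(path_a, path_b):
    """Return True if both paths appear to lead to the same file on disk."""
    try:
        return os.path.samefile(path_a, path_b)
    except OSError:
        # Missing or unreadable path: assume the worst so nothing is deleted.
        return True


def safe_duplicate_pairs(pairs):
    """Keep only the pairs that are independent copies of each other."""
    return [(a, b) for a, b in pairs if not may_share_storage(a, b)]


def demo():
    """Build one hard link and one real copy (hypothetical names), then filter."""
    with tempfile.TemporaryDirectory() as root:
        original = os.path.join(root, "qmgr1.dat")
        with open(original, "wb") as handle:
            handle.write(b"\0" * 4096)

        hard_link = os.path.join(root, "qmgr1_link.dat")
        os.link(original, hard_link)  # second name for the same data

        real_copy = os.path.join(root, "qmgr1_copy.dat")
        shutil.copyfile(original, real_copy)  # independent data

        kept = safe_duplicate_pairs(
            [(original, hard_link), (original, real_copy)]
        )
        return [(os.path.basename(a), os.path.basename(b)) for a, b in kept]


if __name__ == "__main__":
    print(demo())
```

In this sketch the hard-linked pair is dropped from the list and only the genuine copy remains as a candidate duplicate. Treating an unreadable path as "possibly the same file" is a deliberately cautious choice.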
MY CONCLUSIONS :-
It is essential that all reparse-points and hard-links be excluded from any list of "duplicates".
I agree with other commentators that a checksum must be available to determine if files with the same name are really the same.
I think SHA-512 is valuable to defeat skilled malware creators who can infect a download and manipulate its MD5 hash checksum,
but SHA-512 could be overkill for simply deciding whether the user has duplicated a download,
and perhaps a simple CRC would suffice.
Checksum validation should have a separate check box.
If the user can uncheck the "size" box to see if he has more than one quality of an MP4 download,
he would also need to avoid hash checksum validation.
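
To illustrate the conclusions about checksums, here is a rough sketch in Python of how separate "size" and "checksum" options might behave. It is an assumption for discussion only and not CCleaner's actual method: the function names, parameters and demo file names are hypothetical, and the choice between SHA-512 and a simple CRC-32 is left as a parameter.

```python
import hashlib
import os
import tempfile
import zlib
from collections import defaultdict


def file_digest(path, algorithm="sha512", chunk_size=1 << 20):
    """Checksum a file in chunks; "crc32" uses zlib, anything else hashlib."""
    with open(path, "rb") as handle:
        if algorithm == "crc32":
            crc = 0
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                crc = zlib.crc32(chunk, crc)
            return format(crc & 0xFFFFFFFF, "08x")
        digest = hashlib.new(algorithm)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
        return digest.hexdigest()


def group_duplicates(paths, match_size=True, match_checksum=True,
                     algorithm="sha512"):
    """Group files by name, plus size and checksum when those boxes are ticked."""
    groups = defaultdict(list)
    for path in paths:
        key = [os.path.basename(path).lower()]
        if match_size:
            key.append(os.path.getsize(path))
        if match_checksum:
            key.append(file_digest(path, algorithm))
        groups[tuple(key)].append(path)
    return [group for group in groups.values() if len(group) > 1]


def demo():
    """Three files named clip.mp4: two share a size, none share content."""
    with tempfile.TemporaryDirectory() as root:
        paths = []
        for folder, data in (("a", b"AAAA"), ("b", b"BBBB"), ("c", b"A" * 8)):
            os.mkdir(os.path.join(root, folder))
            path = os.path.join(root, folder, "clip.mp4")
            with open(path, "wb") as handle:
                handle.write(data)
            paths.append(path)

        strict = group_duplicates(paths)
        by_size = group_duplicates(paths, match_checksum=False)
        by_name = group_duplicates(paths, match_size=False,
                                   match_checksum=False)
        return tuple([len(g) for g in result]
                     for result in (strict, by_size, by_name))


if __name__ == "__main__":
    print(demo())
```

With both boxes ticked, none of the three files is reported as a duplicate. With only the checksum box cleared, the two same-sized files are paired. With both cleared, all three same-named files are listed together, which is the "more than one quality of an MP4" case. Leaving the checksum on while size is off would still split files with different content, which is why the two options need to be independent.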
Regards
Alan