ASPN ActiveState Programmer Network  
ActiveState, a division of Sophos

Title: Remove duplicate files
Submitter: Brian Davis
Last Updated: 2008/03/17
Version no: 1.0
Category: Files

Description:

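A small script that removes duplicate .txt files in the current directory. It runs md5sum, groups the file names by checksum in a dictionary, and deletes every file in a group except the first. There may be a shorter way to do it, but this was simple. It works only on Cygwin/Linux/Unix systems, where md5sum is available.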
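
The dictionary idea can be tried on in-memory data, with no files involved. The sketch below is an illustration only: it uses Python's hashlib in place of md5sum, and the helper name group_by_md5 and the sample names are hypothetical, not part of the recipe.

```python
import hashlib

def group_by_md5(items):
    """Map each MD5 hex digest to the names whose content produced it.

    `items` is an iterable of (name, content_bytes) pairs.
    """
    groups = {}
    for name, content in items:
        digest = hashlib.md5(content).hexdigest()
        groups.setdefault(digest, []).append(name)
    return groups

# Hypothetical sample data: a.txt and c.txt hold identical bytes.
samples = [("a.txt", b"hello\n"), ("b.txt", b"world\n"), ("c.txt", b"hello\n")]
groups = group_by_md5(samples)

# Everything after the first name in a group is a duplicate.
dupes = [name for names in groups.values() for name in names[1:]]
print(dupes)  # prints ['c.txt']
```

The first name seen for each checksum is the one that survives, which matches the recipe's choice of keeping the first file in each group.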

Source:

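import glob, os, subprocess

# Collect checksums; this may take a while on large files.
print("Collecting checksums...")
files = sorted(glob.glob("*.txt"))
# Skip md5sum when nothing matches, otherwise it would wait on stdin.
out = os.fsdecode(subprocess.check_output(["md5sum"] + files)) if files else ""

# Group file names by checksum.
print("Sorting files...")
ls = {}
for s in out.splitlines():
	# Each line is the 32-character digest, a space, a mode flag
	# (' ' or '*'), then the file name, which may contain spaces.
	md5, file = s[:32], s[34:]
	ls.setdefault(md5, []).append(file)

# Keep the first file of each group and delete the rest.
print("Deleting dupes...")
n = 0
for md5 in ls:
	for file in ls[md5][1:]:
		os.remove(file)
		n += 1
print("Operation complete. %d files removed." % n)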
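
A variant that avoids the md5sum dependency might look like the sketch below. It is not the submitter's method: it swaps in Python's hashlib for the checksums and adds a byte-for-byte comparison with filecmp before a file is reported, so that an MD5 collision cannot mark a distinct file as a duplicate. The function names md5_of_file and find_duplicates are hypothetical, and the sketch only lists duplicates rather than deleting them.

```python
import filecmp
import glob
import hashlib

def md5_of_file(path, chunk_size=65536):
    """Return the MD5 hex digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(paths):
    """Return (kept, duplicate) pairs; nothing is deleted here."""
    first_seen = {}
    pairs = []
    for path in paths:
        digest = md5_of_file(path)
        # The first path seen for a digest is the one that is kept.
        kept = first_seen.setdefault(digest, path)
        # Confirm byte-for-byte equality so a hash collision never
        # marks a distinct file as a duplicate.
        if kept != path and filecmp.cmp(kept, path, shallow=False):
            pairs.append((kept, path))
    return pairs

if __name__ == "__main__":
    for kept, dupe in find_duplicates(sorted(glob.glob("*.txt"))):
        print("%s duplicates %s" % (dupe, kept))
```

Printing the pairs first gives a dry run; a call to os.remove(dupe) can be added in the loop once the list looks right.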

© 2006 ActiveState Software Inc. All rights reserved.