Description:
This recipe uses the win32file.FindFilesW() function to efficiently calculate the total size of a folder or volume. It also supports an optional cutoff size, and it records errors encountered along the way instead of aborting the search.
Source:
import win32file as _win32file


class FolderSize:
    """
    This class implements an efficient technique for
    retrieving the size of a given folder or volume in
    cases where some action depends on a given size.

    Besides the total size, the implementation can watch
    for a specific stop size and end the search as soon
    as that size is exceeded. This dramatically improves
    performance when only a few bytes are enough to
    settle the question, instead of waiting for the
    entire size to be calculated.

    The design also handles problems encountered during
    the search, such as permission errors. Each error is
    recorded, keyed by folder path, so that the user can
    investigate what went wrong and why. Errors do not
    stop the search; the total size is still returned,
    minus the size of any folders that could not be read.

    Starting a new search resets the errors and total
    size from the previous search; the stop size,
    however, persists until it is changed.
    """

    def __init__(self):
        self.totalSize = 0
        self.errors = {}
        self._stopSize = -1  # -1 means no stop size is set
        self.verbose = 0

    def enableStopSize(self, size=0):
        """
        This public method enables the stop size
        criterion. If the number of bytes calculated
        so far exceeds this size, the search stops.
        The default of zero bytes means that any data
        at all ends the search.
        """
        if not isinstance(size, int):
            raise TypeError("size must be an integer")
        self._stopSize = size

    def disableStopSize(self):
        """
        This public method disables the stop size
        criterion. When disabled, the total size of
        a folder is retrieved.
        """
        self._stopSize = -1

    def showStopSize(self):
        """
        This public method displays the current
        stop size in bytes (-1 when disabled).
        """
        print(self._stopSize)

    def searchPath(self, path):
        """
        This public method initiates the process
        of retrieving size data. It accepts either
        a UNC or a local drive path.
        """
        self.totalSize = 0
        self.errors = {}
        self._getSize(path)
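
    def _getSize(self, path):
        """
        This private method recursively calculates the
        total size of a folder or volume, and accepts a
        UNC or local path.
        """
        if self.verbose:
            print(path)
        try:
            items = _win32file.FindFilesW(path + "\\*")
        except _win32file.error as details:
            # Record the Windows error message (e.g. access
            # denied) for this folder and skip it.
            self.errors[path] = str(details.args[-1])
            return
        for item in items:
            # Each item is a WIN32_FIND_DATA tuple.
            attr = item[0]
            name = item[-2]  # cFileName
            # Combine nFileSizeHigh and nFileSizeLow so that
            # files larger than 4 GB are counted correctly.
            size = (item[4] << 32) + item[5]
            if attr & 16:  # FILE_ATTRIBUTE_DIRECTORY
                if name != "." and name != "..":
                    self._getSize("%s\\%s" % (path, name))
            self.totalSize += size
            # Checked after every entry, including each subfolder,
            # so an early stop unwinds the whole recursion.
            if self._stopSize > -1 and self.totalSize > self._stopSize:
                return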


if __name__ == "__main__":
    sizer = FolderSize()
    # Example path; substitute a local or UNC folder of your own.
    sizer.searchPath(r"d:\users1\jsmith")
    print(sizer.totalSize)
    # Stop as soon as more than 1024 bytes have been counted.
    sizer.enableStopSize(1024)
    sizer.searchPath(r"d:\users1\jsmith")
    if sizer.totalSize > 1024:
        print("The folder meets the criteria.")
    elif sizer.totalSize == 0:
        print("The folder is empty.")
    else:
        print("The folder has some data but can be skipped.")
    # A zero total may only mean that nothing could be read.
    if sizer.totalSize == 0 and sizer.errors:
        print(sizer.errors)
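
Because `return` in _getSize() only leaves the current call, it is fair to ask whether the stop size ends the whole search or only the current subfolder. The following is a platform-independent sketch that mirrors the loop in _getSize() over a hypothetical in-memory tree: nested dicts stand in for folders and integers for file sizes. The function name tree_size and the state dict are illustrative only.

```python
def tree_size(node, state, stop_size=-1):
    """Mirror of the recipe's _getSize() loop over a nested dict.

    Hypothetical model: a dict is a folder, an int is a file size.
    state holds the running total and a log of visited entry names.
    """
    for name, child in node.items():
        state["visited"].append(name)
        if isinstance(child, dict):
            tree_size(child, state, stop_size)  # recurse into the subfolder
        else:
            state["total"] += child
        # Like the recipe, check after every entry, including right after
        # a recursive call returns, so an early stop unwinds every level.
        if stop_size > -1 and state["total"] > stop_size:
            return


tree = {"docs": {"a.txt": 0, "b.txt": 5, "c.txt": 7}, "d.txt": 9}
state = {"total": 0, "visited": []}
tree_size(tree, state, stop_size=0)
print(state)  # {'total': 5, 'visited': ['docs', 'a.txt', 'b.txt']}
```

Neither c.txt nor the top-level d.txt is visited: once b.txt pushes the total past zero, the inner call returns and the same check in the outer loop returns immediately as well.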
Discussion:
At my job, I needed to determine whether a given folder had any data in it. The criterion, therefore, was that any folder containing more than zero bytes would be flagged.
Without getting into details, neither os.listdir() nor os.walk() was fast enough for gigabytes of data. I then switched to the Size property of the COM FileSystemObject, FileSystemObject().GetFolder().Size, which was fast but did not work when errors occurred. The component was also less flexible: I had to wait for the total size before moving on to the next root path.
After sniffing the traffic to see how the FSO worked, I noticed that it used the Win32 API's FindFirstFile* functions. These are much more efficient for calculating size, because each directory entry they return already includes the file's size, and they are already wrapped by win32file.FindFilesW(), courtesy of Mark Hammond and friends. (I hadn't realized this before, or had forgotten it.)
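
Each entry that FindFilesW() returns is a WIN32_FIND_DATA tuple in which the file size is split across two 32-bit fields: nFileSizeHigh (index 4) and nFileSizeLow (index 5). Here is a minimal sketch of recombining them; the helper name find_data_size and the sample tuple are hypothetical, for illustration only.

```python
def find_data_size(find_data):
    """Return the full 64-bit size from a WIN32_FIND_DATA-style tuple.

    Assumes the pywin32 layout: index 4 is nFileSizeHigh and
    index 5 is nFileSizeLow (hypothetical helper, for illustration).
    """
    high, low = find_data[4], find_data[5]
    return (high << 32) + low


# A hypothetical 5 GiB file: 5 * 2**30 == 2**32 + 2**30,
# so the high half is 1 and the low half is 0x40000000.
sample = (32, None, None, None, 1, 0x40000000, 0, 0, "big.iso", "BIG~1.ISO")
print(find_data_size(sample))  # 5368709120
```

Reading only nFileSizeLow would report this file as 1073741824 bytes (1 GiB), silently undercounting any file of 4 GiB or more.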
I wrapped a class around the win32file function and added features such as error handling, so that the search can continue past folders it cannot read, and a cutoff size, so that the search can stop early instead of computing the full total when the criterion is very small (such as zero bytes).
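
On Python 3.6 or later, the same two features can be sketched portably with os.scandir(). On Windows it is built on FindFirstFileW/FindNextFileW, and DirEntry.stat(follow_symlinks=False) reuses the size from the directory listing without an extra system call. This is a swapped-in technique, not the recipe's win32file code; the function name folder_size and its parameters are hypothetical.

```python
import os


def folder_size(path, stop_size=-1, errors=None):
    """Sum file sizes under path, stopping once stop_size is exceeded.

    A portable sketch using os.scandir(), not the recipe's win32file code.
    Folders that raise OSError are recorded in errors and skipped.
    Returns the running total, which may stop short of the full size.
    """
    if errors is None:
        errors = {}
    total = 0
    stack = [path]  # explicit stack instead of recursion
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        total += entry.stat(follow_symlinks=False).st_size
                        if stop_size > -1 and total > stop_size:
                            return total
        except OSError as exc:
            # e.g. permission denied: note it and keep going
            errors[current] = exc.strerror
    return total
```

With stop_size=0, any nonzero result is enough to flag a folder as containing data, which matches the zero-byte criterion described above. The explicit stack avoids Python's recursion limit on deeply nested trees, and symbolic links are not followed, so a link loop cannot make the walk run forever.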