A Simple Directory Walker with Filter

The Problem

In my previous post, I presented a simple directory walker which solved some of my annoyances. That directory walker is not perfect, however. There are times when I want to filter the files it returns:

for path_name in dirwalker('/path/to/dir'):
    if some_condition(path_name):
        pass  # Do something

The Use Cases

In this case, I want to process the files only if some condition is true. It would be nice if we could tell dirwalker to return only the files that match our condition:

from dirwalker import dirwalker, include, exclude

# Only process *.xml files
for path_name in dirwalker('.', include('*.xml')):
    print(path_name)

# Process all but *.obj, *.bak
for path_name in dirwalker('.', exclude('*.obj', '*.bak')):
    print(path_name)

# Create my own predicate: process only empty files
import os

def is_empty(path_name):
    stat = os.stat(path_name)
    return stat.st_size == 0

for path_name in dirwalker('.', is_empty):
    print(path_name)

The Solution

The implementation of the new dirwalker is:

from fnmatch import fnmatch
import os

def exclude(*patterns):
    """A predicate which excludes any file whose name matches a pattern"""
    def predicate(path_name):
        # Match on the base name, so the predicate works with full paths too
        filename = os.path.basename(path_name)
        return not any(fnmatch(filename, pattern) for pattern in patterns)
    return predicate

def include(*patterns):
    """A predicate which includes only files whose names match a pattern"""
    def predicate(path_name):
        # Match on the base name, so the predicate works with full paths too
        filename = os.path.basename(path_name)
        return any(fnmatch(filename, pattern) for pattern in patterns)
    return predicate

def dirwalker(root, predicate=None):
    """Recursively walk a directory and yield the path names"""
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            fullpath = os.path.join(dirpath, filename)
            # Pass the full path, so predicates such as is_empty can stat the file
            if predicate is None or predicate(fullpath):
                yield fullpath

Discussion

The new dirwalker takes an additional parameter: a predicate which returns True for the files we want to process and False otherwise. To maintain backward compatibility, the predicate defaults to None, which means dirwalker will yield every file it finds.

I also created two predicate creators, include and exclude, which build predicates from shell-style patterns. As you can see in the usage, it is easy to write a custom predicate if the built-in ones do not work for your purposes. Here are a few suggestions for predicates:

  • Files that are read-only
  • Files that are larger than a certain threshold
  • Files that have been modified within a time frame
  • Files that are symbolic links
  • Black lists and white lists
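
A few of these suggestions can be sketched as predicates in the same style as include and exclude. This is only a sketch, not part of the dirwalker module: the names larger_than, modified_within, is_symlink, is_read_only, and all_of are made up for illustration, and each predicate assumes it is handed a path that os.stat can resolve (a full path, or a name relative to the current directory).

```python
import os
import stat
import time

def larger_than(threshold):
    """Create a predicate: files larger than `threshold` bytes (hypothetical helper)"""
    def predicate(path_name):
        return os.stat(path_name).st_size > threshold
    return predicate

def modified_within(seconds):
    """Create a predicate: files modified within the last `seconds` seconds"""
    def predicate(path_name):
        age = time.time() - os.stat(path_name).st_mtime
        return age <= seconds
    return predicate

def is_symlink(path_name):
    """Predicate: the path is a symbolic link"""
    return os.path.islink(path_name)

def is_read_only(path_name):
    """Predicate: no write permission bit is set for user, group, or others"""
    mode = os.stat(path_name).st_mode
    return not mode & (stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH)

def all_of(*predicates):
    """Combine predicates: True only if every one of them accepts the path"""
    def predicate(path_name):
        return all(p(path_name) for p in predicates)
    return predicate
```

For example, all_of(larger_than(1024), modified_within(24 * 60 * 60)) could be passed as the second argument to dirwalker to pick files over 1 KB that changed in the last day. A combinator like all_of is also one way to build black lists and white lists out of include and exclude.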

Conclusion

The dirwalker is now more powerful, thanks to the optional predicate. At the same time, it is still simple to use.
