I often need to quickly create a Python virtual environment to try out ideas. Normally, I create a temporary directory, create a virtual environment in it, then install the required packages. While these steps do not take long, it helps to streamline them.
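A minimal Python sketch of those three steps, using only the standard library's `tempfile`, `venv` and `subprocess` modules. The function name `make_scratch_venv` is hypothetical and this is not necessarily how the full post does it; installing packages assumes network access to PyPI.

```python
import os
import subprocess
import tempfile
import venv


def make_scratch_venv(packages=(), with_pip=True):
    """Create a throwaway virtual environment in a new temporary directory.

    Hypothetical helper: returns the path to the environment's Python.
    """
    # Step 1: a fresh temporary directory for the environment
    env_dir = tempfile.mkdtemp(prefix="scratch-venv-")
    # Step 2: build the environment; with_pip=True bootstraps pip via ensurepip
    venv.create(env_dir, symlinks=(os.name != "nt"), with_pip=with_pip)
    if os.name == "nt":
        python = os.path.join(env_dir, "Scripts", "python.exe")
    else:
        python = os.path.join(env_dir, "bin", "python")
    # Step 3: install the requested packages with the environment's own pip
    if packages:
        subprocess.check_call([python, "-m", "pip", "install", *packages])
    return python
```

For example, `make_scratch_venv(["requests"])` returns the interpreter path, which can be passed to `subprocess.call([...])` to start an interactive session inside the environment.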
In my daily work, I often need to know which operating system (OS) or Linux distribution (distro) I am on so I can customize my bash scripts. Here is how I detect the OS and distro.
I have been using Linux for a long time. Before that, I used Unix and BSD at school. In my heart BSD is something I want to explore and turn into my daily driver.
That is why, in 2011, I tried out FreeBSD, OpenBSD, NetBSD, and then PC-BSD. Sadly, I kept running into problems that prevented me from using BSD for my daily tasks. Did I mention that I am a software engineer? That means setting up and tweaking an operating system should not be too hard for me. With BSD, however, I gave up after a couple of days.
I often work in a Linux environment with multiple windows open in a tmux session. One pattern I see often is the need to navigate to the same directory in multiple windows. For example, in window 1, I navigate to a directory deep within my project:
$ cd ~/long/path/to/my/directory
Then, in window 2, I want to navigate to the same directory. This usually involves either copying the command from window 1 and pasting it into window 2, or retyping it. There must be a better way.
Sublime Text is my primary text editor, and I often need a build system that compiles the source and then runs it if the compilation succeeds. The build system that ships with Sublime Text only compiles.
I use the Jupyter QtConsole all the time to try out Python ideas. I noticed that on my Windows 10 system the font looks beautiful, whereas on macOS it looks quite ugly, so I set out to fix it.
About Jupyter QtConsole
Jupyter QtConsole comes with the Anaconda installation.
By default, the console uses the Monaco font, which I don’t like.
From the terminal, I issued the following command:
jupyter qtconsole --generate-config
After that, a configuration file was created in ~/.jupyter/jupyter_qtconsole_config.py. Next, I edited it in my favorite editor and made the following changes:
c.JupyterConsoleApp.confirm_exit = False
c.JupyterQtConsoleApp.display_banner = False
c.ConsoleWidget.console_height = 60
c.ConsoleWidget.console_width = 120
c.ConsoleWidget.font_family = 'Inconsolata'
The visual change comes from the last line, which switches the console font to Inconsolata.
Overall, I like the font Inconsolata much better than Monaco.
The csv module is excellent for parsing comma-separated values (CSV) files. Sometimes, however, we need a CSV file that contains a couple of comment lines, and the csv module does not handle those lines.
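One possible approach, not necessarily the one the full post uses: since `csv.reader` accepts any iterable of strings, a small generator can drop comment lines (assumed here to start with `#`) before the reader ever sees them. `skip_comments` and `read_csv_with_comments` are hypothetical helper names.

```python
import csv
import io


def skip_comments(lines, prefix="#"):
    """Yield only the lines that are not comments (hypothetical helper)."""
    for line in lines:
        # Leading whitespace is ignored, so indented comments are skipped too
        if not line.lstrip().startswith(prefix):
            yield line


def read_csv_with_comments(text):
    """Parse CSV text, ignoring lines that start with '#'."""
    return list(csv.reader(skip_comments(io.StringIO(text))))
```

One caveat of this design: a quoted field that spans several lines, with a continuation line starting with `#`, would be silently dropped, so it only suits files where comments cannot appear inside quoted fields.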
On my first trip to New Orleans, I spent most of my time in the hospital looking after my mother-in-law. Just hours before getting on the plane back to Seattle, I decided to try a po' boy, and this place was the nearest to the hospital, so why not?
In my previous post, I presented a simple directory walker that solved some of my annoyances. That directory walker is not perfect. Sometimes I want to filter the files:
for path_name in dirwalker('/path/to/dir'):
    if some_condition(path_name):
        pass  # Do something
The Use Cases
In this case, I want to process a file only if some condition is true. It would be nice if we could tell dirwalker to return only the files that match our condition:
import os

from dirwalker import dirwalker, include, exclude

# Only process *.xml files
for path_name in dirwalker('.', include('*.xml')):
    print(path_name)

# Process all but *.obj, *.bak
for path_name in dirwalker('.', exclude('*.obj', '*.bak')):
    print(path_name)

# Create my own predicate: process only empty files
def is_empty(path_name):
    stat = os.stat(path_name)
    return stat.st_size == 0

for path_name in dirwalker('.', is_empty):
    print(path_name)
The implementation of the new dirwalker and its predicate creators:
from fnmatch import fnmatch
import os


def exclude(*patterns):
    """A predicate which excludes any file that matches a pattern"""
    def predicate(path_name):
        # Match the patterns against the file name only, not the directories
        filename = os.path.basename(path_name)
        return not any(fnmatch(filename, pattern) for pattern in patterns)
    return predicate


def include(*patterns):
    """A predicate which includes only files that match a list of patterns"""
    def predicate(path_name):
        filename = os.path.basename(path_name)
        return any(fnmatch(filename, pattern) for pattern in patterns)
    return predicate


def dirwalker(root, predicate=None):
    """Recursively walk a directory and yield the path names"""
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            fullpath = os.path.join(dirpath, filename)
            # Pass the full path so predicates such as is_empty can stat the file
            if predicate is None or predicate(fullpath):
                yield fullpath
The new dirwalker takes an additional parameter: a predicate, which returns True for the files we want to process and False otherwise. To maintain backward compatibility, the predicate defaults to None, which means dirwalker yields every file it finds.
I also created two predicate creators, include and exclude, which build the appropriate predicates from a list of shell-style patterns. As the usage above shows, it is easy to write a custom predicate if the built-in ones do not fit your purpose. Here are a few ideas for predicates:
- Files that are read-only
- Files that are larger than a certain threshold
- Files that have been modified within a time frame
- Files that are symbolic links
- Black lists and white lists
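A minimal sketch of some of these ideas, written as hypothetical helpers (`is_read_only`, `larger_than`, `modified_within`, `is_symlink` and the `all_of` combinator are not part of dirwalker). It assumes each predicate receives a path it can stat, as the is_empty example does.

```python
import os
import stat
import time


# Hypothetical predicates: each takes a file path and returns True
# if the file should be processed.

def is_read_only(path_name):
    """True if the owner write bit is not set (ignores group/other bits and ACLs)."""
    return not (os.stat(path_name).st_mode & stat.S_IWUSR)


def larger_than(threshold):
    """Create a predicate matching files bigger than `threshold` bytes."""
    def predicate(path_name):
        return os.path.getsize(path_name) > threshold
    return predicate


def modified_within(seconds):
    """Create a predicate matching files modified in the last `seconds` seconds."""
    def predicate(path_name):
        return time.time() - os.path.getmtime(path_name) <= seconds
    return predicate


def is_symlink(path_name):
    """True if the path itself is a symbolic link."""
    return os.path.islink(path_name)


def all_of(*predicates):
    """Combine predicates: a file passes only if every predicate accepts it."""
    def predicate(path_name):
        return all(p(path_name) for p in predicates)
    return predicate
```

Combined with include, something like `dirwalker('.', all_of(include('*.log'), larger_than(1024 * 1024)))` would select only log files bigger than 1 MiB.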
dirwalker is now more powerful, thanks to the added functionality. At the same time, it is still simple to use.
In Python, I often need to traverse a directory recursively and act on the files in some way. The usual solution is os.walk, but it has three problems:
- It returns a tuple of three elements, and I often don’t remember the order, so I have to look it up
- It does not return the full path to the file. I always have to call os.path.join to construct the full path
- It returns a list of file names, which requires another loop. That means a nested loop
Here is an example:
import os

for dirpath, dirnames, filenames in os.walk(root):
    for filename in filenames:
        fullpath = os.path.join(dirpath, filename)
        # do something with fullpath
What I really want is a simple function that takes a directory and yields the path of every file under it:
for fullpath in dirwalker(root):
    pass  # do something with fullpath
Implementing the dirwalker function is not that hard:
import os

def dirwalker(root):
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            fullpath = os.path.join(dirpath, filename)
            yield fullpath
The dirwalker function is just a thin wrapper around os.walk, but it solves the three problems stated above. First, it yields one path name at a time instead of a tuple, so there is no element order to remember. Second, it yields the full path (the directory joined with the file name, starting from the root), which is what I usually need. Finally, it eliminates the nested loops, which simplifies the calling code and improves readability.
I made dirwalker a generator instead of a normal function for a few reasons. First, a generator is more responsive: it yields each path name as soon as it constructs one, so the caller does not have to wait for dirwalker to finish traversing all the subdirectories before receiving path names. Second, dirwalker does not need to store all the path names in a list before returning them to the caller, which saves memory. Finally, the caller sometimes wants to break out of the loop based on some condition. A normal function would have to traverse all of the directories anyway, even if the caller breaks out early. Since a generator only produces output on demand, it does not have this problem.
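To make the early-exit argument concrete, here is a small sketch that instruments the generator with a hypothetical `visited` list (not part of the original dirwalker) recording each directory os.walk has reached. `build_tree` is likewise a hypothetical helper that creates a test directory tree.

```python
import os
import tempfile


def dirwalker(root, visited=None):
    """Generator version of dirwalker.

    `visited` is a hypothetical hook added only for this demonstration:
    it records every directory os.walk has reached so far.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        if visited is not None:
            visited.append(dirpath)
        for filename in filenames:
            yield os.path.join(dirpath, filename)


def build_tree(root, count=10):
    """Create `count` subdirectories under root, each holding one file."""
    for i in range(count):
        sub = os.path.join(root, "sub%d" % i)
        os.makedirs(sub)
        with open(os.path.join(sub, "file.txt"), "w") as f:
            f.write("x")


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    build_tree(root)

    visited = []
    next(dirwalker(root, visited))  # take only the first path, then stop
    print(len(visited))  # 2: the root plus the first subdirectory

    visited = []
    list(dirwalker(root, visited))  # consume everything, like an eager function
    print(len(visited))  # 11: the root plus all 10 subdirectories
```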
A common pattern I encounter while gathering files is to include or exclude files that match a set of patterns. In the next post, I will introduce a new feature to dirwalker that handles this.
Gathering files using os.walk is not that hard, but it has its annoyances. That’s why I wrote dirwalker. I believe dirwalker can make your code simpler and more Pythonic. Give it a try.