The Problem
Sublime Text is my primary text editor, and I often need a build system which lets me compile the source and then run it if the compilation succeeds. The build system that ships with Sublime Text only compiles.
I use the Jupyter QtConsole all the time to try out Python ideas. One thing I noticed is that on my Windows 10 system the font looks beautiful, whereas on macOS it looks quite ugly, so I set out to fix it.
Jupyter QtConsole comes with the Anaconda installation.
By default, the console uses the Monaco font, which I don’t like:
From the terminal, I issued the following command:
jupyter qtconsole --generate-config
After that, a configuration file was created at ~/.jupyter/jupyter_qtconsole_config.py. Next, I edited it using my favorite editor and made the following changes:
c.JupyterConsoleApp.confirm_exit = False
c.JupyterQtConsoleApp.display_banner = False
c.ConsoleWidget.console_height = 60
c.ConsoleWidget.console_width = 120
c.ConsoleWidget.font_family = 'Inconsolata'
The visual change comes from the last line. At this point, my console looks like this:
Overall, I like the font Inconsolata much better than Monaco.
The Python csv module is excellent for parsing comma-separated values (CSV) files. There are times when we need a CSV file in which we can add a couple of comment lines, but the csv module does not handle those lines.
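One possible way around this, as a minimal sketch rather than the approach the full post takes: filter the comment lines out before the csv module sees them. The helper name read_csv_skipping_comments and the '#' prefix are assumptions made for illustration.

```python
import csv
import io


def read_csv_skipping_comments(lines, comment_prefix='#'):
    """Yield parsed rows, ignoring blank lines and comment lines."""
    # csv.reader accepts any iterable of strings, so a generator that
    # drops the unwanted lines can sit in front of it.
    data_lines = (
        line for line in lines
        if line.strip() and not line.lstrip().startswith(comment_prefix)
    )
    for row in csv.reader(data_lines):
        yield row


# Hypothetical sample data standing in for a real file object
sample = io.StringIO(
    u"# exported on Monday\n"
    u"name,qty\n"
    u"# fruit section\n"
    u"apple,3\n"
    u"pear,5\n"
)
rows = list(read_csv_skipping_comments(sample))
```

The filter works line by line, so a quoted field that spans several lines and happens to contain a line starting with '#' would lose that line. For simple files that is usually acceptable.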
On my first trip to New Orleans, I spent most of my time in the hospital looking after my mother-in-law. Just hours before getting on the plane back to Seattle, I decided to try a po'boy, and this place was the nearest to the hospital, so why not?
In my previous post, I presented a simple directory walker which solved some of my annoyances. That directory walker is not perfect. There are times when I want to filter out the files:
for path_name in dirwalker('/path/to/dir'):
    if some_condition(path_name):
        pass # Do something
In this case, I want to process the files only if some condition is true. It would be nice if we could tell dirwalker to return only the files that match our condition:
from dirwalker import dirwalker, include, exclude

# Only process *.xml files
for path_name in dirwalker('.', include('*.xml')):
    print path_name

# Process all but *.obj, *.bak
for path_name in dirwalker('.', exclude('*.obj', '*.bak')):
    print path_name

# Create my own predicate: process only empty files
import os

def is_empty(path_name):
    stat = os.stat(path_name)
    return stat.st_size == 0

for path_name in dirwalker('.', is_empty):
    print path_name
The implementation of the new dirwalker is:
from fnmatch import fnmatch
import os

def exclude(*patterns):
    """ A predicate which excludes any file that matches a pattern """
    def predicate(path_name):
        return not any(fnmatch(path_name, pattern) for pattern in patterns)
    return predicate

def include(*patterns):
    """ A predicate which includes only files that match a list of patterns """
    def predicate(path_name):
        return any(fnmatch(path_name, pattern) for pattern in patterns)
    return predicate

def dirwalker(root, predicate=None):
    """ Recursively walk a directory and yield the path names """
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            fullpath = os.path.join(dirpath, filename)
            # Pass the full path so that predicates such as is_empty
            # can stat files that live in sub-directories
            if predicate is None or predicate(fullpath):
                yield fullpath
The new dirwalker takes an additional parameter: a predicate which returns True for those files we want to process and False otherwise. To maintain backward compatibility, the predicate defaults to None, which means dirwalker will yield every file it finds.
I also created two predicate creators, include and exclude, which create appropriate predicates. As you can see in the usage, it is easy to create a custom predicate if the built-in ones do not work for your purposes. Here are a few suggestions for predicates:
The dirwalker is now more powerful, thanks to the added functionality. At the same time, it is still simple to use.
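To illustrate how custom predicates might be built and combined, here is a small sketch. The names larger_than and all_of are hypothetical helpers, not part of the dirwalker module, and the compact dirwalker below is a stand-in that assumes the predicate receives the full path of each file.

```python
import os


def dirwalker(root, predicate=None):
    """Compact stand-in for dirwalker; hands the full path to the predicate."""
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            fullpath = os.path.join(dirpath, filename)
            if predicate is None or predicate(fullpath):
                yield fullpath


def larger_than(size_in_bytes):
    """Hypothetical predicate creator: keep files bigger than size_in_bytes."""
    def predicate(path_name):
        return os.path.getsize(path_name) > size_in_bytes
    return predicate


def all_of(*predicates):
    """Hypothetical combinator: keep a file only if every predicate accepts it."""
    def predicate(path_name):
        return all(p(path_name) for p in predicates)
    return predicate


def is_log(path_name):
    """Hypothetical predicate: keep files whose name ends in .log."""
    return path_name.endswith('.log')


# Usage: walk the current directory for log files larger than 1 KiB
big_logs = dirwalker('.', all_of(is_log, larger_than(1024)))
```

Because a predicate is just a function of one path name, combinators like all_of need no support from dirwalker itself.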
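In Python, I often need to traverse a directory recursively and act on the files in some way. The solution is to use os.walk, but this method has three problems: it yields a three-item tuple whose layout is hard to remember, it requires calling os.path.join to construct the full path, and it needs nested loops to reach each file.
Here is an example: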
for dirpath, dirnames, filenames in os.walk(root):
    for filename in filenames:
        fullpath = os.path.join(dirpath, filename)
        # do something with fullpath
What I really want is a simple function which takes a directory and returns the path names of the files under that directory, one at a time:
for fullpath in dirwalker(root):
    # do something with fullpath
Implementing the dirwalker function is not that hard:
import os

def dirwalker(root):
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            fullpath = os.path.join(dirpath, filename)
            yield fullpath
The dirwalker function is just a shell on top of os.walk, but it solves the three stated problems. First, it yields plain path names instead of tuples, which makes it easier to remember. Second, it yields the complete path, already joined with the root, which is more useful for my purposes. Finally, it eliminates the need for nested loops, greatly simplifying the code and improving readability at the same time.
I made dirwalker a generator instead of a normal function for a few reasons. First, a generator is more responsive because it “returns” each path name as soon as it is constructed. The caller does not have to wait for dirwalker to finish traversing all the sub-directories before receiving the path names. Second, dirwalker does not need to store all the path names in a list before returning to the caller, which saves memory. Finally, the calling code sometimes wants to break out of the loop based on some condition; a normal function would have to traverse all of the directories anyway, even if the caller decides to break out early. Since a generator only generates output on demand, it does not have this problem.
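The early-exit benefit can be sketched as follows. The helpers first_paths and first_match are hypothetical names used only for this illustration; the dirwalker definition is repeated so the snippet runs on its own.

```python
import os
from itertools import islice


def dirwalker(root):
    """Same generator as above, repeated so the sketch is self-contained."""
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            yield os.path.join(dirpath, filename)


def first_paths(root, limit):
    """Return at most `limit` path names; the walk stops once they are produced."""
    return list(islice(dirwalker(root), limit))


def first_match(root, condition):
    """Return the first path accepted by `condition`, or None if there is none."""
    return next((path for path in dirwalker(root) if condition(path)), None)
```

In both helpers the generator is simply abandoned once the caller has what it needs, so directories that were not reached yet are never visited.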
A common pattern I often encounter while gathering files is to exclude or include those that match a set of patterns. In the next post, I will introduce a new feature to dirwalker: filtering.
Gathering files using os.walk is not that hard, but it has its annoyances. That’s the reason I wrote dirwalker. I believe dirwalker can make your code simpler and more Pythonic. Give it a try.
Here is a problem: given a sequence and an item in that sequence, find the item which follows.
For those who code in C-like languages, the first attempt might look like this:
def find_item_after(sequence, item):
    items_count = len(sequence)
    for index in range(items_count):
        if sequence[index] == item and index < items_count - 1:
            # Return the item that follows the match, not the match itself
            return sequence[index + 1]
    return None
The problem with this solution is the ugliness of the code, and the inefficiency of indexing.
def find_item_after(sequence, item):
    for item_here, item_after in zip(sequence, sequence[1:]):
        if item_here == item:
            return item_after
    return None
This solution improves readability, but at the expense of performance. First, the expression sequence[1:] creates a new sequence from the current one. Then, the zip function creates yet another sequence, a list of pairs that holds roughly twice as many references as the original sequence. Note that in Python 3, zip returns a lazy iterator instead of a list, which helps in terms of efficiency.
Until we use Python 3, we are better off using a lazy version of zip, namely izip. We can also avoid the memory-hogging aspect of sequence[1:] by using islice:
from itertools import izip, islice

def find_item_after(sequence, item):
    for item_here, item_after in izip(sequence, islice(sequence, 1, None)):
        if item_here == item:
            return item_after
    return None
This solution is both Pythonic and efficient. However, we can still do better.
def find_item_after(sequence, item):
    iterable = iter(sequence)
    for item_here in iterable:
        if item_here == item:
            return next(iterable, None)
    return None
This solution does not need any imports, nor does it create an extra copy of the sequence. The next function takes an optional second parameter: the value to return when we are at the end of the iterable. Without it, next raises a StopIteration exception in that case.
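The difference between the two forms of next can be seen in a short sketch; the variable names are chosen only for this illustration.

```python
iterable = iter(['a', 'b'])

first = next(iterable)            # consumes 'a'
second = next(iterable)           # consumes 'b'
fallback = next(iterable, None)   # exhausted, so the default comes back

try:
    next(iterable)                # no default: exhaustion raises StopIteration
    raised = False
except StopIteration:
    raised = True
```

This is why find_item_after can return next(iterable, None) directly: when the match is the last item, the default None is returned instead of an exception escaping to the caller.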
A related problem is to find the item before a given item. We can adapt the solution which uses iter for this purpose:
def find_item_before(sequence, item):
    iterable = iter(sequence)
    item_before = None
    for item_here in iterable:
        if item_here == item:
            return item_before
        item_before = item_here
    return None
However, the above is clunky and does extra work: we have to assign a new value to item_before every time we go through the loop. What about the solution which uses izip and islice?
from itertools import izip, islice

def find_item_before(sequence, item):
    # Pair each item with its successor; the successor is the one we test
    for item_before, item_here in izip(sequence, islice(sequence, 1, None)):
        if item_here == item:
            return item_before
    return None
Note that in this solution, we start searching in the sequence with the second item (at index 1) and eliminate the first item from the search. If the caller wants the item before the first item, the loop will complete without any result and we return None.
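For readers already on Python 3, where the built-in zip is lazy and izip no longer exists, the same idea might be sketched like this. It assumes the argument is a re-iterable sequence such as a list, tuple, or string, not a one-shot iterator.

```python
from itertools import islice


def find_item_before(sequence, item):
    """Return the item that precedes `item`, or None if there is none."""
    # Pair every element with its successor: (s[0], s[1]), (s[1], s[2]), ...
    for item_before, item_here in zip(sequence, islice(sequence, 1, None)):
        if item_here == item:
            return item_before
    return None
```

Asking for the item before the first element, or before an element that is not present, falls through the loop and returns None.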
This problem was originally an interview question, but it does have some practical applications.