Problem
The Python csv module is excellent for parsing comma-separated values (CSV) files. Sometimes, however, we want a CSV file that contains a few comment lines, and the csv module does not handle those lines. Take the following file, data_with_comments.csv, for example:
# ============================================================
# Demo file: skipping comments
# ============================================================
alias,shell
john,bash # The boss
amanda,bash
# The odd ones
kim,tcsh # Used this in college without knowing better
karen,dash # Wants to be different from Kim
In this file, there are two kinds of comments: the stand-alone and the inline.
Solution
This article introduces a filter, skip_comments, which takes an iterable object (a list, tuple, file object, or generator, to name a few) and filters out all comments and blank lines. We can place this filter between a file object and a csv.reader or csv.DictReader. Here is the code:
from __future__ import print_function
import csv
import re

# Matches optional whitespace, a '#', and everything up to the end of the line
comment_pattern = re.compile(r'\s*#.*$')

def skip_comments(lines):
    """
    A filter which skips/strips the comments and yields the
    rest of the lines

    :param lines: any object which we can iterate through such as a file
        object, list, tuple, or generator
    """
    for line in lines:
        # Remove the comment (if any), then the surrounding whitespace
        line = comment_pattern.sub('', line).strip()
        # Only non-empty lines are passed on to the consumer
        if line:
            yield line

if __name__ == '__main__':
    with open('data_with_comments.csv') as f:
        reader = csv.DictReader(skip_comments(f))
        for line in reader:
            print(line)
Output:
{'alias': 'john', 'shell': 'bash'}
{'alias': 'amanda', 'shell': 'bash'}
{'alias': 'kim', 'shell': 'tcsh'}
{'alias': 'karen', 'shell': 'dash'}
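Because the filter accepts any iterable of strings, it can also be tried without a file. The following is a minimal sketch that repeats the filter so it runs on its own; the sample_lines list is made-up data for illustration, not part of the original file:

```python
import csv
import re

comment_pattern = re.compile(r'\s*#.*$')

def skip_comments(lines):
    """Same filter as above, repeated so this snippet is self-contained."""
    for line in lines:
        line = comment_pattern.sub('', line).strip()
        if line:
            yield line

# Hypothetical in-memory data standing in for a file object
sample_lines = [
    '# header comment',
    'alias,shell',
    '',
    'john,bash # The boss',
    'kim,tcsh',
]

# csv.reader accepts any iterable of strings, including a generator
rows = list(csv.reader(skip_comments(sample_lines)))
print(rows)
# [['alias', 'shell'], ['john', 'bash'], ['kim', 'tcsh']]
```

One caveat: the pattern removes everything from the first # onward, so a quoted field that legitimately contains a # character would be truncated.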
Discussion
skip_comments is a generic filter: we can use it not only for CSV files, but also for all sorts of data files in which we would like to add comment support. Think of this filter as a pre-processor for your data file.
- We can use this filter technique to filter out invalid lines, or lines which we don't want to feed into the CSV reader.
- We can also use this technique for other conditions, such as removing the first (or last) N lines or taking only the odd lines. In short, we can use this technique to deal with data files containing unwanted lines.
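The other conditions mentioned above can be sketched in the same style. The helper names skip_first, skip_last, and odd_lines are hypothetical, chosen for this illustration, and the raw list is made-up data:

```python
import csv
from collections import deque
from itertools import islice

def skip_first(lines, n):
    """Yield every line after the first n lines."""
    return islice(lines, n, None)

def skip_last(lines, n):
    """Yield every line except the last n lines."""
    buffer = deque()
    for line in lines:
        buffer.append(line)
        # Release a line only once n newer lines are waiting behind it
        if len(buffer) > n:
            yield buffer.popleft()

def odd_lines(lines):
    """Yield the 1st, 3rd, 5th, ... lines (counting from 1)."""
    return islice(lines, 0, None, 2)

# Hypothetical data: one junk line at the top and one at the bottom
raw = ['generated by some tool', 'alias,shell', 'john,bash', 'amanda,bash', 'END']

# Filters chain together just like skip_comments does
rows = list(csv.reader(skip_last(skip_first(raw, 1), 1)))
print(rows)
# [['alias', 'shell'], ['john', 'bash'], ['amanda', 'bash']]

print(list(odd_lines(['a', 'b', 'c', 'd', 'e'])))
# ['a', 'c', 'e']
```

Each helper takes an iterable and returns an iterator, so they can be stacked in any order in front of csv.reader or csv.DictReader.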