Process CSV with Comments

Problem

The Python csv module is excellent for parsing comma-separated values (CSV) files. Sometimes we want to annotate a CSV file with a few comment lines, but the csv module has no built-in support for comments. Take the following file, data_with_comments.csv, for example:

# ============================================================
# Demo file: skipping comments
# ============================================================

alias,shell
john,bash # The boss
amanda,bash

# The odd ones
kim,tcsh    # Used this in college without knowing better
karen,dash  # Wants to be different from Kim

In this file, there are two kinds of comments: the stand-alone and the inline.

Solution

This article introduces a filter, skip_comments, which takes as input an iterable object (list, tuple, file object, generator, to name a few) and filters out all comments and blank lines. We can place this filter between a file object and a csv.reader or csv.DictReader. Here is the code:

from __future__ import print_function
import csv
import re


comment_pattern = re.compile(r'\s*#.*$')


def skip_comments(lines):
    """
    A filter which strips comments and yields the remaining
    non-blank lines.

    :param lines: any object which we can iterate through such as a file
        object, list, tuple, or generator
    """
    for line in lines:
        # Remove the comment (if any), then the surrounding whitespace
        line = comment_pattern.sub('', line).strip()
        if line:
            yield line


if __name__ == '__main__':
    with open('data_with_comments.csv') as f:
        reader = csv.DictReader(skip_comments(f))
        for line in reader:
            print(line)

Output:

{'alias': 'john', 'shell': 'bash'}
{'alias': 'amanda', 'shell': 'bash'}
{'alias': 'kim', 'shell': 'tcsh'}
{'alias': 'karen', 'shell': 'dash'}
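
Because the filter accepts any iterable of strings, it can be tried without a file on disk. The following is a minimal sketch under that assumption: the filter is repeated here in equivalent form so the snippet runs on its own, and the `sample` list is hypothetical data made up for illustration.

```python
import csv
import re

comment_pattern = re.compile(r'\s*#.*$')


def skip_comments(lines):
    """Strip comments and yield the remaining non-blank lines."""
    for line in lines:
        line = comment_pattern.sub('', line).strip()
        if line:
            yield line


# Hypothetical in-memory sample; any iterable of strings works
sample = [
    '# header comment',
    '',
    'alias,shell',
    'john,bash # The boss',
    'kim,tcsh    # inline comment',
]

# Feed the filtered lines to a plain csv.reader instead of DictReader
rows = list(csv.reader(skip_comments(sample)))
print(rows)
```

One caveat worth keeping in mind: the pattern treats every # as the start of a comment, so a quoted field that legitimately contains a # would be truncated by this filter.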

Discussion

  • skip_comments is a generic filter: we can use it not only for CSV files, but also for any line-oriented data file to which we would like to add comment support. Think of this filter as a pre-processor for your data file
  • We can use the same technique to filter out invalid lines, or lines which we don’t want to feed into the CSV reader
  • We can also use this technique for other conditions, such as removing the first (or last) N lines or taking only the odd lines. In short, this technique handles data files containing unwanted lines
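
The other filters mentioned in the list above could be sketched along the same lines. The helper names skip_first, every_other, and drop_if are hypothetical, chosen only for this illustration, and the sample data is made up; the sketch relies on itertools.islice from the standard library.

```python
import csv
from itertools import islice


def skip_first(lines, n):
    """Drop the first n lines and yield the rest."""
    return islice(lines, n, None)


def every_other(lines, start=0):
    """Yield every second line, beginning at 0-based index start."""
    return islice(lines, start, None, 2)


def drop_if(lines, predicate):
    """Yield only the lines for which predicate(line) is false."""
    for line in lines:
        if not predicate(line):
            yield line


# Hypothetical data: two preamble lines before the real CSV content
sample = [
    'generated by export tool',
    'version 2',
    'alias,shell',
    'john,bash',
    'amanda,bash',
]

# Skip the preamble, then hand the remaining lines to the CSV reader
rows = list(csv.reader(skip_first(sample, 2)))

# Take the 1st, 3rd, 5th, ... lines (the odd lines when counting from 1)
odd = list(every_other(['a', 'b', 'c', 'd', 'e']))

# Drop lines that fail a validity check (here: lines without a comma)
valid = list(drop_if(sample, lambda line: ',' not in line))
```

Since each helper takes an iterable and returns an iterable, they can be chained in any order before the result is passed to csv.reader or csv.DictReader.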

