Python: Making Complex Regular Expressions Easier to Read

In my last post, I shared a way to create regular expressions with embedded comments in the Tcl scripting language. It turns out that Python offers a similar feature.

The Problem

I often need to deal with complex regular expressions while scripting in Python. The problem is that the expression syntax is terse, cryptic, and hard to understand and debug. There must be a better way to deal with regular expressions; a way to add comments would be nice.

The Solution

As in my last post, I will use the same example: fishing email addresses out of a chunk of text. Below is the Python counterpart of my previous solution:

import re

if __name__ == '__main__':
    test_data = '''
            This is a bunch of text
            within it, there are some emails such as
            What about mixed case:
            Let's see if we can extract them out
            '''
    email_pattern = r'''
            # The part before the @

            # The at sign (@) itself

            # The domain, not including the last dot

            # The last dot

            # The top-level domain (TLD), which ranges from
            # 2 to 4 characters
            '''
    print('START')
    result = re.findall(email_pattern, test_data, re.VERBOSE)
    print('\n'.join(result))
    print('END')

The output:



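With the re.VERBOSE flag, I can embed whitespace and comments in the regular expression, making it easier to read and understand. In this mode, whitespace in the pattern is ignored unless it is escaped or sits inside a character class, and everything from an unescaped # to the end of the line is treated as a comment.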
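To make the idea concrete, here is a minimal, self-contained sketch of the same approach. The two addresses in SAMPLE_TEXT are made up for illustration, and the five pattern fragments are only one plausible way to match the parts the comments describe, not necessarily the pattern used above. The helper name find_emails is also hypothetical.

```python
import re

# Hypothetical sample text; both addresses are invented for illustration.
SAMPLE_TEXT = '''
This is a bunch of text
within it, there are some emails such as alice@example.com
What about mixed case: Bob.Smith@Mail.Example.ORG
Let's see if we can extract them out
'''

# Assumed pattern: one plausible way to fill in the five commented parts.
EMAIL_PATTERN = r'''
    [\w.+-]+        # The part before the @
    @               # The at sign itself
    [\w.-]+         # The domain, not including the last dot
    \.              # The last dot
    [A-Za-z]{2,4}   # The top-level domain (TLD), 2 to 4 letters
'''


def find_emails(text):
    """Return every substring of text that matches EMAIL_PATTERN."""
    # re.VERBOSE makes the engine skip the layout whitespace and the
    # trailing comments inside EMAIL_PATTERN.
    return re.findall(EMAIL_PATTERN, text, re.VERBOSE)


if __name__ == '__main__':
    print('\n'.join(find_emails(SAMPLE_TEXT)))
```

Run as a script, this prints the two sample addresses, one per line. One thing to keep in mind when writing patterns this way: because verbose mode discards unescaped whitespace and treats # as the start of a comment, a literal space or # in the pattern has to be escaped with a backslash or placed inside a character class.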

