Python: Making Complex Regular Expressions Easier to Read

In my last post, I shared a way to create regular expressions with embedded comments for the Tcl scripting language. It turns out that Python offers a similar feature.

The Problem

I often need to deal with complex regular expressions while scripting in Python. The problem is that the expression syntax is terse, cryptic, and hard to understand and debug. There must be a better way to deal with regular expressions; a way to add comments would be nice.

The Solution

I will reuse the example from my last post: fishing email addresses out of a chunk of text. Below is the Python counterpart of my previous solution:

import re

if __name__ == '__main__':
    test_data = '''
            This is a bunch of text
            within it, there are some emails such as foo@bar.com
            or one@two.three.net
            What about mixed case: John.Doe@services.company.ws...
            Let see if we can extract them out
            '''
    email_pattern = r'''
            # The part before the @
            [a-z0-9._%-]+

            # The at sign itself
            @

            # The domain, not including the last dot
            [a-z0-9.-]+

            # The last dot
            \.

            # The top-level domain (TLD), which ranges from 
            # 2 to 4 characters
            [a-z]{2,4}
            '''
    print('START')
    # findall returns every non-overlapping match as a list of strings
    result = re.findall(email_pattern,
                        test_data,
                        re.IGNORECASE | re.VERBOSE)
    print('\n'.join(result))
    print('END')

The output:

START
foo@bar.com
one@two.three.net
John.Doe@services.company.ws
END
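
To convince myself that the comments and layout do not change what the pattern matches, here is a minimal sketch that compares the verbose pattern with its compact one-line form. The sample text and the variable names are my own assumptions for illustration; only re.compile, re.IGNORECASE and re.VERBOSE come from the standard library.

```python
import re

# Hypothetical sample text, shortened from the test data above.
text = 'Write to foo@bar.com or John.Doe@services.company.ws...'

# The compact, one-line form of the pattern.
compact = re.compile(r'[a-z0-9._%-]+@[a-z0-9.-]+\.[a-z]{2,4}', re.IGNORECASE)

# The same pattern in verbose form, with layout and comments.
verbose = re.compile(r'''
    [a-z0-9._%-]+   # the part before the @
    @               # the at sign itself
    [a-z0-9.-]+     # the domain, not including the last dot
    \.              # the last dot
    [a-z]{2,4}      # the top-level domain, 2 to 4 letters
    ''', re.IGNORECASE | re.VERBOSE)

compact_matches = compact.findall(text)
verbose_matches = verbose.findall(text)

print(compact_matches)
print(compact_matches == verbose_matches)
```

Both forms should find the same addresses, so the choice between them is purely about readability.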

Conclusion

With the re.VERBOSE flag, I can embed whitespace and comments in a regular expression, making it easier to read and understand.
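
One thing to watch for: because re.VERBOSE ignores whitespace and treats # as the start of a comment, a pattern that needs a literal space or a literal # has to escape it or put it inside a character class. The sketch below illustrates this; the sample text and the names naive and escaped are hypothetical, chosen only for the example.

```python
import re

# Hypothetical sample text containing literal spaces and a literal '#'.
text = 'John Doe # 42'

# In verbose mode the space is dropped and everything after the
# unescaped hash is a comment, so this pattern is really just JohnDoe.
naive = re.compile(r'''
    John Doe # 42
    ''', re.VERBOSE)

# Escaping the space and the hash (or using a character class) keeps
# them as literal characters.
escaped = re.compile(r'''
    John\ Doe   # a backslash keeps the literal space
    [ ]         # a space inside a character class is also kept
    \#          # an escaped hash is a literal hash, not a comment
    [ ]
    \d+         # the number
    ''', re.VERBOSE)

print(naive.search(text))
print(escaped.search(text).group())
```

The first search finds nothing, while the second matches the whole sample string.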
