Monday, November 21, 2011

Python idioms: Building strings

With this post I'm starting a series about Python idioms.
What is a programming idiom? According to Wikipedia's Programming Idiom article, a programming idiom is defined as "the use of an unusual or notable feature that is built in to a programming language".
Python is very rich in useful idioms. Python programmers who take advantage of them are known as pythonistas. In that sense, I highly recommend reading the famous article Code Like a Pythonista: Idiomatic Python by David Goodger.

Having defined what a programming idiom is and introduced the term pythonista, let's talk about one of my favorite idioms: building strings from substrings.

Imagine that you have a list of strings,
bands = ['Machine Head','Metallica','Opeth','Veil of Maya']
and you want to concatenate the items in the list into a single comma-separated string. If you come from the world of C- or Java-style syntax, you would probably write something like this:
output = ''
for band in bands[:-1]:
    output += band + ', '
output += bands[-1] 
If you print the content of the output variable, you will get a list of the bands:
>>> print output
Machine Head, Metallica, Opeth, Veil of Maya
This is an inefficient way to concatenate strings in Python. Strings are immutable, so in each iteration of the for loop a temporary string is created for band + ', ', added to output to form yet another new string, and then thrown away.

The pythonic way is faster and more elegant:
>>> print ', '.join(bands)
Machine Head, Metallica, Opeth, Veil of Maya
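
Besides speed, join also takes care of edge cases that the loop does not. The sketch below is only an illustration: it is written for Python 3 (where print is a function), and the helper names join_loop and join_idiom are my own, not part of any library. It shows that both approaches agree on a normal list, while the loop version breaks on an empty one.

```python
def join_loop(items):
    """Concatenate with +=, as in the C/Java-style loop above."""
    output = ''
    for item in items[:-1]:
        output += item + ', '
    output += items[-1]  # raises IndexError when items is empty
    return output


def join_idiom(items):
    """Concatenate with str.join."""
    return ', '.join(items)


bands = ['Machine Head', 'Metallica', 'Opeth', 'Veil of Maya']

# Both approaches build the same string for a non-empty list.
assert join_loop(bands) == join_idiom(bands)

# join copes with empty and single-item lists without special cases.
print(repr(join_idiom([])))   # ''
print(join_idiom(['Opeth']))  # Opeth

# The loop version needs an extra guard for the empty list.
try:
    join_loop([])
except IndexError:
    print('join_loop fails on an empty list')
```

One thing to keep in mind: join expects every item to be a string, so a list of numbers has to be converted first, for example with ', '.join(str(n) for n in numbers).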

I coded a small example to compare the performance of both techniques:
from functools import wraps
import time

def timed(f):
    """Print how long each call to f takes, in seconds."""
    @wraps(f)
    def wrapper(*args, **kwds):
        # time.clock() exists in Python 2; it was removed in Python 3.8.
        start = time.clock()
        result = f(*args, **kwds)
        elapsed = time.clock() - start
        print "%.5gs" % (elapsed)
        return result
    return wrapper

bands = ['Machine Head','Metallica','Opeth','Veil of Maya']*100000

@timed
def func1():
    # Concatenation with += inside a loop.
    output = ''
    for band in bands[:-1]:
        output += band + ', '
    output += bands[-1]
    return output

@timed
def func2():
    # Concatenation with a single join call.
    output = ', '.join(bands)
    return output

func1()
func2()

The result is:

$ python test.py
0.07s
0.01s

If we increase the size of the bands list by one order of magnitude:

$ python test.py
0.62s
0.1s

the difference in performance becomes more noticeable.
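
For readers on Python 3, where print is a function and time.clock() no longer exists, the same comparison can be sketched with the standard timeit module. This is not the script that produced the numbers above, just a rough equivalent: the function names concat_loop and concat_join and the repeat count of 3 are my own choices, and the timings will differ from machine to machine.

```python
import timeit

bands = ['Machine Head', 'Metallica', 'Opeth', 'Veil of Maya'] * 100000


def concat_loop():
    """Build the string with repeated += in a for loop."""
    output = ''
    for band in bands[:-1]:
        output += band + ', '
    output += bands[-1]
    return output


def concat_join():
    """Build the string with a single str.join call."""
    return ', '.join(bands)


# Sanity check: both functions must produce the same string.
assert concat_loop() == concat_join()

# Run each function a few times and keep the best time.
for func in (concat_loop, concat_join):
    best = min(timeit.repeat(func, number=1, repeat=3))
    print('%s: %.5gs' % (func.__name__, best))
```

Note that CPython can sometimes perform += on strings in place, so the gap you measure may be smaller than the one shown above. PEP 8 recommends not relying on that optimization and using join instead, since it behaves consistently across Python implementations.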

PS: In my toy example, I used the timed decorator that can be found in this Stack Overflow thread.
