Python Tips and Tricks

1.) Python -- continued

1.1.) Some general Tips

Use ipython instead of python for interactive testing; it is also used by sage. It gives you tab completion and other advantages.

Strings can be delimited by single or double quotes; the other kind of quote is then allowed inside.

>>> F='3.5" or 5.25" Floppy?'; F
'3.5" or 5.25" Floppy?'
>>> D="F(x)=x^2, f'(x)=2x";D
"F(x)=x^2, f'(x)=2x"

You can write multi-line strings inside triple quotes:

	
>>> ALongString="""First line
... second line
...
... many lines
... """
>>> print ALongString
First line
second line

many lines

Non-printable characters: you can use the standard backslash notation for newline, tab and so on (the Python Reference Manual has a complete list of the possible backslash sequences):

>>> print "A new line:\n and a tab \t character. This \\ is one backslash!"
A new line:
 and a tab       character. This \ is one backslash!

Raw strings do not interpret the backslashes:

>>> print r"A new line:\n and a tab \t character. This \\ are two backslashes!"
A new line:\n and a tab \t character. This \\ are two backslashes!
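
Raw strings are most useful wherever backslashes have to reach another layer unchanged, for example regular expressions. The following is only a small sketch written for Python 3 (print is a function there); the pattern and the sample text are chosen for illustration:

```python
import re

# In a raw string the backslash stays a backslash, so the regular
# expression engine receives \d and \. exactly as written.
pattern = r"\d+\.\d+"
sizes = re.findall(pattern, '3.5" or 5.25" Floppy?')
print(sizes)                    # ['3.5', '5.25']

# "\n" is one character (a newline), r"\n" is two (backslash and n).
print(len("\n"), len(r"\n"))    # 1 2
```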

If you use non-ASCII characters (and the corresponding encoding in your file), you have to declare the encoding:
```
#!/usr/bin/python
# -*- coding: utf-8 -*-

print "German Umlauts: ÄÖÜäüöß"
    
```
The encoding of the file and the coding in the second line have to match.

Dictionaries are very useful for caching values that have already been computed (this example is from Chapter 11 of "How to Think Like a Computer Scientist"):

>>> previous = {0: 0, 1: 1}
>>> def fibonacci(n):
...     if previous.has_key(n):
...         return previous[n]
...     else:
...         new_value = fibonacci(n-1) + fibonacci(n-2)
...         previous[n] = new_value
...         return new_value
...
>>> fibonacci(40)
102334155
>>> fibonacci(50)
12586269025L
>>> fibonacci(60)
1548008755920L
>>> fibonacci(80)
23416728348467685L
>>> fibonacci(100)
354224848179261915075L

1.2.) Modules and Packages

When you write a script and it grows, you want to split it into several files or at least put some functions into a separate file, so that other scripts can reuse the functions.

This is called writing a module and importing from it. Imagine we put the definition of fibonacci and previous into a file fib.py. Now the file fib.py becomes the module named fib (without extension):

previous = {0: 0, 1: 1}
def fibonacci(n):
   if previous.has_key(n):
       return previous[n]
   else:
       new_value = fibonacci(n-1) + fibonacci(n-2)
       previous[n] = new_value
       return new_value
       
if __name__=="__main__": # not in module mode:
   print "Testing: ", fibonacci(100)
else:
   print "As module with name", __name__

>>> import fib
As module with name fib
>>> fib
<module 'fib' from 'fib.py'>
>>> dir(fib)
['__builtins__', '__doc__', '__file__', '__name__', 'fibonacci', 'previous']
>>> fib.fibonacci(50)
12586269025L
>>> fib.previous
{0: 0, 1: 1, 2: 1, 3: 2, 4: 3, 5: 5, 6: 8, 7: 13, 8: 21, 9: 34,
10: 55, 11: 89, 12: 144, 13: 233, 14: 377, 15: 610, 16: 987,
17: 1597, 18: 2584, 19: 4181, 20: 6765, 21: 10946, 22: 17711,
23: 28657, 24: 46368, 25: 75025, 26: 121393, 27: 196418, 28: 317811,
29: 514229, 30: 832040, 31: 1346269, 32: 2178309, 33: 3524578,
34: 5702887, 35: 9227465, 36: 14930352, 37: 24157817, 38: 39088169,
39: 63245986, 40: 102334155, 41: 165580141, 42: 267914296,
43: 433494437, 44: 701408733, 45: 1134903170, 46: 1836311903,
47: 2971215073L, 48: 4807526976L, 49: 7778742049L, 50: 12586269025L}

Note the use of fib.previous instead of previous. The module comes with its own namespace. We could also use from fib import *. Then fibonacci and previous belong to the global namespace. Another form is from fib import fibonacci which only imports fibonacci and not previous.
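
The difference between the import forms can be tried with any module. The following sketch uses the standard module math instead of fib, so that it runs without an extra file; it is written for Python 3:

```python
import math                  # the module keeps its own namespace
print(math.sqrt(16.0))       # 4.0

from math import sqrt        # only the name sqrt enters our namespace
print(sqrt(16.0))            # 4.0

# Both names refer to the very same function object.
print(sqrt is math.sqrt)     # True

# pi lives in the namespace of math; it is reached as math.pi.
print("pi" in dir(math))     # True
```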

Python comes with a huge library of Standard Modules.

Sometimes (for example when building a huge CAS with Python) one needs to distribute a lot of modules together. Python supports this with packages. You can think of a package as a directory in the filesystem containing subdirectories and modules. In SAGE we have, for example:

sage.groups.abelian_gps.abelian_group??
File:           /usr/local/sage/local/lib/python2.5/site-packages/sage/groups/abelian_gps/abelian_group.py

The package sage contains the subpackage sage.groups, which contains the subpackage sage.groups.abelian_gps, which in turn contains the module abelian_group. Everything corresponds to files and subdirectories of /usr/local/sage/local/lib/python2.5/site-packages/

See section 6.4 of The Python Tutorial for more information about packages and an example with a complex directory layout.

1.3.) Duck typing

If it walks like a duck and quacks like a duck, I would call it a duck.

If a class has the same behaviour (e.g. implements the same methods) as another class they are interchangeable. This is similar to Java Interfaces but in Python it is done at runtime and only the part being accessed is considered.

For example, if code requires at one place a class implementing a method foo and at another place a class with a method bar, then a class A implementing both methods can be used in both places and another class B implementing the first method can only be used in the first place.

class A(object):
    def foo(self):
        print "Foo"

    def bar(self):
        print "bar"



class B(object):
    def foo(self):
        print "B's implementation of foo"



L=[ A(), A(), B()]

for obj in L:
    obj.foo()

for obj in L:
    obj.bar()

Foo
Foo
B's implementation of foo
bar
bar
Traceback (most recent call last):
  File "<stdin>", line 22, in <module>
AttributeError: 'B' object has no attribute 'bar'
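
Code that relies on duck typing can protect itself against objects that lack the required method by catching the AttributeError. The following is only a sketch for Python 3; the classes Duck, Robot and Stone and the function let_it_speak are made up for this illustration:

```python
class Duck(object):
    def speak(self):
        return "Quack"


class Robot(object):
    # unrelated to Duck, but it offers the same method
    def speak(self):
        return "Beep"


class Stone(object):
    # no speak method at all
    pass


def let_it_speak(obj):
    # No type check: any object with a speak method is accepted.
    try:
        return obj.speak()
    except AttributeError:
        return "(silence)"


results = [let_it_speak(obj) for obj in (Duck(), Robot(), Stone())]
print(results)    # ['Quack', 'Beep', '(silence)']
```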

1.4.) Reserved function names

1.4.1.) ... for emulating types

Section 3.4.5 of The Python Reference Manual lists the method names involved with container types like lists.

So if we want to add list-like behaviour to an object A of one of our classes, we need to implement

__getitem__(self, key), for reading an item, e.g. print A[1]
__setitem__(self, key, value), for assigning to an item, e.g. A[1] = "Hello"

We do not need to implement all the other methods at once. (But if we are interested in len(A), we should implement __len__().)
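
A minimal sketch of such a class, written for Python 3. The class name Shelf and its internal list are assumptions made for this illustration:

```python
class Shelf(object):
    """Hypothetical container with a fixed number of places."""

    def __init__(self, size):
        self._items = [None] * size

    def __getitem__(self, key):
        # called for reading: shelf[key]
        return self._items[key]

    def __setitem__(self, key, value):
        # called for assignment: shelf[key] = value
        self._items[key] = value

    def __len__(self):
        # called by len(shelf)
        return len(self._items)


shelf = Shelf(3)
shelf[1] = "Hello"
print(shelf[1])      # Hello
print(len(shelf))    # 3
```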

1.4.2.) ... for operator overloading

Section 3.4.7 of The Python Reference Manual lists the method names for numeric types. These methods correspond to the different operators + - * and so on.
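
As an illustration, a small two-dimensional vector class could overload a few operators like this. It is only a sketch for Python 3; the class Vec2 is made up for this example:

```python
class Vec2(object):
    """Hypothetical 2D vector supporting +, -, * and ==."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # called for self + other
        return Vec2(self.x + other.x, self.y + other.y)

    def __sub__(self, other):
        # called for self - other
        return Vec2(self.x - other.x, self.y - other.y)

    def __mul__(self, factor):
        # called for self * factor (scaling by a number)
        return Vec2(self.x * factor, self.y * factor)

    def __eq__(self, other):
        # called for self == other
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        return "Vec2(%r, %r)" % (self.x, self.y)


v = Vec2(1, 2) + Vec2(3, 4)
print(v)        # Vec2(4, 6)
print(v * 2)    # Vec2(8, 12)
```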

2.) Performance and Profiler Usage

First make it right, then make it fast.

2.1.) Profiler

With a profiler, we measure the performance of our code, for example the fib module:

python -m cProfile fib.py
Testing:  354224848179261915075
         402 function calls (204 primitive calls) in 0.002 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.002    0.002 <string>:1(<module>)
        1    0.000    0.000    0.002    0.002 fib.py:1(<module>)
    199/1    0.001    0.000    0.001    0.001 fib.py:2(fibonacci)
        1    0.000    0.000    0.002    0.002 {execfile}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
      199    0.000    0.000    0.000    0.000 {method 'has_key' of 'dict' objects}

We can see how often a certain function is called and how much time is spent there. The Python Library Reference has an entire chapter about profiling, and a quickstart guide is available as well.

There is also another profiler called hotshot, which has a smaller performance impact (profiling always slows the program down). But it requires more setup, and you have to drive it from its own Python file. The Python Library Reference has an example.

The profiler helps us to find the function which is called the most or which takes the longest time. This function is the first candidate for optimization.
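
The profiler can also be driven from inside a script, and the pstats module can sort the collected data so that the most expensive functions come first. The following is a sketch for Python 3; the fibonacci function is repeated here (with the in operator instead of has_key) only to keep the example self-contained:

```python
import cProfile
import io
import pstats

previous = {0: 0, 1: 1}


def fibonacci(n):
    if n in previous:
        return previous[n]
    new_value = fibonacci(n - 1) + fibonacci(n - 2)
    previous[n] = new_value
    return new_value


# Profile a single call instead of the whole script.
profiler = cProfile.Profile()
result = profiler.runcall(fibonacci, 100)

# Collect the report in a string, sorted by cumulative time,
# and restrict it to the five most expensive entries.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

Sorting by "cumulative" puts functions first that are expensive including everything they call; sorting by "time" would rank them by the time spent in the function body itself.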

Here is an example of different ways to express the same function:

import cProfile
import os, string


# create some test data:

P= os.popen("man -Tascii python| col -b")
L= P.readlines()

wordList = []

for l in L:
    for w in l.split():
        if w: wordList.append(w)


# different loopings

def worker1():
    newList = []
    for w in wordList:
        newList.append(w.upper() )


# in theory 2 and 3 should be faster than 1, but they are not
def worker2():
    newList = []

    append=newList.append
    upper= string.upper
    
    for w in wordList:
        append( upper(w) )




newList = []
append=newList.append
upper= string.upper
def worker3():
    for w in wordList:
        append( upper(w) )




def worker4():
    newList= map( string.upper, wordList )

def worker5(): 
    return [w.upper() for w in wordList ]


# the apparent winner is:
def worker6(): # looks much faster, but only builds a generator
    return (w.upper() for w in wordList )


# worker6 only appears to be the fastest: a generator expression does no
# work until it is iterated, so no word is upper-cased inside worker6.
# worker5 is faster than the rest
# worker1 is not slower than 2 and 3

    
def f():
    for time in range(500):
        worker5()
    

cProfile.run( "f()" )

#for w in worker6():
#    print w

Surprisingly, worker2 and worker3 are slower than worker1.
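
For comparing small variants like these, the timeit module is often handier than the profiler. The sketch below (Python 3) uses a small made-up word list instead of the man page. It also shows why a function returning a generator expression looks so fast: the generator does no work until it is iterated, so a fair comparison has to consume it.

```python
import timeit

# stand-in test data (an assumption, not the man page from above)
words = ["alpha", "beta", "gamma"] * 1000


def with_loop():
    result = []
    for w in words:
        result.append(w.upper())
    return result


def with_comprehension():
    return [w.upper() for w in words]


def with_generator():
    # returns immediately: no word has been upper-cased yet
    return (w.upper() for w in words)


def with_consumed_generator():
    # consuming the generator does the real work
    return list(w.upper() for w in words)


for func in (with_loop, with_comprehension,
             with_generator, with_consumed_generator):
    seconds = timeit.timeit(func, number=200)
    print("%-26s %.4f s" % (func.__name__, seconds))
```

The absolute numbers depend on the machine and the Python version, so they are not listed here.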

2.2.) Performance tips

2.2.1.) Some common tips

At the end of Python Patterns - An Optimization Anecdote (an essay by Guido van Rossum) there are a few conclusions:

Optimize, only if it is needed.
Optimize the inner loop.
Function calls are expensive, avoid them.
Look in the Library for a function with your desired behaviour. Library functions are often written in C and much faster than Python code.
Avoid dots and use local variables. Store object methods in local variables and call the local ones. (In theory, accessing an object attribute or the global namespace is more expensive than accessing the local namespace of a function. At home with Python 2.5, however, this was not faster than object dereferencing.)
Use the profiler to collect data for your situation.

2.2.2.) EAFP vs. LBYL

(From the Glossary:)
EAFP: Easier to ask for forgiveness than permission. This common Python coding style assumes the existence of valid keys or attributes and catches exceptions if the assumption proves false. This clean and fast style is characterized by the presence of many try and except statements. The technique contrasts with the LBYL style that is common in many other languages such as C.
LBYL: Look before you leap. This coding style explicitly tests for pre-conditions before making calls or lookups. This style contrasts with the EAFP approach and is characterized by the presence of many if statements.

The next example from Section 9 of PythonInfo Wiki: PythonSpeed/PerformanceTips shows the benefits:

import os
# create some test data:

P= os.popen("man -Tascii python| col -b")
L= P.readlines()

wordList = []

for l in L:
    for w in l.split():
        if w: wordList.append( w.upper() )



def worker1(words): # lbyl
    wdict = {}
    for word in words:
        if word not in wdict:
            wdict[word] = 0
        wdict[word] += 1
    return wdict


def worker2(words): # eafp , but twice as slow as lbyl
    wdict = {}
    for word in words:
        try:
            wdict[word] += 1
        except KeyError:
            wdict[word] = 1
    return wdict


def worker3(words): # faster than eafp, but not much
    wdict = {}
    g = wdict.get
    
    for word in words:
        wdict[word] = g(word,0) + 1
        
    return wdict


import cProfile

def f(func):
    for time in range(1000):
        func(wordList)
    

cProfile.run( "f(worker3)" )

# sort for wordcount which is the value
tmp= [ (v, k) for k,v in worker1(wordList).items() ]
tmp.sort() # sorting by values!

# the 50 most frequent words:
print tmp[-50:]

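Again the measurement contradicts the theory: the eafp version is slower here. A likely reason is that the test data contains many distinct words, so the except branch runs often, and raising and catching an exception is comparatively expensive.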
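
How the two styles compare depends a lot on how often the exception is actually raised. The following sketch (Python 3) runs both styles on two made-up data sets: one with only three distinct words, where the lookup almost always succeeds, and one where every word is new. The function names and the data are assumptions for this illustration, and the timings will differ from machine to machine.

```python
import timeit


def count_lbyl(words):
    counts = {}
    for word in words:
        if word not in counts:      # look before you leap
            counts[word] = 0
        counts[word] += 1
    return counts


def count_eafp(words):
    counts = {}
    for word in words:
        try:                        # just do it ...
            counts[word] += 1
        except KeyError:            # ... and ask for forgiveness
            counts[word] = 1
    return counts


few_distinct = ["a", "b", "c"] * 2000            # lookups almost always succeed
many_distinct = [str(i) for i in range(6000)]    # every lookup raises KeyError

for name, data in (("few distinct", few_distinct),
                   ("many distinct", many_distinct)):
    for func in (count_lbyl, count_eafp):
        seconds = timeit.timeit(lambda: func(data), number=50)
        print("%-14s %-11s %.4f s" % (name, func.__name__, seconds))
```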

2.2.3.) xrange

When a large range of numbers is required, use xrange instead of range. xrange returns a lazy sequence object that produces each number on demand, whereas range creates the whole list at once.
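
In Python 3 there is no xrange any more; range itself is lazy there and behaves like the xrange described above. The following sketch therefore runs under Python 3 and compares the lazy object with an explicitly built list:

```python
import sys

n = 1000000

lazy = range(n)           # Python 3: lazy, like xrange in Python 2
eager = list(range(n))    # builds all n numbers at once

# The lazy object stays tiny, the list grows with n.
print(sys.getsizeof(lazy))
print(sys.getsizeof(eager))

# Both can be used in the same way in a loop or in sum().
total = sum(lazy)
print(total == sum(eager))    # True
```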

3.) Some interesting recipes from The Python Cookbook

3.1.) Ruby like syntactic sugar

This recipe shows how to overload the __rmul__ function.
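
The following sketch is not the code of the recipe; it only illustrates the mechanism. When the left operand of * does not know how to handle the right one, Python tries the __rmul__ method of the right operand. The class Repeat is made up for this example (Python 3):

```python
class Repeat(object):
    """Hypothetical helper: n * Repeat(func) calls func n times."""

    def __init__(self, func):
        self.func = func

    def __rmul__(self, count):
        # Called for count * self, because int does not know how to
        # multiply itself with a Repeat object.
        return [self.func(i) for i in range(count)]


squares = 4 * Repeat(lambda i: i * i)
print(squares)    # [0, 1, 4, 9]
```

Note that Repeat(...) * 4 would fail with a TypeError here, because only the reflected method is defined.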

3.2.) Decorate an output stream with print-like methods

This recipe shows how to create a function with an arbitrary argument list. Notice that no inheritance is used, so you cannot access the stream's write method directly (only via out.stream.write).
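
The idea can be sketched as follows. This is not the recipe's code: the class Output and its method names are assumptions for this illustration, and the sketch is written for Python 3.

```python
import io
import sys


class Output(object):
    """Hypothetical wrapper: it holds a stream instead of inheriting from it."""

    def __init__(self, stream=sys.stdout):
        self.stream = stream

    def write_items(self, *args):
        # *args collects any number of positional arguments into a tuple
        self.stream.write(" ".join(str(a) for a in args))

    def write_line(self, *args):
        # pass the collected arguments on, then end the line
        self.write_items(*args)
        self.stream.write("\n")


buffer = io.StringIO()
out = Output(buffer)
out.write_line("answer:", 42, [1, 2])
out.stream.write("raw text\n")    # the stream's own method, via out.stream
print(buffer.getvalue())
```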

3.3.) Examine every permutation of a given sequence

This recipe shows how to get every permutation of a given sequence or string. It uses recursion and generators and also demonstrates slicing (last line).
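
A possible way to combine recursion, generators and slicing is sketched below. It is not necessarily the recipe's code, and the function name permutations is chosen for this illustration (Python 3):

```python
def permutations(seq):
    """Yield every permutation of a string or list, one at a time."""
    if len(seq) <= 1:
        yield seq
        return
    for i in range(len(seq)):
        # everything except element i, built with two slices
        rest = seq[:i] + seq[i + 1:]
        for tail in permutations(rest):
            # seq[i:i + 1] keeps the type (str stays str, list stays list)
            yield seq[i:i + 1] + tail


print(list(permutations("abc")))
# ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']
```

Since Python 2.6 the standard library offers itertools.permutations, which yields tuples and is usually the better choice in real code.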

3.4.) Fast copy of an object having a slow init

This recipe shows how to make a faster copy of an object. (The discussion gives more details on how Python copies your own classes.)

In Python, variables and function arguments hold references to objects. So when you call foo(obj), the function foo can change obj (if obj is mutable). This is most often the right thing. But sometimes one wants the function to work on its own copy, so that foo cannot change the original object.

Python provides the copy.copy and copy.deepcopy functions for creating copies. Consider the next example:

aList=[1,2,3]

def foo(L):
    L[1] = 42


print "before:", aList
foo(aList) 
print "after:", aList

# before: [1, 2, 3]
# after: [1, 42, 3]

import copy
aList[1]=2
print "before (2):", aList
foo(copy.copy(aList)) 
print "after (2):", aList 
# before (2): [1, 2, 3]
# after (2): [1, 2, 3]

Because variables are references to objects in memory, there is another pitfall, object aliasing:

ripefruits={"apple": "green", "banana": "yellow"}

rottenfruits=ripefruits # only aliasing,
                        # ripe- and rotten point to the same place in memory

rottenfruits["apple"]="brown"
rottenfruits["banana"]="black"

print ripefruits
print rottenfruits
# {'apple': 'brown', 'banana': 'black'}
# {'apple': 'brown', 'banana': 'black'}
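
The aliasing can be avoided with an explicit copy. The sketch below (Python 3) also shows the difference between copy.copy and copy.deepcopy for nested objects; the baskets dictionary is made up for this example:

```python
import copy

ripefruits = {"apple": "green", "banana": "yellow"}

rottenfruits = copy.copy(ripefruits)    # a new dictionary, not an alias
rottenfruits["apple"] = "brown"

print(ripefruits["apple"])      # green
print(rottenfruits["apple"])    # brown

# A shallow copy still shares the nested objects with the original,
# a deep copy duplicates them as well.
baskets = {"kitchen": ["apple"], "cellar": ["banana"]}
shallow = copy.copy(baskets)
deep = copy.deepcopy(baskets)

baskets["kitchen"].append("pear")
print(shallow["kitchen"])    # ['apple', 'pear']
print(deep["kitchen"])       # ['apple']
```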
