Python to Cython and benchmarking

Recently I have been coding with Cython for my project PyCAF. Obviously, I am doing this to make my code run faster. My approach was to first write my code in Python and make sure it works correctly. I wrote unit-tests for each function and class, using nosetests --with-coverage to look for test cases I might have missed. Then I profiled the code, finding out hot-spots, and eliminating the obvious issues.

Next up was to write some performance micro-benchmarks for important bits of code. These benchmarks have the same interface: they take an “input size” N as a parameter. What N means is specific to the function. For e.g., for the function event_one_one(N) it means “create N pairs of tasklets, and make each pair exchange one event” but for the function event_one_many(N) it means “create one sender tasklet and N recipient tasklets waiting for the sender to send an event”. You get the drift. The micro-benchmarks are tested as well, by the simple expedient of writing a unit-test case per micro-benchmark that calls it with a small input.

Digression: I spent some time looking for some tool where I could store results of my benchmarks per code commit and then later browse how a particular benchmark varied over code commits, but found nothing…. does anyone have any ideas? If not, I might write a tool for this in the future.

Anyway, the benefit of micro-benchmarks is that you can see how the performance scales when the input sizes grow. For e.g., here is the output of my benchmark tool:

Test Name N Ratio Time (s) Ratio K-PyStones
tasklet_new 100 1 0.000518083572388 1 0.0364847586189
tasklet_new 1000 10 0.00419187545776 8.09111826967 0.295202497026
tasklet_new 10000 10 0.0460090637207 10.9757706746 3.24007490991
tasklet_new 100000 10 0.516650915146 11.2293290357 36.3838672638
tasklet_yield 100 1 0.000921964645386 1 0.0649270877032

Some things to note here: I convert the time taken by a test to kilo-pystones and record that as well as the time taken. What’s a Pystone? Well, its the output of python -c "from test import pystone; pystone.main()". For my machine:
Pystone(1.1) time for 50000 passes = 0.7
This machine benchmarks at 71428.6 pystones/second

So basically pystones is a (somewhat) machine-independent measure of how long a test took to run. While a test runs faster on a fast machine and slower on a slow machine, when you convert the time to pystones, it should be the same.

Now the interesting thing to note is how the time for a test increases when the input grows larges. To help me, my benchmark prints the ratio between successive input sizes and times taken. If I increase the input size by, say 10 times, and the time taken increases 100 times, then I might have a problem. Of course, behind every test are a bunch of functions and algorithms. I have a general expectation of what the complexity of my function should be, and look at the benchmark to confirm my expectations. I can then replace algorithms with high complexity with better ones. A good design and a well chosen algorithm or data structure gives more speedup than mindlessly twiddling around with code.

Digression: My benchmark is not reporting the space-complexity of the tests (aka, how much memory is being consumed). Tarek Ziade mentions some tools in his excellent book “Expert Python Programming”, but I have not understood the details of the tools well enough to incorporate them into my testing, just yet.

Once I find a benchmark that I want to improve, I will first profile it, to find where the function or algorithm is spending most time. This is obvious and there are enough resources on the web about it (just look for “cProfile”). What I was wondering about was how would I profile code that I had converted to Cython, since Cythonized code becomes a binary? Well, the Cython wiki gives the answer: So thats what I will be looking into very soon.

Post a comment or leave a trackback: Trackback URL.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: