Python packaging: setuptools and eggs

Developing Packages

This document describes how to create a buildable, distributable package out of
Python source code. We’ll look at the popular ‘egg’ distribution format.

Tools we need:

Also, highly recommended is: virtualenv

What is an egg?

  • Eggs are basically directories that are added to Python’s path.
  • The directories may be zipped.
  • Eggs have some meta-data
    • Dependencies
    • Entry-points
  • May be distributed as source
  • Can be discovered from PyPI
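
The first two points can be tried out with nothing but the standard library. The sketch below is not an egg (it has no metadata); it only shows that a zip archive placed on Python's path becomes importable. The archive and module names are made up for the illustration.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Build a tiny zip archive containing one module, much as a zipped egg
# bundles its code. "demo_zipped_mod" is a made-up name.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "Demo-0.1.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("demo_zipped_mod.py", "GREETING = 'Bow, wow!'\n")

# Putting the archive itself on sys.path makes its contents importable.
sys.path.insert(0, archive)
importlib.invalidate_caches()
demo = importlib.import_module("demo_zipped_mod")
print(demo.GREETING)
```

A real egg adds an EGG-INFO directory with the metadata listed above; the import mechanism is the same.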

What is easy_install?

A tool to find, download, compile (if needed), and install Python packages. It
can install eggs, or even source tarballs, as long as the tarball uses the
standard Python setup.py method of building itself.

Egg Terminology

  • Distribution
    • a term used by Python distutils;
    • anything which can be ‘distributed’, really;
    • most common: tarballs, eggs.
  • Source distribution:
    • A distribution that contains only source files
  • Binary distribution:
    • A distribution that contains compiled ‘.pyc’ files and C extensions
    • E.g., RPMs and eggs
  • Egg:
    • A kind of binary distribution
  • Platform dependent eggs:
    • Eggs which contain built C extension modules and are thus tied to an OS
  • ‘develop eggs’ and ‘develop egg links’:
    • develop egg links are special files that allow a source directory to be
      treated as if it were an installed egg. (That is, an egg that you are
      ‘developing’!)
  • Index server and link servers:
    • easy_install will automatically download distributions from the
      Internet. When looking for distributions, it will look at zero or more
      link servers for links to distributions. It will also look on a single
      index server, typically (always) http://www.python.org/pypi. Index servers
      are required to provide a specific web interface.

Example Project

Our sample project consists of this code:

  • package ‘speaker’
    • module dog:
      • class Dog
      • function DogMain
    • module gendibal
      • class Gendibal
      • function GendibalMain
    • module bjarne
      • class Bjarne
      • function BjarneMain
  • package ‘tests’
    • module dog_test
      • class DogTest
    • module gendibal_test
      • class GendibalTest
    • module bjarne_test
      • class BjarneTest

The classes Dog, Gendibal, and Bjarne are “speakers”: they all have the
method greeting() which takes no arguments and returns a string containing
something they said. The Dog speaker will, of course, say “Bow, wow!”.
Gendibal is a mathematician and therefore uses prime numbers in his
greetings. Bjarne likes to talk about C++.

For every module and class, there is a corresponding test module and class.

We shall also have three scripts (i.e., programs that live in a bin
directory somewhere) that are intended to be launched from the command
line. The programs will be:

  • rundog: runs speaker.dog:DogMain
  • rungendibal: runs speaker.gendibal:GendibalMain
  • runbjarne: runs speaker.bjarne:BjarneMain

Directory Structure

This is the intended directory structure.

Speaker/
|-- README.txt
|-- setup.cfg
|-- setup.py
|-- speaker
|   |-- __init__.py
|   |-- bjarne.cpp
|   |-- dog.py
|   `-- gendibal.pyx
`-- tests
    |-- __init__.py
    |-- bjarne_test.py
    |-- dog_test.py
    `-- gendibal_test.py
  • Speaker is the name of the project, and it will also be the name of our
    package (Speaker-0.1.tgz, for example);
  • Our project contains a package named speaker, where we will put our
    classes; we can add more packages inside later;
  • setup.py and setup.cfg contain information to build our egg;
  • The tests package will contain test code.

Version 0.0: setting up the package

Let’s create some dirs and files:

Speaker/
|-- setup.cfg
|-- setup.py
|-- speaker
|   |-- __init__.py
`-- tests
    |-- __init__.py

Where:

  • setup.py
    from setuptools import setup, find_packages
    setup(name='Speaker',
          packages=find_packages(),
          )

The find_packages function will automatically discover your Python packages
and modules, and pack them up.

  • setup.cfg
    [egg_info]
    tag_build = dev

The tag_build option appends a tag of our choice to the generated
filename. We’ll see it in action in a second.

  • speaker/__init__.py and tests/__init__.py are empty files.

Now we can build our package:

$ cd Speaker
$ python setup.py sdist bdist_egg
$ ls dist/
Speaker-0.0dev-py2.5.egg  Speaker-0.0dev.tar.gz

We have just created a source distribution and a platform-independent egg, even
though we don’t have a single line of useful code yet.

Note the ‘dev’ in the filename: we’ve told setuptools that our package
is an in-development package and specified the tag ‘dev’ in setup.cfg. This
actually matters when easy_install is figuring out which of several
versions of a package it should download and install. More on that later.

Version 0.1: making a releasable package

Let’s update our setup.py:

# setup.py
from setuptools import setup, find_packages
import sys, os

version = '0.1'

setup(name='Speaker',
      version=version,
      description="Demo Package",
      packages=find_packages(exclude=['ez_setup', 'examples', 'tests']),
      include_package_data=True,
      zip_safe=False,
      )

Notes:

  • Look at the find_packages call: the exclude list keeps some packages and
    modules out of the distribution. We want the tests and examples
    packages, and the ez_setup.py module, to be available only to people
    checking out the code, not to people downloading a built egg. (We
    haven’t written any examples yet, but you were going to do it, right? ;-))
  • zip_safe=True would mean that the egg is not unzipped when installed:
    stuff would run right out of the zipped directory! That is normally not
    useful, so we set zip_safe to False.
  • We always want to set include_package_data to True.
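
To make the discovery step less magical, here is a rough standard-library sketch of the idea behind find_packages: every directory that contains an __init__.py counts as a package, and names in the exclude list are dropped. The function discover_packages is a hypothetical stand-in written for this illustration, not the setuptools implementation.

```python
import os
import tempfile

def discover_packages(root, exclude=()):
    """Return dotted names of directories under root that contain __init__.py."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath == root:
            continue
        if "__init__.py" not in filenames:
            dirnames[:] = []  # not a package: do not descend further
            continue
        name = os.path.relpath(dirpath, root).replace(os.sep, ".")
        if name not in exclude:
            found.append(name)
    return sorted(found)

# Recreate the skeleton of our project in a scratch directory.
project = tempfile.mkdtemp()
for package in ("speaker", "tests"):
    os.makedirs(os.path.join(project, package))
    open(os.path.join(project, package, "__init__.py"), "w").close()
os.makedirs(os.path.join(project, "docs"))  # no __init__.py, so not a package

print(discover_packages(project))
print(discover_packages(project, exclude=["ez_setup", "examples", "tests"]))
```

With the exclude list from our setup.py, only the speaker package is left to be packed into the egg.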

Our first bit of code

We create our first speaker:

# speaker/dog.py
class Dog(object):
    def greeting(self):
        return "Bow, wow!"

and write a test:

# tests/dog_test.py

import unittest
from speaker import dog

class DogTest(unittest.TestCase):
    def test_greeting(self):
        d = dog.Dog()
        self.assertEqual(d.greeting(), "Bow, wow!")

if __name__ == "__main__":
    unittest.main()

Oops! If we try to run the test now, the import of speaker fails: Python does
not know where to find our packages yet. So we ‘install’ our egg as a
‘develop egg’:

$ python setup.py develop

This will create the necessary links (a ‘develop egg link’ file and a
path entry in site-packages) for Python to find our packages. Now our
code will behave just as if it were installed, while letting us keep
coding away.

$ python tests/dog_test.py
.
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK
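
For the curious, the path-entry part of ‘develop’ can be imitated with the standard library alone. The sketch below assumes that the develop step works by listing our source directory in a .pth file inside site-packages; the directory and module names are invented, and site.addsitedir() stands in for what Python does with site-packages at startup.

```python
import importlib
import os
import site
import tempfile

# A stand-in for our checkout: a source directory holding one module.
source_dir = tempfile.mkdtemp()
with open(os.path.join(source_dir, "demo_develop_mod.py"), "w") as f:
    f.write("def greeting():\n    return 'Bow, wow!'\n")

# A stand-in for site-packages: it holds only a .pth file whose single
# line is the path of the source directory.
fake_site_packages = tempfile.mkdtemp()
with open(os.path.join(fake_site_packages, "demo-develop.pth"), "w") as f:
    f.write(source_dir + "\n")

# Python processes .pth files in site directories at startup;
# site.addsitedir() does the same for a directory we name explicitly.
site.addsitedir(fake_site_packages)
importlib.invalidate_caches()
demo = importlib.import_module("demo_develop_mod")
print(demo.greeting())
```

Nothing was copied anywhere: edits made in the source directory are picked up on the next import, which is exactly what we want while developing.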

Automatic test discovery and running

We specified a collection of tests above (dog_test.py). But we
will be writing a lot of tests, and we want to be able to run all
of them in one shot. We are going to use the ‘nose’ test
discovery and execution tool to find and run our tests.

# setup.py
setup(...
     test_suite="nose.collector",
     tests_require="nose",
     )

The tests_require line will make easy_install download and put nose in
the current directory if nose is not already installed.

$ python setup.py test
... <downloads nose>

...
test_greeting (tests.dog_test.DogTest) ... ok
...

(If it fails the first time, just run python setup.py test again.)
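
If nose is not available, the same idea (find every test module by its name, then run what is inside) can be sketched with the discovery support that newer versions of the standard unittest module provide. This is an alternative shown for illustration, not what nose.collector does internally; the directory and file names are made up.

```python
import os
import tempfile
import unittest

# Create a throwaway directory holding one test module.
project = tempfile.mkdtemp()
tests_dir = os.path.join(project, "demo_tests")
os.makedirs(tests_dir)
with open(os.path.join(tests_dir, "sample_test.py"), "w") as f:
    f.write(
        "import unittest\n"
        "class SampleTest(unittest.TestCase):\n"
        "    def test_greeting(self):\n"
        "        self.assertEqual('Bow, wow!', 'Bow, wow!')\n"
    )

# Collect every module matching *_test.py and run whatever it contains.
suite = unittest.defaultTestLoader.discover(tests_dir, pattern="*_test.py")
result = unittest.TextTestRunner(verbosity=2).run(suite)
print(result.testsRun, result.wasSuccessful())
```

The pattern argument matters here: our test files end in _test.py, while the unittest default only matches files starting with test.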

The main function

We have a speaker library, but we don’t have a “main” script
yet. You often have to create a separate file just for the
“main” script, which is (should be) just a wrapper script that
imports some module and calls a function in it. In fact, for a
large package, we may have many “main” scripts, each doing
nothing more than importing the required packages and modules and
calling some function in there.

We can use the setuptools ‘Entry points’ mechanism for this. An
‘entry point’ is the name of some functionality of the
package/application; entry points come in groups; two groups are
pre-defined: “console_scripts” and “gui_scripts”. Setuptools can
auto-generate wrapper scripts for our entry points.

Here is how we can tell setuptools to generate a console script
that does something useful:

# setup.py
setup(...
      entry_points={
        'console_scripts': [
            'rundog = speaker.dog:DogMain',
            ],
        },
     ...
     )

Now, when we do a python setup.py develop, or a user installs
our egg, a script called ‘rundog’ will be generated and
automatically put somewhere in the path. The script will call
the DogMain function in the speaker.dog module with no
arguments, and the return value of the function will be the
exit status of the script.
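
To see what such a wrapper has to do, here is a simplified sketch of resolving a ‘name = module:function’ entry point by hand. The helper names parse_entry_point and resolve_entry_point are invented for this illustration; the script that setuptools generates is more elaborate, but the core idea is the same: import the module, look up the function, call it, and pass its return value to sys.exit().

```python
import importlib

def parse_entry_point(spec):
    """Split 'name = package.module:function' into its three parts."""
    name, target = (part.strip() for part in spec.split("=", 1))
    module_name, func_name = target.split(":", 1)
    return name, module_name, func_name

def resolve_entry_point(spec):
    """Import the module named in spec and return (script name, callable)."""
    name, module_name, func_name = parse_entry_point(spec)
    module = importlib.import_module(module_name)
    return name, getattr(module, func_name)

# Our own entry point, parsed but not imported (speaker may not be
# installed where this sketch is run).
print(parse_entry_point("rundog = speaker.dog:DogMain"))

# A standard-library target, so the import step can be exercised anywhere.
script, func = resolve_entry_point("demo = os.path:basename")
print(script, func("/usr/bin/rundog"))
```

A wrapper for rundog would finish with sys.exit(func()), which is why the return value of DogMain becomes the exit status.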

What would the DogMain function be like?

# speaker/dog.py
...
def DogMain():
    d = Dog()
    print(d.greeting())
    return 0

Now, when we run ‘develop’ again, setuptools will generate the rundog script
for us:

$ python setup.py develop
...
Installing rundog script to .../bin
...
$ rundog
Bow, wow!

We should keep a minimum amount of code in DogMain and put most
of it in discrete, well tested functions. This helps make code
more robust and re-usable.

About version numbers

Until now, other projects using our Speaker package have been
checking out code from our code repository and using it directly.
Now it is time to make an ‘official’ release. We shall release
v0.1 (the version we have been working on, and the one specified
in setup.py) of our package (and remove ‘dev’ from the release
name). easy_install orders version numbers like this:

0.1dev < 0.1a < 0.1b … < 0.1 < 0.1-1 < 0.1-2 …

Steps:

  1. We create a release branch
  2. On the release branch, we edit setup.cfg. Currently, it probably says:
    [egg_info]
    tag_build = dev
  3. We change it to:
    [egg_info]
    tag_build =
  4. Now we can generate a ‘release’ version and copy it to some download page.
    $ python setup.py sdist bdist_egg
    $ ls dist/
    Speaker-0.1-py2.5.egg  Speaker-0.1.tar.gz
  5. Back on the main branch, we prepare to work on the next version by
    changing the version number in setup.py to 0.2.
  6. Our main branch releases are now ‘0.2dev’:
    $ python setup.py sdist bdist_egg
    $ ls dist/
    Speaker-0.2dev-py2.5.egg  Speaker-0.2dev.tar.gz

Post releases

So, we have released v0.1 of Speaker. However, there is a bug:
there is no README.txt! This bug has just been fixed on the
trunk. The trunk is not going to be stable until the next
release, which is a month away, and we have to release
a bugfix NOW!

Steps:

  1. We check out the release branch;
  2. We cherry-pick the desired commit from trunk to our release branch;
  3. We edit setup.cfg in the release branch and add a post-release tag:
    [egg_info]
    tag_build = -1
    tag_svn_revision = false
  4. We make a new release:
    $ python setup.py sdist bdist_egg
    $ ls dist/
    Speaker-0.1_1-py2.5.egg  Speaker-0.1-1.tar.gz
  5. And tag the new release (we always tag outgoing stuff)

And now we have a bugfix update to our 0.1 release. If we upload
it to the distribution dir, easy_install will pick it in
preference to the older 0.1 release.
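
The filenames in the listings above follow a simple pattern, which the sketch below reproduces. The function release_filenames is a hypothetical helper written for this illustration; it only mirrors the names shown in this document (for a pure-Python project built with Python 2.5) and is not taken from setuptools. Newer setuptools releases may normalise version strings differently.

```python
def release_filenames(name, version, tag_build="", pyver="2.5"):
    """Return (egg filename, sdist filename) for a pure-Python project."""
    full_version = version + tag_build
    # Egg filenames use '-' as a field separator, so a '-' inside the
    # version is written as '_' there; the sdist name keeps it.
    egg = "%s-%s-py%s.egg" % (name, full_version.replace("-", "_"), pyver)
    sdist = "%s-%s.tar.gz" % (name, full_version)
    return egg, sdist

# Development build, official release, and first post-release of 0.1.
for tag in ("dev", "", "-1"):
    print(release_filenames("Speaker", "0.1", tag))
```

So the only thing that changes between a development build, a release, and a post-release is the value of tag_build in setup.cfg.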

Defining dependencies

We are probably going to be using a bunch of libraries when
developing our project. We can define a dependency requirement
like this:

# setup.py
setup(...
     install_requires=["SQLAlchemy"],
     ...
     )

Now, when we do a python setup.py develop, or a user installs
our egg, easy_install will find the latest version of SQLAlchemy
from PyPI, download it, and install it.

Other projects can depend on our Speaker project in the same way.

Restricting dependency versions

Let’s say we know that SQLAlchemy has a stable 0.4 branch, and 0.5 beta in
progress. We don’t want 0.5 beta versions. How do we tell
setuptools to install the highest 0.4 version, but not any 0.5
version?

First, we have to find out what the smallest version on the 0.5
branch is. Then we need to change our requirement to:

SQLAlchemy < 0.5.0a

Where “0.5.0a” refers to the first version ever of the 0.5
branch. (This version does not have to exist; it should just be
smaller than the smallest version you want to ignore.) One needs
to be quite careful about choosing the right version number.
Saying only 0.5, or only 0.5.0, would not have worked, because 0.5.0rc1
is “smaller” than 0.5.0 or 0.5!

Let’s say we know that our stuff works with 0.4.3 and higher
versions of SQLAlchemy, but does not work with 0.4.2 or below.
Our requirement can look like this:

SQLAlchemy >= 0.4.3, < 0.5.0a
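
The effect of the upper bound can be checked with a toy model of the ordering. The code below is a deliberately simplified sketch that only understands plain releases and a, b, c/rc pre-release tags; version_key and satisfies are invented helpers, not the setuptools parser, and the candidate version numbers are made up for the example.

```python
import re

# Rank of each tag relative to a plain release ("" means no tag).
_TAG_RANK = {"a": 1, "b": 2, "c": 3, "rc": 3, "": 4}
_VERSION_RE = re.compile(r"^(\d+(?:\.\d+)*)(?:(a|b|rc|c)(\d*))?$")

def version_key(text):
    """Return a sortable key for strings such as '0.4.3' or '0.5.0rc1'."""
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError("unsupported version string: %r" % text)
    release = [int(part) for part in match.group(1).split(".")]
    while len(release) > 1 and release[-1] == 0:
        release.pop()  # treat 0.5 and 0.5.0 as the same release
    tag = match.group(2) or ""
    number = int(match.group(3) or 0)
    return (tuple(release), _TAG_RANK[tag], number)

def satisfies(candidate, minimum, below):
    """True if minimum <= candidate < below under version_key ordering."""
    return version_key(minimum) <= version_key(candidate) < version_key(below)

candidates = ["0.4.2", "0.4.3", "0.4.8", "0.5.0rc1", "0.5.0"]
print([v for v in candidates if satisfies(v, "0.4.3", "0.5.0a")])
print([v for v in candidates if satisfies(v, "0.4.3", "0.5")])
```

With the bound 0.5.0a only the 0.4 versions pass; with the bound 0.5 the release candidate 0.5.0rc1 slips through, which is the pitfall described above.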

Also, note that:

  • If a version of SQLAlchemy is installed system wide that
    satisfies the dependency version requirement, easy_install
    will use that version instead of downloading and installing
    one. Hence, we should avoid polluting the system python
    site-packages.
  • Easy_install will not upgrade your dependency automatically
    when you run it later, even if a newer version of the
    dependency is available, as long as the installed version
    satisfies your dependency version requirement.

Dependencies not on PyPI

What if the library we need is not on PyPI? What if it is
actually developed and packaged by another group in our company,
and available only from an internal release page?

We can get dependencies like these by telling setuptools to look
at a particular URL.

# setup.py
setup(...
      install_requires=[
        "SQLAlchemy >=0.4.3, <0.5.0a", # On PyPI
        "hello", # An Affle package
      ],
      dependency_links=[
        "file:///home/parijat/Python/", # find Affle packages here
        ],
     )

Now setuptools will look first in /home/parijat/Python. If
hello and SQLAlchemy eggs are there, it will use them. If either
egg is not found there, then it will go to PyPI.

More than one dependency link can be specified.

Developing binary eggs (C extensions)

Now we come to the interesting bit: binary packages. We can use
the Python C API to write extension modules, and let distutils
build them. But there are easier ways.

Version 0.2: Pyrex extensions

Pyrex is “a Language for Writing Python Extension Modules”. The
greatest benefit is that Pyrex makes it easy to convert types
between Python and C.

Writing extensions in Pyrex

We’ll demonstrate this with a new speaker class, and we shall choose Gendibal
for this task. Here is the interface to Gendibal:

# tests/gendibal_test.py
import unittest

from speaker import gendibal

class GendibalTest(unittest.TestCase):
    def test_greeting(self):
        g = gendibal.Gendibal()
        self.assertEqual(g.greeting(), "Hello 29")

Gendibal is a mathematical speaker, and happens to like the 10th
prime number a lot. Now we only have to define the Gendibal class:

# speaker/gendibal.pyx
...
class Gendibal(object):
    def greeting(self):
        return "Hello %s" % primes(10)[-1]

def GendibalMain():
    g = Gendibal()
    print g.greeting()
    return 0

and add a new entry point:

# setup.py
setup(...
      entry_points={
        'console_scripts': [
            'rundog = speaker.dog:DogMain',
            'rungendibal = speaker.gendibal:GendibalMain',
            ],
        },
     ...
     )

Note:

  • the definition of this class is in a file with the .pyx suffix,
    indicating that this is a Pyrex file, not a Python file;
  • the definition is a Python definition. Pyrex code can contain normal
    Python code.

We have not defined the primes function yet. Here is the definition of the
primes function, in the same .pyx file:

# speaker/gendibal.pyx
...
def primes(int kmax):
  cdef int n, k, i

  cdef int p[1000]
  result = []
  if kmax > 1000:
    kmax = 1000
  k = 0
  n = 2
  while k < kmax:
    i = 0
    while i < k and n % p[i] <> 0:
      i = i + 1
    if i == k:
      p[k] = n
      k = k + 1
      result.append(n)
    n = n + 1
  return result

This is Pyrex code. It looks very much like Python, with some type annotations.
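
For comparison, and to check the value the test expects, here is a plain-Python rendering of the same trial-division idea. It is written for this comparison only; it drops the C type declarations and the fixed-size array, but keeps the cap of 1000 primes.

```python
def primes(kmax):
    """Return the first kmax primes (at most 1000) by trial division."""
    kmax = min(kmax, 1000)
    found = []
    n = 2
    while len(found) < kmax:
        # n is prime if no smaller prime divides it.
        if all(n % p != 0 for p in found):
            found.append(n)
        n += 1
    return found

print(primes(10))
```

The last element of primes(10) is 29, which is where Gendibal's “Hello 29” comes from.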

Building Pyrex extensions

Setuptools can build Pyrex files “out of the box”, as long as the
Pyrex compiler is somewhere on the path. Let’s get Pyrex:

$ easy_install pyrex

We need to tell setuptools about our extension, though:

# setup.py
from setuptools import setup, find_packages, Extension
...
setup(...
      ext_modules=[
        Extension('speaker.gendibal', ['speaker/gendibal.pyx']),
        ],
     ...
     )

And that’s it! We can build the egg:

$ python setup.py bdist_egg
...
running build_ext
pyrexc speaker/gendibal.pyx --> speaker/gendibal.c
...
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.5 -c speaker/gendibal.c -o build/temp.linux-i686-2.5/speaker/gendibal.o
gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions
build/temp.linux-i686-2.5/speaker/gendibal.o -o
build/lib.linux-i686-2.5/speaker/gendibal.so
...
creating stub loader for speaker/gendibal.so
byte-compiling build/bdist.linux-i686/egg/speaker/gendibal.py to
gendibal.pyc
...

Note:

  • The Pyrex code gendibal.pyx was converted to C code gendibal.c by the
    Pyrex compiler;
  • The extension gendibal.so was compiled;
  • A wrapper python script gendibal.py to load the extension was automagically
    created for us.

Now there are two tests:

$ python setup.py test
...
test_greeting (tests.dog_test.DogTest) ... ok
test_greeting (tests.gendibal_test.GendibalTest) ... ok
...

Wasn’t it handy we are using ‘nose’? Our new test is discovered
and run for us without having to add it anywhere.

We can run our new ‘main’ script:

$ rungendibal
Hello 29

Pyrex can be used not only to convert Python-like code to C; it
can also help us interface to existing C code and libraries.

Version 0.3: Boost.Python extensions

What about libraries/code in C++? Pyrex does not help there, and
wrapping C++ code with the Python C API can be tricky.
Boost.Python to the rescue.

Writing extensions in Boost.Python

Let’s say we have the following C++ library:

// speaker/bjarne.cpp
#include <string>
#include <iostream>

namespace { // Avoid cluttering the global namespace
    class BjarneCPP {
    public:
        std::string greet() const { return "Hello, C++ World!"; }
    };

    int BjarneCPPMain() {
        BjarneCPP b = BjarneCPP();
        std::cout << b.greet() << std::endl;
        return 0; // becomes the exit status of the script
    }
}

As can be seen, there is a class named BjarneCPP with an
interface very similar to our speaker interface, except that it
has a greet method, instead of our usual greeting method.
There is also a BjarneCPPMain function, that looks like a good
candidate to be a main function in our application. This looks like a useful
library. How do we access it in Python?

We can wrap it in Python like this:

// speaker/bjarne.cpp
...
#include <boost/python.hpp>
using namespace boost::python;

BOOST_PYTHON_MODULE(bjarne) {
    class_<BjarneCPP>("Bjarne", init<>())
        .def("greeting", &BjarneCPP::greet)
        ;
    def("BjarneMain", BjarneCPPMain, "The main function for the 'bjarne' module");
}

(For convenience and brevity, we’ve added our code in the same file.
Realistically, the code to be wrapped would be in a library, and
we would link against that library at build time.)

As usual, we do not forget to write our tests:

# tests/bjarne_test.py
import unittest
from speaker import bjarne

class BjarneTest(unittest.TestCase):
    def test_greeting(self):
        b = bjarne.Bjarne()
        self.assertEqual(b.greeting(), "Hello, C++ World!")

and define an entry point:

# setup.py
...
setup(...
      entry_points={
        'console_scripts': [
            ...
            'runbjarne = speaker.bjarne:BjarneMain',
            ],
        },
      ...
      )

Building Boost.Python extensions

Now we need to tell setuptools about the new extension:

# setup.py
...
setup(...
     ext_modules=[
     ...
        Extension('speaker.bjarne',
                  ['speaker/bjarne.cpp'],
                  libraries=['boost_python']),
        ],
     ...
     )

And that’s it. We can create an egg:

$ python setup.py bdist_egg
...
building 'speaker.bjarne' extension
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.5 -c speaker/bjarne.cpp -o build/temp.linux-i686-2.5/speaker/bjarne.o
...
g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions build/temp.linux-i686-2.5/speaker/bjarne.o -lboost_python -o build/lib.linux-i686-2.5/speaker/bjarne.so
...
creating stub loader for speaker/bjarne.so
...
byte-compiling build/bdist.linux-i686/egg/speaker/bjarne.py to bjarne.pyc
...

Again, setuptools has compiled our extension module, linked it
against the libraries specified (boost_python), and generated a
wrapper (‘bjarne.py’) for us.

We can run our tests, and our new test will appear:

$ python setup.py test
...
test_greeting (tests.bjarne_test.BjarneTest) ... ok
test_greeting (tests.dog_test.DogTest) ... ok
test_greeting (tests.gendibal_test.GendibalTest) ... ok
...

And our new entry point works too:

$ runbjarne
Hello, C++ World!

easy_install annoyances

  • easy_install does not upgrade dependencies when upgrading a
    package;
  • easy_install does not, by itself, have a way of specifying
    exact versions of all dependencies of a package;
  • it is possible to force easy_install to not download anything
    from the Internet but to install everything from a given
    location; this can be used to mitigate unexpected versions of
    dependencies being installed;
  • easy_install, by itself, will install packages in the
    system-wide python site-packages directory; this can be a big
    annoyance. It is highly recommended to use virtualenv.

Credits

Egg jargon/terminology taken from: http://grok.zope.org/documentation/tutorial/introduction-to-zc.buildout.


Comments

  • Parag Shah  On September 8, 2009 at 1:24 pm

    Thanks, I found this tutorial very helpful

  • kambas  On July 8, 2010 at 7:45 am

    Million thanks for the good and helpful illustration.

  • Sergey Vasilyev  On July 23, 2011 at 10:36 am

    Thank you very much. Now I have an idea how branching and version releasing should go in git to satisfy my needs.

    So, we have master branch (“trunk” in svn), where we make a development. There we hold a version.py module with only one __version__ variable defined. Actually, this is a generated file, which contains the output of `git describe`.

    When we make a release, we first tag this master branch with “0.1” tag (or, maybe, “0.1rc1” or whatever else). And then we create a branch “0.1rc1”, disable dev mode and re-generate version.py, and then run sdist + bdist commands.

    So we have the version tag visible on both branches, version.py generated and updated with different modes (dev on “master” vs release on “0.1rc1”), and a branch ready for selective post-release patches.

    Good, goooood. I’ll try it now :-)

  • Alesson Zaire  On October 25, 2011 at 10:44 am

    This tutorial was very useful. Thank you!

  • mehmetalianil  On October 29, 2011 at 4:42 am

    That was as concise as this could get. Thanks.

  • Phyl Crandall  On April 29, 2012 at 1:55 am

    What a great help this was to me. Thank you, thank you.
