SQLAlchemy: one classes – two tables

SQLAlchemy is a wonderful ORM for Python. While it allows the normal “class per table” semantics, one can do some more interesting stuff with it.

We often start with a table for some stuff.  For e.g., we might start off our app with a User class mapping to a user table, which has columns for an id, name and an address. For illustration, the list of columns is small. In a real application, this table would probably contain a lot more columns.

Here is some boilerplate code that initiates our database connection, etc.:

from sqlalchemy import *
from sqlalchemy import sql
from sqlalchemy.orm import mapper, sessionmaker, create_session

dburl = "mysql://testuser:testpasswd@localhost/test"
engine = create_engine(dburl)
meta = MetaData(bind=engine)

Let’s define a table and a class mapping to the table.

t_users = Table(
        'users', meta,
        Column('id', Integer, primary_key=True),
        Column('name', String(40), nullable=False),
        Column('address', String(200))
        )
class User(object):
    def __init__(self, name, address=None):
        self.name = name
        self.address = address
    def __repr__(self):
        return "<User(%s,%s,%s)>" % (self.id, self.name, self.address)
mapper(User, t_users)

To create the database, we might do something like the below, or roll out own database schema management system:

meta.create_all()

We store and retrieve User objects from the database like this:

# create a session
session = create_session(bind=engine)
# for testing only... ensure table is empty to start with
session.execute(t_users.delete())
# create and storea record
u = User('Beowulf', 'Denmark')
session.save(u)
session.flush()

# retrieve records from the db
session.clear()
u = session.query(User).filter_by(name='Beowulf').one()
print u

The output of the above snippet would be:

<User(2,Beowulf,Denmark)>

It might make sense in the beginning to put everything related to a user in one table. After a while, as the application grows, one starts seeing some patterns. Imagine that the address field above was very large (or imagine that there was a ‘photo’ field containing the user’s photograph). However, the address field is used only in one corner of the application, and only the name is used in the vast majority of the application code. What’s more, the large binary/string field is slowing down other queries. One of the ways to solve this problem is to split the user table into two:

t_users = Table(
    'users', meta,
    Column('id', Integer, primary_key=True),
    Column('name', String(40), nullable=False),
    )
t_addresses = Table(
    'addresses', meta,
    Column('user_id', Integer,
           ForeignKey('users.id'), primary_key=True),
    Column('address', String(255)),
    )

meta.create_all()

Now, whereever we are interested in only the user’s name, we’d use a User class. Where-ever we are interested in the address, we’d use an address class.

class User(object):
    pass
class Address(object):
    pass
mapper(User, t_users)
mapper(Address, t_addresses, properties={
        'user': relation(User, backref='address',
                         primaryjoin=t_users.c.id==t_addresses.c.user_id)
        })

session = create_session(bind=engine, transactional=False)
session.execute(t_addresses.delete())
session.execute(t_users.delete())

u1 = User(); u1.name = "Hagar"
session.save(u1)
a1 = Address() ; a1.user = u1 ; a1.address = "Denmark"
session.save(a1)
session.flush()

Now, we are presumably saving a lot of time and memory on our queries. But, what if we want the equivalent of the original User object, one which refers to both the name and address? Well, we can define a class mapped to a join and use it:

usersaddresses = sql.join(t_users, t_addresses,
                           t_users.c.id == t_addresses.c.user_id)
class UserAddress(object):
    def __repr__(self):
        return "<FullUser(%s,%s,%s)" % (self.id, self.name, self.address)
mapper(UserAddress, usersaddresses, properties={
        'id': [t_users.c.id, t_addresses.c.user_id],
        })
f = session.query(UserAddress).filter_by(name='Hagar').one()
print f

Note the “id” column in the “properties”: we’ve told SQLAlchemy that the “id” attribute of our UserAddress class actually is the same as the “users.id” or “addresses.user_id” columns, which are always the same. Thus SQLAlchemy will NOT produce redundant “id” and “user_id” attributes in our class.

We can even use this class to change and save attributes, and the attributes will go to their respective table!

f.name = "Hagar the horrible"
f.address = "Copenhagen"
session.flush()

for (id, name, user_id, address) in session.execute(usersaddresses.select()):
    print id, name, user_id, address

We get:

11 Hagar the horrible 11 Copenhagen

Advertisements
Post a comment or leave a trackback: Trackback URL.

Comments

  • Jorge Hojyo  On February 11, 2010 at 2:03 am

    Hi,
    You mention a field containing a photograph. How does sqlalchemy do it?
    I’m searching a generic way that won’t crash if I change from SQLite to Postgresql.
    Thanks in advance, Jorge.

  • parijatmishra  On February 11, 2010 at 10:55 am

    Jorge: I would use a Python “string” field to hold the photo raw data, and the corresponding table column would be defined as a BLOB column. SQLAlchemy would be able to map my string to the BLOB and back again.

    I have not tested this out though.

  • Thomas Norheim  On March 10, 2010 at 10:40 pm

    With the UserAddress solution as described above, every User which has no Address in the DB would not be returned when querying for UserAddress objects because of the join.

    Other than manually ensuring that the both objects is present in the database, is there some way to make sqlalchemy enforce this? Or is there some other way to show them both even if just one of them is present in the database?

  • parijatmishra  On March 11, 2010 at 12:19 am

    Thomas:

    The way I have set up the tables above, it is not possible for a record in ‘addresses’ table to exist without an owning record in the ‘users’ table because of the foreign key constraint. So, the problem boils down to: how to get UserAddress objects where there is a ‘users’ record, but no ‘addresses’ record for that ‘user_id’.

    The solution to your problem is a one liner: just change the line:

    usersaddresses = sql.join(t_users, t_addresses,

    to:

    usersaddresses = sql.outerjoin(t_users, t_addresses,

    The OUTER JOIN construct will fetch records from the “left” table even when there is no corresponding record in the “right” table.

    (If you had the situation where you could have addresses without users (perhaps because you are dealing with a legacy database with bad data), you should ask yourself if you really want to get hold of addresses without users within your application at all. You might be better off writing a data cleanup script that removes such addresses. But should you really need to have addresses who have no user_id (i.e., NULL user_id column values) or a bad user_id, I think the proper thing to do is to model them explicitly as a distinct class, and then you can fetch them as needed.)

    Thank you for your question, because it gave me an opportunity to dig around SQLAlchemy a bit more. We can see that SQLAlchemy is a rather complete and thought out ORM.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: