CS and the City Sean Lynch

Gripe: XML in Python

I hadn’t even finished writing my post announcing my new love of Python when I stumbled into one of its skeleton-filled closets: XML.

The Python core libraries include six different methods for parsing and creating XML, none of which feels particularly Pythonic (here I am, three weeks into developing with Python, and already I’m calling out core libraries as not being Python-y enough). I missed the low-overhead approaches I had used in other languages. For parsing XML, PHP’s simplexml is hard to beat, and for building it, Ruby’s XML Builder wins hands down. Off I went, hunting for Python ports.

Warning: The following is a tangent.

This may be an unfair statement, but I get the impression that there’s a slight “Not built here” bias in the Python world. Python has a substantial amount of best-of-breed functionality, both core and third-party, but my initial impression is that the community is a little reluctant to adopt solutions championed by other languages. Example: Where’s the Python equivalent to CPAN or gems?

Here’s my point: I found Python ports, but they lacked the qualities I loved about Python: maturity and active development.

First on my list was PHP’s simplexml. It lets the developer reach child elements, attributes and text through a combination of object properties and array-style indexing. To get similar functionality in Python, I found handyxml. Handyxml allows equally brief tree traversal and iteration over repeated elements. Unfortunately, it hasn’t been updated since early 2004, and a lot of its dependencies have moved or are gone completely. As such, it required some modifications just to get it into a functional state. Not ideal.
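To make the simplexml style concrete, here is a minimal sketch of that kind of access layered on the standard library’s ElementTree. The `Node` class and the sample document are made up for illustration; this is not handyxml’s API, just one way the idea could look.

```python
import xml.etree.ElementTree as ET


class Node:
    """Hypothetical wrapper: child elements as attributes, XML attributes by key."""

    def __init__(self, element):
        self._element = element

    def __getattr__(self, tag):
        # Only reached when normal attribute lookup fails.
        if tag.startswith("_"):
            raise AttributeError(tag)
        children = [Node(child) for child in self._element.findall(tag)]
        if not children:
            raise AttributeError(tag)
        # Always a list, so repeated elements can be iterated or indexed.
        return children

    def __getitem__(self, name):
        # node["id"] reads the XML attribute named "id".
        return self._element.attrib[name]

    @property
    def text(self):
        return self._element.text


SAMPLE = (
    '<library>'
    '<book id="1"><title>Dune</title></book>'
    '<book id="2"><title>Emma</title></book>'
    '</library>'
)

doc = Node(ET.fromstring(SAMPLE))
titles = [book.title[0].text for book in doc.book]
ids = [book["id"] for book in doc.book]
```

One limitation of the sketch: a child element literally named `text` would be shadowed by the `text` property, which a real library would have to handle.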

The other functionality I missed was XML Builder in Ruby. XML Builder takes full advantage of Ruby’s blocks to nest XML element creation in a way that makes the structure of the resulting document blindingly obvious in the code. This is in stark contrast to the Java-esque series of createNode and appendNode calls that Python (and Java and Objective-C) love. I managed to dig up a recent port of XML Builder for Python by Jonas Galvez. He took advantage of the upcoming ‘with’ statement in Python 2.6 to achieve the same effect. Though it had some problems handling Unicode characters (remind me to submit a patch on GitHub) and the documentation is minimal, I was able to get it up and running very quickly. Better.
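The contrast is easier to see side by side. Below is a rough sketch that builds the same tiny document twice: once with the standard library’s DOM calls (`createElement` and `appendChild` in `xml.dom.minidom`), and once with a `with`-based builder. The `Builder` class is a hypothetical stand-in written for this comparison on top of ElementTree; it is not Jonas Galvez’s code or its API.

```python
import xml.etree.ElementTree as ET
from contextlib import contextmanager
from xml.dom.minidom import Document


def build_with_dom():
    # DOM style: create each node, then attach it by hand.
    doc = Document()
    feed = doc.createElement("feed")
    doc.appendChild(feed)
    entry = doc.createElement("entry")
    feed.appendChild(entry)
    title = doc.createElement("title")
    title.appendChild(doc.createTextNode("Hello"))
    entry.appendChild(title)
    return feed.toxml()


class Builder:
    """Hypothetical builder: nesting of 'with' blocks mirrors nesting of elements."""

    def __init__(self):
        self.root = None
        self._stack = []  # currently open elements, innermost last

    @contextmanager
    def element(self, tag, text=None, **attrs):
        if self._stack:
            node = ET.SubElement(self._stack[-1], tag, attrs)
        else:
            node = ET.Element(tag, attrs)
            self.root = node
        node.text = text
        self._stack.append(node)
        try:
            yield node
        finally:
            self._stack.pop()

    def tostring(self):
        return ET.tostring(self.root, encoding="unicode")


def build_with_builder():
    xml = Builder()
    with xml.element("feed"):
        with xml.element("entry"):
            with xml.element("title", text="Hello"):
                pass
    return xml.tostring()
```

Both functions produce the same markup, but only in the second does the indentation of the code follow the shape of the document.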

I know from my digging in recent weeks that there’s been some talk about refactoring Python’s urllib/urllib2 code for Python 3 to simplify the module and remove duplication. I sincerely hope the XML libraries go under the same knife, and that the solutions from Ruby and PHP are considered for a graft.