CS and the City

Gripe: XML in Python

Sean Lynch | July 18, 2008

I hadn’t even finished writing my post announcing my new love of Python when I stumbled into one of its skeleton-filled closets: XML.

The Python core libraries include six different methods for parsing and creating XML, none of which feels particularly Pythonic (here I am, three weeks into developing with Python, and already I’m calling out core libraries as not being Python-y enough). I missed the low-overhead approaches I had used in other languages. For parsing XML, PHP’s simplexml is hard to beat, and for building it, Ruby’s XML Builder wins hands down. Off I went, hunting for Python ports.

Warning: The following is a tangent.

This may be an unfair statement, but I get the impression that there’s a slight “Not built here” bias in the Python world. Python has a substantial amount of best-of-breed functionality, both core and third-party, but my initial impression is that the community is a little reluctant to adopt solutions championed by other languages. Example: Where’s the Python equivalent to CPAN or gems?

Here’s my point: I found Python ports, but they lacked the qualities I loved about Python: maturity and active development.

First on my list was simplexml for PHP. It lets the developer reach child elements, attributes and text through a combination of property-style and list-style access. To get similar functionality in Python, I found handyxml. Handyxml allows equally brief tree traversal and iteration over repeated elements. Unfortunately, it hasn’t been updated since early 2004, and a lot of its dependencies have moved or are gone completely. As such, it required some modifications just to get it into a functional state. Not ideal.
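To make the simplexml style concrete, here is a minimal sketch of what that kind of access could look like in Python. It is not handyxml's API or implementation; the `Node` class and `parse` helper are hypothetical names of my own, layered over the standard library's `xml.etree.ElementTree`, and the sample document is made up.

```python
import xml.etree.ElementTree as ET


class Node(object):
    """Hypothetical wrapper: child tags as attributes, XML attributes by key."""

    def __init__(self, elements):
        # One or more sibling elements that share the same tag.
        self._elements = list(elements)

    def __getattr__(self, tag):
        # Only called when normal lookup fails; treat the name as a child tag.
        if tag.startswith("_"):
            raise AttributeError(tag)
        found = self._elements[0].findall(tag)
        if not found:
            raise AttributeError(tag)
        return Node(found)

    def __getitem__(self, key):
        # Integer: pick one sibling. String: read an XML attribute.
        if isinstance(key, int):
            return Node([self._elements[key]])
        return self._elements[0].attrib[key]

    def __iter__(self):
        return (Node([element]) for element in self._elements)

    def __len__(self):
        return len(self._elements)

    def __str__(self):
        return (self._elements[0].text or "").strip()


def parse(xml_text):
    """Hypothetical entry point: wrap the root element of an XML string."""
    return Node([ET.fromstring(xml_text)])


doc = parse(
    '<library>'
    '<book id="1"><title>Dive In</title></book>'
    '<book id="2"><title>Cookbook</title></book>'
    '</library>'
)
titles = [str(book.title) for book in doc.book]  # iterate repeated elements
first_id = doc.book[0]["id"]                     # attribute of the first book
```

The point of the sketch is the last two lines: traversal reads like the document itself, with no explicit find or getElementsByTagName calls in sight.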

The other functionality I missed was XML Builder in Ruby. XML Builder takes full advantage of Ruby’s blocks to nest XML element creation, which makes the structure of the resulting document blindingly obvious in the code. This is in stark contrast to the Java-esque series of createNode and appendNode calls that Python (and Java and Objective-C) love. I managed to dig up a recent port of XML Builder for Python by Jonas Galvez. He took advantage of the upcoming ‘with’ statement in Python 2.6 to achieve the same effect. Though it had some problems handling Unicode characters (remind me to submit a patch on GitHub) and the documentation is minimal, I was able to get it up and running very quickly. Better.
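For a feel of why the ‘with’ statement suits this, here is a rough sketch of a builder whose nesting of `with` blocks mirrors the nesting of elements. This is not Jonas Galvez's code or API; the `Builder` class and its `element`, `leaf` and `tostring` methods are hypothetical names I made up, built on `xml.etree.ElementTree` and `contextlib.contextmanager`. It assumes Python 3 (for `encoding="unicode"`).

```python
import xml.etree.ElementTree as ET
from contextlib import contextmanager


class Builder(object):
    """Hypothetical builder: indentation of the code follows the XML tree."""

    def __init__(self):
        self._root = None
        self._stack = []  # currently open elements, innermost last

    @contextmanager
    def element(self, tag, **attrs):
        if self._stack:
            # Nested block: attach to the innermost open element.
            node = ET.SubElement(self._stack[-1], tag, attrs)
        else:
            # Outermost block: this element becomes the document root.
            node = ET.Element(tag, attrs)
            self._root = node
        self._stack.append(node)
        try:
            yield node
        finally:
            self._stack.pop()  # leaving the block closes the element

    def leaf(self, tag, text, **attrs):
        # Convenience for an element that holds only text.
        with self.element(tag, **attrs) as node:
            node.text = text

    def tostring(self):
        # encoding="unicode" returns a str (Python 3 assumption).
        return ET.tostring(self._root, encoding="unicode")


doc = Builder()
with doc.element("feed"):
    doc.leaf("title", "CS and the City")
    with doc.element("entry", id="1"):
        doc.leaf("title", "Gripe: XML in Python")
result = doc.tostring()
```

Compare the bottom five lines with the equivalent create-then-append sequence: the shape of the document is visible at a glance, which is exactly what I missed from Ruby.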

I know from my digging in recent weeks that there’s been some talk about refactoring Python’s urllib/urllib2 code for Python 3 to simplify the modules and remove duplication. I sincerely hope the XML libraries fall under the same knife, and that the solutions from Ruby and PHP are considered for a graft.

Categories
Python
Tags
handyxml, modules, python, simplexml, xml, XML Builder

5 responses

Paddy3118 | July 18, 2008

Have you tried elementtree?

http://docs.python.org/lib/module-xml.etree.ElementTree.html

- Paddy.

Brad | July 18, 2008

Aye, I’ve run into this issue a few times while writing one-time-use XML pull scripts… I’ve always ended up locking my door and sheepishly writing an incredibly Q&D sgmllib.SGMLParser while nobody’s looking.

Note, though, that Python’s import/export of JSON is so incredibly beautiful that it makes me tear up just thinking about it.

Oolis | July 20, 2008

“Example: Where’s the Python equivalent to CPAN or gems?”

The cheese shop/easy_install?

Pete | July 20, 2008

I must agree with the ElementTree comment. It is probably the most powerful, the fastest, and the most Pythonic option.

Unfortunately, it becomes frustrating to use when it comes to XML namespaces. You can find little corner-case helpers, but to read, parse, and write XML with namespaces you are better off going with anything else (weep).

Nick Hofstede | July 20, 2008

You might also want to look at lxml over at http://codespeak.net/lxml/ which looks a lot like ElementTree, but has more complete XML support.
