Friday, August 24, 2007

libxml++ vs xerces C++

While reading "API: Design Matters" I recalled an example of a good API vs a bad API. Actually, my example is more about good API documentation vs bad API documentation, but I suspect the two are correlated. It is definitely hard to write good documentation if your API sucks.

So my story is that I had a task to read XML data in a C++ application. The XML data was small and the performance of this part of the application was not critical, so the simplest approach looked to be loading a DOM tree for the XML document and using the DOM API, maybe with a couple of simple XPath queries. It was the first time I needed to do this in C++; I had no previous experience with any C++ XML libraries. So I do a Google search (or maybe it was apt-cache search, I don't remember) and the first thing I find is Xerces-C++. A quote from the project's website:

Xerces-C++ makes it easy to give your application the ability to read and write XML data.

Sounds good, just what I need. So I dig into the documentation and find it to be completely unhelpful, as it is just Doxygen autogenerated undocumentation. Fine, I can read code, so let's check the sample code then. I open the sample code and find that the shortest example of how to parse XML into a DOM tree and how to access data in the tree (DOMCount) consists of two files which are more than 600 lines long in total. Huh? I don't want to read 15 pages of code just to learn how to do two simple actions: parse XML into a DOM and get data from the DOM. The other examples are even worse. Several files, several classes just to read and print freaking XML (DOMPrint). You've got to be kidding me. It cannot be that hard.

I don't really want to waste hours learning an API I'm unlikely to ever use again. After all, I don't write much C++ code and I definitely don't write much C++ code that needs XML. So it's time to search further. The next hit is libxml++. It is a C++ wrapper over the popular C XML library libxml. This time there is actually some documentation that does try to explain how to use the library. And this documentation contains an example which, while being just about 150 lines long, manages to demonstrate most of the library's DOM API.

End result: I finish my code to read my XML data in the next 30 minutes using libxml++. It is simple, short and it works.

So what's wrong with Xerces-C++? There is no introduction-level documentation at all. The examples look too complex for the problems they are supposed to demonstrate solutions for. And the reason for this is that the API is just bad: it requires writing unnecessarily complex client code.

Update: Boris corrected me about the lack of introduction-level documentation in a comment on this blog post. It turned out I missed it. As a weak excuse I'll blame bad navigation on the project's site :)


Anonymous said...

I think you've missed the Xerces-C++ Programming Guide. It covers both DOM and SAX usage.

As for examples, I agree they are overly complex for a beginner. Though, to be fair, they are much closer to a real-world application in that they allow you to turn on/off support for namespaces, XML Schema validation, different input/output encodings, etc. These are common expert vs novice and serious application vs quick hack problems.

Ilya Martynov said...

Boris, thanks for the correction. I've updated my blog post.

Anonymous said...

Still doesn't plug and chug. Grr... how do I add the library correctly?

Rodger said...

Same as you: I totally do not understand Xerces... Thanks for your advice!

Anonymous said...

It's charming that the link to the "easy" Xerces documentation is now broken (both the "missed it" link and the "Xerces-C++ Programming Guide" link).

Anonymous said...

I just want to ask a question about Xerces-C++: does this parser produce any "cout"-like output or, to be more specific, any logging? If a log file is created, what is its default path?

Kevin said...

Normally I wouldn't comment on such an old blog post, but it's Feb. 2009 and the documentation for Xerces is still completely inadequate. If the developers think this is enough documentation for people to get Xerces working in their app, they are dreaming. And it's not that I'm an inexperienced programmer; I just don't have all damn day to play with their API and read their source. Especially not for something like XML. I'll be trying out libxml and seeing how that goes.

antred said...

Xerces is bloated and poorly designed, and the documentation is atrocious. I fully agree with the original post. By the way, I too ended up using libxml2 instead. It's not exactly modern C++ (actually it's not even modern C IMO) but it's lightweight, doesn't get in my way and (as you pointed out) actually has some useful documentation to go with it.

Unknown said...

I have been using Xerces-C for over two years in a project and very recently came into contact with libxml++. I am not migrating to that library, but it seems to make developing applications easier. I am sure that Xerces is more complete and more compliant with the W3C spec, but I think that libxml++ has the 30% that covers 90% of what you need.

Herwig said...

I fully agree with the initial post of this blog.

Initially my application used TinyXml. In a moment of mental aberration I decided that I needed to change that to something more "standard" and started to use xerces-c 2.8 (although everything I needed worked with TinyXml).
It took me hours of trial and error and googling around to get my simple DOM operations working the way they did before.

My next idiocy was the idea to update xerces-c from 2.8 to 3.1.1. It turned out that I would have to spend another several hours, because they simply changed almost everything completely.

Still thinking "I need to use a common standard", I considered using libxml2 and found this blog, which pointed me to libxml++.

And: Wow! I did not know that handling XML could be that simple. It took me just minutes to get my code running again: Easier, less code and more intuitive than ever...

I do not understand why xerces-c is almost completely undocumented. Their few samples are barely helpful. (DOM) parsing an XML document from a std::string is a real challenge and makes you produce miles of code. I don't like that...

Ilya: thank you!

Anonymous said...

I am in the situation of being forced to use Xerces-C++, since I need the extra features Boris describes, such as schema validation.

However: in 2011 their documentation is still inadequate and Boris's link currently 404s, but at least in some places there is some Doxygen documentation, which helps a little.

But to make matters worse, I need to use Xalan-C as well to transform the XML I receive into flat files. Its last release was in 2005 (the standard is from 1999, so there is no real need for a new release once it is fully working) and 99% of it has no documentation at all. The Doxygen output is just empty and only shows the function definitions.

IMO this is a poor performance for library developers.

Anonymous said...

I think the poor design of Xerces mostly has to do with the XML APIs, which were designed for Java. We should just stop using XML, but if there is no choice and we have to use it, Xerces is the only full implementation.