Friday, August 24, 2007

libxml++ vs xerces C++

When I was reading "API: Design Matters" I recalled one example of good API vs bad API. Actually my example is more about good API documentation vs bad API documentation but I suspect there is a correlation between these two things. It is definitely hard to write good documentation if your API sucks.

So my story is that I had a task to read XML data in C++ application. XML data was small and performance of this part of the application was not critical so it looked like the simplest way to read this data was to load DOM tree for XML document and just use DOM API and maybe couple simple XPath queries. It was the first time I needed to do this in C++; I had no previous experience with any XML C++ libraries. So, I do google search (or maybe it was apt-cache search - I don't remember) and the first thing I find is xerces C++. Quote from project's website:

Xerces-C++ makes it easy to give your application the ability to read and write XML data.
Sounds good, just what I need. So I dig documentation and find it to be completely unhelpful as it is just Doxygen autogenerated undocumentation. Fine, I can read code, let's check sample code then. I open sample code and I find that the shortest example how to parse XML into DOM tree and how to access data in the tree (DOMCount) consists of two files which are more then 600 lines long in total. Huh? I don't want to read 15 pages of code just to learn how to do two simple actions: parse XML into DOM and get data from DOM. Other examples are even more bad. Several files, several classes just to read and print freaking XML (DOMPrint). You've got to be kidding me. It cannot be that hard.

I don't really want to waste hours to learn API I'm unlikely to use ever again. After all I don't write much C++ code and I definitely don't write much C++ code that needs XML. So time to search further. Next hit is libxml++. It is C++ wrapper over popular C XML library libxml. This time there is actually some documentation that does try to explain how to use the library. And this documentation contains an example which while being just about 150 lines manages to demonstrate most of library's DOM API.

End result: I finish my code to read my XML data in next 30 minutes using libxml++. It is simple, short and it works.

So what's wrong with xerces C++? There is no introduction level documentation at all. Examples look too complex for the problem they are supposed to show solution for. And the reason for this is that API is just bad: it requires writing unnecessary complex client code.

Update: boris corrected me about lack of introduction level documentation in a comment to this blog post. Turned out I missed it. As a weak excuse I'll blame bad navigation on the project's site :)

Thursday, August 16, 2007

4 silly mistakes in use of MySQL indexes

1. Not learning how to use EXPLAIN SELECT

I'm really surprised how many developers who use MySQL all the time and who do not know or understand how to use EXPLAIN SELECT. I've seen several times developers proposing serious architectural changes to their code to minimize, partition or cache data in their database when the actual solution was to spend 30 minutes thinking over result of EXPLAIN SELECT and adding or changing couple indexes.

2. Wasting space with redundant indexes

If you have multicolumn index it means you don't need a separate index which is subset of the first index. It is easier to explain with an example:

col1 INT,
col2 INT,
PRIMARY (col1, col2),
KEY (col1)
Index on col1 is redundant as any search on col1 can use primary index. This just wastes disk space and might make some queries which change this table a bit slower.

There is one but! See below..

3. Incorrect order of columns in index

Order of columns in multicolumn index is important. From MySQL documentation:
MySQL cannot use an index if the columns do not form a leftmost prefix of the index.
col1 INT,
col2 INT,
col3 INT,
KEY (col1, col2)
MySQL wont use any indexes for query like
SELECT * FROM table2 WHERE col2=123
EXPLAIN SELECT shows this instantly. If you want to run this query faster either change order of columns in the index or add another one.

4. Not using multicolumn indexes when you need to

MySQL can use only one index per table in a time so if you query by several columns in the table you may need to add multicolumn index. Example:
col1 INT,
col2 INT,
col3 INT,
KEY (col1)
Query like
SELECT * FROM table2 WHERE col1=123 AND col2=456
would use the index on col1 to reduce number of rows to check but MySQL can do much better if you add multicolumn index which covers both col1 and col2. The effect of adding such index is very easy to see with EXPLAIN SELECT.