I often notice that developers don’t pay much attention to the XPath queries they use, and then some of them complain about XML performance. Of course, XML means overhead, but we’re not here to make it worse. For example, the convenient “//” abbreviation (short for /descendant-or-self::node()/) is frequently abused without realizing how much harm it can do in high-volume XML processing applications. Say we have an XML message with a simple structure like this:


<Root>
  <Inventory1>
    <item0>0</item0>
    <!-- item1 through item498 omitted -->
    <item499>499</item499>
    <Inventory2>
      <item500>500</item500>
      <!-- item501 through item998 omitted -->
      <item999>999</item999>
    </Inventory2>
  </Inventory1>
</Root>


(Yeah, it looks strange, but some pretty wild schemas turn up in enterprise integration scenarios.)


Suppose we want to fetch item999; the lazy query would be “//item999”. This scans the entire document tree at all levels in search of the desired node. A better option is to narrow the query down to “/Root/Inventory1/Inventory2//item999”, given assurance that this element will always be somewhere under Inventory2. Is it worth it? Well, on my desktop using .NET 2.0 with the test XML document above (1000 itemN elements), this yields an almost 3.5 times performance gain. Real-life gains can be smaller or greater depending on the schema, the number of elements, and their distribution. The bottom line: performance losses caused by inefficient XPath can be critical for high-throughput messaging solutions.
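
If you want to try the comparison yourself, here is a rough sketch in Python rather than .NET, so it is not the test I ran and the numbers will not match mine. It assumes the standard xml.etree.ElementTree module, which supports only a subset of XPath and evaluates paths relative to an element; “.//item999” and “./Inventory1/Inventory2//item999” therefore stand in for the two queries above. The build_document helper and the repetition count of 200 are made up for illustration.

```python
import timeit
import xml.etree.ElementTree as ET


def build_document(total=1000):
    """Build the sample message (hypothetical helper): the first half of the
    items sit directly under Inventory1, the rest under the nested Inventory2."""
    root = ET.Element("Root")
    inventory1 = ET.SubElement(root, "Inventory1")
    for n in range(total // 2):
        ET.SubElement(inventory1, f"item{n}").text = str(n)
    inventory2 = ET.SubElement(inventory1, "Inventory2")
    for n in range(total // 2, total):
        ET.SubElement(inventory2, f"item{n}").text = str(n)
    return root


root = build_document()

# ElementTree paths are relative to the element they are called on, so these
# approximate //item999 and /Root/Inventory1/Inventory2//item999.
LAZY = ".//item999"
NARROW = "./Inventory1/Inventory2//item999"

# Both queries should resolve to the same element.
lazy_hit = root.find(LAZY)
narrow_hit = root.find(NARROW)

# Time 200 repetitions of each query (arbitrary count, chosen for illustration).
lazy_seconds = timeit.timeit(lambda: root.find(LAZY), number=200)
narrow_seconds = timeit.timeit(lambda: root.find(NARROW), number=200)
print(f"lazy:   {lazy_seconds:.4f}s")
print(f"narrow: {narrow_seconds:.4f}s")
```

Treat the printed timings as indicative only: the ratio depends on the XPath engine as much as on the document, so measure with the engine and the messages you actually use in production.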