For those of you who haven’t noticed yet: Dare Obasanjo has published a refined version of the work David Orchard did about a year ago, explaining best practices for designing extensible XML schemas.  Writing schemas that are both forward and backward compatible is not easy, believe me.  Even an experienced schema author can be tripped up by some of the requirements the XML Schema spec makes you comply with.

One of the most important things to be aware of when designing XML schemas is the requirement that content models be deterministic.  Some people like to refer to this as the “Unique Particle Attribution” constraint.  For the normative definition of this constraint, I’d refer you to the W3C.  No one can explain it as fuzzily as they do!!  MSDN certainly does a better job 🙂
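To make this concrete, here is a minimal sketch of a content model that violates the constraint (the element names are made up, purely for illustration):

```xml
<xs:element name="order">
  <xs:complexType>
    <xs:choice>
      <!-- Ambiguous: both branches start with <product>, so a validator
           that has just seen a <product> element cannot tell, from that
           one symbol alone, which branch of the choice it is matching. -->
      <xs:sequence>
        <xs:element name="product"/>
        <xs:element name="quantity"/>
      </xs:sequence>
      <xs:sequence>
        <xs:element name="product"/>
        <xs:element name="price"/>
      </xs:sequence>
    </xs:choice>
  </xs:complexType>
</xs:element>
```

Factoring the common `product` particle out of the choice (a `product` element followed by a choice of `quantity` or `price`) makes the model deterministic again.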

Even the very first XML spec itself has a recommendation about this, in a non-normative note, for compatibility with SGML.  Section 3.2.1 of that specification says: “For compatibility, it is an error if an element in the document can match more than one occurrence of an element type in the content model.”
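The same kind of ambiguity is easy to write down in DTD syntax, by the way.  A sketch, with the same made-up names as above:

```dtd
<!-- Ambiguous: an opening <product> tag matches either alternative. -->
<!ELEMENT order ((product, quantity) | (product, price))>

<!-- Deterministic: the common element is factored out. -->
<!ELEMENT order (product, (quantity | price))>
```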

As for the question “why”… even Tim Ewald recently wondered why exactly XML Schema had this requirement in the first place.  I can only guess, but it certainly makes writing a validator much easier: a deterministic content model can be compiled into a simple state machine in which a single lookahead symbol, the name of the next element, always identifies the particle being matched.  Without that guarantee, some form of backtracking is needed (in the example above, a validator seeing <product> would have to try both branches), which could mean an enormous perf hit!

.NET is a good XML citizen and requires you to comply with this constraint.  If you don’t, you won’t even be able to compile your schema.  Good work guys!  As close to the standards as possible!
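For the curious, here is a minimal sketch of what that looks like in practice, using the XmlSchemaSet API from System.Xml.Schema (the embedded schema is the ambiguous one from above, with made-up element names):

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

class UpaDemo
{
    // A schema whose content model violates Unique Particle Attribution:
    // both branches of the choice start with <product>.
    const string BadSchema = @"
<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='order'>
    <xs:complexType>
      <xs:choice>
        <xs:sequence>
          <xs:element name='product'/>
          <xs:element name='quantity'/>
        </xs:sequence>
        <xs:sequence>
          <xs:element name='product'/>
          <xs:element name='price'/>
        </xs:sequence>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    static void Main()
    {
        XmlSchemaSet schemas = new XmlSchemaSet();
        schemas.Add(null, XmlReader.Create(new StringReader(BadSchema)));
        try
        {
            // Compile() enforces the Unique Particle Attribution constraint
            // and throws an XmlSchemaException for a non-deterministic model.
            schemas.Compile();
        }
        catch (XmlSchemaException e)
        {
            Console.WriteLine("Schema rejected: " + e.Message);
        }
    }
}
```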

Could someone explain to me then why “the world’s leading product family of XML development tools” (laughing out loud now) doesn’t even *support* detection of this constraint!!!  .NET, BizTalk Server, Word and InfoPath certainly do a better job!  The fact that Xerces accepts these kinds of bad schemas is no excuse for not implementing at least a check for this critical content model requirement!  Even worse, I came across this post in their public FAQ: they publicly state that flagging a non-deterministic model as an error would be wrong!  Pfffff… so much for the standards.
