It’s been a while since I wrote about Jolt.NET, so I thought I’d give a preview of the XML assertions I’ve been working on. I’m currently wrapping up the core assertion functionality, which can ultimately be exposed through test-framework-specific interfaces.
When I verify XML in unit tests, I am usually performing one of the following tasks:
- Schema validation
- Verifying that a tree contains a particular value in a specific location
- Verifying that a tree contains the same structure and values as another tree
For the purposes of this discussion, I will focus on the third point as it is trivial to accomplish the first two using .NET XML validation and XPath, respectively. On a side note, Jolt.NET XML assertions will support all three of these operations.
What does it mean for two XML documents to be equal? In the strictest sense, we can apply a byte-by-byte comparison of two XML files and determine if all byte pairs match. Clearly, this doesn’t work in the general case, since differences in whitespace, character encoding, and attribute ordering will all cause the comparison to fail. All of these items can vary greatly in an XML document without changing the semantics or structure of the XML.
In order for the algorithm to work in the general case, it must treat XML constructs (elements, attributes, and so on) as single units and define equality semantics for those units. Furthermore, XML parser support can eliminate the need to deal with whitespace, processing instructions, comments, and other constructs that do not affect the semantics of the XML document.
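To make the problem concrete, here is a small sketch in Python (chosen only for brevity; it is not part of Jolt.NET). The two documents are made-up examples that differ only in attribute order and whitespace, yet a byte comparison rejects them while a parser reports the same attribute set.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents: same content, different attribute order and whitespace.
doc_a = b'<item id="1" kind="book"><title>XML</title></item>'
doc_b = b'<item kind="book" id="1">\n  <title>XML</title>\n</item>'

# A byte-by-byte comparison sees two different documents.
bytes_match = doc_a == doc_b

# After parsing, attributes are held in a dict, so their order no longer matters.
attributes_match = ET.fromstring(doc_a).attrib == ET.fromstring(doc_b).attrib

print(bytes_match)       # False
print(attributes_match)  # True
```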
That said, here is a recursive definition that determines if two XML elements are considered to be equal by value (XML document equality is achieved by comparing root elements for equality). The algorithm currently used by Jolt.NET XML assertions is based on this definition, and relies on the parsing provided by XmlReader to simplify the implementation.
Elements E and F are equal if and only if:
- The namespace of E equals the namespace of F
- The name of E equals the name of F
- The text node values contained in E equal the text node values contained in F, and appear in the same order
- The set of attributes contained in E equals the set of attributes contained in F
- The number of child elements of E equals the number of child elements of F
- For each child element pair (E.child[i], F.child[i]), E.child[i] is equal to F.child[i]
Attributes A and B are equal if and only if:
- The namespace of A equals the namespace of B
- The name of A equals the name of B
- The value of A equals the value of B
You will notice that this algorithm does not discriminate on the position of an element’s attribute. All that is required is that the attribute in question exists in the set of attributes of the compared element, as defined by the rules for attribute equality. In contrast, child element ordering is considered as part of the evaluation, since an XML schema element sequence generally implies order.
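The definition above can be sketched in a few lines. This is a minimal illustration in Python using the standard xml.etree.ElementTree module, not the Jolt.NET implementation (which relies on XmlReader). The names are_equal and text_values are hypothetical, and the sketch assumes that whitespace-only text is insignificant.

```python
import xml.etree.ElementTree as ET

def text_values(elem):
    """Return the non-whitespace text node values directly under elem, in document order."""
    pieces = [elem.text] + [child.tail for child in elem]
    # Assumption: whitespace-only text nodes are formatting and are ignored.
    return [p.strip() for p in pieces if p and p.strip()]

def are_equal(e, f):
    """Recursive value equality for two elements, following the rules listed above."""
    # ElementTree stores namespace and name together in .tag as "{namespace}name",
    # so one comparison covers both the namespace rule and the name rule.
    if e.tag != f.tag:
        return False
    if text_values(e) != text_values(f):
        return False
    # .attrib is a dict keyed by qualified name, so attribute position is irrelevant.
    if e.attrib != f.attrib:
        return False
    if len(e) != len(f):
        return False
    # Child elements are compared pairwise, so their order matters.
    return all(are_equal(c, d) for c, d in zip(e, f))

left = ET.fromstring('<a x="1" y="2"><b>hi</b><c/></a>')
right = ET.fromstring('<a y="2" x="1">\n  <b>hi</b>\n  <c/>\n</a>')
swapped = ET.fromstring('<a x="1" y="2"><c/><b>hi</b></a>')

print(are_equal(left, right))    # True: attribute order and indentation differ
print(are_equal(left, swapped))  # False: child elements are in a different order
```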
Sometimes, constraints such as namespaces and element ordering pose too much of a restriction and make XML comparison tedious. Jolt.NET XML assertions aim to address this issue by allowing the option to relax some of the constraints of the equality algorithm.
Equivalency – Relaxed Equality
In Jolt.NET, equivalency between two XML elements is defined by choosing to relax any combination of a set of constraints. The following directives are currently supported.
- For element comparisons, ignore element namespaces
- For element comparisons, ignore element values (child text nodes)
- For element comparisons, ignore ordering of child elements
- For element comparisons, ignore their attributes
- For attribute comparisons, ignore attribute namespaces
The interesting algorithm from the above list of directives is the one that processes child elements, yet ignores their ordering. Here is another recursive definition that determines if two sets of child elements are equivalent. The algorithm currently used by Jolt.NET XML assertions is based on this definition.
Assume that the method bool AreEquivalent(element, element) exists and denotes whether two given elements are equivalent according to the user-specified equivalency directives and constraints listed above. Then, two sets of child elements, C and D, are equivalent if and only if:
- The number of elements in C equals the number of elements in D
- For each child element C[i], there exists a unique child element D[j] such that AreEquivalent(C[i], D[j]) == true
Note that the term “unique” is important – it denotes that any given element can only be matched once for the entire operation. The use of indexes i and j shows that the ordering of elements is not important and that a search is required to locate the desired element.
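As a rough illustration of the directives and the unordered child matching, here is a Python sketch built on xml.etree.ElementTree. It is not the Jolt.NET code. The Directives class, its field names, and the helper functions are all hypothetical names invented for this example. The search removes each matched element from the candidate list, which is one simple way to enforce the "unique" requirement.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass(frozen=True)
class Directives:
    """Hypothetical flags mirroring the five directives listed earlier."""
    ignore_element_namespaces: bool = False
    ignore_element_values: bool = False
    ignore_child_order: bool = False
    ignore_attributes: bool = False
    ignore_attribute_namespaces: bool = False

def local_name(qname):
    """Strip the "{namespace}" prefix that ElementTree puts on qualified names."""
    return qname.rsplit("}", 1)[-1]

def text_values(elem):
    """Non-whitespace text node values directly under elem, in document order."""
    pieces = [elem.text] + [child.tail for child in elem]
    return [p.strip() for p in pieces if p and p.strip()]

def attributes(elem, d):
    """Sorted (name, value) pairs, so attribute position never matters."""
    if d.ignore_attribute_namespaces:
        return sorted((local_name(k), v) for k, v in elem.attrib.items())
    return sorted(elem.attrib.items())

def children_equivalent(cs, ds, d):
    if len(cs) != len(ds):
        return False
    if not d.ignore_child_order:
        return all(are_equivalent(c, x, d) for c, x in zip(cs, ds))
    unmatched = list(ds)
    for c in cs:
        for i, candidate in enumerate(unmatched):
            if are_equivalent(c, candidate, d):
                del unmatched[i]  # each element of D can be matched only once
                break
        else:
            return False  # no remaining element of D is equivalent to c
    return True

def are_equivalent(e, f, d):
    name = local_name if d.ignore_element_namespaces else (lambda q: q)
    if name(e.tag) != name(f.tag):
        return False
    if not d.ignore_element_values and text_values(e) != text_values(f):
        return False
    if not d.ignore_attributes and attributes(e, d) != attributes(f, d):
        return False
    return children_equivalent(list(e), list(f), d)

c = ET.fromstring('<list><item>1</item><item>2</item><item>1</item></list>')
d = ET.fromstring('<list><item>1</item><item>1</item><item>2</item></list>')
e = ET.fromstring('<list><item>1</item><item>2</item><item>2</item></list>')
unordered = Directives(ignore_child_order=True)

print(are_equivalent(c, d, Directives()))  # False: order is still enforced
print(are_equivalent(c, d, unordered))     # True: same children, different order
print(are_equivalent(c, e, unordered))     # False: a "1" item has no unmatched partner
```

The last case shows why uniqueness matters: without removing matched candidates, both "1" items in the first list could pair with the single "1" item in the third. Taking the first equivalent candidate is enough here because, for these directives, any two candidates that match the same element should also match each other. A relation without that property would need a proper bipartite matching instead.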
So there you have it – the definitions of equality and equivalency for XML. Do let me know if you think I’ve missed something in my definitions!