About Jolt.NET Libraries

Inspired by the Boost C++ libraries, Jolt.NET aims to complement the .NET Base Class Library (BCL) with algorithms, data structures, and general productivity tools. It is the hope of the authors that the features of Jolt.NET will one day be part of, or represented in the BCL and the .NET Framework.

Limitations to XML Doc Comment Parsing

Hello everyone!

I've been working on improving the support for XML doc comment queries with metadata types, but came across some limitations in the reflection API that have hindered my progress.  Specifically, I've been trying to add support for retrieving doc comments for methods containing a function pointer as a parameter, and methods containing a parameter that has been decorated with an optional or required custom modifier.  Unfortunately, given the current state of the reflection API (for which I will elaborate upon below), I don't believe that I can implement these features and achieve a general solution that works for all inputs.


The goal of this feature is to be able to retrieve the XML comments as given in the following code example.

public ref class T
{
public:
/// <summary/>
/// <param name="param">A function pointer of managed types</param>
void FnPtrMethod(String^ (*param)(TimeSpan, DateTime, array<int>^)) { }
};

Since a function pointer is a native C++ entity, it is not recognized by the CLR and consequently I don't believe it to be possible to adequately represent it with a metadata object.  Given the method FnPtrMethod above, the managed type of its sole parameter param is rendered by the reflection API as System.IntPtr.  This is understandable sense since the function pointer is really a raw native pointer to the address of executable code.  It makes no difference that its arguments are managed types, since the type that encompasses them is not managed. If all we have to work with is an IntPtr instance, then it is impossible to infer that the IntPtr is really a function pointer and fetch its managed arguments for processing.

I'm fairly confident that my analysis is correct, but in the hope of being wrong I've started a discussion on StackOverflow just to be sure.  If you know the trick to get this to work, please respond to this blog entry or to the discussion on StackOverflow.


The goal of this feature is to be able to retrieve the XML comments as given in the following code example.

public ref class T
{
public:
/// <summary/>
/// <param name="x">modopt</param>
/// <param name="y">modreq</param>
void modifiers(const int x, volatile int y) { }
};

This feature may be implemented, but only for a small number of variations in the types of method parameters.  The problem lies in the placement of the ParameterInfo.GetOptionalCustomModifiers() and ParameterInfo.GetRequiredCustomModifiers() methods.  These methods are specific to a parameter, and I believe that they should be generalized and applied to any instance of System.Type.  For instance, consider the following method.

public ref class C
{
public:
/// <summary/>
/// <param name="param">Generic action containing modreq/modopt generic argument.</param>
generic <typename T>
void f(Action<const volatile T>^ param) { }
};

The sole parameter param of function f does not have a custom modifier, however the generic method argument T that participates in the definition of the parameter type does.  In this case, it doesn't make sense to call param.GetOptionalCustomModifiers() since the modifiers apply to T.  To get the metadata for T, we need to call GetGenericArguments() resulting in a System.Type array.  But, System.Type does not provide a means to get its custom modifiers!  A similar problem also exists with array and pointer types as custom modifiers are applied to the element type of these entities.  We need to call GetElementType() on the array/pointer type, but are again presented with a System.Type instance and with no way to get its custom modifiers.

Regarding this issue, I've opened a bug report with Microsoft in hopes that the API flaw will be addressed in a future version of the .NET Framework.

However, all is not lost.  We can still use the ParameterInfo.GetOptionalCustomModifiers() and ParameterInfo.GetRequiredCustomModifiers() methods to get custom modifiers for a parameter that is not an array, pointer, or decorated with the by-ref/out attributes (all of these types require us to call GetElementType() to get the type to which the modifier is applied).  Also, the parameter can not be dependent on a generic method or type parameter having a custom modifier applied to it.  This will at least expose the feature to a small number of methods, but it clearly won't work in the general case.  Consequently, I've decided to postpone implementing this feature until I get clearer direction from the community.  If you would like to see this feature implement in Jolt.NET, please vote it up on the Jolt.NET CodePlex web site.

Recent Jolt.NET Revisions and Xml Doc Comment Parsing

I try to make a habit of posting to the development blog each time a significant feature or piece of code gets committed to the source repository.  Consequently, I would like to use this post to summarize what has been committed in the past month as several updates have been made.  Also, I'll describe some of the work I've been doing on matching XML doc comment elements with their corresponding metadata type from the System.Reflection namespace.

Commit Summary

Maintenance and refactoring is the main aspect of the Jolt.NET 0.4 release.  Prior to this release, I've intentionally delayed many code clean-up tasks as well as performing upgrades of 3rd party dependencies so that I could work on more important features.  Now that all those features are complete, I have spent some time to restore the code to its "pristine" state.  Here is a summary of the recent updates related to maintenance.

  • Updated QuickGraph dependency to QuickGraph 3.3.40824
    • Removed FSM->MSAGL conversion code as MSAGL is no longer supported by QuickGraph (superseded by GLEE)
    • Removed explicit implementation of equality semantics for Transition class as it is now supported natively by QuickGraph's EquatableEdge type
  • Updated RhinoMocks dependency to RhinoMocks 3.6
    • Modified relevant unit tests to utilize Act-Arrange-Assert syntax
  • Update NUnit dependency to NUnit 2.5.2
    • Adopted the use of new constraints to simplify and/or strengthen existing unit tests
    • Added additional unit tests to verify presence of attributes on types and their members, a task facilitated by the new constraints in NUnit 2.5
  • Unit test maintenance
    • Fixed many issues that prevented unit tests from being run in an NUnit project (aggregating many test fixtures)
    • Moved much of the reflection code for accessing types and members by strings into separate classes, improving the readability of some unit tests

The following commits introduced new features that were previously planned to be included with the Jolt.NET 0.4 release.

  • The Jolt.Convert class will now correctly generate the XML doc comments representation of an explicit or implicit operator
    • Predefined .NET operators were already supported
    • Consequently, you can now process XML doc comments with a System.Reflection.MethodInfo type referring to an operator
  • Created the Jolt.Testing.Assertions.VisualStudio.XmlAssert class to integrate the Jolt XML assertions to the Visual Studio test framework

XML Doc Comment Processing

"Processing the XML File (C# Programming Guide)" describes the supported XML doc comment markup for various types, methods, parameters, and fields.  For a given metadata type instance, Jolt.Convert will produce the correct markup, with the exception of the following constructs.

  • Function pointer parameter (ELEMENT_TYPE_FNPTR)
  • Optional understanding modifier (ELEMENT_TYPE_CMOD_OPT)
  • Required understanding modifier (ELEMENT_TYPE_CMOD_REQ)
  • Pinned field (ELEMENT_TYPE_PINNED)
  • Dimensionless and rank-less array (ELEMENT_TYPE_GENERICARRAY)

In order to verify that my implementation is correct, I compare the output of a .NET compiler with the output of the Jolt.Convert class.  Since the C# language does not currently support these constructs directly, other means are required for testing the implementation of the Jolt.Convert class, which are demonstrated below.

To produce XML doc comments with ELEMENT_TYPE_FNPTR, ELEMENT_TYPE_CMOD_OPT, and ELEMENT_TYPE_CMOD_REQ markup, we may use the the C++/CLI compiler to compile the following class.


public ref class XmlDocCommentTest
{
public:
typedef int (*function_ptr)(char, int, double);

void fnptr(function_ptr f); // ELEMENT_TYPE_FNPTR
void mod_opt(const int n); // ELEMENT_TYPE_CMOD_OPT
void mod_req(volatile int n); // ELEMENT_TYPE_CMOD_REQ
};


Function pointers are a common construct for C/C++ developers, but understanding why const and volatile translate to optional and required modifiers requires some explanation.  Paul DiLascia covers this topic in his article "C++ At Work: Rationales, Highlights, and a Farewell".

ELEMENT_TYPE_PINNED is a bit more tricky since the general C++ literature on pinned pointers states that their usage is restricted to non-static local stack variables, which can not be decorated with XML doc comments.  However, the System.Runtime.CompilerServices namespace gives some hints on how discover that a field is pinned.  Unfortunately I do not know of a way to verify this behavior using a .NET compiler or other tool.

Finally, ELEMENT_TYPE_GENERICARRAY appears to be deprecated as I can not locate any reference to it in modern .NET documentation (apart from the aforementioned document).

In the mean time, I plan to implement support for all ELEMENT_TYPE_FNPTR, ELEMENT_TYPE_CMOD_OPT, and ELEMENT_TYPE_CMOD_REQ in Jolt.NET 0.4.  For ELEMENT_TYPE_PINNED, I will wait until the feature is highly requested or until I stumble upon a tool that will produce the desired output.

Jolt.NET 0.3 and Future Features

This past afternoon, I committed source revision #26346, containing the final feature work for Jolt.NET 0.3.  This release took longer than expected because I added features to the release after thinking it may be too small, and those features ended up taking longer to implement than expected.  In the future, I will try to make the releases more timely, as long releases make the project appear unmaintained.

Jolt.NET 0.3 contains the following new features:

Please refer to the issue tracker for all features covered by the release, and to the release page for download options.

Jolt.NET 0.4 will be a short maintenance release, which will involve the upgrade of external dependencies.  The dependency upgrade will allow me to simplify some existing code by utilizing a new external library feature, as well as learn how to use such features so that they may be applied in the future..  Furthermore, there have been many maintenance related tasks I’ve been neglecting, and I feel that now is a good time to address them.  Some new features will be added, but only those were previously identified and not included in a preceding release.

Error Reporting for Xml Equivalency Assertion

In my last post, I sketched out the algorithms dealing with XML equality and equivalency assertions. During the testing of my implementations, I discovered a fundamental problem with the XML equivalency algorithm that made it a nuisance to use: the reporting of the inequivalent element while ignoring element sequence can potentially be too generic and not very usable. In this post, I will describe some of the problems I encountered and the currently-implemented remediation. For conciseness, use of the the term inequivalent will imply that only element sequence ordering is ignored.

XML Equivalency Assertion Algorithm

When we attempt to determine if a list of nodes is equivalent to another list of nodes, we generally perform a set-equality operation on two list-like data structures, comparing each element in one list is to every element in the other list. Furthermore, given that the elements are not ordered, we can’t stop searching for an element once we detect the first mismatch since the element may exist further ahead in the list of elements to search. Reporting the precise element causing the inequivalency thus becomes difficult since we need to store extra information: should we stop the search because the mismatch is indeed the mismatch we are looking for, or should we ignore the mismatch so we can continue searching for the inequivalency in the rest of the tree?

Here is an algorithm that attempts to address the requirements of the XML equivalency assertion, as noted above.


void AreEquivalent(XElement expected, XElement actual)
{
HashSet<XElement> expectedElements = new HashSet<XElement>(expected.Elements());
HashSet<XElement> actualElements = new HashSet<XElement>(actual.Elements());

if (expectedElements.Count != actualElements.Count)
{
// Report 'actual' as offending element.
throw new Exception();
}

foreach (XElement expectedChild in expectedElements)
{
XElement equivalentChild = actualElements.FindFirstOrDefault(e =>
{
try { AreEquivalent(expectedChild, e); }
catch(Exception) { return false; }
return true;
});

if (equivalentChild == null)
{
// Report 'expectedChild' as offending element.
throw new Exception();
}

actualElements.Remove(equivalentChild);
}
}

This algorithm, hereon mentioned as AreEquvialent, runs in quadratic time (expected for set comparison of two lists), and leverages the optimization that only the first inequivalent element need to be reported. If all inequivalent elements were to be reported, we would need to implement one of the many tree-diff algorithms, all of which have greater runtime complexity.

Reporting Offending Elements

Consider the following XML snippets which are clearly inequivalent due to the presence of an unexpected child element from the root node.


<!-- Expected XML -->
<parent>
<descendant xmlns="ns"/>
<descendant xmlns="ns"/>
<childNode/>
</parent>

<!-- Actual XML -->
<parent>
<bogusNode/>
<descendant xmlns="ns"/>
<descendant xmlns="ns"/>
</parent>

We want to report that bogusNode is the offending element, but at the same time we want to avoid categorizing the following related XML snippets as invalid.


<!-- Expected XML -->
<root>
<parent>
<descendant xmlns="ns"/>
<descendant xmlns="ns"/>
<childNode/>
</parent>
<parent>
<descendant xmlns="ns"/>
<descendant xmlns="ns"/>
<bogusNode/>
</parent>
</root>

<!-- Actual XML -->
<root>
<parent>
<bogusNode/>
<descendant xmlns="ns"/>
<descendant xmlns="ns"/>
</parent>
<parent>
<descendant xmlns="ns"/>
<childNode/>
<descendant xmlns="ns"/>
</parent>
</root>

At a first glance, AreEquvialent solves the problem of reporting bogusNode from the first pair of XML documents. However, although AreEquvialent detects the offending element, it fails to report this element when the recursive calls unwind back the the original caller. If you trace AreEquvialent for the examples above, you will notice that it always reports the root element of the XML as the offending element, for any inequivalent pair of XML documents.

Algorithm Improvement

The problem with AreEquvialent is if the condition that detects the offending element evaluates to true in one stack frame, it will also evaluate to true again in the preceding stack frame, until the recursive call completely unwinds. In other words, reporting that bogusNode is inequivalent also causes its parent to be reported as inequivalent through the same condition – see lines 9, 17, and 24..

I spent a considerable amount of time modifying AreEquvialent in an attempt to address this problem. Unfortunately, many of my attempts failed as they either reintroduced the condition issue in another manner, or failed to address the issue of premature search termination. I ultimately realized that while a single-pass through the XML is desirable, it was simply insufficient for what the problem demanded.

As you may recall, the algorithm for strict equality is a linear-time algorithm that positionally compares elements from two documents. If the elements at a given position are not equal, the algorithm terminates immediately and classifies the documents as not equal. Ideally, AreEquvialent could behave this way if the element ordering of the two documents were similar. So, a new solution to the equivalency problem is as follows.

  • Order the elements of a document such that they match the order of elements in the reference document as best as possible
  • Run a the equality algorithm with the reference and transformed documents
    • The algorithm terminates upon detection of the first inequivalent node, or if the two documents are equal

What does “as best as possible” mean? It means that we need AreEquvialent to do some additional work in attempting to detect candidate inequivalent elements during the canonicalization step. For instance, if the best match for a given element is an inequivalent one that differs only in number of child elements, we would want to reorder this element so that the equality algorithm reports a specific and relevant error: inequivalency due to mismatch of number of child elements, for the specific elements involved.

Here is the algorithm that performs the canonicalization of element order. It runs in quadratic-time and is invoked recursively prior to calling AreEquvialent.


void NormalizeElementOrder(IEnumerable<XElement> expectedChildren, List<XElement> actualChildren)
{
int nextUnorderedElementIndex = 0;
foreach (XElement expectedChild in expectedChildren)
{
int numChildElements = expectedChild.Elements().Count();

Predicate<XElement> isEquivalentToExpectedChild = new Predicate<XElement>(e =>
{
// Similar to AreEquivalent, but without the recursive call to determine child element equivalency.
try { AreElementsEquivalent(expectedChild, e); }
catch (Exception) { return false; }
return true;
});
Predicate<XElement> isEquivalentToExpectedChild_Strict =
e => isEquivalentToExpectedChild(e) &&
numChildElements == candidateElement.Elements().Count();

// Look for the element that best resembles expectedChild considering the configuration
// the assertion, along with the number of child elements for expectedChild.
int equivalentChildIndex = actualChildren.FindIndex(nextUnorderedElementIndex, isEquivalentToExpectedChild_Strict);

if (equivalentChildIndex < 0)
{
// We couldn't find a matching element. Relax the search criteria
// by removing the matching child element count constraint and try again.
equivalentChildIndex = actualChildren.FindIndex(nextUnorderedElementIndex, isEquivalentToExpectedChild);
if (equivalentChildIndex < 0) { return; } // No match found, stop normalizing element order.
}

SwapElement(actualChildren, equivalentChildIndex, nextUnorderedElementIndex);
++nextUnorderedElementIndex;
}
}

Is this the best solution?  I hope so!  Readers, please share your thoughts on this algorithm along with ideas for improving it.

Now that the equivalency algorithm is implemented and tested, I can resume working on the NUnit and VSTS adapters to the core assertion library.  I am hoping to have the relevant sources sources committed within the next week or so.

XML Equality and Equivalency

It’s been a while since I wrote about Jolt.NET, so I thought I’d give a preview of the functionality from the XML assertions I’ve been working on.  I’m currently working on wrapping-up the core assertion functionality, which can ultimately be exposed through test-framework-specific interfaces.

When I attempt to verify XML in unit tests, I usually am performing one of the following tasks.

  • Schema validation
  • Verifying that a tree contains a particular value in a specific location
  • Verifying that a tree contains the same structure and values as another tree

For the purposes of this discussion, I will focus on the third point as it is trivial to accomplish the first two using .NET XML validation and XPath, respectively.  On a side note, Jolt.NET XML assertions will support all three of these operations.

Strict Equality

What does it mean for two XML documents to be equal?  In the strictest sense, we can apply a byte-per-byte comparison of two XML files and determine if all byte pairs match.  Clearly, this doesn’t work in the general case since things like whitespace, character encoding, and attribute ordering, will all impede the success of the equality algorithm.  All of these items can vary greatly in an XML document and still not change the semantics or structure of the XML.

In order for the algorithm to work in the general case, it must treat XML entities (elements, attributes, etc…) as single units and define equality semantics for those units.  Furthermore, XML parser support can eliminate the need for dealing with whitespace, processing instruction elements, comments, and other entities that do not affect the semantics of the XML document.

That said, here is a recursive definition that determines if two XML elements are considered to be equal by value (XML document equality is achieved by comparing root elements for equality).  The algorithm currently used by Jolt.NET XML assertions is based on this definition, and relies on the parsing provided by XmlReader to simplify the implementation.

Elements E and F are equal if and only if:

  • The namespace of E equals the namespace of F
  • The name of E equals the name of F
  • The text node values contained in E are equal to the text node values contained in F, and must be in the same order
  • The set of attributes contained in E equals the set of attributes contained in F
  • The number of child elements of E equals the number of child elements of F
  • For each child element pair (E.child[i], F.child[i]), E.child[i] is equal to F.child[i]

Attributes A and B are equal if and only if:

  • The namespace of A equals the namespace of B
  • The name of A equals the name of B
  • The value of A equals the value of B

You will notice that this algorithm does not discriminate on the position of an element’s attribute.  All that is required is that the attribute in question exists in the set of attributes of the compared element, as defined by the rules for attribute equality.  On the contrary, child element ordering is considered as part of the evaluation since an XML schema [element-sequence] generally implies order.

Sometimes, constraints such as namespaces and element ordering pose too much of a restriction and make XML comparison tedious.  Jolt.NET XML assertions aim to address this issue by allowing the option to relax some of the constraints of the equality algorithm.

Equivalency – Relaxed Equality

In Jolt.NET, equivalency between two XML elements is defined by choosing to relax any combination of a set of constraints.  The following directives are currently supported.

  • For element comparisons, ignore element namespaces
  • For element comparisons, ignore element values (child text nodes)
  • For element comparisons, ignore ordering of child elements
  • For element comparisons, ignore their attributes
  • For attribute comparisons, ignore attribute namespaces

The interesting algorithm from the above list of directives is the one that processes child elements, yet ignores their ordering.  Here is another recursive definition that determines if two sets of child elements are equivalent.  The algorithm currently used by Jolt.NET XML assertions is based on this definition.

Assume that the method bool AreEquivalent(element, element) exists and, denotes if two given elements are equivalent according to the user-specified equivalency directives and constraints, listed above.  Then, two sets of child elements, C and D, are equivalent if and only if:

  • The number of elements in C equals the number of elements in D
  • For each child element C[i], there exists a unique child element D[j] such that AreEquivialent(C[i], D[j]) == true

Note that the term “unique” is important – it denotes that any given element can only be matched once for the entire operation. The use of indexes i and j shows that the ordering of elements is not important and that a search is required to locate the desired element.

So there you have it – the definitions of equality and equivalency for XML.  Do let me know if you think I’ve missed something in my definitions!

Jolt.NET Feature Triage

You may have noticed through the feature RSS feed that I've been busy opening/closing items and shuffling things around.  I've postponed work on some documentation tasks that didn't make much sense for a project of one developer.  Also, I decided that I won't implement the Tuple library as the upcoming release of the .NET Framework 4.0 will contain an implementation similar to what I had planned.  .NET 4.0 is currently available in beta/CTP for download.

The triage slashed the number of items that I wanted to complete for Jolt.NET 0.3, and that prompted me to think about including another feature in the release.  When I first started work on Jolt.NET, I wanted to include a test facility to make assertions on XML data structures.  After releasing Jolt.NET 0.1, I decided not to implement this feature as it would be available in NUnit 3.0.  However, I think this decision was premature as not all testing frameworks will support this feature.

So over the next few days, I will be drafting an implementation and feature set for extending a test framework to support assertions on XML data.  I suspect that I will need some adapter classes to expose the feature in various test frameworks, and at this time, I'm not sure how many of these frameworks to support.

Static Idempotent Functors

The task of replacing inline anonymous methods with a generic functor from the Jolt.Functional library brought forth some interesting issues, all of which have been addressed as of revision #19483.  The main topic that I will discuss in this post deals with the implementation and usage of the Functor.Idempotency and Functor.TrueForAll generic factory methods.

The TrueForAll factory method creates a delegate that returns true for any input, and thus it is natural to implement this method in terms of the Functor.Idempotency method since TrueForAll is a Boolean idempotent predicate.  My main use of TrueForAll in Jolt.NET was in testing the serialization of finite automata to GraphML.  If you’ve used this library, you may recall that there are limitations when serializing of a TransitionPredicate.  Specifically, the predicate must refer to a static method.  So, is it possible to implement a generic idempotent method that is static?  Let’s take a look at the function signature that we need to satisfy, along with the current implementation.

using System;

// One-arg idempotent method; overloads exist for zero through four args.
static Func<T, TResult> Idempotency<T, TResult>(TResult constant)
{
return arg => constant;
}

In this example, the method that ultimately implements the inline expression can not be static since it uses a variable that is not local to the expression (see “Anonymous Methods and Jolt Functors: Behind the Scenes” for more information).  Consequently, the resulting delegate refers to a non-static method.  If the hidden class that implements the expression were static, then we could only ever have one idempotent delegate per unique specialization. per app-domain, as demonstrated in the following example.


using System;

static class IdemptotencyWrapper<TConstant>
{
static TConstant constant;
static TConstant Idempotency<T>(T arg)
{
return constant;
}
}

static Func<T, TResult> Idempotency<T, TResult>(TResult constant)
{
IdempotencyWrapper<TResult>.constant = constant;
return IdempotencyWrapper<TResult>.Idempotency<T>;
}

void UseFunctors()
{
Func<int, bool> falseForAll = Idempotency<int>(false);
Func<int, bool> trueForAll = Idempotency<int>(true); // Oops! Changed the behavior of falseForAll.
}

Another novel attempt at solving this problem is to use an expression tree to force the user-provided constant to be local to the expression.

using System;

static Func<T, TResult> Idempotency<T, TResult>(TResult constant)
{
Expression<Func<T, TResult>> expression = Expression.Lambda(
Expression.Constant(constant, typeof(TResult)),
new[] { Expression.Parameter(typeof(T), arg) });
return expression.Compile();
}

This solution looks good, has some performance deficiencies we can live with, and may even provide the static method that we need.  However, the method that is produced is compiled and created at runtime.  In other words, it doesn’t exist in the assembly and thus not feasible for textual serialization.

I’m also uncertain on whether this method is truly static.  The MethodInfo object representing the delegate’s method states that IsStatic is true, but the delegate’s Target property gives a non-null value.  Very strange.

If we want to persist the method that is represented by the expression, we can utilize the Reflection.Emit API and create an assembly that contains the desired method.  The performance deficiency is even greater with this solution and is overkill for the general usage of the Idempotency method.  While the serialization problem is addressed, the deserialization problem is not; this generated assembly is not distributed with the library and thus the referenced methods may not exist in deserialization environment.

In conclusion, I could not conceive of a way to modify the Idempotency method implementation to meet my criteria, so I decided to keep it as is.  The implementations of TrueForAll and FalseForAll were modified to return constants local to the expressions, thus making the resulting delegate refer to a static method.