About Jolt.NET Libraries

Inspired by the Boost C++ libraries, Jolt.NET aims to complement the .NET Base Class Library (BCL) with algorithms, data structures, and general productivity tools. It is the hope of the authors that the features of Jolt.NET will one day be part of, or represented in, the BCL and the .NET Framework.

GraphML Serialization for FSMs

As of revision 18123, it is possible to serialize an FSM to GraphML and to perform the inverse deserialization operation.  However, the current implementation has some caveats, which I discuss below.  In the meantime, if you would like to review how to use this feature, please refer to the FSM documentation.

Copying Data and Intermediate Types

When serializing to GraphML, any serializable data needs to be represented as a primitive type (a numerical or string type).  Furthermore, in order for a property value to be serialized, the QuickGraph GraphML serialization facility requires that the property be decorated with XmlAttribute.  Unfortunately, implementing support for GraphML serialization wasn't as easy as just decorating some properties with XmlAttribute, since some properties of each state (the start/final state markers) are held in the FiniteStateMachine class rather than on the state itself.

To address this issue, I created two intermediate types: one to hold all serializable state data (GraphMLState), and the other to hold all serializable transition data (GraphMLTransition).  In essence, these types represent the <node> and <edge> GraphML elements, respectively.  Each time the FSM is serialized, it is copied into a new graph of the intermediate serializable types, and then the new graph is serialized.  Similarly, when the GraphML is deserialized, the resulting graph is copied into a new FSM.
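As a rough illustration of the idea, the two intermediate types could look something like the following.  This is only a sketch: the attribute names and the exact set of properties are my assumptions for the example, and the QuickGraph edge interface that a real transition type would implement is left out.

```csharp
using System.Xml.Serialization;

// Hypothetical carrier for a <node> element: one instance per FSM state.
public sealed class GraphMLState
{
    // All serialized values are primitives (strings and booleans here).
    [XmlAttribute("name")]
    public string Name { get; set; }

    // The start/final markers live here only for the duration of the copy;
    // the FSM itself keeps them outside of the state.
    [XmlAttribute("isStartState")]
    public bool IsStartState { get; set; }

    [XmlAttribute("isFinalState")]
    public bool IsFinalState { get; set; }
}

// Hypothetical carrier for an <edge> element: one instance per transition.
public sealed class GraphMLTransition
{
    public GraphMLState Source { get; set; }
    public GraphMLState Target { get; set; }

    // The predicate delegate is carried as its string representation.
    [XmlAttribute("transitionPredicate")]
    public string TransitionPredicate { get; set; }
}
```

Keeping these types separate means the public, string-typed properties never appear on the FSM's own state and transition types.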

I could have avoided creating these extra types and implementing the copy process, but to do so would have cluttered and complicated the overall FSM implementation; XML-serializable properties must be public!  For states, an actual state class is needed containing the start/final state flags (in essence, the GraphMLState class).  Storing start/final state flags on a state class creates a very sparse data set which I wanted to avoid (there is only ever one start state, and generally very few final states), and consequently I chose to maintain the string representation for the state.  For transitions, each delegate needs to be represented as a string implying a new read/write property.  This new property doesn't make sense as it is unnatural for a user to set a delegate/event using a string representation.

Serializing Delegates (and events)

The TransitionPredicate property and OnStateTransition event are both delegates, which are difficult to serialize to and deserialize from XML.  You can't expect a delegate to be serialized using XmlSerializer because an XML-serializable type requires a parameterless constructor, which delegate types lack.  So to accomplish this, delegate-to-string-to-delegate conversion code is required.

A delegate may reference either a static or instance method, and in order to deserialize an instance method, the serializer needs to reconstruct the state of the object that owns the method.  In other words, the method's parent object needs to be serialized as well.  This does not create a user-friendly scenario for creating an FSM via XML, so I prohibited this type of method from being serialized altogether.  Binary serialization is better suited for this task, and is a feature that will be implemented in the future.

So, for delegate serialization to work, your delegate must reference a static method.  When serialized, it will have the following form: methodName;assemblyQualifiedDeclaringTypeName.  An example of a serialized delegate for the [System.Char.IsDigit] method is: IsDigit;System.Char, mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089.
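The delegate-to-string half of the conversion can be sketched as follows.  The DelegateText class and its Serialize method are hypothetical names for this example, not the Jolt.NET implementation, and returning null for an unsupported delegate is my own assumption.

```csharp
using System;

public static class DelegateText
{
    // Converts a delegate to "methodName;assemblyQualifiedDeclaringTypeName".
    // Returns null when the delegate does not reference a static method.
    public static string Serialize(Delegate d)
    {
        if (d == null) throw new ArgumentNullException("d");

        Type owner = d.Method.DeclaringType;
        if (!d.Method.IsStatic || owner == null) return null;

        return d.Method.Name + ";" + owner.AssemblyQualifiedName;
    }
}
```

For example, passing a Predicate<char> created from char.IsDigit yields a string that starts with "IsDigit;System.Char, "; the assembly name and version that follow depend on the runtime in use.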

During deserialization, if the method is discovered to be invalid (i.e. it does not exist, the signature isn't that of a predicate, etc.) then it will be replaced with a default predicate.  The default predicate returns false for any input value.
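The string-to-delegate direction, including the fallback to the default predicate, might look like the sketch below.  PredicateText and Deserialize are hypothetical names, and treating every failure (including a malformed type name) as "invalid" is an assumption made for the example.

```csharp
using System;

public static class PredicateText
{
    // Parses "methodName;assemblyQualifiedDeclaringTypeName" into a predicate.
    // Any invalid input yields a default predicate that returns false.
    public static Predicate<T> Deserialize<T>(string text)
    {
        Predicate<T> fallback = delegate(T value) { return false; };
        if (string.IsNullOrEmpty(text)) return fallback;

        // The type name contains commas but no semicolon, so split at the first ';'.
        int separator = text.IndexOf(';');
        if (separator <= 0) return fallback;

        string methodName = text.Substring(0, separator);
        string typeName = text.Substring(separator + 1);

        try
        {
            Type owner = Type.GetType(typeName, false);
            if (owner == null) return fallback;

            // Binds only to a static method whose signature matches Predicate<T>,
            // and returns null instead of throwing when no such method exists.
            Delegate d = Delegate.CreateDelegate(
                typeof(Predicate<T>), owner, methodName, false, false);
            return (Predicate<T>)d ?? fallback;
        }
        catch (Exception)
        {
            // Malformed type names or load failures are treated as invalid.
            return fallback;
        }
    }
}
```

Letting Delegate.CreateDelegate perform the signature check avoids hand-written reflection code for comparing parameter and return types.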

Events pose a challenging conversion task because an event is really a multicast delegate in disguise.  Each method subscribed to the event will need to be validated and serialized as described above.  Furthermore, this will be a reflection-heavy task since the delegate is stored in a compiler-generated field.  Currently, I have avoided implementing this feature and will consider its implementation in the future.

Thinking About XML Documentation Comments

One of the features for the Jolt 0.2 release is to add support for propagating XML comments for proxy methods in the Jolt.Testing library.  During the past holiday season, I started to dabble in an implementation and consequently started to research what is required to complete this task.  Ideally, I would like to support a syntax that is depicted as follows.


using System;
using System.Reflection;
using System.Xml.Linq;

void ParseXmlComments(Assembly assembly)
{
    XmlDocCommentReader reader = new XmlDocCommentReader(assembly);

    // Look up the comments for a type, a method, a property, and a constructor.
    XNode[] comments = {
        reader.GetComments(assembly.GetType("MyType")),
        reader.GetComments(assembly.GetType("OtherType").GetMethod("f")),
        reader.GetComments(assembly.GetType("OtherType").GetProperty("g")),
        reader.GetComments(assembly.GetType("YetAnotherType").GetConstructor(Type.EmptyTypes))
    };

    // An XDocument cannot hold a bare string, so wrap the comments in a root element.
    XDocument dom = new XDocument(new XElement("root", comments));
}


The easy part of supporting such an implementation is parsing the XML file so that a MemberInfo or Type object can index a block of XML (the comments).  The rules for doing so are straightforward, and documented in the MSDN help system.
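To give a feel for the indexing rules, here is a minimal sketch that builds the ID string for the simplest cases (types, fields, events, and properties without parameters) and finds the matching element in a loaded comments file.  DocCommentLookup and its members are hypothetical names; methods, constructors, indexers, and generic members require additional encoding rules that the sketch deliberately omits.

```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Xml.Linq;

public static class DocCommentLookup
{
    // Builds IDs such as "T:System.String" or "P:System.String.Length".
    public static string GetMemberId(MemberInfo member)
    {
        Type type = member as Type;
        if (type != null) return "T:" + TypeName(type);

        string prefix;
        switch (member.MemberType)
        {
            case MemberTypes.Field: prefix = "F:"; break;
            case MemberTypes.Property: prefix = "P:"; break;
            case MemberTypes.Event: prefix = "E:"; break;
            default: throw new NotSupportedException(member.MemberType.ToString());
        }
        return prefix + TypeName(member.DeclaringType) + "." + member.Name;
    }

    // Reflection separates nested types with '+', the XML file uses '.'.
    private static string TypeName(Type type)
    {
        return type.FullName.Replace('+', '.');
    }

    // Returns the <member name="..."> element for the member, or null.
    public static XElement GetComments(XDocument doc, MemberInfo member)
    {
        string id = GetMemberId(member);
        return doc.Descendants("member")
                  .FirstOrDefault(m => (string)m.Attribute("name") == id);
    }
}
```

Since Type derives from MemberInfo, a single lookup method can serve both types and their members.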

The tricky part lies in the first line of the pseudo-code's method body: inferring the location of the XML file from an Assembly object.  XML comments are always installed in a directory that is either very near to, or the same as, the installation directory for an assembly.  This is done by convention and lets Visual Studio's IntelliSense find the file.  Given that the Assembly class offers CodeBase and Location properties, it may appear trivial to locate the file.  However, these properties do not guarantee finding the reference assembly path for a .NET Framework assembly, which is the path that contains the XML file.  The ".NET Matters" article from MSDN Magazine (June 2004) suggests using RuntimeEnvironment.GetRuntimeDirectory() to locate the reference assembly path, but this fails for .NET 3.0 and .NET 3.5, which use version 2.0 of the common language runtime.

Despite these issues, I should note that the path inference algorithm using the CodeBase and Location properties will work for assemblies that are loaded from the same directory as the host application.
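For that simple case, the inference amounts to swapping the file extension.  The sketch below uses only the Location property and a hypothetical LocalDocLocator helper; it assumes the XML file sits beside the assembly and shares its base name, and it will not help for a shadow-copied assembly, whose Location points at the shadow-copy path.

```csharp
using System;
using System.IO;
using System.Reflection;

public static class LocalDocLocator
{
    // Returns the path of "<assembly>.xml" beside the loaded assembly,
    // or null when the assembly has no file location or no such file exists.
    public static string FindBesideAssembly(Assembly assembly)
    {
        string location = assembly.Location;
        if (string.IsNullOrEmpty(location)) return null;

        string candidate = Path.ChangeExtension(location, ".xml");
        return File.Exists(candidate) ? candidate : null;
    }
}
```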

The following is a summary of my research and conclusions on locating the XML comments file for an Assembly object.

Reference Assemblies, Location, and CodeBase

The reference assemblies path is the location in which the .NET Framework installer places assemblies (and their XML comments) prior to installing the assemblies in the global assembly cache (GAC).  An IDE or compiler will reference the assemblies from this location, while the runtime will generally load assemblies from another location (i.e. the GAC or the local directory of the running application).  Assemblies from .NET 3.0 and onwards are installed to an explicit reference assemblies directory, but are loaded from the GAC.  For previous .NET Framework versions, assemblies are loaded from the path they were installed to.

The Location path is the location from which an assembly is loaded.  For the .NET Framework assemblies (any version), this is generally going to be the GAC.  For user assemblies, it could be something else (like the local application folder), and a shadow-copied assembly will yield the shadow-copy path in its Location property.

The CodeBase path is the location from which an assembly was first found.  Suzanne Cook gives the example that this path may be an internet URI for a downloaded assembly.  Furthermore, Cook states that the CodeBase property is not guaranteed to be set for a GAC-loaded assembly.

Given these distinctions, it is clear that the reference assemblies path is the ideal path to locate since it will always be the same regardless of how the assembly was obtained or loaded.

Programmatically Locating the XML Comments File

The CLR has no knowledge of a reference assembly path, so it makes sense that you should not expect a .NET Framework API that knows how to locate the path.  However, during my efforts to find such an API, I came across the MSBuild task ResolveAssemblyReference.  This task will give the physical disk location of the given assembly, the paths to all of its assembly references, and the paths to any related files.  The related files could be XML, PDB, or any extension that you provide.  Jomo Fisher gives an example of using the managed API for this task to resolve assembly references in his F# code.

I thought this class would solve my problem, but upon closer inspection of the program output you will notice that the XML files are not found, even for assemblies that share the same directory as their XML files (e.g. System.Core.dll).  I tried different configurations of the class, but without success.

So, in order to locate the XML comments file for a given assembly, I will have to rely on the technique used by Lutz Roeder's Reflector: search a predefined list of paths, which is expandable with user data.  This has the drawback of needing to update the library each time a new .NET Framework is released (as it may introduce a new reference assembly path), but it is a good short-term fix until I can figure out why the ResolveAssemblyReference class doesn't function as I expect it to.
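A minimal sketch of that search strategy follows.  DocFileSearcher is a hypothetical name, the directories are supplied entirely by the caller (no reference assembly paths are hard-coded here), and the file is assumed to be named after the assembly's simple name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;

public sealed class DocFileSearcher
{
    private readonly List<string> directories = new List<string>();

    // Seeds the search with a predefined list of directories.
    public DocFileSearcher(IEnumerable<string> defaultDirectories)
    {
        directories.AddRange(defaultDirectories);
    }

    // Lets the user extend the list, e.g. for a newly released framework.
    public void AddDirectory(string directory)
    {
        directories.Add(directory);
    }

    // Returns the first "<assemblyName>.xml" found, in list order, or null.
    public string Find(Assembly assembly)
    {
        string fileName = assembly.GetName().Name + ".xml";
        foreach (string directory in directories)
        {
            string candidate = Path.Combine(directory, fileName);
            if (File.Exists(candidate)) return candidate;
        }
        return null;
    }
}
```

Because the directories are searched in order, user-supplied entries added later only take effect when none of the predefined directories contains the file.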