About Jolt.NET Libraries

Inspired by the Boost C++ libraries, Jolt.NET aims to complement the .NET Base Class Library (BCL) with algorithms, data structures, and general productivity tools. It is the hope of the authors that the features of Jolt.NET will one day be part of, or represented in the BCL and the .NET Framework.

Circular Collections and Enumeration

Sunday evening I committed revision #33654, which includes the first additions to the Jolt.Collections library: circular lists and enumerators.  A complete overview of the features enabled by these types, including usage examples, is available on the library's documentation page.  In this post, I will discuss some of the implementation decisions went into producing these types, as well as give a brief overview of how the new types work.

Adaptors v.s. Collections

The goal of the initial library release was to support three data types:

  • An enumerator capable of enumerating any collection in a circular manner
  • A circular list, with semantics similar to the System.Collections.Generic.List class
  • A circular linked list, with semantics similar to the System.Collections.Generic.LinkedList class

All three of these types were to be implemented as adaptors, providing circular-list semantics as a pseudo-"view" of the underlying collection.  In this view however, insertions and removals would indeed be propagated to the underlying collection wheres as other operations remained read-only.  In order to keep the implementation simple and straightforward (maintaining as many forwarding methods as possible), I decided to implement the circular collections with copy construction semantics (i.e. no adaptation), and in terms of their List and LinkedList counterparts.  It didn't seem right to provide an adaptor whose operations may or may not transform the underlying collection.  Furthermore, the implementation of the Shift operator for a linked list (see below), complicates the implementation of other operations when implemented as a non-destructive operator.

Cyclical v.s. Circular Collections

Another trade-off that was made was in choosing to implement cyclical collections, or the less-general circular collections (cyclical collections join the last collection element to any preceding collection element).  The challenge with a cyclical collection was in using an existing enumerator to implement the enumeration algorithm, again aiming at minimizing the amount of code to rewrite (e.g. version checking, storing collection element references, etc...).  With a cyclical collection, the enumerator needs to reset itself to an arbitrary position in the underlying collection, and unfortunately, there is no good way to do this in a generic fashion -- the .NET enumerator interface provides no such functionality.  The best we could do is store the reference to the start of the cycle and then implicitly enumerate to that element when enumerating beyond the end of the collection.  Clearly, this performs very poorly for large collections.

In the end, I chose circular collections for simplicity.  However, in retrospect, implementing a cyclical collection would not be too difficult if I chose not to reuse certain aspects of the List or LinkedList collections and their enumerators.

Circular Enumerator

CircularEnumerator<T> is an adaptor class accepting an IEnumerator<T> instance at construction time.  The enumerator forwards calls to the underlying enumerator except that when enumeration progresses beyond the last collection element (i.e. Next() returns false), the underlying enumerator is reset and moved to the first collection element.

It goes without saying that you should not use a circular enumerator within a foreach loop unless there is an explicit break condition with the loop body.

Circular List

CircularList<T> is effectively a List<T> with circular list semantics.  That is, the GetEnumerator() method returns a CircularEnumerator<T>, and given indexes for indexing operations are allowed to exceed the bounds of the collection (for positive indexes).  Modular arithmetic translates a user-specified index to the correct internal List index.

CircularList<T> also implements operator>>() and operator<<(), which effectively "rotates" the circular list, changing the sequence of elements.  For instance, if a collection containing the ordered elements {10, 1, 5, 58, 32} is forward-shifted by two elements, the resulting collection contains the elements {5, 58, 32, 10, 1}.  Since List<T> stores its elements in a contiguous memory block, CircularList<T> implements shifting as a "view" on the underlying collection to avoid the copying or moving of collection elements.  A virtual head-index is maintained, and user-specified indexes are adjusted accordingly.

Circular Linked List

CircularLinkedList<T> is effectively a LinkedList<T> with circular list semantics.  That is, the GetEnumerator() method returns a CircularEnumerator<T>, and the node-access methods return a CircularLinkedListNode<T> object.  Similarly, CircularLinkedListNode<T> encapsulates a LinkedListNode<T> and provides circular navigation semantics through its Next() and Previous() methods.

On a side note, the .NET 3.5 implementation of LinkedList<T> is internally implemented as a circular linked list.  The last collection element connected to the first element, and because of this, the implementation of certain list operations is simplified.  Unfortunately, this implementation detail is completely hidden from the user, and can not be leveraged by the CircularLinkedList<T> implementation.

Much like CircularList<T>, CircularLinkedList<T> also implements operator>>() and operator<<().  However, since LinkedList<T> doesn't provide direct-access to an arbitrary element, each call to a shift operation transforms the underlying collection.  An optimization in the shift algorithm places an upper-bound on the total number of shifts to N/2, where N is the size of the collection.  We accomplish this by noting that a forward shift of k elements is identical to a backwards shift of N - k elements.  Hence the maximum number of shifts occur when k = N - k, or k = N/2.

Circular Linked List Node

CircularLinkedListNode<T> encapsulates a LinkedListNode<T>, is used for inserting new nodes into the collection, and representing existing nodes in the collection.  Since LinkedListNode<T> exposes the list in which the node is contained, CircularLinkedListNode<T> must hide this list, but expose the circular list for consistency.  Furthermore, CircularLinkedListNode<T> references are never stored in a container; they are created on-demand.  Thus, it is possible to create two different CircularLinkedListNode<T> object for the same underlying node.  To minimize the impact of this draw-back, CircularLinkedListNode<T> implements IEqualityComparer<T> which performs a reference comparison on each object's underlying nodes.

0 comments: