Project name changed and blog moved

Looks like I’ve been negligent in posting here. We’ve renamed the gXML project to GenXDM.

Consequently, we’ve got a new blog for hosting material.

Please check out the materials in their new locations.

Posted in Uncategorized

Migrating Stateful Code to gXML – Episode 2

State Won’t Cause Problems, Will It?

In the previous discussion about migration to gXML, I explored the conversion of one simple stateless method. That post explored some of the general patterns of migrating code to use gXML, by way of deprecating old methods and adding new ones.

With this discussion, I want to take up a much more tangled scenario – a class that holds state, and how to migrate that.

Parallel Tracks

I had the most trouble with the ElementProxy class. This class has a protected field of type Element, and a search of the existing source quickly revealed quite a number of subclasses that refer to this field directly. To make the class work with gXML, though, I needed ElementProxy to be parameterized by <N>, and the field corresponding to this existing one had to be of type N rather than of type Element. Remember, though, that my goal is to migrate the library incrementally – existing classes have to operate unmodified, at least until I have a chance to convert them. So I took an approach of having parallel tracks. Where I started with this:

protected Element _constructionElement;

… I ended up with this:

/** @deprecated **/
protected Element _constructionElement;

private N _wrappedElement;

private Element _lastSyncedElement;

public void setElementNode(N elem) { ... }

public N getElementNode() { ... }

Before explaining this further, I should describe the two scenarios more fully. In the easy case, a given subclass has been fully modified to use the new gXML-based approach, and that class uses the new getter/setter methods I defined. This easy scenario works because all access goes through the get and set methods. In the harder case, existing clients directly set and get the old protected member that I want to replace. Unless I wanted to convert the entire class hierarchy stemming from this one class all at the same time, I needed a way to support a partially ported hierarchy.

In my first attempt at this, I didn’t have the “_lastSyncedElement” member. This didn’t work, though. I found code directly setting the protected member, and yet I had already rewritten the parent class to use the new private member via the getter method. Since Java doesn’t give me any way to detect when a protected member is altered, the value returned by the get method didn’t match the value directly set into the protected member, and tests failed. The solution, simple enough, was to add a third copy of the field, as shown above, the “_lastSyncedElement”. The third copy kept track of the last value set using the new API. Now, whenever the “get” method went to fetch the new private value, it would first check whether the “last synced” value matched the value of the old protected field. If it didn’t, it would copy the value from the protected field into the new private member.

This change meant that all the as-yet-unmodified DOM-based code could run unmodified by directly accessing the “_constructionElement” field. All the new code would call the “set” method to set the value, and if the object being set was of type Element, I would also set the original protected value and its shadow, “_lastSyncedElement”. The new set and get methods look like this:

protected void setElementNode(N elem) {
    _constructionElement = (elem instanceof Element) ? (Element) elem : null;
    _lastSyncedElement = _constructionElement;
    _wrappedElement = elem;
}

public N getElementNode() {
    if (_constructionElement != _lastSyncedElement) {
        _lastSyncedElement = _constructionElement;
        _wrappedElement = (N) _constructionElement;
    }

    return _wrappedElement;
}

In this way, any direct changes to the old protected member would be detected and used by new code.
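To see the synchronization at work, here is a minimal, self-contained sketch. ProxySketch and LegacySubclassSketch are hypothetical stand-ins, not the real ElementProxy or any of its subclasses, and the shadow field is typed as Element so that it can be assigned from the deprecated field without a cast. It assumes nothing beyond the JDK’s built-in DOM classes.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical stand-in for ElementProxy, reduced to the three fields under discussion.
class ProxySketch<N> {
    /** @deprecated unconverted subclasses still read and write this directly. */
    @Deprecated
    protected Element _constructionElement;

    private N _wrappedElement;

    // What the deprecated field held the last time the two tracks were in step.
    private Element _lastSyncedElement;

    public void setElementNode(N elem) {
        _constructionElement = (elem instanceof Element) ? (Element) elem : null;
        _lastSyncedElement = _constructionElement;
        _wrappedElement = elem;
    }

    @SuppressWarnings("unchecked")
    public N getElementNode() {
        if (_constructionElement != _lastSyncedElement) {
            // The deprecated field was assigned directly; adopt that value.
            _lastSyncedElement = _constructionElement;
            _wrappedElement = (N) _constructionElement;
        }
        return _wrappedElement;
    }
}

// An unconverted subclass that bypasses the setter, as the legacy code does.
@SuppressWarnings("deprecation")
class LegacySubclassSketch extends ProxySketch<Element> {
    void legacyAssign(Element e) {
        _constructionElement = e;
    }
}

class ParallelTracksDemo {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element first = doc.createElement("first");
        Element second = doc.createElement("second");

        LegacySubclassSketch proxy = new LegacySubclassSketch();
        proxy.setElementNode(first);   // new-style access through the setter
        proxy.legacyAssign(second);    // old-style direct write to the field
        System.out.println(proxy.getElementNode().getTagName()); // prints "second"
    }
}
```

The check costs one reference comparison per call to the getter, which seems a fair price for leaving the unconverted subclasses alone.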

Tricks For Constructors

ElementProxy gave me another problem, too. It had existing constructors that took an Element object as a parameter. In the gXML world, that had to be turned into a node of type N plus an XmlContext<N> ctx, so that it would be possible to do something with the type N parameter. So, similar to the approach used with the purely static methods I discussed in my previous writeup, I deprecated the existing constructor and had it call a new constructor with a new DOM-specific context object.

That solved one problem, but another cropped up.  Some subclasses of ElementProxy had a pair of constructors, each with a single parameter, but one taking Document, and one taking Element.  In gXML, both Document and Element collapse to type N.  This meant that a new constructor method taking a context and an N could no longer distinguish these two cases.  As a somewhat silly solution, I changed it so that the “Document” form of the constructors took an extra “do nothing” boolean parameter (which I named “unusedDiscriminator”).
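A small sketch of the constructor collision and the workaround follows. ContextSketch and SignedThingSketch are hypothetical names invented for this illustration; they are not the gXML XmlContext or any real XML Security class.

```java
// Hypothetical stand-in for a context object parameterized by the node type.
interface ContextSketch<N> {
    String describe(N node);
}

// Hypothetical stand-in for a subclass that used to have (Element) and (Document) constructors.
class SignedThingSketch<N> {
    final String builtFrom;

    // Replaces the old single-argument Element constructor.
    SignedThingSketch(ContextSketch<N> ctx, N element) {
        this.builtFrom = "element:" + ctx.describe(element);
    }

    // Replaces the old single-argument Document constructor. Without the extra
    // boolean, this would have exactly the same signature as the constructor
    // above and the class would not compile.
    SignedThingSketch(ContextSketch<N> ctx, N document, boolean unusedDiscriminator) {
        this.builtFrom = "document:" + ctx.describe(document);
    }
}
```

A pair of named static factory methods would be another way to tell the two cases apart, at the cost of changing how callers and subclasses construct the object.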

Perhaps A Different Approach?

I might have followed an alternate course.  I could have wrapped the _constructionElement member with methods to access it.  That is, I might have swept through the code base once to add calls to get/set methods that still used the Element type.  This would have let me directly intercept all attempts to set the value.  However, as I went through and converted code, I would have had to replace these method calls with ones that returned and set values of type <N>.  I think this means that I would have needed to touch each source file twice – once to convert it to use the method wrapping access to an Element, and then to convert those calls to something wrapping <N>.  In the end, I think the approach I took probably worked just as well, if not slightly better.

End Result

The end result of this careful backwards compatibility left me with unnecessary code.  Once I converted all of the code – all of the subclasses of this class, and all of the direct uses of the protected field, I was left with no direct access to the protected field, and I have found I can now remove the parallel tracks.  But the intermediate solution did let me worry about migrating code a file at a time, rather than a class hierarchy at a time.  I think that made the extra complexity worth it.

Posted in Migration

Notes from Balisage

Last week (3-6 August 2010), we attended Balisage in beautiful Montréal, Quebec. Mad props to Tommie Usdin, Michael Sperberg-McQueen, and all the organizers and instigators of the conference.

We were able to present a paper, and also had the opportunity to do a followup in a “nocturne,” the Balisage informal evening meeting format. It all seemed to go really well, which was an enormous relief, of course. Once the paper’s posted, we’ll link to it from here. If you’re interested in (or excited about!) XML, you really should check out Balisage and its Balisage Series on Markup Technologies … oh, and get yourself to Montréal next August!

A number of questions arose in the course of the presentation, the question and answer session, the followup, and in private discussions. I’m going to treat each of those in a sort of mini-series of posts, so I thought I would set out a bit of a list here to let folks know what to expect.

  • Why doesn’t it have a better name?
  • Why should I implement now?
  • Has the opportunity for an in-memory API other than the DOM already passed?
  • Why not use an existing in-memory tree model API, then?
  • What’s the advantage to using gXML?
  • What is this “bridge” or “handle/body” pattern and why should I care?
  • What can it do right now, today?
  • Can it really grow to do all the things the developers say it could do?
  • Give me some numbers!
  • Why isn’t there a better web site, source code repository, and all the other infrastructure?

That looks like a pretty adequate start, no?

:-)

Posted in Connections

Node Conversion

If my XML processor is written with the gxml API, it can operate over any tree model for which it has a gxml bridge. But if required to produce a specific type of tree model, perhaps to provide data to a legacy, non-gxml application, I may need to convert my underlying node tree to the type which the legacy application requires. As long as I have a bridge to that desired tree model, that conversion will be a piece of cake.

In this sample, I’ll demonstrate such a conversion from an XmlNode, the node type for the gxml reference bridge, to a DOM Node. I’ll start with a convenience method for handling generic node conversions. The crux of this conversion is the creation of a cursor over the source <Nsrc> node, and a builder for the target <Ntrgt> node. Then, I simply write the content of the cursor to the builder:

final static public <Nsrc, Ntrgt> Ntrgt untypedConversion(final Nsrc srcNode,
        final ProcessingContext<Nsrc> srcPcx, final ProcessingContext<Ntrgt> trgtPcx) {
    final Cursor<Nsrc> srcCursor = srcPcx.newCursor(srcNode);
    final FragmentBuilder<Ntrgt> trgtBuilder = trgtPcx.newFragmentBuilder();
    srcCursor.write(trgtBuilder);
    return trgtBuilder.getNode();
}

In order to inject the specific source and target node types, I’ll write a concrete implementation, wherein the injection is managed solely by the creation of appropriate processing contexts for each type of node:

static public void main(String[] args) {
	ProcessingContext cxContext = new XmlNodeContext();
	ProcessingContext domContext = new DomProcessingContext();
	SampleConverter.convertSample(args[0], cxContext, domContext);
}

The source code for this example can be downloaded from our developer site, and this particular example lives in the samples project, in the SampleSerializer and Cx2DomSampleConverter classes.
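For readers without the samples project at hand, the cursor-to-builder hand-off can be illustrated with a toy that uses no gXML classes at all. Everything below (EventSinkSketch, SrcNodeSketch, StringBuilderSinkSketch) is a hypothetical stand-in: the source “tree model” is a bare-bones node class, the target is plain markup text, and no escaping is attempted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical event interface: what the source side emits and the target side consumes.
interface EventSinkSketch {
    void startElement(String name);
    void text(String value);
    void endElement();
}

// Toy source tree model.
final class SrcNodeSketch {
    final String name;
    final String text; // null when the element has no text of its own
    final List<SrcNodeSketch> children = new ArrayList<SrcNodeSketch>();

    SrcNodeSketch(String name, String text) {
        this.name = name;
        this.text = text;
    }

    // Plays the cursor's role: walk this subtree and report it to the sink.
    void write(EventSinkSketch sink) {
        sink.startElement(name);
        if (text != null) {
            sink.text(text);
        }
        for (SrcNodeSketch child : children) {
            child.write(sink);
        }
        sink.endElement();
    }
}

// Plays the builder's role for a toy target model that is just markup text.
final class StringBuilderSinkSketch implements EventSinkSketch {
    private final StringBuilder out = new StringBuilder();
    private final Deque<String> open = new ArrayDeque<String>();

    public void startElement(String name) {
        out.append('<').append(name).append('>');
        open.push(name);
    }

    public void text(String value) {
        out.append(value);
    }

    public void endElement() {
        out.append("</").append(open.pop()).append('>');
    }

    // Returns what has been built so far.
    String getNode() {
        return out.toString();
    }
}
```

The point of the toy is only that the source never needs to know what the target is: it reports what it contains, and whichever builder is listening assembles its own kind of result.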

Posted in Sample

Migrating to gXML – episode 1

A Simple Conversion?

I’ve been spending a fair amount of time migrating the existing Apache XML Security library to use gXML, as a way to validate the APIs, and because I think it will eventually be quite useful. This has given me plenty of opportunities to notice what is involved in such a migration. The following explores one particularly simple method, to see what I’ve learned from it and from code like it.

I’ve made any number of extremely rote changes.  An example of code before conversion:

public static Element getNextElement(Node el) {
    while ((el!=null) && (el.getNodeType()!=Node.ELEMENT_NODE)) {
      el=el.getNextSibling();
    }
    return (Element)el;
}

And the same code after:

public static <N> N getNextElement(Model<N> model, N el) {
    while ((el!=null) && (model.getNodeKind(el) != NodeKind.ELEMENT)) {
       el=model.getNextSiblingElement(el);
    }
    return el;
}

The above before and after snippets make the changes look really simple, and rote, but they expose some of the core tricks involved in migrating existing code to gXML.

Where Does Model<N> Come From?

The Model interface, of course, is core to what makes gXML work.  It appears above as the first parameter to the rewritten method. Yet, where does it come from?  The simplistic answer is that I just pass that instance in.  That means that every method that calls the above code needs to be parameterized with <N>, and add a parameter for Model, and every method that calls those, and so on, and so on…. Pretty soon, I’ve signed up to convert the entire library to use gXML, and I won’t know if it works until I’ve converted everything.  Worse yet, I will have broken all of the existing clients, and I will have rewritten all of the unit test cases.  Further, in rewriting all of the test cases and the code it tests I’ve potentially introduced subtle changes in behavior; who is to say whether or not I’ve introduced a bug in the code?  Clearly, this isn’t the way forward.  Seems like it is essential to do the migration as incrementally as possible, preserve all existing test cases unmodified, and even continue to work with all existing clients.

I solved the above conundrum in a way that seems simple, in retrospect.  Rather than remove the old method, I merely deprecated it, and rewrote it as a one-line method to call the new implementation. Since I know that the old method uses DOM, I can create a utility method somewhere in the code base that returns me a singleton implementation of the Model class for DOM, and pass that as the first parameter.  I called this singleton class the “XmlContext” for the library.  The new version of the now deprecated method looks like this:

public static Element getNextElement(Node el) {
    return (Element) getNextElement(XmlContext.getDomModel(), el);
}

One part of the beauty of the design of gXML shows up here.  Since the Model implementation is, by definition, stateless, there are no concurrency concerns, and no statefulness concerns.  In short, no subtle complexity to worry about.  Future clients, using the gXML enabled version of the library, will pass in the correct Model instance, but for backward compatibility, I know there is exactly one instance of one Model class that I need, and consequently, this approach adds little overhead beyond one additional method call to fetch that instance.  Naturally, as the library conversion progresses, the one call to get that Model instance will move higher and higher in the stack, so there will likely only ever be one call to get this Model class per library invocation, even for backwards compatibility, so the overhead truly is minimal.

Of course, I marked this rewrite of the existing method as deprecated.  Now, as I get the chance, within my IDE, as I find a call to the deprecated method, I can look at the deprecation comment I left for myself (“New clients should use….”), follow the instructions there, apply the same trick to the method I’m in, and repeat the process until done.  At every stage, all existing clients will continue to work, and all the unit tests will pass.
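Put together, the deprecate-and-delegate pattern can be sketched as follows. This is a self-contained illustration, not the library’s code: ModelSketch is a cut-down, hypothetical stand-in for the gXML Model interface (two methods, with a boolean test in place of NodeKind), and DomModelSketch.INSTANCE plays the part of XmlContext.getDomModel().

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical, cut-down stand-in for the Model interface.
interface ModelSketch<N> {
    boolean isElement(N node);
    N getNextSibling(N node);
}

// Stateless DOM implementation, so one shared instance is enough.
final class DomModelSketch implements ModelSketch<Node> {
    static final DomModelSketch INSTANCE = new DomModelSketch();

    private DomModelSketch() { }

    public boolean isElement(Node node) {
        return node.getNodeType() == Node.ELEMENT_NODE;
    }

    public Node getNextSibling(Node node) {
        return node.getNextSibling();
    }
}

final class XmlUtilsSketch {
    private XmlUtilsSketch() { }

    /** New entry point: works for any tree model that has a ModelSketch. */
    static <N> N getNextElement(ModelSketch<N> model, N el) {
        while (el != null && !model.isElement(el)) {
            el = model.getNextSibling(el);
        }
        return el;
    }

    /** @deprecated New clients should use {@link #getNextElement(ModelSketch, Object)}. */
    @Deprecated
    static Element getNextElement(Node el) {
        return (Element) getNextElement(DomModelSketch.INSTANCE, el);
    }
}

final class DelegationDemo {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    @SuppressWarnings("deprecation")
    public static void main(String[] args) {
        Document doc = newDocument();
        Element root = doc.createElement("root");
        root.appendChild(doc.createTextNode(" "));
        root.appendChild(doc.createElement("child"));

        // Legacy call site, untouched by the migration.
        Element viaOld = XmlUtilsSketch.getNextElement(root.getFirstChild());
        // Converted call site, passing the model explicitly.
        Node viaNew = XmlUtilsSketch.getNextElement(DomModelSketch.INSTANCE, root.getFirstChild());
        System.out.println(viaOld == viaNew); // prints "true"
    }
}
```

Because both call sites end up in the same generic method, the existing tests keep exercising the new code path even before any caller has been converted.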

Crucial Step: Eliminate Warnings

For the XML Security library, at least, after a few rounds of sizeable migrations I figured out that I should eliminate compiler warnings in every file I touched. The code base dates back to before JDK 1.5 and generics, which means it employs the Collection classes without declaring the types in the collection. With no warnings left in a file, the IDE helps me catch all the places where I’ve mixed up Element and <N>. Otherwise, what I’d see is a test failure whose point of failure was some ClassCastException, potentially quite far away from where the actual mistake happened.

Stomping on the warnings many times just meant adding generic parameters to Lists and Maps that had nothing to do with the underlying conversion to gXML. At some points, it seemed like extra work. However, once I’d eliminated all those issues, what remained were the gXML-specific issues, and the IDE made it a breeze to fix them.
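The late failure that raw collections invite can be shown with a tiny sketch. RawListSketch is a hypothetical class made up for this illustration, with String standing in for the expected node type and any other object standing in for the mistake.

```java
import java.util.ArrayList;
import java.util.List;

class RawListSketch {
    // Pre-generics style: nothing stops a caller from adding the wrong kind of object.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static List collectRaw(Object node) {
        List nodes = new ArrayList();
        nodes.add(node);
        return nodes;
    }

    // With a type parameter, a mismatched add is rejected where it is written.
    static <N> List<N> collectTyped(N node) {
        List<N> nodes = new ArrayList<N>();
        nodes.add(node);
        return nodes;
    }

    // Mimics code far from the mistake that assumes every entry is a String.
    static boolean failsLate(List<?> nodes) {
        try {
            for (Object o : nodes) {
                String s = (String) o;
                s.length();
            }
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```

Here `failsLate(collectRaw(Integer.valueOf(1)))` reports the problem only when the list is finally read, whereas a typed list moves the same mistake to compile time, at the line that adds the wrong object.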

Shouldn’t I Rewrite That Method?

Before I move on, I want to take a closer look at the example converted method itself. If you’re familiar with gXML, you’ll notice that the method overlaps with functionality that gXML provides. The gXML NodeNavigator interface provides a navigation method, “getNextSiblingElement”, that seems to do a similar thing, and in fact that method appears in my rewrite of the function. Reading the code carefully, I see that it first checks whether the current node is null; if so, it returns that. Is the current node an element? If so, it returns that. Otherwise, it finds the next element, if any, that is a sibling of the current node, and returns that. That doesn’t quite align with getNextSiblingElement, which always moves past the current node, so there seems to be overlap, but the match isn’t immediately obvious.

My next step was to look at how the method was being called. Depending on how the method was called in the first place, perhaps a gXML construct could simply skip calling the method in question? It turns out, for this particular function, it was always being called as follows:

Element el=XMLUtils.getNextElement(element.getFirstChild());

Since the NodeNavigator interface provides a method “getFirstChildElement”, whenever I converted a caller of the method, I replaced the two method invocations above with a single call to the Model interface. After I’d converted enough of the code, the method itself became unnecessary, and the intent of the code is easier to grasp.

Conclusion

The above migration shows a single function with no side effects other than its return value.  This case is the simplest conversion point, and an important one. Among other changes that happened in the code, I was able to piecemeal migrate a large library, and eventually reduce the complexity of the library by taking advantage of the richer capabilities of gXML.

Looking at this simple method sets the stage for tackling more complicated and subtle problems.  So there’s more work to do.  I’ll have to come back and revisit that.

Posted in Migration

Discussion at the incubator

We’ve started discussion of gXML as an Apache project on the Apache incubator mailing list (general@) and wiki. If you’re involved in Apache, perhaps you’ll come join the discussion?

Posted in Uncategorized

First release of gXML with DOM support libraries!

For the first post, it seems appropriate to announce the release of our gXML project with DOM support.  We’ve got more code to follow, but need a little more time to tidy up loose ends so you can feel comfortable with the open source license on the code base.

This release is a great step forward, because it means that you can now actually play with gXML, and have it do real things.  If you’re at all curious, please feel free to download it and kick the tires.  If you want to send feedback, you can comment on entries in this blog, or join our mailing list.

I’m excited about the gXML project because of the possibilities it opens up.  For example, it opens the possibility of finally addressing the problem of incompatible tree models in Java.  At my company, we’ve been using projects like Apache Axis2, but we’ve also needed support for WS-Security, which means using the Apache XML Security library.  Axis2 wants to use the AxiOM library, and the XML Security library wants to use DOM.  For that matter, inside our products we have occasions where we want to use neither of these.  All of this, right now, means enormous conversion costs to switch between tree models.  At least in theory, working with gXML will, over time, eliminate these conversion bottlenecks.  That will mean enormous performance improvements for our customers, and for yours.  And that’s just the beginning.

If you’re going to it, we’re presenting a paper at the upcoming 2010 Balisage Conference, and we’d love to see you there.

Posted in Announcement