Migrating to gXML – episode 1

A Simple Conversion?

I’ve been spending a fair amount of time migrating the existing Apache XML Security library to use gXML, as a way to validate the APIs, and because I think it will eventually be quite useful.  This has given me plenty of opportunities to notice what is involved in such a migration. The following explores one particularly simple method, to see what I’ve learned from it and from code like it.

I’ve made any number of extremely rote changes.  An example of code before conversion:

public static Element getNextElement(Node el) {
    while ((el!=null) && (el.getNodeType()!=Node.ELEMENT_NODE)) {
      el=el.getNextSibling();
    }
    return (Element)el;
}

And the same code after:

public static <N> N getNextElement(Model<N> model, N el) {
    while ((el!=null) && (model.getNodeKind(el) != NodeKind.ELEMENT)) {
      el=model.getNextSiblingElement(el);
    }
    return el;
}

The above before and after snippets make the changes look really simple, and rote, but they expose some of the core tricks involved in migrating existing code to gXML.

Where Does Model<N> Come From?

The Model interface, of course, is core to what makes gXML work.  It appears above as the first parameter to the rewritten method. Yet, where does it come from?  The simplistic answer is that I just pass that instance in.  That means that every method that calls the above code needs to be parameterized with <N> and gain a parameter for Model, as does every method that calls those, and so on up the call stack. Pretty soon, I’ve signed up to convert the entire library to use gXML, and I won’t know if it works until I’ve converted everything.  Worse yet, I will have broken all of the existing clients, and I will have rewritten all of the unit test cases.  Further, in rewriting all of the test cases and the code they test, I’ve potentially introduced subtle changes in behavior; who is to say whether or not I’ve introduced a bug in the code?  Clearly, this isn’t the way forward.  It seems essential to do the migration as incrementally as possible, to preserve all existing test cases unmodified, and even to continue to work with all existing clients.

I solved the above conundrum in a way that seems simple, in retrospect.  Rather than remove the old method, I merely deprecated it, and rewrote it as a one-line method that calls the new implementation. Since I know that the old method uses DOM, I can create a utility method somewhere in the code base that returns a singleton implementation of the Model interface for DOM, and pass that as the first parameter.  I called the class that holds this singleton the “XmlContext” for the library.  The new version of the now deprecated method looks like this:

public static Element getNextElement(Node el) {
    return (Element) getNextElement(XmlContext.getDomModel(), el);
}

One part of the beauty of the design of gXML shows up here.  Since the Model implementation is, by definition, stateless, there are no concurrency concerns, and no statefulness concerns.  In short, no subtle complexity to worry about.  Future clients, using the gXML-enabled version of the library, will pass in the correct Model instance, but for backward compatibility, I know there is exactly one instance of one Model class that I need, and consequently, this approach adds little overhead beyond one additional method call to fetch that instance.  Naturally, as the library conversion progresses, the one call to get that Model instance will move higher and higher in the stack. There will likely only ever be one call to get this Model instance per library invocation, even for backwards compatibility, so the overhead truly is minimal.
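
The pieces described so far fit together roughly as follows. This is a minimal, self-contained sketch rather than the library’s actual code: the Model and NodeKind types are cut-down stand-ins for the gXML ones (the real interfaces are much larger), DomModel and BridgeDemo are hypothetical names, and the deprecation wording is only illustrative. Only XmlContext.getDomModel() and the two getNextElement signatures come from the snippets above.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Stand-in for gXML's NodeKind; only the distinction this sketch needs.
enum NodeKind { ELEMENT, OTHER }

// Cut-down stand-in for gXML's Model<N> (assumption: the real interface is far larger).
interface Model<N> {
    NodeKind getNodeKind(N node);
    N getNextSiblingElement(N node);
}

// Stateless DOM binding: no fields, so one shared instance is safe to reuse everywhere.
final class DomModel implements Model<Node> {
    @Override
    public NodeKind getNodeKind(Node node) {
        return node.getNodeType() == Node.ELEMENT_NODE ? NodeKind.ELEMENT : NodeKind.OTHER;
    }

    @Override
    public Node getNextSiblingElement(Node node) {
        Node next = node.getNextSibling();
        while (next != null && next.getNodeType() != Node.ELEMENT_NODE) {
            next = next.getNextSibling();
        }
        return next;
    }
}

// Holds the single DOM Model instance used by the backward-compatible entry points.
final class XmlContext {
    private static final Model<Node> DOM_MODEL = new DomModel();
    private XmlContext() { }
    static Model<Node> getDomModel() { return DOM_MODEL; }
}

final class XMLUtils {
    private XMLUtils() { }

    // New, model-agnostic implementation.
    static <N> N getNextElement(Model<N> model, N el) {
        while (el != null && model.getNodeKind(el) != NodeKind.ELEMENT) {
            el = model.getNextSiblingElement(el);
        }
        return el;
    }

    /** @deprecated New clients should use the Model-based overload (illustrative wording). */
    @Deprecated
    static Element getNextElement(Node el) {
        return (Element) getNextElement(XmlContext.getDomModel(), el);
    }
}

final class BridgeDemo {
    private BridgeDemo() { }

    // Builds <root>text<child/></root> and returns the root element.
    static Element sampleRoot() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            doc.appendChild(root);
            root.appendChild(doc.createTextNode("text"));
            root.appendChild(doc.createElement("child"));
            return root;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // An existing DOM client keeps calling the old signature unchanged.
    @SuppressWarnings("deprecation")
    static String legacyCall() {
        Element found = XMLUtils.getNextElement(sampleRoot().getFirstChild());
        return found == null ? null : found.getTagName();
    }

    // A converted client passes the Model explicitly.
    static String modelCall() {
        Node found = XMLUtils.getNextElement(XmlContext.getDomModel(), sampleRoot().getFirstChild());
        return found == null ? null : found.getNodeName();
    }
}
```

Both BridgeDemo methods skip the leading text node and land on the same element, which is the property that lets the old tests keep passing while callers are converted one at a time.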

Of course, I marked this rewrite of the existing method as deprecated.  Now, as I get the chance, within my IDE, as I find a call to the deprecated method, I can look at the deprecation comment I left for myself (“New clients should use….”), follow the instructions there, apply the same trick to the method I’m in, and repeat the process until done.  At every stage, all existing clients will continue to work, and all the unit tests will pass.

Crucial Step: Eliminate Warnings

For the XML Security library at least, after a few rounds of sizeable migrations, I figured out that I should eliminate compiler warnings in every file I touched.  The code base dates back to before JDK 1.5 and generics, which means it uses the collection classes as raw types, without declaring the element types.  When I leave each file with no warnings, the IDE helps me catch all the places where I’ve mixed up Element and <N>.  Otherwise, what I’d see is a test failure, and the point of failure would be some ClassCastException, potentially quite far away from where the actual mistake happened.

Stomping on the warnings many times just meant adding generic parameters to Lists and Maps that had nothing to do with the underlying conversion to gXML.  At some points, it seemed like extra work.  However, once I’d eliminated all those issues, what remained were the gXML-specific ones, and the IDE made them a breeze to fix.
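
To illustrate why the warnings matter, here is a small sketch of the failure mode, not code from the library. RawTypeDemo and its methods are hypothetical, and a String stands in for a node handle from some non-DOM model.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.Element;

final class RawTypeDemo {
    private RawTypeDemo() { }

    // Pre-generics style: the raw List accepts anything, and the compiler only warns.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String rawListFailsLate() {
        List nodes = new ArrayList();
        nodes.add("a node handle from a non-DOM model"); // should have been an Element
        try {
            Element first = (Element) nodes.get(0);      // the mistake only surfaces here
            return first.getTagName();
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    // With the element type declared, the same mix-up is rejected by the compiler.
    static <N> List<N> typedList(N first, Element legacy) {
        List<N> nodes = new ArrayList<>();
        nodes.add(first);
        // nodes.add(legacy);  // would not compile: Element is not N
        return nodes;
    }
}
```

In the raw version the bad insertion and the failing cast can sit in different files, which is why the resulting test failure is hard to trace back. In the typed version the IDE flags the offending line directly.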

Shouldn’t I Rewrite That Method?

Before I move on, I want to take a closer look at the example converted method itself.  If you’re familiar with gXML, you’ll notice that the method overlaps with functionality that gXML provides. The gXML NodeNavigator interface provides a navigation method, “getNextSiblingElement”, that seems to do a similar thing, and in fact that method appears in my rewrite of “getNextElement”.  Reading the code carefully, I see that it first checks whether the current node is null. If so, it returns that.  Is the current node an element?  If so, it returns that.  Otherwise, it finds the next element, if any, that is a sibling of the current node, and returns that.  That doesn’t quite align with getNextSiblingElement, so the two overlap, but not in an immediately obvious way.
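
The mismatch is easiest to see side by side. The sketch below uses plain DOM and assumes that a “next sibling element” call always steps past the node it is given. The nextSiblingElement helper is a hypothetical stand-in written for this example, not the gXML method itself.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class SiblingSemantics {
    private SiblingSemantics() { }

    // Logic of the original DOM method: an element argument is returned unchanged.
    static Node getNextElement(Node el) {
        while (el != null && el.getNodeType() != Node.ELEMENT_NODE) {
            el = el.getNextSibling();
        }
        return el;
    }

    // Hypothetical stand-in: always advances past the argument before looking for an element.
    static Node nextSiblingElement(Node el) {
        Node next = el.getNextSibling();
        while (next != null && next.getNodeType() != Node.ELEMENT_NODE) {
            next = next.getNextSibling();
        }
        return next;
    }

    // Builds <root><a/><b/></root> and returns the <a> element.
    static Element firstOfTwo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            doc.appendChild(root);
            Element a = doc.createElement("a");
            root.appendChild(a);
            root.appendChild(doc.createElement("b"));
            return a;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns "a,b": the first call stays on <a>, the second moves on to <b>.
    static String compare() {
        Element a = firstOfTwo();
        return getNextElement(a).getNodeName() + "," + nextSiblingElement(a).getNodeName();
    }
}
```

Starting from an element, the two calls give different answers, so one cannot simply be swapped for the other.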

My next step was to look at how the method was being called. Depending on how it was called in the first place, perhaps a gXML construct could make the call unnecessary altogether. It turns out that, for this particular function, it was always being called as follows:

Element el=XMLUtils.getNextElement(element.getFirstChild());

Since the NodeNavigator interface provides a method “getFirstChildElement”, whenever I converted a caller of the method, I replaced the two method invocations above with a single call to the Model interface. After I’d converted enough of the code, the method itself became unnecessary, and the intent of the code is easier to grasp.
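
As a rough check that the two-call idiom and a single “first child element” call agree, here is a DOM-only sketch. The firstChildElement helper is a hypothetical stand-in for the gXML navigation method, written independently of getNextElement so that the comparison means something.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class FirstChildDemo {
    private FirstChildDemo() { }

    // The original helper, as it appears before conversion.
    static Element getNextElement(Node el) {
        while (el != null && el.getNodeType() != Node.ELEMENT_NODE) {
            el = el.getNextSibling();
        }
        return (Element) el;
    }

    // Hypothetical stand-in for a "first child element" navigation call.
    static Element firstChildElement(Node parent) {
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) child;
            }
        }
        return null;
    }

    // Compares both forms on a parent with leading non-element children and on an empty parent.
    static boolean sameResult() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            doc.appendChild(root);
            root.appendChild(doc.createTextNode("\n  "));
            root.appendChild(doc.createComment("skipped"));
            Element item = doc.createElement("item");
            root.appendChild(item);
            Element empty = doc.createElement("empty");
            root.appendChild(empty);

            boolean first = getNextElement(root.getFirstChild()) == item
                    && firstChildElement(root) == item;
            boolean none = getNextElement(empty.getFirstChild()) == null
                    && firstChildElement(empty) == null;
            return first && none;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because every caller used the getFirstChild() form, the one-call replacement covers all of them, including the case where the parent has no element children.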

Conclusion

The above migration shows a single function with no side effects other than its return value.  This case is the simplest conversion point, and an important one. Together with other changes of this kind, it let me migrate a large library piecemeal, and eventually reduce the complexity of the library by taking advantage of the richer capabilities of gXML.

Looking at this simple method sets the stage for tackling more complicated and subtle problems.  So there’s more work to do.  I’ll have to come back and revisit that.
