Sunday, December 20, 2009

JBPM Migrator (Part 2)

My previous post discussed the JBPM migrator, a small project we wrote to handle the migration of JBPM process instances. Since that post, we have had several requests to open-source the code. We managed to get the code to the JBPM team in late August.

The JBoss team has taken our code and incorporated it into the 4.2 release of JBPM. The developer responsible for doing this made a number of significant changes (this is to be expected because we developed the migrator for version 3.2.2 of JBPM), so much so that it really doesn't resemble the work that we originally did. Nevertheless, JBPM 4.2 is now capable of performing migrations!

A number of organizations (ourselves included) are still using version 3.2.2 of JBPM and are asking to get their hands on the original version of the migrator. To address this, I have renamed the project as jbpm-instance-migrator and published it on Google Code under the LGPL license.

At this point, I would ask anyone who is interested in this project to download the code and give it a try. Please submit any issues, requests, or questions you may have to the project (not to this blog). Your feedback is welcome.

Sunday, June 21, 2009

JBPM Migrator



Overview

We are using the JBPM workflow library (version 3.2.2) on a project here at Intelliware. After some analysis, we chose JBPM as our process modelling tool because it was open source and it was easy to integrate into our technology stack (Java 4, Hibernate for persistence).

Over time, a process definition needs to change. Usually these changes reflect new business requirements, but they can also be related to a bug fix or an improvement in the existing process. So when we release a new version of a software product, we may need to update the older process instances in the database. Rather than provide a mechanism to migrate process instances, the JBPM library supports multiple versions of a process definition simultaneously:

Process instances always execute to the process definition that they are started in. But JBPM allows for multiple process definitions of the same name to coexist in the database. So typically, a process instance is started in the latest version available at that time and it will keep on executing in that same process definition for its complete lifetime. When a newer version is deployed, newly created instances will be started in the newest version, while older process instances keep on executing in the older process definitions. [1]

This wasn't very appealing to us. Our application has processes that can be resumed at any point in the future (potentially years later) so by following the JBPM prescribed approach our developers would have to support - and the QA folks would have to test - outdated process instances for years to come.

What we wanted was the ability to migrate outdated process instances to the current process definition. Indeed, the JBPM documentation addresses this approach:

An alternative approach to changing process definitions might be to convert the executions to a new process definition. Please take into account that this is not trivial due to the long-lived nature of business processes. Currently, this is an experimental area so for which there are not yet much out-of-the-box support.

As you know there is a clear distinction between process definition data, process instance data (the runtime data) and the logging data. With this approach, you create a separate new process definition in the JBPM database (by e.g. deploying a new version of the same process). Then the runtime information is converted to the new process definition. This might involve a translation cause tokens in the old process might be pointing to nodes that have been removed in the new version. So only new data is created in the database. But one execution of a process is spread over two process instance objects...

JBPM doesn't provide a tool to do this, but the code is open source and well documented, so we built a jbpm-migrator ourselves.

The Migrator

Given an old process instance, the migrator is responsible for transferring data to the latest process instance. The migrator transfers:


  1. All of the tokens. This is facilitated through the use of mappings.
  2. All persistent and transient variables.
  3. A migration memo (a String in the persistent variables map), added to the new process instance, which records information about the migration (the current date, the old process definition version number, the old process instance id, etc.).

The migration is performed recursively, so each sub-process is migrated according to the above steps.
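
The steps above can be pictured with a small, self-contained sketch. This is not the migrator's actual code and it does not use the JBPM API: SketchInstance, SketchMigrator, the "migrationMemo" variable name and the id supplier are all hypothetical stand-ins. It also simplifies two things: persistent and transient variables share one map, and sub-processes reuse the parent's node map, whereas a real sub-process would have its own definition and mappings.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for a process instance; this is not a JBPM class. */
final class SketchInstance {
    final long id;
    final int definitionVersion;
    final Map<String, String> tokenNodes = new LinkedHashMap<>(); // token name -> node name
    final Map<String, Object> variables = new LinkedHashMap<>();
    final List<SketchInstance> subProcesses = new ArrayList<>();

    SketchInstance(long id, int definitionVersion) {
        this.id = id;
        this.definitionVersion = definitionVersion;
    }
}

/** Hypothetical sketch of the three transfer steps plus the recursion. */
final class SketchMigrator {
    static final String MEMO_KEY = "migrationMemo"; // assumed variable name

    static SketchInstance migrate(SketchInstance old, int currentVersion,
                                  Map<String, String> nodeMap, LongSupplier newIds) {
        SketchInstance migrated = new SketchInstance(newIds.getAsLong(), currentVersion);

        // 1. Tokens: use the mapping, falling back to a node of the same name.
        for (Map.Entry<String, String> token : old.tokenNodes.entrySet()) {
            String node = token.getValue();
            migrated.tokenNodes.put(token.getKey(), nodeMap.getOrDefault(node, node));
        }

        // 2. Variables are copied across unchanged.
        migrated.variables.putAll(old.variables);

        // 3. The memo records when and from where the instance was migrated.
        migrated.variables.put(MEMO_KEY, String.format(
                "migrated on %s from definition version %d, instance id %d",
                LocalDate.now(), old.definitionVersion, old.id));

        // Sub-processes go through the same steps recursively.
        for (SketchInstance child : old.subProcesses) {
            migrated.subProcesses.add(migrate(child, currentVersion, nodeMap, newIds));
        }
        return migrated;
    }
}
```

The old instance is left untouched in this sketch; everything is written to a freshly created instance.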

Mapping Token Nodes

One of the key challenges when migrating a process instance is the renaming or removal of wait state nodes. Wait state nodes (the green boxes in the diagram below) are where tokens reside when a process instance is persisted. Determining where a token should be placed is facilitated through a migration. A Migration contains a map that tells the Migrator where to put a token from a deprecated wait state node in the current process. Consider the following three versions of a Process called 'Application':

[Diagram: three versions of the 'Application' process]

For the Application Process, two migrations would be written [2]:


  1. Migration #1 (maps tokens from version #1 to version #2):
    {'init' => 'start', 'invalid' => 'Requires Review', 'end' => 'application completed'}
  2. Migration #2 (maps tokens from version #2 to version #3):
    {'start' => 'application received'}

Our migrator takes these individual migration maps and creates a composite map. With the composite map, the migrator can map a token from any outdated wait-state node directly to the appropriate node in the current process. The composite map of these two migrations would be written as:

    {'init' => 'application received', 'start' => 'application received', 'invalid' => 'Requires Review', 'end' => 'application completed'}

Note that the map only needs to explain what to do with tokens on deprecated nodes (e.g. init, start, invalid, and end). No mapping is required for non-deprecated nodes (e.g. managerial audit, application completed, and Requires Review). By default, if no mapping exists for a wait state node (i.e. it is not deprecated), the migrator will attempt to move the token to a node with the same name in the new version.
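
One way to build such a composite map is sketched below. This is an illustration of the idea only, not the migrator's real implementation: the CompositeNodeMaps class and its compose and resolve methods are hypothetical names, and plain Map<String, String> objects stand in for the migrator's own map type. Migrations are applied oldest first; each new migration re-targets any earlier entry that points at a node it deprecates, then adds its own entries.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; not part of the migrator's published API. */
final class CompositeNodeMaps {

    /**
     * Folds per-version maps (oldest first) into one map from every
     * deprecated node name to its node in the current definition.
     */
    static Map<String, String> compose(List<Map<String, String>> migrations) {
        Map<String, String> composite = new LinkedHashMap<>();
        for (Map<String, String> migration : migrations) {
            // Re-target earlier entries whose destination this migration deprecates.
            for (Map.Entry<String, String> entry : composite.entrySet()) {
                String next = migration.get(entry.getValue());
                if (next != null) {
                    entry.setValue(next);
                }
            }
            // Then add the nodes deprecated by this migration itself.
            composite.putAll(migration);
        }
        return composite;
    }

    /** Unmapped (non-deprecated) nodes resolve to a node of the same name. */
    static String resolve(Map<String, String> composite, String nodeName) {
        return composite.getOrDefault(nodeName, nodeName);
    }
}
```

Feeding the two 'Application' migrations through compose yields the four-entry composite map shown above, and resolve leaves a name such as 'managerial audit' unchanged.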

The example I am using includes a discrete migration for each version of the process, but this is not always required. Depending on the changes being made to the Process Definition, it is possible that the developer will not be required to include a migration at all. This is a good thing. It means that we don't have to write a migration for every single process definition change and there is almost no configuration required on the part of the developer.

But there is a cost. Deprecated nodes can never be used in future definitions of your process. So with our 'Application' Process, the init, start, invalid, and end nodes can never be used again in the process definition (as wait states). Doing so would break the migrator.

Defining a Migration

Migrations are written as Java classes. The class must implement the Migration interface, it must not be abstract, and it must contain a default constructor. The Migration interface declares one method that must be implemented:

    public StateNodeMap createNodeMap();

Here is how you would express the first migration for the 'Application' Process Definition example:

    public class ApplicationProcessMigration001 implements Migration {
        public StateNodeMap createNodeMap() {
            return new StateNodeMap(new String[][] {
                {"init", "start"}, {"invalid", "Requires Review"}, {"end", "application completed"}
            });
        }
    }

Defining a Migrator

How do we create the migrator and use it to perform a migration? Like this:

    Migrator migrator = new Migrator("ApplicationProcess", jbpmContext, "com.foobar.ApplicationProcessMigration");
    ProcessInstance newProcess = migrator.migrate(oldProcess);

The parameters used to create the Migrator instance are:


  1. The name of the Process Definition that it will be migrating.
  2. A JbpmContext instance. The migrator requires this to look up the latest Process Definition.
  3. The Migration base class name. The migrator assumes that your migrations use the pattern package.ClassName{migration#}. For the base class name "com.foobar.ApplicationProcessMigration", the migrator will attempt to load and instantiate classes named "com.foobar.ApplicationProcessMigration001", "com.foobar.ApplicationProcessMigration002", etc., until it can't find any valid classes.

The third point is another example of convention over configuration, resulting in less maintenance and tedium for the developer.
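
The naming convention can be illustrated with plain reflection. The sketch below is an assumption about how such a lookup could work, not the migrator's actual loader: ConventionLoader, MigrationLike and the two SampleMigration classes are hypothetical names, with MigrationLike standing in for the real Migration interface. It appends a zero-padded, three-digit counter to the base name and stops at the first class that cannot be found.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the Migration interface. */
interface MigrationLike {}

/** Two sample migrations that follow the ClassName{migration#} pattern. */
final class SampleMigration001 implements MigrationLike {}
final class SampleMigration002 implements MigrationLike {}

/** Hypothetical loader; the real migrator may be implemented differently. */
final class ConventionLoader {

    /** Loads BaseName001, BaseName002, ... until a class is missing. */
    static <T> List<T> loadAll(String baseClassName, Class<T> type) {
        List<T> migrations = new ArrayList<>();
        for (int number = 1; ; number++) {
            String className = String.format("%s%03d", baseClassName, number);
            try {
                Class<? extends T> found = Class.forName(className).asSubclass(type);
                // Requires a concrete class with a no-argument constructor.
                migrations.add(found.getDeclaredConstructor().newInstance());
            } catch (ClassNotFoundException e) {
                return migrations; // first gap in the sequence ends the search
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot instantiate " + className, e);
            }
        }
    }
}
```

This also shows why the post requires migrations to be non-abstract and to have a default constructor: an abstract class or a missing constructor would fail at the newInstance call.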

Unit and Integration Testing

I'm putting this section last, but it was one of our top concerns when considering an approach to migrations. We debated a number of approaches to testing and most of them were deemed to be too complex and error prone.

We already unit tested our JBPM process definitions to make sure that transitions point to valid nodes and that all actions declared in the process were available on the classpath. With regard to the migrations, we have a base test that asserts that:


  1. A developer has not introduced a deprecated node into the current process definition.
  2. All current nodes in the composite map exist in the process definition.
  3. All current nodes in the composite map are valid wait state nodes.
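
The three assertions can be expressed as a check over plain collections. The following is a hedged sketch rather than the actual base test: CompositeMapChecks and findProblems are hypothetical names, and the sets of node names are assumed to have been read from the current process definition by some other means.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical checker mirroring the three assertions of the base test. */
final class CompositeMapChecks {

    /**
     * @param composite      deprecated node name -> node name in the current definition
     * @param allNodes       every node name in the current definition
     * @param waitStateNodes the subset of allNodes that are wait states
     * @return a description of each violated rule; empty when the map is valid
     */
    static List<String> findProblems(Map<String, String> composite,
                                     Set<String> allNodes,
                                     Set<String> waitStateNodes) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, String> entry : composite.entrySet()) {
            String deprecated = entry.getKey();
            String target = entry.getValue();
            // 1. A deprecated node must not reappear as a wait state.
            if (waitStateNodes.contains(deprecated)) {
                problems.add("deprecated node reintroduced: " + deprecated);
            }
            // 2. The target must exist in the current definition.
            if (!allNodes.contains(target)) {
                problems.add("target node missing: " + target);
            // 3. The target must be a wait state.
            } else if (!waitStateNodes.contains(target)) {
                problems.add("target is not a wait state: " + target);
            }
        }
        return problems;
    }
}
```

A unit test would then simply assert that the returned list is empty for each process definition.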

Our application is deployed several times a week to a test environment where QA folks test functionality and validate stories. This provides a great opportunity to discover problems with our application early, and the migration code is no exception. We write migrations throughout the entire development cycle (not just when we release), so if a QA person loads up an outdated process from last week, the migrator will be run. This gives us a chance to find runtime bugs that the unit tests can't locate.




  1. http://docs.jboss.com/jbpm/v3.2/userguide/html/jpdl.html#processversioning
  2. I'm using the Ruby literal syntax for a Hash because there isn't one in Java.