Paolo Salvatori and I had the idea of using AppFabric Cache as a workflow persistence provider. We’ve been working on this for quite some time now, and we’re happy to finally share it with anyone curious about how workflow persistence works, or perhaps thinking of building their own provider.

Developing your own persistence provider is not the simplest task you can take on, but it’s perfectly doable. I was surprised by how few persistence provider samples there are on the web. The only helpful one I could find was the MemoryStore sample, which helped me get started.

The funny thing about this sample was that I spent most of my time trying to figure out why it worked, not why it didn’t.

I want to thank Paolo Salvatori, Manu Srivastava and Ruppert Koch from the AppFabric CAT and Product group for helping out and clarifying how the underlying plumbing works. More on that at the end of this post.

What is Workflow Persistence?

Unless we enable persistence, a workflow instance is only stored in memory. If the host goes down, it takes any evidence of your workflow instance with it. Persisting a workflow to a durable repository is known as dehydration, while restoring it is generally referred to as rehydration. If you are a BizTalk person, you are probably aware that the persistence store for BizTalk is the BizTalkMsgBoxDb database.
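At its core, dehydration and rehydration are a serialization round trip. As a rough illustration only, assuming a made-up DocumentState type and using DataContractSerializer (WF’s actual instance-data format is an internal detail of the runtime), the idea looks like this:

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

// Hypothetical workflow state; real WF instance data is much richer than this.
[DataContract]
public class DocumentState
{
    [DataMember] public string DocumentId { get; set; }
    [DataMember] public string Content { get; set; }
}

// Hypothetical helper illustrating the concept, not part of WF.
public static class Hydration
{
    // "Dehydrate": turn in-memory state into bytes that can be stored anywhere.
    public static byte[] Dehydrate(DocumentState state)
    {
        var serializer = new DataContractSerializer(typeof(DocumentState));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, state);
            return stream.ToArray();
        }
    }

    // "Rehydrate": rebuild the in-memory state from the stored bytes.
    public static DocumentState Rehydrate(byte[] data)
    {
        var serializer = new DataContractSerializer(typeof(DocumentState));
        using (var stream = new MemoryStream(data))
        {
            return (DocumentState)serializer.ReadObject(stream);
        }
    }
}
```

Whatever store the bytes end up in (a database, a file, or a cache) is exactly what the persistence provider decides.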

Much like BizTalk, Windows Workflow Foundation comes with a SQL persistence store: the SqlWorkflowInstanceStore. However, with WF it’s optional. Not only is using persistence optional, you are also free to choose other providers, or even build one yourself.

You can enable persistence either through configuration or in code. If you’re hosting your workflow in IIS/AppFabric, you can even manage persistence through IIS Manager.

Why and when would we use persistence?

You might end up using persistence for different reasons, which may also affect which provider you choose. As a “BizTalk guy”, I generally think of persistence in terms of 1) a safety net in case something goes wrong, and 2) resource management: offloading long-running instances to reduce memory consumption. In most cases you’re looking for a safe and robust provider to make sure you don’t lose sensitive data along with the internal state of the workflow instance.

AppFabric Cache might not be your first choice if you are looking for a secure and reliable store provider. As Stephen W. Thomas put it: “Never put anything in the Cache that MUST not be lost – it is a non-transactional cache, not a database”.

However, persistence is also about scalability. If, for example, you are using WF to control the page flow of an ASP.NET application running on multiple web servers, you need persistence so that one server can continue a workflow that was started on another. Furthermore, in any scenario where workflow services are accessed in a session-less manner, it is important to persist as soon as the workflow becomes idle. Otherwise the second request to the workflow might be redirected to another server while the instance is still active in memory on the first one. We can control this behavior using the WorkflowIdleBehavior.TimeToUnload setting in web.config or app.config:

<serviceBehaviors>
  <behavior>
    <sqlWorkflowInstanceStore connectionStringName="persistenceStore" />
    <serviceMetadata httpGetEnabled="true"/>
    <workflowIdle timeToUnload="00:00:00"/>
    <serviceDebug includeExceptionDetailInFaults="true"/>
  </behavior>
</serviceBehaviors>

The default value for the property is one minute. Unloading a workflow implies that it is also persisted. If TimeToUnload is set to zero, the workflow instance is persisted and unloaded as soon as it becomes idle. Setting TimeToUnload to TimeSpan.MaxValue effectively disables unloading, so idle workflow instances are never unloaded.
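The unload rule is simple enough to state as code. This is just a sketch of the documented semantics: UnloadPolicy.ShouldUnload is a hypothetical helper, not how WF implements the behavior internally.

```csharp
using System;

// Hypothetical helper modelling the TimeToUnload rule; WF applies it internally.
public static class UnloadPolicy
{
    // Returns true when an instance that has been idle for `idleFor`
    // should be persisted and unloaded, given the configured timeToUnload.
    public static bool ShouldUnload(TimeSpan idleFor, TimeSpan timeToUnload)
    {
        // TimeSpan.MaxValue effectively disables unloading.
        if (timeToUnload == TimeSpan.MaxValue)
            return false;

        // Zero means "unload as soon as the instance becomes idle".
        return idleFor >= timeToUnload;
    }
}
```

With timeToUnload="00:00:00", any idle instance qualifies immediately, which is what the session-less, multi-server scenario needs.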

In some cases, for example when using WF to control the UI of your application, you may consider using a less reliable provider. In such cases, storing your workflow state in AppFabric Cache could be a good fit. In fact, I would not be surprised if Microsoft shipped an AppFabric Cache provider in the near future.

But I won’t lie to you: I think AppFabric Cache is super cool, and I would make up any excuse to play with it!

What is correlation?

First, let’s have a look at the sample workflow. It’s a very simple workflow that simulates working with some sort of document form. The user may save the document and come back later to continue working on its content.

  1. The client calls the workflow service through the CreateDocument method. This causes the runtime to create a new instance of the workflow.
  2. After receiving the new Document, the workflow assigns it to a local variable.
  3. As the response is sent back to the client, the workflow initializes correlation on Document.Id (a string).
  4. While the workflow waits for the client to send an updated document, it becomes idle and is therefore persisted to the AppFabric Cache store.
  5. The client invokes the UpdateDocument method. As the workflow instance is no longer in memory, the runtime asks the persistence provider for the serialized workflow and then rehydrates it.
  6. The workflow instance then performs the following actions:
      • updates the local document variable;
      • returns the response;
      • persists its internal state;
      • continues to listen for updates until the user submits “I’m done”.
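To make the sequence above a bit more tangible, here is a hypothetical, in-memory simulation of it. A Dictionary keyed on Document.Id stands in for the AppFabric cache, and DocumentWorkflowSimulation and its methods are made-up names; in the real sample, WF correlation and the persistence provider do all of this for you.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical simulation of the document workflow's lifecycle.
// A Dictionary stands in for the AppFabric cache.
public class DocumentWorkflowSimulation
{
    private readonly Dictionary<string, string> _store = new Dictionary<string, string>();

    // Steps 1-4: create an instance, correlate on Document.Id, persist when idle.
    public void CreateDocument(string documentId, string content)
    {
        if (_store.ContainsKey(documentId))
            throw new InvalidOperationException("Correlation value must be unique: " + documentId);
        _store[documentId] = content;
    }

    // Steps 5-6: route by Document.Id, rehydrate, update, persist again.
    // Returns false when the update is the final "I'm done" submission.
    public bool UpdateDocument(string documentId, string content, bool done)
    {
        if (!_store.ContainsKey(documentId))
            throw new KeyNotFoundException("No workflow instance for " + documentId);

        if (done)
        {
            _store.Remove(documentId); // workflow completed: nothing left to persist
            return false;
        }

        _store[documentId] = content;
        return true;
    }

    public bool IsPersisted(string documentId) => _store.ContainsKey(documentId);

    public string Load(string documentId) => _store[documentId];
}
```

Note how several documents can be in flight at once; the Document.Id alone decides which stored state an update belongs to.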

The only “complex” part of this workflow is making sure the second call (and all subsequent calls) is “routed” to the right workflow instance. Keep in mind that there might be any number of documents being processed concurrently in the system. This is solved using WF correlation, and in particular content-based correlation.

If you’re a BizTalk guy, this is nothing new; however, when I teach BizTalk classes, people seem to have a bit of trouble understanding the concept. A correlation set is basically a “primary key” for the workflow instance. The id is created on the first call to the workflow and needs to be passed on to the runtime in all subsequent calls. With WF this is very simple: all you need to do is set the CorrelationInitializers of, for instance, a Send activity.

This causes the workflow runtime to create the instanceId (a Guid) and add it to the client’s binding context, so that the instanceId is passed along in the SOAP header. Although this works fine in most situations, it has some limitations: if the client does not have the instanceId, as when different clients interact with the same workflow, it won’t work. You are therefore better off using content-based correlation, through the Query correlation initializer. For more information about the different kinds of correlation that WF provides, have a look at Paolo’s post.

This correlation type lets you define the id of your workflow based on the content of the message, such as the Document.Id. Keep in mind, though, that the value needs to be unique.

So how does a persistence provider work?

The simple answer: it serializes the workflow, along with its metadata, to a store. The persistence provider could use any durable medium, such as a file, a database or a queue. AppFabric Cache does not fall into the category of durable media, but we’ll ignore Stephen’s lame warnings for now and just concentrate on the sweet coolness of AppFabric Cache.

To begin with, a persistence provider needs to inherit from InstanceStore (System.Runtime.DurableInstancing), an abstract class with a few methods you need to override. The most important one is BeginTryCommand, a general-purpose method the runtime calls whenever it needs the store to do something. Its InstancePersistenceCommand parameter tells you what the runtime expects you to do, for example save a workflow (SaveWorkflowCommand) or load one by its key (LoadWorkflowByInstanceKeyCommand).

(There are more commands, but these will do for now.)
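The dispatch inside BeginTryCommand is essentially a type switch on the command. The sketch below uses hypothetical stand-in types (PersistenceCommand, CreateOwnerCommand, SaveCommand, LoadByKeyCommand) instead of the real System.Activities.DurableInstancing classes so that it compiles on its own; a real provider would test for SaveWorkflowCommand, LoadWorkflowByInstanceKeyCommand and so on, and let the base class deal with commands it doesn’t recognize.

```csharp
using System;

// Hypothetical stand-ins for the real command types, so the sketch is self-contained.
public abstract class PersistenceCommand { }

public sealed class CreateOwnerCommand : PersistenceCommand { }

public sealed class SaveCommand : PersistenceCommand
{
    public bool CompleteInstance { get; set; }
}

public sealed class LoadByKeyCommand : PersistenceCommand
{
    public Guid LookupKey { get; set; }
}

public static class CommandDispatcher
{
    // Mirrors the shape of a BeginTryCommand override: inspect the command
    // type and decide what to do. Returns false for commands it doesn't handle.
    public static bool TryCommand(PersistenceCommand command, out string action)
    {
        switch (command)
        {
            case CreateOwnerCommand _:
                action = "bind the instance owner";
                return true;
            case SaveCommand save when save.CompleteInstance:
                action = "workflow completed: remove it from the store";
                return true;
            case SaveCommand _:
                action = "serialize instance and store it";
                return true;
            case LoadByKeyCommand load:
                action = "look up instance for key " + load.LookupKey;
                return true;
            default:
                action = "not handled";
                return false;
        }
    }
}
```

Checking the completed save before the ordinary save matters: the more specific case has to come first, or it would never be reached.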

If the last SaveWorkflowCommand has CompleteInstance set to true, the workflow is done. This is where I clean up the cache, even though the cache would eventually clean itself up, since I add items to the cache with a timeout. You can set this timeout yourself in web.config:

<serviceBehaviors>
  <behavior>
    <bLogicalPersistenceStore cacheName="bLogical" timeout="00:05:00"/>
    <serviceMetadata httpGetEnabled="true"/>
    <workflowIdle timeToUnload="00:00:01"/>
    <serviceDebug includeExceptionDetailInFaults="true"/>
  </behavior>
</serviceBehaviors>
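The “remove on completion, otherwise let the timeout clean up” approach can be sketched with a small in-memory store. This is a stand-in under explicit assumptions: ExpiringStore and its methods are hypothetical, the injected clock only exists to make expiry testable, and the actual provider relies on AppFabric Cache’s own item timeouts instead.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a cache with per-item timeouts.
public class ExpiringStore
{
    private readonly Dictionary<string, (byte[] Data, DateTime ExpiresAt)> _items =
        new Dictionary<string, (byte[] Data, DateTime ExpiresAt)>();
    private readonly Func<DateTime> _now;

    public ExpiringStore(Func<DateTime> now) { _now = now; }

    public void Put(string key, byte[] data, TimeSpan timeout) =>
        _items[key] = (data, _now() + timeout);

    public byte[] Get(string key)
    {
        if (!_items.TryGetValue(key, out var item))
            return null;
        if (_now() >= item.ExpiresAt)
        {
            _items.Remove(key); // lazily evict expired items
            return null;
        }
        return item.Data;
    }

    // Called for every save; completeInstance mirrors SaveWorkflowCommand.CompleteInstance.
    public void OnSave(string key, byte[] data, bool completeInstance, TimeSpan timeout)
    {
        if (completeInstance)
            _items.Remove(key);      // workflow done: clean up right away
        else
            Put(key, data, timeout); // otherwise the timeout is the safety net
    }
}
```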

The LoadWorkflowByInstanceKeyCommand is where I got stuck. As I said at the beginning, it worked; I just couldn’t figure out why. Below is the relevant code from my LoadWorkflowByInstanceKey method:

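private IAsyncResult LoadWorkflowByInstanceKey(InstancePersistenceContext context,
                                               LoadWorkflowByInstanceKeyCommand command,
                                               TimeSpan timeout,
                                               AsyncCallback callback,
                                               object state)
{
    // LookupInstanceKey is the Guid the WF runtime hands us for this request.
    // Look up the cached key entry for it...
    Key key = _cacheHelper.GetKey(command.LookupInstanceKey);

    // ...and use that entry to fetch the persisted workflow instance.
    Instance instance = _cacheHelper.GetInstance(key.Instance);

    // (The rest of the method, which loads the instance into the context
    // and completes the async call, is omitted from this excerpt.)
}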

I’m loading the workflow using the LookupInstanceKey value, which is a Guid. Where did that come from? I was expecting my correlation key, the Document.Id. I started to believe the correlation key was stored (together with the LookupInstanceKey) somewhere else, so I traced the messages to see whether anything was hidden in the headers, but there wasn’t. I tested different clients collaborating on the same workflow instance, and that worked too. I even set up an environment with multiple servers, AND IT STILL WORKED! How was this possible?

As I was losing sleep over this, I turned to Paolo Salvatori, hoping he could shed some light on it. Eventually, Manu Srivastava and Ruppert Koch explained how the magic works: it turns out the LookupInstanceKey is a 128-bit hash of the actual content-based key.
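To make the “128-bit hash” part concrete, here is a small illustration of how a string such as a Document.Id can be turned deterministically into a Guid. It is a sketch under an explicit assumption: the MD5-based KeyHashing.ToLookupKey helper is hypothetical and is not the algorithm WF uses, which, as Manu explains, a provider doesn’t need to know anyway.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Illustration only: maps a string correlation value to a 128-bit Guid.
// WF uses its own internal algorithm; MD5 here is an assumption for demonstration.
public static class KeyHashing
{
    public static Guid ToLookupKey(string correlationValue)
    {
        using (var md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(correlationValue));
            return new Guid(hash); // MD5 yields exactly 16 bytes = 128 bits
        }
    }
}
```

Because a mapping like this is deterministic, every server computes the same Guid for the same Document.Id, which helps explain why the multi-server test kept working without anything extra in the message headers.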

Manu Srivastava:

“The Hash == InstanceKey.Value == CorrelationKey == LookupKey. These terms all mean the same thing. InstanceKey.Value contains the value of the hash, which is of type GUID and represents the correlation / lookup key […]”

“It is the responsibility of the Provider to store the Workflow Instance, the Correlation Keys *and* the mapping between the two. The LoadWorkflowByInstanceKeyCommand has a LookupKey as an argument. This is the correlation key. This is the key the custom provider implementation must use to identify and then return the Workflow Instance. The hashing algorithm used is irrelevant to the implementation of the Provider. The Provider only has to worry about storing the InstanceKey.Value and retrieving an instance via the InstanceKey.Value. The transformation between Document.Id and the InstanceKey is a WF runtime detail that you do not need to worry about for your implementation; thus, you do not need to worry about the hashing algorithm itself. When you persist an instance, save the InstanceKey.Value and its mapping to InstanceId. When you load an instance, use the LookupKey to find the correct InstanceId and return the WorkflowInstance. That's it. You don't need to go from Document.ID to hash or vice-versa…thats handled at the WF Runtime Layer. Just focus on saving InstanceKey.Values and loading by LookupKey. =) InstanceKey.Value == LookupKey in this particular scenario”
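The bookkeeping Manu describes boils down to two lookups: key to instance id, and instance id to instance data. Here is a minimal sketch, assuming plain in-memory dictionaries in place of AppFabric Cache; InstanceKeyIndex, Save and TryLoadByKey are hypothetical names, not part of WF or of the sample.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: store the instance, its keys, and the key -> instance mapping.
// In the real provider both tables would live in AppFabric Cache.
public class InstanceKeyIndex
{
    private readonly Dictionary<Guid, byte[]> _instances = new Dictionary<Guid, byte[]>();
    private readonly Dictionary<Guid, Guid> _keyToInstance = new Dictionary<Guid, Guid>();

    // On save: remember the instance data and map every InstanceKey.Value to its InstanceId.
    public void Save(Guid instanceId, byte[] instanceData, IEnumerable<Guid> instanceKeys)
    {
        _instances[instanceId] = instanceData;
        foreach (Guid key in instanceKeys)
            _keyToInstance[key] = instanceId;
    }

    // On load by key: LookupKey -> InstanceId -> instance data.
    public bool TryLoadByKey(Guid lookupKey, out Guid instanceId, out byte[] instanceData)
    {
        instanceData = null;
        if (!_keyToInstance.TryGetValue(lookupKey, out instanceId))
            return false;
        return _instances.TryGetValue(instanceId, out instanceData);
    }
}
```

Note that the provider never sees the Document.Id here, only Guids, which is exactly the point of the quote above.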

In other words: concentrate on the problem, and leave the plumbing to us! Point taken.

Running the sample


Before running the sample, you need to configure the environment; in particular, you have to create the cache used by the custom persistence provider.

To do so, perform the following steps:

1. Download Windows Server AppFabric.

2. Read Scott Hanselman’s post on how to set it up.

3. Open Caching Administration Windows PowerShell from the Start menu and run the following commands:

  • Start-CacheCluster
  • New-Cache bLogical

4. Download and run the sample.

This is probably a good time to point out that the sample provided with this post is not production ready. However, if you want to take it further, I’d be happy to help out! (</disclaimer>)

Again, many thanks to Paolo Salvatori, Manu Srivastava and Ruppert Koch from the AppFabric CAT and Product group.