by community-syndication | Mar 30, 2010 | BizTalk Community Blogs via Syndication
It's a common scenario: there is a sudden surge in your business, and the number of transactions increases drastically for a period of time. Modern business solutions are not standalone applications. In a SOA/BPM scenario there are "n" layers (applications) working together to form a business solution. So, during this sudden surge, if a single layer (application) malfunctions, it's going to put the whole end-to-end process in trouble.
We were hit with a similar situation recently.
Quick overview of our overall solution:
Our top-level, oversimplified architecture looks like this:
In the above picture, the BPM workflow solution and the Business Services (a composite service calling multiple underlying integration services) were built with Microsoft BizTalk Server. In our case the main problem areas are the database and concurrency. Business users basically perform 2 sets of operations:
1. Validate Applications, and
2. Commit Applications
They scan applications and feed them into the system; the system runs the full validation rules and gives the results back. If there are errors, the business users correct them and feed the applications back into the system, cycling through this loop until all the errors are resolved. Finally, after resolving all the errors, they perform the commit operation. Due to the complexity of the solution, if things go wrong during commit, it ends up as a manual (time-consuming!!) operation to complete the job. It's more efficient if we reduce the amount of manual processing.
During validate cycles the calls to the database are purely read-only, and even at very high volume levels there are no serious issues (also because a lot of information is cached in memory). But the commit calls persist information into the database, and some of the hot tables have a few million records and can't deal with simultaneous requests. The biggest problem is that if the database gets into trouble it affects everybody; as shown in the picture, there are external applications like web sites that rely on the availability of the database, and they get affected as well.
Overview of the Business Services architecture
The picture below shows the architecture of our Business Services layer. The responsibility of the business services is to create a usable, abstract, composite business services layer for the end consumers, so end consumers don't need to understand the interaction logic with the underlying systems. It takes care of various things like the order (sequencing) in which the underlying integration services are called, error handling across the various calls, suppressing duplicate errors returned from the various calls, providing a single unified contract to interact with, etc.
Now, getting to the technical bits.
The sub-orchestrations are designed mainly for modularity reasons and are called using the "Call Orchestration" shape from the two main orchestrations, "Validate" and "Commit". All the orchestrations are hosted in a single BizTalk host, with one host instance per server (we have 6 BizTalk servers, which host other BizTalk applications as well).
The important thing to note here is, all the orchestrations need to live within a single host due to their tight coupling.
This is how it's tightly coupled:
1. The called sub-orchestrations need to live in the same host as the calling orchestrations ("Validate" and "Commit"), since you can't specify a host for sub-orchestrations (orchestrations started using a Call Orchestration shape run under the same host/host instance as the calling orchestration), and
2. Both top-level orchestrations share the sub-orchestrations.
So the only option is to configure and run all the orchestrations in a single host. The other interesting factor here is that the send ports are also not aware of the "Validate"/"Commit" distinction. There is only one send port per endpoint, which handles both validate and commit calls. This is again a constraint created by the way the sub-orchestrations are designed: there is only one logical port, mapped to one physical port, for both the "Validate" and "Commit" operations.
This is not a problem in the majority of cases. But as I mentioned earlier in this post, in our case multiple concurrent "validate" calls are absolutely fine; you can thrash the system with such calls (thanks to the improved in-memory cache stores). But when it comes to "commit", the underlying database suffers seriously under concurrent calls. This just shows the importance of understanding the limitations of the underlying systems, and the importance of non-functional SLA requirements, when designing a SOA/BPM system which relies on diverse applications across the organisation. There is nothing wrong with this application; it's designed for easy maintenance, not with operational constraints in mind.
I can think of two different approaches which would have given us the opportunity to separate the "Validate" and "Commit" operations and set different throttling conditions:
1. Not reusing the sub-orchestrations, in which case "Validate" and "Commit" could live in two different hosts with completely isolated throttling settings.
2. Not reusing the same send port for both "Validate" and "Commit", which would have given us the opportunity to control it at the send port level by configuring different hosts for the send ports, with different throttling settings.
BizTalk Server is designed for high throughput, and it's going to push messages to the underlying systems (and databases) as quickly as it can. It does provide throttling: it understands when underlying systems are suffering and will try to compensate. But throttling kicks in too late for us; by then the system has already gone into an unrecoverable state.
Production environment Constraints 
First of all, there are some very restrictive conditions we have to work with:
1. The environment is frozen. The next few days are our core business days of the year and we are not allowed to make changes (serious financial implications!!).
2. Time is against us. We need a working solution ASAP.
3. We don't want to slow down the validate process, as that would leave the business users idle. In fact, a lot of temporary business users are hired for this period.
Understanding the pattern and requirement:
First we identified the pattern under which the underlying database suffers. We roughly need to process 700 applications per hour. The database is very happy and capable of processing this volume at only 60% resource utilization if we feed it at a consistent rate (roughly 1 application every 5 seconds). But if we accidentally receive 100 applications concurrently, it brings the database to a standstill and blocks all further calls. Resource utilization goes to nearly 100% and it takes a while for the database to recover. By that time, hundreds of applications would already have failed and been routed to manual processing.
Secondly, we should have the ability to stop and start processing commit requests in a very controlled way (as soon as we start seeing issues on the database). The issues might be triggered by other external applications using the same database. Ex: some parts of the external web sites hit the database via the integration services.
So, it all comes down to:
1. Parking all the "Commit" (only Commit) applications on the Business Services side in the BizTalk MessageBox database (creating a queuing effect; the validate applications should flow through without any hold).
2. Releasing the "Commit" applications in a controlled way to the downstream integration services (and into the database), with the ability to start/stop/increase volume/decrease volume/manually submit, all at runtime with minimal risk.
Finally, the solution:
This is the solution I came up with:
1. Stop the top-level "Commit" orchestration via the BizTalk Admin Console. This solves our first issue of parking the commit applications in the BizTalk Business Services layer. All the commit orchestration instances will get suspended in a resumable state and reside in the BizTalk MessageBox.
2. Use BizTalk's ability to resume the suspended instances. The key thing to note here is that you don't need to start your orchestration to resume suspended (resumable) instances. You only need to make sure the orchestration is in the enlisted state.
This gives us the ability to submit commit applications manually in a controlled way. But there are quite a few challenges here:
1. It's a daunting task to sit and do this job manually for hours.
2. There is a high possibility of human error; note in the above picture that you are only one click away from terminating the instances.
3. There is the possibility of one more human error: you might resume a completely wrong instance in the wrong tab.
Bear in mind, it's a production environment.
Tool to resume Suspended (resumable) instances using WMI
All I had to do now was automate the manual task of resuming suspended orchestration instances in a controlled way. I knocked this tool together in a couple of hours using BizTalk Server's WMI support for interacting with the BizTalk environment.
It's a very simple tool; this is how it works (a sketch of the core resume loop follows the list):
1. You select the required orchestration and press refresh. It shows how many instances are waiting in the suspended-resumable state (for that particular orchestration).
2. Then you configure how many instances to submit each time (Ex: Number of Instances to submit = 1).
3. Then you can decide whether to submit them manually (by selecting the "Manual" option and clicking "Resume Instances"), or
4. You can configure submission at periodic intervals, Ex: 1 every 3 seconds (by selecting the "Timed" option, configuring the time in milliseconds and clicking "Resume Instances").
5. Once you start your timed submission, instances will be submitted at periodic intervals, and you have the ability to stop them at any time by just clicking "Stop Timer".
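For illustration, here's a minimal sketch of the core loop behind the "Timed" option, assuming the Python "wmi" package (pywin32-based) and BizTalk's documented WMI provider (the MSBTS_ServiceInstance class); the orchestration name, batch size and interval below are placeholders:

```python
# Minimal sketch of a controlled resume loop over BizTalk's WMI provider.
# Assumes the pywin32-based "wmi" package; the orchestration name is a
# hypothetical placeholder.
import time
import wmi

BTS_NAMESPACE = "root/MicrosoftBizTalkServer"
SUSPENDED_RESUMABLE = 4          # MSBTS_ServiceInstance.ServiceStatus value
ORCHESTRATION = "MyApp.Orchestrations.Commit"   # placeholder name
BATCH_SIZE = 1                   # instances to resume per tick
INTERVAL_SECONDS = 3             # pause between submissions

conn = wmi.WMI(namespace=BTS_NAMESPACE)

while True:
    suspended = conn.MSBTS_ServiceInstance(
        ServiceName=ORCHESTRATION, ServiceStatus=SUSPENDED_RESUMABLE)
    if not suspended:
        break                    # queue drained, nothing left to release
    for instance in suspended[:BATCH_SIZE]:
        instance.Resume()        # orchestration must be in the enlisted state
    time.sleep(INTERVAL_SECONDS) # throttle the release rate
```

The tool described above essentially wraps this kind of loop in a small UI, adding the manual mode and a stop button.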
Certainly I'm not recommending this as a long-term solution, but it helped us get out of the situation. This solution requires constant human monitoring, which is OK for our peak window (a few days) but not sustainable throughout the year. I believe something similar needs to be built into the product to support scenarios like this, which look very common for businesses.
Difficulties in understanding the BizTalk throttling mechanism:
Anyone who has worked with BizTalk long enough will agree that throttling the BizTalk environment is not for the faint-hearted, simply because of the sheer number of tuning parameters BizTalk provides to end users, and because what each and every setting does is not clear to everyone. Here are a few of the parameters you can tune:
1. Registry settings to alter the min/max IO and worker threads for host instances (you need to do this on each machine; a hedged example follows this list).
2. Host throttling settings for each host, which affect all the host instances of that host. You will see a few options for threads here.
3. The adm_ServiceClass table in the BizTalk management database, where you configure the polling interval and threads for the EPM, XLANG, etc., which affects the whole environment.
4. You also have individual adapter settings, which have some effect. Ex: the polling interval in the SQL and MQSeries adapters.
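As a hedged illustration of item 1 (not an official procedure), a script along these lines could set the CLR thread limits under one host's service key; the host name is a placeholder, and the exact key path and value names should be verified against your BizTalk version and bitness:

```python
# Hedged sketch: setting CLR thread limits for one BizTalk host via the
# registry. Paths and value names follow common BizTalk tuning guidance;
# verify them for your version, and repeat on every BizTalk machine.
import winreg

HOST_NAME = "BizTalkServerApplication"   # hypothetical host name
KEY_PATH = (r"SYSTEM\CurrentControlSet\Services"
            + rf"\BTSSvc${HOST_NAME}\CLR Hosting")

with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0,
                        winreg.KEY_SET_VALUE) as key:
    winreg.SetValueEx(key, "MaxWorkerThreads", 0, winreg.REG_DWORD, 100)
    winreg.SetValueEx(key, "MaxIOThreads", 0, winreg.REG_DWORD, 100)
# The host instance must be restarted before the new limits take effect.
```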
After talking to many people, it's very clear to me: they simply try various combinations and finally settle on one that works for their requirement. Most of the time they don't really know the implications, which is bad.
This complexity also imposes a constraint: it's not really practical to tune these settings in production if your business requirements change every day. Ex: in our case, we know the volume we are going to process each day at the beginning of the day (when all the applications are received in the post), so we need to set up our system for that daily volume. If it's 500 applications, we don't need to do anything; the system will cope automatically. If it's 3000 applications, then we need to modify certain settings. BizTalk Server is a self-tuning engine and automatically caters for varying situations the majority of the time. But the problem is that BizTalk can't be aware of the abilities of all its underlying systems!!
Throttling based on system resources or functional requirements?
BizTalk has a very good throttling mechanism (I'm only saying it's a bit difficult to understand 🙂), but the throttling is based purely on the system resources of the BizTalk environment (example: CPU utilization, memory utilization, thread allocation, DB connections, spool depth, etc.). We need to keep in mind that BizTalk is a middleware product and depends heavily on the performance and ability of the external systems it works with.
BizTalk of course provides message publishing throttling, so it slows down automatically when it sees that the underlying system is suffering and its host queue length is building up. But that is a system decision based on parameters, and in some scenarios (like ours) it's too late by the time the system starts throttling.
While researching a solution to this problem, I noticed a lot of people in the same boat. They were trying to slow the BizTalk environment down to cope with the underlying systems' abilities in various ways: building a controller orchestration, adjusting the MaxWorkerThreads setting to control the number of running orchestrations, enabling ordered delivery on the send port, etc.
Suggestion to BizTalk Server Product Team:
At the moment, with the current version of BizTalk (2009 R2, or 2010!!), there is no separate message processing throttling mechanism to distinguish between orchestrations (business processes) and other subscribers (Ex: send ports). In some scenarios, business processes may be driven by various external factors and need to be controlled in a much more fine-grained way. Also, as explained in this post, there may be a requirement to change the throttling behaviour in production, at runtime, quite frequently and for varying reasons. So here I'm going to put forward my fictitious idea 🙂
On the Orchestration Properties window, add a new tab called "Controlled", which might look like this:
 
All the instances that are waiting to be processed could have a status like "Ready to Run (Controlled)". Administrators would still have the ability to terminate instances that are not required.
The context menu could be extended a bit so that administrators can easily start/stop processing in a controlled or normal way.
With this approach, we wouldn't need to restart host instances; the configuration changes should take effect on the fly.
Nandri (thanks)
-Saravana
					 by community-syndication | Mar 30, 2010 | BizTalk Community Blogs via Syndication
BizTalk and SSO were running smoothly before the installation of VS2010 on my machine. I have a 64-bit machine with Windows Server 2008 R2 and BTS 2009 installed. When I installed VS2010 my SSO stopped, and when I tried to start it I got the error below ————————— Services ————————— Windows could not start […]
					 by community-syndication | Mar 29, 2010 | BizTalk Community Blogs via Syndication
OrchProfiler has now been updated to support 64-bit systems. Actually, this has been available for quite some time in the Planned Releases tab of the CodePlex space, but I've taken the opportunity to give it a quick test drive, update the version numbers, etc., and make the new release available.
To use this, you […]
					 by community-syndication | Mar 29, 2010 | BizTalk Community Blogs via Syndication
The other day this landed in my inbox. Being on the TAP program and posting various pieces of feedback, I've been updated that BizTalk 2010 is the only name to remember. It keeps in line with VS.NET 2010 etc., so anything with a 2010 after its name *should* work with everything else: SharePoint 2010 etc.
So far I've been playing with the early bits and I'm liking what I'm seeing – copy and paste functoids in a map!!! (For those of you who don't know the pain… it's pain, let me tell you.)
So here's the official blurb.
Well done BizTalk Team! Working hard!
——————————————
BizTalk Server 2010 Name Change Q&A
Q: Why was the original name for the release BizTalk Server 2009 R2?
BizTalk Server 2009 R2 was planned to be a focused release to deliver support for Windows Server 2008 R2, SQL Server 2008 R2 and Visual Studio 2010. Aligning BizTalk releases to core server platform releases is very important for our customers. Hence our original plan was to name the release BizTalk Server 2009 R2.
Q: Why did Microsoft decide to change the name from BizTalk Server 2009 R2 to BizTalk Server 2010?
Over the past year we got a lot of feedback from our key customers and decided to incorporate a few key asks from our customers in this release. Based on the customer value we are delivering and the positive feedback we are getting from our early adopter customers, we feel the release has transitioned from a minor release (BizTalk Server 2009 R2) to a major release (BizTalk Server 2010).
 
Following is a list of the key capabilities we have added to the release:
1. Enhanced trading partner management that will enable our customers to manage complex B2B relationships with ease
2. Increased productivity through the enhanced BizTalk Mapper. These enhancements are critical to increasing productivity in both EAI and B2B solutions, and a favorite feature of our customers
3. Enabling secure data transfer across business partners with the FTPS adapter
4. Updated adapters for SAP 7, Oracle eBusiness Suite 12.1, SharePoint 2010 and SQL Server 2008 R2
5. Improved and simplified management with the updated System Center management pack
6. Simplified management through a single dashboard which enables IT Pros to back up and restore BizTalk configuration
7. Enhanced performance tuning capabilities at the Host and Host Instance level
8. Continued innovation in the RFID space with out-of-the-box event filtering and delivery of RFID events
 
Q: Is there any additional benefit to customers with the name change to BizTalk Server 2010?
In addition to all the great value the release provides, customers will benefit from the support window being reset to 10 years (5 years mainstream and 5 years extended support). This highlights Microsoft's long-term commitment to the BizTalk Server product.
					 by community-syndication | Mar 26, 2010 | BizTalk Community Blogs via Syndication
On March 22, 2010, Microsoft announced the renaming of the BizTalk 2009 R2 minor release to the BizTalk Server 2010 major version. According to Microsoft, the BizTalk 2009 R2 version was to support Windows Server 2008 R2, SQL Server 2008 R2 and VS 2010, but now more functionality has been added to the product on the basis […]
					 by community-syndication | Mar 26, 2010 | BizTalk Community Blogs via Syndication
Business or system constraints may force you to leverage BizTalk for file backups. BizTalk can actually handle this pretty well if it's planned out carefully.
Here is a diagram of the overall design:
One way to implement this is to set up a receive port and location to poll the original drop location of the files. This receive location will […]
					 by community-syndication | Mar 25, 2010 | BizTalk Community Blogs via Syndication
That should read I was a speaker at the Swedish TechDays, since I am writing this post after the fact, although I’m backdating the post a bit to more closely match in time when I was supposed to have published it.
I did a session titled "The Future Roadmap of BizTalk Server" and I'm happy to say I drew a full room (not that the room was all that big, but all the same). Future in this case meant I covered both current and coming technology. The reasoning behind this was that a lot of people probably wouldn't have BizTalk Server 2009 yet. This turned out to be true, since it was a clear minority of the people in the room who raised their hands when asked if they had had previous contact with BizTalk Server 2009. I covered the ESB Toolkit 2.0 and itinerary processing, as well as connecting to the Windows Azure Platform Service Bus. I was a little surprised here that when I asked for a show of hands on how many were familiar with the Service Bus, only a couple of hands went up. Either it's a Swedish mentality thing – not wanting to stand out – or there simply aren't all that many people interested in it yet. I went on to cover some of the news in 2009 R2 (which I learned had been renamed to 2010 the evening after my presentation). I couldn't demo anything, but I am quite satisfied with the PowerPoint "demo" I was able to do (yes, I know, you can kill a presentation, and other things, with PowerPoint, but I think I did OK). I also related some of the vision given by the team at PDC in November and in subsequent webcasts about vNext. The recorded session should be up soon at the TechDays site; however, it will be in Swedish, so for an international audience it will be useless. The presentation will be available soon as well, as a PDF.
Because I believe in sharing, I've decided to make the presentation available through the blogical Downloads section, here. I have myself borrowed small parts and ideas in the presentation from elsewhere, as you often do, but almost all slides and images are built from scratch. I'm sharing those under an "if you decide to use parts of it, please at least tell me that you liked it and that it was useful to you by commenting on this blog post" license. I've marked it as final, but it's not copy-protected or in PDF format or anything; it's the pptx file itself. Enjoy.
BizTalk User Group Sweden was represented at TechDays as well.
(Not the best of pictures, with the backlight and all, but there we are. From MPN Sweden's photostream.)
					 by community-syndication | Mar 25, 2010 | BizTalk Community Blogs via Syndication
At Tellago, we always try to stay at the front lines of technology that can enhance our solution development practices. This year we are putting a lot of emphasis on business intelligence, and in particular the new set of BI technologies such as Microsoft…(read more)
					 by community-syndication | Mar 24, 2010 | BizTalk Community Blogs via Syndication
To complement the existing series of posters that show various parts of BizTalk’s architecture, there is now one available for the ESB Toolkit.
You can get it here.
					 by community-syndication | Mar 24, 2010 | BizTalk Community Blogs via Syndication
A long time ago, applications running on Windows Server were simply EXEs, where developers wrote a lot of custom code on top of the Operating System (OS) to interact with clients and other applications over the network. COM+ and Microsoft Transaction Server (MTS) were added to make it easier to write and host server-side applications and Internet Information Server (IIS) was added to make it easier to write and host Web applications. Application logic was typically written in native code and/or in scripts (CGI, ASP) back then. 
  
In 2002 Microsoft shipped .NET 1.0, and IIS was extended to support ASP.NET, enabling managed-code Web applications to be hosted on Windows Server. In 2006, with .NET Framework 3.0, Microsoft released two new programming models designed to simplify development of application logic: Windows Communication Foundation (WCF) and Windows Workflow Foundation (WF). WCF enables service-orientation and interoperability, and WF enables authoring and execution of long-running workflows. IIS was refactored in Windows Server 2008/Vista, and WAS (Windows Process Activation Service) was born to enable HTTP and non-HTTP activation of application-hosting worker processes (w3wp.exe, also known as the AppPool worker) even when full IIS is not installed. While Windows Server 2008 and .NET 3.0 provided basic administration tools for applications written using the WCF and WF programming models, there wasn't out-of-the-box support for hosting long-running workflows (one had to self-host workflows or build a solution around WAS to ensure reliability and timer activation).
In the upcoming Windows Server AppFabric release we are evolving the application server capabilities in Windows Server to introduce enhanced application hosting and administration tooling (previously codenamed "Dublin") for services written using the WCF and WF programming models, and to provide distributed cache capabilities (previously codenamed "Velocity") for .NET applications. A high-level architecture overview of AppFabric is available at http://msdn.microsoft.com/en-us/library/ee677374.aspx.
 
In parallel, we were also working to elastically scale .NET applications and make them highly available using commodity hardware. Out of this effort came the distributed in-memory cache (formerly known as Project "Velocity"), which helps cache frequently used data in memory across multiple machines, thereby increasing the performance and scale of these applications. In addition, it can elastically scale up and down and replicate data to increase the availability of these applications. These capabilities make the distributed cache an ideal option for caching reference data such as catalogs, profiles, etc., and session data such as shopping carts.
 
In this blog post we go deeper into the AppFabric system and its inner workings. 
Architecture Overview    
The diagram below provides a high-level overview of the AppFabric system.
 
· Apps are deployed into a farm of AppFabric servers that share Workflow Instance Stores and Monitoring Stores (databases).
· The Distributed Cache provides a unified data-cache view to apps while distributing and replicating data across multiple machines.
Functional View    
The diagram below illustrates key components of the AppFabric system and their functions from a single-machine view (one AppFabric Server view) in a typical setup.
Deployment and Configuration      
   
Deploying and configuring an AppFabric application is as easy as deploying and configuring a Web application. 
1. An Administrator deploys and configures a Web application containing WCF and WF services.
a. The Web application is developed using Visual Studio, and developers can easily add WCF services and WF workflow services to the application.
b. The Web application can be packaged and deployed using MSDeploy just like any other Web application targeting an IIS/WAS environment. MSDeploy is integrated with Visual Studio and IIS Manager, and provides command-line scripting (a sketch follows this list).
c. AppFabric provides configuration tools for WCF and WF services via IIS Manager modules and PowerShell command-lets, which enable tasks like setting application monitoring levels and configuring workflow persistence.
d. Application artifacts are copied to a directory together with web.config, and an <application> entry is added to %SystemRoot%\system32\inetsrv\config\applicationHost.config.
e. WAS watches applicationHost.config and, upon registering an <application> entry, notifies transport listeners and starts waiting for message traffic addressed to the application.
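As a rough sketch of step 1b (assumptions: msdeploy.exe's default install path, a package produced elsewhere, and placeholder package and machine names), the deployment could be scripted like this:

```python
# Hypothetical wrapper around Web Deploy (msdeploy.exe) to sync a packaged
# Web application to a target server, as in step 1b. The executable path
# varies by Web Deploy version; package path and computer name are
# placeholders. -verb/-source/-dest are standard msdeploy options.
import subprocess

MSDEPLOY = r"C:\Program Files\IIS\Microsoft Web Deploy\msdeploy.exe"

def deploy_package(package_zip: str, computer_name: str) -> None:
    subprocess.run(
        [MSDEPLOY,
         "-verb:sync",                         # synchronize source to dest
         f"-source:package={package_zip}",     # packaged Web application
         f"-dest:auto,computerName={computer_name}"],
        check=True)                            # raise on a non-zero exit

deploy_package(r"C:\builds\OrderService.zip", "AppFabric01")
```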
Activation      
   
For WCF and WF services, AppFabric supports both on-demand activation and auto-start activation (Windows 7 and Windows Server 2008 R2). On-demand activation improves resource utilization at the price of first-request response time. Auto-start activation improves first-request response time at the price of resource utilization.
2. A message arrives and is dispatched to a code service or a workflow service for processing.
a. When the first message arrives for the application via HTTP or non-HTTP transport listeners, WAS activates a w3wp.exe worker process corresponding to the AppPool the application is associated with.
b. CLR AppDomains are created inside w3wp.exe for each application.
c. A ServiceHost object is created for each code service in the AppDomain.
d. A WorkflowServiceHost object is created for each workflow service in the AppDomain.
e. Both the ServiceHost and the WorkflowServiceHost create endpoints and dispatch the received message for processing.
i. Based on the instancing model declared on the service, the ServiceHost either creates an instance of the service or finds an existing in-memory instance, and invokes the method corresponding to the received message.
ii. Based on the message, the WorkflowServiceHost either creates a new workflow instance or loads an existing workflow instance from the Instance Store, and transfers execution to the workflow instance.
f. After activation, subsequent messages are dispatched by WAS and the transport listeners directly to the endpoints established by the ServiceHosts and WorkflowServiceHosts during activation.
g. If the Web application is configured for auto-start activation, the corresponding w3wp.exe process, AppDomain, ServiceHost and WorkflowServiceHost objects are created whenever WAS starts.
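To make the on-demand activation flow in steps 2a-2f concrete, here is a toy Python model (all names are invented for illustration; the real components are WAS, w3wp.exe and the .NET service hosts):

```python
# Toy model of on-demand activation: the first message for an application
# creates its hosting state; subsequent messages are dispatched directly to
# the hosts created during activation. Names invented for illustration.
class ServiceHost:
    def __init__(self, service):
        self.service = service            # endpoint established at activation

    def handle(self, message):
        return f"{self.service} processed {message!r}"

class AppPool:
    def __init__(self):
        self.apps = {}                    # app path -> {service: ServiceHost}

    def dispatch(self, app_path, service, message):
        hosts = self.apps.setdefault(app_path, {})   # first message activates
        host = hosts.setdefault(service, ServiceHost(service))
        return host.handle(message)

pool = AppPool()
print(pool.dispatch("/OrderService", "OrderEntry", "order #1"))  # activates
print(pool.dispatch("/OrderService", "OrderEntry", "order #2"))  # reuses host
```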
Monitoring and Troubleshooting      
   
AppFabric provides visibility into running WCF and WF services in terms of 1) overall statistics such as the number of errors, the average response time per service, etc., and 2) individual events correlated across deployed services, messages and workflow activities.
3. The application emits WCF and WF monitoring and troubleshooting events, and the Event Collector Service uploads the events to a Monitoring Store.
a. Based on the monitoring configuration of the application and its individual services, the WCF and WF runtimes emit events into an ETW (Event Tracing for Windows) session established by the Event Collector Service running on each machine.
b. Reading the monitoring store information from the application's monitoring configuration, the Event Collector Service bulk-uploads events to the corresponding Monitoring Store.
c. AppFabric provides an out-of-the-box SQL-based Monitoring Store, with support for other database technologies via an extensible provider model.
d. Upon upload to the store, events are shredded into type-specific tables (e.g. the WCF MessageReceived event), and views are automatically updated. The views are designed to offer dashboard-like summaries as well as event correlation flows (such as a service-to-workflow-to-service flow of messages).
4. An Administrator retrieves monitoring and troubleshooting information from the materialized views in the Monitoring Store.
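For example, a dashboard-style query against the Monitoring Store might look like this minimal sketch; the server, database and view name ("EventSummaryView") are hypothetical stand-ins, since the real store defines its own set of views:

```python
# Hypothetical example of step 4: reading summarized service statistics from
# the Monitoring Store over ODBC. The connection string and the
# "EventSummaryView" view are invented for illustration only.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={SQL Server};SERVER=MonitoringDb;"
    "DATABASE=AppFabricMonitoring;Trusted_Connection=yes;")

for service, errors, avg_ms in conn.execute(
        "SELECT ServiceName, ErrorCount, AvgResponseMs "
        "FROM EventSummaryView"):
    print(f"{service}: {errors} errors, {avg_ms} ms average response")
```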
Workflow Instance Persistence and Management      
   
AppFabric enables long-running workflows by 1) saving workflow state in an instance store and 2) exposing administrative commands such as suspend, resume and terminate on long-running workflow instances.
5. At some point, a Workflow Service that is processing the message will reach a natural idle point (such as a Delay activity or a Receive activity in the workflow), which triggers the WorkflowServiceHost to save the workflow state in the Instance Store and unload the Workflow Service from memory.
a. AppFabric provides an out-of-the-box SQL-based Workflow Instance Store, with support for other database technologies via an extensible provider model.
6. An Administrator queries the Instance Store to retrieve a list of in-progress workflow instances and their status (such as Running, Idle, Suspended).
7. An Administrator issues commands such as Suspend, Resume, Terminate against one or more workflow instances.
a. Commands are written to the Instance Store.
8. The Workflow Management Service monitors all Instance Stores in the system and looks for instance commands.
9. The Workflow Management Service issues instance commands on the appropriate machine against the Instance Control Endpoint, which is automatically configured by the system on each WorkflowServiceHost.
a. The WorkflowServiceHost executes a control command by resuming workflow execution from the last saved persist point, suspending the workflow in its current state, or terminating the workflow and removing its state from the Instance Store.
b. The results of the command execution are visible to the Administrator.
10. The Workflow Management Service looks for timers and orphaned workflow instances.
a. Upon reaching a Delay activity, workflow state is saved along with an ActivateMeAt property.
i. The Workflow Management Service looks for ActivateMeAt properties that are close to the current time and activates those instances.
b. The AppDomain or w3wp.exe process can crash or recycle for various reasons, such as unhandled exceptions, reconfiguration, memory leaks, resource contention among applications, etc. If a crash or recycle occurs while a workflow instance is in memory doing work, the instance needs to be resumed from its last persist point.
i. When the WorkflowServiceHost creates or loads a workflow instance, it places an expiring lock on the instance. If the instance continues to work in memory, the lock is renewed before its expiration.
1. A locked instance cannot be loaded by a WorkflowServiceHost that does not own the instance.
2. Workflow state cannot be saved by a WorkflowServiceHost that does not own the instance.
ii. The Workflow Management Service looks for instances with an expired lock and considers them orphaned, which results in the lock being cleared so that the instance can be loaded again.
11. The Workflow Management Service resumes timer-due and orphaned workflow instances.
a. For each workflow instance whose timer is due or whose lock was cleared, the Workflow Management Service calls the Service Management Endpoint to activate (load into memory) the required instance in the WorkflowServiceHost, within an AppDomain in w3wp.exe.
b. Upon activation, the WorkflowServiceHost loads the instance from the Instance Store and continues execution from the last persist point.
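To make the expiring-lock protocol in steps 10-11 concrete, here is a small Python model of the rules described above (all names and the timeout are invented for illustration; the real logic lives inside the WorkflowServiceHost and the Workflow Management Service):

```python
# Illustrative model of the expiring-lock protocol: a host locks an instance
# while working on it in memory, renews the lock before expiry, and the
# management service clears expired locks so orphaned instances can be
# loaded again. All names here are invented for illustration.
import time

LOCK_TTL = 30.0   # seconds; the real timeout is a host configuration detail

class InstanceStore:
    def __init__(self):
        self.locks = {}           # instance_id -> (owner, expires_at)

    def try_lock(self, instance_id, owner):
        lock = self.locks.get(instance_id)
        if lock and lock[1] > time.time():
            return False          # held by a live owner: cannot load or save
        self.locks[instance_id] = (owner, time.time() + LOCK_TTL)
        return True

    def renew(self, instance_id, owner):
        # A host renews its own lock while the instance works in memory.
        if self.locks.get(instance_id, (None,))[0] == owner:
            self.locks[instance_id] = (owner, time.time() + LOCK_TTL)

    def clear_expired(self):
        # What the management service does for orphan detection: expired
        # locks are cleared so another host can load the instance.
        now = time.time()
        for iid, (_, expires) in list(self.locks.items()):
            if expires <= now:
                del self.locks[iid]
```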
Distributed Cache    
The cache provides a distributed, in-memory application cache for developing scalable, available, and high-performance applications. 
12. Applications can take advantage of the in-memory distributed cache to improve the performance, scale and high availability of application data.
a. Any serializable CLR object created with .NET technologies such as ASP.NET, WCF or WF can use the distributed cache to Get/Put items from/into the cache.
i. Developers use the Microsoft.Data.Caching APIs to interact with the cache service.
b. The Distributed Cache Service runs across multiple machines and forms a tight cluster, offering data replication and data consistency across multiple machines.
i. The Distributed Cache Service can run on the same machines as the application code, or in a remote, dedicated farm of machines.
The logical hierarchy of AppFabric caching consists of Machine -> Cache Host -> Named Caches -> Regions -> Cache Items -> Objects:
· Machines are servers which run Cache Hosts.
· Cache Hosts are the physical processes hosting the AppFabric Caching instance.
· Named Caches are logical entities which can span machines.
· Regions are physically co-located containers of Cache Items; they may be implicit or explicitly created.
· Cache Items consist of a key and a payload (object), and have associated tags, TTL, timestamps and a version.
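As a rough conceptual model (not the Microsoft.Data.Caching API, which is .NET-based), the hierarchy and the per-item metadata could be pictured like this Python sketch:

```python
# Toy model of the logical hierarchy above: named cache -> region -> item,
# where each item carries a payload plus tags, TTL, timestamp and version.
# Purely illustrative; the real client API is Microsoft.Data.Caching.
import time
from dataclasses import dataclass, field

@dataclass
class CacheItem:
    payload: object
    tags: list = field(default_factory=list)
    ttl: float = 300.0                            # seconds to live
    timestamp: float = field(default_factory=time.time)
    version: int = 1

class NamedCache:
    def __init__(self, name):
        self.name = name
        self.regions = {"default": {}}            # region -> {key: CacheItem}

    def put(self, key, payload, region="default", tags=None, ttl=300.0):
        old = self.regions.setdefault(region, {}).get(key)
        version = old.version + 1 if old else 1   # bump version on overwrite
        self.regions[region][key] = CacheItem(
            payload, tags=tags or [], ttl=ttl, version=version)

    def get(self, key, region="default"):
        item = self.regions.get(region, {}).get(key)
        if item and time.time() - item.timestamp < item.ttl:
            return item.payload
        return None                               # missing or expired
```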
     
   
Deployment Topologies Overview      
   
For scale-out and availability, AppFabric can be setup in a load-balanced farm as illustrated in the following diagram.
1. Machines have been sysprep'd with AppFabric, and MSDeploy has been used to deploy applications to these machines.
a. MSDeploy allows synchronizing machines to a master machine, so an easy approach could be to MSDeploy app packages to one machine and then sync that machine to the others in the farm.
2. Create, configure and secure the persistence and monitoring databases.
a. Upon installation, AppFabric allows you to create farm-wide shared persistence and monitoring databases. For applications that choose the default persistence and monitoring settings, the system will save workflow state and monitoring information in these shared databases. This simplifies configuration and works for typical scenarios where applications do not require stronger data isolation.
b. An application can be configured with its own dedicated persistence and/or monitoring databases.
c. The Workflow Management Service is a trusted system service and requires access to all persistence databases.
d. The Event Collector Service is a trusted system service and requires access to all monitoring databases.
3. Set up a distributed cache cluster. (See http://msdn.microsoft.com/en-us/library/ee790954.aspx for more.)
a. Dedicating separate machines ensures there is no memory contention between the cache and running applications, and provides failure isolation, data isolation, etc.
b. For simplicity and uniformity, however, you may decide to enable the Distributed Cache Service on the application-hosting machines in the farm.
4. Load balancers and queues are configured to feed messages to the machines across the farm. Load balancers "spray" messages across the farm, and queues are used by the queue transport listeners to asynchronously retrieve messages and activate new workflow instances.
Conclusion      
   
The upcoming Windows Server AppFabric release evolves the application hosting and administration tooling capabilities in Windows to support services written using the WCF and WF programming models, and provides powerful distributed cache capabilities to .NET applications. In this blog post we discussed the inner workings of the AppFabric system components and a typical scale-out deployment topology.