A long time ago, applications running on Windows Server were simply EXEs, where developers wrote a lot of custom code on top of the Operating System (OS) to interact with clients and other applications over the network. COM+ and Microsoft Transaction Server (MTS) were added to make it easier to write and host server-side applications and Internet Information Server (IIS) was added to make it easier to write and host Web applications. Application logic was typically written in native code and/or in scripts (CGI, ASP) back then.

 

In 2002, Microsoft shipped .NET 1.0, and IIS was extended to support ASP.NET, enabling managed code Web applications to be hosted on Windows Server. In 2006, with .NET Framework 3.0, Microsoft released two new programming models designed to simplify development of application logic: Windows Communication Foundation (WCF) and Windows Workflow Foundation (WF). WCF enables service-orientation and interoperability, and WF enables authoring and execution of long-running workflows. IIS was refactored in Windows Server 2008/Vista, and WAS (Windows Process Activation Service) was born to enable HTTP and non-HTTP activation of application hosting worker processes (w3wp.exe, also known as AppPool workers) even when the full IIS is not installed. While Windows Server 2008 and .NET 3.0 provided basic administration tools for applications written using the WCF and WF programming models, there was no out-of-the-box support for hosting long-running workflows (one had to self-host workflows or build a solution around WAS to ensure reliability and timer activation).

In the upcoming Windows Server AppFabric release, we are evolving the application server capabilities in Windows Server to introduce enhanced application hosting and administration tooling (previously codenamed “Dublin”) for services written using the WCF and WF programming models, and to provide distributed cache capabilities (previously codenamed “Velocity”) for .NET applications. The high-level architecture overview of AppFabric is available at http://msdn.microsoft.com/en-us/library/ee677374.aspx.

 

In parallel, we were also working to elastically scale .NET applications and make them highly available using commodity hardware. Out of this effort came the distributed in-memory cache (formerly known as Project “Velocity”), which caches frequently used data in memory across multiple machines, thereby increasing the performance and scale of these applications. In addition, it can elastically scale up and down, and it replicates data to increase the availability of these applications. These capabilities make the distributed cache an ideal option for caching reference data, such as catalogs and profiles, and session data, such as shopping carts.

 

In this blog post we go deeper into the AppFabric system and its inner workings.

Architecture Overview

The diagram below provides a high-level overview of the AppFabric system.

 

·         Apps are deployed into a farm of AppFabric servers that share Workflow Instance Stores and Monitoring Stores (databases).

·         The Distributed Cache provides a unified data cache view to Apps while distributing and replicating data across multiple machines.

Functional View

The diagram below illustrates key components of the AppFabric system and their functions from a single-machine view (one AppFabric Server view) in a typical setup.

Deployment and Configuration

Deploying and configuring an AppFabric application is as easy as deploying and configuring a Web application.

1.       An Administrator deploys and configures a Web application containing WCF and WF services.

a.       The Web application is developed using Visual Studio, and developers can easily add WCF services and WF workflow services to the application.

b.      The Web application can be packaged and deployed using MSDeploy just like any other Web application targeting IIS/WAS environment. MSDeploy is integrated with Visual Studio and IIS Manager and provides command-line scripting.

c.       AppFabric provides configuration tools for WCF and WF services via IIS Manager modules and PowerShell cmdlets, which enable tasks like setting application monitoring levels and configuring workflow persistence.

d.      Application artifacts are copied to a directory together with web.config, and an <application> entry is added to %SystemRoot%\system32\inetsrv\config\applicationHost.config.

e.      WAS watches applicationHost.config and upon registering an <application> entry, notifies transport listeners and starts waiting for message traffic addressed to the application.
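To make steps d and e more concrete, here is a minimal sketch (in Python, purely for brevity) of reading <application> entries out of a simplified applicationHost.config fragment and spotting a newly registered one. The site name, application path, pool name and physical paths are made up for illustration, and the two helper functions are hypothetical; this is not how WAS itself is implemented.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical excerpt of applicationHost.config. The
# "/OrderService" application, "OrdersPool" and both paths are invented.
SAMPLE_CONFIG = """
<configuration>
  <system.applicationHost>
    <sites>
      <site name="Default Web Site" id="1">
        <application path="/" applicationPool="DefaultAppPool">
          <virtualDirectory path="/" physicalPath="C:\\inetpub\\wwwroot" />
        </application>
        <application path="/OrderService" applicationPool="OrdersPool">
          <virtualDirectory path="/" physicalPath="C:\\apps\\OrderService" />
        </application>
      </site>
    </sites>
  </system.applicationHost>
</configuration>
"""


def list_applications(config_xml):
    """Return (site, application path, app pool) for every <application> entry."""
    root = ET.fromstring(config_xml)
    found = []
    for site in root.iter("site"):
        for app in site.findall("application"):
            found.append((site.get("name"), app.get("path"), app.get("applicationPool")))
    return found


def newly_registered(before, after):
    """Entries present in the new configuration but not in the old one."""
    return [entry for entry in after if entry not in before]


apps = list_applications(SAMPLE_CONFIG)
for site_name, app_path, pool in apps:
    print(f"{site_name}{app_path} -> {pool}")
```

A watcher built along these lines would call list_applications each time the file changes and pass the previous and current results to newly_registered to decide which applications need listeners.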

Activation

For WCF and WF services, AppFabric supports both on-demand activation and auto-start activation (Windows 7 and Windows Server 2008 R2). On-demand activation improves resource utilization at the price of the first-request response time. Auto-start activation improves the first-request response time at the price of resource utilization.

2.       A message arrives and is dispatched to a code service or a workflow service for processing.

a.       When the first message arrives at the application via HTTP or via non-HTTP transport listeners, WAS activates a w3wp.exe worker process corresponding to the AppPool that the application is associated with.

b.      CLR AppDomains are created inside the w3wp.exe for each application.

c.       A ServiceHost object is created for each code service in the AppDomain.

d.      A WorkflowServiceHost object is created for each workflow service in the AppDomain.

e.      Both the ServiceHost and WorkflowServiceHost create endpoints and dispatch the received message for processing.

        i.      Based on the instancing model declared on the service, the ServiceHost either creates an instance of the service or finds an existing in-memory instance and invokes the method corresponding to the received message.

        ii.      Based on the message, WorkflowServiceHost either creates a new workflow instance or loads an existing workflow instance from the Instance Store and transfers execution to the workflow instance.

f.        After activation, subsequent messages are dispatched by WAS and transport listeners directly to endpoints established by ServiceHosts and WorkflowServiceHosts during activation.

g.       If the Web application is configured for auto-start activation, the corresponding w3wp.exe process, AppDomain, ServiceHost and WorkflowServiceHost objects are created whenever WAS starts.
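The trade-off between the two activation modes can be illustrated with a small toy model. The sketch below is written in Python for brevity and is not AppFabric code: ToyHost and ToyActivator are hypothetical stand-ins for a ServiceHost/WorkflowServiceHost and for WAS, and the activations counter stands in for the cost of starting a worker process and creating hosts.

```python
class ToyHost:
    """Stands in for a ServiceHost or WorkflowServiceHost."""

    def __init__(self, service_name):
        self.service_name = service_name
        self.handled = []

    def dispatch(self, message):
        self.handled.append(message)
        return f"{self.service_name} handled {message}"


class ToyActivator:
    """Stands in for WAS: creates hosts on first message, or eagerly at start-up."""

    def __init__(self, services, auto_start=False):
        self.services = list(services)
        self.hosts = {}          # service name -> ToyHost
        self.activations = 0     # how many times the start-up cost was paid
        if auto_start:
            # Auto-start: pay the cost up front, before any message arrives.
            for name in self.services:
                self._activate(name)

    def _activate(self, name):
        self.hosts[name] = ToyHost(name)
        self.activations += 1

    def send(self, name, message):
        if name not in self.hosts:
            # On-demand: the first message to a service pays the start-up cost.
            self._activate(name)
        return self.hosts[name].dispatch(message)


on_demand = ToyActivator(["Orders", "Billing"])
eager = ToyActivator(["Orders", "Billing"], auto_start=True)
```

With on_demand, no host exists until a message for it arrives, so an unused service consumes nothing but its first caller waits for activation. With eager, both hosts exist immediately, even if one of them never receives a message.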

Monitoring and Troubleshooting

AppFabric provides visibility into running WCF and WF services in terms of 1) overall statistics such as the number of errors, average response time per service, etc. and 2) individual events correlated across deployed services, messages and workflow activities.

3.       The application emits WCF and WF monitoring and troubleshooting events and the Event Collector Service uploads events to a Monitoring Store.

a.       Based on the monitoring configuration of the application and individual services, the WCF and WF runtimes emit events into an ETW (Event Tracing for Windows) session established by the Event Collector Service running on each machine.

b.      Reading monitoring store information from the application’s monitoring configuration, the Event Collector Service bulk-uploads events to the corresponding Monitoring Store.

c.       AppFabric provides an out-of-the-box SQL based Monitoring Store, with support for other database technology via an extensible provider model.

d.      Upon uploading to the store, events are shredded into type-specific tables (e.g. WCF MessageReceived event), and views are automatically updated. The views are designed to offer dashboard-like summaries as well as event correlation flows (such as a service-to-workflow-to-service flow of messages).

4.       An Administrator retrieves monitoring and troubleshooting information from the materialized views in the Monitoring Store.
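The flow in steps 3 and 4 (emit, buffer, bulk-upload, shred by type, query summaries) can be sketched as follows. This is a toy Python model, not the Event Collector Service or the real Monitoring Store schema: the class names, the batch size, the correlation_id field and the "ServiceException" event type are all assumptions made for illustration. Only "MessageReceived" is taken from the text above.

```python
from collections import defaultdict


class ToyMonitoringStore:
    """Stands in for the Monitoring Store; one list per event type plays the role of a table."""

    def __init__(self):
        self.tables = defaultdict(list)

    def bulk_upload(self, events):
        # "Shred" the batch: each event lands in the table for its own type.
        for event in events:
            self.tables[event["type"]].append(event)

    def error_count(self, service):
        # Dashboard-style summary; "ServiceException" is a made-up event type.
        return sum(1 for row in self.tables["ServiceException"] if row["service"] == service)

    def correlated(self, correlation_id):
        # Correlation flow: every event sharing an id, in emission order.
        rows = [row for table in self.tables.values() for row in table
                if row["correlation_id"] == correlation_id]
        return sorted(rows, key=lambda row: row["seq"])


class ToyEventCollector:
    """Buffers emitted events and uploads them to the store in batches."""

    def __init__(self, store, batch_size=3):
        self.store = store
        self.batch_size = batch_size
        self.buffer = []
        self.seq = 0

    def emit(self, event_type, service, correlation_id):
        self.seq += 1
        self.buffer.append({"seq": self.seq, "type": event_type,
                            "service": service, "correlation_id": correlation_id})
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.store.bulk_upload(self.buffer)
            self.buffer = []


store = ToyMonitoringStore()
collector = ToyEventCollector(store, batch_size=2)
collector.emit("MessageReceived", "Orders", "c1")    # buffered, not yet uploaded
collector.emit("ServiceException", "Orders", "c1")   # batch is full: both events are uploaded
collector.emit("MessageReceived", "Billing", "c1")
collector.flush()                                     # upload the partial batch
```

Batching keeps the write load on the store low at the cost of a short delay before events become visible, which is the usual trade-off for this kind of collector.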

Workflow Instance Persistence and Management

AppFabric enables long-running workflows by 1) saving workflow state in an instance store and 2) exposing administrative commands such as suspend, resume, terminate on long-running workflow instances.

5.       At some point a Workflow Service that is processing the message will reach a natural idle point (such as a Delay activity or a Receive activity in a workflow), which will trigger the WorkflowServiceHost to save workflow state in the Instance Store and unload the Workflow Service from memory.

a.       AppFabric provides an out-of-the-box SQL based Workflow Instance Store, with support for other database technology via an extensible provider model.

6.       An Administrator queries the Instance Store to retrieve a list of in-progress workflow instances and their status (such as Running, Idle, Suspended).

7.       An Administrator issues commands such as Suspend, Resume, Terminate against one or more workflow instances.

a.       Commands are written to the Instance Store.

8.       The Workflow Management Service monitors all Instance Stores in the system and looks for instance commands.

9.       The Workflow Management Service issues instance commands on the appropriate machine against the Instance Control Endpoint, which is automatically configured by the system on each WorkflowServiceHost.

a.       The WorkflowServiceHost executes a control command by either resuming workflow execution from the last saved persist point, suspending the workflow in its current state, or terminating the workflow and removing its state from the Instance Store.

b.      Results of the command execution are visible to the Administrator.

10.   The Workflow Management Service looks for timers and orphaned workflow instances.

a.       Upon reaching a Delay activity, workflow state will be saved along with an ActivateMeAt property.

        i.      The Workflow Management Service looks for ActivateMeAt properties that are close to the current time and activates such instances.

b.      The AppDomain or w3wp.exe process can crash or recycle for various reasons such as unhandled exceptions, reconfiguration, memory leaks, resource contention among applications, etc. If a crash or recycle occurs while a workflow instance is in-memory doing work, it needs to be resumed from its last persist point.

        i.      When the WorkflowServiceHost creates or loads a workflow instance, it places an expiring lock on the instance. If the instance continues to work in-memory, the lock is renewed before its expiration.

                1.       A locked instance cannot be loaded by a WorkflowServiceHost that does not own the instance.

                2.       Workflow state cannot be saved by a WorkflowServiceHost that does not own the instance.

        ii.      The Workflow Management Service looks for instances with an expired lock and considers them orphaned; the lock is then cleared so that the instance can be loaded again.

11.   The Workflow Management Service resumes timer-due and orphaned workflow instances.

a.       For each workflow instance whose timer is due or whose lock was cleared, the Workflow Management Service calls the Service Management Endpoint to activate (load into memory) the required instance in the WorkflowServiceHost within an AppDomain in w3wp.exe.

b.      Upon activation, the WorkflowServiceHost loads the instance from the Instance Store and continues execution from the last persist point.
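The interplay of expiring locks, timers and the Workflow Management Service in steps 10 and 11 can be sketched with a toy model. The Python below is an illustration under stated assumptions, not the actual Instance Store or service: the class names, the 30-second lock lifetime, the integer clock and the "due when activate_at <= now" rule are all hypothetical simplifications.

```python
LOCK_SECONDS = 30  # hypothetical lock lifetime


class ToyInstanceStore:
    """Stands in for the Instance Store: saved state, lock owner, lock expiry, timer."""

    def __init__(self):
        self.records = {}

    def load(self, instance_id, host, now):
        """Create or load an instance and take an expiring lock for `host`."""
        record = self.records.setdefault(
            instance_id,
            {"state": None, "owner": None, "lock_expires": 0, "activate_at": None})
        if record["owner"] not in (None, host):
            raise PermissionError(f"{instance_id} is locked by {record['owner']}")
        record["owner"] = host
        record["lock_expires"] = now + LOCK_SECONDS
        return record["state"]

    def save(self, instance_id, host, state, now, activate_at=None, unload=False):
        """Persist state; only the lock owner may save. Saving renews or releases the lock."""
        record = self.records[instance_id]
        if record["owner"] != host:
            raise PermissionError(f"{host} does not own {instance_id}")
        record["state"] = state
        record["activate_at"] = activate_at
        if unload:
            record["owner"] = None
        else:
            record["lock_expires"] = now + LOCK_SECONDS


class ToyManagementService:
    """Stands in for the Workflow Management Service's periodic scan."""

    def __init__(self, store):
        self.store = store

    def scan(self, now):
        """Return instance ids to re-activate: orphaned ones first cleared, plus timer-due ones."""
        to_activate = []
        for instance_id, record in self.store.records.items():
            if record["owner"] is not None and record["lock_expires"] <= now:
                record["owner"] = None            # clear the orphaned lock
                to_activate.append(instance_id)
            elif (record["owner"] is None and record["activate_at"] is not None
                  and record["activate_at"] <= now):
                to_activate.append(instance_id)   # timer is due
        return to_activate


store = ToyInstanceStore()
manager = ToyManagementService(store)

store.load("wf-1", "hostA", now=0)
store.save("wf-1", "hostA", "step 1 done", now=5)    # persist point; lock now expires at 35
# Suppose hostA crashes here, so the lock is never renewed.

store.load("wf-2", "hostA", now=0)
store.save("wf-2", "hostA", "waiting on Delay", now=1, activate_at=60, unload=True)

due_at_40 = manager.scan(now=40)                      # wf-1 is orphaned; its lock is cleared
resumed_state = store.load("wf-1", "hostB", now=41)   # another host resumes from the persist point
due_at_60 = manager.scan(now=60)                      # wf-2's timer is due
```

Note that in this model an expired lock does not by itself let another host load the instance; only the management scan clears it, mirroring the description above.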

Distributed Cache

The cache provides a distributed, in-memory application cache for developing scalable, available, and high-performance applications.

12.   Applications can take advantage of the in-memory distributed cache to improve the performance, scale and availability of application data.

a.       Any .NET application (ASP.NET, WCF or WF, for example) can use the distributed cache to Get/Put any serializable CLR object from/into the cache.

        i.      Developers use the Microsoft.Data.Caching APIs to interact with the cache service.

b.      The Distributed Cache Service runs across multiple machines and forms a tight cluster offering data replication and data consistency across multiple machines.

        i.      The Distributed Cache Service can run on the same machine as the application code or run in a remote dedicated farm of machines.
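As a rough illustration of how an application might use such a cache, the sketch below models a small cluster that partitions keys across hosts and keeps one replica, plus a "cache-aside" read helper. It is a toy in Python and does not use or mirror the Microsoft.Data.Caching API: ToyCacheCluster, its methods, the CRC-based key placement and get_product are all hypothetical. Unlike the real service, this toy stores object references rather than serialized copies.

```python
import zlib


class ToyCacheCluster:
    """A few in-process dicts standing in for cache hosts on separate machines."""

    def __init__(self, host_names, replicas=1):
        self.names = list(host_names)
        self.hosts = {name: {} for name in self.names}
        self.replicas = replicas

    def _owners(self, key):
        # Pick a primary host by hashing the key, then the next host(s) as replicas.
        start = zlib.crc32(key.encode("utf-8")) % len(self.names)
        count = min(1 + self.replicas, len(self.names))
        return [self.names[(start + i) % len(self.names)] for i in range(count)]

    def put(self, key, value):
        for name in self._owners(key):
            self.hosts[name][key] = value

    def get(self, key):
        # Try the primary first, then fall back to replicas.
        for name in self._owners(key):
            if key in self.hosts[name]:
                return self.hosts[name][key]
        return None

    def fail(self, name):
        """Simulate a host losing its memory; membership stays fixed in this toy."""
        self.hosts[name] = {}


def get_product(cluster, product_id, load_from_database):
    """Cache-aside read: try the cache, fall back to the database, then populate the cache."""
    key = f"product:{product_id}"
    item = cluster.get(key)
    if item is None:
        item = load_from_database(product_id)
        cluster.put(key, item)
    return item


cluster = ToyCacheCluster(["cache1", "cache2", "cache3"])
database_reads = []


def load_product(product_id):
    database_reads.append(product_id)     # record each trip to the "database"
    return {"id": product_id, "name": "sample"}


first = get_product(cluster, 42, load_product)    # miss: reads the database
second = get_product(cluster, 42, load_product)   # hit: served from the cache
```

The application sees a single get/put surface regardless of which host holds the data, which is the "unified view" idea described earlier.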

The logical hierarchy of AppFabric caching is: Machine -> Cache Host -> Named Caches -> Regions -> Cache Items -> Objects

·         Machines are the servers that run Cache Hosts.

·         Cache Hosts are the physical processes hosting an AppFabric Caching instance.

·         Named Caches are logical entities that can span machines.

·         Regions are physically co-located containers of Cache Items; they may be created implicitly or explicitly.

·         Cache Items consist of a key and a payload (the Object), and have associated tags, TTL, timestamps and a version.
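The lower levels of this hierarchy (Named Cache, Region, Cache Item) can be pictured as a small data model. The Python sketch below is illustrative only: the field names, the default TTL of 600 seconds, the "default" region name and the tag lookup helper are assumptions, not the AppFabric object model.

```python
from dataclasses import dataclass, field


@dataclass
class CacheItem:
    key: str
    payload: object
    tags: frozenset = frozenset()
    ttl_seconds: float = 600.0
    timestamp: float = 0.0     # when the item was last written
    version: int = 1

    def expired(self, now):
        return now >= self.timestamp + self.ttl_seconds


@dataclass
class Region:
    """A container of items that live together, so they can be searched by tag."""
    name: str
    items: dict = field(default_factory=dict)

    def put(self, key, payload, now, tags=(), ttl_seconds=600.0):
        previous = self.items.get(key)
        version = previous.version + 1 if previous else 1   # each write bumps the version
        self.items[key] = CacheItem(key, payload, frozenset(tags), ttl_seconds, now, version)
        return version

    def get(self, key, now):
        item = self.items.get(key)
        if item is None or item.expired(now):
            return None
        return item.payload

    def keys_with_tag(self, tag, now):
        return sorted(k for k, item in self.items.items()
                      if tag in item.tags and not item.expired(now))


@dataclass
class NamedCache:
    name: str
    regions: dict = field(default_factory=dict)

    def region(self, name="default"):
        """Return the named region, creating it implicitly on first use."""
        return self.regions.setdefault(name, Region(name))


catalog = NamedCache("Catalog")
books = catalog.region("Books")
v1 = books.put("b1", "Dune", now=0, tags={"scifi"})
v2 = books.put("b1", "Dune, 2nd ed.", now=10, tags={"scifi"}, ttl_seconds=5)
```

Here the second put replaces the first item, bumps its version to 2 and shortens its lifetime, so a read at time 12 returns the payload while a read at time 15 finds it expired.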

Deployment Topologies Overview

For scale-out and availability, AppFabric can be set up in a load-balanced farm as illustrated in the following diagram.

1.       Machines have been sysprep’d with AppFabric and MSDeploy has been used to deploy applications to these machines.

a.       MSDeploy allows synchronizing machines to a master machine, so an easy approach could be to MSDeploy app packages to one machine and then sync that machine to others in the farm.

2.       Create, configure and secure persistence and monitoring databases.

a.       Upon installation, AppFabric allows you to create farm-wide shared persistence and monitoring databases. For applications that choose the default persistence and monitoring settings, the system will save workflow state and monitoring information in these shared databases. This simplifies configuration and works for typical scenarios where applications do not require stronger data isolation.

b.      An application can be configured with its own dedicated persistence and/or monitoring databases.

c.       Workflow Management Service is a trusted system service and will require access to all persistence databases.

d.      Event Collector Service is a trusted system service and will require access to all monitoring databases.

3.       Set up a distributed cache cluster. (See http://msdn.microsoft.com/en-us/library/ee790954.aspx for more.)

a.       Dedicating separate machines would ensure no memory contention between cache and running applications, failure isolation, data isolation, etc.

b.      For simplicity and uniformity, however, you may decide to enable the Distributed Cache Service on the application hosting machines in the farm.

4.       Configure load-balancers and queues to feed messages to machines across the farm. Load-balancers “spray” messages across the farm, while queue transport listeners asynchronously retrieve messages from queues and activate new workflow instances.
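The difference between the two feeding styles in step 4 is essentially push versus pull, which the following Python sketch tries to show. It is a simplification under stated assumptions: round-robin is only one possible spraying policy, and the function names and the machines_ready sequence (the order in which listeners happen to become free) are hypothetical.

```python
from collections import deque
from itertools import cycle


def spray(messages, machines):
    """Push model: a round-robin load balancer picks the machine for each message."""
    targets = cycle(machines)
    return [(next(targets), message) for message in messages]


def pull(queue, machines_ready):
    """Pull model: whichever machine's listener is ready takes the next queued message."""
    handled = []
    for machine in machines_ready:
        if not queue:
            break
        handled.append((machine, queue.popleft()))
    return handled


farm = ["server1", "server2", "server3"]

sprayed = spray(["m1", "m2", "m3", "m4"], farm)

pending = deque(["q1", "q2", "q3"])
# server2 happens to be free twice before server1 is free once.
pulled = pull(pending, ["server2", "server2", "server1"])
```

With spraying, the balancer decides the distribution up front; with queues, faster or less loaded machines naturally take more of the work.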

Conclusion

The upcoming Windows Server AppFabric release evolves application hosting and administration tooling capabilities in Windows to support services written using WCF and WF programming models. The AppFabric release provides powerful distributed cache capabilities to .NET applications. In this blog post we discussed inner workings of the AppFabric system components and a typical scale-out deployment topology.