Securing WCF Services hosted in Windows Server AppFabric with Windows Identity Foundation (WIF)

A key challenge in designing and deploying web services is access control, and this blog post is the first in a two-part series on leveraging Windows Identity Foundation (WIF) to secure services hosted in Windows Server AppFabric. In this first post we focus on securing both WCF and Workflow Services, and we also delve into 'delegation'. Delegation is an interesting challenge, especially in scenarios where it's important to maintain a caller's original identity regardless of how many service hops are traversed between the client and the final service. In the second post of the series, we will use Active Directory Federation Services 2.0 to manage access control.

Background

While this blog post does not elaborate the fundamentals of access control in great detail, some background is provided for context, and a good set of references is linked at the end of this post.

Windows Identity Foundation (WIF)

Windows Identity Foundation (WIF) provides the infrastructure for securing both websites and WCF services using a claims-based approach. WIF enables .NET developers to externalize identity logic from their applications, thereby greatly improving developer productivity. WIF is part of Microsoft's identity and access management solution built on Active Directory. WIF provides implementations of well-accepted authentication protocols such as WS-Federation and WS-Trust, and of the SAML token format, wherein the identity is part of the request envelope and travels between applications. While WIF does not provide an identity store (user names and passwords), it is designed to query both LDAP identity stores (such as Active Directory) and relational databases (such as SQL Server).
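As a minimal illustration of what "claims-aware" means in code (the service class and greeting logic here are our own invention, not part of the DataWiz sample), a WIF-enabled WCF service can inspect the caller's claims through Thread.CurrentPrincipal after WIF has validated the incoming token:

```csharp
using System.Threading;
using Microsoft.IdentityModel.Claims; // WIF claims object model

public class GreetingService
{
    public string Greet()
    {
        // WIF replaces Thread.CurrentPrincipal with an IClaimsPrincipal
        // once it has authenticated the incoming security token.
        var principal = (IClaimsPrincipal)Thread.CurrentPrincipal;
        var identity = principal.Identities[0];

        // Walk the claim set looking for the standard Name claim.
        foreach (Claim claim in identity.Claims)
        {
            if (claim.ClaimType == ClaimTypes.Name)
            {
                return "Hello, " + claim.Value;
            }
        }
        return "Hello, anonymous";
    }
}
```

The point is that the service never touches raw credentials; it only reasons over claims issued by a trusted party.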

WIF offers APIs that ASP.NET and WCF developers can use to build claims-aware and federation-capable applications. WIF was developed under the well-known codename "Geneva" Framework and is currently available for download and deployment. Windows Identity Foundation is supported on IIS 6/Windows Server 2003 and IIS 7/Windows Vista, Windows Server 2008, and Windows 7.

Getting Started

In order to develop with WIF in Visual Studio 2010, you will need two things. First, you will need the runtime components, which are referred to as Windows Identity Foundation. Second, you will need to install the Windows Identity Foundation SDK 4.0. The SDK provides Visual Studio project templates and enhances Visual Studio with a menu option called Add STS Reference that allows you to add authentication functionality.

Using WIF to Secure Services Hosted by AppFabric

There are quite a few blog posts that provide incorrect information, suggesting that Workflow Services cannot be easily secured. With this post we hope to dispel that myth!

WIF and AppFabric integration – it just works. 

Note: You don’t need to use the WF Security Activity Pack unless your Workflow Services are making delegated identity calls to other services. If your Workflow Services are the end of the call chain, then WIF is usually all you need.

Scenario

To demo the AppFabric-WIF integration, we use the Contoso DataWiz application. This application allows users to upload data to a warehouse and extract data from the store with a subsequent request. The upload scenario is typical. In the download scenario, a user creates a request for the backend services that in turn will create and deliver a file containing the requested data. Once the data is ready for delivery, the user is sent an e-mail containing a link to the file.

The following screenshots of the DataWiz application describe the aforementioned process.

The DataWiz is a WPF Client; the Rules Engine is a WCF Service and the Data Store is fronted by a Workflow Service.   

Architecture

This section has two parts, where the first introduces the architecture at the Services level, whereas the second provides an integrated architecture that accounts for access control.  

Service Architecture View – Implementation Pattern

The figure below shows the architecture used to implement the scenario.

The Rules Engine/WCF Service is invoked when the user clicks the 'Submit' button in the WPF client. The event handler invokes the 'ProcessEvent' operation. The RulesEngine then examines which event was raised and calls either the Download operation of the DownloadService or the Load operation of the LoadService, both of which we collectively refer to as the Computation Workflow.

Integrated Access Control View

In this section we integrate access control into the implementation pattern elaborated above. In the graphic below, a Security Token Service (STS) is included. Access control is required to authenticate and authorize web service calls when both the upload and download operations are invoked.

Let's walk through a submission following the numbered sequence, introducing the important concepts along the way.

1.       A user launches DataWiz, which uses the logged-on Windows credentials (say CONTOSO\Administrator). When the user clicks Submit on the Load Data tab, the user's Kerberos or NTLM credentials (essentially the user name and groups) are sent as claims within a security token to the STS (Security Token Service) as part of a request for another security token containing claims that the Rules Engine understands.

2.       The STS is a web service that takes the incoming security token and issues a new security token matching the client's request. In this scenario, from the Windows credentials presented we need a token containing three claims: Name (with the value of the Windows user name), Role (with the value "AuthenticatedUsers"), and CanLoadData (which is only present, with a value of "true", if the caller is a member of the SeniorManagers Windows group). The STS creates this token and sends it back to the DataWiz client.

3.       Now that the DataWiz client has the right claims to present to the RulesEngine, it calls the ProcessEvent operation and includes the token in the message headers. At this point, the RulesEngine could perform an authorization check against the presented claims and allow or deny the call based on some rules. For simplicity, it authorizes all callers. The RulesEngine will determine which service it needs to call. The RulesEngine has been configured to call the Computation Workflow by delegating the identity of the original caller (e.g., CONTOSO\Administrator) while preserving the identity actually running the RulesEngine (which is NETWORK SERVICE). This form of delegation is called ActAs delegation, because the client (the RulesEngine in this case) will be calling the underlying Computation Workflow acting as the original caller (CONTOSO\Administrator).

4.        In order to get tokens relaying both identities, the RulesEngine must call out to the STS again presenting both identities.

5.       The STS takes the presented identity of the caller (Network Service) and adds it to a chain of identities (referred to as actors) beneath the ActAs identity (CONTOSO\Administrator) and returns a new set of tokens.

6.       The RulesEngine presents these security tokens in its call to a computation workflow.

7.       In the case of a Load, an authorization check is required by Contoso to ensure that the user has the CanLoadData claim with a value of “true” before actually allowing the workflow service to respond to the request. No such check is done for calls to Download. This check is performed by a ClaimsAuthorizationManager.

8.       WCF Workflow Services may perform additional authorization on claims using logic defined in the workflow.

The key takeaway of this solution is that we have eliminated the use of a trusted sub-system approach to security. In the trusted sub-system approach, the Computation Workflows would have been called using the identity of the middle-tier service (e.g., the RulesEngine) that called them, instead of the identity of the original caller. This introduces the risk that the caller may obtain privileges from the middle tier's identity that should not be available to them at the back end. By delegating the original caller's identity all the way through to the back end, we mitigate this risk.

Implementation 

In order to succinctly illustrate the key implementation steps, we provide a link to the finished solution files for your review and use, <link to codeplex here>.

In the sections below we will proceed on the assumption that the upload/download components are already developed and that the solution is being enhanced to apply access control to it. This is typical of how most projects are developed.

Starting out

We start with a working solution that has the WPF DataWiz client project, the WCF Service RulesEngine project and the WCF Workflow Service ComputationWorkflows project. These three applications communicate in an unsecured fashion. DataWiz app calls the rules engine’s ProcessEvent with a Load or Download event parameter, and then the RulesEngine calls the appropriate Workflow Service in the ComputationWorkflows project. Both service projects are hosted and managed by Windows Server AppFabric.

Both the WCF and WCF/WF services have includeExceptionDetailInFaults="true" so that errors propagate back to the DataWiz client when they occur, irrespective of depth.
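As a sketch of how this setting is typically applied (the behavior shown is the standard WCF serviceDebug behavior; the surrounding configuration names in the actual sample may differ), it lives in each service's web.config:

```xml
<system.serviceModel>
  <behaviors>
    <serviceBehaviors>
      <behavior>
        <!-- Development only: surfaces exception details in faults
             so the client sees the real error, not a generic fault. -->
        <serviceDebug includeExceptionDetailInFaults="true" />
      </behavior>
    </serviceBehaviors>
  </behaviors>
</system.serviceModel>
```

Remember to switch this off for production, since fault details can leak implementation internals to callers.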

In the following sections we will proceed through steps to secure the RulesEngine and ComputationWorkflows projects by creating the STS project and modifying the client to authenticate with the STS. In this scenario, we implement our STS using templates included in the WIF SDK.

Securing the RulesService with a Custom STS

The first step is to use an STS to perform authentication for the RulesEngine. To do this, we right-click the RulesEngine project and select Add STS Reference.

The three screenshots below show the sequence; to reiterate, the only action we take is to select Create a new STS project in the current solution on the Security Token Service screen.

 

Let's now review the important details on the summary screen: the wizard (FedUtil.exe, or FedUtil) automatically creates the two certificates we will use: one for encrypting tokens exchanged between applications such as the RulesEngine and the STS, and a second for signing (digitally signing the token so it can be recognized as coming only from the STS).

As a result of running FedUtil, our solution now contains a fourth project: a starter custom STS that is good for use while developing the solution, but far from ready for production use. In addition, our RulesEngine service is now configured to use this STS for authenticating clients.

In terms of authorization – we don’t do any here (although we could). This service is focused on getting the right tokens passed down to the appropriate Computation Workflow service based on the incoming parameters. We’ll return to finish the delegation parts after we secure the ComputationWorkflows and update the STS to support delegation.

Update the Custom STS to Emit Desired Claims and Support ActAs Delegation 

Next, we need to make a series of changes to the auto-generated STS so that it will issue the three claims we want and support ActAs delegation.

First, we need to modify GetOutputClaimsIdentity() of the CustomSecurityTokenService class (located under the App_Code folder of the newly added STS project) to emit a Role claim with a value of "AuthenticatedUsers", plus a CanLoadData claim when the user being authenticated belongs to the SeniorManagers role. The following code snippet shows the complete method implementation:

protected override IClaimsIdentity GetOutputClaimsIdentity(IClaimsPrincipal principal, RequestSecurityToken request, Scope scope)
{
    if (null == principal)
    {
        throw new ArgumentNullException("principal");
    }

    IClaimsIdentity outputIdentity = new ClaimsIdentity();

    //TODO: demonstrate looking up other attributes here (e.g., from SQL Membership/Role/Profile store)

    outputIdentity.Claims.Add(new Claim(System.IdentityModel.Claims.ClaimTypes.Name, principal.Identity.Name));
    outputIdentity.Claims.Add(new Claim(ClaimTypes.Role, "AuthenticatedUsers"));

    if (principal.IsInRole("SeniorManagers"))
    {
        outputIdentity.Claims.Add(new Claim("http://contoso.com/claims/canloaddata", "true"));
    }

    if (request.ActAs != null)
    {
        outputIdentity = ((IClaimsIdentity)principal.Identity).Copy();
        IClaimsIdentity currentActAsIdentity = request.ActAs.GetSubject()[0].Copy();

        IClaimsIdentity lastActor = currentActAsIdentity;
        while (lastActor.Actor != null)
        {
            lastActor = lastActor.Actor;
        }

        lastActor.Actor = outputIdentity; //set last actor in chain to the identity of the caller (the RulesEngine's identity)
        outputIdentity = currentActAsIdentity; //set the ActAs identity as the primary identity, instead of using the immediate caller's identity
    }

    return outputIdentity;
}

In the above, note how you could have the STS query another identity store (possibly a SQL membership/roles schema). Also note how we set up the chain of identities when delegating (which we detect by checking the request.ActAs property for a non-null value), such that the identity returned is the identity we are acting as, while the actual identity of the caller becomes the first actor.

In order to enable handling of the additional ActAs security token, we add the ActAsSecurityTokenServiceFactory class, whose primary duty is to construct a service host for the STS configured with a Saml11SecurityTokenHandler for processing the ActAs token. The bulk of this work is done in the CreateServiceHost override, which is as follows:

public override ServiceHostBase CreateServiceHost(string constructorString, Uri[] baseAddresses)
{
    CustomSecurityTokenServiceConfiguration config = new CustomSecurityTokenServiceConfiguration();

    Uri baseUri = baseAddresses.FirstOrDefault(a => a.Scheme == "http");
    if (baseUri == null)
        throw new InvalidOperationException("The STS should be hosted under http");

    config.TrustEndpoints.Add(new ServiceHostEndpointConfiguration(typeof(IWSTrust13SyncContract), GetWindowsCredentialsBinding(), baseUri.AbsoluteUri));

    // Set the STS implementation class type
    config.SecurityTokenService = typeof(CustomSecurityTokenService);

    // Create a security token handler collection, provide it with a SAML 1.1
    // security token handler, and set the audience restriction to Never
    SecurityTokenHandlerCollection actAsHandlers = new SecurityTokenHandlerCollection();
    Saml11SecurityTokenHandler actAsTokenHandler = new Saml11SecurityTokenHandler();
    actAsHandlers.Add(actAsTokenHandler);
    actAsHandlers.Configuration.AudienceRestriction.AudienceMode = AudienceUriMode.Never;

    // Set the appropriate issuer name registry
    actAsHandlers.Configuration.IssuerNameRegistry = new ActAsIssuerNameRegistry();

    // Set the token handlers collection
    config.SecurityTokenHandlerCollectionManager[SecurityTokenHandlerCollectionManager.Usage.ActAs] = actAsHandlers;

    WSTrustServiceHost host = new WSTrustServiceHost(config, baseAddresses);
    return host;
}

 

As part of CreateServiceHost, we configure the ActAs handlers to use a custom issuer name registry, which simply checks that the certificate used to sign the token is the same one used by our STS. We do this by adding the new ActAsIssuerNameRegistry class as follows.

public class ActAsIssuerNameRegistry : IssuerNameRegistry
{
    /// <summary>
    /// Overrides the base class. Validates the given issuer token. For an incoming SAML token
    /// the issuer token is the certificate that signed the SAML token.
    /// </summary>
    /// <param name="securityToken">Issuer token to be validated.</param>
    /// <returns>Friendly name representing the issuer.</returns>
    public override string GetIssuerName(SecurityToken securityToken)
    {
        X509SecurityToken x509Token = securityToken as X509SecurityToken;
        if (x509Token != null)
        {
            // Warning: This sample does a simple compare of the issuer certificate
            // to a subject name. This is not appropriate for production use.
            // Check your validation policy and authenticate issuers based on the policy.
            if (String.Equals(x509Token.Certificate.SubjectName.Name, "CN=STSTestCert"))
            {
                return x509Token.Certificate.SubjectName.Name;
            }
        }

        throw new SecurityTokenException("Untrusted issuer.");
    }
}

Note: Both of the aforementioned classes (ActAsIssuerNameRegistry and ActAsSecurityTokenServiceFactory) are variants from the sample code in the Identity Training Kit 2010 under WebServicesAndIdentity\Source\Ex3-InvokingViaDelegatedAccess\End\ActAsSts\App_Code.

Finally, because we need to use our custom factory instead of WIF’s default one, we update Service.svc so that it contains the following line: 

<%@ ServiceHost language="C#" Factory="ActAsSecurityTokenServiceFactory" Service="ActAsSts.CustomSecurityTokenServiceConfiguration" %>

Securing the ComputationWorkflows with a Custom STS and a Claims Authorization Manager

With our STS in place we can now use it for authenticating users accessing our ComputationWorkflows services via delegation. We do this by selecting Add STS Reference from the context menu of the ComputationWorkflows project. The steps we take through FedUtil this time are similar, but instead of creating a new STS, we configure the use of our existing one (whose metadata lives at http://localhost:1389/RulesEngine_STS/FederationMetadata/2007-06/FederationMetadata.xml and which uses the DefaultApplicationCertificate created previously for encryption), as shown in the following screenshots:

 

 

Note: You may have noticed that in this last run of FedUtil we used the LoadService.xamlx as our endpoint. To enable the STS to also apply to DownloadService.xamlx, we added both to the list of applications defined in the audienceUris section of the ComputationWorkflows’ web.config:

<audienceUris>
  <add value="http://fsweb.contoso.com/ComputationWorkflows/LoadService.xamlx" />
  <add value="http://fsweb.contoso.com/ComputationWorkflows/DownloadService.xamlx" />
</audienceUris>

Now let's turn our attention to how we can perform authorization for our workflow services. We have two options: perform the authorization before the message reaches the workflow, or perform it within the workflow. Performing it before the message reaches the workflow is valuable because it keeps authorization logic from cluttering the business logic of your workflow design, and it allows you to change the authorization logic independently of the service implementation. In order to perform authorization in this fashion, we need to create a class that derives from ClaimsAuthorizationManager and add it to the ComputationWorkflows project. Recall that we wanted to restrict access to the Load operation (the action) to users who have the CanLoadData claim. The following code shows the complete implementation; notice how few lines of code are needed to get this done.

public class DataWizClaimsAuthorizationManager : ClaimsAuthorizationManager
{
    public override bool CheckAccess(AuthorizationContext context)
    {
        if (context.Action[0].Value == "http://tempuri.org/IService/Load")
        {
            return ((IClaimsPrincipal)context.Principal).Identities[0].Claims.Exists(c => c.ClaimType == "http://contoso.com/claims/canloaddata");
        }
        else
        {
            return true;
        }
    }
}

In order for this ClaimsAuthorizationManager to sit in the message-processing pipeline, we simply need to register it in the web.config of the ComputationWorkflows project. This is done by adding the claimsAuthorizationManager element to the service element within the microsoft.identityModel section:

<microsoft.identityModel>
  <service>
    <audienceUris> … </audienceUris>
    <issuerNameRegistry> … </issuerNameRegistry>
    <claimsAuthorizationManager type="DataWizClaimsAuthorizationManager"/>
  </service>
</microsoft.identityModel>

NOTE: A ClaimsAuthorizationManager can equally be used to authorize access to code-based WCF services, and is built and configured in exactly the same way.

View Claims as AppFabric Tracked Events

While it was not a requirement of our scenario, it's worth a brief review of how we can examine claims within Workflow Services. Let's say we simply want to log some of the claims as custom tracking records so that we can view them in the AppFabric Tracked Events view, similar to the following:

 

We can do that simply by creating a custom activity that takes a string as input and creates a custom tracking record from it (see the source code for this). Getting at an individual claim illustrates how you always access claims with WIF: via the System.Threading.Thread.CurrentPrincipal property. For example, to get the value shown in the previous screenshot, we used the following Visual Basic expression:

String.Format("Canloaddata claim was '{0}'", _
              (From claim In DirectCast(System.Threading.Thread.CurrentPrincipal, Microsoft.IdentityModel.Claims.IClaimsPrincipal).Identities(0).Claims _
               Where claim.ClaimType = "http://contoso.com/claims/canloaddata").FirstOrDefault())

The previous expression was evaluated by the Track CanLoadDataClaim activity in this sequence (notice the location of this activity; we only access the identity between the ReceiveRequest and SendResponse): 
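A sketch of what such a tracking activity might look like (the class and argument names here are our own, not necessarily those in the downloadable sample): a NativeActivity can emit a CustomTrackingRecord, which AppFabric then surfaces in the Tracked Events view.

```csharp
using System.Activities;
using System.Activities.Tracking;

// Hypothetical activity: logs its input string as a custom tracking
// record that appears among AppFabric's tracked events.
public sealed class TrackMessage : NativeActivity
{
    public InArgument<string> Message { get; set; }

    protected override void Execute(NativeActivityContext context)
    {
        // Name the record so it is easy to filter in the Tracked Events view.
        var record = new CustomTrackingRecord("DataWizClaimTrace");
        record.Data.Add("Message", Message.Get(context));
        context.Track(record);
    }
}
```

In the workflow designer, the VB claim expression above would be bound to the Message argument of this activity.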

 

Update the ComputationWorkflow service references in the RulesService

Now that we have finished securing both service projects, we need to update the clients so that they pick up the new configurations and know to call the STS. We start with the RulesService, which itself has two service references (one to the LoadService and one to the DownloadService). We update the service reference to the LoadService (right-click the Load reference and select Update Service Reference). This updates the web.config with federation configuration, but we still have some work to do.

In order to capture the identity of the RulesService's clients (e.g., users of the DataWiz app) so we can pass the incoming credentials down to the ComputationWorkflows, we need to configure WIF to persist this original identity after it has finished its authorization (otherwise it would just discard it). This persisted identity is referred to as a bootstrap token. We do this by modifying the RulesEngine's web.config, adding a saveBootstrapTokens="true" attribute to the service element as shown:

<microsoft.identityModel>
    <service saveBootstrapTokens="true">

Since we updated the service reference to Load, its client endpoint in web.config is correctly configured to use the STS. We now need to copy the Load client endpoint and tweak it for use by Download (since they use the same settings). All we need to update are the address and contract values, so that it looks as shown:

<endpoint address="http://localhost/ComputationWorkflows/DownloadService.xamlx" binding="ws2007FederationHttpBinding" bindingConfiguration="WS2007FederationHttpBinding_IService" contract="Download.IService" name="WS2007FederationHttpBinding_IService">
  <identity>
    <certificate encodedValue="…" />
  </identity>
</endpoint>

Now, in order to actually pass down ActAs credentials, we need to change how we create and use a proxy to the target ComputationWorkflows Workflow Service. Specifically, we have to make use of WIF's extension methods. First, we need to add a reference to Microsoft.IdentityModel to the RulesEngine project. Then, we need to add a using statement as follows:

using Microsoft.IdentityModel.Protocols.WSTrust; //need to add this for channel extension methods

Finally, we need to modify the code to get the bootstrap token and pass it in with the request. For example, here’s how we changed the implementation of the ProcessEvent operation for the Download event (the first lines that are commented out are for reference as to how we had called it without ActAs): 

//Download.ServiceClient proxy = new Download.ServiceClient();
//eventId = proxy.Download(new Download.Download() { sourceName = from });

SecurityToken bootstrapToken = ((IClaimsPrincipal)Thread.CurrentPrincipal).Identities[0].BootstrapToken;
ChannelFactory<RulesEngine.Download.IService> factory = new ChannelFactory<Download.IService>("WS2007FederationHttpBinding_IService");

factory.ConfigureChannelFactory(); //configures the factory to use federated client credentials

Download.IService proxy = factory.CreateChannelActingAs(bootstrapToken);

eventId = proxy.Download(new Download.DownloadRequest() { Download = new Download.Download() { sourceName = from } }).@string;

In the above, observe the call to factory.CreateChannelActingAs. This is an extension method provided by WIF that enables us to pass along the bootstrap token as an ActAs token, in addition to our own token, when an operation such as Download is invoked.

Finish it up by Updating the DataWizApp Client’s Service Reference

Finally, we also need to update the DataWizApp's service reference to the RulesEngine service. This time we don't need to add a reference to Microsoft.IdentityModel, because we aren't using WIF on the client side. After updating the reference to Contoso (which maps to the rules engine), app.config is updated for federation with the STS, and our solution will flow credentials as described in the scenario.

Additional Resources

1.       Download the DataWiz Code Sample:  File Attachment – WIFSecuringAF – Dev STS ActAs w AuthorizationMgr.zip

2.       WIF DevCenter on MSDN http://msdn.microsoft.com/en-us/security/aa570351.aspx

3.       Identity Developer Training Kit (VS 2010 Edition) http://www.microsoft.com/downloads/en/details.aspx?displaylang=en&FamilyID=c3e315fa-94e2-4028-99cb-904369f177c0 

4.       Download WIF http://www.microsoft.com/downloads/en/details.aspx?FamilyID=eb9c345f-e830-40b8-a5fe-ae7a864c4d76&displaylang=en

5.       Download WIF 2010 SDK http://www.microsoft.com/downloads/en/details.aspx?FamilyID=c148b2df-c7af-46bb-9162-2c9422208504&displaylang=en

Namaste!

Thanks to Paolo Salvatori for his review and comments: http://blogs.msdn.com/members/leprino/.

wcf.codeplex.com is now live

Over the last few weeks the WCF team has been working on a variety of new projects to improve WCF’s support for building HTTP-based services for the web. We have also focused on a set of features to enable JavaScript-based clients such as jQuery.

We are proud to announce that these projects are now live and available for download on http://wcf.codeplex.com. You can get both the binaries and the source code, depending on your preference. Please note that these are prototype bits for preview purposes only.

For more information on the features, check out this post, this post, this PDC talk, and the documentation on the site itself.

Our new CodePlex site will be the home for these and other features, and we will continue iterating on them with your help. Please download the bits and use the CodePlex site’s Issue Tracker and Discussion tab to let us know what you think!

Thanks,
-Yavor Georgiev
Program Manager, WCF

How to use the AppFabric Auto-Start feature to avoid warm-up delays for IIS-hosted WCF Receive Locations

Scenario

Over recent years, I have had the chance to work with many customers, and I have realized that one of the most recurring problems they experience is long start-up times. This is not a BizTalk-specific issue; it affects any .NET application, whether a Windows service, a web application, or an IIS-hosted WCF service. When a BizTalk host process starts, it performs warm-up operations such as creating AppDomains, loading assemblies, reading configuration data, and populating internal caches. As a consequence, the first inbound request incurs an extra delay, as it must wait until the warm-up period has completed. One of the most commonly used workarounds is to create a separate service, or schedule the execution of a script, that sends a dummy message to start up the process in question; this technique, however, is less than ideal. There is a better way to avoid long start-up periods, at least for WCF services hosted by IIS 7.5 and Windows Server AppFabric.

So, going back to the initial question: is there any way to avoid a long warm-up period when the first message comes through a BizTalk process? The answer is yes, at least in one case. BizTalk Server 2010 can take advantage of the Start Automatically feature provided by IIS 7.5 and of the Auto-Start functionality supplied by Windows Server AppFabric to avoid long start-up periods for IIS-hosted WCF Receive Locations. As you probably already know, BizTalk Server 2010 now supports the .NET Framework 4.0. Now let's assume that a BizTalk application exposes one or more WCF Receive Locations in the isolated host. The application pool running the WCF Receive Locations needs to be configured to use the .NET Framework 4.0 and the Integrated managed pipeline mode.
Therefore, if BizTalk Server 2010 is installed on Windows 7 or Windows Server 2008 R2 with Windows Server AppFabric, you can exploit the Auto-Start feature to automatically start WCF Receive Locations when the application pool worker process starts.

Auto-Start Feature

The auto-start feature of AppFabric is built on top of the auto-start feature of Internet Information Services (IIS) 7.5, which is included in Windows 7 and Windows Server 2008 R2. In IIS, you can configure an application pool and all or some of its applications to automatically start when the IIS service starts. The AppFabric auto-start feature extends this functionality so that you can configure all or some of the services within an application to automatically start when the application starts.

When you enable the auto-start feature for a WCF service running on Windows Server AppFabric, the service is up and running as soon as the application it belongs to is started, before the service receives the first message from a client application. Therefore, the WCF service processes the first message quickly because it is already initialized. A WCF Receive Location is just a WCF service, so it can take advantage of the Auto-Start feature like any other WCF service running in the Windows Server AppFabric environment.

For more information on this topic, see the following articles:

Enabling auto-start for a WCF Receive Location

In order to enable the auto-start for a WCF Receive Location exposed by a BizTalk Server 2010 application running on Windows 7 or Windows Server 2008 R2, you can proceed as follows:

 

  1. Open IIS Manager by clicking Start, clicking All Programs, clicking Windows Server AppFabric, and then clicking Internet Information Services (IIS) Manager.
  2. In the Connections pane, open the server and site containing the WCF Receive Location, and then select the corresponding application.
  3. In the Actions pane, click Configure under the Manage WCF and WF Services heading in the Actions pane, or right-click the application, point to Manage WCF and WF Services, and then click Configure.
  4. In the Configure WCF and WF for Application dialog box, click Auto-Start.
  5. In the Auto-Start dialog box, click Enabled to enable auto-start for the WCF Receive Locations and/or WCF services within the application; click Custom to enable auto-start for each individual WCF Receive Location or service in the application separately.
  6. If the application pool for the application is not set to AlwaysRunning, a pop-up dialog will be displayed with the message: “The application pool for this application/service needs to have its startMode set to AlwaysRunning in order for the application/service to successfully auto-start. Do you wish to set the application pool startMode when changes are applied?” Click Yes to set startMode for the application pool to AlwaysRunning, and then click OK.
  7. The auto-start feature for an application works only if you set startMode for the application pool used by the application to AlwaysRunning. You can also set this attribute by using IIS Configuration editor. Note that setting the startMode for an application pool to AlwaysRunning will restart all applications in the application pool.
  8. Click OK.
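
The same settings the Configure dialog applies can also be set directly in applicationHost.config. A minimal sketch, assuming a hypothetical application pool and application name (the startMode and serviceAutoStartEnabled attributes are the ones the dialog manipulates):

```xml
<!-- applicationHost.config (sketch; pool and application names are placeholders) -->
<applicationPools>
  <!-- startMode="AlwaysRunning" is required for auto-start to work -->
  <add name="BizTalkIsolatedPool" startMode="AlwaysRunning" />
</applicationPools>

<sites>
  <site name="Default Web Site">
    <!-- serviceAutoStartEnabled corresponds to the Enabled option in the Auto-Start dialog -->
    <application path="/MyBizTalkApp" applicationPool="BizTalkIsolatedPool"
                 serviceAutoStartEnabled="true" />
  </site>
</sites>
```

Editing the configuration file by hand is equivalent to using the IIS Configuration Editor mentioned in step 7; either way, changing startMode restarts all applications in the pool.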

A Simple Test

If you are a BizTalk developer, you probably use DebugView to monitor the trace produced by your application components. Let’s use this invaluable tool for a simple test. If you recycle the Application Pool running the BizTalk isolated host and then send a message to an IIS-hosted WCF Receive Location, the WCF Adapter runtime generates the following trace on the standard output:

[Screenshot: WCF Adapter runtime trace in DebugView]

Now, let’s assume that your application exposes an orchestration via a WCF-BasicHttp, WCF-WSHttp, or WCF-CustomIsolated Receive Location hosted by the BizTalk isolated host. If you restart the Application Pool running the WCF Receive Location, you won’t see the trace above in DebugView until you submit the first message. This means that the WCF Receive Location is initialized only upon arrival of the first message. Now, proceed as explained in the previous section and enable Auto-Start on the application hosting the WCF Receive Location, as highlighted below.

[Screenshot: Auto-Start enabled on the application hosting the WCF Receive Location]

At this point, open DebugView and recycle the Application Pool running the WCF Receive Location. This time you will immediately notice the trace produced by the WCF Adapter runtime components, a clear sign that your WCF Receive Location was initialized as soon as the worker process started. As a consequence, when you submit the first message, you won’t have to wait for the WCF Adapter runtime to warm up. Part of the host instance initialization still takes place when the first message comes in, but the start-up period has noticeably decreased.

Conclusions

The Auto-Start feature is just one example of how BizTalk and Windows Server AppFabric can be used together to address real-world problems such as long warm-up periods. In coming posts I’ll explain how BizTalk and AppFabric can be tightly integrated to improve the quality, functionality and flexibility of your application platform.

Less tweaking of your WCF 4.0 apps for high throughput workloads


I always have a sense of satisfaction when I find out that there is less tweaking or tuning needed to make my applications perform as expected under high throughput workloads (or that the tweaking is easier to do).  I had this sense of satisfaction just this week with some new insights on WCF 4.0 (and while tuning BizTalk Server 2010 using the shiny, new BizTalk Server 2010 Settings Dashboard).

It’s very common for my team to receive questions on why a certain WCF service can’t receive more requests per second, or why more load can’t be pushed through an application exposed as a WCF endpoint.  We documented some of the WCF tunable settings in the 2009 version of the BizTalk Server Performance Optimization Guide sections "Optimizing BizTalk Server WCF Adapter Performance" and "Optimizing WCF Web Service Performance".  While this guidance was done in the context of a BizTalk solution, the WCF-specifics are valid for any WCF application.

The documentation has not caught up to the binaries (yet), but we have it on good authority that we have some new, higher, more dynamic defaults for the ServiceThrottlingBehavior in .NET 4.0 (and that they actually made it into the release).  I also mention new performance counters you can use to diagnose if you are hitting your high watermarks.

ServiceThrottlingBehavior: one of the usual culprits

With .NET 4.0, we’ve made some improvements in WCF so it is a bit more dynamic when it comes to the ServiceThrottlingBehavior.  Directly from the ServiceThrottlingBehavior documentation is the following text:

Use the ServiceThrottlingBehavior class to control various throughput settings that help prevent your application from running out of memory.

The MaxConcurrentCalls property limits the number of messages that currently process across a ServiceHost.

The MaxConcurrentInstances property limits the number of InstanceContext objects that execute at one time across a ServiceHost.

The MaxConcurrentSessions property limits the number of sessions a ServiceHost object can accept.

The key word is highlighted above:  limits.  While limits can be a good thing, when they are set too low they are a distraction and an annoyance.  If it is so easy to tune and diagnose WCF applications, <insert sarcasm here>, why would we need to increase the default limits?  With .NET 4.0, we have not just increased the defaults, we have also made it a bit more dynamic based on the number of processors seen by the OS.  So a more powerful machine will have higher limits.  Here are the old and new defaults:

Property                 .NET 4.0 Default        Previous Default
MaxConcurrentCalls       16 * ProcessorCount     16
MaxConcurrentInstances   116 * ProcessorCount    26
MaxConcurrentSessions    100 * ProcessorCount    10
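
If the defaults still don’t fit your workload, the throttle is set through the serviceThrottling behavior element in your service configuration. A minimal sketch (the values are illustrative placeholders, not recommendations):

```xml
<system.serviceModel>
  <behaviors>
    <serviceBehaviors>
      <behavior>
        <!-- Illustrative values only; always validate under a realistic load test -->
        <serviceThrottling maxConcurrentCalls="200"
                           maxConcurrentInstances="400"
                           maxConcurrentSessions="800" />
      </behavior>
    </serviceBehaviors>
  </behaviors>
</system.serviceModel>
```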

Note that the documentation has not been updated yet, but someone is working on that.

Diagnosing ServiceThrottlingBehavior limits

Prior to .NET 4.0, diagnosing whether you were hitting your ServiceThrottling limits was a bit of black magic. With .NET 4.0 we’ve added some new performance counters to help diagnose this. In your application’s config file, you have to enable the WCF performance counters. After doing this, you’ll see some counters that are new to .NET 4.0. These show up at the service level under the performance counter object "ServiceModelService 4.0.0.0":

  • Percent of Max Concurrent Calls
  • Percent of Max Concurrent Instances
  • Percent of Max Concurrent Sessions
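
Enabling the counters is a single diagnostics setting in your application’s config file; ServiceOnly keeps the overhead lower than All:

```xml
<system.serviceModel>
  <!-- Options: Off, ServiceOnly, All, Default -->
  <diagnostics performanceCounters="ServiceOnly" />
</system.serviceModel>
```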

Here’s a screen shot from perfmon:

[Screenshot: the new throttling counters in Perfmon]

As an example, if you have ServiceThrottlingBehavior.MaxConcurrentCalls set to 200, and the counter "Percent of Max Concurrent Calls" shows "10", then your service currently has 20 concurrent calls (10% of 200).  Once again, the documentation is lagging behind the binaries; I’ll see if I can get someone to fix this as well.

The next obvious question is, "What should I use for the ServiceThrottling values?".  The answer is a resounding "it depends"!  As with the maxconnection setting, it depends on your application.  Set it too low, and you will throttle too soon, but set it too high, and you could bring your server to its knees with excessive CPU usage and context switching.  As always, performance test your solutions before going to production.

Virtual Machine Meets Physical License – 3 Things you care about in the cloud


At the PDC, the volume of announcements on product updates, new offers and emerging technology can be so high that details can be easily lost.  While there are those who pride themselves on detecting every nuance, my guess is there are developers, customers and service providers looking for a little more information on what the Windows Azure announcements might mean for them. If you think you could be a member of the latter, please read on…

 

Today we announced a new role for Windows Azure – the VM Role.  This functionality does the obvious thing – runs Windows Server 2008 R2 on Windows Azure.  Key to this feature is the user’s ability to construct a Windows Server 2008 R2 VHD on premises, then upload and run it. This is a scenario that we hear a lot about in terms of licensing, so we are taking this opportunity to clarify how this works.

 

Thing you care about #1. Customers with Windows Server licenses obtained through Volume Licensing (VL) may use their media for Windows Server 2008 R2 to create images which can be run in either dedicated or shared hosting environments, including the Azure VM Role.

 

What does that mean in plain English?  Most Service Providers provide their customers with canned images that have one or more products pre-installed.  They do this for license compliance reasons, security and simplicity for the user.  This approach is preferred for many customers for a variety of reasons. Some developers tell us that they want the ability to create images locally rather than configure them while the meter is running or use an existing image rather than configure an image from a Service Provider.  Service Providers who want to be in the business of running customer-provided instances have a simple mechanism for doing so.

 

Three things about the point above that might not be clear and are probably worth clarifying:

·         The point above is about the software, not the license.  This is not license mobility.  You aren’t moving a Windows Server license from on-premises into the cloud; you are just using your Windows Server media to create images which will run outside of your organization.

·         The license for Windows Server in this scenario still comes from the service provider.  The service provider (via SPLA) provides the customer a Windows Server license.  Using your own bits doesn’t change the need for a Windows Server license.

·         Any other MSFT software that runs on top of the image needs to be licensed through the service provider.  Take SQL Server for example, running SQL Server on that image requires a license through the Service Provider.

 

Thing you care about #2.  As a pilot, MSDN customers can use products under their active MSDN subscription in a Dev / Test capacity in the Windows Azure VM Role.

 

Many of our Windows Azure customers came to us by way of the MSDN trial.  We know that many developers want to use Azure to spin up cloud instances to test scenarios, reproduce bugs and check out new features. This pilot gives you a way to use SQL Server and other MSFT software in the VM Role for development & test scenarios and will run until May of 2011.  Of course, MSDN does not give you rights to run the software / applications in a production environment, and this pilot is no different. We’ll collect feedback and see how useful this offer is.  Towards the end of the pilot we’ll announce next steps and how we might modify or extend this.

 

Thing you care about #3. No changes on License mobility…for now.

 

While I’d prefer to leave foreshadowing to meteorologists, I will say that when it comes to the desire for expanded license mobility, we hear you loud and clear.  It’s a very complex issue for our customers, resellers, hosting partners and outsourcing partners.  We’re working on some ideas and would welcome your thoughts. In as much as customers use hardware from a variety of sources, we know that customers will likely use cloud services from multiple providers and we will bank on that as we work through the details.

Host Integration Server 2010 Dynamic Remote Environments


In the previous versions of Host Integration Server, all connection properties to the Host environment were configured in a static remote environment (RE).

On the HostApps adapter properties, the connection to a Host was made through one of the existing REs on the BizTalk server. Defining a remote environment was done through TI Manager.

[Screenshot: HostApps adapter properties in Host Integration Server 2009]

Starting with Host Integration Server 2010, the Remote Environment can be configured on the sendport itself, instead of using a pre-defined RE created in TI Manager.

[Screenshot: Remote Environment configuration on the sendport]

The steps to configure this dynamic RE are very similar to what you were used to doing in TI Manager. Click the ’Connection Strings’ ellipsis to configure the RE on the sendport.

Add your TI assemblies

[Screenshot: adding TI assemblies on the sendport]

Then click ’Edit Connection String’. The following screen shows the connection properties used to connect to the Host.

[Screenshot: Host connection properties on the sendport]

Having all connection properties available on the sendport allows for more flexibility in deployment and at runtime. It is also much friendlier to BizTalk developers, who are used to configuring send ports and no longer need a separate tool to configure a Host Apps sendport.

For a complete overview of new features in Host Integration Server 2010 visit this site:

http://msdn.microsoft.com/en-us/library/gg167635(v=BTS.70).aspx

Peter Borremans & Tim D’haeyer

Best Practices for Handling Transient Conditions in SQL Azure Client Applications


The following post offers a set of best practices centered on the development of reliable SQL Azure client applications. The primary focus is handling transient conditions, namely the intermittent faults, errors and exceptions that need to be accounted for when developing reliable applications for high-density multi-tenant environments such as SQL Azure.

Background

Developers who have already had the opportunity to start working with Microsoft’s cloud-based relational database service, widely known as SQL Azure, may know that SQL Azure introduces some specific techniques and approaches to implementing the data access layer in applications leveraging the SQL Azure infrastructure.

One important consideration is how client connections are handled. SQL Azure exhibits throttling behavior that can manifest itself when a client establishes connections to a SQL Azure database or runs queries against it. Database connections can be throttled internally by the SQL Azure fabric for several reasons, such as excessive resource usage, long-running transactions, and possible failover and load-balancing actions, leading to termination of a client session or a temporary inability to establish new connections while the transient condition persists. Database connections may also be dropped for a variety of reasons related to network connectivity between the client and the distant Microsoft data centers: network quality, intermittent network faults in the client’s LAN or WAN infrastructure, and other transient technical reasons.

The behavior in question was discussed in an article posted on the SQL Azure team blog back in May 2010. The article articulates the need for implementing retry logic in the client code in order to provide reliable connectivity to SQL Azure databases. In one of our recent Azure customer projects, we faced multiple challenges related to this behavior. That experience led to a generic, reusable framework for handling transient conditions using an extensible retry policy model. We hope that our learnings will be of use to the many .NET developers working with SQL Azure.

Disclaimer
The information provided in this article reflects real-world experience with SQL Azure to date. It is likely that some of the transient conditions discussed below may never surface in a given client application. It is in the nature of a transient condition to depend on, and be driven by, the variable technical, environmental, behavioral and other unique characteristics of a particular application and its surrounding infrastructure.

Transient Conditions in SQL Azure

When handling exceptions in client applications accessing SQL Azure databases, it is important to differentiate between general errors and faults that require special treatment. Not every exception should be considered a transient error; client applications need to ensure that the code enters a retry state only when strictly necessary.

Below are some examples of the transient conditions that may occur in the SQL Azure infrastructure:

Retry Conditions (without timeouts)

In order to determine whether or not a specific exception should be treated as “transient” when working with SQL Azure, the following guidelines must be adhered to:

  • Check the exception type first. The two specific types which the application code would need to filter accordingly are SqlException and TimeoutException;
  • Filter out those SQL exceptions which do not indicate a transient error. The SqlException.Number property helps assert whether or not an exception should be considered as transient. Do not attempt to parse the exception text as it may vary between different releases of the .NET Framework Data Provider for SQL Server;
  • Verify if the error number belongs to the family of transient errors by checking it against a set of well-known error codes. The main error codes that need to be accounted for are listed above. In addition, check the up-to-date list of error codes indicating a loss of connection.
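
A minimal sketch of these guidelines follows. The error numbers shown are examples of commonly cited SQL Azure transient codes, but check the up-to-date list rather than relying on this sample; the framework’s detection strategy packages the full logic.

```csharp
using System;
using System.Data.SqlClient;

// Sketch only: illustrates the filtering guidelines above. The error numbers
// are examples of commonly cited SQL Azure transient codes; consult the
// current list of error codes before relying on them in production.
public static class TransientErrorSketch
{
    private static readonly int[] TransientErrorNumbers =
        { 40197, 40501, 40613, 10053, 10054, 10060 };

    public static bool IsTransient(Exception ex)
    {
        // 1. Check the exception type first.
        if (ex is TimeoutException)
        {
            return true;
        }

        SqlException sqlException = ex as SqlException;
        if (sqlException != null)
        {
            // 2. Use SqlException.Number, never the exception text, which may
            //    vary between releases of the data provider.
            // 3. Compare the number against the well-known transient codes.
            return Array.IndexOf(TransientErrorNumbers, sqlException.Number) >= 0;
        }

        return false;
    }
}
```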

This guidance could easily be packaged into a fully reusable framework for handling connection loss and failed SQL commands due to transient conditions.

Transient Conditions Handling Framework

The framework that we have developed in our project takes into account the end requirements for handling the possible transient conditions. Internally, the framework relies on the implementation of a "retry policy" which makes sure that only valid transient errors will be handled. The policy verifies whether or not an exception belongs to the legitimate category of transient faults before the client application enters into retry state.

At a glance, our implementation of the transient condition handling framework:

  • Provides the foundation for building highly extensible retry logic for handling a variety of transient conditions, not limited to SQL Azure;
  • Supports a range of pre-defined retry policies (fixed retry interval, progressive retry interval, random exponential backoff);
  • Supports separate retry policies for SQL connections and SQL commands for additional flexibility;
  • Supports retry callbacks to notify the user code whenever a retry condition is encountered;
  • Supports the fast retry mode whereby the very first retry attempt will be made immediately thus not imposing delays when recovering from short-lived transient faults;
  • Enables defining retry policies in the application configuration files;
  • Provides extension methods to support retry capabilities directly in SqlConnection and SqlCommand objects.

The next sections drill down into specific implementation details and are intended to help the developers understand when and how they should make use of the transient condition handling framework referenced above. To follow along, download the full sample code from the MSDN Code Gallery.

Technical Implementation

The following class diagram depicts the underlying technical implementation highlighting all core components and their dependencies:

[Class diagram: Transient Conditions Handling Framework]

The key components in the framework are the RetryPolicy<T> and ReliableSqlConnection classes and the ITransientErrorDetectionStrategy interface.

The RetryPolicy<T> class along with its abstract RetryPolicy counterpart encapsulate all the essential logic responsible for iterative execution of developer-defined actions which may result in a transient exception.

The ReliableSqlConnection class is implemented as a look-alike of SqlConnection and provides a set of value-add methods to ensure that connections can be reliably established and commands can be reliably executed against a SQL Azure database.

The ITransientErrorDetectionStrategy interface provides the base mechanism upon which different types of transient conditions can be described and packaged into a reusable policy object that performs validation on a given .NET exception against a well-known set of transient faults. Along with a transient error detection policy for SQL Azure, the framework also includes the transient condition detection strategies for AppFabric Service Bus, AppFabric Message Buffers and Windows Azure storage.

In addition, the class library provides a set of C# extension methods enabling .NET developers to open SQL Azure database connections and invoke SQL commands from within a retry policy-aware scope. The extension methods can be useful when developers are unable to adapt their code to take advantage of the ReliableSqlConnection class. For instance, a developer might be using an existing data access framework (e.g. Enterprise Library) which returns pre-initialized instances of the SqlConnection class. In this case, the extension methods could help add retry capabilities to the existing code without major rework.

Usage Patterns

The following sections illustrate some common usage patterns that apply when building reliable SQL Azure client applications using the transient condition handling framework discussed above.

Configuring Retry Policies

There are two primary ways of setting up a retry policy in the transient condition handling framework:

  1. Create an instance of the RetryPolicy<T> class with required transient error detection strategy and appropriate configuration parameters specified at construction time.
  2. Describe the retry policy definitions in the application configuration file and use the provided configuration APIs to instantiate and return an instance of the appropriate retry policy.

The RetryPolicy<T> class allows creating different policies depending on particular needs. The class constructors accept variable input and return an instance of the respective retry policy configured as per specified initialization parameters:

public class RetryPolicy<T> : RetryPolicy where T : ITransientErrorDetectionStrategy, new()
{
    /// <summary>
    /// Initializes a new instance of the RetryPolicy class with the specified number of retry attempts and default
    /// fixed time interval between retries.
    /// </summary>
    /// <param name="retryCount">The number of retry attempts.</param>
    public RetryPolicy(int retryCount) : this(retryCount, DefaultRetryInterval) { /* ... */ }

    /// <summary>
    /// Initializes a new instance of the RetryPolicy class with the specified number of retry attempts and time
    /// interval between retries.
    /// </summary>
    /// <param name="retryCount">The number of retry attempts.</param>
    /// <param name="intervalBetweenRetries">The interval between retries.</param>
    public RetryPolicy(int retryCount, TimeSpan intervalBetweenRetries) { /* ... */ }

    /// <summary>
    /// Initializes a new instance of the RetryPolicy class with the specified number of retry attempts and backoff
    /// parameters for calculating the exponential delay between retries.
    /// </summary>
    /// <param name="retryCount">The number of retry attempts.</param>
    /// <param name="minBackoff">The minimum backoff time.</param>
    /// <param name="maxBackoff">The maximum backoff time.</param>
    /// <param name="deltaBackoff">The delta value in the exponential delay between retries.</param>
    public RetryPolicy(int retryCount, TimeSpan minBackoff, TimeSpan maxBackoff, TimeSpan deltaBackoff) { /* ... */ }

    /// <summary>
    /// Initializes a new instance of the RetryPolicy class with the specified number of retry attempts and
    /// parameters defining the progressive delay between retries.
    /// </summary>
    /// <param name="retryCount">The number of retry attempts.</param>
    /// <param name="initialInterval">The initial interval which will apply for the first retry.</param>
    /// <param name="increment">The incremental time value for calculating progressive delay between retries.</param>
    public RetryPolicy(int retryCount, TimeSpan initialInterval, TimeSpan increment) { /* ... */ }
}

The retry policies can also be defined in the application configuration. Each retry policy definition is accompanied with a friendly name and a set of parameters such as retry count and interval:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <configSections>
    <section name="RetryPolicyConfiguration" type="Microsoft.AppFabricCAT.Samples.Azure.TransientFaultHandling.Configuration.RetryPolicyConfigurationSettings, Microsoft.AppFabricCAT.Samples.Azure.TransientFaultHandling" />
  </configSections>

  <RetryPolicyConfiguration defaultPolicy="FixedIntervalDefault" defaultSqlConnectionPolicy="FixedIntervalDefault" defaultSqlCommandPolicy="FixedIntervalDefault" defaultStoragePolicy="IncrementalIntervalDefault" defaultCommunicationPolicy="IncrementalIntervalDefault">
    <add name="FixedIntervalDefault" maxRetryCount="10" retryInterval="100" />
    <add name="IncrementalIntervalDefault" maxRetryCount="10" retryInterval="100" retryIncrement="50" />
    <add name="ExponentialIntervalDefault" maxRetryCount="10" minBackoff="100" maxBackoff="1000" deltaBackoff="100" />
  </RetryPolicyConfiguration>
</configuration>

Once the retry policy configuration is defined, a policy can be instantiated using the following 3 simple lines of code:

// Retrieve the retry policy settings from the application configuration file.
RetryPolicyConfigurationSettings retryPolicySettings = ApplicationConfiguration.Current.GetConfigurationSection<RetryPolicyConfigurationSettings>(RetryPolicyConfigurationSettings.SectionName);

// Retrieve the required retry policy definition by its friendly name.
RetryPolicyInfo retryPolicyInfo = retryPolicySettings.Policies.Get("FixedIntervalDefault");

// Create an instance of the respective retry policy using the transient error detection strategy for SQL Azure.
RetryPolicy sqlAzureRetryPolicy = retryPolicyInfo.CreatePolicy<SqlAzureTransientErrorDetectionStrategy>();

The RetryPolicy instances carry all the necessary “intellect” capable of recognizing the legitimate transient conditions when executing the user code as shown in the next two sections.

Reliably Opening SQL Azure Database Connections

In order to ensure that a connection to a SQL Azure database can be reliably established, one of the following approaches can be adopted:

  • Use the Open method of the ReliableSqlConnection class. Should the connection fail to be established on the first attempt, the associated retry policy will take effect and the request will be retried as per the specified retry policy;
  • Use the OpenWithRetry extension method against an instance of the SqlConnection class. Its behavior is similar to the above: behind the scenes, the specified retry policy will kick in and retry the request should a transient error be encountered.

Below are some examples of using the above approaches:

using (ReliableSqlConnection conn = new ReliableSqlConnection(connString))
{    
     // Attempt to open a connection using the specified retry policy.    
     conn.Open(sqlAzureRetryPolicy);    
     // ... execute SQL queries against this connection ...
}

using (ReliableSqlConnection conn = new ReliableSqlConnection(connString, sqlAzureRetryPolicy))
{
     // Attempt to open a connection using the retry policy specified at construction time.    
     conn.Open();
    // ... execute SQL queries against this connection ...
}

using (SqlConnection conn = new SqlConnection(connString))
{
    // Attempt to open a connection using the specified retry policy.
    // The extension method is used in this context since we are dealing with a SqlConnection instance.
    conn.OpenWithRetry(sqlAzureRetryPolicy);

    // ... execute SQL queries against this connection ...
}

Note that both approaches deliver the same end result. In comparison to the standard SqlConnection class, ReliableSqlConnection provides a few value-add capabilities such as retrieving the current session’s CONTEXT_INFO value for tracing purposes and executing SQL commands using the general-purpose ExecuteCommand<T> method.

Reliably Executing Queries Against SQL Azure Databases

When executing queries against a SQL Azure database, it is also important to handle situations when a connection is terminated for the transient reasons previously discussed, e.g. query throttling. Should this occur, retrying the query may become a necessity. Note that not all queries can be safely retried; most importantly, those which do not leave the data in a consistent state, for instance when making updates to multiple tables without an outer transaction ensuring atomicity of the overall operation.

For those queries that could be safely retried, one of the following approaches can be adopted:

  • Use the ExecuteCommand or ExecuteCommand<T> method in the ReliableSqlConnection class. A failed command will be automatically retried as per the specified policy. The retry operation will also ensure that a SQL connection will be re-opened (if required) before attempting to re-run the failed SQL command.
  • Use the appropriate extension method available for the SqlCommand class such as ExecuteNonQueryWithRetry, ExecuteReaderWithRetry, etc.

Below are some examples of using the above approaches:

using (ReliableSqlConnection conn = new ReliableSqlConnection(connString, sqlAzureRetryPolicy))
{
    conn.Open();
 
    SqlCommand selectCommand = new SqlCommand("select name, object_id from sys.objects where name = 'Application'", conn.Current);
 
    // Execute the above query using a retry-aware ExecuteCommand method which will
    // automatically retry if the query has failed (or connection was dropped)
    using (IDataReader dataReader = conn.ExecuteCommand<IDataReader>(selectCommand))
    {
        if (dataReader.Read())
        {
            string objectName = dataReader.GetString(dataReader.GetOrdinal("name"));
            tableObjectID = dataReader.GetInt32(dataReader.GetOrdinal("object_id"));
        }
    }
}

using (ReliableSqlConnection conn = new ReliableSqlConnection(connString))
{
    conn.Open(sqlAzureRetryPolicy);
 
    IDbCommand selectCommand = conn.CreateCommand();
    selectCommand.CommandText = "UPDATE Application SET [DateUpdated] = getdate()";
 
    // Execute the above query using a retry-aware ExecuteCommand method which will
    // automatically retry if the query has failed (or connection was dropped)
    int recordsAffected = conn.ExecuteCommand(selectCommand, sqlAzureRetryPolicy);
}

using (SqlConnection conn = new SqlConnection(connString))
{
    conn.Open();
 
    SqlCommand selectCommand = conn.CreateCommand();
    selectCommand.CommandText = "select * from sys.objects where name = 'Application'";
 
    int tableObjectID = Int32.MinValue;
 
    // Execute the above query using a retry-aware ExecuteReaderWithRetry method which will
    // automatically retry if the query has failed (or connection was dropped)
    using (IDataReader dataReader = selectCommand.ExecuteReaderWithRetry(sqlAzureRetryPolicy))
    {
        if (dataReader.Read())
        {
            string objectName = dataReader.GetString(dataReader.GetOrdinal("name"));
            tableObjectID = dataReader.GetInt32(dataReader.GetOrdinal("object_id"));
        }
    }
 
    selectCommand = conn.CreateCommand();
    selectCommand.CommandText = "select object_id from sys.objects where name = 'Application'";
 
    // Execute the above query using a retry-aware ExecuteScalarWithRetry method which will
    // automatically retry if the query has failed (or connection was dropped)
    object objectID = selectCommand.ExecuteScalarWithRetry(sqlAzureRetryPolicy);
 
    SqlCommand updateCommand = conn.CreateCommand();
    updateCommand.CommandText = "UPDATE Application SET [DateUpdated] = getdate()";
 
    // Execute the above command using the retry-aware ExecuteNonQueryWithRetry method,
    // which will automatically retry if the command fails (or the connection is dropped)
    int recordsAffected = updateCommand.ExecuteNonQueryWithRetry(sqlAzureRetryPolicy);
}

The usage patterns so far have focused on fairly basic ADO.NET examples. The following section covers more advanced scenarios in which the transient error handling framework can increase the reliability of SQL Azure client applications regardless of how those applications access their data.

Advanced Usage Patterns

Modern data-oriented software does not always use the plain ADO.NET APIs to access application data. Many alternative technologies have emerged over the past few years to support advanced data access scenarios: Entity Framework, WCF Data Services, LINQ to SQL, and ASP.NET Dynamic Data, to name a few. All of these technologies aim to significantly reduce the complexity of data management and simplify the way rich data is modeled, queried, and projected into the application's domain-specific space.

Whenever SQL Azure is chosen as the relational data platform behind any of the above technologies, handling transient conditions immediately becomes a requirement. Because data access is heavily abstracted by these technologies, the approach to adding resilience against transient faults differs from what has been discussed up to this point.

Fortunately, the implementation of the retry policy model in the transient condition handling framework makes it easy to wrap any user code in a retryable scope. Should a transient fault be encountered, the entire scope will be re-run. This capability is delivered by the ExecuteAction and ExecuteAction<T> methods:

sqlAzureRetryPolicy.ExecuteAction(() =>
{
    // Invoke a LINQ2SQL query.
});

return sqlAzureRetryPolicy.ExecuteAction<IEnumerable<string>>(() =>
{
    // Invoke a LINQ query against the Entity Framework model.
    return result;
});
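To make the LINQ to SQL placeholder above concrete, the following sketch shows how such a query might be wrapped in a retryable scope. Note that NorthwindDataContext and its Customers table are hypothetical names used for illustration only, and the sqlAzureRetryPolicy instance is assumed to be configured as shown in the earlier samples:

```csharp
// Hypothetical sketch: wrapping a LINQ to SQL query in a retryable scope.
// NorthwindDataContext and Customers are illustrative names only.
public IList<string> GetCustomerNames(string connString)
{
    return sqlAzureRetryPolicy.ExecuteAction<IList<string>>(() =>
    {
        // Create a fresh DataContext on each attempt so that a retry
        // does not reuse a connection that may have been dropped.
        using (NorthwindDataContext context = new NorthwindDataContext(connString))
        {
            return (from c in context.Customers
                    where c.Country == "USA"
                    select c.CompanyName).ToList();
        }
    });
}
```

Materializing the results with ToList() inside the scope ensures the query actually executes (and can therefore fail and be retried) before the scope returns.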

Note that a retryable scope should represent an atomic unit of work. Because the scope may be invoked multiple times, it is important to ensure that it leaves the underlying data in a transactionally consistent state. In addition, the scope should not swallow exceptions, as they are required for detecting transient conditions.
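One way to keep a retryable scope transactionally consistent is to wrap its work in a System.Transactions.TransactionScope, so that a failed attempt rolls back completely before the retry policy re-runs the scope. A minimal sketch, assuming the sqlAzureRetryPolicy instance and a hypothetical Account table used for illustration:

```csharp
// Sketch: a retryable scope that performs two dependent updates atomically.
// If a transient fault occurs, the transaction rolls back and the entire
// scope is re-run by the retry policy, leaving the data consistent.
sqlAzureRetryPolicy.ExecuteAction(() =>
{
    using (TransactionScope scope = new TransactionScope())
    using (SqlConnection conn = new SqlConnection(connString))
    {
        conn.Open();

        SqlCommand debit = conn.CreateCommand();
        debit.CommandText = "UPDATE Account SET Balance = Balance - 100 WHERE Id = 1";
        debit.ExecuteNonQuery();

        SqlCommand credit = conn.CreateCommand();
        credit.CommandText = "UPDATE Account SET Balance = Balance + 100 WHERE Id = 2";
        credit.ExecuteNonQuery();

        // Do not catch exceptions here; the retry policy needs to observe
        // them in order to detect transient conditions.
        scope.Complete();
    }
});
```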

The following sample is borrowed from the MSDN Library and enriched with retry-aware logic where appropriate. This increases the overall reliability of the client code, making it more robust and more resistant to connection or query throttling should the application database be hosted in SQL Azure.

// Define the order ID for the order we want.
int orderId = 43680;

// Create an EntityConnection.
EntityConnection conn = new EntityConnection("name=AdventureWorksEntities");

// Create a long-running context with the connection.
AdventureWorksEntities context = new AdventureWorksEntities(conn);

try
{
    // Explicitly open the connection inside a retry-aware scope.
    sqlAzureRetryPolicy.ExecuteAction(() =>
    {
        if (conn.State != ConnectionState.Open)
        {
            conn.Open();
        }
    });

    // Execute a query to return an order. Use a retry-aware scope for reliability.
    SalesOrderHeader order = sqlAzureRetryPolicy.ExecuteAction<SalesOrderHeader>(() =>
    {
        return context.SalesOrderHeaders.Where("it.SalesOrderID = @orderId", 
                new ObjectParameter("orderId", orderId)).Execute(MergeOption.AppendOnly).First();
    });

    // Change the status of the order.
    order.Status = 1;

    // Delete the first item in the order.
    context.DeleteObject(order.SalesOrderDetails.First());

    // Save changes inside a retry-aware scope.
    sqlAzureRetryPolicy.ExecuteAction(() => { context.SaveChanges(); });

    SalesOrderDetail detail = new SalesOrderDetail
    {
        SalesOrderID = 1,
        SalesOrderDetailID = 0,
        OrderQty = 2,
        ProductID = 750,
        SpecialOfferID = 1,
        UnitPrice = (decimal)2171.2942,
        UnitPriceDiscount = 0,
        LineTotal = 0,
        rowguid = Guid.NewGuid(),
        ModifiedDate = DateTime.Now
    };

    order.SalesOrderDetails.Add(detail);

    // Save changes again inside a retry-aware scope.
    sqlAzureRetryPolicy.ExecuteAction(() => { context.SaveChanges(); });
}
finally
{
    // Explicitly dispose of the context and the connection. 
    context.Dispose();
    conn.Dispose();
}

In summary, the versatility of the transient condition handling framework lies in its ability to perform retry-aware operations in a variety of contexts, whether a single SQL statement or a large unit of work. In all cases, transient faults are detected in a consistent manner.

Conclusion

The underlying fabric that manages the SQL Azure nodes exhibits specific behaviors that must be fully understood by developers of client applications accessing SQL Azure databases. In particular, the throttling behavior of SQL Azure calls for a better way of handling connections and executing queries. This includes handling transient exceptions so that client code behaves reliably when database connections are throttled by the Resource Manager. Other intermittent conditions also need to be accounted for. Consequently, a robust retry mechanism in SQL Azure client applications is imperative.

We aimed to provide the community with validated best practices that help .NET developers build a reliable data access layer which accounts for these specific behavioral attributes of our cloud-based database infrastructure. These best practices were presented in the form of a reusable framework that developers can easily plug into and adopt in their solutions.

The accompanying sample code is available for download from the MSDN Code Gallery.  Note that all source code files are governed by the Microsoft Public License (Ms-PL) as explained in the corresponding legal notices.

Additional Resources/References

For more information on the topic discussed in this paper, please check out the following resources:

Authored by: Valery Mizonov

Reviewed by: James Podgorski, Michael Thomassy