If you are exploring the Windows Azure AppFabric Service Bus capabilities and trying the bits out in your projects, the following article may shed some light on where to be on a lookout when it comes to using the netTcpRelayBinding in Streaming mode. By following the recommendations in this article you can maximize the performance of your TCP streaming solutions.

Scenario

I’m currently leading the development efforts on a customer project where we are streaming large amounts of structured (XML) data from an on-premise BizTalk Server 2010 environment all the way to a cloud-based inventory database hosted on SQL Azure. The message flow can be simplified to an extent where it can be described as follows:

  1. Inventory files are being received from many EDI partners and transformed into a canonical inventory schema representation using BizTalk Server’s support for EDI interoperability and data mapping/transformation;
  2. The canonical inventory schema instances are being picked up by a designated WCF-Custom Send Port configured with netTcpRelayBinding that talks to the Azure Service Bus;
  3. The inventory data is relayed in streaming mode through the Service Bus to a WCF Service endpoint hosted in a worker role on the Windows Azure;
  4. The WCF Service receives the data stream and relays it further to a SQL Azure database-based queue so that the data becomes available for processing.

Below is the depicted version of the message flow that we have implemented at the initial stage of the project:

pic3

The Windows Azure AppFabric Service Bus makes the above scenario shine as it makes it easy to connect an existing on-premise BizTalk infrastructure with cloud-based service endpoints. While it’s truly an eye-opener, there are several observations that we have made as it relates to data streaming over TCP.

Observations

As referenced above, the cloud-hosted WCF service exposes a streaming-aware operation that takes the inbound data stream and makes sure that it safely lands in a SQL Azure database. Specifically, we are reading the data from an inbound stream into a memory buffer in chunks, and then flush the buffer’s content into a  varchar(max) field using the Write() mutator operation supported by the UPDATE command.

The code snippet implementing the above technique is shown below:

#region IPersistenceServiceContract implementation
public Guid PersistDataStream(Stream data)
{
    // Some unrelated content was omitted here and the code below was intentionally simplified for sake of example.

// For best performance, we recommend that data be inserted or updated in chunk sizes that are
// multiples of 8040 bytes.
int bufferSize = 8040 * 10; using (ReliableSqlConnection dbConnection = new ReliableSqlConnection(dbConnectionString)) using (SqlStream sqlStream = new SqlStream(dbConnection, readDataCommand, writeDataCommand, getDataSizeCommand)) { BinaryReader dataReader = new BinaryReader(data); byte[] buffer = new byte[bufferSize]; int bytesRead = 0; do { bytesRead = dataReader.Read(buffer, 0, bufferSize); if (bytesRead > 0) { TraceManager.CustomComponent.TraceInfo("About to write {0} bytes into SQL Stream", bytesRead); sqlStream.Write(buffer, 0, bytesRead); } } while (bytesRead > 0); } return Guid.NewGuid(); } #endregion

While this code does in fact transfer the data, we were surprised to find that it does not do so in 80400 byte chunks!  Despite the fact that both the client and server WCF bindings were configured correctly and identically as and where appropriate, including such important configuration parameters as reader quotas, max buffer size, etc, we have noticed that the specified buffer size was not appreciated by the underlying WCF stream. This basically means that the chunk size returned from the Read method was never even near the anticipated buffer size of 80400 bytes. The following trace log fragment supports the above observations (note the instrumentation event in the above code that we emit before writing data into a SQL Azure database):

SQL Stream Write Bytes

There is an explanation for the behavior in question.

First of all, some fluctuation in the read chunk size bubbled up by the OSI transport layer is expected on any TCP socket connection. With TCP streaming, the data is being made available immediately as it streams off the wire. The TCP sockets generally don’t attempt to fill the buffer completely, they do their best to supply as much data as they can as timely as they can.

Secondly, when we set the buffer size to 80400 bytes, we unintentionally attempted to ask the TCP stack to buffer up to 53 times of its Maximum Transmission Unit (MTU) value as well as potentially exceeding the maximum TCP receive window size. This is an unrealistic ask.

So, why do these small incremental (sometimes appearing to be random) chunks project potential concerns to a developer? Well, in our example, we are writing data into a SQL Azure database and we want this operation to be as optimal as possible. Writing 2, 6, 255 or even 4089 bytes per each call doesn’t allow us to achieve the desired degree of efficiency. Luckily, a solution for this challenge comes across extremely well in the following simple approach.

Solution

Simply put, we need to make sure that the data will be continuously read from the inbound stream into a buffer until the buffer is full. This means that we will not stop after the first invocation of the Read method – we will be repetitively asking the stream to provide us with the data until we are satisfied that we have received the sufficient amount. The easiest way of implementing this would be through an extension method in C#:

public static class BinaryReaderExtensions
{
    public static int ReadBuffered(this BinaryReader reader, byte[] buffer, int index, int count)
    {
        int offset = 0;

        do
        {
            int bytesRead = reader.Read(buffer, index + offset, count);

            if (bytesRead == 0)
            {
                break;
            }

            offset += bytesRead;
            count -= bytesRead;
        }
        while (count > 0);

        return offset;
    }
}

Now we can flip the method name from Read to ReadBuffered in the consumer code leaving the rest unchanged:

do {
// Note the name changed from Read to ReadBuffered as we are now using the extension method. bytesRead = dataReader.ReadBuffered(buffer, 0, bufferSize); if (bytesRead > 0) { TraceManager.CustomComponent.TraceInfo("About to write {0} bytes into SQL Stream", bytesRead); sqlStream.Write(buffer, 0, bytesRead); } } while (bytesRead > 0);

The end result is that we can now guarantee that each time we invoke a SQL command to write data into a varchar(max) field, we deal with completely full buffers and data chunks the size of which we can reliably control:

SQL Stream Write Bytes (80400 bytes)

As an extra benefit, we reduced the number of database transactions since we are now able to stream larger chunks of data as opposed to invoking the SQL command for a number of smaller chunks as it was happening before.

Conclusion

Streaming is a powerful and high-performance technique for large data transmission. Putting on the large Azure sun glasses, we can confidently say that the end-to-end streaming between on-premise applications and the Cloud unlocks extremely interesting scenarios that could make impossible possible.

In this article, we shared some observations from our recent Azure customer engagement and provided recommendations as to how to avoid a specific “gotcha” with WCF streaming over netTcpRelayBinding in the Windows Azure AppFabric Service Bus. When implemented, these recommendations may help the developers increase the efficiency of the application code consuming the WCF streams.

Additional Resources/References

For more information on the related topic, please visit the following resources: