v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
Normal
0
false
false
false
false
EN-GB
X-NONE
X-NONE
/* Style Definitions */
table.MsoNormalTable
{mso-style-name:”Table Normal”;
mso-tstyle-rowband-size:0;
mso-tstyle-colband-size:0;
mso-style-noshow:yes;
mso-style-priority:99;
mso-style-qformat:yes;
mso-style-parent:””;
mso-padding-alt:0cm 5.4pt 0cm 5.4pt;
mso-para-margin-top:0cm;
mso-para-margin-right:0cm;
mso-para-margin-bottom:10.0pt;
mso-para-margin-left:0cm;
mso-pagination:widow-orphan;
font-size:11.0pt;
mso-bidi-font-size:10.0pt;
font-family:”Calibri”,”sans-serif”;
mso-ascii-font-family:Calibri;
mso-ascii-theme-font:minor-latin;
mso-hansi-font-family:Calibri;
mso-hansi-theme-font:minor-latin;}
[Source: http://geekswithblogs.net/EltonStoneman]
We’ve had a couple of projects recently with similar requirements to process an Excel file as a batch upload of data. One was a BizTalk project where the FarPoint Spread pipeline component was a good fit; the other was a Web app where we put together a custom parser based on the open-source ExcelDataReader. The custom solution was appropriate for the expected size of upload files, but wouldn’t scale well to deal with large files quickly, so I wanted to look at a distributed alternative using nServiceBus. My sample implementation is on MSDN Code Gallery here: nServiceBus Excel Upload. I’ll look at a comparative BizTalk solution in a future post.
If you haven’t come across nServiceBus, it’s a queue-based messaging framework which is inherently asynchronous. “Scalability and reliability are in its DNA”, and it has some impressive case studies. Using nServiceBus you can set up a simple publish-subscribe architecture between nodes, or a load-balanced architecture with a central distributor. In the distributed version, the upload sample looks like this:
(Note that the diagram represents the bus as a separate entity, in reality it’s distributed among the queues of all the nodes. The diagram also omits the distributor).
In nServiceBus, services are requested by publishing messages onto the bus. Requests are fulfilled by a handler which subscribes to a type of message. The Excel upload sample takes a workbook which contains a set of products and uploads them to the AdventureWorks database. There are three types of message:
- StartBatchUpload – published when a file has been received and is ready to be processed; subscriber does some basic validation on the Excel data structure, and then for each row in the worksheet publishes an AddProduct message;
- AddProduct – subscriber maps the product defined in the message to a stored procedure call which inserts the new product. When the last product in the batch is reached, sends a BatchStatusChanged message to the original publisher of the StartBatchUpload message;
- BatchStatusChanged – logs the status change and renames the Excel upload file.
This is a basic example, more validation would be expected, but the workflow is representative. Parsing the Excel file is done quickly, allowing for any number of nodes to participate in the resource-intensive work of creating the products. Using a single host with 5 threads, an Excel file with 3,500 rows takes just over 4 minutes to process on a dev laptop. That’s 13 messages per second which is nothing special, but this is on a single host which is also running the distributor and SQL Server. The processing host has a flat memory profile (consistently around 40Mb) and runs at less than 20% CPU. The distributor takes around 15% CPU, and MSMQ another 15%.
For a much larger upload – 12,000 rows – the processing and memory profile is the same, and the upload takes around 14 minutes (~14 messages per second) on the same infrastructure.
Running the Sample
Access to a SQL Server instance with the AdventureWorks sample database installed is a pre-requisite. You’ll need to add the new stored procedure with uspInsertProduct.CREATE.sql. The connection string used by the host is specified in ExcelUpload.Host.exe.config (defaults to unnamed local instance).
You’ll need MSMQ running on all nodes. Queues are specified in configuration and are created by nServiceBus if they don’t exist – an exception is the storage queue for the distributor which needs to be manually created, this PowerShell snippet will do it:
[Reflection.Assembly]::LoadWithPartialName(“System.Messaging”)
[System.Messaging.MessageQueue]::Create(“.\Private$\distributorStorage”, $true)
Unzip the file ExcelUpload.Binaries.zip. You’ll have a batch file – start.cmd – and five subdirectories – Client, Host, Distributor, SampleFiles and Drops. Run start.cmd, check the console screens for errors, then copy one of the Excel files from SampleFiles to Drops. You should see activity in the host, client and distributor console screens, and new rows being added to the [Production].[Product] table.
If you drop the same file twice, the unique key on Products will be violated, so the upload will error. On a fresh install there are under 1000 products, so this resets the table to the default state:
delete [Production].[Product] where ProductID > 999;
Implementation Details
The sample uses the release version of nServiceBus – 1.9 – as the distributor was broken in the 2.0 beta at the time of writing.
The two console apps run the “client” (which monitors a configured file location for an Excel drop, and publishes the StartBatchUpload message), and the “host” (which subscribes to StartBatchUpload and publishes AddProduct and BatchStatusChanged messages). Both use Topshelf so they can run as a console, or can be installed as a Windows service (e.g. ExcelUpload.Client.exe /install).
If you want to run several hosts on the same machine, they will need to use different queues. Copy the whole of the Host directory, and modify ExcelUpload.Host.exe.config to specify a unique queue name:
InputQueue=“ExcelUpload.Service.1.InputQueue“
Then run ExcelUpload.Host.exe from all the copied locations, and you’ll see the console hosts sharing the message processing when a file is dropped.