Debatching Options and Performance in BizTalk Server 2004

Download This Article and Sample Code Here: Debatching Options and Performance Considerations in BizTalk 2004 White Paper and Sample Code



Related Sample: Xml Envelope Debatching




In some business scenarios you may be required to receive a batch file that must be broken up and processed as single messages by BizTalk Server 2004.  This allows for individual record-level processing, selective content-based routing, or single-message mapping. 




General Debatching Design Considerations


Here are some general design considerations you might want to think about when planning a debatching approach with BizTalk Server 2004. 






  • Header and Trailer Validation Requirements for Flat Files
  • All or Nothing Debatching

    • Should single records be allowed to fail some type of validation, mapping, or routing?

  • Mapping Required

    • As a whole message
    • As single messages

  • Last Message Indicator Required

    • Is another process depending on knowing when the last message finished?

  • Ordered Message Processing
  • Time of Day Processing

    • Will this affect other processes running?

  • File Size (less important than in past versions of BizTalk)
  • Record Volume


Although this article is focused on general debatching options, I want to go into more detail on some of the design considerations above.



 


Header and Trailer Validation


Typically batch files are received in a format other than XML, such as flat file.  In this case, the file will typically have a Header and Trailer record.  These records typically contain information about the number of records in the file, date, time, etc.  In some business processes this information needs to be validated prior to processing the file.  This creates some interesting design challenges.



Some options for this type of validation include a pre-debatching map, validation of the file using .NET, or validation inside an Orchestration.  The best option depends on message size, since some batch files can be very large (I consider greater than half a gigabyte, as XML, to be very large).



Last Message Indicator Required


Debatching implies breaking up a file into many different pieces.  Sometimes, the single items must still behave as a batch.  Some business processes require knowing when the last message in the batch has been processed to activate another process.  In addition, sometimes ordered processing of the debatching messages is required. 



Ordered Message Processing


Ordered processing of the messages can be accomplished in a few ways.  One way is to use an ordered delivery supported adapter, like MSMQt.  This would require the debatcher to write the messages in the correct order to the queue for processing.  This may also require the use of a convoy to route all the single messages to the same business process. The challenge is to allow for ordered delivery without significantly affecting performance.




BizTalk 2004 Debatching Options


BizTalk 2004 provides us with several different methods for batch file debatching.  What is the best way to split up your files?  As always, that depends on your exact business scenario. 



In this posting, I will look at four different debatching options, review the performance data, and explain the benefits of each type.  I also have the test Orchestrations I used available for download.  I do not provide any sample files, but you can make your own since the Orchestrations use untyped messages.  Just make your structure a simple root node with a repeating record element under it.
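Since the Orchestrations are untyped, any well-formed file with a root node and repeating records will work as test input. A quick way to generate one (the Batch/Record element names here are placeholders of my own, not taken from the sample):

```python
# Generate a simple batch test file: one root element containing a
# repeating record element (Batch/Record are placeholder names).
import xml.etree.ElementTree as ET

def make_batch(record_count, payload="Data"):
    root = ET.Element("Batch")
    for i in range(record_count):
        rec = ET.SubElement(root, "Record")
        rec.set("id", str(i))
        rec.text = payload
    return ET.tostring(root, encoding="unicode")

sample = make_batch(500)  # roughly the size class of the smallest test file
```

Vary the record count and payload size to approximate the different file sizes used in the tests below.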



The four methods I will cover are:



  • Receive Port Pipeline Debatching
  • Orchestration XPath Debatching
  • Orchestration Atomic Scope Node List Debatching
  • Orchestration Outside .Net Component Debatching



Debatching Options Performance Test Results


Here are the results of the performance tests for each type of debatching.  Tests were run on a 2.4 GHz desktop with 1.25 GB of RAM.  The sample file produced single records of 3 KB each.  No mapping was done on the files, and the times do not include sending the results through a Send Port; this is just the time to run the pipeline or orchestrations.  Throughput is intended to give a general idea of the amount of data moving through the process; it is not the overall system throughput.  Additional tests were run for XPath and Node List that produced larger output messages of 29.9 KB and 299.0 KB. 
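For reference, the derived columns in the tables appear to follow straightforwardly from the measured values: Msg/Sec is the message count over the elapsed time, and Throughput looks to be the input XML size over the elapsed time. A small sketch of that arithmetic:

```python
# Derive the Msg/Sec and Throughput (KB/sec) columns from the measured
# values: input file size, message count, and elapsed seconds.
def rates(xml_size_mb, msg_count, elapsed_sec):
    msgs_per_sec = msg_count / elapsed_sec
    throughput_kb_per_sec = xml_size_mb * 1024 / elapsed_sec
    return round(msgs_per_sec, 1), round(throughput_kb_per_sec)

# First Receive Port row: 1.6 MB file, 500 messages, 8 seconds.
print(rates(1.6, 500, 8))  # (62.5, 205)
```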





Type          XML Size (MB)   # Msg   Time (Sec)   Msg/Sec   Msg Size (KB)   Throughput (KB/sec)
Receive Port       1.6          500        8         62.5         3.0              205
Receive Port       3.6         1100       14         78.6         3.0              263
Receive Port       7.2         2200       34         64.7         3.0              217
Receive Port      18.1         5500       59         93.2         3.0              314
Receive Port     128.6        38500      603         63.8         3.0              218
XPath              1.6          500      121          4.1         3.0               14
XPath              3.6         1100      200          5.5         3.0               18
XPath              7.2         2200      667          3.3         3.0               11
XPath             18.1         5500     3077          1.8         3.0                6
Node List          1.6          500        9         55.6         3.0              182
Node List          3.6         1100       21         52.4         3.0              176
Node List          7.2         2200       30         73.3         3.0              246
Node List         18.1         5500      225         24.4         3.0               82
Node List         54.3        16500     1460         11.3         3.0               38
Node List        128.6        38500    15256          2.5         3.0                9
.Net Call          1.6          500       49         10.2         3.0               33
.Net Call          3.6         1100      220          5.0         3.0               17
.Net Call          7.2         2200      663          3.3         3.0               11
.Net Call         18.1         5500     3428          1.6         3.0                5
.Net Call         54.3        16500    27000          0.6         3.0                2


Type          XML Size (MB)   # Msg   Time (Sec)   Msg/Sec   Msg Size (KB)   Throughput (KB/sec)
XPath             12            400      232          1.7        29.9               53
XPath             35.9         1200      870          1.4        29.9               42
Node List         12            400       10         40.0        29.9             1229
Node List         35.9         1200       28         42.9        29.9             1313
Node List        107.7         3600      128         28.1        29.9              862


Type          XML Size (MB)   # Msg   Time (Sec)   Msg/Sec   Msg Size (KB)   Throughput (KB/sec)
XPath             14.9           50       40          1.3       299.0              381
XPath             59.6          200      430          0.5       299.0              142
XPath            119.2          400     1849          0.2       299.0               66
Node List         14.9           50        8          6.3       299.0             1907
Node List         59.6          200       27          7.4       299.0             2260
Node List        119.2          400      126          3.2       299.0              969





Debatching Options Detailed Overview



Receive Port Pipeline Debatching – A.K.A. Envelope Processing 


This type of debatching requires defining an envelope as the basic structure of your message.  This is handled a little differently depending on whether your input is XML or flat file.  For native XML messages, you must define a Body node whose contents will be broken out by the pipeline component.  When receiving flat files, life is easier since you have control over the final schema structure to be broken out.
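Conceptually, the XML disassembler emits each child of the designated Body node as its own single message. A rough Python illustration of that split (element names are placeholders; the real pipeline component does this in a streaming fashion):

```python
# Illustrate envelope splitting: each child of the Body node becomes an
# independent single message (Envelope/Body/Rec are placeholder names).
import xml.etree.ElementTree as ET

def split_envelope(envelope_xml, body_tag="Body"):
    root = ET.fromstring(envelope_xml)
    body = root.find(body_tag)
    return [ET.tostring(child, encoding="unicode") for child in body]

batch = "<Envelope><Body><Rec>1</Rec><Rec>2</Rec></Body></Envelope>"
singles = split_envelope(batch)  # two independent single messages
```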



Using this type of debatching, it will not be possible to determine when all of the single messages have been sent or processed without considerable effort (e.g. using a convoy, which will degrade performance). 



For more information and a sample of this type of debatching please see my past post on this topic.



Pros: Fast! Fast! Fast!  I am guessing this is because it is all streaming and uses “streaming XPath”.  Great for messaging only solutions that require content based routing of single messages.



Cons: All or nothing debatching in the pipeline.  Your XML schema must have a top level node for the envelope to strip out the single messages under it.  Since the message is not persisted to the database until after the pipeline and map, if something fails in the pipeline or map the entire message will be lost.  In addition, I always have a hard time getting envelopes to work correctly.  I think that is just user error on my part. 



Mapping: Maps are applied to the single messages.  If one fails, the whole batch is failed.  This limits your flexibility.




 


Orchestration XPath Debatching – (Best Bet!)


This is my favorite type of debatching.  This method of debatching comes from Darren Jefford’s Blog.  I like it the best because it provides the most control over the process.  I know exactly when the last message has finished.  This would be useful if you are using this type of debatching to make a .net call or web service submission for each debatched message inside the loop.  Just remember this will lower the performance and you will be running sequentially.



I was shocked at the relatively poor performance of this debatching.  When I was testing smaller files, under 100 single messages, I was getting 10+ messages per second. 



Even with the slower performance at higher output message sizes, this is still my preferred method of debatching when message size permits.  Simple reasons: control and manageability!



Just be careful: I ran a 128 MB file through this and after 1 hour I only had 1,500 messages out.  I think the slower performance is from the XPath call itself inside the loop.  I think it is rescanning the whole message each time the loop runs.
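The access pattern behind that slowdown can be sketched as follows: each loop iteration evaluates a positional XPath against the full batch, so the document is effectively rescanned once per record (a Python stand-in, not the actual orchestration code; element names are placeholders):

```python
# Sketch of positional-XPath debatching: every pass re-queries the whole
# document for the Nth record, so cost grows roughly quadratically with
# the number of records (Batch/Record are placeholder names).
import xml.etree.ElementTree as ET

def xpath_debatch(batch_xml):
    singles = []
    i = 1
    while True:
        root = ET.fromstring(batch_xml)        # whole message rescanned each loop
        hits = root.findall(f"./Record[{i}]")  # positional predicate, one record per pass
        if not hits:
            break
        singles.append(ET.tostring(hits[0], encoding="unicode"))
        i += 1
    return singles

batch = "<Batch><Record>a</Record><Record>b</Record><Record>c</Record></Batch>"
records = xpath_debatch(batch)
```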



Pros: Excellent flexibility inside the Orchestration.  Great control of the processing of your document!  This process is sequential and ordered by default.  Ability to loop over anything you can XPath and can easily (this is always a relative term) build mini-batches if needed.  This is something Receive Pipelines are not able to do without extensive custom code.



Cons: Performance degrades quickly as message size increases.  Downright slow on large messages and a resource hog.  In some cases, the sequential and ordered processing of this type of debatching may be limiting to the process.



Mapping: Complete flexibility.  Map before the loop on the whole message or inside the loop as single messages.  Inside the loop, single items can fail mapping and if handled correctly will not stop document processing.




Orchestration Atomic Scope Node List Debatching


This is conceptually a combination of the XPath and Receive Port Debatching. You have the flexibility to loop around anything you can XPath but your process is all or nothing.  This must be done inside an Atomic Scope shape since Node List is not serializable. 



This type of debatching seems to be more sensitive to output message size rather than input message size.  That would make sense, since the smaller the message the more messages the Atomic Transaction will be building up. 



To accomplish this debatching, I set up a Node List and an Enumerator inside the Orchestration.  Then, I use MoveNext inside a loop to extract out the single message for the current node.  This involved casting the object to a Node and getting the value using OuterText.  For complete details, see the samples provided.
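In rough Python terms, the Node List approach selects the matching nodes once and then just walks the enumerator, which is why it avoids the per-iteration rescan of the XPath method (a sketch with placeholder element names; serializing the current node stands in for the MoveNext and node-value calls):

```python
# Sketch of the Node List pattern: select the matching nodes once, then
# walk them with an enumerator, serializing the current node on each pass.
import xml.etree.ElementTree as ET

def nodelist_debatch(batch_xml, record_tag="Record"):
    root = ET.fromstring(batch_xml)    # one parse, one scan
    node_list = root.iter(record_tag)  # enumerator over the matching nodes
    return [ET.tostring(node, encoding="unicode") for node in node_list]

singles = nodelist_debatch("<Batch><Record>1</Record><Record>2</Record></Batch>")
```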



Pros: Fast! In some cases, the Atomic Scope may be beneficial.



Cons: All or nothing debatching since you must use an Atomic Scope shape.  Massive resource hog!  My CPU was maxed at 100% the entire time the process was running.  In addition, this seems to tax SQL Server.  After running some processes it needed to be restarted or the computer just would not do anything.



In one test, the process ran for over 8 hours maxing out the CPU the whole time, only to have something fail and the whole batch roll back.


 


Mapping: Map before the loop on the whole message or inside the loop as single messages.




Orchestration Outside .Net Component Debatching


This debatching uses an outside .net component to break up the message.  The thought here is that the message will not be scanned for each loop.  As the performance numbers show, using an outside .net component did not increase performance. 



Inside the Orchestration, I created a new instance of a helper class and passed in the input XML message.  Then, I looped over the document using a Node List.  I returned items based on an index I passed in from a loop shape inside an Orchestration.  The performance of this is nearly identical to the XPath method. 



Are there any better ways to do it?  I looked into using XmlNodeReader inside the .net component but I did not try to get it to work.


 


Pros: None.  I would not recommend this approach.



Cons: Slow. This adds an additional component to maintain outside of BizTalk.



Mapping: Mapping can be done on the whole message or on single messages.



 


 


Conclusion


BizTalk Server 2004 offers a variety of options for breaking up large files for processing.  An evaluation of your exact business requirements will help you decide on the option that is best for you. 



I welcome questions, comments, feedback, and other ways to debatch large messages!  Just drop me an email.

Role Links in BizTalk Server 2004


Make sure you check out Todd Uhl’s Blog.  He has a sample working with Role Links in BizTalk 2004.  This is something I had not looked into yet but his sample makes it look easy and useful. 



Todd is working on the Tech Arch team on my current project.  Watch for more great deployment and architecture posts from him in the near future.

Parallel Sequential Convoy in BizTalk Server 2004

Here is another sample of a Sequential Convoy in BizTalk 2004.  This one is a little different than most: it contains a Parallel Action shape that allows for concurrent parallel processing of inbound messages.



What does this accomplish?  It is all about control.  This process allows for processing a pre-defined number of messages at the same time in a controlled manner.



Performance?  Ok, it is not the fastest-running Orchestration I have ever seen.  Actually, in some of my tests this parallel action processed 1/2 to 1/3 fewer messages per minute than a single-process convoy.  In case you are wondering, running 100 messages has 203 persistence points. 



What’s the point then?  Well, I could see a need for something like this in a de-batching scenario when the parallel actions will take an unknown amount of time to complete and the overall status of all the messages is important.



Download the sample below.  I have 4 parallel actions processing messages.  I added a random delay to simulate message submission to an outside system or web service.  I have a Win Form test harness for easy running.  If you want to test the performance, download my single convoy example and remove the delay shapes inside this sample.



The key to this solution is the Synchronized Scope shapes that maintain the overall message count inside the Orchestration.
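The role the Synchronized Scope plays, serializing updates to a shared message count across parallel branches, can be sketched with ordinary threads and a lock (a Python stand-in for the orchestration shapes; the per-message delay/work step is elided):

```python
# Stand-in for the Synchronized Scope: four parallel branches share one
# message counter, and updates to it are serialized with a lock.
import threading

processed = 0
lock = threading.Lock()

def branch(message_count):
    global processed
    for _ in range(message_count):
        # ...per-message work (e.g. submission to an outside system)...
        with lock:          # the Synchronized Scope analog
            processed += 1

# 100 messages split across 4 parallel actions.
threads = [threading.Thread(target=branch, args=(25,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# processed is now 100: the overall count survives the parallel branches
```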



DOWNLOAD: Parallel Sequential Convoy Sample



See the read me file for set up instructions.  Make sure you run “Clean Up” before you rerun any samples.



A better approach?  I think it would be better to remove the processing logic inside the Parallel Actions and call an additional Orchestration to do any work.  This would increase the overall flexibility and maintenance of the solution.



Take Away: Parallel Convoys allow for better control of your de-batching process but the overall throughput of your messages is decreased.

Performance Counters in BizTalk Server 2004

Have you wanted to know exactly what BizTalk Server 2004 is doing at a given instant?  How many Orchestrations are about to Dehydrate?  How many persistence points are inside your Orchestration?  Answers to these questions are easily available through BizTalk Performance Counters.



Aside on Persistence Points: Persistence Points are key when designing optimized Orchestrations.  A Persistence Point is any time the Orchestration saves its current state back to the message box.  The fewer Persistence Points you have, the better your Orchestration will perform.  Persistence Points are caused by specific shapes inside your Orchestration, such as Send, Parallel Action, Transactional Scopes, and others.  Some more information is available in the help guide under “Persistence”.



How to view the Performance Counters:


1. Go to the Start Menu and Select Run


2. Type in: perfmon and Enter


3. Performance Window should open up


4. Click on the + inside the window (or Ctrl-I)


5. Under Performance Objects, Select XLANG/s Orchestrations (note your host must be running)


6. Select the Counters from the list you want to watch, then press Add.  You can get information on each by clicking on Explain.


7. Run your Orchestrations



My Favorite Performance Counters:


Orchestrations resident in memory


Pending messages


Persistence points


Running orchestrations



If you want to know exactly how many Persistence Points you have inside your Orchestration, just run it and watch the counter!



Take Away: Performance Counters in BizTalk 2004 can give you a clear picture on the current status of your Server and an idea of how well your Orchestrations will perform.



More information can be found on Performance Counters in the help guide.  Just search for it.

Working with Untyped Send and Receive Ports in BizTalk Orchestrations

Scott posted a message back in June about untyped receive ports inside an Orchestration.  Untyped Messages means you receive or send a message as System.Xml.XmlDocument rather than a specific schema type.  This is an incredibly powerful feature in BizTalk 2004!



Business Case:


Let’s say you want to have one Orchestration (i.e. a single business process) process many different types of messages all in the same way.  Additionally, say you want to change some values in your message and send all of your messages out through the same Send Shape (i.e. send messages as Xml.XmlDocument).



Let’s say you have a Book Review and a Movie Review.  You want them both to be processed in the same way through the same Orchestration, maybe sending information to an outside web service or something like that.  Additionally, you have to extract promoted properties of the message inside the Orchestration and make decisions on them.  This cannot be done with Xml.XmlDocument alone, since the Orchestration will not allow you to access them (they are in the message context; you just cannot get at them). 



Confused?  How about we look at a sample.



DOWNLOAD: Get the sample here!



Set-up is easy, just unzip the SampleProperties folder and put it on your C: drive.  Then, build and deploy the SampleProperties project. I use early binding so the send and receive ports will be created for you. 


To run the sample, drop the 4 Start….xml messages into c:\SampleProperties\In.  You will get 4 messages in your Out folder.  Plus, 4 events will be written to your event log.  Do not forget to look inside the Expression shapes in the Orchestration for comments.  If all else fails, read the ReadMe.txt file.



Key Take Home Points:


– Common properties must be promoted in all of the different schemas (look at the promotions in both BookReview.xsd and MovieReview.xsd)


– Messages of type XmlDocument can be cast into different types


– Typed messages can be cast back into XmlDocument


– Blank Schema can be created and properties changed (look at Movie Review branch)


– This is using Property Promotion and Demotion in the XML Pipelines


– As a test, set the Send Pipeline to Pass Through and see the difference in the Movie Review output data.


– This will also work for distinguished fields



CRITICAL: This process is somewhat risky since you can pass in any XML document.  You can end up with an invalid-cast exception or an invalid XPath query.



What is this Movie Review Branch inside the Orchestration really doing?



  1. Taking in a message of type Xml.XmlDocument
  2. Casts the In message to CastMovieIn (strongly typed to a schema)
  3. Sets Orchestration variables based on promoted properties inside the MovieReview schema
  4. Creates a new XML Document (CastMovieOut) and loads a blank schema for that type
  5. Changes values inside CastMovieOut
  6. Creates a new message, Out, as XmlDocument
  7. Casts CastMovieOut to Out
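Stripped of the BizTalk machinery, that branch boils down to: accept any XML document, branch on what it actually is, then read and change fields as if it were strongly typed. A loose Python analog (MovieReview/Rating are placeholder names standing in for the sample's schema):

```python
# Loose analog of the untyped-receive pattern: take any XML document,
# branch on its root element, then work with it as a concrete type.
import xml.etree.ElementTree as ET

def process(any_xml):
    doc = ET.fromstring(any_xml)          # received as a generic "XmlDocument"
    if doc.tag != "MovieReview":
        raise ValueError("unexpected document type")  # the invalid-cast case
    rating = doc.findtext("Rating")       # like reading a promoted property
    doc.find("Rating").text = "5"         # change a value in the typed view
    return rating, ET.tostring(doc, encoding="unicode")

old_rating, out_msg = process("<MovieReview><Rating>3</Rating></MovieReview>")
```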


Take Away: Using XmlDocument can greatly increase the flexibility inside your Orchestration.

BizTalk Delivery Notification and NACK Sample

Did anyone read Kevin Smith’s blog on ACK/NACK and run out to try it?  Well, I did and found it more time consuming than I expected.  I have put together a sample that shows how to catch the SOAP exception and get access to the error message.  I hope that after looking at this sample Kevin’s excellent post on NACKs will make a little more sense.  It did for me.

DOWNLOAD: Get the sample here!

Set-up is easy, just unzip the SampleNACK folder and put it on your C: drive.  Then, build and deploy the SampleNACK project. You will need to manually create a Send Port.  I set up a File Send Port going to c:\some_location_that_does_not_exist.  Set retries to 0.  To run it, drop the StartFile.xml message into c:\SampleNACK\In.  Your NACK will show up in c:\SampleNACK\Out.  Do not forget to look inside the expressions shapes inside the Orchestration for comments.  If all else fails, read the ReadMe.txt file. 

CRITICAL: Getting at the HTTP error using Delivery Notification will not work for transport type of HTTP or SOAP due to an “issue”.  For more information please see Microsoft KB840008.

Key Take Home Points:
– Delivery Notification is not available on Early Bound Ports.
– Must import System.Web.Services to cast the SOAP exception
– Set Send Port retries to 0
– Must use a Synchronized scope

Under the covers:
Ok, so you ask what is BizTalk doing inside the little Delivery Notification property? Keep in mind this is my interpretation. 

When a message is sent through a send port with Delivery Notification set to transmitted, a correlation set is initialized.  A correlation token is assigned to the outbound message.  A subscription is started based on this correlation token.  This token is stored in the context property of the ACK/NACK and promoted.  When the ACK/NACK is returned, it is routed back to the calling Orchestration.  You get all this just by setting a little property to “Transmitted”!  Cool!

Take Away: Delivery Notification is an easy way to catch exceptions as long as you understand how to get at the error message.

What If a Map Fails on a Send Port in BizTalk 2004

This is a challenge to overcome in BizTalk 2004.  It seems that the message never hits the Send Pipeline and as far as I can tell no error is written to HAT.  The only place this error is written to is the Event Log.  But this is ok if you are using MOM or something else to watch the event log.

The messages will be Suspended inside HAT and marked as resumable.   

CRITICAL: Note that this is not the case if a map fails on the Receive Port. These messages are non-resumable.