Parallel actions, exceptions, and timeouts…

BizTalk 2004 has some very useful behavior around parallel execution and scope-level
timeouts that it is helpful to have a good understanding of.

What follows is a series of experiments & associated findings that should shed
some light on this area.

Experiment One: 

One restriction when using the parallel shape is that if multiple branches make use
of a given orchestration variable, you can get a compiler error:

error X2226: 'someVar': if shared data is updated in a parallel then all references
in every task must be in a synchronized or atomic scope

In the experiment below, we are accessing ‘someVar’ in both parallel branches
– each within an expression shape that also calls Thread.Sleep (the left-hand branch for
10 seconds, the right-hand branch for 30).  To address the compiler error just noted,
we have a scope around each usage with the ‘Synchronized’ flag on the
scopes set to ‘true’.

The trace messages shown to the right of the diagram tell us that, indeed, the execution
of Scope 2 and Scope 3 is completely serialized (in this case, Scope 2 completes before
Scope 3 begins.)

This experiment also tells us something unrelated (but useful): an exception
will not interrupt “user code”.  By “user code”
we refer to code in an expression shape, i.e. code that does not use an orchestration “intrinsic”
such as a standard Delay shape or a Receive shape.  Notice via the timings
that the exception thrown in the left-hand branch doesn’t abort the Thread.Sleep
in the right-hand branch.  The exception is caught only after the Thread.Sleep
in the right-hand branch completes (though the final trace message in the right-hand
branch – ‘after 30 sec’ – does not execute).  This is
important to understand if you have expression shapes in orchestrations that are
making blocking calls to .NET components, DCOM servers, etc.
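The same “blocking user code is not interruptible” behavior can be modeled outside BizTalk. The following is a minimal Python threading analogy (not BizTalk code): an exception in one “branch” can only be observed once the other branch’s blocking call has returned.

```python
import threading
import time

def blocking_branch(log):
    # Stand-in for an Expression shape making a blocking call
    # (Thread.Sleep, a DCOM call, etc.): once started, nothing in
    # another branch can interrupt it.
    time.sleep(0.5)
    log.append("after sleep")

log = []
branch = threading.Thread(target=blocking_branch, args=(log,))
start = time.monotonic()
branch.start()

try:
    # The "left-hand branch" throws immediately...
    raise RuntimeError("exception in parallel branch")
except RuntimeError:
    # ...but the enclosing "scope" can only act on the exception
    # once the blocking branch has returned.
    branch.join()
elapsed = time.monotonic() - start

print(log, elapsed >= 0.5)
```

As in the experiment, the exception is raised right away, but handling completes only after the full sleep elapses.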

Experiment Two:

If we eliminate the ‘someVar’ reference in the expression shapes above,
we find that the scopes do not execute serially – whether the scope synchronization
flag is set to true OR false.  Notice below (via the timings) that the
sleep operations are executed at the same time – so it is the presence of the common
variable in the expression that forces the synchronization!

We now have a 20-second gap after sleeping for 10 seconds (because the exception we
throw still doesn’t interrupt the blocking sleep).
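A rough Python analogy of the two configurations: with a lock standing in for the Synchronized scope guarding shared state, the two sleeps serialize; without it, they overlap.

```python
import threading
import time

lock = threading.Lock()  # plays the role of the Synchronized scope

def branch(use_lock):
    if use_lock:
        with lock:
            time.sleep(0.2)   # stand-in for the Thread.Sleep in the scope
    else:
        time.sleep(0.2)

def run(use_lock):
    threads = [threading.Thread(target=branch, args=(use_lock,)) for _ in range(2)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start

serialized = run(True)    # branches take turns: ~0.4s total
overlapped = run(False)   # branches run together: ~0.2s total
print(serialized > overlapped)
```

The serialized run takes roughly the sum of the sleeps, mirroring the “Scope 2 completes before Scope 3 begins” trace in Experiment One.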

Experiment Three:

We would like to use a given instance of a .NET object without imposing the
use of synchronized scopes.  As it happens, if we have a .NET object that is
pointed at by two references (i.e. someVar and someVar2 – where someVar2 was set equal
to someVar with a simple assignment), the requirement that the orchestration compiler
normally imposes regarding the use of a synchronized scope goes away.

In the orchestration below, the left branch uses someVar, and the right branch
uses someVar2.  The trace indicates that the sleep operations happen at the same
time (though once again the exception doesn’t interrupt the right-hand side
and is caught only when the right-hand branch completes).

Lesson: If you have a .NET component (that you know to be thread-safe) you wish to
use in an orchestration – and you wish to use a single instance of it – you will need
to have multiple references to that same instance to use in each branch of a parallel
shape.  (The synchronized-scope alternative is likely unacceptable!)  The
first variable declaration of your component might use the default constructor, while
the others will have “Use Default Constructor” set to false and will be
assigned to the first.
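The aliasing trick translates directly. In this Python sketch, the Counter class is a hypothetical stand-in for a thread-safe .NET component; two variable names point at a single shared instance, which is all the orchestration pattern requires.

```python
import threading

class Counter:
    """Hypothetical stand-in for a thread-safe .NET component."""
    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        with self._lock:          # the component guards its own state
            self.value += 1

some_var = Counter()   # first variable: "Use Default Constructor = true"
some_var2 = some_var   # second variable: constructor off, assigned to the first

# Each "parallel branch" uses its own variable name, but both names
# refer to the one shared instance.
branches = [threading.Thread(target=v.increment) for v in (some_var, some_var2)]
for b in branches:
    b.start()
for b in branches:
    b.join()

print(some_var2 is some_var, some_var.value)  # True 2
```

Because the component synchronizes internally, no external serialization (synchronized scope) is needed around the calls.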

Experiment Four:

Now, there is more to be said regarding exceptions and what they will interrupt in
parallel branches.  If instead of using a Thread.Sleep we use a Delay shape (or
a Receive shape, etc.) we find that throwing an exception in one branch of the parallel
shape will indeed interrupt the other branch(es).  Notice in the timings below
that the Delay 30sec shape does not complete once the exception in the left-hand
branch is thrown.  Particularly when you are structuring real-world business
flows, this is an important and quite useful behavior.

Experiment Five:

The behavior of exceptions in parallel flows is closely related to another behavior
in BizTalk 2004: what BizTalk is prepared to “interrupt” when a long-running
scope has exceeded the timeout value configured for the scope. 
A Delay shape (or Receive, etc.) will indeed be interrupted when the timeout expires
(and a TimeoutException will be thrown).  (Note that for an atomic scope, the
timeout governs the maximum time to allow prior to aborting the transaction.) 
See the timings below and note that the Delay 30sec shape does not complete.

On the other hand, as you might expect at this point, a blocking call in an expression
shape (like a Thread.Sleep or a DCOM call, etc.) will not be interrupted. 
However, the exception will be raised when the blocking call eventually returns:

Summary:

  • The Synchronized flag on a scope will only cause synchronized (serialized) behavior
    among “peers in parallel branches” when shared variable or message state
    is involved.  (This ignores any transactional semantics you might layer on
    top – which is beyond the scope of this article!)
  • If you have a thread-safe .NET component that you wish to use in an orchestration
    from multiple parallel branches, strongly consider having multiple variables point
    to a common instance.  The first variable might have “Use Default Constructor
    = true”, while the remaining variables will have that flag set to false and be
    assigned to the first instance in an expression shape: someVar2 = someVar1;
    someVar3 = someVar1; someVar4 = someVar1; etc.  An alternative is to use
    scope-local variables that are assigned to a global instance.
  • A line of code in an Expression shape will not be interrupted by either an
    exception in a parallel branch or a TimeOutException arising from a timed-out
    scope.
  • A Delay shape or Receive shape, etc. will be interrupted by either an exception
    in a parallel branch or a TimeOutException arising from a timed-out scope.


BizTalk Modular Deployment

I’ve just posted an article on Modular Deployment of BizTalk Server projects. I’m using it at the moment in my current project, and it seems to work fine.


 


Also, I picked up Scott’s book, and am about half way through it. It’s good to be able to fill in the gaps in my knowledge, and looks like a great intro to the product. It does not cover that many advanced topics, with the exception of creating custom adapters, but I guess it would have to be double the size to cover everything in any great detail.


 

Definitely worth picking up a copy if you work or play with BizTalk.

Debugging Routing Failures

Okay, I disappeared for a while. There is this crazy thing called “work” which my bosses (for those of you who are confused, Scott Woodgate is not my boss 🙂) seem to think I need to do sometimes. Some positive news is that BizTalk 2004 SP1 should be released early next year, which is a great thing. When it comes out, I recommend getting it, running it through your test environments and then plopping it into your systems. There are some good things done here and I have seen it improve some performance scenarios up to 10% (although I don’t guarantee any of that – just consider it another way for me to make you more likely to try it out :). We are definitely trying.

  In other news, it looks like I might be talking again at TechEd 2005. Plans are to do more performance-related talks (similar to last year’s, except probably with more concrete scenarios as examples and with more lessons learned from what customers actually try to do). I am also slated to give a talk on operational health … how to set up a system for high availability, how to monitor your system to make sure it stays healthy, and what to do when certain indicators go off. If you have any other areas you think would be good talks for me to give, let me know. Always curious. 🙂

Okay, now to the topic I mentioned here. I am guessing that by now, if you have been using BTS for long enough, you have figured this out already, but I might as well put some information here anyway so people can find it.

First, what is a routing failure? My previous post describes how BizTalk sits on top of a pub/sub routing engine which is part of the MessageBox. When the messaging engine or orchestration engine publishes a message to the MessageBox and no one is subscribing to that message, that is considered a routing failure: an event will be raised, a routing failure report (described later) will be generated, and possibly a message / instance will be suspended. There are a couple of exceptions to this, like for NACKs, where the engine knows that routing failures are acceptable – but these are only for internal messages. This “Ignore Routing Failures” type of functionality is not something you can configure, and while I am sure that the hunt is on now, you cannot hack this up either and it wouldn’t be supported. 🙂 Back to the real story. So how do you figure out why it failed to route?

The routing failure report is literally a dummy message associated with a dummy instance. The only really interesting part of this message is the context. This is the context of the message which failed to route at the time of the routing failure. It is possible (probable) that the message which gets suspended will not have all of the contextual information which was there when the message failed to route, since we usually suspend the adapter’s message, not the message out of the pipeline which we tried to publish. That is why we generate these reports – so that you have a chance to see what really happened. If you open up the context in HAT, you can see the list of properties which were promoted. Now, why didn’t it route? 99.99% of the time, routing failures occur when you are in testing. They usually occur because you have a misconfigured filter on your sendport or your activation receive. The easiest way to see this is to use the subscription viewer located in the <installdir>\sdk\utilities directory. This tool is a bit rough (sorry, I am not really a UI guy :), but it gives you a view of what your subscriptions are. Ideally you have some idea of where you expected this message to go. Most subscriptions will have readable names, so you can find the one associated with the sendport / orchestration you were expecting it to route to and check the subscription. Simply compare the properties which we are subscribing to to the properties which were promoted in the context. A couple of gotchas are more difficult to see and not well displayed. First, you cannot route a message to 2 solicit-response ports. We do not support that because we have no idea what that means: you sent out one message but got two responses. Request-response is considered a one-to-one operation. 
I know there are lots of scenarios for doing more, but to cleanly support those would require exposing a lot more in the engine, like how many people you actually routed the message to, so that you can correctly consume all of the responses. This is not something we are planning on doing anytime soon. So, you should know that a routing failure will be generated if you try to route to multiple solicit-response ports. Another boundary case is when you try to route a previously encrypted message. The host to which the message is being routed (be it the orchestration’s host or the sendport’s host) must “own” the decryption certificate. This is because we do not consider the receive port as the destination of the message. Its job is simply to convert the message into a desired format and extract all relevant information, including information required to route the message. The orchestration / sendport is the destination of the message. As such, they need to have the certificate to demonstrate that they could have decrypted the message if it hadn’t already been done for them. Adding the certificate can be done in the Admin MMC via the properties setting for the host. I am not sure if you get a different error in the eventlog for these two boundary cases. All of these cases, though, can be debugged with the Routing Failure report in HAT, the subscription viewer, the eventlog and a bit of knowledge of what your system is actually trying to do.
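The debugging loop just described – compare the promoted context properties against each subscription’s predicates – can be sketched in a few lines of Python. The property names, values, and port names here are invented for illustration, and real subscriptions support more than simple equality:

```python
# Promoted context properties of the message that failed to route
# (names/values invented for illustration).
context = {
    "BTS.MessageType": "http://orders#Order",
    "BTS.ReceivePortName": "RcvOrders",
}

# Each subscription reduces (here) to a set of equality predicates.
subscriptions = {
    "SendOrders": {"BTS.MessageType": "http://orders#Order"},
    "SendInvoices": {"BTS.MessageType": "http://invoices#Invoice"},
}

def matches(ctx, predicates):
    # A subscription matches when every predicate is satisfied
    # by a promoted property in the context.
    return all(ctx.get(name) == value for name, value in predicates.items())

matched = [name for name, preds in subscriptions.items() if matches(context, preds)]
if matched:
    print("routed to", matched)
else:
    print("routing failure: suspend + generate routing failure report")
```

If the matched list comes back empty, that is exactly the situation the engine reports as a routing failure.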

If you get routing failures in production, it is going to be something in your design. The most common case I know of is when you have a request-response call out of an orchestration, but the response receive shape is in a listen with a timeout. Hence if you hit the timeout and terminate the orchestration, and then the response gets sent back, well, the messaging engine could get a routing failure. In general, these types of scenarios end up in either zombies or routing failures, since it is simply a race. Not all zombie scenarios cause routing failures, since it is often the case that if the instance is gone, the message might trigger the creation of a new instance (as is the case for a lot of convoy scenarios). You can read more about these subjects in earlier blog entries. In general, though, in this case, it is up to you to decide how / what you want to do with this response, since the original sending service is gone. I can’t really think of other scenarios where you would hit this in production. It is going to be built into your design, somewhere. Some race condition exists in your design that can cause this … almost always because you have a listen with a timeout, or perhaps a listen with a receive on both sides where both can happen (like with control terminate messages).

Hope that gives some insight. Hopefully I will get a couple more posts in this year. 🙂 Have a happy holiday season.

 

Lee

Flat File Disassembler Output Types in BizTalk Server 2004

I have seen a lot of posts on various news groups over the past few months about the Flat File Disassembler and how it produces output.  I think it is rather confusing, so I put together a sample that I hope will shed some light on the subject.



Download: Flat File Disassembler Output Sample
Watch the video: Flat File Disassembler Output Options Video



The Flat File Disassembler is used to convert flat file documents (either positional, delimited, or a supported combination) into XML based on a defined schema.  The schema must import the Flat File Schema Extensions and have all the required delimiters and positions set, of course.  Flashback: This type of conversion was accomplished using envelopes in BizTalk 2002.



The Flat File Disassembler can take in a large batch file and produce either one large XML output or single-record XML outputs.  This is the confusing part… the control of this is based on how the schema is defined and how the properties are set on the Flat File Disassembler inside the Receive Pipeline.



Producing One XML Output


In order to produce one output file, simply define a single schema with the required Header, Body (make sure this is set to 1 to many), and Trailer records.  Then, set the Document Schema property inside the Receive Pipeline Disassembler component to this schema.  Do not set anything for the Header or Trailer Schema.  This will produce one output based on the input file. 



In my sample, this is illustrated in schema AllAsOne.xsd and AllAsOneNoHeader.xsd.  The accompanying pipelines are recAllAsOne.btp and recAllAsOneNoHeader.btp.



Producing Single Message Output


In order to produce a single XML document per input record, the Header, Body, and Trailer records will need to be defined as separate schemas.  Then, each of these will need to be set accordingly inside the Receive Pipeline Disassembler component.  The base Body schema should be set as the Document Schema property.  



In my sample, this is illustrated in schema AllAsOne.xsd and AllAsOneNoHeader.xsd.  The accompanying pipelines are recAllAsOne.btp and recAllAsOneNoHeader.btp.



Inside the sample, pay special attention to the Receive Pipelines.  Note the differences in the settings and the schemas used to return single records versus one file.  The sample includes both a flat file with a Header and one with just Body records.  To run the sample, see the ReadMe.txt.  I have included 4 Orchestrations to allow for easy Receive and Send Port creation.
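As a rough illustration of the “single message output” option, this Python sketch (with an invented pipe-delimited record format, nothing to do with the sample’s actual schemas) splits the Body records of a small flat file into per-record XML documents:

```python
import xml.etree.ElementTree as ET

# An invented pipe-delimited batch: header, body records, trailer.
flat = "HDR|20041215\nBODY|1|widget\nBODY|2|gadget\nTRL|2"

docs = []
for line in flat.splitlines():
    fields = line.split("|")
    if fields[0] != "BODY":
        continue                      # header/trailer have their own schemas
    rec = ET.Element("Body")
    ET.SubElement(rec, "Id").text = fields[1]
    ET.SubElement(rec, "Item").text = fields[2]
    docs.append(ET.tostring(rec, encoding="unicode"))

print(len(docs), "single-record documents")
```

Each element of `docs` corresponds to one message the disassembler would publish when the Header, Body, and Trailer schemas are configured separately.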

BizTalk Server 2004 Unleashed – 4 Free Chapters

At last! Scott’s book made it to press. More people have been waiting for this than for the next Harry Potter. There are four free chapters available for download. Very generous of Sams, I think. (As I’m living in Sweden, I’m still waiting for it to become available over here.)


Nice to see the second book on BizTalk 2004 finally unleashed ;-), nice one Scott and the team…


Chapter 3 – Building Message Specifications is here:


http://www.samspublishing.com/title/0672325985


Chapter 7 – Working with Adapters
Chapter 10 – Orchestrating Web Services and Correlation
Chapter 11 – Developing Rules


http://www.theserverside.net/books/sams/BizTalkUnleashed/index.tss
(Free reg required).

Some Overdue Plugging

A couple of blog entries that I wanted to highlight, but I’ve been a bit late blogging on (blame short-sharp-“just do this project before Christmas would you?” type work).


First, I’m sure you’ve all seen it by now, but Scott Colestock is continuing his barnstorming work on BizTalk deployment with NAnt. Check it out. BAT files are so 1980’s 😉 If you’re not doing something similar in your deployments (either through your own work, or using Scott’s) then you are really missing a trick.


Also, a big plug must go to Thinktecture’s new release of WSCF (WsContractFirst). As soon as I get two minutes to breathe (i.e. after Xmas), I’ll take a proper look at this, but for now it looks pretty good.

Debatching Options and Performance in BizTalk Server 2004

Download This Article and Sample Code Here: Debatching Options and Performance Considerations in BizTalk 2004 White Paper and Sample Code



Related Sample: Xml Envelope Debatching




In some business scenarios you may be required to receive a batch file that must be broken up and processed as single messages by BizTalk Server 2004.  This could allow for individual record level processing, selective content based routing, or single message mapping. 




General Debatching Design Considerations


Here are some general design considerations you might want to think about when planning a debatching approach with BizTalk Server 2004. 



General Debatching Design Considerations



  • Header and Trailer Validation Requirements for Flat Files
  • All or Nothing Debatching

    • Should single records be allowed to fail some type of validation, mapping, or routing

  • Mapping Required

    • As a whole message
    • As single messages

  • Last Message Indicator Required

    • Is another process depending on knowing when the last message finished

  • Ordered Message Processing
  • Time of Day Processing

    • Will this affect other processes running

  • File Size (less important than in past versions of BizTalk)
  • Record Volume


Although this article is focused on general debatching options, I want to go into more detail on some of the design considerations above.



 


Header and Trailer Validation


Typically batch files are received in a format other than XML, such as flat file.  In this case, the file will typically have a Header and Trailer record.  These records typically contain information about the number of records in the file, date, time, etc.  In some business processes this information needs to be validated prior to processing the file.  This creates some interesting design challenges.



Some options for this type of validation include a pre-debatching map, validation of the file using .net, or validation inside an Orchestration.  The best option depends on message size since some batch files can be very large (I consider very large as greater than half a gigabyte as XML).
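One common piece of this pre-debatching validation is checking the trailer’s declared record count against the actual Body records. A minimal Python sketch of that check, using an invented record layout:

```python
def trailer_count_ok(lines):
    """True when the trailer's declared record count matches the Body records."""
    body = [l for l in lines if l.startswith("BODY")]
    trailer = next(l for l in lines if l.startswith("TRL"))
    declared = int(trailer.split("|")[1])
    return declared == len(body)

good = ["HDR|20041215", "BODY|1", "BODY|2", "TRL|2"]
bad = ["HDR|20041215", "BODY|1", "TRL|2"]   # trailer claims 2, only 1 record
print(trailer_count_ok(good), trailer_count_ok(bad))  # True False
```

For very large files you would want to do this in a streaming fashion rather than materializing all the lines, which is why message size drives the choice of validation approach.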



Last Message Indicator Required


Debatching implies breaking up a file into many different pieces.  Sometimes, the single items must still behave as a batch.  Some business processes require knowing when the last message in the batch has been processed to activate another process.  In addition, sometimes ordered processing of the debatching messages is required. 



Ordered Message Processing


Ordered processing of the messages can be accomplished in a few ways.  One way is to use an ordered delivery supported adapter, like MSMQt.  This would require the debatcher to write the messages in the correct order to the queue for processing.  This may also require the use of a convoy to route all the single messages to the same business process. The challenge is to allow for ordered delivery without significantly affecting performance.




BizTalk 2004 Debatching Options


BizTalk 2004 provides us with several different methods for batch file debatching.  What is the best way to split up your files?  As always, that depends on your exact business scenario. 



In this posting, I will look at four different debatching options, review the performance data, and explain the benefits of each type.  I also have the test Orchestrations I used available for download.  I do not provide any sample files, but you can make your own since the Orchestrations use untyped messages.  Just make your structure a simple root element containing repeating Data elements.



The four methods I will cover are:



  • Receive Port Pipeline Debatching
  • Orchestration XPath Debatching
  • Orchestration Atomic Scope Node List Debatching
  • Orchestration Outside .Net Component Debatching



Debatching Options Performance Test Results


Here are the results of the performance tests on each type of debatching.  Tests were run on a 2.4 GHz desktop with 1.25GB RAM.  The sample file produced single records that were 3 KB each.  No mapping was done on the files and times do not include time to send the files using a Send Port.  This is just the time to run the pipeline or orchestrations.  Throughput is intended to show a general idea of the amount of data running through the process; it is not the overall system throughput.  Additional tests were run for XPath and Node List that produced larger output files of 29.9 KB and 299.0 KB. 

Type           XML Size (MB)   # Msg   Time (Sec)   Msg/Sec   Msg Size (KB)   Throughput (KB/sec)
Receive Port   1.6             500     8            62.5      3.0             205
Receive Port   3.6             1100    14           78.6      3.0             263
Receive Port   7.2             2200    34           64.7      3.0             217
Receive Port   18.1            5500    59           93.2      3.0             314
Receive Port   128.6           38500   603          63.8      3.0             218
XPath          1.6             500     121          4.1       3.0             14
XPath          3.6             1100    200          5.5       3.0             18
XPath          7.2             2200    667          3.3       3.0             11
XPath          18.1            5500    3077         1.8       3.0             6
Node List      1.6             500     9            55.6      3.0             182
Node List      3.6             1100    21           52.4      3.0             176
Node List      7.2             2200    30           73.3      3.0             246
Node List      18.1            5500    225          24.4      3.0             82
Node List      54.3            16500   1460         11.3      3.0             38
Node List      128.6           38500   15256        2.5       3.0             9
.Net Call      1.6             500     49           10.2      3.0             33
.Net Call      3.6             1100    220          5.0       3.0             17
.Net Call      7.2             2200    663          3.3       3.0             11
.Net Call      18.1            5500    3428         1.6       3.0             5
.Net Call      54.3            16500   27000        0.6       3.0             2

Additional tests with 29.9 KB output messages:

Type           XML Size (MB)   # Msg   Time (Sec)   Msg/Sec   Msg Size (KB)   Throughput (KB/sec)
XPath          12              400     232          1.7       29.9            53
XPath          35.9            1200    870          1.4       29.9            42
Node List      12              400     10           40.0      29.9            1229
Node List      35.9            1200    28           42.9      29.9            1313
Node List      107.7           3600    128          28.1      29.9            862

Additional tests with 299.0 KB output messages:

Type           XML Size (MB)   # Msg   Time (Sec)   Msg/Sec   Msg Size (KB)   Throughput (KB/sec)
XPath          14.9            50      40           1.3       299.0           381
XPath          59.6            200     430          0.5       299.0           142
XPath          119.2           400     1849         0.2       299.0           66
Node List      14.9            50      8            6.3       299.0           1907
Node List      59.6            200     27           7.4       299.0           2260
Node List      119.2           400     126          3.2       299.0           969

Debatching Options Detailed Overview



Receive Port Pipeline Debatching – A.K.A. Envelope Processing 


This type of debatching requires defining an envelope as the basic structure of your message.  This is handled a little differently depending on whether your input is XML or Flat File.  For native XML messages, you must define a Body node that will be broken out by the pipeline component.  When receiving Flat Files, life is easier since you have control over the final schema structure to be broken out.



Using this type of debatching, it will not be possible to determine when all of the single messages have been sent or processed without considerable effort (i.e. using a convoy, which will degrade performance). 



For more information and a sample of this type of debatching please see my past post on this topic.



Pros: Fast! Fast! Fast!  I am guessing this is because it is all streaming and uses “streaming XPath”.  Great for messaging only solutions that require content based routing of single messages.
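The streaming idea behind that speed can be approximated outside BizTalk. This Python sketch uses `xml.etree.ElementTree.iterparse` (a stand-in, not what the pipeline actually uses) to emit each record as it is parsed, without ever building the whole batch as a DOM:

```python
import io
import xml.etree.ElementTree as ET

batch = b'<Envelope><Body><Order id="1"/><Order id="2"/><Order id="3"/></Body></Envelope>'

messages = []
# iterparse fires as each element closes, so we can emit every Order
# and immediately discard it -- the whole batch never sits in memory.
for _event, elem in ET.iterparse(io.BytesIO(batch), events=("end",)):
    if elem.tag == "Order":
        messages.append(ET.tostring(elem, encoding="unicode"))
        elem.clear()

print(len(messages), "messages split from the envelope")
```

Because each record is released as soon as it is emitted, memory use stays flat no matter how large the batch grows, which is the property that makes pipeline debatching fast.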



Cons: All or nothing debatching in the pipeline.  Your XML schema must have a top level node for the envelope to strip out the single messages under it.  Since the message is not persisted to the database until after the pipeline and map, if something fails in the pipeline or map the entire message will be lost.  In addition, I always have a hard time getting envelopes to work correctly.  I think that is just user error on my part. 



Mapping: Maps are applied to the single messages.  If one fails, the whole batch is failed.  This limits your flexibility.




 


Orchestration XPath Debatching – (Best Bet!)


This is my favorite type of debatching.  This method comes from Darren Jefford’s blog.  I like it the best because it provides the most control over the process.  I know exactly when the last message has finished.  This is useful if you are using this type of debatching to make a .NET call or web service submission for each debatched message inside the loop.  Just remember that this will lower performance, and you will be running sequentially.



I was shocked at the relatively poor performance of this debatching on larger files.  When I was testing smaller files, under 100 single messages, I was getting 10+ messages per second.



Even with the slower performance at higher output message sizes, this is still my preferred method of debatching when message size permits.  Simple reasons: control and manageability!



Just be careful: I ran a 128 MB file through this, and after 1 hour I had only 1,500 messages out.  I think the slower performance comes from the XPath call itself inside the loop.  I think it is rescanning the whole message on each pass through the loop.
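The rescanning suspicion can be sketched outside BizTalk.  The sketch below is Python rather than orchestration code, and the element names are hypothetical; the point is only that an xpath()-style extraction re-reads the whole batch on every iteration:

```python
import xml.etree.ElementTree as ET

def debatch_by_xpath(batch_xml: str) -> list[str]:
    """Pull single messages out one at a time, the way a Loop shape
    with an xpath() expression does: every iteration evaluates the
    path against the whole batch again, so cost grows with size."""
    count = len(ET.fromstring(batch_xml).findall("Customer"))
    singles = []
    for i in range(1, count + 1):
        # Mirrors an xpath(msgBatch, "/*/Customer[i]") call inside
        # the loop: the full document is re-scanned on every pass.
        node = ET.fromstring(batch_xml).find(f"Customer[{i}]")
        singles.append(ET.tostring(node, encoding="unicode"))
    return singles
```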



Pros: Excellent flexibility inside the Orchestration.  Great control over the processing of your document!  This process is sequential and ordered by default.  You can loop over anything you can reach with XPath and can easily (always a relative term) build mini-batches if needed.  This is something Receive Pipelines cannot do without extensive custom code.



Cons: Performance degrades quickly as message size increases.  Downright slow on large messages and a resource hog.  In some cases, the sequential and ordered processing of this type of debatching may be limiting to the process.



Mapping: Complete flexibility.  Map before the loop on the whole message or inside the loop as single messages.  Inside the loop, single items can fail mapping and if handled correctly will not stop document processing.




Orchestration Atomic Scope Node List Debatching


This is conceptually a combination of the XPath and Receive Port debatching.  You have the flexibility to loop over anything you can XPath, but your process is all or nothing.  This must be done inside an Atomic Scope shape, since XmlNodeList is not serializable.



This type of debatching seems to be more sensitive to output message size than to input message size.  That makes sense, since the smaller the messages, the more messages the Atomic Transaction will be building up.



To accomplish this debatching, I set up a Node List and an Enumerator inside the Orchestration.  Then I use MoveNext inside a loop to extract the single message for the current node.  This involves casting the object to an XmlNode and getting its value using OuterXml.  For complete details, see the samples provided.
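A rough sketch of the same pattern (Python standing in for the XLANG/C# expression code, with hypothetical element names): the node list is built once, and the loop only advances an enumerator:

```python
import xml.etree.ElementTree as ET

def debatch_by_node_list(batch_xml: str) -> list[str]:
    """Select the repeating nodes once, then just walk the enumerator.
    The batch is scanned a single time, which is why this runs much
    faster than re-evaluating XPath on every loop iteration."""
    node_list = ET.fromstring(batch_xml).findall("Customer")
    enumerator = iter(node_list)        # ~ nodeList.GetEnumerator()
    singles = []
    for node in enumerator:             # ~ while (enumerator.MoveNext())
        # ~ ((XmlNode)enumerator.Current).OuterXml
        singles.append(ET.tostring(node, encoding="unicode"))
    return singles
```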



Pros: Fast! In some cases, the Atomic Scope may be beneficial.



Cons: All-or-nothing debatching, since you must use an Atomic Scope shape.  Massive resource hog!  My CPU was maxed at 100% the entire time the process was running.  It also seems to tax SQL Server: after some runs, it needed to be restarted or the computer just would not do anything.



In one test, the process ran for over 8 hours maxing out the CPU the whole time just to have something fail and it all roll back.


 


Mapping: Map before the loop on the whole message or inside the loop as single messages.




Orchestration Outside .NET Component Debatching


This debatching uses an outside .NET component to break up the message.  The thought here is that the message would not be rescanned on each pass through the loop.  As the performance numbers show, though, using an outside .NET component did not improve performance.



Inside the Orchestration, I created a new instance of a helper class and passed in the input XML message.  Then I looped over the document using a Node List, returning items based on an index I passed in from a Loop shape inside the Orchestration.  The performance of this is nearly identical to the XPath method.
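A sketch of how such a helper could be shaped (Python standing in for the actual .NET component, names hypothetical).  The batch is parsed once in the constructor, and the Loop shape drives get_item with its own index:

```python
import xml.etree.ElementTree as ET

class BatchHelper:
    """Parses the batch once at construction; the Loop shape then
    calls get_item(i) with an index it maintains itself."""
    def __init__(self, batch_xml: str):
        self._nodes = ET.fromstring(batch_xml).findall("Customer")

    def count(self) -> int:
        return len(self._nodes)

    def get_item(self, index: int) -> str:
        # One single message per call, addressed by loop index.
        return ET.tostring(self._nodes[index], encoding="unicode")
```

Whether this wins anything depends on where the per-iteration cost actually is; the timings in this post suggest it was not the scanning.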



Are there any better ways to do it?  I looked into using XmlNodeReader inside the .NET component, but I did not try to get it to work.


 


Pros: None.  I would not recommend this approach.



Cons: Slow.  It also adds an additional component to maintain outside of BizTalk.



Mapping: Mapping can be done on the whole message or on single messages.



 


 


Conclusion


BizTalk Server 2004 offers a variety of options for breaking up large files for processing.  An evaluation of your exact business requirements will help you decide on the option that is best for you. 



I welcome questions, comments, feedback, and other ways to debatch large messages!  Just drop me an email.

Mapping to an Envelope Schema

A few days ago, I blogged about splitting documents.


It seems that there’s a bug in BizTalk that means this approach won’t work.  The typical pattern when defining an envelope schema is to define your repeating node as one schema and xs:import it into your envelope schema (for details, see Jan Tielen’s excellent blog entry).


Unfortunately, if you create a map that maps to your envelope schema (for example in my case because the envelope schema is also the canonical batched data format), then that map is unable to execute in a receive port (and presumably a send port, although I haven’t checked). When a document is received, you will get the following error logged in the Application log:


Document type “uri:my-uri#Customer” does not match any of the given schemas.
(where uri:my-uri#Customer is the root node of the xs:imported schema).


Hugo Rodger-Brown also ran into this at the same time as me so we’re both left scratching our heads.


I can’t help feeling that this is somehow related to Scott Colestock’s old post where he notes that, unless you have a fully qualified assembly name, a pipeline will fail to load your schema because BTS simply doesn’t know which assembly to search for the type.


Obviously, receive port maps are run after the pipeline, but I still feel it’s all relating to the same kind of issue. If you open up the source of a .btm file, you’ll see that the node only references the type of the destination schema – it doesn’t use a fully qualified name. Unfortunately, editing the map source to use a FQN won’t work (the mapper shows an error in the designer).


So, in summary, it looks like you simply can’t map to a schema that references other schemas from within a receive (& send?) port.  Note that it works fine from within an Orchestration – probably because the Orchestration has a reference to the correct assembly, and so it can resolve the type name successfully.