A few weeks back I worked on a process that looked something like this –

It was triggered by the scheduled task adapter and then used a SQL send port to call a stored procedure that returned a list of ’things’.
It needed to split the list into individual records and, for each ’thing’, start a new, different, process through pub/sub (to avoid a binary dependency on the called process).

Fairly simple.

A lot has been said about the different ways to split messages, so I won’t repeat that discussion here; I would just say that initially I used a different approach – I used the SQL adapter in the initial, triggering, receive port and then used a receive pipeline, with an XmlDisassembler component, to split the incoming message so that each record was published individually, thus avoiding the need for a ’master process’. That backfired in my case, though – I quickly realised I would be choking the server with the number of messages published and needed a way to throttle the execution; I played a bit with host throttling but then came to the conclusion that the best approach for me would be to throttle in a process, which is what I’ve done.

And so – to make things interesting, and because I already had it all ready – I decided to use a call to a pipeline from my process to split the message.

The first thing I realised, trying to take that approach, was that I had to change the type of the response message received from the SQL port to XmlDocument (an approach I generally dislike – I’m a sucker for strongly-typed-everything). My schema was configured as an envelope so that when I called the pipeline from my process it would know how to split the message correctly; but with that same schema used in the SQL port, BizTalk split the message too early for me – I needed the whole message in the process first, which was no good to me. If, however, I removed the envelope definition from the schema, then when I called the pipeline directly from my process it wouldn’t know how to split the message, which was no good either; nor could I have two schemas for the same root (BizTalk, as we all know, doesn’t like that bit at all, not without even more configuration); XmlDocument it is.
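For reference, the envelope behaviour is driven entirely by annotations in the schema; a minimal sketch of such an envelope schema might look like the following (the Things/Thing element names and the namespace are hypothetical, standing in for whatever the stored procedure actually returns):

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:b="http://schemas.microsoft.com/BizTalk/2003"
           targetNamespace="http://tempuri.org/things"
           xmlns="http://tempuri.org/things"
           elementFormDefault="qualified">
  <xs:annotation>
    <xs:appinfo>
      <!-- marks the schema as an envelope, so the XmlDisassembler will debatch it -->
      <b:schemaInfo is_envelope="yes" />
    </xs:appinfo>
  </xs:annotation>
  <xs:element name="Things">
    <xs:annotation>
      <xs:appinfo>
        <!-- body_xpath points at the node whose child records become individual messages -->
        <b:recordInfo body_xpath="/*[local-name()='Things']" />
      </xs:appinfo>
    </xs:annotation>
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Thing" maxOccurs="unbounded" type="xs:anyType" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

It is exactly this is_envelope/body_xpath pair that makes the behaviour all-or-nothing: any pipeline that disassembles with this schema – whether in the receive port or called from the orchestration – will split the message.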

It then came back to me (in the form of a compile-time error :-)) that the pipeline variable has to exist in an atomic scope, and so I added one to contain my pipeline variable; I then added the necessary loop with its condition set to the MoveNext() method of the pipeline’s output messages, and in each iteration constructed a message using the GetCurrent() method; all standard stuff.
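For anyone who hasn’t used this pattern, the expression-shape code looks roughly like the sketch below. It only compiles inside an orchestration (it relies on the Microsoft.XLANGs.Pipeline assembly), and the MyProject.ThingsReceivePipeline, msgThings and msgThing names are hypothetical:

```
// Orchestration variable, declared inside the atomic scope:
//   pipelineOutput : Microsoft.XLANGs.Pipeline.ReceivePipelineOutputMessages

// Expression shape - run the receive pipeline over the (XmlDocument) message:
pipelineOutput = Microsoft.XLANGs.Pipeline.XLANGPipelineManager.ExecuteReceivePipeline(
    typeof(MyProject.ThingsReceivePipeline), msgThings);

// Loop shape condition - advances to the next debatched message:
pipelineOutput.MoveNext()

// Message assignment shape (inside a Construct Message shape in the loop):
msgThing = new System.Xml.XmlDocument();
pipelineOutput.GetCurrent(msgThing);
```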

I would then set some context properties to route my message correctly and allow me to correlate the responses (I used a scatter-gather pattern in my master process), and published it to the message box.
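Those property assignments typically sit in the same message-assignment shape that constructs each message; something along these lines, where MyProject.PropertySchema.BatchId is a hypothetical custom promoted property used for the scatter-gather correlation:

```
// Message assignment shape - set properties for routing and correlation
msgThing(MyProject.PropertySchema.BatchId) = batchId;  // correlate responses back to this batch
msgThing(BTS.Operation) = "ProcessThing";              // routing hint for the subscribing process
```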

What I noticed when testing my shiny new process was that all those sub-processes that were meant to start as a result of the messages published in my loop were delayed by quite a few minutes (6-8), which seemed completely unreasonable, so I embarked on a troubleshooting exercise which resulted in that big “I should have thought of that!” moment.

While the send shape in my loop successfully completed its act of publishing the message in each iteration, moving my loop on to the next message and so on, being in an atomic scope meant BizTalk would not commit the newly published messages to the message box database, allowing subscriptions to kick in, until the atomic scope had finished; that is what allows it to roll back should anything in the atomic scope fail.
What it meant for me, though, was that all the messages were still effectively published at once, which brought me back to square one (or minus one, actually, considering that the great delay caused by this approach left me even worse off than with my first debatch-in-pipeline approach).

And so I went back to the old and familiar approach of splitting the messages using xpath in the process, which allowed me to carefully control the publishing rate of messages for my process and throttle them as needed.
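That approach boils down to counting the records with xpath and then extracting one record per loop iteration, so each message is published – and paced – individually. A sketch of the expression-shape code, again with hypothetical element names:

```
// Expression shape - count the records in the batch:
thingCount = System.Convert.ToInt32(xpath(msgThings,
    "count(/*[local-name()='Things']/*[local-name()='Thing'])"));
i = 1;

// Loop (condition: i <= thingCount); message assignment inside the loop:
msgThing = xpath(msgThings,
    System.String.Format("/*[local-name()='Things']/*[local-name()='Thing'][{0}]", i));
i = i + 1;

// ...a send shape then publishes msgThing; because there is no atomic scope here,
// each message hits the message box as it is sent, and a Delay shape (or a simple
// in-flight counter fed by the responses) can throttle the publishing rate.
```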