Large Messages and Large Maps in BizTalk


Updated 8/8/2005:
Please note that the information below is outdated, and I would never recommend mapping messages of this size inside BizTalk 2004.  BizTalk 2006 will support larger messages by breaking the message up into chunks.  More information on large message support can be found on the Performance Blog.


 


Original Post:


 


Ok, I hear it all the time: “Can BizTalk handle large files?” 



But, really the question should be: “How large of a file do you want BizTalk to process?”



Large is a subjective term.  For the purposes of this post, large is a 150 MB XML Document.  I see lots of blogs with negative posts on big messages in BizTalk 2004.  My experiences have been quite the opposite.  But, I’ll let you make the determination yourself.  Here are my test results.



My Test Computer: P4 2.4 GHz with 1.25 GB RAM.


 


CRITICAL: Working with large files such as this will consume a HUGE amount of system resources.  My CPU was maxed at 100% and RAM utilization was extensive.  A better approach is to break the large file down, but that is a subject for another post coming soon…



Sample File Size: 180 MB


Schema Nodes: Around 400


All mapping is done on the Send Port; the map size is around 1.2 MB



Test1: Straight line mapping to itself


Run Time: 7 minutes 13 seconds



Test2: Straight line mapping to itself


Functoids: Around 80 String Manipulation


Run Time: 8 minutes 26 seconds



Test3: Straight line mapping to itself


Functoids (over 250):


          Around 80 String Manipulation


          Around 50 Logical


          Around 50 Value Mapping


          Around 80 Scripting with calls to an external assembly


Run Time: 9 minutes 42 seconds



Take Away: I was surprised and impressed with the processing time of large files inside BizTalk.  But, I’ll let you make the final determination…

About XML Schema’s determinism requirement…

For those of you who haven’t noticed yet: Dare Obasanjo has published a refined version of the work David Orchard did about a year ago.  Doing so, Dare explains best practices in designing an extensible xml schema.  Writing schemas that are both forward and backward compatible is not easy, believe me.  Even an experienced schema author can be tricked by some of the requirements the xml schema spec needs you to comply with.

One of the most important things to be aware of when designing xml schemas is the requirement that the xml content model be deterministic.  Some people like to refer to this as the “Unique Particle Attribution Constraint”.  For the normative definition of this constraint, I’d like to refer to the W3C.  No one can explain this as fuzzily as they can!!  MSDN certainly does a better job 🙂

Even the very first xml spec itself has recommendations about this in a non-normative section, for reasons of compatibility with SGML.  In addition, section 3.2.1 of this specification says: “For compatibility, it is an error if an element in the document can match more than one occurrence of an element type in the content model.”

As for the question “why”… even Tim Ewald recently wondered why exactly xml schema had this requirement in the first place…  I can only guess… but it certainly makes writing an xml parser way easier, since only a single lookahead symbol is required…  (Otherwise some form of backtracking is needed, which could mean an enormous perf hit!)

.NET is a good xml citizen and requires you to comply with this constraint.  If you don’t, you won’t even be able to validate your schema.  Good work guys!  As close to the standards as possible!
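
To make the constraint concrete, here is a minimal sketch of what .NET does with a content model that is not deterministic.  The schema is a made-up example (an optional a followed by a required a): when a validator sees the first a in an instance, it cannot tell which of the two particles it belongs to without looking further ahead.  The code assumes the XmlSchemaSet API, which arrived with .NET 2.0, so treat it as an illustration on a newer framework rather than what the 1.x tooling looked like.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class UpaDemo
{
    // Hypothetical schema: (a?, a) is ambiguous because the first <a/> in an
    // instance could match either particle.
    public const string AmbiguousXsd = @"<?xml version=""1.0""?>
<xs:schema xmlns:xs=""http://www.w3.org/2001/XMLSchema"">
  <xs:element name=""root"">
    <xs:complexType>
      <xs:sequence>
        <xs:element name=""a"" type=""xs:string"" minOccurs=""0""/>
        <xs:element name=""a"" type=""xs:string""/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Compiles the schema and returns every error or warning that was raised.
    public static List<string> Compile(string xsd)
    {
        var problems = new List<string>();
        var schemas = new XmlSchemaSet();
        // With a handler attached, problems are reported here instead of thrown.
        schemas.ValidationEventHandler +=
            (sender, e) => problems.Add(e.Severity + ": " + e.Message);
        using (XmlReader reader = XmlReader.Create(new StringReader(xsd)))
        {
            schemas.Add(null, reader);
        }
        schemas.Compile();
        return problems;
    }

    public static void Main()
    {
        foreach (string problem in Compile(AmbiguousXsd))
        {
            Console.WriteLine(problem);
        }
    }
}
```

Making the first a required as well, or giving the two elements different names, removes the ambiguity and the schema should then compile cleanly.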

Could someone explain to me, then, why “the world’s leading product family of XML development tools” – laughing out loud now – doesn’t even *support* detection of this constraint!!!  .NET, BizTalk Server, Word and InfoPath certainly do a better job!  The fact that Xerces accepts this kind of bad schema is no excuse for not implementing at least a check for this critical content model requirement!  Even worse, I came across this post in their public FAQ.  They publicly state that flagging a non-deterministic model as an error would be wrong!  Pfffff… so much for the standards.


Querying the WMI MSBTS_MessageInstance class

As I suppose not everyone reading my blog is reading the newsgroups on a daily basis, here’s a highlight.  Today, a question came up regarding the querying of the MSBTS_MessageInstance class. 


Let’s summarise what is possible with this WMI class:



  • Retrieving all message instances that are currently available in the message box.
  • Retrieving all message instances that comply with certain conditions and are currently available in the message box.
  • Saving any of those messages to the file system.

What is not possible with this class?



  • Saving or retrieving a message that is tracked.  Tracked messages are not handled in the same way as messages that still reside in the message box.  If you want to save tracked message instances, you need to use the MSBTS_TrackedMessageInstance WMI class.  Please keep in mind, when doing this, that this class does *not* support enumeration.  You will need to use the MessageInstanceID in order to create an instance of it!

In addition, remember that it is not possible to use the MSBTS_MessageInstance class to make select queries based upon the content of the message context!  Why?  Message context is something dynamic and is not compiled into the WMI classes.  The message context is accessible on the class as an XML string though…  Making selects is only possible using the WMI properties on this class, such as ServiceName, ServiceInstanceStatus, ServiceInstanceID, ServiceClassId, ServiceClass…
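
As a rough illustration of such a property-based select, here is a minimal sketch using System.Management.  It assumes the root\MicrosoftBizTalkServer WMI namespace, the ServiceInstanceID and MessageInstanceID properties mentioned above, and a SaveToFile method that takes a target folder; the GUID and the folder are placeholders.  It only does something useful on a Windows machine with BizTalk installed and a reference to System.Management.

```csharp
using System;
using System.Management;

public static class MessageInstanceQuery
{
    // Builds a WQL select that filters on a WMI property of the class.
    // The id is assumed to be a trusted GUID string, so no escaping is done.
    public static string BuildQuery(string serviceInstanceId)
    {
        return "SELECT * FROM MSBTS_MessageInstance WHERE ServiceInstanceID = '"
               + serviceInstanceId + "'";
    }

    public static void Main(string[] args)
    {
        // Placeholder values; pass real ones on the command line.
        string instanceId = args.Length > 0
            ? args[0]
            : "{00000000-0000-0000-0000-000000000000}";
        string folder = args.Length > 1 ? args[1] : @"C:\Temp\Messages";

        using (var searcher = new ManagementObjectSearcher(
            @"root\MicrosoftBizTalkServer", BuildQuery(instanceId)))
        {
            foreach (ManagementObject message in searcher.Get())
            {
                Console.WriteLine(message["MessageInstanceID"]);
                // Assumed method: writes the message to the given folder.
                message.InvokeMethod("SaveToFile", new object[] { folder });
            }
        }
    }
}
```

Filtering on anything inside the message context would have to happen client-side, after the select, by parsing the context XML string.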


If you’d like to see this class in action using real life code – please check out my pet project: the BizTalk Server 2004 Tracking Playground.


Happy coding!


Working with Untyped Send and Receive Ports in BizTalk Orchestrations

Scott posted a message back in June about untyped receive ports inside an Orchestration.  Untyped messages means you receive or send a message as System.Xml.XmlDocument rather than as a specific schema type.  This is an incredibly powerful feature in BizTalk 2004!



Business Case:


Let’s say you want to have one Orchestration (i.e. a single business process) process many different types of messages all in the same way.  Additionally, say you want to change some values in your message and send all of your messages out through the same Send Shape (i.e. send messages as Xml.XmlDocument).



Let’s say you have a Book Review and a Movie Review.  You want them both to be processed in the same way through the same Orchestration, maybe sending information to an outside web service or something like that.  Additionally, you have to extract promoted properties of the message inside the Orchestration and make decisions on them.  This cannot be done with Xml.XmlDocument, since the Orchestration will not allow you to access them (they are in the message context; you just cannot get at them).



Confused?  How about we look at a sample.



DOWNLOAD: Get the sample here!



Set-up is easy, just unzip the SampleProperties folder and put it on your C: drive.  Then, build and deploy the SampleProperties project. I use early binding so the send and receive ports will be created for you. 


To run the sample, drop the 4 Start….xml messages into c:\SampleProperties\In.  You will get 4 messages in your Out folder.  Plus, 4 events will be written to your event log.  Do not forget to look inside the Expression shapes inside the Orchestration for comments.  If all else fails, read the ReadMe.txt file.



Key Take Home Points:


– Common properties must be promoted in all of the different schemas (look at the promotions in both BookReview.xsd and MovieReview.xsd)


– Messages of type XmlDocument can be cast into different types


– Typed messages can be cast back into XmlDocument


– Blank Schema can be created and properties changed (look at Movie Review branch)


– This is using Property Promotion and Demotion in the XML Pipelines


– As a test, set the Send Pipeline to Pass Through and see the difference in the Movie Review output data.


– This will also work for distinguished fields



CRITICAL: This process is kind of risky since you can pass in any XML Document.  You can end up with an invalid cast exception or an invalid XPath query.
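
One way to reduce that risk, sketched here rather than taken from the sample, is to look at the root element of the incoming XmlDocument before attempting a cast.  BizTalk identifies a message type as the target namespace, a # character, and the root element name, so a small helper can compute that string and the Orchestration can branch on it.  The class name, method name and the http://example.org/reviews namespace below are hypothetical.

```csharp
using System;
using System.Xml;

public static class MessageTypeGuard
{
    // BizTalk-style message type: "<target namespace>#<root element name>".
    // For a root element without a namespace, only the root name is returned.
    public static string GetMessageType(XmlDocument doc)
    {
        XmlElement root = doc.DocumentElement;
        if (root == null)
        {
            throw new ArgumentException("Document has no root element.");
        }
        return root.NamespaceURI.Length == 0
            ? root.LocalName
            : root.NamespaceURI + "#" + root.LocalName;
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        // Hypothetical instance; the real sample schemas use their own namespaces.
        doc.LoadXml("<ns0:MovieReview xmlns:ns0='http://example.org/reviews'>"
                    + "<Title>Test</Title></ns0:MovieReview>");
        Console.WriteLine(GetMessageType(doc));
    }
}
```

A helper like this could be called before the cast, so that unknown documents are routed to an error branch instead of throwing an invalid cast exception.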



What is this Movie Review Branch inside the Orchestration really doing?



  1. Takes in a message of type Xml.XmlDocument
  2. Casts the In message to CastMovieIn (strongly typed to a schema)
  3. Sets Orchestration variables based on promoted properties inside the MovieReview schema
  4. Creates a new XML Document (CastMovieOut) and loads a blank schema for that type
  5. Changes values inside CastMovieOut
  6. Creates a new message, Out, as XmlDocument
  7. Casts CastMovieOut to Out


Take Away: Using XmlDocument can greatly increase the flexibility inside your Orchestration.

Untyped Documents Inside An Orchestration

Untyped messages inside an Orchestration allow for many different types of messages to be received by the same Orchestration using System.Xml.XmlDocument. This sample shows basic use of this feature.

Get more information from the original blog post on this topic: https://www.biztalkgurus.com/biztalk_server/biztalk_blogs/b/biztalk/archive/2004/08/10/working-with-untyped-send-and-receive-ports-in-biztalk-orchestrations.aspx

Bug in CustomPartyResolution SDK Pipeline Sample…

Owen found a bug today in the custom party resolution SDK pipeline component located in $:\Program Files\Microsoft BizTalk Server 2004\SDK\Samples\Pipelines\CustomPartyResolution.


 


The bug is in the Read() method in the PartyResolutionStream class:


 


public override int Read(byte[] buffer, int offset, int count)
{
      int ReturnValue = mBaseStream.Read(buffer, offset, count);

      if(mFirstRead)
      {
            mFirstRead = false;
            if(mFirstReadCallback != null)
                  mFirstReadCallback();
      }
      if(ReturnValue != 0)
            if(mReadCallback != null)
                  mReadCallback();
      // BUG: despite the indentation, this else pairs with the inner if above
      else if(mLastReadCallback != null)
            mLastReadCallback();

      return ReturnValue;
}


For those of you not familiar with the sample, it demonstrates how to implement an eventing stream to help process message data in a streaming fashion.  One reason this is important is that properties are only guaranteed to be promoted once the stream has been read in its entirety.  If you need to read a particular property that the disassembler is going to promote, your downstream pipeline component will need to wait until the end-of-stream event has fired in order to guarantee that the disassembler has promoted it.  I’ll drill down into this in more detail sometime soon.


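The problem with the sample code is that the mLastReadCallback delegate gets called even when the end of the stream hasn’t been hit.  This is because the last else in the code is matched against the preceding if (the inner null check on mReadCallback) rather than against if(ReturnValue != 0).  As a result, mLastReadCallback fires on any non-empty read when no mReadCallback is registered, and it never fires when the stream actually returns 0 bytes.  The fix is pretty trivial: {}’s need to be added around the body of if(ReturnValue != 0).  This is a great example of why constructs such as if should always use {}, even when the body is a single line: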


 


public override int Read(byte[] buffer, int offset, int count)
{
      int ReturnValue = mBaseStream.Read(buffer, offset, count);

      if(mFirstRead)
      {
            mFirstRead = false;
            if(mFirstReadCallback != null)
            {
                  mFirstReadCallback();
            }
      }
      if(ReturnValue != 0)
      {
            if(mReadCallback != null)
            {
                  mReadCallback();
            }
      }
      else if(mLastReadCallback != null)
      {
            // Fires only when the base stream returned 0 bytes (end of stream)
            mLastReadCallback();
      }

      return ReturnValue;
}
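
To see the difference between the two versions without deploying a pipeline, the branching can be isolated in a small console sketch.  The class and method names here are hypothetical; the booleans stand in for "a callback is registered", and the returned string says which callback would have fired.

```csharp
using System;

public static class DanglingElseDemo
{
    // Mirrors the original sample: the else binds to the nearest (inner) if.
    public static string Buggy(int bytesRead, bool hasReadCallback, bool hasLastReadCallback)
    {
        if (bytesRead != 0)
            if (hasReadCallback)
                return "read";
            else if (hasLastReadCallback)
                return "last";
        return "none";
    }

    // Mirrors the fix: braces make the else belong to the outer if.
    public static string Fixed(int bytesRead, bool hasReadCallback, bool hasLastReadCallback)
    {
        if (bytesRead != 0)
        {
            if (hasReadCallback)
            {
                return "read";
            }
        }
        else if (hasLastReadCallback)
        {
            return "last";
        }
        return "none";
    }

    public static void Main()
    {
        // Mid-stream read with no read callback registered.
        Console.WriteLine(Buggy(10, false, true));  // prints "last" (wrong)
        Console.WriteLine(Fixed(10, false, true));  // prints "none"
        // End of stream: zero bytes returned.
        Console.WriteLine(Buggy(0, true, true));    // prints "none" (wrong)
        Console.WriteLine(Fixed(0, true, true));    // prints "last"
    }
}
```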


 


Also, I’m in the process of cleaning up the sample code for part 2 of ACK’s/NACK’s and will hopefully get that posted in the not too distant future.


 

Okay, so where do subscriptions come from?

So where do subscriptions come from, or more importantly, when do they appear?  As we described in our earlier post, there are two types of subscriptions: activation subscriptions and instance (correlation) subscriptions.  All activation subscriptions are created by admin tools like BizTalk Explorer or the BizTalk Admin MMC.  It does not make sense for the engine to create an activation subscription.  (There is one exception to this, and that is our caching service, but that is really not so important; I just like to throw excessive amounts of detail at you.)

So these activation subscriptions are created when you enlist your services (sendports and schedules).  When you enlist a service, its activation subscription is put in the stopped state.  This simply means that all messages routed via this subscription are sent immediately to the “suspendedQ”.  Why is this state useful?  Suppose your backend database is down, so you can’t send any messages to it.  Technically, you could just let them keep failing, throwing errors, wasting your CPU cycles and filling your event log.  Or you can stop your sendport and start it back up when the backend system is back up.

When you start a service, the subscription goes active.  That means all messages which are routed via this subscription are sent to the “workQ”, where they can be dequeued as soon as someone is available to process them.  Also, when you start a service, we automatically resume any messages which were suspended because the service was previously stopped.  Hence you can do what I described above: you could have a service running, the backend system goes down, you stop the service, causing all messages to be routed to the suspended queue, and when the backend system is back up, you restart the service.

Unenlisting a service causes the subscription to “go away”.  In most cases, this means we delete it, but in all cases, this means that no messages will be routed via this subscription.

So when you are enlisting, stopping, starting, and unenlisting your services (sendports and schedules), what you are really doing is playing around with the state of their activation subscriptions.  It is really as simple as that.
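
The state changes described above can be summed up in a toy model.  This is not the engine’s code or any BizTalk API; the class, the enum and the two in-memory queues are made up purely to show which queue a routed message lands in for each state.

```csharp
using System;
using System.Collections.Generic;

public enum SubscriptionState { Unenlisted, Stopped, Active }

// Toy stand-in for a service (sendport or schedule) and its activation subscription.
public class ToyService
{
    public SubscriptionState State = SubscriptionState.Unenlisted;
    public readonly Queue<string> WorkQ = new Queue<string>();
    public readonly Queue<string> SuspendedQ = new Queue<string>();

    // Enlisting creates the subscription in the stopped state.
    public void Enlist() { State = SubscriptionState.Stopped; }

    // Starting activates the subscription and resumes messages that were
    // suspended while the service was stopped.
    public void Start()
    {
        if (State == SubscriptionState.Unenlisted)
        {
            throw new InvalidOperationException("Enlist the service first.");
        }
        State = SubscriptionState.Active;
        while (SuspendedQ.Count > 0)
        {
            WorkQ.Enqueue(SuspendedQ.Dequeue());
        }
    }

    public void Stop() { State = SubscriptionState.Stopped; }

    // Unenlisting makes the subscription "go away" (this toy keeps the queues as they are).
    public void Unenlist() { State = SubscriptionState.Unenlisted; }

    // Returns false when there is no subscription to route through.
    public bool Route(string message)
    {
        switch (State)
        {
            case SubscriptionState.Active:
                WorkQ.Enqueue(message);
                return true;
            case SubscriptionState.Stopped:
                SuspendedQ.Enqueue(message);
                return true;
            default:
                return false;
        }
    }
}

public static class ToyServiceDemo
{
    public static void Main()
    {
        var sendPort = new ToyService();
        sendPort.Enlist();          // subscription exists, but stopped
        sendPort.Route("order-1");  // lands in the suspended queue
        sendPort.Start();           // goes active and resumes order-1
        sendPort.Route("order-2");  // goes straight to the work queue
        Console.WriteLine("workQ=" + sendPort.WorkQ.Count
                          + ", suspendedQ=" + sendPort.SuspendedQ.Count);
    }
}
```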


Instance subscriptions are always generated by the runtime.  Instance subscriptions are created by orchestrations which have a non-activation receive on some correlated property.  I can tell you how it works now in the runtime, but you should know that this is no guarantee as to how it will work in the future, and really, it doesn’t matter to you as long as it works.  Instance subscriptions are created at the next persistence point after all correlation sets involved in the subscription have been initialized.  If you have an activation receive which initializes correlation set C1, and a subsequent send (S1) which follows C1, followed by two more receives on C1, the two subscriptions would be created when we send the message out on S1 (assuming no intermediate persistence points like a random atomic scope).  To go into more detail on this, I really need to describe what a persistence point is (assuming you did not see my Tech-Ed talk, which covered these in detail).  I guess I will have to save that for next time.


Note that convoys are, as always, a bit weird, and you should read the post on convoys to understand how their subscriptions are generated.


 


Have a good one.


Lee

BizTalk Delivery Notification and NACK Sample

Did anyone read Kevin Smith’s blog on ACK/NACK and run out to try it?  Well, I did, and found it more time consuming than I expected.  I have put together a sample that shows how to catch the SOAP exception and get access to the error message.  I hope that after looking at this sample Kevin’s excellent post on NACK’s will make a little more sense.  It did for me.

DOWNLOAD: Get the sample here!

Set-up is easy, just unzip the SampleNACK folder and put it on your C: drive.  Then, build and deploy the SampleNACK project. You will need to manually create a Send Port.  I set up a File Send Port going to c:\some_location_that_does_not_exist.  Set retries to 0.  To run it, drop the StartFile.xml message into c:\SampleNACK\In.  Your NACK will show up in c:\SampleNACK\Out.  Do not forget to look inside the Expression shapes inside the Orchestration for comments.  If all else fails, read the ReadMe.txt file.

CRITICAL: Getting at the HTTP error using Delivery Notification will not work for transport type of HTTP or SOAP due to an “issue”.  For more information please see Microsoft KB840008.

Key Take Home Points:
– Delivery Notification is not available on Early Bound Ports.
– Must import System.Web.Services to cast the SOAP exception
– Set Send Port retries to 0
– Must use a Synchronized scope

Under the covers:
Ok, so you ask: what is BizTalk doing behind the little Delivery Notification property?  Keep in mind this is my interpretation.

When a message is sent through a send port with Delivery Notification set to transmitted, a correlation set is initialized.  A correlation token is assigned to the outbound message.  A subscription is started based on this correlation token.  This token is stored in the context property of the ACK/NACK and promoted.  When the ACK/NACK is returned, it is routed back to the calling Orchestration.  You get all this just by setting a little property to “Transmitted”!  Cool!
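
That interpretation can be pictured with a toy model.  Again, this is only a sketch of the idea, not how the engine is implemented: the router class, its dictionary and the instance name are all made up, and a Guid plays the role of the correlation token.

```csharp
using System;
using System.Collections.Generic;

public class ToyNotificationRouter
{
    // Correlation token -> name of the orchestration instance waiting on it.
    private readonly Dictionary<Guid, string> subscriptions =
        new Dictionary<Guid, string>();

    // Sending with notification: stamp a token and subscribe the caller to it.
    public Guid Send(string orchestrationInstance)
    {
        Guid token = Guid.NewGuid();
        subscriptions[token] = orchestrationInstance;
        return token;
    }

    // An ACK/NACK carrying the token is routed back to the waiting caller.
    // Returns null when nobody is subscribed to that token.
    public string RouteNotification(Guid token, bool isAck)
    {
        string target;
        if (!subscriptions.TryGetValue(token, out target))
        {
            return null;
        }
        subscriptions.Remove(token);
        return (isAck ? "ACK" : "NACK") + " -> " + target;
    }
}

public static class ToyNotificationDemo
{
    public static void Main()
    {
        var router = new ToyNotificationRouter();
        Guid token = router.Send("SampleNACK-instance-1");
        // The send port failed, so a NACK comes back with the same token.
        Console.WriteLine(router.RouteNotification(token, false));
    }
}
```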

Take Away: Delivery Notification is an easy way to catch exceptions as long as you understand how to get at the error message.