by community-syndication | Jun 6, 2005 | BizTalk Community Blogs via Syndication
Here are the PowerPoint decks we used for some of the extended sessions. Some of these have added sections on all the perf tuning recommendations we laid out. There’s quite a bit that you have to consider when working in a farm scenario. Also, you’ll find some best practices around flat files, and the Flat File project I walked you through building. In all you’ll find the following:
1. Advanced Orchestration
2. Rules Engine
3. Building Pipelines
4. Messaging and Flat File best practice
5. Installation of farms and best practice and tuning parameters
Here are the decks, and here is the project we walked through to demonstrate parsing Flat Files…
by community-syndication | Jun 6, 2005 | BizTalk Community Blogs via Syndication
Messaging-Database Aggregator Pattern
It’s been a
while since I blogged an article, and a long while since I blogged about the
“Sequential Convoy Aggregator” pattern. Back then I was fairly new to BizTalk,
I was aware of persistence points, and orchestration state, but did not have a
really clear understanding of how badly they could affect an orchestration
design.
Since then
I have been thinking about this more, and wondering how well my aggregator
implementation would stand up in a high load scenario. After running a few
tests, I discovered the design pattern does not scale well to aggregate large
batches of messages. The problem then was to come up with a design that would
Pattern Implementation
Using an Orchestration
The
sequential convoy aggregator pattern is implemented by building a sequential
convoy orchestration with two receive shapes correlated to receive all the
messages that are to be aggregated. The orchestration maintains the state of
the aggregation, by creating a .NET XmlDocument object, and adding each message
as a node to this document. When the aggregation is complete, the XmlDocument
object is assigned to a BizTalk message, which is then sent from the
orchestration to a subscribing send port.
Sequential Convoy
Aggregator Performance
The performance tests use the aggregator pattern I created and blogged about
last August. The orchestration was modified slightly to use a message count for
the completeness criteria so it could be tested with different numbers of
messages. The tests were then run, starting with 100 messages and then
increasing the number of messages being aggregated.
After each
test, the “Backup BizTalk Server” SQL Server Agent job was manually started to
create log backups for the processing of the message aggregation. The results
of the test are shown below.
| Msg Count | Log Size /KB | KB/Msg |
|-----------|--------------|--------|
| 100 | 7,038 | 70.38 |
| 200 | 16,446 | 82.23 |
| 300 | 29,999 | 100.00 |
| 400 | 47,703 | 119.26 |
| 600 | 96,976 | 161.63 |
| 800 | 171,584 | 214.48 |
| 1,000 | 262,825 | 262.83 |
| Column | Description |
|--------|-------------|
| Msg Count | The number of messages used in the test. |
| Log Size /KB | The size of the message box database log file produced by running the Backup BizTalk Server SQL Agent job after the test. |
| KB/Msg | The size of the log file divided by the number of messages. |
From the
results of the tests, it can be seen that the size of the message box database
log increases dramatically as the number of messages being aggregated
increases. As the log file is a log of the database activity on the message box
database, this implies that the aggregator implementation does not scale well
as the number of messages increases.
If there
were thousands of messages to aggregate over a 24-hour period for example, the
size of the message box database log files would become a significant problem.
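As a rough check on how quickly the log grows, the figures from the table above can be fed into a few lines of Python. This is only a sketch built on the measured values; `kb_per_message` and `growth_exponent` are hypothetical helper names, and the log-log slope is a crude indicator taken from the smallest and largest batch, not a rigorous fit.

```python
import math

# Measured values from the table above: message count -> log size in KB.
LOG_SIZE_KB = {
    100: 7038,
    200: 16446,
    300: 29999,
    400: 47703,
    600: 96976,
    800: 171584,
    1000: 262825,
}


def kb_per_message(results):
    """Return the log size per message (KB/Msg) for each batch size."""
    return {count: size / count for count, size in results.items()}


def growth_exponent(results):
    """Estimate k in log_size ~ count**k from the smallest and largest batch.

    k == 1 would mean linear growth; k > 1 means each extra message
    costs more log space than the one before it.
    """
    lo, hi = min(results), max(results)
    return math.log(results[hi] / results[lo]) / math.log(hi / lo)


if __name__ == "__main__":
    for count, ratio in kb_per_message(LOG_SIZE_KB).items():
        print(f"{count:>5} messages: {ratio:7.2f} KB/msg")
    print(f"growth exponent: {growth_exponent(LOG_SIZE_KB):.2f}")
```

An exponent well above 1 is consistent with the log growing faster than the message count, which is what the KB/Msg column shows.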
The Problem of Persistence
In order to provide reliable messaging, the BizTalk Server orchestration engine
saves its state to the message box database at specific points during its
execution. This serialization of the orchestration state is known as
persistence, and each saving of the orchestration state is known as a
persistence point. Whilst this orchestration persistence is great from the
reliability perspective, it can have some detrimental effects in terms of
performance if orchestrations are designed without paying a thought to the
mechanics of the orchestration engine.
In the aggregator pattern, as the orchestration starts to aggregate messages,
the size of the XmlDocument object that holds the aggregated messages is small.
At each persistence point, this XmlDocument object is serialized to the message
box database to allow recovery from a possible failure. As the orchestration
aggregates more and more messages, the size of the XmlDocument object
increases.
This object is persisted to the database every time a message is received
(three times for each message aggregation loop in this case), so when the
orchestration is holding the state of a few hundred messages, the load on the
message box database starts to get heavy. If the pattern is used to aggregate
thousands of messages, and the messages themselves are large, the pattern will
quickly become unusable.
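The effect described above can be illustrated with a toy model. This is a sketch, not the BizTalk engine: it assumes the whole aggregated document is serialized at every persistence point, uses three persistence points per message as in the text, and the element names (`OrderList`, `Order`) and function names are hypothetical. For contrast, a second function writes each message once on its own.

```python
import xml.etree.ElementTree as ET


def convoy_bytes_persisted(message_count, persists_per_message=3):
    """Total bytes serialized when the whole batch is saved at every persistence point."""
    batch = ET.Element("OrderList")  # hypothetical root element
    total = 0
    for i in range(message_count):
        order = ET.SubElement(batch, "Order")  # hypothetical message node
        order.text = f"order-{i:06d}"
        # The state being saved is the entire document so far,
        # not just the message that has just arrived.
        state_size = len(ET.tostring(batch))
        total += persists_per_message * state_size
    return total


def messaging_bytes_written(message_count):
    """Total bytes written when each message is stored once, on its own."""
    total = 0
    for i in range(message_count):
        order = ET.Element("Order")
        order.text = f"order-{i:06d}"
        total += len(ET.tostring(order))
    return total


if __name__ == "__main__":
    for n in (100, 200, 400):
        print(n, convoy_bytes_persisted(n), messaging_bytes_written(n))
```

In this model, doubling the batch size roughly quadruples the bytes persisted by the convoy, while the per-message approach only doubles.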
Building a Scalable
Aggregator
The problem
affecting the scalability of the aggregator was caused by maintaining the state
of the aggregated message as the orchestration looped to receive the other
messages in the convoy. If a way can be found to escape the state persistence
problem, it should be possible to design a scaleable aggregator pattern.
Aggregator Design Using
Messaging and a Database
One of the possible designs would be to abandon the convoy orchestration, and
create a solution that is based purely on messaging. At present, in BizTalk
Server 2004, there is no way to aggregate messages in the pipeline of a send
port in a similar manner to splitting in a receive pipeline. This pattern uses
a SQL Server database to store the messages as they are aggregated, and uses
the SQL Server adapter to add the orders to the database, and to receive the
aggregated list when the aggregation is complete. The completeness criteria are
determined in the database, and in this case, a message count is used.
Messaging-Database
Aggregator Pattern
| Component | Description |
|-----------|-------------|
| In File Drop | The order messages to be aggregated are dropped here. |
| Order In Receive Port | The receive port receives the order messages and publishes them in the message box database. |
| Add Order SQL Send Port (Map Order to SQL) | Port with a subscription to the Order In Receive Port, and a map that maps the order messages to the Add Order stored procedure. The port is configured to call the stored procedure using the SQL adapter. |
| Add Order SP | Stored procedure to add the messages to a table in the Order Aggregation Db. It also checks for the completeness criteria, which is based on the number of messages in the batch to be aggregated. When this number is reached, the status of the messages is updated to mark them for aggregation. |
| Order Aggregation Db | Database containing the table for aggregated orders. |
| Get Order Aggregation SP | Stored procedure to poll the database for batches ready to be aggregated. When a batch is ready, the data is returned, and the status flag for the orders is updated. |
| Get Order Aggregation SQL Receive Port | Receive port configured to poll the Get Order Aggregation SP to check for aggregation batches. |
| Aggregated Order Send Port (Map SQL to Order List) | Send port with a subscription to the Get Order Aggregation SQL Receive Port. This port also maps the SQL result set schema to the order list schema. |
| Aggregated Order File Out | File location for the aggregated messages. |
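The roles of the two stored procedures can be sketched in Python, with an in-memory SQLite database standing in for SQL Server. Everything here is an assumption made for illustration: the table, column and status names are hypothetical, the BizTalk ports are replaced by direct function calls, and the payload is kept as a single text column rather than mapped to individual fields.

```python
import sqlite3

BATCH_SIZE = 3  # hypothetical completeness criterion: messages per batch


def create_schema(conn):
    """Create the hypothetical order aggregation table."""
    conn.execute(
        """CREATE TABLE orders (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               payload TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'pending'
           )"""
    )


def add_order(conn, payload, batch_size=BATCH_SIZE):
    """Stand-in for the Add Order SP: store one order, then check completeness."""
    with conn:  # commit (or roll back) the insert and the status update together
        conn.execute("INSERT INTO orders (payload) VALUES (?)", (payload,))
        pending = conn.execute(
            "SELECT COUNT(*) FROM orders WHERE status = 'pending'"
        ).fetchone()[0]
        if pending >= batch_size:
            # Completeness criterion met: mark the batch for aggregation.
            conn.execute("UPDATE orders SET status = 'ready' WHERE status = 'pending'")


def get_order_aggregation(conn):
    """Stand-in for the Get Order Aggregation SP: return a ready batch, if any."""
    with conn:
        rows = conn.execute(
            "SELECT id, payload FROM orders WHERE status = 'ready' ORDER BY id"
        ).fetchall()
        if rows:
            # Flag the orders so the next poll does not return them again.
            conn.execute("UPDATE orders SET status = 'sent' WHERE status = 'ready'")
    return [payload for _id, payload in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    for n in range(1, 4):
        add_order(conn, f"<Order>{n}</Order>")
    print(get_order_aggregation(conn))
```

The point of the design is that the aggregation state lives in the table, so each incoming message costs one small insert rather than a re-save of the whole batch.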
Test Results
The tests
were repeated using the messaging-database aggregator pattern, and the
following results obtained. The Messaging log size is a sum of the message box
database log file, and the order aggregation log file.
| Msg Count | Convoy Orch Log Size /KB | Convoy Orch KB/Msg | Messaging Log Size /KB | Messaging KB/Msg | % Log Size Reduction |
|-----------|--------------------------|--------------------|------------------------|------------------|----------------------|
| 100 | 7,038 | 70.38 | 3,064.00 | 30.64 | 56.46% |
| 200 | 16,446 | 82.23 | 5,984.00 | 29.92 | 63.61% |
| 300 | 29,999 | 100.00 | 8,676.00 | 28.92 | 71.08% |
| 400 | 47,703 | 119.26 | 10,977.00 | 27.44 | 76.99% |
| 600 | 96,976 | 161.63 | 16,866.00 | 28.11 | 82.61% |
| 800 | 171,584 | 214.48 | 21,755.00 | 27.19 | 87.32% |
| 1,000 | 262,825 | 262.83 | 26,593.00 | 26.59 | 89.88% |
| Column | Description |
|--------|-------------|
| Msg Count | The number of messages used in the test. |
| Convoy Orch Log Size /KB | The size of the message box database log file produced by the sequential convoy aggregator orchestration. |
| Convoy Orch KB/Msg | The size of the log file divided by the number of messages. |
| Messaging Log Size /KB | The combined size of the message box database log file and the order aggregation database log file produced by the messaging-database aggregator. |
| Messaging KB/Msg | The size of the log file divided by the number of messages. |
| % Log Size Reduction | The percentage reduction in log file size gained by using the messaging-database pattern. |
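The reduction column can be reproduced from the two log size columns. The sketch below uses the measured values from the results table; the function names are hypothetical, and the improvement factor is simply the ratio of the two log sizes.

```python
# (convoy log KB, messaging log KB) per message count, from the results table.
RESULTS = {
    100: (7038, 3064),
    200: (16446, 5984),
    300: (29999, 8676),
    400: (47703, 10977),
    600: (96976, 16866),
    800: (171584, 21755),
    1000: (262825, 26593),
}


def log_size_reduction(convoy_kb, messaging_kb):
    """Percentage reduction in log size relative to the convoy orchestration."""
    return (convoy_kb - messaging_kb) / convoy_kb * 100


def improvement_factor(convoy_kb, messaging_kb):
    """How many times smaller the messaging log is than the convoy log."""
    return convoy_kb / messaging_kb


if __name__ == "__main__":
    for count, (convoy, messaging) in RESULTS.items():
        print(
            f"{count:>5} messages: "
            f"{log_size_reduction(convoy, messaging):6.2f}% reduction, "
            f"{improvement_factor(convoy, messaging):5.2f}x smaller"
        )
```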
[Graph: KB Log File Size for Aggregation / Message Count]
The effect
of the batch size on the sequential convoy orchestration can clearly be seen in
the above graph. As the batch size increases, the messaging-database aggregator
offers significantly better performance.
[Graph: KB Log File Size per Message / Message Count]
As the batch size increases, the messaging-database aggregator becomes slightly
more efficient, producing less log file activity per message. With the
orchestration-based aggregator, continuous persistence of the increasing
aggregation state makes the pattern less efficient.
Why is the Message Box
Database Log File Size so Important?
The log files created when performing a transaction log backup on the message
box database are an indication of the load that has been placed on the database
since the last transaction log backup. As a BizTalk Server installation is
scaled up, and out, the load on the message box database is often a bottleneck.
Adding another SQL cluster to handle increased message box activity is costly.
It’s much better to design a solution that uses the message box in an efficient
way if it is anticipated that there will be a high load on the system.
Conclusions
From the results it can be seen that the messaging-database aggregator provides
close to a 90% reduction in message box database load (89.88% by log size at
1,000 messages) compared with the sequential convoy aggregator in this
scenario. If the message count were to increase, this design could perform 20
or even 100 times better than the orchestration-based design. In a production
system, this could result in significant cost reductions. Your database
administrator would also appreciate the difference in log file sizes used for
the backups :-p.
I tried to
keep the tests as scientific as possible, and had planned to include execution
times in the results as well. Unfortunately, Virtual PC is not a great platform
for performance testing, and does not seem to give the speed that running on
bare metal would do, so I have not included those figures. The log file sizes
should provide an accurate indication of performance, (and I did remember to
add the log file sizes for the order aggregation database as well ;-).
I think the
option of using a messaging-database aggregator should be considered when
building an aggregator in BizTalk. Apart from the performance advantages, it
also provides the option to recover better from failure, as the messages are
present in a database, and the aggregation can be triggered manually.
There are a few improvements that could be made to the design. I’m not happy
with creating a database table that contains the fields in the schema; with a
hierarchical schema, this would involve multiple tables and references. It
would be much better to save the XML content of the message in one column.
I’ve heard a few people discussing the option of using a pure messaging-based
solution to improve performance on high load systems, and it would be great to
see more examples of this.
To sum it up
- Sequential Convoy Aggregators are fine for handling aggregations with a low
  number of messages.
- Sequential Convoy Aggregators do not scale well with higher message traffic
  if the aggregation state is held within the orchestration.
- Building a messaging-database aggregator provides a design that scales well
  with large numbers of messages.
- It is also possible to retain the sequential convoy orchestration design, and
  use a database to store the aggregation state if the need arises.
by community-syndication | Jun 6, 2005 | BizTalk Community Blogs via Syndication
Hi Everyone…….well…this is a first! I never thought I’d actually post something to my own blog. I think I set this up 6 months ago or so, but after seeing all the wonderful content already being posted to blogs, I thought to myself, what could I possibly add? However, I think this is going to be a great avenue to distribute all the goodies we cover in our training courses, as well as other things of interest…
Last week Richard Seroter and I tried to kill about 60 people by putting them through the extended 5 day MOC course. What does that mean? 14 to 16 hours a day for 5 days, first going through the typical 5 day MOC (Microsoft Official Curriculum) lab training (this was done in about 3 days), followed by about 2+ days of optional deep dive training.
This was trial by fire…..as the air conditioning in the Santa Monica office left something to be desired …. 🙂
During the last 2 days, we had everyone build a fully functional BizTalk 2004 Farm with high security. This consisted of setting up all the users and groups on an Active Directory box, and installing BizTalk 2004 on two servers with a remote SQL Server….4 servers in our farm. We then went thru all the Perf Tuning stuff, deploying an application and seeing for ourselves all the cool things you can do around availability with Host Instances. You can find the performance white paper here: Performance WhitePaper.
After that I think we installed the MSMQT and MSMQ adapter side by side and got those bad boys running and functional. Then it was off to the following:
1. Using the Business Rules Engine API to automate our deployment and execution policies from C#.
2. Creating some custom code so that we could call BizTalk maps dynamically from C#.
3. Configuring App Domains and using config files from within our BizTalk projects.
4. Briefly going over Convoys, for which I’ll post some good working samples….
5. Parsing REAL messy flat files that I got a thrill doing on a project once..
6. Developing a custom pipeline component to copy context properties into our messages…
I’ve set up an FTP site which I’ll be posting many of these materials to, code samples and powerpoints….so stay tuned. Apparently I can’t VPN out when logged onto the Microsoft Network….so off to starbucks I go for unfettered connectivity….
by community-syndication | Jun 6, 2005 | BizTalk Community Blogs via Syndication
The following is an extract from a thread at msnusers.com/biztalkserverstuff. Charles Young explains with great clarity the subtle differences in the rules engine parsing of facts expressed using XPath Selector and XPath Field combinations (or “facts” and “slots”):
This causes so much confusion.
The rules engine works by asserting ‘facts’ into its working memory. A ‘fact’ is some object, and therefore contains fields, commonly called ‘slots’ in the world of rule engines. The exception is thrown because the rule engine is attempting to access a slot in a fact in order to evaluate a condition within a rule. The engine is attempting to do this evaluation because the relevant fact has been asserted. However, when it looks for the slot, it can’t find it.
This translates into the world of XML as follows. When you create a vocabulary definition for a node in your schema, there are two properties that are set. These are ‘XPath Selector’ and ‘XPath Field’. Think of these as the technical terms that refer to some data item. The vocabulary definition is mapping these to business-friendly terms defined by the Name and Display Name properties.
The XPath Selector is used to define and select a ‘fact’. Vocabulary definitions can refer to a fact, rather than a slot, and in this case the XPath Field property is empty. However, in your case, there will be an additional XPath expression in the XPath Field property. This will typically be used to select a descendant node of the fact. The exception is happening because the ‘fact’ (the programId element) exists (i.e., the XPath Selector addresses at least one programId element – if it addresses more than one programId element, multiple facts will be asserted into memory), but the ‘slot’ (the id element) does not.
If the programId element did not exist, no error would occur. Very simply, the engine would not be able to assert the programId fact and would therefore realise that it could not evaluate any rule conditions that depend on this fact. Because the programId element exists, the engine assumes that the id child element exists, and throws an error when it finds it doesn’t.
One way to deal with this is to edit your XPath Selector so that it only selects programId elements that contain an id element. XPath supports filters, so you could amend your selector as follows (the addition is the trailing [id] predicate):
"/*[local-name()='SOA_Message' and namespace-uri()='http://schemas.test.com/20050307/SoaMessageSchema']/*[local-name()='products' and namespace-uri()='']/*[local-name()='clients' and namespace-uri()='']/*[local-name()='client' and namespace-uri()='']/*[local-name()='programs' and namespace-uri()='']/*[local-name()='program' and namespace-uri()='']/*[local-name()='programId' and namespace-uri()=''][id]"
You might want to further improve on this with the following filter: [id != ""]. This would only select programId nodes as facts if they contain an id child element with a non-empty text node.
The key to running the rules engine effectively over XML is to understand XPath and the difference between ‘facts’ and ‘slots’, and to edit your XPath Selectors and XPath Fields accordingly to meet your needs.
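The effect of the [id] filter can be tried outside BizTalk. The sketch below uses Python's standard library, whose limited XPath support includes the child-element predicate; it only illustrates the selection semantics and is not how the rules engine evaluates selectors. The sample document is hypothetical and much simpler than the schema in the thread, and the non-empty check is done in Python rather than in XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified document: three programId elements, of which one
# has a populated id, one has an empty id, and one has no id at all.
SAMPLE = """
<programs>
  <program><programId><id>42</id></programId></program>
  <program><programId><id></id></programId></program>
  <program><programId><name>no id here</name></programId></program>
</programs>
"""

root = ET.fromstring(SAMPLE.strip())

# Unfiltered selector: every programId would be asserted as a fact,
# including the one whose id 'slot' is missing.
all_facts = root.findall(".//programId")

# Selector with the [id] predicate: only programId elements that
# actually contain an id child are selected.
facts_with_id = root.findall(".//programId[id]")

# Stricter variant: additionally require the id to hold non-empty text.
facts_with_value = [
    fact for fact in facts_with_id if (fact.findtext("id") or "").strip()
]

if __name__ == "__main__":
    print(len(all_facts), len(facts_with_id), len(facts_with_value))
```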
by community-syndication | Jun 1, 2005 | BizTalk Community Blogs via Syndication
BizTalk 2006 ships in the first half of 2006
I have read through the articles released by Scott Woodgate and other resources I could find on BizTalk Server 2006. Partly as a learning exercise, and partly to provide a summary for my boss, who was concerned about the state of BAM in BizTalk 2004, I have condensed the articles down into the following bullet-pointed summary with some personal opinions. Maybe this will be of use to people who are too time-strapped to read all the articles (if you are a more visual reader I suggest you get hold of the articles, which have lots of pics of the new dialogs etc). Thanks to the BizTalk team for their openness about what’s in the next release; it looks like a goodie. To get the original articles go here. I will be adding to this summary as more articles come out.
BizTalk Server 2006 Setup & Migration Microsoft Corporation April 2005
BizTalk Server 2006 Business Activity Monitoring Microsoft Corporation April 2005
BizTalk Server 2006 Developer Tools Improvements Microsoft Corporation April 2005
BizTalk Server 2006 Adapter Enhancements Microsoft Corporation April 2005
Looking Ahead to BizTalk 2006 and Beyond http://www.microsoft-watch.com/article2/0,1995,1751293,00.asp
Latest DBMS & Framework Upgrades
Supports 64 bit server releases
Runs on SQL Server 2005
Support for Visual Studio .Net 2005
Support for Virtual Server 2005
Simplified BizTalk installation and configuration
• Automatic upgrade for BizTalk 2004 implementations; there is NO support for side-by-side installation of 2004 and 2006
• Simplified BizTalk 2006 installation (the BizTalk 2004 installation could be difficult if you didn’t follow the 16 pages of instructions and install the supporting components; these will be supplied in a single CAB file)
• The prerequisites which will still need to be manually installed first are:
o Windows
o SQL Server 2000 \ 2005
o Visual Studio .Net 2005
• There will be different setup options for different types of users, e.g. beginners vs seasoned developers
• Simplified BizTalk Database and Services Configuration
• Default Configuration Option for a single user account for all BizTalk Server 2006 services, i.e. you don’t need to enter all the user names and passwords for BAM, HWS, ESSO, Rule Engine etc. The Custom Configuration Option is also there if you want to select different users for different services, which may be a good idea if you are doing a BizTalk 2004 upgrade.
• Only 4 screens in the configuration process, down from ~8.
• More flexibility and less effort required to deploy BizTalk databases \ services over multiple CPUs in a Server Farm
• Setup log file contains more detailed information for troubleshooting install and config issues
Much needed overhaul of Business Activity Monitoring
• BAM Portal web app which exposes BAM activities to the Information Worker. This provides:
o Customisable branding for the portal in its banner
o An Activity Search page for generating and saving queries for reporting “instance data” and setting alerts, i.e. notify me if, when
o An Aggregations page providing pivot table and graphing functionality for a point-in-time snapshot of the health of the business, reporting “aggregate data”. You can easily set alerts here too (on values of individual cells in a pivot table) and there is the ability to “drill down” to the individual transactions\message instances. Thus alerts may be set on a single transactional value or the aggregation of many transactional values
o An Alert Manager page to create and edit real time alerts, set email addresses and messages, and allow users to subscribe to those alerts. These alerts are fired on the monitoring of live data
o Ability to create an advanced query for an alert if it cannot be created from query or pivot table data on the Activity Search and Aggregations pages
o The BAM Portal is a big improvement over the messing around with creating SQL Server Analysis Services Cubes and exporting them to Excel. It is basically an easy-to-use query builder with subscribable alert functionality. It takes a step towards making BAM Information Worker friendly rather than relying on the developer to implement and maintain BAM.
• BAM Pipeline Interceptor
o No longer need to write custom pipeline components to call the BAM API; a BAM pipeline interceptor is supplied out of the box
o The Tracking Profile Editor (available in BizTalk 2004) has been improved to retrieve the message context and promoted properties from BizTalk message schemas. It enhances the ability to capture BAM data from orchestrations
• Ability to expose all BAM Portal Functionality as a Web Service
o You can create queries for aggregate and instance data, set alerts and retrieve BAM configurations through a BAM Query Service and a BAM Management Service
• BAM Management command line utility
o The input for this utility is the BAM Excel Template that is created by Business Analysts to define events and track data which will provide setup data for the BAM Portal
o Supports change-on-the-fly so in order to make a change you don’t need to un-deploy and redeploy your BAM setup as in BizTalk 2004 which was tricky and could result in BAM data loss
o Can retrieve a list of all BAM activities, views and aggregations
o Alert and subscription management
o Security management
• Excel Improvements
o Installable BAM add-in for Excel which provides a BAM menu which enables you to create, import and export Activities and Views and query live data
Small additions to the developer user experience
• Flat File Wizard
o A wizard interface to speed up the creation of BizTalk XSD Schemas for flat files. It looks a bit like the MS Excel \ Access Flat File Importers but is much more attuned to producing BizTalk schemas with a flat file extension. Many developers must have found this area of BizTalk development difficult, so this UI is more or less an add-in. However, with the help of Google and some patience, mastering the art of BizTalk schema development using the Flat File extension isn’t beyond reach
• Simple UI functionality in the Orchestration designer such as:
o Zoomability for dealing with large and complex orchestrations
o Preserving the collapsed and expanded settings on shapes on saving
o Enlarging the set size of the expression window … why not make it resizable?
• All in all there is not much change to the developer UI. Watch this space; more changes may be revealed at a later date.
New and updated Adapters
• New Adapters
o POP3 Adapter for receiving email messages and attachments. BizTalk 2004 shipped with an SMTP adapter for sending email. There are a few of these already freely available on the net
o Windows SharePoint Services Adapter for accessing documents stored in SharePoint libraries through polling for a document in a SharePoint directory or folder through a SharePoint web method. This may utilise a SharePoint view to filter the messages available to be received by the adapter. Sending a document to SharePoint is the reverse. Upon first glance this adapter is extremely configurable and flexible to use.
o MSMQ adapter “out of the box”; at the moment it’s available on the net
o MQSeries Adapter “out of the box”; at the moment it’s available on the net. It now has dynamic receive, allowing it to determine at runtime the queue it will receive messages from.
• Updated Adapters
o SMTP Adapter: better user experience with the new Compose and Attachments tabs as part of the SMTP Adapter Property Pages Dialog, and the ability to send plain text or HTML emails with multiple attachments. SMIME encryption and digital signing is not included; add this through a MIME Encoder Pipeline Component in a send pipeline. New performance counters.
o File Adapter: better user experience, with not so many property pages to click around. Ability to give the Adapter different user credentials when accessing a file in a directory on a remote file share. New advanced settings dialog to allow you to rename the files while reading, set a polling interval, and set retry counts and intervals on removing files. New performance counters.
o HTTP Adapter: new Suspend Failed Requests setting to control whether a request just errors out or whether it can be resumed. New performance counters.
Summary
In creating BizTalk Server 2006 it seems the BizTalk team has focused on the most important improvements to BizTalk 2004 rather than a complete rewrite of the product.
R. Addis