Messaging-Database Aggregator Pattern


It’s been a while since I blogged an article, and a long while since I blogged about the “Sequential Convoy Aggregator” pattern. Back then I was fairly new to BizTalk; I was aware of persistence points and orchestration state, but did not have a really clear understanding of how badly they could affect an orchestration design.

 

Since then I have been thinking about this more, and wondering how well my aggregator implementation would stand up in a high-load scenario. After running a few tests, I discovered that the design pattern does not scale well when aggregating large batches of messages. The problem then was to come up with a design that would scale.

 

Pattern Implementation Using an Orchestration

The sequential convoy aggregator pattern is implemented by building a sequential convoy orchestration with two correlated receive shapes that receive all the messages to be aggregated. The orchestration maintains the state of the aggregation by creating a .NET XmlDocument object and adding each message as a node to this document. When the aggregation is complete, the XmlDocument object is assigned to a BizTalk message, which is then sent from the orchestration to a subscribing send port.

 

Sequential Convoy Aggregator Performance

The performance tests use the aggregator pattern I created and blogged about last August. The orchestration was modified slightly to use a message count as the completeness criterion so it could be tested with different numbers of messages. The tests were then run, starting with 100 messages and increasing the number of messages being aggregated.

 

After each test, the “Backup BizTalk Server” SQL Server Agent job was started manually to create log backups covering the processing of the message aggregation. The results of the tests are shown below.

 

Msg Count    Log Size /KB    KB/Msg
      100           7,038     70.38
      200          16,446     82.23
      300          29,999    100.00
      400          47,703    119.26
      600          96,976    161.63
      800         171,584    214.48
    1,000         262,825    262.83

 

 

Msg Count – The number of messages used in the test.

Log Size /KB – The size of the message box database log backup file produced by running the Backup BizTalk Server SQL Agent job after the test.

KB/Msg – The size of the log file divided by the number of messages.

 

From the results of the tests, it can be seen that the size of the message box database log increases dramatically as the number of messages being aggregated increases. Since the log file records the database activity on the message box database, this implies that the aggregator implementation does not scale well as the number of messages increases.

 

If there
were thousands of messages to aggregate over a 24-hour period for example, the
size of the message box database log files would become a significant problem.

 

The Problem of Persistence

In order to provide reliable messaging, the BizTalk Server orchestration engine saves its state to the message box database at specific points during its execution. This serialization of the orchestration state is known as persistence, and each saving of the state is known as a persistence point. While orchestration persistence is great from a reliability perspective, it can have detrimental effects on performance if orchestrations are designed without a thought for the mechanics of the orchestration engine.

 

In the aggregator pattern, when the orchestration starts to aggregate messages, the XmlDocument object that accumulates them is small. At each persistence point, this XmlDocument object is serialized to the message box database to allow recovery from a possible failure. As the orchestration aggregates more and more messages, the size of the XmlDocument object increases.

 

This object is persisted to the database every time a message is received (three times for each message aggregation loop in this case), so when the orchestration is holding the state of a few hundred messages, the load on the message box database starts to get heavy. If the pattern is used to aggregate thousands of messages, and the messages themselves are large, the pattern will quickly become unusable.
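
A rough back-of-envelope calculation shows why. If the average message size is s KB (an illustrative figure, not measured in these tests), the state persisted after receiving message i is about s × i KB, and with three persistence points per loop the total log traffic for a batch of n messages is on the order of 3s(1 + 2 + … + n) = 3s × n(n + 1) / 2 KB. The total load therefore grows with the square of the batch size, which is consistent with the roughly linear growth of the KB/Msg column in the results above.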

 

Building a Scalable Aggregator

The problem affecting the scalability of the aggregator was caused by maintaining the state of the aggregated message as the orchestration looped to receive the other messages in the convoy. If a way can be found to escape the state persistence problem, it should be possible to design a scalable aggregator pattern.

 

Aggregator Design Using Messaging and a Database

One possible design would be to abandon the convoy orchestration and create a solution based purely on messaging. At present, in BizTalk Server 2004, there is no way to aggregate messages in the pipeline of a send port in a similar manner to splitting them in a receive pipeline. This pattern instead uses a SQL Server database to store the messages as they are aggregated, using the SQL Server adapter to add the orders to the database and to receive the aggregated list when the aggregation is complete. The completeness criterion is determined in the database; in this case, a message count is used.

 

Messaging-Database Aggregator Pattern

 


 

In File Drop – The order messages to be aggregated are dropped here.

Order In Receive Port – The receive port receives the order messages and publishes them to the message box database.

Add Order SQL Send Port (Map Order to SQL) – Port with a subscription to the Order In Receive Port, and a map to transform the order messages to the Add Order stored procedure schema. The port is configured to call the stored procedure using the SQL adapter.

Add Order SP – Stored procedure to add the messages to a table in the Order Aggregation Db. It also checks the completeness criterion, which is based on the number of messages in the batch to be aggregated; when this number is reached, the status of the messages is updated to mark them for aggregation. (A rough sketch of this procedure and the Get Order Aggregation SP appears after this list.)

Order Aggregation Db – Database containing the table for aggregated orders.

Get Order Aggregation SP – Stored procedure to poll the database for batches ready to be aggregated. When a batch is ready, the data is returned and the status flag of the orders updated.

Get Order Aggregation SQL Receive Port – Receive port configured to poll the Get Order Aggregation SP to check for aggregation batches.

Aggregated Order Send Port (Map SQL to Order List) – Send port with a subscription to the Get Order Aggregation SQL Receive Port. This port also maps the SQL result set schema to the order list schema.

Aggregated Order File Out – File location for the aggregated messages.
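
To make the database side more concrete, the two stored procedures might look something like the sketch below. This is a minimal, hypothetical version: the table, column and parameter names (OrderAggregation, AddOrder, GetOrderAggregation, @BatchSize and so on) are illustrative, the completeness criterion is a simple message count as in the tests, and transaction and locking concerns are omitted for brevity.

CREATE TABLE dbo.OrderAggregation
(
    OrderId    int IDENTITY(1,1) PRIMARY KEY,
    BatchId    varchar(50) NOT NULL,
    CustomerId varchar(50) NOT NULL,
    Amount     money       NOT NULL,
    Status     tinyint     NOT NULL DEFAULT 0  -- 0 = pending, 1 = ready, 2 = aggregated
)
GO

-- Called by the Add Order SQL Send Port for each order message.
CREATE PROCEDURE dbo.AddOrder
    @BatchId    varchar(50),
    @CustomerId varchar(50),
    @Amount     money,
    @BatchSize  int = 100  -- completeness criterion: messages per batch
AS
BEGIN
    INSERT INTO dbo.OrderAggregation (BatchId, CustomerId, Amount, Status)
    VALUES (@BatchId, @CustomerId, @Amount, 0)

    -- When the batch is complete, mark its messages as ready to aggregate.
    IF (SELECT COUNT(*) FROM dbo.OrderAggregation
        WHERE BatchId = @BatchId AND Status = 0) >= @BatchSize
        UPDATE dbo.OrderAggregation
        SET    Status = 1
        WHERE  BatchId = @BatchId AND Status = 0
END
GO

-- Polled by the Get Order Aggregation SQL Receive Port.
CREATE PROCEDURE dbo.GetOrderAggregation
AS
BEGIN
    DECLARE @BatchId varchar(50)

    -- Pick one batch that is ready, if any.
    SELECT TOP 1 @BatchId = BatchId
    FROM   dbo.OrderAggregation
    WHERE  Status = 1

    IF @BatchId IS NOT NULL
    BEGIN
        -- Return the batch as XML; the SQL adapter expects a FOR XML result.
        SELECT CustomerId, Amount
        FROM   dbo.OrderAggregation [Order]
        WHERE  BatchId = @BatchId AND Status = 1
        FOR XML AUTO

        -- Mark the batch so it is not returned again on the next poll.
        UPDATE dbo.OrderAggregation
        SET    Status = 2
        WHERE  BatchId = @BatchId AND Status = 1
    END
END
GO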

 

Test Results

The tests
were repeated using the messaging-database aggregator pattern, and the
following results obtained. The Messaging log size is a sum of the message box
database log file, and the order aggregation log file.

 

Msg Count   Convoy Orch Log Size /KB   Convoy Orch KB/Msg   Messaging Log Size /KB   Messaging KB/Msg   % Log Size Reduction
      100                      7,038                70.38                    3,064              30.64                 56.46%
      200                     16,446                82.23                    5,984              29.92                 63.61%
      300                     29,999               100.00                    8,676              28.92                 71.08%
      400                     47,703               119.26                   10,977              27.44                 76.99%
      600                     96,976               161.63                   16,866              28.11                 82.61%
      800                    171,584               214.48                   21,755              27.19                 87.32%
    1,000                    262,825               262.83                   26,593              26.59                 89.88%

 

 

 

Msg Count – The number of messages used in the test.

Convoy Orch Log Size /KB – The size of the message box database log file produced by the sequential convoy aggregator orchestration.

Convoy Orch KB/Msg – The size of the log file divided by the number of messages.

Messaging Log Size /KB – The combined size of the message box database log file and the order aggregation database log file produced by the messaging-database pattern.

Messaging KB/Msg – The size of the log files divided by the number of messages.

% Log Size Reduction – The percentage reduction in log file size gained by using the messaging-database pattern.

 

 

 

KB Log File Size for Aggregation / Message Count

[Graph: total log backup size in KB plotted against message count for both patterns]

The effect
of the batch size on the sequential convoy orchestration can clearly be seen in
the above graph. As the batch size increases, the messaging-database aggregator
offers significantly better performance.

 

 

KB Log File Size per Message / Message Count

[Graph: log backup size per message in KB plotted against message count for both patterns]

As the batch size increases, the messaging-database aggregator becomes slightly more efficient, producing less log file activity per message. With the orchestration-based aggregator, the continued persistence of the growing aggregation state makes the pattern less efficient.

 

Why is the Message Box Database Log File Size so Important?

The log files created when performing a transaction log backup on the message box database are an indication of the load that has been placed on the database since the last transaction log backup. As a BizTalk Server installation is scaled up and out, the message box database often becomes the bottleneck, and adding another SQL cluster to handle increased message box activity is costly. It’s much better to design a solution that uses the message box efficiently if a high load on the system is anticipated.
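
As an aside, the log backup sizes recorded in the tables above can also be read back from SQL Server’s backup history in the msdb database. A minimal sketch, assuming the default message box database name BizTalkMsgBoxDb:

-- Transaction log backup history for the message box database;
-- type 'L' marks log backups, and backup_size is in bytes.
SELECT   database_name,
         backup_finish_date,
         backup_size / 1024 AS backup_size_kb
FROM     msdb.dbo.backupset
WHERE    database_name = 'BizTalkMsgBoxDb'
         AND type = 'L'
ORDER BY backup_finish_date DESC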

 

Conclusions

From the results it can be seen that the messaging-database aggregator provides an almost 90% reduction in message box database load compared with the sequential convoy aggregator in this scenario. If the message count were to increase further, this design could perform 20 or even 100 times better than the orchestration-based design; in a production system, that could result in significant cost reductions. Your database administrator would also appreciate the difference in the log file sizes used for the backups :-p.

 

I tried to keep the tests as scientific as possible, and had planned to include execution times in the results as well. Unfortunately, Virtual PC is not a great platform for performance testing, and does not give the speed that running on bare metal would, so I have not included those figures. The log file sizes should provide an accurate indication of performance (and I did remember to add in the log file sizes for the order aggregation database as well ;-).

 

I think the option of using a messaging-database aggregator should be considered when building an aggregator in BizTalk. Apart from the performance advantages, it also provides the option to recover better from failure, as the messages are present in a database and the aggregation can be triggered manually.
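
For example, with the hypothetical OrderAggregation table sketched earlier, a batch that never reached its completeness criterion (perhaps because a message went missing) could be released for aggregation by hand:

-- Manually mark a stuck batch as ready; the polling receive
-- port will pick it up and aggregate it on the next poll.
UPDATE dbo.OrderAggregation
SET    Status = 1
WHERE  BatchId = 'BATCH-042'  -- hypothetical batch id
       AND Status = 0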

 

There are a few improvements that could be made to the design. I’m not happy with creating a database table that mirrors the fields of the schema; with a hierarchical schema this would require multiple tables and references between them. It would be much better to save the XML content of the message in a single column.
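
A hypothetical version of such a table (names illustrative; on SQL Server 2000 the message body could be held in an ntext column, and SQL Server 2005 adds a native xml type):

-- One row per order message, with the whole message body in a single column.
CREATE TABLE dbo.OrderMessage
(
    MessageId int IDENTITY(1,1) PRIMARY KEY,
    BatchId   varchar(50) NOT NULL,
    OrderXml  ntext       NOT NULL,  -- raw XML message body
    Status    tinyint     NOT NULL DEFAULT 0
)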

 

I’ve heard a few people discussing the option of using a pure messaging-based solution to improve performance on high-load systems, and it would be great to see more examples of this.

 

To sum it up:

 

  • Sequential Convoy Aggregators
    are fine for handling aggregations with a low number of messages.

  • Sequential Convoy Aggregators
    do not scale well with higher message traffic if the aggregation state is
    held within the orchestration.

  • Building a messaging-database
    aggregator provides a design that scales well with large numbers of messages.

  • It is also possible to retain
    the sequential convoy orchestration design, and use a database to store
    the aggregation state if the need arises.