BizTalk SQL Adapter Debatching and MessageBox Dos and Don’ts

I’ve posted a few things over to the new blog lately and thought I’d keep this blog in sync until it shuts down next week.

First, I did a walkthrough of how do use the out-of-the-box BizTalk XML pipeline to debatch messages coming inbound from the SQL Adapter.  Fun stuff.

Second, check out the latest post by Lee on the BizTalk Core Engine blog which covers some of the underlying SQL magic contained in the MessageBox.   He also threatens physical violence against anyone who changes the query plans or stored procs.  Ok, not really.  Still a good read, even sans violence.

Technorati Tags: BizTalk

Understanding Design-Time Properties for Custom Pipeline Components in BizTalk Server

Recently I been involved in building variety of Biztalk pipeline components with different design-time properties ranging from simple “string” to “complex collections”. I struggled a bit to figure out the ways to do it, due to poor error description you get while developing and also due to lack of documentation around it. Normally you’ll find some blog articles here and there explaining stuff, but in this particular instance I couldn’t find any blog articles either. So, I went ahead and written a white paper explaining design-time properties in Custom Pipeline component, the article explains using complex types and their serialization techniques, development tips and error diagnosis procedures.

You can download the White Paper and the accompanying code from the following location: 

http://www.microsoft.com/downloads/details.aspx?FamilyId=8C2572A2-FB61-4016-A7A7-1C37518B39FF&displaylang=en

I need to thank Angus Foreman (from Microsoft,UK) and Luke Nyswonger for their help in publishing the article. I also would like to thank all the editors.

Nandri!

Saravana

Commerce Server 2007: Developing for B2B Scenarios

I spent about an hour on the phone earlier this week with a Commerce Server 2007 developer who asked a very basic but vital question… "What do I do if the StarterSite doesn't fit my scenario?"

First off, the Commerce Server development team has done an excellent job with the new StarterSite. It's a great reference even if it doesn't fit your scenario (such as a strictly B2B site) since it demonstrates some really innovative web techniques and includes some excellent server controls as well. I highly recommend setting up a development instance using the StarterSite, even if you just use it to see how things work in CS2007.

Secondly, it's really very straight-forward to implement Commerce Server functionality in any ASP.NET 2.0 web application if you understand how the various subsystems (Profiles, Catalog, Orders, Inventory) work together. It also helps to be able to see some sample code so I've put together a list of my old posts to help get you started.

Please feel free to ping me via my blog if you have any questions, comments or suggestions.

Commerce Server 2007: Begin with the End in Mind!

Commerce Server 2007: Development Tip #1 – Returning a User's Catalogs

Commerce Server 2007: Development Tip #3 – Creating a New User

Commerce Server 2007: Development Tip #4 – Returning Site Terms

Commerce Server 2007: Development Tip #5 – Using the RegionCodeDataSet

Commerce Server 2007: Development Tip #6 – Adding Items to your Basket

Commerce Server 2007: Development Tip #7 – Using the CatalogItemsDataSet

Commerce Server 2007: Development Tip #8 – UpmRoleProvider

Commerce Server 2007: Development Tip #9 – Shipping Addresses

Commerce Server 2007: Integrating Commerce Server Orders with BizTalk Server

and there's more to come in future posts…

Technorati Tags: Commerce Server 2007

Share this post: Email it! | bookmark it! | digg it! | reddit!| kick it!

What you can and can’t do with the Messagebox Database server.

This question has come up a lot in internal and external discussions. It got to the point where we had to put out a KB article as so many DBAs out there thought that it was okay to go and “tweak” the settings (http://support.microsoft.com/kb/912262/en-us). So here is an attempt on my part to explain what you can and can’t do in relatively simple terms and why.


What can’t you do?


You may not make any changes which could effect how our queries are executed (ie their execution plan). I will list some specific things which can affect the query plans, but they are just examples. If you know of a setting which I don’t list and you aren’t sure, then don’t change it. Here are a couple of things to understand around this:


1) Think of the sql code (stored procedures) which we ship as uncompiled C# (or whatever your favorite language is) code. Every customer who installs our product gets their own compiled version of this code courtesy of the SQL Server engine. You can read up on documentation on how the sql engine performs this compilation (or go to one of Paul Randall’s talks at Tech Ed or some other conference) but it is important to realize that it is being done. SQL even has a performance counter which you can monitor called recompiles which will show you how often “code” is being recompiled potentially while your applicatin is running at full throttle. SQL Server uses various statistics about the table layout and cardinality of indexes and such to attempt to chose the best plan to reduce IO and provide the best performance. One thing that the optimizer cannot anticipate is the locking side-effects certain plan choices could have. It is therefore possible that in an environment in which lots of threads are accessing the database at very high rates, a plan which might be optimized for potential IO access could cause tremendous locking and hence be a much worse overall plan choice.


2) The next very, very important thing to understand is that while the database is installed on your server, it is not “your” database. It is our (or if you catch me on the wrong day, not thinking, “my”) database. When you have performance problems with it, the buck stops over here, not at your DBAs desk. For that reason, it is unbelievably important that the system runs the same on every installation. If every customer who called in with a question was forced to gather a dump of the plan cache from within SQL Server and send it to us so that we could reconstruct exactly what sql server is doing, it would make support pretty much impossible.


3) Access to the messagebox is very controlled and very limited. Via our Admin MMC (in bts 2006) you can construct somewhat flexible queries against the data, but even that is somewhat limited. From the runtime, there are a set of precanned generic stored procedures and some per-host templated stored procedures which provide all the access that the runtime needs. Since we have locked down access to the database, we can understand all of the potential queries. With that in mind, we have done everything we possibly can to “hardcode” the query plans for all of our stored procedures. This means using index hints, force order hints, occassionally join hints, norecompile hints (why recompile if you are just going to generate the same plan) and every trick we can come up with the tie the hands of the optimizer (part of the sql engine which calculates the best query plan). The sql server team is well aware of what we are doing. Technically in SQL 2005 there is support for USE PLAN where we could really hardcode the plan completely, but (in order of importance): (a) we support SQL 2000, (b) it is only supported on SELECT statements which only ~30% of our code and (c) it is non-trivial for generated stored procedures like the ones we use for per host access. Along with attempting to force the optimizer to chose the plan we want, we also disable some of the features of SQL Server which “help” the optimizer chose the best plan. This includes statistics generation and updating. Statistics provide information about the layout of a table so that the optimizer might decide it is more efficient to scan the clustered index than to seek over the non-clustered index (as an example). Since we are hardcoding the plan, statistics are not so important. We also disable parallelism (setting at the server level MAX DOP to 1). Parallelism provides more options for the optimizer to chose amongst for plans. Typically it is very useful for queries over very large databases (warehouse type applications). We are a OLTP (online transaction processing) style app which executes the same sets of sprocs on multiple threads at very high rates. We already provide our own parallelism. I “apologize” that we chose to set this at the server level which could cause it to effect any other database you have installed on that server, but you may not change this setting or else we will simply push back on any support until you switch it back (no matter how large a customer you are). If you have only one server, install an alernate instance of sql server on the same machine and put us on the separate instance. In our testing, when SQL Server choses a plan with parallelism in it for one of our queries it can cause orders of magnitude performance degradataion.


4) We are not perfect (but we try :). Occassionally we do not include enough hints so that the optimizer can chose another plan which is worse. I have personally fixed two bugs in which we needed to add hints or slightly change the structure of our query because the optimizer chose a less than optimal plan. It can happen, but we do our darned’ist and I have only had I think 3 cases of it in about 3 years which is not bad. It also not fair to blame this on the optimizer. As I said above, since we are hardcoding the plans for all of our queries, we turn off a lot of the features which might help the optimizer (but add overhead to the overall system).


A quick list of things which can effect the query plans:


Statistics (don’t enable these)


Parallelism (don’t turn this on )


table structure (don’t add indexes, columns, triggers, … If you do you will hear silence when you call for help)


Stored procedures (don’t change them. You can look all you want, but no touching)


 


So what can you do?


You can change the underlying storage model for our data on the physical disks. By this I mean stuff like file groups and files and such. By default all of our tables are setup on a single (PRIMARY) filegroup which has one file. This was admittedly perhaps not the best decision. You are more than welcome to create multiple files for a given file group and also to create multiple filegroups and move our tables around. This can give potentially significantly better IO performance and hence overall system performance. We are currently working on a paper (and perhaps tool to automate) which will give recommendations on how to layout out tables given as many phsical disks as you have access to. You can also change setings related to how sql pre-allocates space for the files and for some of their internal structures. None of this effects how our plans are executed, it simply allows for faster IO access and better overall performance. One interesting settings which we do not enable by default but have been playing with for a little while now is “TEXT IN ROW”. This table level option is used to tell sql server whether to store the image column data in the same data page as the clustered index or in another data page. By default this is set to off. If you think about an application like Outlook / Exchange this makes a lot of sense, as the image data (you email message) is potentially only loaded when the user clicks on a specific email to read it (disregarding preview pane). Storing the image data in a separate page allows more rows to be stored in one page and hence less IO is needed to read a large set of data. However, in BizTalk, when a message is delivered to a subscribing service, that message is going to be “processed” and hence loaded every time (we do special handling for large messages to fragment them already). So instead of an optimization, we incur an extra IO to lookup the image data in the separate data page. I don’t gaurantee it will always provide a performance benefit, but we have seen some decent gains when processing small messages. As always you should test this out in a QA environment first. You can read more about text in row simply by searching for “SQL SERVER text in row” and picking one of the links. Text in row is enabled at the table level, so you need to have some understanding of our table structures. Really, there are only two tables we need to worry about. The first table is the Spool. This table contains for every message, its MessageID, a timestamp, some other properites, and the blob form of the message context. The second table you need to worry about is called the parts table. This table contains both the part property bag (for things like content type and charset) as well as the first fragment for each part. Enabling text in row on these tables could provide a benefit for high throughput systems which are processing smaller messages. Since the size of a page can only be 8K, if your messages are greater than this, enabling text in row probably won’t help since sql will be forced to push the data to a separate page anyways (although we do compress your data so it is smaller in storage than you might think). You can also play around with the max size for the image column before it is offloaded to a separate page (http://msdn2.microsoft.com/en-us/library/ms173530.aspx). As always, have someone with database experience read over this and understand it before testing anything out. A quick script to turn on the option would look like:


exec sp_tableoption N’Spool’, ‘text in row’, ‘ON’


exec sp_tableoption N’Parts’, ‘text in row’, ‘ON’


 


 


So the conclusion is that you should never do anything which effects the way our queries are executed but you are more than welcome to try and optimize the IO access to improve performance. Hope this helps some of you out there.


 


Lee


 


Addendum: 


I was asked about Index Rebuilds. First, we do not support online index defragmentation. That process involves page locks which we explicitly disable as part of our installation on all of our tables. Page locks make lock acquisition ordering almost impossible and will cause us to deadlock all over the place. We are very carefull about how we acquire locks and expect that the granularity is at the row level so that our ordering is enforced. Having said that, offline rebuilding of indexes is perfectly okay and supported. In our labs, we have not seen significant performance gains from doing this but it is true that our indexes will fragment since many of them are on guids (most). SQL Server team is not a huge fan of guid based indexing, but we have done numerous tests which show that as long as you don’t ever scan the table, they can perform better than identity based indexes (for our specific workloads). They will of course get fragmented, but usually our tables don’t get too big as data flows in and out of them at a relatively steady pace. If you have large amounts of data which is expected to build up in the messagebox, you are more than welcome to periodically do index rebuilds during scheduled downtime. The same applies to the tracking database where you can rebuild indexes during downtime.

VPC2007: Erratic Mouse Behavior in RC

I've spent several hours testing the RC build of VPC2007 and I'm still experiencing erratic mouse behavior where the mouse seems to "skip" (disappear and reappear during movement). In the beta, this was caused by my Intel Centrino 915 chipset (Dell D810 running Vista Ultimate RTM) and several other symptoms were also prevalent including repeating keystrokes. Now it's only the erratic mouse behavior which is a big improvement over the beta. I've tested this on five different virtual machines and this issue exists on every one, but no two are the same. On vm's created by VPC2004 SP1 and upgraded to the latest vm additions, the problem is very mild. On vm's created by the VPC2007 beta, the problem is so bad it makes the vm unusable. On vm's created by the RC, its usable but very annoying. I'm very surprised that the "degree" of erratic mouse behavior is different on each virtual machine. I've submitted this issue (again) on the MSFT Connect site and am awaiting a response from MSFT.

I'm wondering if this is an isolated issue on my Dell D810 so I'm asking everyone reading this post and testing VPC2007 to post a comment with your experiences. If you have any feedback, please let me know!

Technorati Tags: Virtual PC 2007

Share this post: Email it! | bookmark it! | digg it! | reddit!| kick it!

BizTalk SQL Adapter Debatching and MessageBox Dos and Don’ts

I’ve posted a few things over to the new blog lately and thought I’d keep this blog in sync until it shuts down next week.

First, I did a walkthrough of how do use the out-of-the-box BizTalk XML pipeline to debatch messages coming inbound from the SQL Adapter.  Fun stuff.

Second, check out the latest post by Lee on the BizTalk Core Engine blog which covers some of the underlying SQL magic contained in the MessageBox.   He also threatens physical violence against anyone who changes the query plans or stored procs.  Ok, not really.  Still a good read, even sans violence.

Technorati Tags: BizTalk

Managing Real Time Aggregation Data

I was putting together a demo involving BAM’s real time aggregation capability, and was interested to see that it the aggregation was calculated via a SQL trigger, which stores the aggregated data in a SQL table. This differs to non-real time aggregation, which BAM implements with OLAP / SQL Analysis Services. OLAP / SQL Analysis Services is geared towards aggregating large sums of data. SQL tables (like the ones used in BAM’s real time aggregation implementation) are relatively less capable of this aggregation task. This makes the maintenance of the real time aggregation table important – especially keeping the number of rows in the table to a level allowing reasonable query performance. The BAM Management Utility in BizTalk Server 2006 provides a handy command to do just that: set-rtawindow Note – the BAM Management Utility also provides a command to set the duration of a non-real time aggregation activity: set-activitywindow See the following link for more information: http://msdn2.microsoft.com/en-us/library/aa547898.aspx…