Article Source: http://geekswithblogs.net/michaelstephenson
I’ve been asked the same question a few times recently on a couple of BizTalk projects: how should we map our reference data? When this question comes up we often get into a discussion about the pros and cons of caching the reference data and increasing memory usage versus hitting the database every time.
As a rule I tend to use the BizTalk Cross Referencing features for this kind of data mapping unless there is a specific requirement which calls for a custom approach. I’ve blogged about this kind of thing a few times before, but I thought it’s worth a post with some thoughts on the different approaches I’ve seen used when people have wanted to use caching.
I mentioned in a previous post that the Value cross referencing features already implement a simple caching mechanism. In my opinion, though, value cross referencing is aimed more at mapping type values between types of system, whereas ID cross referencing is aimed more at business reference data which would be held in specific instances of systems.
Anyway when it comes to this design decision the things people are usually trying to balance are as follows:
- Performance – If I have a lot of things to map I don’t want to be hitting the database thousands of times
- Performance – If I cache the reference data, is there a risk it will consume a fair bit of memory and potentially cause throttling when the host process memory threshold is hit?
- Manageability – If I cache the data there will be an instance of the cache in each host instance that uses it. How will I ensure these stay in sync?
- Manageability – Caching will mean I have to restart all the hosts when the data changes
There are a number of possible ways to solve this problem, and each has its own considerations, which are discussed in the rest of this article.
Simple Singleton Approach
This is probably the most common approach I’ve seen. In this approach I’ve normally seen a custom database implemented to manage the reference data. The developer would then implement a custom data access method and a singleton which would be used to control access to the reference data. This is a pretty standard use of the singleton pattern; a rough sketch of what I mean is shown below, followed by the considerations I think need to be made.
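To illustrate, here is a minimal sketch of the sort of singleton I usually see. The class, connection string, table and column names are just placeholders for whatever your own reference data store looks like.

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;

// Hypothetical helper called from a BizTalk map (for example via a scripting functoid).
// The reference data is loaded once from a custom database and held in memory for the
// lifetime of the host process.
public sealed class ReferenceDataCache
{
    // Static initialisation is thread safe, giving a simple singleton.
    private static readonly ReferenceDataCache instance = new ReferenceDataCache();

    private readonly Dictionary<string, string> lookup = new Dictionary<string, string>();

    private ReferenceDataCache()
    {
        // Placeholder connection string and table name for the custom reference data database.
        using (var connection = new SqlConnection("server=.;database=ReferenceData;Integrated Security=SSPI"))
        using (var command = new SqlCommand("SELECT SourceValue, TargetValue FROM dbo.CountryCodeMap", connection))
        {
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    lookup[reader.GetString(0)] = reader.GetString(1);
                }
            }
        }
    }

    public static ReferenceDataCache Instance
    {
        get { return instance; }
    }

    // Called from the map; returns the mapped value, or the original value if there is no match.
    public string Map(string sourceValue)
    {
        string targetValue;
        return lookup.TryGetValue(sourceValue, out targetValue) ? targetValue : sourceValue;
    }
}
```

From the map you would then just call ReferenceDataCache.Instance.Map(...) from a scripting functoid or a helper class referenced by the map.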
Pros
- Fast access to the data
- Easy to implement in terms of the C# coding
Cons
- In most cases there is additional development of a database to manage the data. This then involves additional development, testing, deployment and management work
- The data is cached in the host process so you need to watch for the impact on the process memory of the BizTalk host
- If you access this reference data from BizTalk maps running in different hosts then you may end up with multiple instances of the cached data on each server
- By default your cache usually will not detect changes to the underlying data, however with additional coding you can monitor the data and refresh the cache when it changes
- In most cases the hosts need to be restarted to pick up changes
- The cache will not be cleared when the data is no longer used
Caching the Response from a Web Service
Sometimes I’ve seen an approach where a custom database has been implemented and then a web service façade built on top of it. The web service accesses the data and returns it. To consume this from BizTalk, a C# assembly is developed which uses the web service to get the reference data, which is then consumed by a map.
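As a rough illustration of this approach, the sketch below shows an ASMX-style façade over the reference data database. The service, connection string and table names are placeholders, and I’ve used the WebMethod CacheDuration setting as a simple example of configuring the caching on the service side rather than inside the BizTalk process.

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Web.Services;

// Hypothetical ASMX facade over the custom reference data database. CacheDuration
// caches the serialised response on the web server for five minutes, so the cached
// data lives outside the BizTalk host process.
public class ReferenceDataService : WebService
{
    [WebMethod(CacheDuration = 300)]
    public List<CodeMapping> GetCountryCodeMappings()
    {
        var mappings = new List<CodeMapping>();

        // Placeholder connection string and table name.
        using (var connection = new SqlConnection("server=.;database=ReferenceData;Integrated Security=SSPI"))
        using (var command = new SqlCommand("SELECT SourceValue, TargetValue FROM dbo.CountryCodeMap", connection))
        {
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    mappings.Add(new CodeMapping { SourceValue = reader.GetString(0), TargetValue = reader.GetString(1) });
                }
            }
        }

        return mappings;
    }
}

// Simple DTO returned by the service.
public class CodeMapping
{
    public string SourceValue;
    public string TargetValue;
}
```

The C# assembly consumed by the map would then call this service via a generated proxy; because the output caching happens on the web server, repeated calls from maps do not grow the BizTalk host’s memory.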
Pros
- The caching is outside of the BizTalk process
- The caching can be relatively easily configured
- If the web service is located on the BizTalk box then a local machine hop would be quicker than going remotely, and also with WCF you could optimise this further using appropriate channels
Cons
- There is a lot of additional development in this approach: custom coding of the web service and development of the database
- There is a lot of additional management and deployment effort for the database, virtual directories etc. for the web service
Using the HTTPCache
In this approach I’ve normally seen the implementation follow the same shape as the singleton approach above. The key difference is that in the singleton approach the reference data is usually held locally in a static hashtable, whereas in this approach the ASP.NET cache (the Cache object from the System.Web.Caching namespace) is used. This gives a couple of options around sliding and absolute expiration, which will remove unused data from the cache and help to control memory usage. You can also add one of the .NET cache dependency objects, which gives you a way to detect changes and refresh the cache.
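A minimal sketch of this approach is shown below. The class name and cache key are placeholders and the data access is stubbed out; the point is the use of HttpRuntime.Cache with a sliding expiration, plus the option to pass a cache dependency instead of null if you want to pick up changes.

```csharp
using System;
using System.Collections.Generic;
using System.Web;            // requires a reference to System.Web.dll
using System.Web.Caching;

// Hypothetical map helper that holds the reference data in the ASP.NET cache
// (System.Web.Caching.Cache, accessed via HttpRuntime.Cache) rather than a static
// hashtable. A sliding expiration lets unused data drop out of memory automatically.
public static class HttpCachedReferenceData
{
    private const string CacheKey = "CountryCodeMap";

    public static string Map(string sourceValue)
    {
        var lookup = HttpRuntime.Cache[CacheKey] as Dictionary<string, string>;
        if (lookup == null)
        {
            lookup = LoadFromDatabase();

            HttpRuntime.Cache.Insert(
                CacheKey,
                lookup,
                null,                          // or a CacheDependency/SqlCacheDependency to detect changes
                Cache.NoAbsoluteExpiration,
                TimeSpan.FromMinutes(20));     // sliding expiration: evicted if unused for 20 minutes
        }

        string targetValue;
        return lookup.TryGetValue(sourceValue, out targetValue) ? targetValue : sourceValue;
    }

    private static Dictionary<string, string> LoadFromDatabase()
    {
        // Placeholder for the data access against the custom reference data database.
        return new Dictionary<string, string>();
    }
}
```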
Pros
- Fast access to the data
- Relatively simple to implement change detection
- Ability to clear unused data from the cache
Cons
- Again there is usually a custom database for the reference data
- This is in-process caching so you need to be aware of the memory usage
Using Enterprise Library/Caching Application Block
Enterprise Library has a Caching Application Block which provides a number of features that could help you solve this problem. One of the key benefits of Enterprise Library is that it supports different types of store for the cached data, including:
- Null – means just stored in memory
- Database
- Isolated Storage
If I remember rightly the caching block supports similar features to the HttpCache approach, allowing you to have dependencies and also expirations. There is an article which discusses using Enterprise Library Caching in BizTalk at http://www.malgreve.net/2007/07/using-enterprise-library-in-biztalk.html.
Enterprise Library can also integrate with external backing stores to support out of process caching.
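The sketch below shows roughly how a map helper might use the caching block. It assumes a cache manager called "ReferenceDataCache" has been configured in the application configuration file (where the backing store is chosen), and the data access is stubbed out; the names are placeholders rather than anything prescriptive.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Practices.EnterpriseLibrary.Caching;
using Microsoft.Practices.EnterpriseLibrary.Caching.Expirations;

// Hypothetical map helper using the Caching Application Block. The backing store
// (null/in-memory, database or isolated storage) is chosen in configuration, not in code.
public static class EntLibReferenceData
{
    private const string CacheKey = "CountryCodeMap";

    public static string Map(string sourceValue)
    {
        // Assumes a cache manager named "ReferenceDataCache" is defined in the config file.
        var cacheManager = CacheFactory.GetCacheManager("ReferenceDataCache");

        var lookup = cacheManager.GetData(CacheKey) as Dictionary<string, string>;
        if (lookup == null)
        {
            lookup = LoadFromDatabase();

            cacheManager.Add(
                CacheKey,
                lookup,
                CacheItemPriority.Normal,
                null,                                        // optional refresh action
                new SlidingTime(TimeSpan.FromMinutes(20)));  // expiration, much like the HttpCache approach
        }

        string targetValue;
        return lookup.TryGetValue(sourceValue, out targetValue) ? targetValue : sourceValue;
    }

    private static Dictionary<string, string> LoadFromDatabase()
    {
        // Placeholder for the data access against the custom reference data database.
        return new Dictionary<string, string>();
    }
}
```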
Pros
- Ability to abstract the caching store from the consuming code
- Standard caching feature set
Cons
- Again there is usually a requirement for a custom data store for the reference data
- Enterprise Library usually requires a fair amount of configuration to set up and manage
- Most commonly the data is cached in process, so you need to be aware of memory usage
Out of Process Caching
One approach I quite like involves caching the data outside of the BizTalk process. This provides the benefit that you can cache without having to worry about the impact on the BizTalk process memory usage. There are a number of caching tools which you can use to help here such as:
- NCache – http://www.alachisoft.com/ncache/
Alachisoft offer a free express version of their caching product, and a version for a relatively small cost which comes with some management tools for their distributed caching system.
- Memcached – http://www.danga.com/memcached/
Memcached is an open source distributed caching system. I know of some guys who have used this very successfully on a .net project with a major UK company.
- Velocity – http://www.microsoft.com/downloads/details.aspx?FamilyId=B24C3708-EEFF-4055-A867-19B5851E7CD2&displaylang=en
Velocity is a current Microsoft initiative to create a distributed in-memory caching platform. As this evolves it is worth keeping an eye on, as in the future it is likely to become the best approach to this problem.
These distributed caching systems take the memory usage out of your process while still offering fast access to the data via their APIs. Most of these products also offer high availability and synchronisation across a group of caches when you distribute them across your server group. I have in particular looked at NCache for this example; it is set up as a Windows service which you would deploy on each BizTalk box. These services would then be configured to work as a cluster, meaning they synchronise themselves when changes are made.
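Whichever product you go with, from the map’s point of view the pattern is much the same: a thin helper hides the distributed cache client behind a simple lookup method. The sketch below is deliberately provider agnostic; IDistributedCache and its methods are hypothetical stand-ins for the client API of whichever product you choose, not any specific product’s API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical abstraction over the client API of an out of process cache
// (NCache, memcached, Velocity etc.). A real implementation would delegate to
// the chosen product's client library.
public interface IDistributedCache
{
    object Get(string key);
    void Insert(string key, object value, TimeSpan slidingExpiration);
}

// Map helper: the cached data lives in the distributed cache service, not in the
// BizTalk host process, so host memory usage is unaffected by the size of the cache.
public static class DistributedReferenceData
{
    private const string CacheKey = "CountryCodeMap";
    private static IDistributedCache cache;  // initialised at host start-up with the chosen provider

    public static void Initialise(IDistributedCache distributedCache)
    {
        cache = distributedCache;
    }

    public static string Map(string sourceValue)
    {
        var lookup = cache.Get(CacheKey) as Dictionary<string, string>;
        if (lookup == null)
        {
            lookup = LoadFromDatabase(); // placeholder data access against the reference data store
            cache.Insert(CacheKey, lookup, TimeSpan.FromMinutes(20));
        }

        string targetValue;
        return lookup.TryGetValue(sourceValue, out targetValue) ? targetValue : sourceValue;
    }

    private static Dictionary<string, string> LoadFromDatabase()
    {
        return new Dictionary<string, string>();
    }
}
```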
Pros
- Out of process caching still offers fast access to the cached data, but removes the likelihood that the cache will affect BizTalk performance
- These caches are designed for high performance; NCache, for example, is aimed at high-performance customer-facing ASP.NET applications
- They can be integrated with caching frameworks such as Enterprise Library (NCache comes with this out of the box)
- NCache supports cache dependencies and expiration
Cons
- Again this requires development and management of additional components. I think NCache (the paid version) offers better management and operations capabilities
- Potentially brings up the 3rd party or open source debate around which cache system to use
Summary
Hopefully this article has highlighted the many options available when you are considering a caching solution to support your BizTalk implementation. As with most design decisions there isn’t a one-size-fits-all rule. One thing that stands out from this discussion is that most of the approaches above end up using a custom database to manage the reference data. In a future post I will look at how to combine some of the approaches discussed here with the BizTalk Cross Referencing features to produce a fairly simple yet effective combination of them all.