I recently went through a really nasty bout of troubleshooting with the client I currently
work with, related to MSMQT.  Hopefully, my tale can save you similar pain.

The core issue was this: The BizTalk MSMQT adapter can be configured during installation
to integrate with Active Directory.  The default is that it will not operate
in this fashion, but rather in “workgroup” mode.  There are (at least) two reasons
why you might want to have MSMQT integrate with Active Directory: 1) you want to make
use of an MSMQ router in your environment or 2) you want to use certificate-based
authentication at a protocol level (where the public certificate is managed by AD.)  (Note:
I know this now; I didn’t know it a couple weeks ago…)

We have been installing our servers in “workgroup” mode.  To install in Active
Directory mode requires a special permission granted by the domain administrator.

Now, when you configure a Send Port within BizTalk and select MSMQT as the transport,
the property pages in the BizTalk Explorer offer a checkbox that is labeled “Use MSMQ
Authentication”.  If you hit the “Help” button on this dialog, the explanation
that is provided is this: “Identify whether BizTalk Message Queuing uses protocol
authentication every time it sends a message on this port.” 

As it turns out, although it isn’t documented as such, a Send Port with this
option checked can only work if MSMQT has been installed in Active Directory-integrated
mode.  If you have the “Use MSMQ Authentication” option checked on a Send Port
and you are not in Active Directory-integrated mode, then messages will not flow. 
When we eventually discovered this discrepancy and fixed our binding files, the problem
was resolved.  (Note: there is a similar option when configuring Receive Locations.)

This checkbox had been checked at the point our initial binding files were exported,
and became a part of our scripted deployment.  What was worse, when we encountered
this problem a few weeks ago in QA, we began troubleshooting the BizTalk configuration
on the server directly and wound up “fixing” the problem by creating an additional
Send Port (subscribing to the same traffic as the original) that simply had the MSMQ
Auth checkbox off.  But we didn’t realize that discrepancy at the time, so we
had to troubleshoot the same problem all over again a few weeks later.  We definitely
got ourselves into the wrong troubleshooting mindset by assuming that BizTalk was
flaky in some way.

Key lesson: If you don’t get into a given environment (QA, production, whatever) with
your scripted deployment, then you really didn’t get there at all….

A few more notes.  As I said above, if you have the “Use MSMQ Authentication”
option checked on a Send Port and you are not in Active Directory-integrated mode,
then messages will not flow.  What you will see is:

  • Messages will appear in the HAT “Queries-Messages Sent in Past Day” report, but they
    will not actually have arrived in the destination queue.  (Fixed in SP1?)
  • You will see strange behavior in the HAT “Operations-Messages” view, but nothing that
    indicates an error condition.  Retry count will increment on the original service
    instance.
  • There will be no error condition reported in the event log.  (OK, Premier Support
    indicates in a phone conversation you might see something after 5 days have elapsed,
    when an exponential backoff algorithm has run its course.)

IMHO, BizTalk 2004 should be more serviceable in this regard, and should give better
error information.  And of course, the documentation for MSMQT Send Port configuration
should have mentioned that MSMQ Authentication only works in Active Directory-integrated mode.

Microsoft Premier Support became involved, and after around 18 hours of analysis they
said “We see some certificate-related errors in the traces.  Do you use MSMQ
authentication?  Are you AD-integrated?”

We looked in our binding files (since the decision had long since been forgotten)
and saw this snippet:


<TransportType Name="MSMQT" Capabilities="16495"
ConfigurationClsid="9a7b0162-2cd5-4f61-b7eb-c40a3442a5f8"/>
<TransportTypeData>&lt;CustomProps&gt;&lt;Authenticated vt="11"&gt;-1&lt;/Authenticated&gt;&lt;/CustomProps&gt;</TransportTypeData>
<RetryCount>3</RetryCount>
<RetryInterval>5</RetryInterval>

See that in the escaped XML?  Yup, that is a property called “Authenticated”
that is an old-fashioned Variant of type bool, where “-1” means “true”.
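
If you would rather not eyeball escaped XML, here is a minimal Python sketch of the same check. It is only an illustration: the `<Port>` wrapper element and the function name `msmq_auth_enabled` are my own inventions so that the fragment parses on its own, and a real binding file nests these elements inside a larger document. The idea is simply to parse the outer XML, then parse the text of `TransportTypeData` a second time, since it is itself an XML document that was escaped.

```python
import xml.etree.ElementTree as ET

# Fragment from the binding file, wrapped in a hypothetical <Port> root
# element so that it is a well-formed document by itself.
SNIPPET = """<Port>
<TransportType Name="MSMQT" Capabilities="16495"
    ConfigurationClsid="9a7b0162-2cd5-4f61-b7eb-c40a3442a5f8"/>
<TransportTypeData>&lt;CustomProps&gt;&lt;Authenticated vt="11"&gt;-1&lt;/Authenticated&gt;&lt;/CustomProps&gt;</TransportTypeData>
<RetryCount>3</RetryCount>
<RetryInterval>5</RetryInterval>
</Port>"""


def msmq_auth_enabled(port_xml):
    """Return True if the escaped CustomProps carry Authenticated = -1."""
    port = ET.fromstring(port_xml)
    # The parser unescapes &lt; and &gt;, so this text is itself XML.
    escaped = port.findtext(".//TransportTypeData") or ""
    if not escaped.strip():
        return False
    props = ET.fromstring(escaped)
    value = props.findtext("Authenticated")
    # -1 is the Variant bool encoding of "true".
    return value is not None and value.strip() == "-1"


print(msmq_auth_enabled(SNIPPET))
```

For the snippet above this reports that authentication is switched on, which is exactly the setting that cannot work in workgroup mode.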

Leaps out at you, right?  Determining if you are AD-integrated means looking
at this registry location:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\BTSSvc.3.0\MessageQueuing\MsmqtWorkgroupMode
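
For scripted checks, a rough Python sketch of reading that location might look like the following. Two assumptions on my part: that the last path segment (`MsmqtWorkgroupMode`) is a value name under the `MessageQueuing` key rather than a key of its own, and that the script runs on the BizTalk server itself. The helper names are hypothetical, and the sketch deliberately returns the raw value without interpreting it, since I have not confirmed which number means what.

```python
REGISTRY_PATH = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services"
    r"\BTSSvc.3.0\MessageQueuing\MsmqtWorkgroupMode"
)


def split_registry_path(path):
    """Split 'HIVE\\sub\\key\\ValueName' into (hive, subkey, value name)."""
    hive, _, rest = path.partition("\\")
    subkey, _, value_name = rest.rpartition("\\")
    return hive, subkey, value_name


def read_workgroup_mode(path=REGISTRY_PATH):
    """Return the raw registry value, or None if it is absent (Windows only)."""
    import winreg  # imported lazily: this module exists only on Windows

    hive_name, subkey, value_name = split_registry_path(path)
    hive = getattr(winreg, hive_name)  # e.g. winreg.HKEY_LOCAL_MACHINE
    try:
        with winreg.OpenKey(hive, subkey) as key:
            value, _value_type = winreg.QueryValueEx(key, value_name)
            return value
    except FileNotFoundError:
        # Key or value not present on this machine.
        return None
```

Calling `read_workgroup_mode()` on the server and logging the result alongside each deployment would at least leave a record of which mode a given machine was installed in.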

From all of this I (gently) conclude that the product instrumentation/tracing should
point out this condition more quickly to a support engineer.  In addition, the
MSMQT adapter should warn you of a mismatch during configuration with the BizTalk
Explorer and, ideally, when you deploy/import bindings.

Hindsight being 20/20, the support engineer should have asked to see our binding file
– and should have compared it with one exported from a server that was indeed sending
messages (since we had one).  Of course, we should have made such a comparison,
too (and much earlier…)!  The engineer did look at the BizTalk Admin Console,
but of course that doesn’t give any of the detailed port configuration information
– only Visual Studio/BT Explorer does.

Having said that, the support engineers were great to work with and were certainly
dedicated to getting to the bottom of our issue.

Key lesson: Diffing binding files will prove to be a key troubleshooting technique
with BizTalk…
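
Any diff tool will do for this, but as a minimal sketch, here is how the comparison could be scripted with Python's standard `difflib`. The two inline strings stand in for binding files exported from a working and a failing server. The `0` in the working one is my assumption about how an unchecked box is written out, and `changed_lines` is just an illustrative helper name.

```python
import difflib

# Stand-ins for two exported binding files (one line differs).
WORKING = (
    '<TransportTypeData>&lt;CustomProps&gt;&lt;Authenticated vt="11"&gt;'
    "0&lt;/Authenticated&gt;&lt;/CustomProps&gt;</TransportTypeData>\n"
    "<RetryCount>3</RetryCount>"
)
FAILING = (
    '<TransportTypeData>&lt;CustomProps&gt;&lt;Authenticated vt="11"&gt;'
    "-1&lt;/Authenticated&gt;&lt;/CustomProps&gt;</TransportTypeData>\n"
    "<RetryCount>3</RetryCount>"
)


def changed_lines(working_xml, failing_xml):
    """Return only the removed (-) and added (+) lines of a unified diff."""
    diff = list(
        difflib.unified_diff(
            working_xml.splitlines(),
            failing_xml.splitlines(),
            fromfile="working",
            tofile="failing",
            lineterm="",
            n=0,  # no context lines, just the differences
        )
    )
    # The first two entries are the ---/+++ file headers; skip them,
    # then drop the @@ hunk markers.
    return [line for line in diff[2:] if line[:1] in ("+", "-")]


for line in changed_lines(WORKING, FAILING):
    print(line)
```

With real files you would read each one into a string first. Because the adapter settings are stored as one escaped line, the diff flags the whole `TransportTypeData` line, and the differing `Authenticated` value is then easy to spot.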