As the below quote says,
Quality in a service or product is not what you put into it. It is what the client or customer gets out of it – Peter Drucker
We, the product support team at BizTalk360, always put in our best efforts to solve the customer problems to make them feel satisfied. Our team often gets different varieties of problems, some related to functionality, performance, and data related issues and we make sure the issue is resolved within proper timelines. Recently there was a case with the customer that they were facing exception in Data Monitoring. In this blog, I am going to share my experience on how we resolved this problem.
The Message Box Data Monitoring:
How many times in a day does the support person have to watch for suspended instances in a particular application and take an appropriate action, or look out for ESB exceptions with a particular fault code? Wouldn’t it be nice if there was a way to set up monitoring on a particular data filter, get notified when there is a violation, and take actions automatically depending on the actual situation? Yes, that’s exactly what BizTalk360 achieves through the concept of data monitoring, which was a result of a customer feedback.
In MessageBox Data Monitoring, we can identify the number of suspended instances, the messages flowing and take appropriate action, either to suspend, resume or terminate the instances. This can be done from BizTalk360 itself, given the user has appropriate permissions. In my previous blog, I have explained about the permissions to be given for the user to resume/suspend/terminate service instances. Let’s look into the customer’s case.
The support ticket was raised for the case that the customer has configured auto termination functionality for the suspended instances. The messages were neither getting archived nor terminated. But the Data Monitoring dashboard was showing the details of the successful run.
Our investigation starts:
For a support ticket, with respect to Data Monitoring section, the first thing that we would check is for the alarm configuration details and then the logs for an exception. So, we started the investigation by checking the alarm details and they seemed to be fine. But the exception was captured in the Data Monitoring dashboard when the details of the task action were checked for.
System.Data.SqlClient.SqlException (0x80131904): Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding. —> System.ComponentModel.Win32Exception (0x80004005): The wait operation timed out at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
The real cause for the Timeout exception:
There was an additional information from the customer that they were not able to delete the existing MessageBox data alert from BizTalk360, even though the BT360 service account was a superuser. The same timeout exception was displayed while deleting the alarm also. The suspended instances were getting terminated from BizTalk admin console without any issues. Since this was an issue in the production environment, we immediately went on a call with the customer to probe further on the case.
There were four BizTalk360 servers configured in High Availability mode in the production environment, out of which 3 were passive and one was active. We checked the status of the monitoring sub-services and found that Purging was not getting updated properly.
The timeout exception usually happens when there is a large volume of data. Checking all the permissions and configurations, everything was fine. The next step was to check for the data in the BizTalk360 database. But from where does the large volume of data come from?
BizTalk360 communication with other BizTalk databases:
It’s a well-known fact that BizTalk360 is a one-stop monitoring solution to monitor BizTalk server. So, for monitoring the BizTalk artefacts and the messages flowing through the receive and send ports, BizTalk360 polls for the data from the BizTalk databases namely BizTalkDTADb and BizTalkMsgBoxDb and inserts the required data into the BizTalk360 database as per the alarm configurations. For MessageBox Data Monitoring, the data from BizTalkMsgBoxDb is fetched. If there is any action (resume/terminate) is configured for the suspended instances in Data Monitoring, then the data is inserted into the following tables in BizTalk360 database.
When we checked the number of records in the b360_st_DataMonitorTaskActionResults table, the select query was just spinning and was taking lot of time to load the results. This was due to the reason that there were 8 million records in that table. And obviously, this was cause for the timeout exception in BizTalk360.
Purging in BizTalk360:
BizTalk360 comes out of the box with the ability to set purging duration and the background monitoring service has the capability to purge older data automatically after the specified period. The Administrators/Superusers can set up the “Purge duration” under “Settings”. This will control the database growth and hence the performance of BizTalk360 will not get affected. The default purging settings in BizTalk360 can be seen in the below screenshot.
We can see that the purging duration for Data Monitoring is 2 Months. Hence the historical data for 2 months will be present in BizTalk360 database.
Purging needs to be done to remove the historical data, thereby making the database healthy. BizTalk360 purges the data by running the stored procedure in the specified duration specified in the settings. The purging settings can be altered by the customers according to their business needs and data flow. If a large volume of data flows through the ports, they can set the purge duration to a minimum value so that data growth is controlled.
We recommended the customer to decrease the purge duration to 1 month and the observe the Data Monitoring. After modifying the purge duration, the MessageBox Data Monitoring started working as expected. The suspended instances were getting terminated as per the alarm configuration and there were no timeouts happening.
The best practice to follow in case of timeout exception:
Whenever a timeout exception occurs, the first thing to be checked is the standard database reports in the relevant databases. This will ensure which table occupies more space. Then we can act on the purging policy and change it according to the business needs and data flow.