This post was originally published here
The term purge means removing or erase something completely. Data purging is a process or activity mostly heard with Database technologies, i.e. Cleaning up of Database.
There are many different techniques and strategies for data purging, which is often contrasted with data deletion. Deletion is more of a temporary process whereas Purging removes data permanently which in turn frees up the storage and/or memory space for other uses.
Purging process lets you archive the data even though it removed from the main source permanently, giving us an option to retrieve from the archive in case there is a need for it. The deleting process also removes data permanently but doesn’t necessarily involve keeping a backup and generally involves on insignificant amounts of data.
Why Data Purging is Necessary?
For any organization and in any environment, it is important to keep track of database growth and execute the data purging policies whenever necessary.
When the database size grows, the SQL server needs more memory and CPU to read data from the tables, which slows down the database operation.
Besides that, there is also the risk that, due to the growing database(s), you are running out of disk space.
Database Administrators spend a lot of their time dealing with the problem of database processes consuming too much disk space. So, it’s very important to monitor the database size to ensure the database is not seizing the memory and CPU for smooth operation. Also, it’s essential and necessary to automate the purging process.
The Purge Process
The purge process moves data between and deletes data from three categories of data or data sets:
- The current data set, which contains data that needs to be available to users. Users can change or review the data.
- The history data set, which contains data that can only be reviewed or aggregated by users. Users can’t change the data.
- The archived data set, which contains data that falls out of the availability threshold but needs to be stored to meet legal or business requirements. This data is archived, so it can’t be changed, reviewed, or aggregated by users.
Available Options for Purging
In any BizTalk environment, to purge the SQL database there are two main categories of options available. The most common way is
- SQL Server Jobs (by using SQL Agent)
- Windows Services
SQL Server Jobs
The purpose of the SQL Agent is to serve as a job scheduler. Many experienced DBAs use jobs running inside the SQL Agent to perform routine tasks such as backups, updating statistics, and rebuilding indexes as needed.
Scheduling jobs is one of the core SQL Server functions. Many businesses have numerous SQL Server jobs scheduled that perform any number of different tasks from database maintenance jobs like backup and index rebuilds to running queries and kicking off ETL (Extract. Transform and Load) tasks.
What is SQL Agent?
The SQL Agent is the SQL Server subsystem that’s responsible for automatic task scheduling. The SQL Agent is available in all of the editions of SQL Server except the SQL Server Express edition. Taking advantage of SQL Agent enables you to automate many of your routine IT database infrastructure tasks.
Advantages of SQL Agent
The SQL Server Agent gives you a more specific focus around scheduling specific types of jobs around SQL Server. As the Agent is a part of the cluster resource when a failover occurs as long as the cluster has been set up correctly the SQL Agent job will start once the SQL Server Service has started.
Windows task Scheduler is a component of Microsoft Windows that provides the ability to schedule the launch of programs or scripts at pre-defined times or after specified time intervals. SQL Server Scheduler is created for Agent job scheduling, so using SQL Server Scheduler to schedule the job is the best method.
Disadvantages of SQL Agent
If the Agent is not running at the time a job is supposed to run, when the Agent is started it will run the job at the next scheduled time. It will not automatically run the job.
SQL Server Agent jobs that are running at the time of a failover event on a SQL Server failover cluster instance do not resume after failover to another failover cluster node. Jobs that begin but fail to complete because of a failover event are logged as started but do not show additional log entries for completion or failure. SQL Server Agent jobs in these scenarios appear to have never ended. So if your server failover, you need to run this job again in the other node.
Windows Services
Definition of Windows Services
Windows Services is a core component of the Microsoft Windows operating system and enables the creation and management of long-running processes.
Advantages
Unlike regular software that is launched by the end-user and only runs when the user is logged on, Windows Services can start without user intervention and may continue to run long after the user has logged off. The services run in the background and will usually kick in when the machine is booted. An option ideal for use on servers when long-running functionality is needed without interference with other users on the same system.
Data Collection in BizTalk360
BizTalk360 consists own database. Collecting the necessary data is important to be able to represent the data in BizTalk360. As many of you aware that BizTalk360 is a one-stop solution for BizTalk monitoring. So for monitoring and analytics purposes, we do collect different data from various environments (it collects the information from BizTalk, SQL Server, and Tracking database) and stored in the BizTalk360’s database for future references.
- Advanced Event viewer
- Analytics
They are the sparkling examples for the data collection within BizTalk360 when you have configured/added more environments (for each environment there will be multiple machines say example a simple multi-server environment consists of 2-BizTalk servers for HA, 1-SQL server. For cluster servers and Load balancing you can imagine the numbers). However, you don’t want unlimited growth of your database.
BizTalk360 comes out of the box with the “Data Purging” feature to able to manage the size of the BizTalk360 database.
Monitor BizTalk360 Database Growth
There might be a situation where the BizTalk360 database grows due to the amount and size of event log data and performance counter data. The BizTalk360 purging policy will take care of the data growth, but it is also important to monitor the size regularly to ensure BizTalk360 is working seamlessly.
Data Purging in BizTalk360
As mentioned, BizTalk360 comes out of the box with the ability to purge the historic data by setting up the purging duration (days to be monitor) after the specified period. The Administrators/Superusers can set up the “Purge duration” under “Settings -> BizTalk360 Health -> Data Purging”. This will control the database growth and hence the performance of BizTalk360 will not get affected.
Purge Mechanism in BizTalk360
BizTalk360 uses Windows Services to trigger the purging. BizTalk360 owns two services,
- Monitoring service
- Analytics service
Out of those, purging was created as a sub-service of Monitoring service.
In a regular interval time (for every 2 hours) the “Purge Data – a sub-service of Monitoring service” will purge the data by hitting the Stored Procedures for purging available in the BizTalk360 database.
We have chosen the windows service to utilize the advantages of Sub-service and the services that can be started and paused. As most of you aware that the BizTalk360 services should be always running to collect the data and perform few operations, so we would like to effectively use the service rather than creating SQL job and monitor separately. Also, if any exception appears for the sub-service, the exception will be captured from the log files and it will be shown adjacent to the respective sub-service in the UI.
Based on the values set in the data purge duration, for every 2 hours, the data will be purged using the stored procedures with the trigger of Purge Data sub-service.
Until BizTalk360 v5.0, there was a single stored procedure (sp_b360_admin_PurgePolicies) called from our monitoring service which takes care of all the purging. In v6.0, we improved the logic by splitting it into multiple stored procedures (sp_b360_admin_PurgePolicies) which takes care of various parts of data purging.
How to Check if Purging Service Status is Healthy?
Purge Data – a sub-service of Monitoring service from BizTalk360 Settings -> BizTalk360 Health -> Monitoring Service Status -> Purge Data, will help you to understand and find the information that the purging is running for every 2 hours.
If you can compare the started time and Last execution time, for every 2 hours the execution should happen. Even if there is an exception, the exception will be captured in the log files and it will be shown adjacent to the purge data sub-service.
Additional Capabilities
Along with the purging service within BizTalk360, there are few more options to check if the database size is under the expected size.
- Using the Database size monitoring feature, users can monitor the database and log file size of BizTalk and BizTalk360 databases, by simply configuring the error and warning threshold values for the database and log file sizes. By this when the database size exceeds the expected size, you will receive an alert from BizTalk360.
- Use Database Query monitoring to Monitor the size of a database.
Whenever the database grows beyond expectations, the first thing to be checked is the standard database reports. This will ensure which table occupies more space. Then we can act on the purging policy and change it according to the business needs and data flow.
What are the Stored Procedures Involved in Purging?
Till the latest release, there were 14 Stored Procedures involved in Purging for 32 tables. Each stored procedure will purge multiple tables. Let’s see what the stored procedures are used for purging the corresponding tables.
- sp_b360_admin_PurgePolicies_MBVReports – It will purge the BizTalk Health Check tool execution results.
- b360_mbv_Executions
- sp_b360_admin_PurgePolicies_GovernanceAudit – It will purge the complete Governance and Audit tables, which is related to BizTalk and BizTalk360 activities.
- b360_governance_Application
- b360_governance_HostInstance
- b360_governance_ServiceInstance
- b360_governance_rules
- b360_governance_Server
- b360_governance_ESBException
- b360_audit_History
- sp_b360_admin_PurgePolicies_AlertNotification – Will purge the Alert notifications which is sent out from BizTalk360 like Email, SMS and Notification channel.
- b360_alert_History_Email
- b360_alert_History_SMS
- b360_alert_History_System
- b360_historical_Alert_Maintenance
- sp_b360_admin_PurgePolicies_EventLog – Will purge the data of event logs that are already collected from different machines.
- b360_ev_EventLogData
- sp_b360_admin_PurgePolicies_Throttling – Will purge the data collection for Throttling analyzer
- b360_td_Throttling
- b360_td_ThrottlingDataCollectionError
- sp_b360_admin_PurgePolicies_MonitoringDashboard – The alarm status update of the monitoring dashboard will be purged
- b360_alert_MonitorExecution
- sp_b360_admin_PurgePolicies_MonitoringLogState – will purge the monitoring log state, monitoring will put the entries of current and expected state.
- b360_alert_Monitor_LogStateCheck
- sp_b360_admin_PurgePolicies_MonitoringLogThreshold – will purge the monitoring threshold log, monitoring will put the current, warning and error levels.
- b360_alert_Monitor_LogThresholdCheck
- sp_b360_admin_PurgePolicies_ProcessMonitoring – will purge the data of all types of data monitoring like Process, Message box data, Tracking data, BAM, EDI, ESB, Logic Apps, and Event log data monitoring.
- b360_st_ScheduleActivity
- b360_st_ProcessMonitorResults
- b360_st_DataMonitorResults
- b360_st_DataMonitorTaskActionResults
- sp_b360_admin_PurgePolicies_PerformanceAnalyzer – The data collected for the Analytics module will be purged
- b360_perf_PerfCounter_Data
- b360_perf_PerfCounter_Expanded
- sp_b360_admin_PurgePolicies_LiveFeed – Live feed data will be purged
- b360_broadcast_Message
- sp_b360_admin_PurgePolicies_TrackingData – The data collected for the Analytics module will be purged.
- b360_dta_Counter_Data
- b360_dta_Counters_Expanded
- sp_b360_admin_PurgePolicies_DBCache –
- b360_admin_Cache
- sp_b360_admin_PurgePolicies_BizTalkReporting – The data collected for the Analytics – Reporting module will be purged.
- b360_BizTalkDB_DiskUsages
- b360_BizTalkDB_Tbl_sizes
- b360_Report_History_Email
- b360_report_ReportArchive
Purge Settings in BizTalk360
Let’s see how purge helps. The default purging settings in BizTalk360 can be seen in the below screenshot;
Based on the values/duration provided by the purging policy, the data will be purged. We can see that the purging duration for Data Monitoring is 2 months. Hence the historical data for 2 months will be present in the BizTalk360 database.
Purging needs to be done to remove the historical data, thereby making the database healthy. BizTalk360 purges the data by running the stored procedure in the specified duration specified in the settings. The purging settings can be altered by the customers according to their business needs and data flow. If a large volume of data flows through the ports, they can set the purge duration to a minimum value so that data growth is controlled.
The customer configured 3 environments in a single installation of BizTalk360. In each environment, there were 5 servers available due to BizTalk and SQL server. Hence there were 15 servers to collect data for Advanced Event Viewer and they have set-up 30days to monitor. The Database has grown almost to 300 GB because of the number of servers to monitor.
We have recommended the customer to decrease the purge duration from 1 month to 2days and TRUNCATE the particular table (Since the table belongs to BizTalk360 and it is historical data, it won’t affect the BizTalk server). After modifying the purge duration, the data were reduced to 2 GB started working as expected.
Can the Purging be done Manually?
BizTalk360 will purge the data automatically. Sometimes the data collection may be out/beyond our expectation and that leads to a huge volume of data.
Our first suggestion is to reduce the number of purge days. So that the data will be reduced.
Until v6.0, the stored procedure sp_b360_admin_PurgePolicies was used and in recent versions of BizTalk360, you can use the stored procedure sp_b360_admin_CleanupDB to manually purge the database in case of large database growth.
When the number of records in the database grows quickly, call to the stored procedure(s) can start to get timed out from the monitoring services. This will result in data not being cleared and resulting in unusual database growth.
Conclusion
Considering the feedback provided by our customers we have improved and fine-tuned the collection of the data into the BizTalk360 database.
Whenever the database grows beyond expectations, the first thing to be checked is the standard database reports. This will ensure which table occupies more space. Then we can act on the purging policy and change it according to the business needs and data flow.
If you have any questions, contact us at [email protected]. Also, feel free to leave your feedback in our forum.
The post Data Purging in BizTalk Server and BizTalk360 appeared first on BizTalk360.