Lately, a lot of CTPs have become available on the Windows Azure platform. My goal is to try them all out. I have already blogged about two of these technologies:
- Using Windows Azure VM Role
- Azure connect – VPN as a service
This time, we're moving to the data side with SQL Azure Data Sync. This data synchronization service is built on the Microsoft Sync Framework. It provides bi-directional data synchronization and data management capabilities, allowing data to be easily shared across SQL Azure databases within multiple data centers.
A very good and detailed overview on Data Sync can be found on the TechNet Wiki: http://social.technet.microsoft.com/wiki/contents/articles/sql-azure-data-sync-overview.aspx
Scenarios
The SQL Azure Data Sync service is a very promising and interesting feature that opens up a lot of great scenarios:
- For development/testing purposes: maintain data in your local development database and synchronize it with the SQL Azure database to feed the cloud instances.
- Synchronize configuration or master data across databases in a very easy way.
- Use SQL Azure Reporting Services on ’local’ data.
Registration for CTP
Since this feature is still in CTP, you first need to register to get an invitation code. To do so, just browse to http://datasync.azure.com and sign in with your Live ID. You can then enter your e-mail address and other details, and you should receive an invitation code some time later. With that invitation code, you can log on to the full feature set.
Concepts
The following concepts are important in setting up Data Sync:
- Sync group: a group of databases that are synchronized together
- Database: a database that is registered for SQL Azure Data Sync
- Agent: a Windows service that performs and orchestrates the actual synchronization
Configuring data synchronization
Step 1: download the Sync Agent
A Sync Agent is only required when synchronizing data from an on-premises database. For cloud-to-cloud synchronization, downloading the Agent is not needed, since the synchronization logic runs in the cloud.
For this example, I will synchronize a local database with a SQL Azure database.
On the Data Sync portal, click the Agents tab and download the Agent installer at the bottom of the page. After running the installer (and providing credentials for the Data Sync Windows service), you can open the local configuration tool through the Start menu: SQL Azure Data Sync Agent CTP2.
This tool allows you to configure local databases for synchronization. The following screenshot shows the default configuration window after installation. To join this agent to a Windows Azure subscription, you first need to configure the Agent Key. This can be done by clicking the Edit Agent Key button and providing the key that can be copied from the Data Sync portal.
Once this is configured, it should be possible to ping the Azure Sync Service by clicking the Ping Sync Service button.
Step 2: Add your local database
Now that the local agent is installed, we will add a local (on-premises) database to the configuration tool, so that it can be synchronized later.
In this example, I am using a custom database with 3 tables: Customer, OrderLines, OrderEvents.
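The exact schema is not important for the rest of the walkthrough, but a minimal sketch of what such a demo database could look like is shown below (the column definitions are assumptions for illustration; only the table names come from the actual example):

```sql
-- Minimal sketch of the demo schema (column definitions are assumptions;
-- only the table names match the example in this post).
CREATE TABLE Customer
(
    CustomerId INT NOT NULL PRIMARY KEY,
    Name       NVARCHAR(100) NOT NULL
);

CREATE TABLE OrderLines
(
    OrderLineId INT NOT NULL PRIMARY KEY,
    CustomerId  INT NOT NULL REFERENCES Customer (CustomerId),
    Product     NVARCHAR(100) NOT NULL,
    Quantity    INT NOT NULL
);

CREATE TABLE OrderEvents
(
    OrderEventId INT NOT NULL PRIMARY KEY,
    OrderLineId  INT NOT NULL REFERENCES OrderLines (OrderLineId),
    EventType    NVARCHAR(50) NOT NULL,
    EventDate    DATETIME NOT NULL DEFAULT GETDATE()
);
```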
- I will add this database in the SQL Azure Data Sync Agent tool by clicking the ’Add Member’ button on the left. This pops up a configuration window where the server, the database and the authentication method need to be selected. (1)
- The Data Sync service has some limitations on supported data types. It is possible to check the database for issues by clicking the ’Check Member Schema’ button. (2)
- I added a column with an unsupported data type (geography) to the Customer table, and the schema validation raised a warning indicating that this column would be skipped. (3) This is because the geography type is not supported on SQL Azure at this point in time. (A sketch of this change follows after this list.)
- It is very important to start the Windows service of the SQL Azure Data Sync Agent, in order to register the client database with your Windows Azure subscription.
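The change that triggered the schema check warning was simply adding a spatial column to the Customer table; a sketch of it (Location is the column name that shows up again later during synchronization):

```sql
-- Adding a column with a data type that Data Sync does not support (geography):
-- the schema check flags it and the column is skipped during synchronization.
ALTER TABLE Customer ADD Location geography NULL;
```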
Step 3: Add the SQL Azure database
Adding a SQL Azure database to the Data Sync subscription is much easier and can be done on the Data Sync management portal, by clicking the Databases tab and then the Add button. In the pop-up, you just need to provide the server, the database and the credentials before saving.
In my example, I am just adding a new, empty database here.
Step 4: Set up the synchronization, through a Sync Group
- In the management portal, click the New Sync Group button on the Sync Groups tab, add all the databases you want to sync together to the database list (1) and click Next.
- On the next screen, you can select, per database, the tables you want to synchronize. For this demo, I want to synchronize everything except the OrderEvents table. (2) You can also enable a schedule for the synchronization (expressed in minutes).
- Once the group is created, you can synchronize the group easily.
Testing the data synchronization
Now that the databases are configured for synchronization, it's time to play around with it a bit. To execute the synchronization, you can either rely on the synchronization schedule (when configured on the sync group) or trigger the synchronization manually.
1st synchronization: Creating tables in the cloud.
The first synchronization I did created the two synchronized tables on my empty cloud database and added the data there. One thing to note is that the Location column (with the spatial data type) was not created on the SQL Azure database.
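One way to verify this on the SQL Azure database is to query the catalog views; a sketch, assuming the table names from the example:

```sql
-- Check which tables and columns ended up on the SQL Azure database after
-- the first synchronization. The Location (geography) column should not appear.
SELECT t.name AS TableName, c.name AS ColumnName, ty.name AS DataType
FROM sys.tables t
JOIN sys.columns c ON c.object_id = t.object_id
JOIN sys.types ty ON ty.user_type_id = c.user_type_id
WHERE t.name IN ('Customer', 'OrderLines')
ORDER BY t.name, c.column_id;
```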
2nd synchronization: Adding data locally, synchronizing to the cloud.
In this test, I added two customers and some order lines to the local database and synchronized with the cloud database; everything was copied without problems.
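As an illustration, the local inserts for this test could look something like this (the values are made up, based on the hypothetical schema sketched earlier):

```sql
-- Hypothetical data for the second test: two customers and some order lines,
-- inserted in the local database and then synchronized to the cloud.
INSERT INTO Customer (CustomerId, Name)
VALUES (1, N'Contoso'), (2, N'Fabrikam');

INSERT INTO OrderLines (OrderLineId, CustomerId, Product, Quantity)
VALUES (1, 1, N'Widget', 10),
       (2, 2, N'Gadget', 5);
```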
3rd synchronization: Adding data in the cloud, synchronizing locally.
In this test, I added a customer to the cloud database and synchronized with the on-premises database; again, everything was copied without problems.
4th synchronization: Adding data on premise and in the cloud, synchronizing bi-directionally
In this test, I added customers in both databases before synchronizing, and the data was synchronized correctly in both directions.
Data conflicts
Next, I wanted to simulate some conflicts to find out how Data Sync would handle them.
Adding or updating data in both databases, with the same primary key.
I added two different customers with the same primary key, one in each database (a sketch of the conflicting inserts follows after this list). Surprisingly, the synchronization ran without any issue, but my local customer was overwritten by the cloud customer, resulting in lost data.
- Both records (local and cloud) were marked as new, so it should be possible to detect this conflict.
- I wanted to find out whether the ’most recent’ record won, but in all scenarios the data from the cloud database survived the synchronization (probably because that database was added first).
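A sketch of the conflicting inserts (the customer names and key value are made up):

```sql
-- Same primary key, different data, inserted in both databases before syncing.

-- On the local (on-premises) database:
INSERT INTO Customer (CustomerId, Name) VALUES (10, N'Local Customer');

-- On the SQL Azure database:
INSERT INTO Customer (CustomerId, Name) VALUES (10, N'Cloud Customer');

-- After synchronization, both databases contained 'Cloud Customer';
-- the local row was silently overwritten.
```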
Maybe something for a next CTP?
Deleting data that is being referenced by a non-synchronized table
Another interesting test was to delete an order line from my cloud database while that order line was still referenced by a record in my local database (OrderEvents). Since the OrderEvents table is not being synchronized, this should result in a conflict.
I did not receive an exception, but I noticed that the record in my local database still existed, while the cloud record was deleted. So my data ended up out of sync.
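A sketch of this scenario, using the hypothetical schema from earlier (the key value is made up):

```sql
-- On the SQL Azure database: delete an order line that is still referenced by
-- a row in the local, non-synchronized OrderEvents table.
DELETE FROM OrderLines WHERE OrderLineId = 2;

-- After synchronization, the order line is gone in the cloud but still present
-- on premises (presumably blocked by the local foreign key from OrderEvents),
-- leaving the two databases out of sync.
SELECT * FROM OrderLines WHERE OrderLineId = 2;  -- still returns a row locally
```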
Maybe something for a next CTP?
Adding a third database
The last test I did was adding a new, empty (cloud) database to the sync group; after the synchronization, everything seemed to work automagically. Great!
Underlying design
- When a table is configured for synchronization, triggers are created for the three actions: insert, update and delete. They follow this naming structure: [TableName]_dss_[action]_trigger.
- These triggers record the changes in new tables that are created during the configuration of the sync group. For every data table, a sync table is added with the following name: DataSync.[TableName]_dss_tracking.
- Next to that, we can also see that a lot of new stored procedures are created (see the query sketch below).
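To have a look at these generated objects, you can query the catalog views on a member database; a sketch, assuming all generated objects carry the _dss_ infix (the exact naming may differ per CTP build):

```sql
-- List the objects that SQL Azure Data Sync generated in a member database:
-- triggers (TR), tracking tables (U) and stored procedures (P).
SELECT s.name AS SchemaName, o.name AS ObjectName, o.type_desc
FROM sys.objects o
JOIN sys.schemas s ON s.schema_id = o.schema_id
WHERE o.name LIKE '%[_]dss[_]%'
  AND o.type IN ('TR', 'U', 'P')
ORDER BY o.type_desc, o.name;
```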
Conclusion
This CTP already looks very stable and the synchronization (between multiple databases) runs very smoothly. I am just hoping there will be a better view on, and configuration of, synchronization conflicts (like the ones explained in the conflicts section).
Sam Vanhoutte, Codit