Azure Data Factory is one of those services in Azure that is really great but that doesn’t get the attention that it deserves.
It is a hybrid data integration service that allows you to create, manage & operate data pipelines in Azure. Basically, it is a serverless orchestrator that allows you to create data pipelines to move, transform and load data; a fully managed Extract, Transform, Load (ETL) & Extract, Load, Transform (ELT) service, if you will.
I’ve been using Data Factory a lot in the past year, and it makes it very easy to create & manage data flows in the cloud. It also comes with a wonderful monitoring experience that could serve as an example for other services where this would be beneficial, such as Azure Functions & Azure Event Grid.
However, Azure Data Factory was not perfect.
The drawbacks of Azure Data Factory
There were a couple of drawbacks & missing features when using the service:
- Only Supports Data Slicing – The only way to schedule your data pipeline was to run it every x minutes, hours or days and process the data in that time slice. You couldn’t trigger it on demand or in any other way.
- No Granular Scheduling Control – There was no granular control over when the pipeline should be triggered in terms of calendar scheduling, e.g. only running the pipeline during the weekend.
- Limited Operational Experience – Besides the Monitor portal, the monitoring experience was very limited. It only supported sending email notifications triggered under certain criteria, and it provided neither built-in metrics nor integration with Azure Monitor.
- JSON All The Things – The authoring experience was limited to writing everything in JSON. There was also support for Visual Studio, but even there you could only edit JSON files.
- Learning Curve – The learning curve for new people was pretty steep, primarily because everything was written in JSON. I think a code-free experience would make things a lot easier.
Last but not least, the most frightening factor was radio silence. And for a good reason…
Enter Azure Data Factory 2.0.
Azure Data Factory 2.0
During Ignite, Microsoft announced Azure Data Factory 2.0, which is now in public preview.
Azure Data Factory 2.0 takes data integration to the next level and comes with a variety of triggers, integration with SSIS on-prem and in Azure, integration with Azure Monitor, control flow branching and much more!
Let’s have a look at a couple of new features.
Introduction of Integration Runtime
A new addition is the concept of an Integration Runtime (IR). It represents the compute infrastructure that an Azure Data Factory pipeline uses to offer integration capabilities as close as possible to the data you need to integrate with.
Integration runtimes provide the capability to move data, execute SSIS packages and dispatch & monitor activities. They come in three different types: Hosted in Azure, Self-Hosted (either in the cloud or on-premises) and Azure-SSIS.
Here is an overview of how you can mix and match them.
Basically, the Azure Data Factory instance itself is only in charge of storing the metadata that describes what your data pipelines look like. At execution time, it delegates the processing to the Integration Runtime in specific regions, which handles the actual execution.
This allows you to work across regions more easily while keeping the execution as close as possible to the data.
As far as I can see, the self-hosted Integration Runtime also enables you to integrate with data that is behind a firewall without having to install an agent as you had to in the past, since everything happens over HTTP.
Another big advantage here is that you can now run SSIS packages as part of your integration pipelines allowing you to re-use existing business intelligence that was already there, but now with the power of the cloud.
You can read more about the various Integration Runtimes in this article.
New pipeline triggers
Triggers, triggers, triggers! I think this is what excited me the most, because Data Factory previously only supported building data pipelines for scenarios where data slicing was used.
If you had scenarios where this was not the case, then there was no (decent) Data Factory pipeline that could help you.
The first interesting trigger: on-demand execution via a manual trigger. This can be done via .NET, PowerShell, REST or Python, and it can be useful when you want to trigger a data pipeline at the end of a certain process, regardless of the time.
A second trigger is the scheduler trigger that allows you to define a very granular schedule for when the pipeline should be triggered. This can range from every hour to every workday at 9 AM. This allows you to still have the simple data-slicing model if you prefer that, or define more advanced scheduling if that fits your needs.
For example, some of our pipelines only have to run during the workweek. With v1 this is not possible, so we get pipeline failures every Saturday & Sunday. With Scheduler Triggers we can change this approach and define that the pipeline should only be triggered during the week.
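To make the workday-only schedule concrete, here is a minimal Python sketch that builds a schedule trigger definition as a dictionary. The property names (`ScheduleTrigger`, `recurrence`, `weekDays`, `pipelineReference`) follow the Data Factory v2 trigger JSON as I understand it from the documentation, so verify them against the current schema before deploying. The pipeline name `CopySalesData`, the 9 AM UTC run time and the start time are hypothetical.

```python
import json

# Monday to Friday only: no runs (and no failures) on Saturday & Sunday.
WORKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]


def workday_schedule_trigger(pipeline_name: str, hour: int = 9, minute: int = 0,
                             start_time: str = "2017-11-01T00:00:00Z") -> dict:
    """Build a weekly schedule trigger that fires on workdays at hour:minute (UTC)."""
    return {
        "name": f"{pipeline_name}WorkdayTrigger",
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": "Week",
                    "interval": 1,
                    "startTime": start_time,
                    "timeZone": "UTC",
                    "schedule": {
                        "weekDays": WORKDAYS,
                        "hours": [hour],
                        "minutes": [minute],
                    },
                }
            },
            # One trigger can start one or more pipelines, optionally with parameters.
            "pipelines": [
                {
                    "pipelineReference": {
                        "type": "PipelineReference",
                        "referenceName": pipeline_name,
                    },
                    "parameters": {},
                }
            ],
        },
    }


print(json.dumps(workday_schedule_trigger("CopySalesData"), indent=2))
```

The resulting JSON can then be deployed with the authoring tooling of your choice.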
Another great addition is that you can now pass parameters to use in your pipeline. This can be whatever information you need, just pass it when you trigger it.
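As an illustration of the manual trigger and parameter passing, this Python sketch prepares a call to the `createRun` operation of the Data Factory v2 REST API using only the standard library. It assumes you already have an Azure AD bearer token for `https://management.azure.com`. The API version (`2018-06-01`, the generally available version; the preview used an earlier one) is an assumption, and the subscription, resource group, factory, pipeline and parameter names are placeholders. Building the request is kept separate from sending it so it can be inspected without network access.

```python
import json
import urllib.parse
import urllib.request

ARM_ENDPOINT = "https://management.azure.com"
API_VERSION = "2018-06-01"  # assumption: the GA Data Factory v2 API version


def build_create_run_request(subscription_id: str, resource_group: str,
                             factory_name: str, pipeline_name: str,
                             parameters: dict, access_token: str) -> urllib.request.Request:
    """Prepare a POST to .../pipelines/{name}/createRun with the parameters as the JSON body."""
    path = (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.DataFactory/factories/{factory_name}"
        f"/pipelines/{urllib.parse.quote(pipeline_name)}/createRun"
    )
    return urllib.request.Request(
        f"{ARM_ENDPOINT}{path}?api-version={API_VERSION}",
        data=json.dumps(parameters).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def start_run(request: urllib.request.Request) -> str:
    """Send the request and return the run ID reported by Data Factory."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["runId"]
```

For example, `start_run(build_create_run_request("<subscription-id>", "my-rg", "my-factory", "CopySalesData", {"tableName": "Sales"}, token))` would start a run with a `tableName` parameter, assuming the pipeline declares that parameter.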
In the future, you will also be able to trigger a pipeline when a new file has arrived. However, as far as I can see, you could already set this up today by using the manual trigger together with Azure Event Grid & a Logic App.
Last but not least – One pipeline can now also have multiple triggers. So, in theory, you could have a scheduler trigger but also trigger it manually via a REST endpoint.
It’s certainly good stuff and you can find a full overview of all supported triggers here.
Data Movement, Data Transformation & Control Flow Activities
In 2.0, the concept of Activities has been separated into three new concepts: Data Movement, Data Transformation & Control Flow Activities.
Control Flow Activities allow you to create more reactive pipelines, in the sense that you can now react to the outcome of the previous activity. This allows you to execute an activity only if the previous one ended in a specific state: success, error or skipped.
This is a great addition because it allows you to compensate or rollback certain steps when the previous one failed or notify people in case it’s required.
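To illustrate how these dependency conditions decide whether an activity runs, here is a small local Python model. It is not Data Factory code. The `dependsOn`/`dependencyConditions` shape mirrors the v2 activity JSON, where the states are named `Succeeded`, `Failed` and `Skipped`. The activity names are hypothetical. In this model, an activity runs only when every upstream dependency ended in one of the states it allows.

```python
from typing import Dict, List


def should_run(depends_on: List[dict], outcomes: Dict[str, str]) -> bool:
    """Return True if every upstream activity finished in one of its allowed states.

    outcomes maps activity names to "Succeeded", "Failed" or "Skipped".
    """
    for dependency in depends_on:
        status = outcomes.get(dependency["activity"])
        if status is None:
            return False  # upstream activity has not finished yet
        if status not in dependency["dependencyConditions"]:
            return False
    return True


# Hypothetical compensation step: only roll back when the copy failed.
rollback_staging = {
    "name": "RollbackStaging",
    "dependsOn": [{"activity": "CopyToStaging", "dependencyConditions": ["Failed"]}],
}

# Hypothetical notification: tell people when the copy failed or was skipped.
notify_team = {
    "name": "NotifyTeam",
    "dependsOn": [{"activity": "CopyToStaging", "dependencyConditions": ["Failed", "Skipped"]}],
}

for activity in (rollback_staging, notify_team):
    print(activity["name"], should_run(activity["dependsOn"], {"CopyToStaging": "Failed"}))
```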
Control Flow Activities also provide you with more advanced flow controls such as For Each, Wait, If/Else, Execute other pipelines and more!
Here’s a visual summary:
This tutorial gives you a nice run-through of the new control flow activities.
In the past, one of the biggest pains was authoring pipelines. Everything was in JSON, and there was no real alternative besides the REST API.
In v2, however, you can use whichever tool gets your job done. You can choose from a variety of technologies ranging from .NET & Python to pure REST, or script it with PowerShell!
You can also use ARM templates that have embedded JSON files to automatically deploy your data factory.
But what I like the most is the sneak peek of the visual tooling that Mike Flasko gave at Ignite 2017:
It enables you to author pipelines by simply dragging & dropping activities in the way your business process is modeled. This abstracts away the JSON structure behind it, allowing people to jump on the Data Factory bandwagon more easily.
This visual experience also gives you a clear overview of how all the services tie together, and to a certain degree it serves as a form of documentation. If a new person joins the team, they can easily see the big picture.
However, this is not available yet and is only coming later next year.
Mapping data with Data Flow
One feature that is not there yet, but is coming early 2018, is the Data Flow activity that allows you to define data mappings to transform your datasets in your pipeline.
This feature is already in v1 but the great thing is that for this one you will also be able to use the code-free authoring experience where it will help you create those mappings and visualize what they will look like.
We currently use this in v1 and I have to say that it is very nice, but it is not easy to get right when you have to do it in JSON. This visualizer will certainly help here!
Improved Monitoring experience
As of October, the visual monitoring experience has been added to the public preview. It is very similar to the v1 tooling.
For starters, it lists all your pipelines and all their run history allowing you to get an overview of the health of your pipelines:
If you’re interested in one particular run, you can drill deeper and see the status of each activity. Next to that, if one has failed you can get more information on what went wrong:
You can also filter on certain attributes so that you only see the pipelines you’re interested in.
Another great aspect is that Azure Data Factory v2 integrates with Azure Monitor and comes with built-in metrics such as run, activity and trigger outcomes. This allows you to configure Azure Alerts based on those metrics and to integrate with your overall alert handling, instead of relying only on email notifications. This is a very big plus for me personally!
Diagnostic logs can now also be stored in Azure Storage, sent to Azure Event Hubs & analyzed in Operations Management Suite (OMS) Log Analytics!
Read more about the integration with Azure Monitor & OMS here.
Taking security to the next level
One of the most important things in software is security. In the past, every linked service had its passwords stored with it, and Azure Data Factory managed them for you.
In v2, however, this approach has changed.
For starters, when you provision a new Azure Data Factory, it automatically registers a new managed Azure AD Application in the default Azure AD tenant.
This not only enables you to copy data from/to Azure Data Lake Store, but also to integrate with Azure Key Vault.
By creating an Azure Key Vault linked service, you can store the credentials of all your other linked services in a vault. This gives you full control over the authentication keys for the external services and enables automatic key rolling without breaking your data factories.
Authentication with Azure Key Vault is fully managed by Data Factory, based on the Azure AD Application that was created for you. The only thing you need to do is grant your AD Application access to the vault and create a Key Vault linked service in your data factory.
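To show what this looks like in a definition, here is a minimal Python sketch that builds two linked services. The first is a Key Vault linked service. The second is an Azure SQL Database linked service whose connection string is resolved from a Key Vault secret at runtime. The `AzureKeyVault` and `AzureKeyVaultSecret` shapes follow the v2 linked service JSON as documented for storing credentials in Key Vault. The vault URL, linked service names and secret name are hypothetical.

```python
import json


def key_vault_linked_service(name: str, vault_url: str) -> dict:
    # Linked service pointing at the vault; Data Factory authenticates with its own AD application.
    return {
        "name": name,
        "properties": {
            "type": "AzureKeyVault",
            "typeProperties": {"baseUrl": vault_url},
        },
    }


def sql_linked_service_from_vault(name: str, vault_ls_name: str, secret_name: str) -> dict:
    # The connection string itself is not stored in the factory, only a reference to the secret.
    return {
        "name": name,
        "properties": {
            "type": "AzureSqlDatabase",
            "typeProperties": {
                "connectionString": {
                    "type": "AzureKeyVaultSecret",
                    "store": {"referenceName": vault_ls_name, "type": "LinkedServiceReference"},
                    "secretName": secret_name,
                }
            },
        },
    }


vault = key_vault_linked_service("CredentialsVault", "https://my-adf-vault.vault.azure.net")
sql = sql_linked_service_from_vault("SalesDatabase", "CredentialsVault", "sales-db-connection-string")
print(json.dumps([vault, sql], indent=2))
```

Because only the secret name is referenced, rolling the key in the vault does not require changing the linked service definition.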
More information about handling credentials in Data Factory can be found in this article or read more about data movement security here.
Migration Path to v2
As of today, you can already start creating new pipelines for Azure Data Factory v2 or migrate your v1 pipelines to v2. However, this is currently a manual process, and not all v1 features, such as Data Flow, are available yet.
In 2018 they will provide a tool that can migrate your v1 pipelines to v2 for you, so if it’s not urgent, I’d suggest sitting back and waiting for it to land.
Making Data Factory more robust over time
While I’m a fan of the recent changes to Azure Data Factory, I think it can be improved by adding the following features to make the total experience more robust:
- The concept of pipeline versioning, where all my pipeline definitions, regardless of how they are created, have a version stamped on them that is displayed in the Azure/Monitor portal. That way, we can easily see whether issues are related to a newly deployed version or whether something else is going on.
- As far as I know, correlation IDs are not supported yet in Azure Data Factory. They would be a great addition to improve the overall operational experience even more. They would allow end-to-end monitoring, which can be interesting if you’re chaining multiple pipelines or integrating with other processes outside Data Factory. In the monitoring portal you can currently see the parameters, but it would be nice to filter on a specific correlation ID and see all the related pipelines & activities.
- While they are still working on the code-free authoring portal, I think they should provide the same experience in Visual Studio. It would give us the best of both worlds: a visualizer to author a pipeline, the code behind for more advanced things, and source control integration without having to leave Visual Studio.
- Integration with Azure Data Catalog would be really great because then we can explore our internal data catalog to see if we have any valuable data sources and connect to them without having to leave the authoring experience.
But we have to be reasonable here – Azure Data Factory v2 was only recently launched into public preview so these might be on their radar already and only come later.
The industry is moving away from using one-data-store-to-rule-them-all and is shifting to a Polyglot Persistence approach where we store the data in the data store that is best suited. With this shift comes a need to build integration pipelines that can automate the integration of all these data stores and orchestrate all of this.
Azure Data Factory was a very good start, but as I mentioned it was lacking on a couple of fronts.
With Azure Data Factory 2.0 it feels like it has matured into an enterprise-ready service that allows us to achieve this enterprise-grade data integration between all our data stores, processing, and visualization thanks to the integration of SSIS, more advanced triggers, more advanced control flow and the introduction of Integration Runtimes.
Data integration is more important than ever, and Azure Data Factory 2.0 is here to help you. It was definitely worth the radio silence, and I am looking forward to migrating our current data pipelines to Azure Data Factory 2.0, which will allow us to simplify things.
Want to learn more about it? I recommend watching the “New capabilities for data integration in the cloud” session from Ignite.
Thanks for reading,
You might have seen various people use the terms “reporting” and “analytics” interchangeably to describe the typical application and use of data: tracking the ongoing health of the company and informing decision making. Reporting helps companies monitor their business and be alerted when data falls outside of expected ranges. Reporting is the act of translating raw data into information. This raw data can come from a multitude of data sources, such as a production database, an operational database like MySQL, Google Analytics and a CRM system like Salesforce. While the reports that come out of these databases are useful, they are often standardized and summarized versions of the raw data. This might be enough information for your business, but to get the most out of your data, you need analytics.
In the case of BizTalk Server, your day-to-day monitoring analytics may cover various aspects of performance, such as BizTalk Server performance, CPU performance, etc. BizTalk360 offers out-of-the-box capabilities that provide a graphical display of the key performance metrics of BizTalk Server. In any business environment, these metrics are of critical importance for management when making business decisions. As of version 8.6, BizTalk360 lets users generate PDF documents of critical performance metrics for specific time periods, depending on the requirement.
We have introduced a new section called Reporting under the Analytics section. BizTalk360 enables you to create and switch between multiple reports and widgets of your choice, so you can visualize BizTalk performance information the way you want. The Reporting section is mainly intended to monitor the performance of servers, hosts and IIS, as well as the disk usage of the most important BizTalk databases. These custom reports are accessible through Analytics and can also be delivered on a recurring basis to a group of end users. The reports provide a comprehensive, high-level view of business performance for specific audiences.
You can create a schedule and configure its schedule type based on the requirement. Three schedule types are available:
- Daily: the report is sent to the user every day at the configured time (say, 10 AM every day).
- Weekly: the report is sent to the user every week at the configured time (say, every Monday at 11 AM).
- Monthly: the report is sent once a month on the configured date and time (say, the 15th of every month at 11 AM).
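To make the three schedule types concrete, here is a small Python sketch that computes when a report with a given schedule would next be sent. It only illustrates the Daily/Weekly/Monthly semantics described above. It is not how BizTalk360 implements scheduling, and the function name and parameters are hypothetical. For simplicity, it assumes a monthly day of 28 or lower, so the date exists in every month.

```python
from datetime import datetime, timedelta
from typing import Optional


def next_send_time(now: datetime, schedule_type: str, hour: int, minute: int = 0,
                   weekday: Optional[int] = None,
                   day_of_month: Optional[int] = None) -> datetime:
    """Return the next moment after `now` at which a report would be sent.

    weekday uses datetime.weekday() numbering (Monday=0 ... Sunday=6);
    day_of_month is assumed to be between 1 and 28.
    """
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if schedule_type == "Daily":
        if candidate <= now:
            candidate += timedelta(days=1)
        return candidate
    if schedule_type == "Weekly":
        candidate += timedelta(days=(weekday - now.weekday()) % 7)
        if candidate <= now:
            candidate += timedelta(days=7)
        return candidate
    if schedule_type == "Monthly":
        candidate = candidate.replace(day=day_of_month)
        if candidate <= now:
            if now.month == 12:
                candidate = candidate.replace(year=now.year + 1, month=1)
            else:
                candidate = candidate.replace(month=now.month + 1)
        return candidate
    raise ValueError(f"Unknown schedule type: {schedule_type}")


now = datetime(2017, 11, 15, 12, 0)  # a Wednesday
print(next_send_time(now, "Daily", hour=10))                     # 10 AM every day
print(next_send_time(now, "Weekly", hour=11, weekday=0))         # every Monday at 11 AM
print(next_send_time(now, "Monthly", hour=11, day_of_month=15))  # the 15th at 11 AM
```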
Once a schedule is created, the user can map it to a report. Schedules can also be edited and deleted. If you don’t want to receive the email for a certain period, you can enable the Disable Schedule option.
BizTalk360 is designed so that the majority of its functionality works out of the box without any configuration. However, certain features depend on third-party files that need to be installed separately. The BizTalk360 Reporting functionality requires certain dependency files so that users can download PDF reports and/or trigger email notifications about the health of the BizTalk Server environment. These dependency files are also required to map a report to a created schedule. You can download the dependency files from the Settings -> Manage Dependencies section. The files are downloaded to the Source folder (Web -> Dependencies -> Analytics) and then copied from there to the Destination folder (Analytics folder).
Reports can be created and mapped to the schedules you have created, and widgets can be added to the reports based on the requirement. Widgets in a report show live data. Reports are sent to the configured email address at the configured time. Reports also have a historical data section where you can see both live and historical data. The historical data contains the data of the reports that have already been sent by email.
The following widgets are available in the reports. You can add, remove and customize the widgets as needed:
- BizTalk Hosts Performance
- BizTalk Server Performance
- SQL Server Performance
- MessageBox Database Disk Usage
- MessageBox Database Top 5 Tables
- Tracking Database Disk Usage
- Tracking Database Top 5 Tables
- IIS Server Performance
- Messaging Performance
- BizTalk Messaging Performance
- Message Volume by Schema
- Transmission Failure Rate
  - by Schema (Top 10)
  - by Port (Top 10)
- Event Log Data Count
  - SQL Server
  - Internet Information Server
  - BizTalk360 Monitoring Service
  - BizTalk360 Analytics Service
- Message Box KPI
Report notification history
Based on the report settings, BizTalk360 sends the reports to the users by email, with the reporting dashboard attached as a PDF. The Report notification history shows you whether the email was delivered to the configured email address.
I hope this blog post has given you some useful information about the Reporting section in Analytics. Please don’t hesitate to contact us at firstname.lastname@example.org if you want to know more about this particular feature or any other feature in BizTalk360.
BizTalk360 can trigger notifications to custom external notification channels. For customers, this means that if you already use such a channel in your organization, it becomes easier to receive alerts from BizTalk360 right in that channel. Triggering alerts to custom notification channels works exactly the same way as sending an email notification.
We have constantly been endeavoring to provide our customers with the latest notification channels. Microsoft introduced Teams, an entirely new experience that brings together people, conversations and content, along with the tools that teams need, so they can easily collaborate to achieve more. In one of my previous blogs, I showed how you can integrate Microsoft Teams as a notification channel in BizTalk360. Now we have decided to officially support Teams as a notification channel in BizTalk360.
Create the Channel in Microsoft Teams
First, we need to create the Team in the Microsoft Teams application by clicking ‘Add Team’ at the bottom.
Once the Team has been given a suitable name and has been created, we can create a new channel for that Team (click the … next to the newly created Team and choose ‘Add channel’).
Next, click the notification channel you just created, select Connectors, and then “Add” the Incoming WebHook connector.
Copy and save the connector URL to configure it in the BizTalk360 notification Channel.
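Before configuring BizTalk360, you may want to check that the connector URL works. This Python sketch builds a POST request with a minimal JSON payload that contains only a `text` field, which Incoming Webhook connectors accept. The URL below is a placeholder for the one you copied, and the message text is hypothetical. Sending is kept in a separate function so the request can be inspected first.

```python
import json
import urllib.request

# Placeholder: paste the Incoming WebHook URL copied from Teams here.
WEBHOOK_URL = "https://example.invalid/teams-incoming-webhook"


def build_teams_message(webhook_url: str, text: str) -> urllib.request.Request:
    """Prepare a POST with a minimal JSON payload containing only a text field."""
    return urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> int:
    """Send the message and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


request = build_teams_message(WEBHOOK_URL, "Test message for the BizTalk360 notification channel")
print(request.get_method(), request.full_url)
```

Calling `send(request)` with your real connector URL should post the test message to the channel.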
Configuring the Teams Notification Channel in BizTalk360
You can find the Teams Notification Channel under Settings > Monitoring and Notification > Manage Notification Channels. Select B360.Notifier.Teams. Click “Configure” to open the configuration panel.
Next, provide the webhook connector URL that we retrieved from Teams earlier, along with any proxy details, if applicable.
Once you click Save, the Teams Notification Channel is successfully configured.
You can override the webhook connector URL for a specific alarm in the Manage Alarms screen, where you enable the Teams Notification channel.
This is a sample of what the Teams notification looks like.
That is how you can now send notifications to a new notification channel, Teams, via BizTalk360.
Event logs are normally used to record important events in running applications and subsystems, and they play a vital role in troubleshooting problems.
While monitoring multi-server environments, how many times a day does your administration team log in to multiple servers to look for the root cause of a problem? Have you ever wished for a tool that could help you avoid this time-consuming process? BizTalk360’s built-in Advanced Event Viewer (AEV) helps you solve exactly this business problem.
Set up AEV to retrieve the event data you want from the BizTalk and SQL servers in your environment and display it all on a single screen, where you can use the rich query capabilities to search and analyze the data.
How to Set Up AEV in BizTalk360
As a first step, in the BizTalk360 settings, you need to configure the event logs and event sources that you want to monitor and then enable AEV for the environment. The BizTalk360 Monitoring service will then collect event log data for all the configured servers in that environment and store it in the BizTalk360 database.
What’s new in v8.6?
BizTalk360 has supported AEV in the Operations and Monitoring sections for a long time. While we were demonstrating BizTalk360 to customers, they asked us: “How can we monitor a specific event occurring in BizTalk environments at a specific frequency and get an alert based on threshold conditions?” With that in mind, we have implemented Event Log Data Monitoring in v8.6.
Let us take this complex scenario to understand more about Event Log Data Monitoring.
Scenario 1: A user wants to monitor different event logs on multiple servers. For example, an administrator wants to monitor ESB events from the BizTalk server, ensure there are no problems on the SQL servers, and monitor ENTSSO events from the SSO server.
Start Monitoring Event log Data in 3 Steps:
- Enable AEV for an environment
- Create a Data Monitoring Alarm
- Create a schedule under event log and configure the rich filtering conditions based on your business needs as below.
Server Type: BizTalk, SQL
Server Names: BizTalk Server, SQL Server, SSO Server
Event Type: Error
Event Sources: ESB Itinerary Selector, ENTSSO, MSSQLSERVER
And group (all of the conditions below are true):
  - Event ID greater than or equal to 3010
  - Event ID less than or equal to 3034
  - Message contains 'ESB.ItineraryServices.Generic.WCF/ProcessItinerary.svc'
  - Event ID is between 10500 and 10550
  - Message contains 'SSO Database'
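To clarify how such an And group is evaluated, here is a small Python model in which an event matches only if every condition holds. It is hypothetical and is not BizTalk360’s query engine. It uses the server, event type and source filters together with the ESB-related Event ID range and message condition from the list above. The `EventLogEntry` type and the sample message are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EventLogEntry:
    server: str
    source: str
    event_type: str
    event_id: int
    message: str


Condition = Callable[[EventLogEntry], bool]


def and_group(*conditions: Condition) -> Condition:
    """Combine conditions so that an entry matches only if all of them are true."""
    return lambda entry: all(condition(entry) for condition in conditions)


esb_filter = and_group(
    lambda e: e.server in {"BizTalk Server", "SQL Server", "SSO Server"},
    lambda e: e.event_type == "Error",
    lambda e: e.source in {"ESB Itinerary Selector", "ENTSSO", "MSSQLSERVER"},
    lambda e: e.event_id >= 3010,
    lambda e: e.event_id <= 3034,
    lambda e: "ESB.ItineraryServices.Generic.WCF/ProcessItinerary.svc" in e.message,
)

sample = EventLogEntry(
    server="BizTalk Server",
    source="ESB Itinerary Selector",
    event_type="Error",
    event_id=3012,
    message="Error calling ESB.ItineraryServices.Generic.WCF/ProcessItinerary.svc",
)
print(esb_filter(sample))
```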
Looking at this in more detail, it would normally mean running a filtering query against the configured event sources on the servers and raising an alert when certain conditions are met.
Scenario 2: Detect the same event occurring on different servers. For example, an instance of an orchestration runs on server 1 and throws a certain error. Later, another instance of the same orchestration, this time executing on server 2, throws the same error. Event log data monitoring can now easily detect this.
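A sketch of the idea behind Scenario 2: group the collected event log entries by source, event ID and message, then report the ones that were seen on more than one server. This is a hypothetical Python illustration of the detection logic, not BizTalk360 code. The tuple layout, server names, event IDs and messages are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (server, source, event_id, message) for each collected event log entry
Entry = Tuple[str, str, int, str]


def events_on_multiple_servers(entries: Iterable[Entry]) -> Dict[Tuple[str, int, str], Set[str]]:
    """Group entries by (source, event_id, message) and keep those seen on 2+ servers."""
    servers_by_event: Dict[Tuple[str, int, str], Set[str]] = defaultdict(set)
    for server, source, event_id, message in entries:
        servers_by_event[(source, event_id, message)].add(server)
    return {event: servers for event, servers in servers_by_event.items() if len(servers) > 1}


entries = [
    ("BTS01", "BizTalk Server", 5410, "Orchestration OrderProcess failed"),
    ("BTS02", "BizTalk Server", 5410, "Orchestration OrderProcess failed"),
    ("BTS01", "BizTalk Server", 5743, "Adapter warning"),
]
print(events_on_multiple_servers(entries))
```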
BizTalk360 brings all this data into a single console and, on top of that, provides a powerful capability to set alerts based on various thresholds.
You can also set how frequently the queries should run, based on your business requirements. This can be frequent daily validations (e.g. every 15 minutes or every hour), the end of the business day, or even monthly events such as month-end processing. The query result is evaluated against these thresholds, and if any threshold is violated, you will be notified via notification channels/email.
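As a hedged illustration of threshold evaluation, this Python sketch counts the matching events inside a look-back window that ends at the evaluation time. It flags a violation when that count exceeds a maximum. The window length, the "greater than" comparison, the function names and the timestamps are assumptions; in BizTalk360 you configure the actual threshold conditions yourself.

```python
from datetime import datetime, timedelta
from typing import Iterable


def count_in_window(timestamps: Iterable[datetime], now: datetime, window: timedelta) -> int:
    """Count events that occurred in the look-back window (now - window, now]."""
    return sum(1 for t in timestamps if now - window < t <= now)


def threshold_violated(timestamps: Iterable[datetime], now: datetime,
                       window: timedelta, max_allowed: int) -> bool:
    """True when more than max_allowed matching events fall inside the window."""
    return count_in_window(timestamps, now, window) > max_allowed


now = datetime(2017, 11, 15, 12, 0)
matches = [datetime(2017, 11, 15, 11, 40), datetime(2017, 11, 15, 11, 50),
           datetime(2017, 11, 15, 11, 55)]
# Evaluate every 15 minutes, allowing at most one matching error per window.
print(threshold_violated(matches, now, timedelta(minutes=15), max_allowed=1))
```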
Event Log Details in Alerts
Event log details are included in alerts when you enable the option ‘Send Event Log details in Mail’ while creating the schedule.
Event Log data in the Data Monitoring Dashboard
The information is also visible on the Data Monitoring dashboard, where you can view it in the day calendar view. If you need to understand what happened during an execution, you can click one of the entries in the day view of the dashboard and view the details as shown below.
- Maintenance is very simple: once event log data monitoring is scheduled, disabling AEV for the environment stops the collection of event log data.
- You don’t need to worry about data growth; the BizTalk360 purge policy takes care of it.
- Apart from the BizTalk-specific SQL servers, you can also monitor other SQL servers simply by adding their names for monitoring in the settings section.
Since we first organized the event in London (back in 2013), I have had the pleasure of participating as a speaker in the biggest Microsoft Integration event every year. This year, I was able to be present at the two biggest events: London and Redmond (Microsoft Campus).
This year my session was about: BizTalk Server Fast & Loud
In this session, I talked about a hardcore BizTalk topic that addressed the following question: How can you optimize/tune your BizTalk environment for performance?
Optimizing your BizTalk Server installation is not an easy thing to do, because it affects several layers and requires several skills. This topic is very well documented by the product group, but the documentation is very extensive and complex. This presentation aims to guide you through the most important steps, operations and tasks you need to perform or be aware of to boost the performance of your BizTalk Server environment. You can adjust or follow them according to your needs. Depending on your infrastructure, this can be a straightforward operation or a very extensive and hard one, but I will try to keep it as simple as possible so everyone can understand and follow.
In the links below you can find the resources that I used in both sessions:
BizTalk Server Fast & Loud Slides
Slides used in INTEGRATE 2017 LONDON:
Slides used in INTEGRATE 2017 USA:
BizTalk Server Fast & Loud Video
As in previous years, the event in London was recorded. If for any reason you could not attend, or if you want to review the session again, you can watch it here:
Day 3 Session 5 – Sandro Pereira — BizTalk Server Fast & Loud
I hope you enjoy it and see you next year!