Hadoop: How to reduce your HDInsight costs for development

 

As this diagram shows, HDInsight is at the heart of Microsoft's Big Data platform.

Azure Data Lake Analytics and HDInsight have a similar positioning. Depending on whether you prefer portable code while still benefiting from a managed service, or whether you would rather be in a more Microsoft-centric world, you will lean toward HDInsight or toward Azure Data Lake Analytics, respectively.

With HDInsight, you can run Hive, Pig, Java, Python, or Scala code on Hadoop, Storm, or Spark. This need not be very expensive, because you can create the required compute resources only for the duration of your batch jobs, in an automated way. For example, 2 hours of batch processing a day, 30 days a month, comes to less than 500 for a cluster with 2 head nodes and 10 worker nodes:
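As a rough sanity check, the arithmetic behind that estimate can be sketched in a few lines of Python; the per-node-hour price below is purely hypothetical, so check the Azure pricing page for real figures:

```python
# Back-of-the-envelope HDInsight batch cost. The node counts and usage come
# from the scenario above; the blended price per node-hour is hypothetical.
HEAD_NODES = 2
WORKER_NODES = 10
HOURS_PER_DAY = 2
DAYS_PER_MONTH = 30
PRICE_PER_NODE_HOUR = 0.65  # hypothetical; see the Azure pricing page

cluster_hours = HOURS_PER_DAY * DAYS_PER_MONTH            # 60 cluster-hours/month
node_hours = cluster_hours * (HEAD_NODES + WORKER_NODES)  # 720 node-hours/month
monthly_cost = node_hours * PRICE_PER_NODE_HOUR

print(cluster_hours, node_hours, monthly_cost)
```

At that hypothetical rate the bill lands under 500 a month, which is the point: you only pay for the hours the cluster actually exists.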

NB: this includes support, which lets you ask questions about your code. For instance, if you have a problem with a Hive query on HDInsight, you can ask Microsoft support.

 

That said, if you have to develop on this HDInsight environment, it is less optimal, since the environment will be running more regularly (e.g., 8 to 10 hours a day). Moreover, although you can destroy and re-create an HDInsight cluster that will find its data again, it can be more comfortable to work on a virtual machine that you customize a bit with your own tools and that you start and stop as needed. A full cluster is not necessary for development as long as you have not reached the optimization phase.

You may also want to test new Hadoop features.

For all of this, there is the Hortonworks Sandbox, available in the Azure marketplace.

This virtual machine image includes tutorials, samples, and tools to discover Hadoop and Spark, and it can also be used as a development VM.

Price-wise, you only pay the price of the VM; the licensing model is Bring Your Own License. See https://azure.microsoft.com/en-us/marketplace/partners/hortonworks/hortonworks-sandbox/ for more information.

This page contains a link to instantiate a VM.

You then need to fill in a number of fields, as when creating a standard virtual machine. The next few screenshots assume creation in “Resource Manager” mode.

Once the VM is created, this mode lets you configure firewall rules from the portal (in “classic” mode, a few command lines may be necessary).

Here is how to configure access to your VM:

From the VM's blade in the portal (portal.azure.com), click the resource group, which gives you access to the NSG (Network Security Group) of the vNet. There you have access to the inbound rules for the network,

to which you can add two rules:

one to allow your IP address(es) to access the VM, and the other to deny all other Internet IP addresses. Note the “Priority” field.
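To illustrate why the “Priority” field matters, here is a small Python sketch of the evaluation model (deliberately simplified: it ignores ports and protocols, and the IP addresses are documentation examples). Rules are checked in ascending priority order and the first match wins, so the allow rule at priority 100 takes precedence over the deny-all rule at priority 200:

```python
# Simplified model of NSG inbound rule evaluation: lowest Priority number
# is evaluated first, and the first matching rule decides the outcome.
def evaluate(rules, source_ip):
    """Return the action of the first rule matching source_ip."""
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if rule["source"] in ("*", source_ip):
            return rule["action"]
    return "Deny"  # default when nothing matches

rules = [
    {"priority": 200, "source": "*",           "action": "Deny"},   # deny everyone else
    {"priority": 100, "source": "203.0.113.7", "action": "Allow"},  # allow my IP
]

print(evaluate(rules, "203.0.113.7"))   # my IP gets in
print(evaluate(rules, "198.51.100.1"))  # any other Internet address is denied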


 

Optionally, you can also give a name to your VM's public IP address, as follows:

For example, if you deployed in North Europe and chose the name hdp23sandbox, you can then browse to http://hdp23sandbox.northeurope.cloudapp.azure.com:8888 and see a welcome page that contains, among other things, tutorials:

Of course, you can also connect to your VM over SSH.

You can configure access to Azure storage accounts by adding key/value entries to the /etc/hadoop/conf/hdfs-site.xml file (sudo vi /etc/hadoop/conf/hdfs-site.xml):

– key: fs.azure.account.key.{storage account name}.blob.core.windows.net

– value: {access key of the storage account (primary or secondary)}


This then lets you work with the storage account's files just as you would with HDFS.
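For illustration, the added property might look like the following (the account name mystorageacct and the key value are placeholders; substitute your own):

```xml
<!-- Hypothetical example: replace mystorageacct and the key with your own -->
<property>
  <name>fs.azure.account.key.mystorageacct.blob.core.windows.net</name>
  <value>BASE64ENCODEDSTORAGEACCOUNTKEY==</value>
</property>
```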

As a reminder, the syntax is:

wasb://{container}@{storage account name}.blob.core.windows.net[/{folder1}/{folder2}[/{file name}]]
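For instance, assuming a hypothetical container data, account mystorageacct, and path folder1/folder2/input.csv, a small Python helper can assemble such a URI:

```python
# Helper that assembles a wasb:// URI following the syntax above.
# The container, account, and path names used below are hypothetical.
def wasb_uri(container, account, *path_parts):
    base = f"wasb://{container}@{account}.blob.core.windows.net"
    path = "/".join(path_parts)
    return f"{base}/{path}" if path else base

print(wasb_uri("data", "mystorageacct", "folder1", "folder2", "input.csv"))
# → wasb://data@mystorageacct.blob.core.windows.net/folder1/folder2/input.csv
```

The resulting URI can then be used anywhere an HDFS path is expected, e.g. hdfs dfs -ls wasb://data@mystorageacct.blob.core.windows.net/folder1.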


You can instantiate your own sandbox from this page: https://azure.microsoft.com/en-us/marketplace/partners/hortonworks/hortonworks-sandbox/. Click the “Create Virtual Machine >” button.

@benjguin

Blog Post by: Benjamin GUINEBERTIERE

How to fix or configure the Signing Properties of a BizTalk Project with PowerShell


In the previous post, a PowerShell script was provided to fix or configure the Deployment Properties of a BizTalk project. However, and this is also nothing new, before deploying a BizTalk project we must first strongly sign the assemblies involved in the project to give them a unique identification, allowing them to be […]
Blog Post by: Sandro Pereira

How to fix or configure the Deployment Properties of a BizTalk Project with PowerShell


It is nothing new that before you can deploy a solution from Visual Studio into a BizTalk application, you must first set the project properties, especially the Server and the Configuration Database. Otherwise two things may happen: deployment will fail when you try to deploy it through Microsoft Visual Studio, or if you are redeploying […]
Blog Post by: Sandro Pereira

Is BizTalk360 a Pain-killer or a Vitamin?

When you are building a company, there is a famous analogy: either you build a solution that acts like a pain-killer tablet, which means it's a must-have; or you build a vitamin tablet, which is a nice-to-have thing. To put it simply, it's as simple as “must have vs. nice to have”. Every […]

The post Is BizTalk360 a Pain-killer or a Vitamin? appeared first on BizTalk360 Blog.

Blog Post by: Saravana Kumar

BizTalk Server: Teach me something new about Flat Files (or not) – Positional Files


In the second post of this series we followed a simple walkthrough explaining the basic principles of creating a flat file schema from a file delimited by symbols. Now it's time to do the same thing, but this time translating an inbound positional flat file. A positional flat file is normally a file that has fields […]
Blog Post by: Sandro Pereira

Send signed SOAP request without encrypting the header and body in BizTalk Server using WS-Security

Simply follow all the required steps mentioned in my previous post and then create a custom behavior extension to disable the encryption by using the following code in the AddBindingParameters function.

You can get the complete code for the behavior from this link.

Build the downloaded solution and GAC the DLL.

Update the machine.config (both the 64-bit and 32-bit versions) with the following entry.

Send signed and encrypted SOAP request in BizTalk Server using WS-Security

To send a signed and encrypted SOAP request to a web service from BizTalk Server, follow these steps:

Step 1: Create a send port (One Way or Solicit-Response, as per your requirement) and select WCF-Custom as the transport type.

Step 2: Select customBinding as the Binding Type. By default, customBinding has textMessageEncoding and httpTransport; you can change these as per your needs.

Step 3: Add security

A Brief History of Cloud-Based Integration in Microsoft Azure


In conversations with students and other integration specialists, I’m discovering more and more how confused some people are about the evolution of cloud-based integration technologies. I suspect that cloud-based integration is going to be big business in the coming years, but this confusion will be an impediment to us all.

To address this I want to write a less technical, very casual, blog post explaining where we are today (November of 2015), and generally how we got here. I’ll try to refrain from passing judgement on the technologies that came before and I’ll avoid theorizing on what may come in the future. I simply want to give a timeline that anyone can use to understand this evolution, along with a high-level description of each technology.

I’ll only speak to Microsoft technologies because that’s where my expertise lies, but it’s worth acknowledging that there are alternatives in the marketplace.

If you’d like a more technical write-up of these technologies and how to use them, Richard Seroter has a good article on his blog that can be found here.

Way, way back in October of 2008 Microsoft unveiled Windows Azure (although it wouldn’t be until February of 2010 that Azure went “live”). On that first day, Azure wasn’t nearly the monster it has become.

It provided a service platform for .NET services, SQL Services, and Live Services. Many people were still very skeptical about “the cloud” (if they even knew what that meant). As an industry we were entering a brave new world with many possibilities.

From an integration perspective, Windows Azure .NET Services offered Service Bus as a secure, standards-based messaging infrastructure.

Over the years, Service Bus has been rebranded several times but the core concepts have stayed the same: reduce the barriers for building composite applications, even when their components have to communicate across organizational boundaries. Initially, Service Bus offered Topics/Subscriptions and Queues as a means for systems and services to exchange data reliably through the cloud.

Service Bus Queues are just like any other queueing technology. We have a queue to which any number of clients can post messages. These messages can be received from the queue later by some process. Transactional delivery, message expiry, and ordered delivery are all built-in features.

Sample Service Bus queue

I like to call Topics/Subscriptions “smart queues.” We have concepts similar to queues with the addition of message routing logic. That is, within a Topic I can define one or more Subscription(s). Each Subscription is used to identify messages that meet certain conditions and “grab” them. Clients don’t pick up messages from the Topic, but rather from a Subscription within the Topic. A single message can be routed to multiple Subscriptions once published to the Topic.

Sample Service Bus Topic and Subscriptions

If you have a BizTalk Server background, you can essentially think of each Service Bus Topic as a MessageBox database.
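The routing concept can be sketched in a few lines of plain Python; this is an in-memory illustration of the idea, not the actual Service Bus SDK:

```python
# Toy model of a Service Bus Topic: each Subscription carries a filter
# predicate, and a published message is copied to every Subscription whose
# filter matches. Clients receive from a Subscription, never from the Topic.
class Topic:
    def __init__(self):
        self.subscriptions = {}

    def add_subscription(self, name, filter_fn):
        self.subscriptions[name] = {"filter": filter_fn, "messages": []}

    def publish(self, message):
        for sub in self.subscriptions.values():
            if sub["filter"](message):
                sub["messages"].append(message)

    def receive(self, name):
        queue = self.subscriptions[name]["messages"]
        return queue.pop(0) if queue else None

orders = Topic()
orders.add_subscription("big-orders", lambda m: m["amount"] >= 1000)
orders.add_subscription("all-orders", lambda m: True)

orders.publish({"id": 1, "amount": 1500})  # routed to both subscriptions
orders.publish({"id": 2, "amount": 50})    # routed only to all-orders
```

Note how a single published message lands in every matching subscription independently, which is exactly the "smart queue" behavior described above.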

Interacting with Service Bus is easy to do across a variety of clients using the .NET or REST APIs. With the ability to connect on-premises applications to cloud-based systems and services, or even connect cloud services to each other, Service Bus offered the first real “integration” features to Azure.

Since its release, Service Bus has grown to include other messaging features such as Relays, Event Hubs, and Notification Hubs, but at its heart it has remained the same and continues to provide a rock-solid foundation for exchanging messages between systems in a reliable and programmable way. In June of 2015, Service Bus processed over 1 trillion (1,000,000,000,000) messages!

As integration specialists we know that integration problems are more complex than simply grabbing some data from System A and dumping it in System B.

Message transport is important but it’s not the full story. For us, and the integration applications we build, VETRO (Validate, Enrich, Transform, Route, and Operate) is a way of life. I want to validate my input data. I may need to enrich the data with alternate values or contextual information. I’ll most likely need to transform the data from one format or schema to another. Identifying and routing the message to the correct destination is certainly a requirement. Any integration solution that fails to deliver all of these capabilities probably won’t interest me much.

VETRO Diagram
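As a concrete, if toy, illustration, a VETRO pipeline boils down to a chain of small functions; the field names and routing rule below are hypothetical:

```python
# Minimal VETRO sketch: Validate, Enrich, Transform, Route as tiny functions
# applied in order. The schema and routing rule are invented for illustration.
def validate(msg):
    if "customer_id" not in msg:
        raise ValueError("missing customer_id")
    return msg

def enrich(msg):
    return {**msg, "region": "EMEA"}  # add contextual information

def transform(msg):
    # rename fields into the (hypothetical) target schema
    return {"CustomerId": msg["customer_id"], "Region": msg["region"]}

def route(msg):
    dest = "priority-queue" if msg["Region"] == "EMEA" else "default-queue"
    return dest, msg

destination, payload = route(transform(enrich(validate({"customer_id": 42}))))
print(destination, payload)
```

The "Operate" step would be whatever the destination system does with the payload once it arrives.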

So, in a world where Service Bus is the only integration tool available to me, do I have VETRO? Not really.

I have a powerful, scalable, reliable, messaging infrastructure that I can use to transport messages, but I cannot transform that data, nor can I manipulate that data in a meaningful way, so I need something more.

I need something that works in conjunction with this messaging engine.

Microsoft’s first attempt at providing a more traditional integration platform that provided VETRO-esque capabilities was Microsoft Azure BizTalk Services (MABS) (to confuse things further, this was originally branded as Windows Azure BizTalk Services, or WABS). You’ll notice that Azure itself has changed its name from Windows Azure to Microsoft Azure, but I digress.

MABS was announced publicly at TechEd 2013.

Despite the name, Microsoft Azure BizTalk Services DOES NOT have a common code-base with Microsoft BizTalk Server (on second thought, perhaps the EDI pieces share some code with BizTalk Server, but that’s about all). In the MABS world we could create itineraries. These itineraries contained connections to source and destination systems (on-premises & cloud) and bridges. Bridges were processing pipelines made up of stages. Each stage could be configured to provide a particular type of VETRO function. For example, the Enrich stage could be used to add properties to the context of the message travelling through the bridge/itinerary.

Stages of a MABS Bridge

Complex integration solutions could be built by chaining multiple bridges together using a single itinerary.

MABS message flow

MABS was our first real shot at building full integration solutions in the cloud, and it was pretty good, but Microsoft wasn’t fully satisfied, and the industry was changing the approach for service-based architectures. Now we want Microservices (more on that in the next section).

The MABS architecture had some shortcomings of its own. For example, there was little or no ability to incorporate custom components into the bridges, and a lack of connectors to source and destination systems.

Over the past couple of years the trending design architecture has been Microservices. For those of you who aren’t already familiar with it, or don’t want to read pages of theory, it boils down to this:

“Architect the application by applying the Scale Cube (specifically y-axis scaling) and functionally decompose the application into a set of collaborating services. Each service implements a set of narrowly related functions. For example, an application might consist of services such as the order management service, the customer management service etc.

Services communicate using either synchronous protocols such as HTTP/REST or asynchronous protocols such as AMQP.

Services are developed and deployed independently of one another.

Each service has its own database in order to be decoupled from other services. When necessary, consistency between databases is maintained using either database replication mechanisms or application-level events.”

So the shot-callers at Microsoft see this growing trend and want to ensure that the Azure platform is suited to enable this type of application design. At the same time, MABS has been in the wild for just over a year and the team needs to address the issues that exist there. MABS itineraries are deployed as one big chunk of code, and that does not align well with the Microservices way of doing things. Therefore, we need something new but familiar!

Azure App Service is a cloud platform for building powerful web and mobile apps that connect to data anywhere, in the cloud or on-premises. Under the App Service umbrella we have Web Apps, Mobile Apps, API Apps, and Logic Apps.

Azure App Service

I don’t want to get into Web and Mobile Apps. I want to get into API Apps and Logic Apps.

API Apps and Logic Apps were publicly unveiled in March of 2015, and are currently still in preview.

API Apps provide capabilities for developing, deploying, publishing, consuming, and managing RESTful web APIs. The simple, less sales-pitch sounding version of that is that I can put RESTful services in the Azure cloud so I can easily use them in other Azure App Service-hosted things, or call the API (you know, since it’s an HTTP service) from anywhere else. Not only is the service hosted in Azure and infinitely scalable, but Azure App Service also provides security and client consumption features.

So, API Apps are HTTP / RESTful services running in the cloud. These API Apps are intended to enable a Microservices architecture. Microsoft offers a bunch of API Apps in Azure App Service already and I have the ability to create my own if I want. Furthermore, to address the integration needs that exist in our application designs, there is a special set of BizTalk API Apps that provide MABS/BizTalk Server style functionality (i.e., VETRO).

What are API Apps?

This is all pretty cool, but I want more. That’s where Logic Apps come in.

Logic Apps are cloud-hosted workflows made up of API Apps. I can use Logic Apps to design workflows that start from a trigger and then execute a series of steps, each invoking an API App whilst the Logic App run-time deals with pesky things like authentication, checkpoints, and durable execution. Plus it has a cool rocket ship logo.
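Conceptually, a Logic App run can be sketched as a trigger followed by a sequence of steps with a checkpoint recorded after each one; this is a plain-Python illustration of the idea, not the actual Logic Apps runtime:

```python
# Toy model of a Logic App: trigger data flows through a sequence of steps
# (API App calls in the real service), with a checkpoint after each step so
# that a failed run could resume instead of starting over.
def run_workflow(trigger_data, steps):
    checkpoints = []
    state = trigger_data
    for name, step in steps:
        state = step(state)
        checkpoints.append((name, state))  # durable checkpoint after each step
    return state, checkpoints

steps = [
    ("to_upper", lambda s: s.upper()),
    ("exclaim",  lambda s: s + "!"),
]
result, checkpoints = run_workflow("new order", steps)
print(result, checkpoints)
```

The real runtime adds the "pesky things" mentioned above (authentication, retries, durable storage of those checkpoints), but the trigger-then-steps shape is the same.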

What are Logic Apps?

What does all this mean? How can I use these Azure technologies together to build awesome things today?

Service Bus review

Service Bus provides an awesome way to get messages from one place to another using either Queues or Topics/Subscriptions.

API Apps are cloud-hosted services that do work for me. For example, hit a SaaS provider or talk to an on-premises system (we call these connectors), transform data, change an XML payload to JSON, etc.
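As an example of the kind of work such an API App might do, here is a minimal sketch of turning a flat XML payload into JSON using only the Python standard library (the payload shape is invented for illustration):

```python
# Minimal XML-to-JSON conversion for a flat payload, using only the
# standard library. Real connectors handle nesting, attributes, and arrays.
import json
import xml.etree.ElementTree as ET

def xml_to_json(xml_text):
    root = ET.fromstring(xml_text)
    return json.dumps({child.tag: child.text for child in root})

payload = "<order><id>42</id><status>shipped</status></order>"
print(xml_to_json(payload))  # {"id": "42", "status": "shipped"}
```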

Logic Apps are workflows composed of multiple API Apps. So I can create a composite process from a series of Microservices.

Logic App review

If I were building an entire integration solution, breaking the process across multiple Logic Apps might make great sense. I can then use Service Bus to connect those workflows to each other in a loosely-coupled way.

Logic Apps and Service Bus working together

And as my integration solution becomes more sophisticated, perhaps I have need for more Logic Apps to manage each “step” in the process. I further use the power of Topics to control the workflow to which a message is delivered.

More Logic Apps and Service Bus Topics provide a sophisticated integration solution

In the purest of integration terms, each Logic App serves as its own VETRO (or subset of VETRO features) component. Decomposing a process into several different Logic Apps and then connecting them to each other using Service Bus gives us the ability to create durable, long-running composite processes that remain loosely-coupled.

Doing VETRO using Service Bus and Logic Apps

Today Microsoft Azure offers the most complete story to date for cloud-based integration, and it’s a story that is only getting better and better. The Azure App Service team and the BizTalk Server team are working together to deliver amazing integration technologies. As an integration specialist, you may have been able to ignore the cloud for the past few years, but in the coming years you won’t be able to get away with it.

We’ve all endeavored to eliminate those nasty data islands. We’ve worked to tear down the walls dividing our systems. Today, a new generation of technologies is emerging to solve the problems of the future. We need people like you, the seasoned integration professional, to help direct the technology, and lead the developers using it.

If any of this has gotten you at all excited to dig in and start building great things, you might want to check out QuickLearn Training’s 5-day instructor-led course detailing how to create complete integration solutions using the technologies discussed in this article. Please come join us in class so we can work together to build magical things.