BizTalk Server Tip #30: Use ETW for high performance tracking

Use Event Tracing for Windows (ETW) for high-performance tracking and debugging of your BizTalk applications. Because this method uses high-performance structures within Windows to host tracking, the impact is near zero if you are not consuming the output, and it is highly valuable when you are troubleshooting a problem. You can access to […]

The post BizTalk Server Tip #30: Use ETW for high performance tracking appeared first on BizTalk360 Blog.

Blog Post by: Ricardo Torre

Azure design Pattern

Microsoft has delivered an ebook about cloud design patterns (http://msdn.microsoft.com/en-us/library/dn568099.aspx). You can download it as PDF, EPUB or MOBI. The book contains twenty-four design patterns and ten related guidance topics. The guide articulates the benefit of applying patterns by showing how each piece can fit into the big picture of cloud application architectures. It also discusses the benefits and […]
Blog Post by: Jeremy Ronk

BizTalk Summit 2014, London- Day 1 summary from tweets

One of our objectives for the event is to promote the #msbts Twitter hashtag and to encourage more people from the BizTalk integration community to share valuable information in public. We are very pleased to see the adoption: about 420 tweets (which is a great number for a BizTalk community). […]

The post BizTalk Summit 2014, London– Day 1 summary from tweets appeared first on BizTalk360 Blog.

Blog Post by: Saravana Kumar

How to deploy a Python module to Windows Azure HDInsight

Introduction

In a previous post, I explained how to run Hive + Python in HDInsight (Hadoop as a service in Windows Azure).

The sample showed a Python script using standard modules such as hashlib. In real life, modules need to be installed on the machine before they can be used. Recently, I had to use the shapely, shapefile and rtree modules.

Here is how I did it, and why.

A quick recap on how HDInsight works

HDInsight is a cluster created on top of Windows Azure worker roles. This means that each VM in the cluster can be reimaged, i.e. replaced by another VM with the same original bits on it. In terms of configuration, the “original bits” contain what was declared when the cluster was created (Windows Azure storage accounts, ...). So anything installed after the cluster was created only lasts until a node is reimaged.

NB: In practice, a node is typically reimaged when the underlying VM host is rebooted to install security patches. This does not happen every day, but good practices in cloud development should take those constraints into account.

Install on the fly

So the idea is to install the module on the fly, while executing the script.

Python is flexible enough to let you catch exceptions while importing modules.

So the top of the Python script looks like this:

import sys
import os
import shutil
import uuid
from zipfile import ZipFile

py_id = str(uuid.uuid4())

#sys.stderr will end up in Hadoop execution logs
sys.stderr.write(py_id + '\n')
sys.stderr.write('My script title.\n')
sys.stderr.flush()

# try to import the shapely module. If it has already been installed, this will succeed
# otherwise, we'll install it on the fly
try:
    from shapely.geometry import Point
    has_shapely = True
except ImportError:
    has_shapely = False

if not has_shapely:
    # shapely module was not installed on this machine.
    # let's install all the required modules, which are shipped next to the .py script as zip files
    sys.stderr.write(py_id + '\n')
    sys.stderr.write('will unzip the shapely module\n')
    sys.stderr.flush()
    # unzip the shapely module files (a Python module is a folder) in the python folder
    # (raw strings keep the backslashes of the Windows paths literal)
    with ZipFile('shapely.zip', 'r') as moduleZip:
        moduleZip.extractall(r'd:\python27')
    # unzip the rtree module files in the Python lib\site-packages folder
    with ZipFile('rtree.zip', 'r') as moduleZip:
        moduleZip.extractall(r'd:\python27\Lib\site-packages')
    # add a required dependency (geos_c.dll) and install the shapefile module (just one .py file, no zip required)
    try:
        sys.stderr.write(py_id + '\n')
        sys.stderr.write('trying to copy geos_c.dll and shapefile.py to python27\n')
        sys.stderr.flush()
        shutil.copyfile(r'.\geos_c.dll', r'd:\python27\geos_c.dll')
        shutil.copyfile(r'.\shapefile.py', r'd:\python27\shapefile.py')
    except EnvironmentError:
        # only file and OS errors are ignored here: log the failure and carry on
        sys.stderr.write(py_id + '\n')
        sys.stderr.write('could not copy geos_c.dll or shapefile.py to python27. Ignoring\n')
        sys.stderr.flush()
    #now that the module is installed, re-import it ...
    from shapely.geometry import Point

# ... and also import the second module (rtree) which was also installed
from shapely.geometry import Polygon
import string
import time
import shapefile
from rtree import index

(...)
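
The same try-import-then-unzip pattern can be wrapped in a small helper. The following is a minimal sketch, not the code used in the job above: the function name ensure_module and its parameters are hypothetical, and it assumes that the zip file contains the module folder (or .py file) at its root. Unlike the script above, it extracts into a folder chosen by the caller and adds that folder to sys.path, rather than writing into the Python installation folder.

```python
import importlib
import os
import sys
import zipfile


def ensure_module(module_name, zip_path, target_dir):
    """Import module_name, unzipping zip_path into target_dir first if needed.

    Hypothetical helper: assumes the zip holds the module folder (or .py file)
    at its root, so that extracting it into target_dir makes it importable.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # stderr ends up in the Hadoop execution logs
        sys.stderr.write('%s not found, unzipping %s\n' % (module_name, zip_path))
        sys.stderr.flush()

    if not os.path.isdir(target_dir):
        os.makedirs(target_dir)
    with zipfile.ZipFile(zip_path, 'r') as module_zip:
        module_zip.extractall(target_dir)

    # make the extracted folder visible to the import system
    if target_dir not in sys.path:
        sys.path.insert(0, target_dir)
    # Python 3 caches directory listings; clear them before retrying
    if hasattr(importlib, 'invalidate_caches'):
        importlib.invalidate_caches()
    return importlib.import_module(module_name)
```

A call could look like ensure_module('rtree', 'rtree.zip', 'modules'), where the 'modules' folder name is an assumption. A native dependency such as geos_c.dll would still have to be copied separately, as the script above does.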

How to package the modules

So the next questions are:

  • where do the .zip files come from?
  • how to send them with the .py script so that it can find them when unzipping?

As the approach is to prepare the module so that it can be unzipped rather than installed, the idea is to install the module manually on a similar machine, then package it for that machine.

An HDInsight machine is set up as shown below:

Well, this was the head node; the worker nodes don’t look much different:

and this is 32-bit Python (on a 64-bit Windows Server 2008 R2 OS):

If you don’t have that kind of environment, you can just create a virtual machine on Windows Azure. The OS is available in the gallery:

Once you have created that VM, installed 32-bit Python 2.7 on it and installed the required modules manually, you can zip them.
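
Any zip tool will do for this step. As one option, here is a minimal Python sketch; the function name zip_module_folder is hypothetical. It stores paths relative to the parent of the module folder, so the archive contains the module folder itself at its root. Whether that layout is the right one depends on the folder the script extracts the archive into.

```python
import os
import zipfile


def zip_module_folder(module_dir, zip_path):
    """Zip module_dir so that the archive contains the folder at its root."""
    module_dir = os.path.abspath(module_dir)
    parent = os.path.dirname(module_dir)
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as module_zip:
        for folder, _subfolders, files in os.walk(module_dir):
            for name in files:
                full_path = os.path.join(folder, name)
                # store the path relative to the parent of the module folder
                module_zip.write(full_path, os.path.relpath(full_path, parent))
    return zip_path
```

For example, zip_module_folder(r'D:\Python27\Lib\site-packages\rtree', 'rtree.zip') would produce an archive meant to be extracted into Lib\site-packages; the source path is an assumption about where the module was installed on the VM.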

Then, you just have to send them with the .py python script. Here is what I have in my example (in PowerShell):

$hiveJobVT = New-AzureHDInsightHiveJobDefinition -JobName "my_hive_and_python_job" `
    -File "$mycontainer/with_python/my_hive_job.hql"

$hiveJobVT.Files.Add("$wasbvtraffic/with_python/geos_c.dll")
$hiveJobVT.Files.Add("$wasbvtraffic/with_python/shapely.zip")
$hiveJobVT.Files.Add("$wasbvtraffic/with_python/my_python_script.py")
$hiveJobVT.Files.Add("$wasbvtraffic/with_python/rtree.zip")
$hiveJobVT.Files.Add("$wasbvtraffic/with_python/shapefile.py")

In the HIVE job, the files must also be added:

(...)

add file my_python_script.py;
add file shapely.zip;
add file geos_c.dll;
add file rtree.zip;
add file shapefile.py;

(...)

INSERT OVERWRITE TABLE my_result
partition (dt)
SELECT transform(x, y, z, dt)
    USING 'D:\Python27\python.exe my_python_script.py' as
    (r1 string, r2 string, r3 string, dt string)
    FROM mytable
    WHERE dt >= ${hiveconf:dt_min} AND dt <= ${hiveconf:dt_max};

(...)

Benjamin (@benjguin)

Blog Post by: Benjamin GUINEBERTIERE

A simple example: how to call Python from Hive in HDInsight

Introduction

The Hadoop framework automatically distributes code execution across a multi-node cluster, and distributes that code against the dataset. Code for Hadoop can be developed in Java, in which case one has to implement a map function and a reduce function; both manipulate keys and values as inputs and outputs. At a higher level, two scripting languages simplify the code: PIG is a specific scripting language, and HIVE looks like SQL, so using HIVE is quite easy. It has a number of extension functions (called user-defined functions) to transform data, such as regular expression tools. A developer can add user-defined functions by developing them in Java. Another way to add procedural logic that complements the set-based SQL language is to use a language like Python.

The goal of this post is to show an example of such a combination.
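
As a reminder of what the map and reduce functions mentioned in the introduction look like, here is a minimal single-process sketch in Python. It is not Hadoop code: the names map_fn, reduce_fn and run_map_reduce are hypothetical, and the grouping step only imitates what the framework does between the two phases across the cluster.

```python
from collections import defaultdict


def map_fn(_key, line):
    """Emit a (word, 1) pair for each word of an input line."""
    for word in line.split():
        yield word, 1


def reduce_fn(word, counts):
    """Combine all the values emitted for one key into a single pair."""
    return word, sum(counts)


def run_map_reduce(lines):
    """Run map, group by key, then reduce, all in a single process."""
    grouped = defaultdict(list)
    for offset, line in enumerate(lines):
        for key, value in map_fn(offset, line):
            grouped[key].append(value)
    return dict(reduce_fn(key, values) for key, values in grouped.items())
```

With HIVE and a Python script, none of this has to be written by hand: HIVE generates the map and reduce steps, and the script only handles rows.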

Here is how that could look on a small cluster. The workload is distributed across the worker nodes:

At the worker node level, one Python process is created per core. Each process receives its part of the whole dataset:

Windows Azure comes with its own Hadoop as a service, called HDInsight. It lets you run HIVE, PIG and other Map/Reduce jobs a few minutes after requesting the creation of a cluster. For HIVE, HDInsight comes with a sample table. Let’s run a HIVE + Python job against that hivesampletable table.

Hive and Python Script

In this example, we use a Python module to calculate the hash of a label in the sample table.

Hive is used to get the data, partition it and send the rows to the Python processes which are created on the different cluster nodes. Here is the code:

add file simple_sample.py;

SELECT TRANSFORM (clientid, devicemake, devicemodel)
    USING 'D:\Python27\python.exe simple_sample.py' AS 
    (clientid string, phoneLabel string, phoneHash string)
FROM hivesampletable
ORDER BY clientid LIMIT 50;

This can be read as: from the hivesampletable table, select clientid, devicemake and devicemodel, and pass them to the simple_sample.py Python script, which is run with D:\Python27\python.exe. The script sends back the columns clientid (a string), phoneLabel (a string) and phoneHash (a string). The result is ordered by clientid and limited to the first 50 rows.

Hive sends data to the simple_sample.py script. Here is the code of that script:

import sys
import hashlib

# Hive streams the selected columns to stdin, one row per line, separated by TABs
while True:
    line = sys.stdin.readline()
    if not line:
        break

    line = line.strip("\n ")
    clientid, devicemake, devicemodel = line.split("\t")
    phone_label = devicemake + ' ' + devicemodel
    # send the result columns back to Hive on stdout, separated by TABs
    print "\t".join([clientid, phone_label, hashlib.md5(phone_label).hexdigest()])

This script reads lines from stdin. It parses them and obtains the columns passed by Hive: clientid, devicemake, devicemodel. From those columns, it derives the resulting columns: clientid, phoneLabel, phoneHash. To calculate phoneHash, it uses an imported module (hashlib). To output the result, the Python script writes it to stdout, with columns separated by TABs.
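
Because the script only reads stdin and writes stdout, its logic can be tried locally without a cluster. The sketch below is an illustration rather than the script used in the job: transform_line is a hypothetical function that holds the per-row logic, and the input rows are made-up sample data. The label is encoded to UTF-8 before hashing when needed, which is an assumption that lets the same code run on Python 3 as well as Python 2.7.

```python
import hashlib


def transform_line(line):
    """Turn one TAB-separated input row into one TAB-separated output row."""
    clientid, devicemake, devicemodel = line.strip("\n ").split("\t")
    phone_label = devicemake + ' ' + devicemodel
    # hashlib needs bytes: a Python 2 str already is, a Python 3 str must be encoded
    if isinstance(phone_label, bytes):
        data = phone_label
    else:
        data = phone_label.encode('utf-8')
    return "\t".join([clientid, phone_label, hashlib.md5(data).hexdigest()])


if __name__ == '__main__':
    # made-up rows standing in for what Hive would send on stdin
    for row in ["1\tAcme\tPhone A\n", "2\tAcme\tPhone B\n"]:
        print(transform_line(row))
```

Keeping the per-row logic in a function also makes it easy to check a few rows by hand before paying for a cluster run.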

Let’s run it with PowerShell

Here is a sample PowerShell script that

  • creates an HDInsight cluster
  • runs the job
  • gets the result
  • removes the cluster

Before running the script, the HIVE script and the Python script must have been copied to Windows Azure storage:

Here is the PowerShell script:

Import-Module azure
Add-AzureAccount

$Subscription = 'Azdem169A44055X'
$defaultStorageAccount = 'monstockageazure'
$clusterName = 'monclusterhadoop'
$clusterVersion='2.1'
$clusterAdmin = 'cornac'
$clusterPassword = 'LElzgqy#n87'

$passwd = ConvertTo-SecureString $clusterPassword -AsPlainText -Force
$clusterCredentials = New-Object System.Management.Automation.PSCredential ($clusterAdmin, $passwd)

Set-AzureSubscription -SubscriptionName $Subscription -CurrentStorageAccount $defaultStorageAccount
Select-AzureSubscription -Current $Subscription

$storageAccount1 = (Get-AzureSubscription $Subscription).CurrentStorageAccountName
$key1 = Get-AzureStorageKey -StorageAccountName $storageAccount1 | %{ $_.Primary }

New-AzureHDInsightClusterConfig -ClusterSizeInNodes 3 |
    Set-AzureHDInsightDefaultStorage -StorageAccountName "${storageAccount1}.blob.core.windows.net" -StorageAccountKey $key1 `
        -StorageContainerName $clusterName |
    New-AzureHDInsightCluster -Name $clusterName -Version $clusterVersion -Location "North Europe" -Credential $clusterCredentials

Use-AzureHDInsightCluster "monclusterhadoop"

$hiveJobVT = New-AzureHDInsightHiveJobDefinition -File "wasb://[email protected]/simple_sample.hql"
$hiveJobVT.Files.Add("wasb://[email protected]/simple_sample.py")
$startedHiveJobVT = $hiveJobVT | Start-AzureHDInsightJob -Credential $clusterCredentials -Cluster "monclusterhadoop"

$startedHiveJobVT | Wait-AzureHDInsightJob -Credential $clusterCredentials

Get-AzureHDInsightJobOutput -StandardError -JobId $startedHiveJobVT.JobId -Cluster "monclusterhadoop"
Get-AzureHDInsightJobOutput -StandardOutput -JobId $startedHiveJobVT.JobId -Cluster "monclusterhadoop"

Remove-AzureHDInsightCluster -Name $clusterName

Here is a sample execution result:

PS C:\benjguin\BigData_Hadoop\demos\simple> Import-Module azure
Add-AzureAccount

$Subscription = 'Azdem169A44055X'
$defaultStorageAccount = 'monstockageazure'
$clusterName = 'monclusterhadoop'
$clusterVersion='2.1'
$clusterAdmin = 'cornac'
$clusterPassword = 'LElzgqy#n87'

$passwd = ConvertTo-SecureString $clusterPassword -AsPlainText -Force
$clusterCredentials = New-Object System.Management.Automation.PSCredential ($clusterAdmin, $passwd)

Set-AzureSubscription -SubscriptionName $Subscription -CurrentStorageAccount $defaultStorageAccount
Select-AzureSubscription -Current $Subscription

$storageAccount1 = (Get-AzureSubscription $Subscription).CurrentStorageAccountName
$key1 = Get-AzureStorageKey -StorageAccountName $storageAccount1 | %{ $_.Primary }

New-AzureHDInsightClusterConfig -ClusterSizeInNodes 3 |
    Set-AzureHDInsightDefaultStorage -StorageAccountName "${storageAccount1}.blob.core.windows.net" -StorageAccountKey $key1 `
        -StorageContainerName $clusterName |
    New-AzureHDInsightCluster -Name $clusterName -Version $clusterVersion -Location "North Europe" -Credential $clusterCredentials



ClusterSizeInNodes    : 3
ConnectionUrl         : https://monclusterhadoop.azurehdinsight.net
CreateDate            : 03/03/2014 14:15:50
DefaultStorageAccount : monstockageazure.blob.core.windows.net
HttpUserName          : cornac
Location              : North Europe
Name                  : monclusterhadoop
State                 : Running
StorageAccounts       : {}
SubscriptionId        : 0fa85b4c-aa27-44ba-84e5-fa51aac32734
UserName              : cornac
Version               : 2.1.4.0.526800
VersionStatus         : Compatible

PS C:\benjguin\BigData_Hadoop\demos\simple> Use-AzureHDInsightCluster "monclusterhadoop"

$hiveJobVT = New-AzureHDInsightHiveJobDefinition -File "wasb://[email protected]/simple_sample.hql"
$hiveJobVT.Files.Add("wasb://[email protected]/simple_sample.py")
$startedHiveJobVT = $hiveJobVT | Start-AzureHDInsightJob -Credential $clusterCredentials -Cluster "monclusterhadoop"

$startedHiveJobVT | Wait-AzureHDInsightJob -Credential $clusterCredentials

Get-AzureHDInsightJobOutput -StandardError -JobId $startedHiveJobVT.JobId -Cluster "monclusterhadoop"
Get-AzureHDInsightJobOutput -StandardOutput -JobId $startedHiveJobVT.JobId -Cluster "monclusterhadoop"
Successfully connected to cluster monclusterhadoop


Cluster         : monclusterhadoop
ExitCode        : 0
Name            : Hive: simple_sample.hql
PercentComplete : map = 100%,  reduce = 100%
Query           : 
State           : Completed
StatusDirectory : b4328d2f-589c-412e-83e5-f8a544cb321c
SubmissionTime  : 03/03/2014 14:36:48
JobId           : job_201403031426_0003


Logging initialized using configuration in file:/C:/apps/dist/hive-0.11.0.1.3.5.0-03/conf/hive-log4j.properties
Added resource: simple_sample.py
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201403031426_0004, Tracking URL = http://jobtrackerhost:50030/jobdetails.jsp?jobid=job_201403031426_0004
Kill Command = "C:\apps\dist\hadoop-1.2.0.1.3.5.0-03\bin\hadoop.cmd" job  -kill job_201403031426_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-03-03 14:37:20,821 Stage-1 map = 0%,  reduce = 0%
2014-03-03 14:37:25,883 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.469 sec
2014-03-03 14:37:26,915 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.469 sec
2014-03-03 14:37:27,946 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.469 sec
2014-03-03 14:37:28,962 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.469 sec
2014-03-03 14:37:29,977 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.469 sec
2014-03-03 14:37:30,993 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.469 sec
2014-03-03 14:37:32,008 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.469 sec
2014-03-03 14:37:33,024 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.469 sec
2014-03-03 14:37:34,024 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 5.469 sec
2014-03-03 14:37:35,040 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 5.469 sec
2014-03-03 14:37:36,055 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 9.265 sec
2014-03-03 14:37:37,055 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 9.265 sec
2014-03-03 14:37:38,055 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 9.265 sec
MapReduce Total cumulative CPU time: 9 seconds 265 msec
Ended Job = job_201403031426_0004
MapReduce Jobs Launched: 
Job 0: Map: 1  Reduce: 1   Cumulative CPU: 9.265 sec   HDFS Read: 266 HDFS Write: 2684 SUCCESS
Total MapReduce CPU Time Spent: 9 seconds 265 msec
OK
Time taken: 36.86 seconds, Fetched: 50 row(s)

100004    Motorola Droid X    02a4198bedd37119dabcbb2e8fb4ec92
100015    Apple iPod Touch 4.3.x    d9bc8c98d6a6556656e774a64f7b8bb2
100015    Apple iPod Touch 4.3.x    d9bc8c98d6a6556656e774a64f7b8bb2
100035    LG VS910    b4bfdffa3e288ed0283ae8c8a37c455e
100035    LG VS910    b4bfdffa3e288ed0283ae8c8a37c455e
100035    LG VS910    b4bfdffa3e288ed0283ae8c8a37c455e
100035    LG VS910    b4bfdffa3e288ed0283ae8c8a37c455e
100035    LG VS910    b4bfdffa3e288ed0283ae8c8a37c455e
100035    LG VS910    b4bfdffa3e288ed0283ae8c8a37c455e
100035    LG VS910    b4bfdffa3e288ed0283ae8c8a37c455e
100035    LG VS910    b4bfdffa3e288ed0283ae8c8a37c455e
100035    LG VS910    b4bfdffa3e288ed0283ae8c8a37c455e
100035    LG VS910    b4bfdffa3e288ed0283ae8c8a37c455e
100035    LG VS910    b4bfdffa3e288ed0283ae8c8a37c455e
100035    LG VS910    b4bfdffa3e288ed0283ae8c8a37c455e
100035    LG VS910    b4bfdffa3e288ed0283ae8c8a37c455e
100036    Samsung SCH-i400    6b314786cda6123fc06eeb855825aea7
100036    Samsung SCH-i400    6b314786cda6123fc06eeb855825aea7
100036    Samsung SCH-i400    6b314786cda6123fc06eeb855825aea7
100036    Samsung SCH-i400    6b314786cda6123fc06eeb855825aea7
100036    Samsung SCH-i400    6b314786cda6123fc06eeb855825aea7
100036    Samsung SCH-i400    6b314786cda6123fc06eeb855825aea7
100036    Samsung SCH-i400    6b314786cda6123fc06eeb855825aea7
100036    Samsung SCH-i400    6b314786cda6123fc06eeb855825aea7
100036    Samsung SCH-i400    6b314786cda6123fc06eeb855825aea7
100036    Samsung SCH-i400    6b314786cda6123fc06eeb855825aea7
100036    Samsung SCH-i400    6b314786cda6123fc06eeb855825aea7
100036    Samsung SCH-i400    6b314786cda6123fc06eeb855825aea7
100036    Samsung SCH-i400    6b314786cda6123fc06eeb855825aea7
100036    Samsung SCH-i400    6b314786cda6123fc06eeb855825aea7
100036    Samsung SCH-i400    6b314786cda6123fc06eeb855825aea7
100036    Samsung SCH-i400    6b314786cda6123fc06eeb855825aea7
100036    Samsung SCH-i400    6b314786cda6123fc06eeb855825aea7
100036    Samsung SCH-i400    6b314786cda6123fc06eeb855825aea7
100036    Samsung SCH-i400    6b314786cda6123fc06eeb855825aea7
100036    Samsung SCH-i400    6b314786cda6123fc06eeb855825aea7
100036    Samsung SCH-i400    6b314786cda6123fc06eeb855825aea7
100041    RIM 9650    d476f3687700442549a83fac4560c51c
100041    RIM 9650    d476f3687700442549a83fac4560c51c
100041    RIM 9650    d476f3687700442549a83fac4560c51c
100041    RIM 9650    d476f3687700442549a83fac4560c51c
100041    RIM 9650    d476f3687700442549a83fac4560c51c
100041    RIM 9650    d476f3687700442549a83fac4560c51c
100041    RIM 9650    d476f3687700442549a83fac4560c51c
100041    RIM 9650    d476f3687700442549a83fac4560c51c
100041    RIM 9650    d476f3687700442549a83fac4560c51c
100041    RIM 9650    d476f3687700442549a83fac4560c51c
100042    Apple iPhone 4.2.x    375ad9a0ddc4351536804f1d5d0ea9b9
100042    Apple iPhone 4.2.x    375ad9a0ddc4351536804f1d5d0ea9b9
100042    Apple iPhone 4.2.x    375ad9a0ddc4351536804f1d5d0ea9b9

Remove-AzureHDInsightCluster -Name $clusterName

Benjamin (@benjguin)

Blog Post by: Benjamin GUINEBERTIERE

BizTalk Server Tip #29: Develop adapter using WCF

When developing new adapters, create a custom WCF channel or use the WCF LOB SDK as a reference starting point. This will allow you to create a scalable and easy-to-host adapter that can be used across other .NET solutions. This level of flexibility will make the adapter more likely to be reused somewhere […]

The post BizTalk Server Tip #29: Develop adapter using WCF appeared first on BizTalk360 Blog.

Blog Post by: Ricardo Torre

SharePoint 2013 SP1 Released

Overview
We’ve been hearing Q1 2014 as a release date for SharePoint 2013 SP1 for some time now, and most of us have been thinking we’d get that date at SharePoint Conference 2014.  The conference Yammer feed was just updated with the news that SharePoint 2013 SP1 has been released.
SP2013 SP1 Download Info
http://blogs.technet.com/b/stefan_gossner/archive/2014/02/26/service-pack-1-for-sharepoint-2013-is-now-available-for-download.aspx
Installation Tips
Ensure you […]
Blog Post by: Michael Gerety

BizTalk Server Tip #28: Avoid Orchestrations when possible

Use static routing, content-based routing or itineraries to avoid using Orchestrations, and use routing of failed messages for advanced error handling, since messaging doesn’t provide rich error handling capabilities. This approach will give you the high performance of messaging and the power of Orchestrations when necessary. When a high volume of messages […]

The post BizTalk Server Tip #28: Avoid Orchestrations when possible appeared first on BizTalk360 Blog.

Blog Post by: Ricardo Torre