As you may know, Azure Machine Learning can execute R scripts. You can see the console output interactively in ML Studio. But what about retrieving the result as part of a production call to the API generated by Azure ML?
Let’s test with a word cloud example in R. Mollie Taylor has posted one here (https://gist.github.com/mollietaylor/3671518) that we can reuse in Azure Machine Learning:
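In substance, the script looks like this (a minimal sketch rather than the gist's exact code; the word and freq column names and the local file name are assumptions):

library(wordcloud)
library(RColorBrewer)

# one row per word, with its frequency (assumed columns: word, freq)
words <- read.csv("conventions.csv")

# draw the word cloud on the current graphics device
wordcloud(words$word, words$freq,
          scale = c(5, 0.5),               # range of font sizes
          min.freq = 2,                    # skip very rare words
          colors = brewer.pal(8, "Dark2")) # color palette

In Azure ML, the read.csv call gets replaced by an input port mapping, as shown a bit further below.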
The details on how to create an Azure ML workspace, insert a dataset and an R script can be found here:
- http://azure.microsoft.com/en-us/documentation/articles/machine-learning-walkthrough-develop-predictive-solution/
For R, just use the Execute R Script module.
The input of the Web API is set to the input dataset of the Execute R Script module, and the output is set to the R Device port. As a reminder, the way inputs and outputs are positioned in an Execute R Script module is detailed in the help documentation.
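Concretely, the module script only differs from a standalone one in how it gets its data: the dataset connected to the first input port is retrieved with maml.mapInputPort, and whatever is drawn on the graphics device is captured by the R Device output port. A minimal sketch, with the same assumed word and freq columns:

library(wordcloud)
library(RColorBrewer)

# map the dataset connected to the module's first input port
words <- maml.mapInputPort(1)

# anything plotted here is captured by the R Device output port
wordcloud(words$word, words$freq,
          colors = brewer.pal(8, "Dark2"))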
In our case, the interesting ports to publish are the dataset input of the Execute R Script module and its R Device output.
After running the experiment, we can see the result in Azure ML Studio:
So, how can we retrieve the pictures from an API published that way?
Here is a sample Python script that shows how to do it. It is a modified version of the sample given in the API help page for Batch Execution (BES). The idea is to read the base64-encoded pictures from the output file and decode them to local disk.
# -*- coding: utf-8 -*-
# How this works:
#
# 1. Assume the input is present in a local file.
# 2. Upload the file to an Azure blob - you'd need an Azure storage account.
# 3. Call BES to process the data in the blob.
# 4. The results get written to another Azure blob.
# 5. Download the output blob to a local file.
# 6. Extract the base64-encoded pictures from the output and write them as PNG files.
#
# Note: You may need to download/install the Azure SDK for Python.
# See: http://azure.microsoft.com/en-us/documentation/articles/python-how-to-install/

import urllib2
import json
import time
import base64
from azure.storage import BlobService

storage_account_name = 'a****obfuscated***4'
storage_account_key = '/aV****obfuscated***vXA76w=='
storage_container_name = 'benjguin'
input_file = ur"C:\be****obfuscated***os\WordCloud\conventions.csv"
output_file = ur'C:\be****obfuscated***os\WordCloud\myresults.csv'
input_blob_name = 'conventions.csv'
api_key = r'Cczx****obfuscated***WemQ=='
url = 'https://ussouthcentral.services.azureml.net/workspaces/a7c****obfuscated***756/services/d328e03****obfuscated***5c2/jobs'
uploadfile = True
executeBES = True

blob_service = BlobService(account_name=storage_account_name, account_key=storage_account_key)

if uploadfile:
    print("Uploading the input to blob storage...")
    data_to_upload = open(input_file, 'r').read()
    blob_service.put_blob(storage_container_name, input_blob_name, data_to_upload, x_ms_blob_type='BlockBlob')

input_blob_path = '/' + storage_container_name + '/' + input_blob_name
debug_blob = blob_service.get_blob(storage_container_name, input_blob_name)  # debug: check the upload succeeded

if executeBES:
    print("Submitting the BES job...")
    connection_string = "DefaultEndpointsProtocol=https;AccountName=" + storage_account_name + ";AccountKey=" + storage_account_key
    payload = {"Input": {"ConnectionString": connection_string, "RelativeLocation": input_blob_path}}
    body = str.encode(json.dumps(payload))
    headers = {'Content-Type': 'application/json', 'Authorization': ('Bearer ' + api_key)}
    req = urllib2.Request(url, body, headers)
    response = urllib2.urlopen(req)
    result = response.read()
    job_id = result[1:-1]  # remove the enclosing double quotes
    url2 = url + '/' + job_id

    # poll the job status until it reaches a terminal state
    while True:
        time.sleep(1)  # wait a second
        authHeader = {'Authorization': ('Bearer ' + api_key)}
        request = urllib2.Request(url2, headers=authHeader)
        response = urllib2.urlopen(request)
        result = json.loads(response.read())
        status = result['StatusCode']
        if status == 0:
            print("Not started...")
        elif status == 1:
            print("Running...")
        elif status == 2:
            print("Failed...")
            break
        elif status == 3:
            print("Cancelled...")
            break
        elif status == 4:
            print("Finished!")
            result_blob_location = result['Result']
            sas_token = result_blob_location['SasBlobToken']
            base_url = result_blob_location['BaseLocation']
            relative_url = result_blob_location['RelativeLocation']
            url3 = base_url + relative_url + sas_token
            response = urllib2.urlopen(url3)
            with open(output_file, 'w') as f:
                f.write(response.read())
            break

# extract the JSON document from the downloaded CSV
outputdata = open(output_file)
outputtxt = outputdata.read()
outputdata.close()
s = outputtxt.index('"{')                  # the JSON starts at the first quoted brace
o1 = outputtxt[s + 1:len(outputtxt) - 3]   # strip the surrounding CSV quoting
jsonresult = json.loads(o1)

# each entry of "Graphics Device" is a base64-encoded PNG; decode them to disk
i = 1
for gd in jsonresult['Graphics Device']:
    fname = output_file + "." + str(i) + ".png"
    print('writing png #' + str(i) + ' to ' + fname)
    with open(fname, 'wb') as f:
        f.write(base64.b64decode(gd))
    i += 1
print("Done!")
Here is a sample execution output:
Uploading the input to blob storage...
Submitting the BES job...
Running...
Running...
Running...
Running...
Running...
Running...
Running...
Finished!
writing png #1 to C:\be***obfuscated***os\WordCloud\myresults.csv.1.png
writing png #2 to C:\be***obfuscated***os\WordCloud\myresults.csv.2.png
Done!
The output sent back by Azure ML looks like this:
R Output JSON
"{"Standard Output":"RWorker pushed \"port1\" to R workspace.\r\nBeginning R Execute Script\n\n[1] 56000\r\nLoading objects:\r\n port1\r\n[1] \"Loading variable port1...\"\r\npng \r\n 2 \r\nnull device \r\n 1 \r\n","Standard Error":"R reported no errors.","visualizationType":"rOutput","Graphics Device":["iVBORw0K***(...)***RvX/wFzB5s8eym6ZgAAAABJRU5ErkJggg==","iVBORw0KGgo***(...)***dVorBuiQAAAABJRU5ErkJggg=="]}"
You cannot really see the pictures in this JSON document... well, Python does. The resulting files are the two decoded word clouds, myresults.csv.1.png and myresults.csv.2.png.
R has tons of great data visualisation capabilities. Have a look at those blogs, for instance:
Benjamin (@benjguin)