Converting InfoPath to PDF in BizTalk

Hi all

So, the other day I had this requirement for a BizTalk pipeline component:

Take an InfoPath form and convert it into a PDF that is to be sent out via email.
This seemed easy enough. I searched a bit, and found that three simple steps were
needed:

  1. Install the 2007 Microsoft Office Add-in: Microsoft Save as PDF
  2. In my code, reference Microsoft.Office.InfoPath.dll and Microsoft.Office.InfoPath.FormControl.dll
  3. Write these lines of code:
        FormControl formControl = new FormControl();
        formControl.Open(pInMsg.Data);
        string output = Path.GetTempFileName();
        formControl.XmlForm.CurrentView.Export(output, Microsoft.Office.InfoPath.ExportFormat.Pdf);

Of course, this would also mean some code that would read the PDF file back in and
then create the output message. But hey, that was just the price I had to pay.

BUT I was being naive. As the more clever of my readers have probably already realized,
if something is called FORMcontrol, then it is for programs that have a UI. The code
crashed big time at runtime with some ActiveX exception 🙁

Then I remembered that I have a colleague who had previously told me that she had
done this at some point, so I emailed her for her code.

Unfortunately, her code involved taking the form, extracting the XSL from the XSN
file, performing a transformation on the XML using the XSL to generate HTML, and
then using some utility to convert this into PDF. This was more complex than I had
hoped, but I saw no other way. Sadly, her code had this line in it:

    StreamReader stream = new StreamReader(XmlFormView.XmlForm.Template.OpenFileFromPackage("View1.xsl"));

which, as you might have guessed, also requires a UI; in this case it is used in a
web application. So no go.

So, it seems that I will have to do a lot of dirty work myself 🙁

This turned into quite a list of subtasks:

  • Take the XML document that comes through the pipeline component
  • Take the value of the processing instruction called “mso-infoPathSolution”. This processing
    instruction is always present in an InfoPath form and it looks something like this:

    <?mso-infoPathSolution
    solutionVersion="1.0.0.2" productVersion="12.0.0" PIVersion="1.0.0.0"
    href="http://path.to/form.xsn" name="urn:schemas-microsoft-com:office:infopath:MyForm:-myXSD-2009-09-21T15-43-10" ?>
  • Take the value of the href “attribute” that is in the value of the processing instruction.
    The href is a URI that points to the XSN that this XML is an instance of, you see.
  • Get the XSN file that is located at the URI.
  • Extract the XSL file that matches the view of the form you want to convert into PDF.
  • Perform the transformation
  • Convert into PDF

 

So I am now going from the few lines of code I was hoping for to a more complex solution,
so let’s look at the code:

First of all, I need the value of the processing instruction. This is easily done:

    // Href is a class-level constant that is not shown in this post; the code
    // below presumably relies on it holding the text href=" (including the quote).
    private static string GetHrefFromXml(XmlDocument infoPathForm)
    {
        XmlNode piNode = infoPathForm.SelectSingleNode("/processing-instruction(\"mso-infoPathSolution\")");
        if (piNode != null && piNode is XmlProcessingInstruction)
        {
            var pi = (XmlProcessingInstruction)piNode;
            string href = pi.Value;
            int location = href.IndexOf(Href);
            if (location != -1)
            {
                // Cut away everything up to and including href=" and then
                // keep the text up to the closing quote.
                href = href.Substring(location + Href.Length);
                href = href.Substring(0, href.IndexOf("\""));
                return href;
            }
            throw new ApplicationException("No href attribute was found in the processing instruction (mso-infoPathSolution). Without this, the location of the form cannot be detected and without the form no PDF can be generated.");
        }
        throw new ApplicationException("Required XML processing instruction (mso-infoPathSolution) not found. Without this, the location of the form cannot be detected and without the form no PDF can be generated.");
    }

The most annoying part is that the value of a processing instruction can be anything.
In this case, it appears to be a list of attributes like “normal” XML, but since this
is not guaranteed, there is no language support for getting the value of the href
“attribute”. So I chose to use string manipulation to get the value.
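
For what it is worth, here is a small sketch of an alternative to the string manipulation. It is not what the component does, and it rests on an assumption: that the value of the processing instruction is well formed enough to be parsed as the attribute list of an XML element (an unescaped & in the href would break it, for instance). The class and method names are made up for the example.

```csharp
using System;
using System.Xml;

static class PseudoAttributeReader
{
    // Hypothetical helper: wraps the processing instruction's value in a
    // dummy element so the XML parser can read the pseudo-attributes.
    // Returns null when the requested name is not present.
    public static string GetPseudoAttribute(XmlProcessingInstruction pi, string name)
    {
        var wrapper = new XmlDocument();
        // Throws XmlException if the value is not a valid attribute list.
        wrapper.LoadXml("<pi " + pi.Value + " />");
        string value = wrapper.DocumentElement.GetAttribute(name);
        return value.Length == 0 ? null : value;
    }
}
```

With this, the href would be read with something like PseudoAttributeReader.GetPseudoAttribute(pi, "href"), and a missing attribute shows up as null instead of a failed IndexOf.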

After getting the href, I need to get the XSN file from SharePoint Server, where the
form is published. This turned out to be a challenge also.

My first approach was quite simple:

    private static byte[] GetFormByUrl(string href)
    {
        var wc = new WebClient
        {
            Credentials = CredentialCache.DefaultCredentials
        };
        return wc.DownloadData(href);
    }

This turned out to be too simple, though. When SharePoint and Forms Server get a
request for the XSN file, they assume someone is trying to fill out the form. So
what I got back was the HTML that Forms Server sends to a user who wants to fill
out the form. Then I thought I’d try this:

    private static byte[] GetFormByUrl(string href)
    {
        HttpWebRequest wr = (HttpWebRequest)HttpWebRequest.Create(href);
        // Tell .NET not to follow redirects automatically.
        wr.AllowAutoRedirect = false;
        using (WebResponse resp = wr.GetResponse())
        using (Stream stream = resp.GetResponseStream())
        using (MemoryStream ms = new MemoryStream())
        {
            byte[] buffer = new byte[1024];
            int bytes;
            // Stream.Read returns 0 at the end of the stream, never -1.
            while ((bytes = stream.Read(buffer, 0, buffer.Length)) > 0)
                ms.Write(buffer, 0, bytes);
            return ms.ToArray();
        }
    }

Basically, using an HttpWebRequest I could ask it to not redirect. This didn’t work
either, since what I then got back was some HTML that basically just said that the
page has moved. Bummer.

But then another colleague who apparently is better at searching than I am found out
that I can add a noredirect parameter to my request that will instruct SharePoint
to not redirect. This is different from my current approach because my current approach
instructs .NET to not follow redirects, whereas this new approach instructs SharePoint
to not ask me to redirect.

So I ended up with something as simple as this:

    private static byte[] GetFormByUrl(string href)
    {
        string url = href + "?noredirect=true";
        var wc = new WebClient
        {
            Credentials = CredentialCache.DefaultCredentials
        };
        return wc.DownloadData(url);
    }

Simple and beautiful 🙂
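
One thing to be aware of: appending "?noredirect=true" directly only works as long as the href does not already contain a query string. The sketch below shows one way the parameter could be appended in either case. It is an illustration only; the class and method names are made up, and I have not needed this in the component itself.

```csharp
using System;

static class NoRedirectUrl
{
    // Hypothetical helper: adds noredirect=true to the URL, using "&" when
    // the URL already has a query string and "?" when it does not.
    public static string AppendNoRedirect(string href)
    {
        var builder = new UriBuilder(href);
        // The Query getter includes the leading "?", so strip it first.
        string existing = builder.Query.TrimStart('?');
        builder.Query = existing.Length == 0
            ? "noredirect=true"
            : existing + "&noredirect=true";
        return builder.Uri.AbsoluteUri;
    }
}
```

The result can be passed to wc.DownloadData in place of the concatenated string.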

Now I have the XSN file, and naturally the next issue pops up: how do I get the XSL
extracted from the XSN file? The XSN file is just a cabinet file with another extension,
so I thought this must be easy. I found out it is not. I searched and searched and
ended up finding all sorts of weird stuff where people used p/invoke and what not.
I am puzzled that Microsoft have not added at least extraction functionality
to the .NET framework, but they haven’t.

I ended up doing this:

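    // Shell, Folder and FolderItem come from the Shell32 COM library
    // ("Microsoft Shell Controls and Automation").
    // CreateTmp is a helper of mine that is not shown in this post.
    private static string ExtractCabFile(string cabFile)
    {
        string destDir = CreateTmp(true, "");

        var sh = new Shell();
        Folder fldr = sh.NameSpace(destDir);
        // Copy every file inside the cabinet to the destination directory.
        foreach (FolderItem f in sh.NameSpace(cabFile).Items())
            fldr.CopyHere(f, 0);
        return destDir;
    }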

This code assumes that the XSN file has been written to a temporary file with the
extension .CAB. This is very important, since the shell opens the .CAB file with
the default program for that extension, which is Explorer. After that, all files
in the cabinet file are copied to “destDir”, which is just a directory created in the
user’s Temp directory.
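
The step of writing the downloaded XSN to a temporary .CAB file is not shown in this post. A minimal sketch of how it could be done, with made-up class and method names, looks like this:

```csharp
using System;
using System.IO;

static class XsnTempFile
{
    // Hypothetical helper: writes the downloaded XSN bytes to a uniquely
    // named file with a .cab extension in the user's Temp directory and
    // returns the full path, ready to be handed to the extraction code.
    public static string WriteXsnAsCab(byte[] xsnBytes)
    {
        string fileName = Guid.NewGuid().ToString("N") + ".cab";
        string path = Path.Combine(Path.GetTempPath(), fileName);
        File.WriteAllBytes(path, xsnBytes);
        return path;
    }
}
```

The returned path is what would be passed to ExtractCabFile. Remember to delete both the file and the extracted directory once the PDF has been generated.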

I am quite annoyed to have to go through all this, but that’s how things go sometimes.

So now I have found the href of the form, downloaded the form and extracted its files.
Time for the transformation:

    private static MemoryStream PerformTransformation(XmlDocument xmldoc, string destDir, string view)
    {
        var transform = new XslCompiledTransform();
        var stream = new StreamReader(destDir + @"\" + view + ".xsl");
        XmlReader xmlReader = XmlReader.Create(stream);
        transform.Load(xmlReader);

        var outputMemStream = new MemoryStream();
        transform.Transform(xmldoc, null, outputMemStream);
        stream.Close();
        xmlReader.Close();
        outputMemStream.Seek(0, SeekOrigin.Begin);
        return outputMemStream;
    }

So just a normal XSLT transformation, resulting in some HTML that is returned in a
stream.

After this, I need to convert it into PDF, which is really simple using a tool we
bought for this:

    private static byte[] GetPdfFromHtml(Parameters param)
    {
        var pdfConverter = new PdfConverter
        {
            LicenseKey = "SomethingElse - You are not getting the correct License Key :-)"
        };

        byte[] pdfBytes = pdfConverter.GetPdfBytesFromHtmlStream(param.HtmlStream, Encoding.UTF8,
            param.DestDir.EndsWith(@"\") ? param.DestDir : param.DestDir + @"\");
        return pdfBytes;
    }

We are using the ExpertPDF library for this. The third parameter for the
GetPdfBytesFromHtmlStream method call is the directory where the cabinet file was
extracted to, since this is where all images used in the form are also kept, and
they are needed for the PDF to include them.

All in all, the component now works, but it turned out to be a lot more difficult
than I had hoped.

As a last detail, I added a property to my pipeline component that the developer can
use to decide which view to use for the transformation from XML to HTML.

The complete code for the pipeline component will not be available for download, since
this was done for a customer, but I might do something a bit smaller and simpler and
add it to my pipeline component collection later on.

eliasen

BizTalk 2009 – Configuring High Receiving Throughput

While on a current project, needing (as always) to tweak how well BTS processes
receives, I came across a performance document on BTS 2009 receiving.

The document below deals mainly with netTCP receive locations – oneway ports + oneway
Orchs.

Enjoy.

——

BizTalk Server 2009 Performance Optimization Guide

Brief Description

The BizTalk Server 2009 Performance Optimization Guide provides prescriptive guidance
on the best practices and techniques that should be followed to optimize BizTalk Server
performance.

http://www.microsoft.com/downloads/details.aspx?FamilyID=24660797-0C8F-4687-9D5F-B76D99B37EC2&displaylang=en

BizTalk 2006 R2 SP1 Beta hits


Service Pack 1 Beta


The BizTalk team are pleased to announce the availability of the beta release
of Service Pack 1 for BizTalk Server 2006 R2. We would like to offer you the opportunity
to download this early preview of the Service Pack and encourage you to test it out
and let us have any feedback before we release it to the BizTalk community.

Microsoft BizTalk Server 2006 R2 Service Pack 1 (SP1) is an update for BizTalk Server
2006 R2. It includes fixes to issues that have been reported through our customer
feedback platforms, as well as internally discovered issues. To see a listing of the
customer-reported issues that are fixed in this service pack, go to http://go.microsoft.com/fwlink/?LinkId=164985.
For a description of some of the other updates included in this service pack, see What’s
new in BizTalk Server 2006 R2 SP1 (http://go.microsoft.com/fwlink/?LinkId=163958).

A guide for this service pack is also available on the download page. This guide
contains important information to read before you install SP1. It also provides
installation instructions and a section on troubleshooting installation problems.
Finally, it contains a section on known issues in this service pack release.

The service pack can be downloaded from here and
any feedback or issues you encounter can be reported here

Thank you in advance!

Regards

BizTalk Product Group

BizTalk Integration with SharePoint Whitepaper published

This whitepaper is centered mainly on the BTS SharePoint Adapter and WSS V3.0/MOSS
2007.
(I’ll be posting details on SharePoint 2010 integration shortly 🙂 )

http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=dd4e843d-2121-4016-8391-d763d0ff0a08

BizTalk + SharePoint: 1+1=3: Integration Best Practices

Brief Description

The integration of Microsoft BizTalk Server 2009 and Microsoft Office SharePoint 2007
brings a whole new set of capabilities to end users. Microsoft Office SharePoint Server
gives BizTalk Server a “face,” providing human workflow features and dashboard functionality.

BizTalk Server 2006 R2 – Service Pack 1 announced

Microsoft today announced Service Pack 1 for BizTalk Server 2006 R2. Note that this is for the previous version of BizTalk Server, not the current version (BizTalk Server 2009). No doubt a service pack for BTS 2009 will follow in due course.

The service pack is currently at beta. It has only been announced, not released. You can download the beta if you log onto the Connect site.

See http://blogs.msdn.com/biztalkcrt/archive/2009/10/09/announcing-biztalk-2006-r2-sp1.aspx

There are a couple of new features. See http://msdn.microsoft.com/en-us/library/ee532481(BTS.20).aspx.

For a list of bug fixes, see http://support.microsoft.com/kb/974563.

Fixed a regression issue found in MsgBoxViewer 10.15 and implemented Query Timeout for .VBS queries


Several customers and MS engineers recently reported very long, and sometimes never-ending, collect statements when using version 10.15.



1) Some bugs were indeed identified in some VBS queries, which could cause looping during query execution.
These bugs have been fixed.



2) A long collect statement can also occur when .VBS queries (mainly those in the “Server Info” query category) run for too long.


In some .VBS queries, MBV 10.15 enumerates registry keys or registry values, and this can take a long time when targeting several remote servers of the BizTalk group.


I therefore decided to implement a timeout mechanism that stops a .VBS, .BAT, or .CMD query after 30 seconds by default.
This timeout value can of course be changed in the MBV UI via the “VBS/BAT/CMD Query Timeout” Global Option.



3) If you run into a long-running or never-ending query, please do the following:



–  Note the pending query shown in the status bar of the MBV UI (it will also be logged in the Status log file)


–  Kill MBV (stopping the collect statement via the UI during a query execution will NOT work, as I don’t want to kill my query execution thread)


–  Keep the generated status log file and send it to me later (it is valuable for me to know what happened before MBV was killed)


–  Restart MBV, uncheck the queries that seem to run too long or never finish (usually these belong to the “Server Info” category) so that they are excluded, and then start a new collect statement


 


The current build 10.15.7777, available on this blog, contains both the VBS query fixes and the VBS Query Timeout implementation.


Sorry for introducing these regressions, and please continue to send me your feedback and any bugs you find.


JP.

SharePoint 2010 Upgrade Verification Tool

Hi folks, I recently came across a tool (or rather, enhancements to stsadm)
that runs a series of rules against your farm to see if it passes some of the core
requirements for upgrading to ’a future release of SharePoint’ from WSS 3.0/MOSS 2007, so
I’m guessing SP2010 🙂

http://technet.microsoft.com/en-us/library/dd793607.aspx

Check it out and let me know what you think – I haven’t run it yet; I’m still looking into it.

Have fun,

Mick.