Hi all

So, the other day I had this requirement for a BizTalk pipeline component:

Take an InfoPath form and convert it into a PDF that is to be sent out via email.
This seemed easy enough. I searched a bit and found that three simple steps should be enough:

  1. Install this: 2007 Microsoft Office Add-in: Microsoft Save as PDF
  2. In my code, reference Microsoft.Office.InfoPath.dll and Microsoft.Office.InfoPath.FormControl.dll
  3. Write these lines of code:
      FormControl formControl = new FormControl();
      formControl.Open(pInMsg.Data);
      string output = Path.GetTempFileName();
      formControl.XmlForm.CurrentView.Export(output, ExportFormat.Pdf);

Of course, this would also mean some code that would read the pdf file back in and
then create the output message. But hey, that was just the price I had to pay.

BUT I was being naive. As the more clever of my readers have probably already realized,
if something is called FORMcontrol, then it is meant for programs that have a UI. The code
crashed big time at runtime with some ActiveX exception 🙁

Then I remembered that I have a colleague who had previously told me that she had
done this at some point, so I emailed her for her code.

Her code involved taking the form, extracting the XSL from the XSN file, performing a
transformation on the XML using the XSL to generate HTML, and then using some utility
to convert the HTML into PDF. This was more complex than I had hoped, but I saw no
other way. Unfortunately, her code had this line in it:

    StreamReader stream = new StreamReader(XmlFormView.XmlForm.Template.OpenFileFromPackage("View1.xsl"));

which, as you might have guessed, also requires a UI. In her case it was used in a
web application. So no go.

So, it seems that I will have to do a lot of dirty work myself 🙁

This turned into quite a list of subtasks:

  • Take the XML document that comes through the pipeline component
  • Take the value of the processing instruction called “mso-infoPathSolution”. This processing
    instruction is always present in an InfoPath form, and it looks something like this:

    <?mso-infoPathSolution solutionVersion="" productVersion="12.0.0" PIVersion=""
    href="http://path.to/form.xsn" name="urn:schemas-microsoft-com:office:infopath:MyForm:-myXSD-2009-09-21T15-43-10" ?>
  • Take the value of the href “attribute” that is in the value of the processing instruction.
    The href is a URI that points to the XSN that this XML is an instance of.
  • Get the XSN file that is located at the URI.
  • Extract the XSL file that matches the view of the form you want to convert into PDF.
  • Perform the transformation
  • Convert into PDF


So I am now going from the few lines of code I was hoping for to a more complex solution.
Let's look at the code:

First of all, I need the value of the processing instruction. This is easily done:

    // Assumed definition: the constant is used below but not shown in this post.
    private const string Href = "href=\"";

    private static string GetHrefFromXml(XmlDocument infoPathForm)
    {
        XmlNode piNode = infoPathForm.SelectSingleNode("/processing-instruction(\"mso-infoPathSolution\")");
        if (piNode != null && piNode is XmlProcessingInstruction)
        {
            var pi = (XmlProcessingInstruction)piNode;
            string href = pi.Value;
            int location = href.IndexOf(Href);
            if (location != -1)
            {
                // Cut everything before the URL, then everything from the closing quote on.
                href = href.Substring(location + Href.Length);
                href = href.Substring(0, href.IndexOf("\""));
                return href;
            }
            throw new ApplicationException("No href attribute was found in the processing instruction (mso-infoPathSolution). Without this, the location of the form cannot be detected and without the form no PDF can be generated.");
        }
        throw new ApplicationException("Required XML processing instruction (mso-infoPathSolution) not found. Without this, the location of the form cannot be detected and without the form no PDF can be generated.");
    }

The most annoying part is that the value of a processing instruction can be anything.
In this case, it appears to be a list of attributes like “normal” XML, but since this
is not guaranteed, there is no language support for getting the value of the href
“attribute”. So I chose to use string manipulation to get the value.
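
As an alternative to the string manipulation, here is a minimal sketch that lets the XML parser split the pseudo-attributes by wrapping the processing instruction's value in a dummy element. The class and method names are hypothetical and not part of the component described here, and the approach assumes that the value really is a well-formed attribute list, which, as noted above, is not guaranteed.

```csharp
using System;
using System.Xml.Linq;

public static class ProcessingInstructionReader
{
    // Hypothetical helper, not part of the original pipeline component.
    // Wraps the PI value in a dummy <pi/> element so the XML parser can
    // split the pseudo-attributes. Returns null when the name is absent.
    // XElement.Parse throws an XmlException if the value is not a
    // well-formed attribute list.
    public static string GetPseudoAttribute(string piValue, string name)
    {
        XElement wrapper = XElement.Parse("<pi " + piValue + " />");
        XAttribute attribute = wrapper.Attribute(name);
        return attribute == null ? null : attribute.Value;
    }
}
```

With this in place, the href could be read as ProcessingInstructionReader.GetPseudoAttribute(pi.Value, "href"). The trade-off is that a malformed value now fails with a parser exception instead of the more descriptive message thrown above.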

After getting the href, I need to get the XSN file from the SharePoint Server where the
form is published. This turned out to be a challenge as well.

My first approach was quite simple:

    private static byte[] GetFormByUrl(string href)
    {
        var wc = new WebClient
        {
            Credentials = CredentialCache.DefaultCredentials
        };
        return wc.DownloadData(href);
    }

This turned out to be too naive, though. When SharePoint and Forms Server get a request
for the XSN file, they assume that someone is trying to fill out the form. So what I
got back was the HTML that Forms Server sends to a user who wants to fill out the
form. Then I thought I’d try this:

    private static byte[] GetFormByUrl(string href)
    {
        HttpWebRequest wr = (HttpWebRequest)WebRequest.Create(href);
        wr.AllowAutoRedirect = false;
        using (WebResponse resp = wr.GetResponse())
        using (Stream stream = resp.GetResponseStream())
        using (MemoryStream ms = new MemoryStream())
        {
            byte[] buffer = new byte[1024];
            int bytes;
            // Stream.Read returns 0 at the end of the stream, never -1.
            while ((bytes = stream.Read(buffer, 0, buffer.Length)) > 0)
                ms.Write(buffer, 0, bytes);
            return ms.ToArray();
        }
    }

Basically, using an HttpWebRequest I could tell .NET not to follow redirects. This didn’t
work either, since what I then got back was some HTML that basically just said that the
page has moved. Bummer.

But then another colleague who apparently is better at searching than I am found out
that I can add a noredirect parameter to my request that will instruct SharePoint
to not redirect. This is different from my current approach because my current approach
instructs .NET to not follow redirects, whereas this new approach instructs SharePoint
to not ask me to redirect.

So I ended up with something as simple as this:

    private static byte[] GetFormByUrl(string href)
    {
        string url = href + "?noredirect=true";
        var wc = new WebClient
        {
            Credentials = CredentialCache.DefaultCredentials
        };
        return wc.DownloadData(url);
    }

Simple and beautiful 🙂
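
One caveat with the plain string concatenation: if the href in the processing instruction ever carried a query string of its own, appending "?noredirect=true" would produce a malformed URL. I have not seen that happen, so the following is only a defensive sketch. The class and method names are made up for illustration, and it assumes the href is an absolute URI.

```csharp
using System;

public static class NoRedirectUrl
{
    // Hypothetical helper: appends noredirect=true to an absolute URL and
    // keeps any query string that is already present.
    public static string Append(string href)
    {
        var builder = new UriBuilder(href);

        // UriBuilder.Query returns the query including the leading '?'
        // (or an empty string), so strip it before extending the query.
        string existing = builder.Query.TrimStart('?');
        builder.Query = existing.Length == 0
            ? "noredirect=true"
            : existing + "&noredirect=true";

        return builder.Uri.AbsoluteUri;
    }
}
```

GetFormByUrl could then call wc.DownloadData(NoRedirectUrl.Append(href)) instead of building the URL by hand.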

Now I have the XSN file, and naturally the next issue pops up: how do I get the XSL
extracted from the XSN file? The XSN file is just a cabinet file with another extension,
so I thought this must be easy. I found out it is not. I searched and searched and
ended up finding all sorts of weird stuff where people used p/invoke and what not.
I am surprised that Microsoft has not added at least extraction functionality for
cabinet files to the .NET Framework, but they haven’t.

I ended up doing this:

    // Shell, Folder and FolderItem come from the Shell32 COM library
    // ("Microsoft Shell Controls And Automation").
    private static string ExtractCabFile(string cabFile)
    {
        string destDir = CreateTmp(true, "");

        var sh = new Shell();
        Folder fldr = sh.NameSpace(destDir);
        foreach (FolderItem f in sh.NameSpace(cabFile).Items())
            fldr.CopyHere(f,
        return destDir;
    }

This code assumes that the XSN file has been written to a temporary file with the
extension .CAB. This is very important, since the shell will open the .CAB file with
the default program for that extension, which is Explorer. After that, all files
in the cabinet file are copied to “destDir”, which is just a directory created in the
user's Temp directory.
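
The step of writing the downloaded XSN to a .CAB file is not shown above, so here is a minimal sketch of how it could look. The class and method names are my own for this illustration; the real component may do it differently.

```csharp
using System;
using System.IO;

public static class CabTempFile
{
    // Hypothetical helper: writes the downloaded XSN bytes to a uniquely
    // named file in the user's Temp directory. The .cab extension is what
    // lets the shell treat the file as a cabinet folder.
    public static string Write(byte[] xsnBytes)
    {
        string fileName = Guid.NewGuid().ToString("N") + ".cab";
        string path = Path.Combine(Path.GetTempPath(), fileName);
        File.WriteAllBytes(path, xsnBytes);
        return path;
    }
}
```

The returned path is what would be passed to ExtractCabFile. Both the .cab file and the extraction directory should be deleted once the PDF has been generated.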

I am quite annoyed to have to go through all this, but that’s how things go sometimes.

So now I have found the href of the form, downloaded the form and extracted its files.
Time for the transformation:

    private static MemoryStream PerformTransformation(XmlDocument xmldoc, string destDir, string view)
    {
        var transform = new XslCompiledTransform();
        var stream = new StreamReader(destDir + @"\" + view + ".xsl");
        XmlReader xmlReader = XmlReader.Create(stream);
        transform.Load(xmlReader);

        var outputMemStream = new MemoryStream();
        transform.Transform(xmldoc, null, outputMemStream);
        stream.Close();
        xmlReader.Close();
        // Rewind so that the caller can read the HTML from the start.
        outputMemStream.Seek(0, SeekOrigin.Begin);
        return outputMemStream;
    }

So just a normal XSLT transformation, resulting in some HTML that is returned in a MemoryStream.
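
One weakness of the transformation code is that the stylesheet file stays open if Load or Transform throws, which can get in the way of cleaning up the temp directory afterwards. The following is a sketch of the same steps with a using block. The class name and the xslPath parameter are assumptions made for this example, and I have only reasoned about it against a plain stylesheet, not against every construct an InfoPath view may contain.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class ViewTransformer
{
    // Hypothetical variant of PerformTransformation. xslPath is the full
    // path to the extracted view stylesheet, for example destDir\View1.xsl.
    public static MemoryStream Transform(XmlDocument input, string xslPath)
    {
        var transform = new XslCompiledTransform();

        // The using block closes the stylesheet file even if Load throws.
        using (XmlReader xslReader = XmlReader.Create(xslPath))
        {
            transform.Load(xslReader);
        }

        var output = new MemoryStream();
        transform.Transform(input, null, output);

        // Rewind so the caller reads the HTML from the beginning.
        output.Seek(0, SeekOrigin.Begin);
        return output;
    }
}
```

The stylesheet reader can be closed right after Load because XslCompiledTransform compiles the stylesheet up front and does not read from the file again during Transform.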

After this, I need to convert it into PDF, which is really simple using a tool we
bought for this:

    private static byte[] GetPdfFromHtml(Parameters param)
    {
        var pdfConverter = new PdfConverter
        {
            LicenseKey = "SomethingElse - You are not getting the correct License Key :-)"
        };

        byte[] pdfBytes = pdfConverter.GetPdfBytesFromHtmlStream(param.HtmlStream, Encoding.UTF8,
            param.DestDir.EndsWith(@"\") ? param.DestDir : param.DestDir + @"\");
        return pdfBytes;
    }

We are using the ExpertPDF library for this. The third parameter of the
GetPdfBytesFromHtmlStream method call is the directory that the cabinet file was
extracted to. This is where all images used in the form are kept, and the converter
needs them in order to include them in the PDF.

All in all, the component now works, but it turned out to be a lot more difficult
than I had hoped.

As a last detail, I added a property to my pipeline component that the developer can
use to decide which view to use for the transformation from XML to HTML.

The complete code for the pipeline component will not be available for download, since
this was done for a customer, but I might do something a bit smaller and simpler and
add it to my pipeline component collection later on.