Logic Apps: How to process CSV files

One of the most ancient and common standards for message representation is the use of text files (Flat Files), like CSV (Comma Separated Values) or TXT files, many of which are custom-made for specific systems. Do not be fooled into thinking these messages are outdated and rarely used: still today, many existing and new integrations are built on Flat Files.

No matter if you are working with Logic Apps Consumption or Logic Apps Standard, there are only two ways to generate a Flat File Schema:

  • Using a BizTalk Server Visual Studio project (Visual Studio 2019 or older, depending on the BizTalk Server version you are using). This is a good option if you have a BizTalk Server environment or if you are migrating BizTalk Server solutions.
  • Or using the Azure Logic Apps Enterprise Integration Tools. This is an add-in for the Integration Account project to support XML Schema creation, XML transform designer, and Flat File Schema generation for Visual Studio 2019.

In this blog post, we will address how to create a flat file schema for a CSV file using the Azure Logic Apps Enterprise Integration Tools and how to process it using Logic Apps Consumption. The biggest difference between Logic Apps Consumption and Standard is that processing a flat file in Consumption requires an Integration Account, which is not needed in Standard.

How to create a Flat File Schema for a CSV file

Normally, CSV files contain structured data: a comma or semicolon separates the individual items, and each record is on a new line. In our sample, the CSV file contains five columns:

  • First name
  • Last name
  • Birth year
  • City
  • And Zip Code

and on each row, there is a person’s data.

We are going to use this sample as our example.
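The original post shows this sample as an image. Based on the delimited-records example that appears later in this post, the sample data looks like this:

```text
Sandro;Pereira;1978;Crestuma;4415
José;Silva;1972;Crestuma;4415
Rui;Barbosa;1975;Lever;4415
```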

To create a flat file schema for this CSV file, we need to:

  • Open Visual Studio 2019 and create a new project by selecting Create a new project option.
  • On the Create a new project window, search for the Integration Account template, select the Integration Account template, and click Next.
  • On the Configure your new project window, give a Project name and a Solution name and set the Location. Then click Create.
  • Once the Visual Studio project is created, right-click the project name and select the option Add > New Item…
  • On the Add New Item window, on the left tree, select Logic Apps and then select the Flat File Schema, give it a name, and click Add.
  • This will start the Logic Apps Flat File Schema Wizard. On the Welcome screen, click Next >.
  • On the Flat File Schema Information screen, set the following configurations and then click Next >:
    • On the Instance file property, click the Browse button to locate the flat file from which the schema will be generated.
    • On the Record name property, give a name to define the root node name of your XML message—for example, People.
    • On the Target namespace property, set your desired namespace or leave the default.
    • On the Code page property, select UTF-8 (65001) from the drop-down selection list.
  • Because this is a small message, on the Select Document Data screen leave the default selected data and click Next >.
  • On this first Select Record Format screen, we define what separates one person from another. In this case, it is the new line, so select the option By delimiter symbol and click Next >.
  • On the Delimiter Record screen, the child delimiter property is already defined to be the new line ({CR}{LF}), so leave the default configuration and click Next >.
  • On the Child Elements screen, on the first line, set the following configuration:
    • Element Name: Person
    • Element Type: Repeating record
  • On the rest of the lines, set the Element Type to Ignore, since all lines share the structure of the Person record we defined on the first line. Then click Next >.
  • On the Schema View screen, you notice our Schema representation is being created. Click Next >.
  • Now, back to the Select Document Data screen, you notice that only the first line is automatically selected, excluding the new line character, and this is what we want. We will be defining the structure of the person object. Click Next >.
  • A semicolon separates this document, so in the Select Record Format screen, select the option By delimiter symbol and click Next >.
  • On the Delimiter Record screen, change the child delimiter property to a semicolon (;), and click Next >.
  • Now you notice that we have all fields separated in the Child Elements screen. Give the following configuration and click Next >.
    • FirstName (string)
    • LastName (string)
    • BirthYear (integer)
    • City (string)
    • ZipCode (string)
  • On the Schema View screen, you notice our Schema representation is finished. Click Finish.
  • And the Schema will open inside Visual Studio.

Now that we have our schema created, we need to upload it to our Integration Account – if you don’t have one, you need to create an Integration Account inside the Azure Portal first. To do that:

  • In the Azure Portal, access your Integration Account and select the Schemas option under Settings.
  • On the Schemas page, click + Add, and on the Add Schema panel, browse for the schema we just created in the Schema property and click OK.
  • This will add the schema to your Integration Account, which can be used inside your Logic Apps.

Now we need to create a Logic App to process this CSV. To do that, we need to:

  • Create a Logic App and use the HTTP Request-Response template – of course, this can be adapted to your requirements.
  • Once the Logic App design is loaded, click Save and return to your Logic App page. From there, click Workflow settings under Settings.
  • On the Workflow settings page, under the Integration account section, on the Select an Integration account property, select the integration account to which we added our flat file schema. Then click Save.
  • Now back to our Logic App designer. After the When a HTTP request is received trigger, select Add an Action.
  • On the Choose an operation panel, search for Flat File and then choose the Flat File > Flat File Decoding action.
  • On the Flat File Decoding action, set the following configurations:
    • Content: Body from the When a HTTP request is received trigger.
    • Schema Name: Persons – the flat file schema we created earlier.
  • Now, select the Response action and set the following configurations:
    • Status Code: 200
    • Headers: Content-Type: text/xml
    • Body: Body from the Flat File Decoding action.
  • And finally, save the Logic App.

Now if we open Postman and send the following request, the expected response will be the same payload translated to XML, as you see in the picture below:
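As a rough illustration of what the Flat File Decoding action does here, the following Python sketch converts the semicolon-delimited payload into the equivalent XML. This only mimics the action for clarity – it is not the Logic Apps runtime – and the element names are the ones chosen in the wizard above:

```python
import csv
import io
import xml.etree.ElementTree as ET

def decode_flat_file(flat_file: str) -> str:
    """Convert a semicolon-delimited flat file into XML,
    mimicking the output shape of the Flat File Decoding action."""
    root = ET.Element("People")  # the Record name chosen in the wizard
    reader = csv.reader(io.StringIO(flat_file), delimiter=";")
    for row in reader:
        person = ET.SubElement(root, "Person")
        for tag, value in zip(
            ["FirstName", "LastName", "BirthYear", "City", "ZipCode"], row
        ):
            ET.SubElement(person, tag).text = value
    return ET.tostring(root, encoding="unicode")

payload = "Sandro;Pereira;1978;Crestuma;4415\r\nJosé;Silva;1972;Crestuma;4415"
print(decode_flat_file(payload))
```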

Hope you find this useful! So, if you liked the content or found it useful and want to help me write more content, you can buy (or help buy) my son a Star Wars Lego! 

Author: Sandro Pereira

Sandro Pereira lives in Portugal and works as a consultant at DevScope. In the past years, he has been working on implementing Integration scenarios both on-premises and cloud for various clients, each with different scenarios from a technical point of view, size, and criticality, using Microsoft Azure, Microsoft BizTalk Server and different technologies like AS2, EDI, RosettaNet, SAP, TIBCO etc.

He is a regular blogger, international speaker, and technical reviewer of several BizTalk books all focused on Integration. He is also the author of the book “BizTalk Mapping Patterns & Best Practices”. He has been awarded MVP since 2011 for his contributions to the integration community.

Flat File Support inside Logic Apps

We live in a REST era where JSON or XML is the default choice of text-based formats, widely used by major corporations. However, one of the most ancient and common standards for message representation is the use of text files (Flat Files) like CSV (Comma Separated Values) or TXT files, many of which are custom-made for specific systems. Do not be fooled into thinking these messages are outdated and rarely used. A good example is EDI messages, which are used extensively by large companies, so it is often necessary to transform text files into XML and vice versa.

Still today, many existing integrations and new integrations are made based on Flat Files.

What are Flat Files?

The standard definition states that a flat file is a collection of data stored in a two-dimensional database in which similar yet discrete strings of information are stored as records in a table. The table’s columns represent one database dimension, while each row is a separate record. Or in other words, it is a type of data storage file in which data is stored as plain text, often in a table-like structure with rows and columns. Each row represents a single record, while columns represent fields or attributes of the data. The information stored in a flat file is generally alphanumeric with little or no additional formatting. The structure of a flat file is based on a uniform format as defined by the type and character lengths described by the columns.

Flat files serve various purposes in software development, primarily for data storage, exchange, and processing tasks. They are widely used due to their simple structure, human readability, and ease of manipulation across different platforms and applications. For instance, flat files are commonly employed in data import and export operations, where applications or systems with diverse data storage mechanisms must communicate or transfer data.

A flat-file instance message is a text file that can contain three logical parts:

  • A header.
  • A body.
  • And a trailer.

In that order. Of course, both the header and the trailer are optional. The following example shows a flat-file instance message consisting of all three parts, where the third line (the PO line) is the body:

Sandro Pereira
Porto, Portugal
PO,1,BOOK,4415
TRANS-1
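The three parts above can be separated with simple line arithmetic. This Python sketch assumes, for illustration only, a two-line header and a one-line trailer, matching the sample message:

```python
def split_flat_file(lines, header_len=2, trailer_len=1):
    """Split a flat-file message into header, body, and trailer parts.
    The lengths are assumptions for this sample: a two-line header
    and a one-line trailer."""
    header = lines[:header_len]
    trailer = lines[len(lines) - trailer_len:]
    body = lines[header_len:len(lines) - trailer_len]
    return header, body, trailer

message = ["Sandro Pereira", "Porto, Portugal", "PO,1,BOOK,4415", "TRANS-1"]
header, body, trailer = split_flat_file(message)
```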

If you come from a Microsoft BizTalk Server background or if you are migrating a BizTalk Server project using these types of Flat-Files, you may know that the Flat file disassembler’s parsing allows you to specify:

  • The Header Schema in the Header schema design-time property of the flat file disassembler or the XMLNORM.HeaderSpecName message context property.
  • The Body Schema in the Document schema design-time property of the flat file disassembler or the XMLNORM.DocumentSpecName message context property.
  • And the Trailer Schema in the Trailer schema design-time property of the flat file disassembler or the XMLNORM.TrailerSpecName message context property.

However, Logic Apps only supports Body Schemas. You cannot have different schemas for Headers, Bodies, and Trailers. You can still process these types of flat-file messages, but with a different approach: a single schema that divides the header, body, and trailer into different records (structures).

Flat-File Schema Types

Within a particular part of a flat-file instance message, different data items are grouped into records, which themselves can contain sub-records and, ultimately, individual data items known as fields. These records and fields are distinguished from each other using one of two different basic methodologies.

  • The first methodology, known as positional, defines each data item as a pre-established length, with pad characters being used to bring a shorter item of data up to its expected length.
  • The second methodology, known as delimited, uses one or more special characters to separate items of data from each other. This methodology avoids the need for otherwise superfluous pad characters. Still, it introduces some special considerations when the data itself contains the character or sequence of characters being used as a delimiter.

Positional Flat Files

Positional records within a flat-file instance message contain individual fields (items of data) that are each of a predefined length. The fields are parsed according to these lengths. For example, consider the following positional record from a flat-file instance message containing an id, country code, client name, and Country name:

01 PT Sandro Pereira                       Portugal

A reasonable definition for this record in a flat-file schema can be described as follows:

  • A positional record named Client contains the following fields:
    • An attribute named id that is left-aligned, three characters in length, with a zero character offset.
    • An element named countryCode that is left-aligned, three characters in length, with a zero character offset.
    • An element named name that is left-aligned, 37 characters in length, with a zero character offset.
    • An element named country that is left-aligned, and the length is until the end of the line.

Given these record and field definitions, the Flat file disassembler will produce the following XML equivalent of this record:


<Client id="01">
   <countryCode>PT </countryCode>
   <name>Sandro Pereira                       </name>
   <country>Portugal</country>
</Client>

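A minimal Python sketch of this positional parsing, using the field lengths listed above (the sample record is rebuilt with explicit padding, since the exact spacing is hard to reproduce here):

```python
def parse_positional(line: str) -> dict:
    """Slice a positional Client record by the fixed field lengths
    described above: id (3), countryCode (3), name (37), country (rest)."""
    return {
        "id": line[0:3].rstrip(),
        "countryCode": line[3:6].rstrip(),
        "name": line[6:43].rstrip(),   # 37-character field, space-padded
        "country": line[43:].rstrip(),
    }

# Rebuild the sample record with space padding
record = "01 " + "PT".ljust(3) + "Sandro Pereira".ljust(37) + "Portugal"
client = parse_positional(record)
```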
There are several considerations related to positional records that will affect how the record is parsed when received and constructed when sent, including:

  • The character used to fill the unused portion of each field, known as the pad character.
  • An optional tag within the record can be used to distinguish the record from other similar records. Tags usually occur at the beginning of the record but are allowable anywhere within it. Positional records can be defined to have a tag or not have a tag, but once defined, the tag must be present or not, based on the definition.
  • How data is justified within a fixed length field relative to the accompanying pad characters.
  • Positional records nested within other positional or delimited records.
  • Positional records with field lengths specified as a specific number of bytes rather than a specific number of characters.

Notes:

  • If your flat file contains both delimited and positional records, you must set the Structure property of the root node to Delimited and the Structure property of subordinate record nodes to either Delimited or Positional as appropriate.
  • Fields in positional records have a limit of 50000000 characters.

Delimited Flat Files

Delimited records within a flat-file instance message contain nested records and/or individual fields (items of data) that are separated by a predefined character or set of characters. The fields are parsed according to these separating delimiters. For example, consider the following delimited records from a flat-file instance message, which contain three client lines to add to our internal system hypothetically:

Sandro;Pereira;1978;Crestuma;4415
José;Silva;1972;Crestuma;4415
Rui;Barbosa;1975;Lever;4415

A reasonable definition for this record in a flat-file schema can be described as follows:

  • A delimited repeating record named Client with child delimiter {CR}{LF}
  • And delimited elements with child delimiter ;
    • firstName
    • lastName
    • birthYear
    • city
    • zipCode

Given these record and field definitions, the Flat file disassembler produces the following XML equivalent of these records.


<Client>
   <firstName>Sandro</firstName>
   <lastName>Pereira</lastName>
   <birthYear>1978</birthYear>
   <city>Crestuma</city>
   <zipCode>4415</zipCode>
</Client>
<Client>
   ...
</Client>
<Client>
   ...
</Client>

There are several considerations related to delimited records that will affect how the record is parsed when received and constructed when sent, including:

  • The character or characters used to override the interpretation of delimiters so that they are treated as part of the data.
  • An optional tag at the beginning of the record can be used to distinguish the record from other similar records.
  • How data is justified within fields with minimum lengths relative to the accompanying pad characters.
  • Positional records nested within other delimited records.
  • How data is justified within a fixed length field relative to its accompanying pad characters.

  • Preservation and suppression of delimiters when flat-file messages are received and sent.

Notes:

  • If your flat file contains both delimited and positional records, you must set the Structure property of the root node to Delimited and the Structure property of subordinate record nodes to either Delimited or Positional as appropriate.
  • Delimited fields in flat files have a limit of 50000000 characters.
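One of those considerations – overriding the interpretation of delimiters when the data itself contains them – can be illustrated with Python's csv module, which wraps offending fields in a quote character. The names and data here are illustrative, not from the sample file:

```python
import csv
import io

# A city value that itself contains the field delimiter (;)
row = ["Sandro", "Pereira", "1978", "Vila Nova de Gaia; Porto", "4415"]

# Encoding: fields containing the delimiter get wrapped in the quote character
buf = io.StringIO()
csv.writer(buf, delimiter=";", quotechar='"', quoting=csv.QUOTE_MINIMAL).writerow(row)
encoded = buf.getvalue().strip()

# Decoding: the wrapped field round-trips intact
decoded = next(csv.reader(io.StringIO(encoded), delimiter=";", quotechar='"'))
```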

How do Logic Apps process text files (Flat Files)?

When building a logic app workflow in Azure Logic Apps, you can encode and decode flat files using the Flat File built-in connector actions and a flat file schema for encoding and decoding. You can use Flat File actions in multi-tenant Consumption logic app workflows and single-tenant Standard logic app workflows.

  • Inside Logic Apps Consumption, an Integration Account is required to store the flat-file schemas and to use the Flat File built-in connector.
  • Inside Logic Apps Standard, there is no need for an Integration Account, since schemas are supported natively. However, if you desire, you can still use an Integration Account.

While no Flat File triggers are available, you can use any trigger or action to feed the flat-file content into your workflow – for example, a built-in connector trigger, or a managed or Azure-hosted connector trigger available for Azure Logic Apps, such as the Request built-in trigger or the File System trigger.

Flat File Encoding action

The Flat File Encoding action allows you to convert an XML message into a flat file. Note that this action does not validate the incoming XML message. For that, you need to use the XML Validation action.

In Logic Apps Consumption, this action allows the following inputs:

  • The Content property, where you specify the XML message you want to encode to flat-file.
  • The Schema Name property is where you choose the flat-file body Schema or the Document schema.
    • If the schema list is empty, either your logic app resource isn’t linked to your integration account, or your integration account doesn’t contain any schema files.
  • And then we have the following optional properties:
    • Mode of empty node generation, where we specify the mode to use for empty node generation with flat file encoding. Possible values are ForcedDisabled, HonorSchemaNodeProperty, or ForcedEnabled.
    • And XML Normalization allows you to enable or disable XML normalization in flat file encoding. Possible values are Yes or No.

In Logic Apps Standard, this action allows the following inputs:

  • The Content property, where you specify the XML message you want to encode to flat-file.
  • In the Source property, we select either LogicApp or IntegrationAccount as your schema source.
  • The Name property is where you choose the flat-file body Schema or the Document schema.
    • If the schema list is empty, either your logic app resource isn’t linked to your integration account, your integration account doesn’t contain any schema files, or your logic app resource doesn’t contain any schema files.
  • And then we have the following optional properties:
    • Mode of empty node generation, where we specify the mode to use for empty node generation with flat file encoding. Possible values are ForcedDisabled, HonorSchemaNodeProperty, or ForcedEnabled.
    • And XML Normalization allows you to enable or disable XML normalization in flat file encoding. Possible values are Yes or No.
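As a rough sketch of what Flat File Encoding does – the reverse of the earlier decoding example – this Python snippet flattens a People/Person XML document back into a semicolon-delimited flat file. Again, this only mimics the action's behavior; it is not the Logic Apps runtime:

```python
import xml.etree.ElementTree as ET

def encode_flat_file(xml_text: str) -> str:
    """Flatten a People/Person XML document into a
    semicolon-delimited flat file, one record per line."""
    root = ET.fromstring(xml_text)
    lines = []
    for person in root.findall("Person"):
        # Child elements are emitted in document order
        lines.append(";".join(child.text or "" for child in person))
    return "\r\n".join(lines)

xml_text = (
    "<People><Person><FirstName>Sandro</FirstName><LastName>Pereira</LastName>"
    "<BirthYear>1978</BirthYear><City>Crestuma</City><ZipCode>4415</ZipCode>"
    "</Person></People>"
)
print(encode_flat_file(xml_text))
```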

Flat File Decoding action

The Flat File Decoding action allows you to convert a flat-file message into an XML message. Note that this action does not validate the outcome XML message. For that, you need to use the XML Validation action.

In Logic Apps Consumption, this action allows the following inputs:

  • The Content property, where you specify the flat-file message you want to decode to XML.
  • The Schema Name property is where you choose the flat-file body Schema or the Document schema.
    • If the schema list is empty, either your logic app resource isn’t linked to your integration account, or your integration account doesn’t contain any schema files.

In Logic Apps Standard, this action allows the following inputs:

  • The Content property, where you specify the flat-file message you want to decode to XML.
  • In the Source property, we select either LogicApp or IntegrationAccount as your schema source.
  • The Name property is where you choose the flat-file body Schema or the Document schema.
    • If the schema list is empty, either your logic app resource isn’t linked to your integration account, your integration account doesn’t contain any schema files, or your logic app resource doesn’t contain any schema files.

Unlike BizTalk Server, where these syntax transformations (data translations) typically happen inside receive or send pipelines, in Logic Apps they happen inside our business process, that is, inside the Logic App itself, using the Flat File connector.


BizTalk Server: What are Flat Files?

Transformations are one of the most common components in the integration processes. They act as essential translators in the decoupling between the different systems to connect. We usually associate the transformations of documents with BizTalk Server Maps, but the reality is that there are two types of transformations:

  • Semantic Transformations: This type of transformation usually occurs only in BizTalk maps. Here the document maintains the same represented syntax (XML) but changes its semantics (data content). This type of transformation is typically one-way since when we add and aggregate small parts of the information that compose the document into another different document, we may miss important details for its reconstruction.
  • Syntax Transformations: This type of transformation occurs in the receive or send pipelines and aims to transform a document into another representation, e.g., CSV to XML. Here the document maintains the same data (semantics) but changes the represented syntax. i.e., we translate the document, but typically don’t modify the structure. Usually, this type of transformation is bidirectional. Since we still have the same semantic content, we can apply the same transformation logic and obtain the document in its original format. Common examples of these transformations are also conversions between HL7 and XML or EDI and XML.

These are sometimes also called Data transformation and Data translation, respectively.

This blog is an introductory note for those taking the first steps in this technology.

What are Flat Files?

One of the most ancient and common standards for message representation is using text files (Flat Files) like CSV (Comma Separated Values) or TXT files, many of which are custom-made for their systems.

However, over time, XML and, nowadays, JSON have become the standard message format because of their widespread use by major corporations and open-source development efforts. However, do not be fooled and think these messages are outdated and rarely used. A good example is EDI messages, which are used extensively by large companies, so it is often necessary to transform text files into XML and vice versa.

In the context of Microsoft BizTalk Server, a flat-file instance message is a text file that can contain three logical parts:

  • A header.
  • A body.
  • And a trailer.

In that order. Of course, both the header and the trailer are optional. The following example shows a flat-file instance message consisting of all three parts, where the third line (the PO line) is the body:

Sandro Pereira
Porto, Portugal
PO,1,BOOK,4415
TRANS-1

For the flat file disassembler to correctly distinguish the header, the body, and the trailer of a flat-file instance message, you must create and configure a separate schema for each of them.

Flat File Message Headers

The Flat file disassembler’s parsing of the optional flat-file instance message header is controlled by the flat file schema that you have configured in the Header schema design-time property of the flat file disassembler or the XMLNORM.HeaderSpecName message context property. If you have not specified a schema using one of these two methods, the flat file disassembler assumes that the flat file instance message does not contain a header.

For outbound flat-file instance messages, you can configure the flat file assembler to produce a header by specifying the appropriate schema in its Header Specification Name design-time property or the XMLNORM.HeaderSpecName message context property. 

Data found in inbound flat-file instance message headers can be preserved and utilized in two different ways.

  • First, flat-file instance message headers can be saved in their entirety within the message context of the body for later restoration as the header of a corresponding outbound flat-file instance message. You can use the receive pipeline’s Preserve header property to specify that the header should be preserved. And if a header is specified in the Flat file assembler, the preserved header will be used on the outbound message.
  • Second, individual data items from a flat-file instance message header can be copied to the message context associated with the flat-file message body by specifying property promotion for one or more of the fields in the corresponding schema.
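Property promotion can be pictured as copying selected header data items into a message-context dictionary. The field names and line positions in this Python sketch are illustrative assumptions, not BizTalk APIs:

```python
def promote_header_fields(header_lines, promoted_fields):
    """Copy selected header data items into a message-context dict,
    loosely mimicking BizTalk property promotion.
    promoted_fields maps a property name to a header line index."""
    context = {}
    for name, line_index in promoted_fields.items():
        context[name] = header_lines[line_index]
    return context

header = ["Sandro Pereira", "Porto, Portugal"]
# Hypothetical promoted properties for this sample header
context = promote_header_fields(header, {"SenderName": 0, "SenderLocation": 1})
```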

Flat File Message Bodies

A flat-file instance message body, which is required, is what the Flat file disassembler processes into one or more XML instance messages. To know what data to expect in an inbound flat-file instance message body, you must configure the Flat file disassembler with the flat file schema corresponding to the body. You can specify the schema by using the Document schema design-time property of the flat file disassembler or the XMLNORM.DocumentSpecName message context property. Because flat file instance messages must have a body part, you must configure the appropriate schema using one of these two methods.

For outbound flat-file instance messages, the Flat file assembler can dynamically determine the appropriate flat-file schema for the body of the instance message. The Flat file assembler determines the appropriate schema from the message type, which is a combination of the target namespace and the root element’s name, both of which must be present in the XML version of the outbound message. Alternatively, you can explicitly configure the flat-file schema to be used by configuring the Document schema design-time property of the Flat file assembler or the XMLNORM.DocumentSpecName message context property.

Data found in inbound flat-file instance message bodies can be copied to the corresponding message context by specifying property promotion in the flat-file schema being used by the Flat file disassembler to process the inbound instance message. Likewise, data in the message context can be copied back into outbound flat-file instance messages by specifying property demotion in the flat-file schema being used by the Flat file assembler to process the outbound message.

Flat File Message Trailers

As with flat-file instance message headers, the parsing of the optional flat-file instance message trailer by the Flat file disassembler is controlled by the flat file schema that you have configured in the Trailer schema design-time property of the flat file disassembler or the XMLNORM.TrailerSpecName message context property. If you have not specified a schema using one of these two methods, the Flat file disassembler will assume that the flat file instance message does not contain a trailer.

Unlike flat-file instance message headers, flat-file instance message trailers can neither be saved and restored as a single unit nor can they use property promotion to copy individual data items to the message context associated with the flat-file instance message body. However, a trailer can be added to an outbound flat file instance message by specifying the appropriate schema in the Trailer schema design-time property of the flat file assembler or the XMLNORM.TrailerSpecName message context property. The data that constitutes the variable portion of the trailer can be specified using property demotion from the message context of the flat-file instance message body or by specifying default or fixed values in the corresponding schema.

Flat-File Schema Types

Within a particular part of a flat-file instance message, different data items are grouped into records, which themselves can contain sub-records and, ultimately, individual data items known as fields. These records and fields are distinguished from each other using one of two different basic methodologies.

  • The first methodology, known as positional, defines each data item as a pre-established length, with pad characters being used to bring a shorter item of data up to its expected length.
  • The second methodology, known as delimited, uses one or more special characters to separate items of data from each other. This methodology avoids the need for otherwise superfluous pad characters, but introduces some special considerations when the data itself contains the character or sequence of characters being used as a delimiter.

Positional Flat Files

Positional records within a flat-file instance message contain individual fields (items of data) that are each of a predefined length. The fields are parsed according to these lengths. For example, consider the following positional record from a flat-file instance message containing an id, country code, client name, and Country name:

01 PT Sandro Pereira                       Portugal

A reasonable definition for this record in a flat-file schema can be described as follows:

  • A positional record named Client contains the following fields:
    • An attribute named id that is left-aligned, 3 characters in length, with a zero character offset.
    • An element named countryCode that is left-aligned, 3 characters in length, with a zero character offset.
    • An element named name that is left-aligned, 37 characters in length, with a zero character offset.
    • An element named country that is left-aligned, and the length is until the end of the line.

Given these record and field definitions, the Flat file disassembler will produce the following XML equivalent of this record:


<Client id="01">
    <countryCode>PT </countryCode>
    <name>Sandro Pereira                       </name>
    <country>Portugal</country>
</Client>

There are several considerations related to positional records that will affect how the record is parsed when received and constructed when sent, including:

  • The character used to fill the unused portion of each field, known as the pad character.
  • An optional tag within the record can be used to distinguish the record from other similar records. Tags usually occur at the beginning of the record but are allowable anywhere within it. Positional records can be defined to have a tag or not have a tag, but once defined, the tag must be present or not, based on the definition.
  • How data is justified within a fixed length field relative to the accompanying pad characters.
  • Positional records nested within other positional or delimited records.
  • Positional records with field lengths specified as a specific number of bytes rather than a specific number of characters.

Notes:

  • If your flat file contains both delimited and positional records, you must set the Structure property of the root node to Delimited and the Structure property of subordinate record nodes to either Delimited or Positional as appropriate.
  • Fields in positional records have a limit of 50,000,000 characters.

Delimited Flat Files

Delimited records within a flat-file instance message contain nested records and/or individual fields (items of data) that are separated by a predefined character or set of characters. The fields are parsed according to these separating delimiters. For example, consider the following delimited records from a flat-file instance message, which hypothetically contain three client lines to add to our internal system:

Sandro;Pereira;1978;Crestuma;4415
José;Silva;1972;Crestuma;4415
Rui;Barbosa;1975;Lever;4415

A reasonable definition for this record in a flat-file schema can be described as follows:

  • A delimited repeating record named Client with child delimiter {CR}{LF}
  • And delimited elements with child delimiter ;
    • firstName
    • lastName
    • birthYear
    • city
    • zipCode

Given these record and field definitions, the Flat file disassembler produces the following XML equivalent of these records:

<Client>
    <firstName>Sandro</firstName>
    <lastName>Pereira</lastName>
    <birthYear>1978</birthYear>
    <city>Crestuma</city>
    <zipCode>4415</zipCode>
</Client>
<Client>
    ...
</Client>
...
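The delimited case can be sketched the same way. The following hypothetical Python equivalent splits records on CRLF and fields on the semicolon, with the element names taken from the definition above (again only an illustration, not BizTalk's implementation):

```python
import csv
import io

# The three sample client lines, with CRLF as the record delimiter.
data = (
    "Sandro;Pereira;1978;Crestuma;4415\r\n"
    "José;Silva;1972;Crestuma;4415\r\n"
    "Rui;Barbosa;1975;Lever;4415\r\n"
)

# Split each record on ';' and pair the values with the element names.
field_names = ["firstName", "lastName", "birthYear", "city", "zipCode"]
clients = [
    dict(zip(field_names, row))
    for row in csv.reader(io.StringIO(data), delimiter=";")
]

print(len(clients))             # 3
print(clients[0]["firstName"])  # Sandro
print(clients[2]["city"])       # Lever
```

Unlike the positional case, no pad characters are needed; the trade-off is that data containing the delimiter itself requires an escape (wrap) character, as discussed below.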

There are several considerations related to delimited records that will affect how the record is parsed when received and constructed when sent, including:

  • The character or characters used to override the interpretation of delimiters so that they are treated as part of the data.
  • An optional tag at the beginning of the record can be used to distinguish the record from other similar records.
  • How data is justified within fields with minimum lengths relative to the accompanying pad characters.
  • Positional records nested within other delimited records.
  • How data is justified within a fixed length field relative to its accompanying pad characters.
  • Preservation and suppression of delimiters when flat-file messages are received and sent.

Notes:

  • If your flat file contains both delimited and positional records, you must set the Structure property of the root node to Delimited and the Structure property of subordinate record nodes to either Delimited or Positional as appropriate.
  • Delimited fields in flat files have a limit of 50,000,000 characters.

How are text files (Flat Files) processed by BizTalk?

Internally, BizTalk “prefers” to use the message type XML. If messages are in XML format, BizTalk “offers” numerous automatisms that are very useful in these environments, such as message routing based on a particular field (promoted property), tracking and analysis of multidimensional values and dimensions with BAM (Business Activity Monitoring), or making logical decisions within orchestrations (business processes) using elements of the message.

If messaging is the foundation of BizTalk Server, the message schemas are the bedrock on which messaging is built. Fortunately, BizTalk supports converting text files to XML simply and intuitively using Flat File Schemas, which are simple XML schemas (XSD) with specific annotations. At first glance, this may seem strange because XML Schemas (XSD) are used to describe XML files. However, BizTalk uses them as metadata to describe XML documents and text files (flat files).

The trick is that all the necessary information, such as the delimiter symbols or the element size in a positional file, i.e., the definition of the rules of parsing (transformation rules), are embedded in the form of annotations in XML Schema (XSD), thereby simplifying the reuse of all these schemes in different parts of the process. The document can be translated back into a flat file at any point because the definition is declarative and symmetric.
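To illustrate why a declarative, symmetric definition allows round-tripping, here is a small Python sketch in which a single field list drives both parsing and serialization. The field names and widths are hypothetical, and this is not how BizTalk stores its annotations; it only shows the principle:

```python
# One declarative definition drives both directions: (field name, width).
FIELDS = [("id", 3), ("countryCode", 3), ("name", 37)]

def parse(line: str) -> dict:
    """Flat file -> dict: cut fixed-width fields and strip the pad chars."""
    record, pos = {}, 0
    for name, width in FIELDS:
        record[name] = line[pos:pos + width].rstrip()
        pos += width
    return record

def serialize(record: dict) -> str:
    """Dict -> flat file: pad each value back to its declared width."""
    return "".join(record[name].ljust(width) for name, width in FIELDS)

line = "01 " + "PT " + "Sandro Pereira".ljust(37)
assert serialize(parse(line)) == line  # the definition is symmetric
```

In BizTalk the same idea applies, except the "field list" lives as annotations inside the XSD, so one schema serves both the receive (flat file to XML) and send (XML to flat file) directions.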

Where can syntax transformations occur?

This type of transformation –  Syntax Transformations – can occur in receive or send pipelines. Usually, text files (Flat Files) are processed at runtime as follows:

  • The Flat Files are received by an adapter associated with a receive location (Folder in File System, for example).
  • A pipeline configured in the receive location will be responsible for transforming the Flat File into its equivalent XML.
  • One or more parties interested in the message, such as an orchestration, will subscribe to the XML document, and the message will go through the business process. Note that in a pure messaging scenario, orchestrations are unnecessary.
  • If and when necessary, BizTalk can send XML messages again as text files (Flat Files) by using another pipeline in the send ports, which will be responsible for transforming the XML into its equivalent, the Flat File.

As the image below shows:

The receive pipeline consists of four stages, and syntax transformations may occur in two of them:

  • Decode Stage: This stage is used for components that decode or decrypt the message. The MIME/SMIME Decoder pipeline component or a custom decoding component should be placed in this stage if the incoming messages need to be decoded from one format to another. The syntax transformations can occur in this stage through a custom component.
  • Disassemble Stage: This stage is used for components that parse or disassemble the message. The syntax transformations should occur at this stage. In the example that will be demonstrated in this article, we will use the “Flat file disassembler” to transform a text file into XML.
  • Validate Stage: This stage is used for components that validate the message format. A pipeline component processes only messages that conform to the schemas specified in that component. If a pipeline receives a message whose schema is not associated with any component in the pipeline, that message is not processed. Depending on the adapter that submits the message, the message is either suspended or an error is issued to the sender.
  • Resolve Party Stage: This stage is a placeholder for the Party Resolution Pipeline Component.

The send pipelines consist of three stages, and syntax transformations may also occur in two of them:

  • Pre-assemble Stage: This stage is a placeholder for custom components that should perform some action on the message before the message is serialized.
  • Assemble Stage: Components in this stage are responsible for assembling or serializing the message and converting it to or from XML. The syntax transformations should occur at this stage.
  • Encode Stage: This stage is used for components that encode or encrypt the message. Place the MIME/SMIME Encoder component or a custom encoding component in this stage if message signing is required. The syntax transformations can occur in this stage through a custom component.

Hope you find this useful! So, if you liked the content or found it useful and want to help me write more content, you can buy (or help buy) my son a Star Wars Lego! 

Author: Sandro Pereira

Sandro Pereira lives in Portugal and works as a consultant at DevScope. In the past years, he has been working on implementing Integration scenarios both on-premises and cloud for various clients, each with different scenarios from a technical point of view, size, and criticality, using Microsoft Azure, Microsoft BizTalk Server and different technologies like AS2, EDI, RosettaNet, SAP, TIBCO etc.

He is a regular blogger, international speaker, and technical reviewer of several BizTalk books all focused on Integration. He is also the author of the book “BizTalk Mapping Patterns & Best Practices”. He has been awarded MVP since 2011 for his contributions to the integration community.

CSV Structure Validation Pipeline Component


I’m back to another blog post about BizTalk Server! I know that my latest posts have been about Azure Logic Apps, and you can count on seeing many more in the future. The reason is that I work both on-premises with BizTalk Server and in the cloud with Azure Integration Services… but relax, I will continue to post many things about BizTalk Server. BizTalk Server is not dead; it is very much alive, contrary to what many think!

Today I’m returning to one of my pet projects: the BizTalk Pipeline Components Extensions Utility Pack project that is available on GitHub!

For those who aren’t familiar with it, the BizTalk Pipeline Components Extensions Utility Pack project is a set of libraries with several custom pipeline components that can be used in receive and send pipelines. These pipeline components extend BizTalk’s out-of-the-box pipeline capabilities.

CSV Structure Validation Pipeline Component

The CSV Structure Validation Pipeline Component validates the structure of a basic CSV or flat file before it is converted to an XML message. Of course, the same strategy can be used in more complex scenarios.

This is the list of properties that you can set up on the CSV Structure Validation pipeline component:

Property Name        Description                                  Sample Values
DelimiterChar        The delimiter character inside a line.       ;
NumberFieldsPerLine  The number of fields expected per line.      3

Note that this component takes for granted that the line delimiter is CRLF (Carriage Return + Line Feed).

If we take this example:

one;two;t
1;2;2

Then we need to configure the port in the following way:

If we receive an invalid file, the component will raise an error and suspend the message in the BizTalk Server Administration Console with, for example, the following error message:

  • Invalid format data in the document. Line number 3 has 2 fields, and it should be expected 3 fields
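The check the component performs can be sketched in a few lines. This Python version is only an illustration of the logic; the real component is a .NET pipeline component, and the function name here is made up:

```python
# Hypothetical sketch: every CRLF-delimited line must split into exactly
# the expected number of fields on the configured delimiter char.
def validate_csv_structure(content: str, delimiter_char: str,
                           fields_per_line: int) -> None:
    for number, line in enumerate(content.split("\r\n"), start=1):
        if not line:  # ignore a trailing empty line
            continue
        count = len(line.split(delimiter_char))
        if count != fields_per_line:
            raise ValueError(
                f"Invalid format data in the document. Line number {number} "
                f"has {count} fields, and it should be expected "
                f"{fields_per_line} fields")

validate_csv_structure("one;two;three\r\n1;2;3\r\n", ";", 3)  # valid: no error
```

An invalid line raises the error, which in the real component translates into the suspended message shown above.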

You may be wondering: why create a pipeline component to validate the structure of a CSV or flat-file document? Couldn’t we use the Flat-File Schema to do this structure validation instead?

And the answer is yes and no! In many cases and with many requirements, we don’t need to create a custom pipeline component; a Flat-File schema can address the goal. But in other scenarios, doing CSV validation with the Flat-File schema alone is not enough. However, I will leave that “discussion” to my next BizTalk Server best practices, tips and tricks post.

How to install it

As always, you just need to add the DLL to the Pipeline Components folder, which in BizTalk Server 2020 is by default:

  • C:\Program Files (x86)\Microsoft BizTalk Server\Pipeline Components

For this particular component, we need this DLL:

  • BizTalk.PipelineComponent.CSVFlatFileStructureValidation.dll

How to use it

As with all previous components, I recommend that you create one or several generic pipelines that can be reused by all your applications, and add the CSV Structure Validation pipeline component in the stage you desire. The component can be used in a stage of both the receive and send pipelines.

Download

THIS COMPONENT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND.

You can download CSV Structure Validation Pipeline Component from GitHub here:


BizTalk Server Teach me something new about Flat Files (or not) Part II video and slides are available at Integration Monday


Last year I presented several sessions at the Integration Monday community, but I never had the chance, for several different reasons, to properly highlight them on my personal blog. The second session that I delivered last year was BizTalk Server: Teach me something new about Flat Files (or not) Part II.

In that session I tried to address and solve the following questions:

  • How to suppress Empty Lines everywhere (end or middle)?
  • My positional flat-file contains data that does not match the length expected (and they are not filled with empty spaces). How can I handle that?
  • What if we don’t want to remove Headers but… in fact we want to deal with Headers and Trailers?
  • Do I always need to create a custom pipeline for dealing with Flat Files? Or is it possible to create a generic pipeline?

That was the sixth session that I have delivered for that community:

About my session

Session Name: BizTalk Server: Teach me something new about Flat Files (or not) Part II


Session Overview: This is the second part of Sandro Pereira’s earlier presentation on flat files on Integration Monday. You can watch the video recording of the previous session here.

Although over the years new protocols, formats, and patterns have emerged, such as Web Services, WCF RESTful services, XML, and JSON, the use of text files (Flat Files) such as CSV (Comma Separated Values) or TXT, one of the oldest common patterns for exchanging messages, still remains one of the most used standards in systems integration and communication with business partners.

While tools like Excel can help us interpret such files, that process is always iterative and requires a few hints from the user so that the software can determine where the fields/columns need to be separated, as well as the data type of each field. But for an Enterprise Application Integration system like BizTalk Server, you must remove any ambiguity so that these operations can be performed thousands of times with confidence and without recourse to a manual operator.

You can see the recording session here: BizTalk Server: Teach me something new about Flat Files (or not) Part II.

About Integration Monday

Integration Monday is full of great sessions that you can watch, and I will take this opportunity to invite you all to join us next Monday.


BizTalk Server: Teach me something new about Flat Files (or not) video and slides are available at Integration Monday


Last Monday I presented, once again, a session in the Integration Monday series. This time the topic was BizTalk Server: Teach me something new about Flat Files (or not). This was the fifth session that I have delivered:

And I think it will not be the last! However, this time was different in many respects, and in a certain way it was a crazy session… Despite having some posts about BizTalk Server: Teach me something new about Flat Files on my blog, I didn’t have time to prepare this session (I was sent on a crazy mission for a client, and I also had to organize the integration track at the TUGA IT event); I had a small problem with my BizTalk Server 2016 machine, which forced me to switch to my BizTalk Server 2013 R2 VM; and I was interrupted by the kids in the middle of the session because the girls wanted me to have dinner with them (worthy of being in this series)… but it all ended well, and I think it was a very nice session with two great real-case samples:

  • Removing headers from a flat file (CSV) using only the schema (without any custom pipeline component)
  • And removing empty lines from a delimited flat file, again, using only the schema (without any custom pipeline component)

For those who were online, I hope you enjoyed it, and sorry for all the confusion. For those who did not have the chance to be there, you can now watch it, because the session was recorded and is available on the Integration Monday website. I hope you like it!

Session Name: BizTalk Server: Teach me something new about Flat Files (or not)


Session Overview: Although over the years new protocols, formats, and patterns have emerged, such as Web Services, WCF RESTful services, XML, and JSON, the use of text files (Flat Files) such as CSV (Comma Separated Values) or TXT, one of the oldest common patterns for exchanging messages, still remains one of the most used standards in systems integration and communication with business partners.

While tools like Excel can help us interpret such files, that process is always iterative and requires a few hints from the user so that the software can determine where the fields/columns need to be separated, as well as the data type of each field. But for an Enterprise Application Integration system like BizTalk Server, you must remove any ambiguity so that these operations can be performed thousands of times with confidence and without recourse to a manual operator.

In this session, we will first address how to easily implement a robust file transfer integration in BizTalk Server (using Content-Based Routing with retries, a backup channel, and so on).
And second, how to process Flat File documents (TXT, CSV, …) in BizTalk Server: Which types of flat files are supported? How are text files (Flat Files) transformed into XML documents (Syntax Transformations), where does it happen, and which components are needed? How can I perform flat file validation?

Integration Monday is full of great sessions that you can watch, and I will take this opportunity to invite you all to join us next Monday.
