A couple of weeks ago I published a post describing my Oslo-based deployment framework for BizTalk.

Two parts were missing from that post – the actual MGrammar and the runtime that processes the source code files.

In this post I will go over the grammar I created for the framework, explaining the various steps as I go. This is not intended to be a complete description of MGrammar (not that there’s a chance I could write one), but rather an overview by example; for more information on Oslo and ’M’, visit the Oslo Dev Centre on MSDN.

It was important for me to create a solution that is completely usable, and indeed I have started to use it to generate the build scripts for my application. The price is that it might not be the best example code out there, but I hope you will find it useful.

Below is the complete grammar, after which I walk through it step by step; it might be useful to have another look at the example source code I included in my previous post to better understand what I’m trying to achieve with the syntax –

   1:      module Sabra.BizTalk.Deployment
   2:      {
   3:  
   4:          language BTSDeploy
   5:          {
   6:              //main syntax is the entry point for the grammar - the first syntax to be parsed
   7:              syntax Main = app:AppDef 
   8:                                 "{" 
   9:                                 items:ApplicationItems 
  10:                                 "}" => Application[app,valuesof(items)];
  11:              
  12:              //application definition at root of source code
  13:              syntax AppDef = applicationKW name:ApplicationName => name;
  14:  
  15:              //application items supports including all possible items multiple times in any order
  16:              syntax ApplicationItems = items:(Add | Build | ImportBinding | Comment)* => {valuesof(items)};
  17:  
  18:              //now define the syntax for each item type - 
  19:              
  20:              //'Build' defines a solution or project to build during execution
  21:              //(due to limitations in our msbuild framework, runtime currently supports solutions only, but language should support both)
  22:              syntax Build = "build" path:Path";" => Build[path];
  23:              //binding to import into application
  24:              syntax ImportBinding = importKW bindingKW path:Path";" => ImportBinding{Path = path};
  25:  
  26:              //syntax of add further specified different add 'options'
  27:              syntax Add = addKW add:(Add_Reference | Add_Binding | Add_Assembly | Add_BTS_Assembly) => Add{valuesof(add)};
  28:              //each add option is defined next
  29:                          
  30:              //binding to add as resource to application. must specify environment name
  31:              syntax Add_Binding = bindingKW path:Path env:MultiWordName";" => Binding[path,env];
  32:              //defined a reference to another application, supports providing multiple applications in the same instruction
  33:              syntax Add_Reference = referenceKW ref1:ApplicationName refs:Add_AdditionalReferences*";" => Reference{ref1,valuesof(refs)};
  34:              syntax Add_AdditionalReferences = "," app:ApplicationName => app;
  35:              //add assembly defines an assembly to be added as a resource to the application
  36:              syntax Add_Assembly = assemblyKW path:Path details:AssemblyDetails";" => Resource[path,Details{details}];
  37:              //add biztalk assembly is similar to assembly, but allows specifying any contained orchestrations
  38:              syntax Add_BTS_Assembly = "biztalk" assemblyKW path:Path orch:Orchestrations? details:AssemblyDetails";" => BizTalkAssembly[path,orch,Details{details}];
  39:              syntax Orchestrations  = withKW orchestrationsKW "{" type1:ApplicationName types:AdditionalOrchestrations* "}" => Orchestrations{type1,valuesof(types)};
  40:              syntax AdditionalOrchestrations = "," type:ApplicationName => type;
  41:              
  42:              //assembly details
  43:              syntax AssemblyDetails = ver:AssemblyVersion+ culture:Culture+ pkt:PublicKeyToken+=>{Version{valuesof(ver)},Culture{valuesof(culture)},PublicKeyToken{valuesof(pkt)}};
  44:              token AssemblyVersion = versionKW "=" version:(AnyDigit*"."AnyDigit*"."AnyDigit*"."AnyDigit*)=>version;
  45:              token Culture = cultureKW "=" culture:Word=>culture; 
  46:              //TODO: token should be 16 chars exactly
  47:              token PublicKeyToken = publicKeyTokenKW "="pkt:(AnyChar|AnyDigit)*=>pkt;   
  48:              
  49:              //keywords
  50:              @{Classification["Keyword"]}
  51:              token applicationKW = "Application";
  52:              @{Classification["Keyword"]}
  53:              token addKW = "add";
  54:              @{Classification["Keyword"]}
  55:              token bindingKW = "binding";
  56:              @{Classification["Keyword"]}
  57:              token referenceKW = "reference";
  58:              @{Classification["Keyword"]}
  59:              token importKW = "import";
  60:              @{Classification["Keyword"]}
  61:              token buildKW  = "build";
  62:              @{Classification["Keyword"]}
  63:              token assemblyKW = "assembly";
  64:              @{Classification["Keyword"]}
  65:              token biztalkKW = "biztalk";
  66:              @{Classification["Keyword"]}
  67:              token withKW = "with";
  68:              @{Classification["Keyword"]}
  69:              token orchestrationsKW = "orchestrations";
  70:              @{Classification["Keyword"]}
  71:              token versionKW = "version";        
  72:              @{Classification["Keyword"]}
  73:              token cultureKW = "culture";        
  74:              @{Classification["Keyword"]}
  75:              token publicKeyTokenKW = "publicKeyToken";        
  76:              
  77:              //definition of a comment, similar to c# syntax
  78:              @{Classification["Comment"]}
  79:              token Comment = "//" CommentLineContent*;
  80:              token CommentLineContent
  81:                  = ^(
  82:                       '\u000A' // New Line
  83:                    |  '\u000D' // Carriage Return
  84:                    |  '\u0085' // Next Line
  85:                    |  '\u2028' // Line Separator
  86:                    |  '\u2029' // Paragraph Separator
  87:                  );
  88:                  
  89:              //application name must start with a character and then include any character, digit or '.'
  90:              token ApplicationName = AnyChar+(AnyChar | AnyDigit | ".")*;
  91:              
  92:              //tokens use for definition of a file path
  93:              token Path = "\""PathRoot?FileSystemName("\\"FileSystemName)*"\"";
  94:              token PathRoot = AnyChar":\\";
  95:              token FileSystemName = (AnyChar | AnyDigit | Space | "-" | "_" | ".")+;
  96:  
  97:              //common token definitions
  98:              token AnyChar = "A".."Z" | "a".."z";
  99:              token AnyDigit = "0".."9";
 100:              token MultiWordName  = "\""a:(Word | Space)"\"" => a;
 101:              token Word = (AnyChar | AnyDigit | "-" | "_")+;
 102:  
 103:              //the interleave will ensure the language allows whitespace anywhere
 104:              interleave Whitespace = Tab | LF | CR | Space | Comment;
 105:              token LF = "\u000A";
 106:              token CR = "\u000D";
 107:              token Space = "\u0020";
 108:              token Tab = "\u0009";
 109:  
 110:             }
 111:      }

I’ve built the grammar top down, and this is how I will walk through it –

First (1) I define my module, in a namespace-like manner; this is the logical container for my language. I then (4) declare my language and give it a name.

The main constructs in mgrammar are syntax and token. I have often heard the guys at Redmond explain that, when it comes to languages, you can think of a syntax as being the sentence and tokens as being the words, which I think is a very clear explanation. There are a few rules relating to them, as you can imagine; the important ones to remember at this point are that syntaxes can contain other syntaxes as well as tokens (and literals), while tokens can only contain other tokens (and literals). Also, interleave does not apply to tokens.
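To make the sentence/word distinction a little more concrete, here is a minimal Python sketch (not mgrammar, and not part of the framework) in which whitespace is skipped between tokens but never inside one. The names `TOKEN`, `INTERLEAVE` and `match_sentence` are made up for illustration, and the sketch does not try to model whether the real parser requires a separator between two adjacent keywords.

```python
import re

# Hypothetical stand-ins for two keyword tokens from the grammar above.
TOKEN = re.compile(r"import|binding")    # the "words"
INTERLEAVE = re.compile(r"[ \t\r\n]*")   # whitespace, allowed only between words


def match_sentence(text, expected):
    """Return True if text consists of the expected words, in order,
    with optional whitespace between (but not inside) them."""
    pos = 0
    for word in expected:
        pos = INTERLEAVE.match(text, pos).end()  # skip interleave before a token
        m = TOKEN.match(text, pos)
        if m is None or m.group() != word:
            return False
        pos = m.end()
    pos = INTERLEAVE.match(text, pos).end()      # trailing whitespace is fine too
    return pos == len(text)
```

With this sketch, "import   binding" is accepted, while "im port binding" is rejected because the whitespace falls inside a token.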

The main syntax, and the entry point for any language is Main and you can see mine defined on line 7, and it looks like this –

//main syntax is the entry point for the grammar - the first syntax to be parsed
            syntax Main = app:AppDef 
                               "{" 
                               items:ApplicationItems 
                               "}" => Application[app,valuesof(items)];

’//’ is used for comments, just like in C#, so the first line will be ignored.

syntax is one of the few keywords that exist in mgrammar, no explanation needed;

Main is the name of the syntax, which allows it to be referred to (used) by other syntaxes; in this case, as I’ve mentioned, ’Main’ is also the entry point, the syntax the parser will start at; everything else should flow from here.

Now for app:AppDef, but first, a note: in my mind there are two aspects to creating a language in mgrammar. There is the ’parsing aspect’, where you define your language so that it describes the rules to parse your source code; and there is the ’output aspect’ (or ’production aspect’), where you define the output of your language (in ’M’ this is mgraph) so that it describes accurately the intent of any source code (and is easy-ish to work with at runtime).

The two are inevitably very mixed in any real-world work with ’M’, which can be confusing, and today I want to focus on the parsing aspect: firstly because it is the more important one in my view (there’s nothing to work with before you’ve declared a good syntax for your language), and secondly because I suspect we’re going to see some changes to the production aspect in the near future.

Most of the production aspect ’stuff’ is defined after the ’=>’ operator, as you can see in my syntax above, so for the time being just try to ignore that; there will be a bit more to ignore, as you will see shortly.

AppDef is the name of a syntax declared somewhere else in the language (line 13), which we will look at in a second; it could also be defined in an imported language, but I don’t have any. app: is an alias assigned to this syntax, which allows it to be referenced in the production on the right side of the arrow operator; again, for the time being feel free to ignore any aliases, as they have no impact on the parsing aspect.

So, my Main syntax basically says we’re expecting to have in our source code something that matches the AppDef syntax, then an opening curly bracket, then something that matches the ApplicationItems syntax, and then a closing curly bracket. Simple.
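As a rough illustration of the shape Main describes, the following Python sketch uses a regular expression as a stand-in for the parser. It is only an approximation: the body between the brackets is captured as raw text rather than parsed into items, and `parse_main` and the sample application name are hypothetical.

```python
import re

# Approximates: applicationKW ApplicationName "{" ApplicationItems "}"
# The items are captured as raw text; parsing them is out of scope here.
MAIN = re.compile(
    r"\s*Application\s+(?P<name>[A-Za-z]+[A-Za-z0-9.]*)\s*\{(?P<items>.*)\}\s*",
    re.DOTALL,
)


def parse_main(source):
    """Return (application name, raw body text), or None if the shape is wrong."""
    m = MAIN.fullmatch(source)
    if m is None:
        return None
    return m.group("name"), m.group("items").strip()
```

For example, `parse_main("Application My.App { add reference Other.App; }")` yields the name and the text between the brackets, while input without the leading keyword or without the brackets yields `None`.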

Of course next the parser would look at the definition of AppDef and ApplicationItems, and so will we.

AppDef is defined in line 13 as an applicationKW followed by an ApplicationName; these are defined in lines 51 and 90 respectively. Let’s look at the applicationKW definition –

            @{Classification["Keyword"]}
            token applicationKW = "Application";

applicationKW itself is a token with a fixed literal ‘Application’. This is a very simple rule to follow, and in fact I could have simply included this literal in the syntax definition and not used this token at all (which is what I have done previously).

The reason I have separated it out into its own token is related to the preceding line in the grammar: the classification attribute allows me to mark this token as a “keyword” for my language. This tells Intellipad (and, presumably, any other editor that learns how to work with mgrammar) that this token is a keyword and should be displayed as such; in Intellipad this means it would be bolded in the editor, as you can see in the image below of my language open in Intellipad –

Back to the Main syntax’s components: ApplicationName, defined in line 90, states that an application name is composed of one or more AnyChar followed by any number of AnyChar, AnyDigit or the literal ’.’, with AnyChar and AnyDigit defined in lines 98 and 99.

The ’+’ sign indicates the syntax or token it follows must exist at least once; the ’*’ sign indicates the syntax or token it follows can exist 0 or more times.
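For readers more used to regular expressions, ’+’ and ’*’ behave much like their regex counterparts. The Python sketch below is an approximate regex equivalent of the ApplicationName token (line 90), with AnyChar and AnyDigit expanded inline; `is_application_name` is a made-up helper name.

```python
import re

# AnyChar+ (AnyChar | AnyDigit | ".")*
# AnyChar is A-Z / a-z, AnyDigit is 0-9 (lines 98 and 99 of the grammar).
APPLICATION_NAME = re.compile(r"[A-Za-z]+[A-Za-z0-9.]*")


def is_application_name(text):
    """True if the whole text matches the ApplicationName token."""
    return APPLICATION_NAME.fullmatch(text) is not None
```

So a name such as "Sabra.BizTalk.Deployment" matches, while one that starts with a digit or a dot, or contains an underscore, does not.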

So – we have the definition of our application name, now let’s look at what ApplicationItems says –

syntax ApplicationItems = items:(Add | Build | ImportBinding | Comment)* => {valuesof(items)};

This syntax tells the parser that an application can have any number of Add, Build, ImportBinding or Comment items, in any order.

Moving on, we’ll look briefly at what ImportBinding looks like –
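The "(A | B | C)*" pattern can be pictured as a loop that keeps accepting whichever alternative comes next until nothing matches. The Python sketch below is a simplification under stated assumptions: it recognises each item only by its leading keyword and terminating ’;’, it ignores comments, and `item_kinds` is a hypothetical name.

```python
import re

# One item: a leading keyword, anything up to the terminating ';'.
ITEM = re.compile(r"\s*(add|build|import)\b[^;]*;")


def item_kinds(body):
    """Return the leading keyword of each item in the body, in source order."""
    kinds, pos = [], 0
    while True:
        m = ITEM.match(body, pos)
        if m is None:
            break
        kinds.append(m.group(1))
        pos = m.end()
    if body[pos:].strip():
        raise ValueError("unexpected text: " + body[pos:].strip())
    return kinds
```

Items can repeat and be mixed freely, so a body with a build, then an add, then another build is accepted, and an empty body is accepted too.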

syntax ImportBinding = importKW bindingKW path:Path";" => ImportBinding{Path = path};

The importKW (which is the literal ’import’, look it up!) is followed by the bindingKW (’binding’), the Path token and a terminating ’;’.

I could have combined both literals, import and binding, into a single token and marked that as a keyword, but there are two benefits to splitting them up. Firstly, by having two tokens I can have as much whitespace as I want between them, which I think is what developers generally expect; secondly, the ‘binding’ keyword is re-used for the add binding syntax I’ll describe shortly.

I’ll skip the Path definition, you can follow it yourself if you wish to; so next we can look at another item in the application items list – Add:
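For anyone who does want to follow the Path definition, here is an approximate regex rendering of the Path, PathRoot and FileSystemName tokens (lines 93 to 95), written as a Python sketch. `is_path` and the sample paths are hypothetical, and the regex is a reading of the grammar rather than anything the framework itself uses.

```python
import re

# FileSystemName = (AnyChar | AnyDigit | Space | "-" | "_" | ".")+
NAME = r"[A-Za-z0-9 \-_.]+"

# Path = quote, optional PathRoot (letter, colon, backslash),
#        FileSystemName, then zero or more backslash + FileSystemName, quote.
PATH = re.compile(r'"(?:[A-Za-z]:\\)?' + NAME + r"(?:\\" + NAME + r')*"')


def is_path(text):
    """True if the whole text matches the Path token, quotes included."""
    return PATH.fullmatch(text) is not None
```

Read this way, a quoted absolute or relative Windows path matches, while an unquoted path, one using forward slashes, or one starting with a backslash (such as a UNC path) does not.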

syntax Add = addKW add:(Add_Reference | Add_Binding | Add_Assembly | Add_BTS_Assembly) => Add{valuesof(add)};

The Add syntax starts with the addKW (’add’) followed by one of the syntaxes for adding a reference, adding a binding, adding an assembly or adding a BizTalk assembly, but it only allows one; the add keyword (and therefore the entire add syntax) must be repeated as a whole to add multiple items to the application, as is suggested by the ApplicationItems syntax.

Let’s look at a couple of these items; first – the syntax for add binding –

syntax Add_Binding = bindingKW path:Path env:MultiWordName";" => Binding[path,env];

Here you can see the binding keyword being reused, as is the Path token; I’m then allowing a multi-word-name (which is essentially a string contained in double quotes) as the environment name for the added binding.

Quite simple, right? That’s the thing I love about mgrammar. Let’s look at one more syntax –

syntax Add_BTS_Assembly = "biztalk" assemblyKW path:Path orch:Orchestrations? details:AssemblyDetails";" => BizTalkAssembly[path,orch,Details{details}];

syntax Orchestrations = withKW orchestrationsKW "{" type1:ApplicationName types:AdditionalOrchestrations* "}" => Orchestrations{type1,valuesof(types)};

syntax AdditionalOrchestrations = "," type:ApplicationName => type;

The Add_BTS_Assembly syntax should be very clear; the only thing I haven’t mentioned so far is the ? sign, which indicates 1 or 0 appearances of the syntax/token it follows. I use this to allow a BizTalk assembly to optionally describe the orchestrations it contains so that, potentially, any instances of these could be terminated when undeploying the application.

The Orchestrations syntax, if present, requires at least one orchestration to be specified (I’m reusing the ApplicationName token as the orchestration name) but allows additional orchestrations to be specified as well; I’ve used the same approach for the add reference syntax.

I hope this makes sense, and that it gives you a glimpse into a practical use of mgrammar; I am certainly excited about this stuff.

Soon I hope to post about the last missing piece of the puzzle – the runtime that uses the language definition to parse, and then execute, any source code provided; after that the whole thing is likely to find a spot on CodePlex, bear with me a little bit longer.
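The "one item, then zero or more comma-prefixed items" pattern used by Orchestrations (and by the add reference syntax) can be sketched in Python as follows. This is an approximation using a regular expression, with whitespace handling standing in for the interleave; `orchestration_names` and the sample names are hypothetical.

```python
import re

NAME = r"[A-Za-z]+[A-Za-z0-9.]*"   # ApplicationName, reused for orchestration names

# withKW orchestrationsKW "{" first-name ("," name)* "}"
ORCHESTRATIONS = re.compile(
    r"with\s+orchestrations\s*\{\s*("
    + NAME + r"(?:\s*,\s*" + NAME + r")*"
    + r")\s*\}"
)


def orchestration_names(text):
    """Return the list of names, or None if the text does not match."""
    m = ORCHESTRATIONS.fullmatch(text.strip())
    if m is None:
        return None
    return [name.strip() for name in m.group(1).split(",")]
```

Because the first name is mandatory and every further name must be preceded by a comma, empty brackets and a trailing comma are both rejected by this sketch.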