The latest CTP of the Oslo SDK was released a few days ago. I spent some time yesterday installing it and having a look. In terms of functionality, there is little difference from the October 2008 CTP released after last year’s PDC. However, if you open up the main assemblies in Reflector, it quickly becomes apparent that the code has undergone significant refactoring and improvement. It looks much tidier and closer to production quality. For example, the October 2008 CTP contained generated parser/lexer code for MGrammar. In the latest version, this has been removed, and it appears that the parser code is now generated dynamically at runtime (the ‘preferred’ Oslo approach). Type and method names have been tidied up considerably, additional functionality has been added to manage various issues, and the entire code base looks tighter and better constructed.

The new functionality has been discussed elsewhere. See, for example, http://www.alexthissen.nl/blogs/main/archive/2009/01/31/improvements-and-changes-to-oslo-sdk-and-repository-in-january-ctp.aspx. Most attention has been given to the ability to include actions on the right-hand side (RHS) of token productions. In our work to date, we have come across at least one situation which requires this new feature, and which we could not properly address in the October 2008 CTP.

Perhaps the most intriguing aspect of the January 2009 CTP is the way this new feature is described in the Release Notes. I suspect that Microsoft has inadvertently let slip a feature that they didn’t mean to make public. The Release Notes published on the web site state that:

“Any production in a token can now have a code action or a graph action (formerly known as term construction)! You can now specify a return type for a token definition in the case of code actions, similar to a syntax definition.”

In the October 2008 CTP, actions are limited to MGraph expressions. An action is an optional addition to a production that controls the shape of the MGraph abstract syntax tree created by the parser. As far as I can tell, this is still the case in the new CTP. Unlike many similar technologies, the CTP version of MGrammar does not support the inclusion of code statements as semantic actions. This was discussed by Clemens Szyperski (an Oslo architect) in a comment on the blog article at http://weblogs.asp.net/fbouma/archive/2008/11/05/designing-a-language-is-hard-and-m-won-t-change-that.aspx, and the suggestion appears to be that this is a deliberate strategy to ensure that MGrammar remains (relatively) simple to write and focussed only on composable DSL creation.
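To make the distinction concrete, here is a minimal sketch in Python rather than MGrammar. It is not based on any Oslo API, and every name in it (`tokenize`, `parse_sum`, `graph_action`, `code_action`) is made up for illustration. The same toy parser is driven by two kinds of action: one that only shapes a labelled tree, which is roughly what a graph action does, and one that runs arbitrary code during the parse, which is what a code action would allow.

```python
import re


def tokenize(text):
    """Split the input into integer and '+' tokens, ignoring whitespace."""
    return re.findall(r"\d+|\+", text)


def parse_sum(tokens, action):
    """Parse `Number ('+' Number)*`, calling `action(left, right)` for each '+'."""
    if not tokens or not tokens[0].isdigit():
        raise SyntaxError("expected a number")
    result = int(tokens[0])
    pos = 1
    while pos < len(tokens):
        # Each '+' must be followed by a number.
        if tokens[pos] != "+" or pos + 1 >= len(tokens) or not tokens[pos + 1].isdigit():
            raise SyntaxError("expected '+' followed by a number")
        result = action(result, int(tokens[pos + 1]))
        pos += 2
    return result


def graph_action(left, right):
    """Graph-style action: only builds a labelled node; no computation happens."""
    return {"Add": [left, right]}


def code_action(left, right):
    """Code-style action: arbitrary code runs as the production is matched."""
    return left + right


tokens = tokenize("1 + 2 + 3")
tree = parse_sum(tokens, graph_action)   # a labelled tree, interpreted later
value = parse_sum(tokens, code_action)   # a value computed inside the parser
print(tree)
print(value)
```

With graph actions only, the parser’s sole job is to hand back the tree, and any semantics are applied afterwards by whatever consumes it. With code actions, semantics can leak into the grammar itself, which is presumably the complexity the Oslo team has been wary of.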

The Release Notes statement suggests that Microsoft is looking at including code actions in MGrammar. As I say, it would appear that this feature is not actually supported in the January 2009 CTP. If it is, there is certainly no documentation explaining how to use it. Interestingly, this may explain a feature of the October 2008 CTP which seems to have disappeared from the current version. In the previous CTP, the MGrammar assembly included code for parsing C# statements. The parser didn’t appear to be a full-blown C# parser, but looked as though it was designed to parse code statements and expressions. This code appears to be missing from the January 2009 CTP.

It is generally a fool’s errand to speculate on what is happening behind the scenes, and I certainly have no special insight or knowledge about Microsoft’s intentions. However, I can’t help wondering if Microsoft has accidentally let us see that they are considering supporting code actions in MGrammar when it is released, and has built code to support this feature which they do not wish to make public at the current time. If this is the case, there is no guarantee that this feature will make it into the final release. For my part, I have been thinking about this issue for a few months now, and am undecided as to the desirability of supporting full-blown semantic actions in this fashion. The issue, I think, is how useful this feature will really be in mainstream DSL creation. Does the reduced problem domain of a domain-specific language imply a simplicity of language that generally avoids the need to handle complex semantics at the parser level? Clearly, there is no fundamental reason why a DSL should not exhibit such complexity, but if the vast majority of DSLs are inherently simple, maybe it would be wiser to stick to the current labelled graph-only model employed by MGrammar parsers, and work around this restriction. As I say, I am undecided. It may be that the Oslo team is currently also uncertain of the best strategy. It would be interesting to hear views from the wider modelling community.

On a related note, am I the only person to spot an uncanny philosophical resemblance between MGrammar and Labelled BNF (LBNF)? Did LBNF have any bearing on the Oslo team’s thinking? The mechanics of the Oslo approach are different from, and IMHO generally superior to, those of LBNF, but some of the underlying thinking is similar, including the emphasis on the creation and shaping of labelled graph ASTs.