March 4, 2021
Hot Topics:

Developing a TSQL Domain Specific Language with "M" and the Microsoft Oslo SDK

  • By Jeffrey Juday
  • Send Email »
  • More Articles »

Essence of TSQL SELECT

A complete introduction to TSQL is beyond the scope of this article. I'll only cover some basics from the perspective of the sample. The SELECT statement generated by the sample appears below.

  , Employee.LoginId 
  , Contact.LastName 
  , Manager.LoginId 

FROM HumanResources.Employee AS Employee  
INNER JOIN Person.Contact AS Contact ON Employee.ContactId = Contact.ContactId  
INNER JOIN HumanResources.Employee AS Manager ON Employee.ManagerId = Manager.EmployeeId

While there are many parts and options in a SELECT statement, I'm only employing the following:

  • The select list containing the fields being returned in the data.
  • The FROM clause containing the source tables.
  • INNER JOIN indicating how source tables are linked by their relationships. Inner joins specify that only results satisfying the expression after "ON" are returned in the result set. In the example above, only Employee information with a ContactId in the contact table and a ManagerId in the Employee table are included in the data.

Earlier I mentioned aliasing. Sometimes a table has a relationship between its key field and another field. In the example above, a Manager is also an Employee. So to distinguish between the Employee table related to the employee and the Employee table related to the Manager, the Manager Employee table is aliased.

Now that I've shown what the application does and shared some background on business problems the sample solves, I'm going to tackle the application internals starting with the "M" components.

MGrammar Revisited

MGrammar is the portion of the M language tasked with expressing parsing. MGrammar expresses patterns in text data and how to pull data from the parse text. The core of MGrammar are rules. There are three types of rules:
  • Tokens - think of tokens as the "words" in the program. In tokens, a developer describes specific patterns in the text.
  • Syntaxes are more like the sentences. Syntaxes arrange the tokens and other syntaxes and define the format of the data pulled from the text.
  • Interleaves define ignorable input.

There is more to M and MGrammar than rules. I'm going to explain some of these other features as I show how the "Simple SQL" has been implemented in M.

Specifying Keywords

There are 3 parts to the M portion of the sample application. As you may have observed in the DSL sample above, there are a number of keywords in the sample input text. Here is the MGrammar keyword section in the M code.
  @{ Classification["Keyword"]}       
          token INTERSECT = "INTERSECT"; 
          @{ Classification["Keyword"]}       
          token AT = "AT"; 
          @{ Classification["Keyword"]}       
          token DataFieldQualifier = "Data:";
          @{ Classification["Keyword"]}       
          token Alias = "AS";

Keywords are simply tokens. The Attributes on the keyword are directives for Intellipad. When developing in Intellipad, keywords will be highlighted in Intellipad.

I like to think of M development the way I think about other types of development. So, I like to partition my M application similar to the way I partition a .NET application written in C#. I like to assemble a collection of tokens and then compose the tokens into Syntaxes and then Syntaxes upon Syntaxes.

Partitioning Syntaxes and Controlling a Production

The first partition of Syntaxes appears below

syntax Main = s:SelectInputs+ =>  { s };
syntax SelectInputs = l:DataFields | l:Sets | l:Tables => l;
syntax DataFields = DataFieldQualifier d:((ListText))+ => Data[valuesof(d)];
syntax Sets = "(" t1:ContentText AT fld1:ContentText INTERSECT t2:ContentText AT fld2:ContentText ")" 
=> INTERSECT[t1,fld1,t2,fld2];
syntax TableReg = "(" reg:(ContentText) ")" => TableReg [reg];
syntax TableAlias = "(" alias:(ContentText Alias ContentText) ")" => TableAlias [alias];
syntax Tables =  t:TableAlias | t:TableReg => t;
interleave Ignore = Garbage;        
token Garbage = (' ' | '\n' | '\r')+;
token ContentText = ContentChar+;
token ContentChar = "a".."z"|"A".."Z"|"0".."9";
token ListChar = ContentChar | ".";
token ListText = ListChar+;

Syntaxes are divided into the following patterns:

  • Patterns identifying the tables are divided into the pattern signifying the Alias table, TableAlias Syntax and a pattern matching a regular table TableReg syntax.
  • Patterns matching the Data called DataFields.
  • Finally, a pattern matching the table joins called Sets.

Productions are encapsulated in the syntax appearing to the right of the "=>". Functions like "valuesof" change the shape of the data making it less hierarchal. "{}" and "[]" indicate unordered ("{}") or ordered ("[]") output.

Turning now to the .NET portion of the application, I'll show how the M code is utilized by the application.

Page 2 of 4

This article was originally published on May 5, 2009

Enterprise Development Update

Don't miss an article. Subscribe to our newsletter below.

Thanks for your registration, follow us on our social networks to keep up-to-date