Microsoft & .NETDeveloping a TSQL Domain Specific Language with "M" and the Microsoft Oslo...

Developing a TSQL Domain Specific Language with “M” and the Microsoft Oslo SDK

Developer.com content and product recommendations are editorially independent. We may make money when you click on links to our partners. Learn More.

My last article explained why a developer chooses a Domain Specific Language (DSL) and how an M DSL can be incorporated into a Windows application. The article was a starting point for a Visual Studio developer interested in DSLs, M, the Oslo SDK, and how all the components collaborate to render a solution.

In this article, I’m going to demonstrate some more advanced material. In particular, I’ll delve more into how a runtime can more effectively leverage the Oslo SDK data structures generated by M.


What is “Oslo”?

If you’ve been following my articles Oslo’s goals and technologies are familiar to you. If not, here is a brief summary. Oslo is composed of the following components displayed in Figure 1.

Oslo’s goal is to deliver a foundation for building and storing models of all types. Models are application metadata formatted for runtime consumption. Separate Microsoft initiatives aim to build runtimes and tooling into applications like Visual Studio that are Oslo model aware.


The M Language is composed of MSchema, MGraph, and MGrammar. A complete introduction to M is beyond the scope of this article. I introduced MSchema, MGraph, and MGrammar in my prior articles.


There is an underlying set of .NET Framework classes supporting all the functionality above. In my last article I demonstrated how some of these .NET Framework classes work. As I mentioned earlier, in this article I’m taking the demonstration a step further and delivering something more advanced.


Something “Advanced”

It’s no surprise that the best way to demonstrate some more advanced features is to do something, well, “advanced”. The sample application demonstrating more advanced features is depicted below.

The sample application is built on top of the January 2009 Oslo SDK CTP.


If you’re new to TSQL and unfamiliar with SELECT statements, later in the article I’ll dissect the SELECT statement.

As I mentioned earlier, one of the application components is a DSL written in M. Sample DSL code I call “Simple SQL” is below.

  (Employee)
(Contact AS Contact)
(Employee AS Manager)

(Employee AT ContactId INTERSECT Contact AT ContactId)
(Employee AT ManagerId INTERSECT Manager AT EmployeeId)

Data: Employee.NationalIdNumber
Employee.LoginId
Contact.LastName
Manager.LoginId



There are three parts to the “Simple SQL” DSL program structure:



  • A section declaring the tables. Tables must be between “()”. Repeated tables must use the “AS” statement to recast the table with a new name. A TSQL concept called aliasing.
  • An “INTERSECT” section indicating how the relational data is joined between table.
  • A “Data” section containing the fields and the table with the originating field.
Like many DSLs, the syntax is expressed using the domain’s “business language”, making development more approachable to a wider audience. As long as a business user understands some basic relational concepts they’ll be capable of writing “Simple SQL” DSL code.

There’s one more topic I need to cover before exploring the sample runtime anatomy.

Essence of TSQL SELECT


A complete introduction to TSQL is beyond the scope of this
article. I’ll only cover some basics from the perspective of the
sample. The SELECT statement generated by the sample appears
below.


SELECT
Employee.NationalIdNumber
, Employee.LoginId
, Contact.LastName
, Manager.LoginId

FROM HumanResources.Employee AS Employee
INNER JOIN Person.Contact AS Contact ON Employee.ContactId = Contact.ContactId
INNER JOIN HumanResources.Employee AS Manager ON Employee.ManagerId = Manager.EmployeeId



While there are many parts and options in a SELECT statement,
I’m only employing the following:



  • The select list containing the fields being returned in the
    data.

  • The FROM clause containing the source tables.

  • INNER JOIN indicating how source tables are
    linked by their relationships. Inner joins specify that only
    results satisfying the expression after “ON” are returned in the
    result set. In the example above, only Employee information with
    a ContactId in the contact table and a ManagerId in the Employee
    table are included in the data.


Earlier I mentioned aliasing. Sometimes a table has a
relationship between its key field and another field. In the
example above, a Manager is also an Employee. So to distinguish
between the Employee table related to the employee and the
Employee table related to the Manager, the Manager Employee
table is aliased.


Now that I’ve shown what the application does and shared some
background on business problems the sample solves, I’m going to
tackle the application internals starting with the “M”
components.


MGrammar Revisited


MGrammar is the portion of the M language tasked with expressing
parsing. MGrammar expresses patterns in text data and how to
pull data from the parse text. The core of MGrammar are rules.
There are three types of rules:

  • Tokens – think of tokens as the “words” in the program. In
    tokens, a developer describes specific patterns in the
    text.

  • Syntaxes are more like the sentences. Syntaxes arrange the
    tokens and other syntaxes and define the format of the data
    pulled from the text.

  • Interleaves define ignorable input.


There is more to M and MGrammar than rules. I’m going to
explain some of these other features as I show how the “Simple
SQL” has been implemented in M.


Specifying Keywords


There are 3 parts to the M portion of the sample application. As
you may have observed in the DSL sample above, there are a
number of keywords in the sample input text. Here is the
MGrammar keyword section in the M code.

@{ Classification[”Keyword”]}
token INTERSECT = “INTERSECT”;
@{ Classification[”Keyword”]}
token AT = “AT”;
@{ Classification[”Keyword”]}
token DataFieldQualifier = “Data:”;
@{ Classification[”Keyword”]}
token Alias = “AS”;

Keywords are simply tokens. The Attributes on the keyword are
directives for Intellipad. When developing in Intellipad,
keywords will be highlighted in Intellipad.


I like to think of M development the way I think about other
types of development. So, I like to partition my M application
similar to the way I partition a .NET application written in C#.
I like to assemble a collection of tokens and then compose the
tokens into Syntaxes and then Syntaxes upon Syntaxes.


Partitioning Syntaxes and Controlling a Production


The first partition of Syntaxes appears below



syntax Main = s:SelectInputs+ => { s };
syntax SelectInputs = l:DataFields | l:Sets | l:Tables => l;

syntax DataFields = DataFieldQualifier d:((ListText))+ => Data[valuesof(d)];
syntax Sets = “(” t1:ContentText AT fld1:ContentText INTERSECT t2:ContentText AT fld2:ContentText “)”
=> INTERSECT[t1,fld1,t2,fld2];
syntax TableReg = “(” reg:(ContentText) “)” => TableReg [reg];
syntax TableAlias = “(” alias:(ContentText Alias ContentText) “)” => TableAlias [alias];
syntax Tables = t:TableAlias | t:TableReg => t;
interleave Ignore = Garbage;
token Garbage = (‘ ‘ | ‘n’ | ‘r’)+;
token ContentText = ContentChar+;
token ContentChar = “a”..”z”|”A”..”Z”|”0″..”9″;
token ListChar = ContentChar | “.”;
token ListText = ListChar+;

Syntaxes are divided into the following patterns:



  • Patterns identifying the tables are divided into the pattern signifying the Alias table, TableAlias Syntax and a pattern matching a regular table TableReg syntax.

  • Patterns matching the Data called DataFields.

  • Finally, a pattern matching the table joins called Sets.


Productions are encapsulated in the syntax appearing to the
right of the “=>”. Functions like “valuesof” change the
shape of the data making it less hierarchal. “{}” and “[]”
indicate unordered (“{}”) or ordered (“[]”) output.


Turning now to the .NET portion of the application, I’ll show
how the M code is utilized by the application.

Run

Executing the parsing and generating data all starts behind the WPF Application “Run” button. The Run button code appears below.

  GridView view;
DBMediator med = new DBMediator();
GridViewLayoutBuilder viewBuild = new GridViewLayoutBuilder();
DataSet ds = null;
List configList = null;
MParserBuilder parser = new MParserBuilder();
MGraphGenerator graphGen = new MGraphGenerator();
MGraphIterator iterator = null;
MGraphTSQLQueryVisitor visitor = new MGraphTSQLQueryVisitor();
List visitors = new List();

parser.Build(_mgFile);

graphGen.Run(this._inputText.Text, parser.Parser);

visitor.Graph = graphGen.Builder;

visitors.Add(visitor);

iterator = new MGraphIterator(graphGen.Builder, graphGen.GraphRoot, visitors);

iterator.Traverse();

this.SQLCode.Text = visitor.TSQL;

//Now view the data
view = (GridView)this.SQLData.View;

ds = med.GetDataSet(this.SQLCode.Text);

configList = (new GridViewColumnConfigBuilder(ds.Tables[0].Columns)).GetColumnNames();

viewBuild.View = view;
viewBuild.SetColumns(configList);

this.SQLData.DataContext = ds;
this.SQLData.ItemsSource = ds.Tables[0].Rows;


Classes declared at the top of the code have the following responsibilities.



  • GridView handles rendering the DataSet data.
  • DBMediator accepts a SQL query and returns a DataSet.
  • GridViewLayoutBuilder changes the GridView to match the DataSet
  • MParserBuilder builds the Oslo DynamicParser from the M code
  • MGraphGenerator uses the DynamicParser to build a GraphBuilder object.
  • MGraphIterator navigates the GraphBuilder object, invoking a IGraphVisitor object at each node.
  • MGraphTSQLQueryVisitor is the implementation of IGraphVisitor and handles SELECT statement construction.

I’ll highlight the critical parts of each class beginning with the Iterator.


Navigating a Graph — Iterator

In my prior article I demonstrated how a developer can incorporate the M code. You can view that article here http://www.codeguru.com/columns/experts/article.php/ c16043/ . In the prior article I navigated an Oslo Data structure called GraphBuilder.

GraphBuilder is generated by the Oslo classes from the text input and the MGrammar loaded in the application. GraphBuilder is a hiearchal data structure somewhat matching the shape of the MGraph-like data generated in the Tree view of Intellipad. A sample from Intellipad appears below.

Also, the Iterator I built does not implement IEnumerable, so you can’t utilize the C# foreeach statement with the Iterator. The Iterator name refers to the pattern not the implementation.


MGraphIterator and a class implementing IMGraphIterator work like this. As viewed above, GraphBuilder is a tree-like data structure. MGraphIterator recursively traverses the nodes on the GraphBuilder invoking the Accept function on a list of IMGraphIterator based classes. A portion of the code appears below.

          public void Traverse ()
{
WalkTheNodes(_graph, _root,_visitors);
}

private void WalkTheNodes(System.Dataflow.GraphBuilder graph, object nodeOn, List<IMGRAPHVISITOR> visitors)
{
bool isNode = false;

foreach (object enumNext in graph.GetSuccessors(nodeOn))
{
isNode = graph.IsNode(enumNext);

if ((isNode)) //Skip the non-nodes
{
foreach (IMGraphVisitor visitor in visitors)
{
visitor.Accept(enumNext);
}

WalkTheNodes(graph, enumNext, visitors);//Recurse to next node
}

}
}


I envisioned making the Iterator/Visitor interaction a reusable pattern. Different M languages could produce different GraphBuilders, but I thought that every GraphBuilder would be traversed in a similar manner. So, I decided to defer GraphBuilder consumption to an implementation of an interface following the Visitor pattern.


Now I’ll show how I implemented a visitor for a GraphBuilder created from my “Simple SQL” DSL.

The Visitor

MGraphTSQLQueryVisitor implements the IGraphVisitor interface. Main parts of the Accept function appear below.
          public void Accept(object node)
{
List<OBJECT> vals;
string seqLabel = “”;
object seqLabelObj = null;
string tempVal = “”;
string originalVal = “”;

seqLabelObj = _graph.GetSequenceLabel(node);

if (seqLabelObj != null) //Sometimes label returns null value
{
seqLabel = seqLabelObj.ToString();

if (seqLabel == “TableReg”)
{
..
}

if (seqLabel == “TableAlias”)
{
..
}

if (seqLabel == “INTERSECT”)
{
..

}

if (seqLabel == “Data”)
{
vals = GetSuccessorsTo(node);

foreach (object obj in vals)
{
_selectFieldSection.Add( GetFirstSuccessorAtDepth(obj, 5));

}

}

}
}


As you can see above MGraphTSQLQueryVisitor executes different code depending on the visited Node. When a node contains a root with the first word in one of the projections, Accept retrieves the information in levels below the Node being visited.


MGraphTSQLQueryVisitor leverages a class called TSQLTableNameSchemaResolver. TSQLTableNameSchemaResolver is pseudo database-aware. Business users are probably not aware of Schema information. So TSQLTableNameSchemaResolver qualifies the utilized tables with the correct schema information. I envisioned this class querying system tables in the database to match tables entered in the DSL code.


Some internal collections store what is retrieved from GraphBuilder so that the SELECT statement can be assembled and executed. WPF application code handles data display.


Displaying Data


The sample is a WPF Application. So the GridView rendering classes I’ll review shortly use the WPF controls. A complete review of WPF GridView controls is beyond the scope of this article. Sources at the end of the article contain more WPF information.


DBAccess executes the generated TSQL, creating a DataSet from standard ADO.NET code.


Using the View attached to the SQLData Control on Window1, GridViewLayoutBuilder builds a GridView based on a ColumnConfig collection assembled by the GridViewColumnConfigBuilder class from column information in the DataSet. Code assembling the Grid columns appears below.

          public void SetColumns(List<COLUMNCONFIG> columnConfig)
{
GridViewColumn col = null;

foreach (ColumnConfig config in columnConfig)
{
col = new GridViewColumn();

col.Width = 50;
col.DisplayMemberBinding = new Binding(config.BindingPath);
col.Header = config.DisplayName;

View.Columns.Add(col);
}

}


GridViewColumnConfigBuilder formats the Binding path to point to the correct data inside the DataSet. WPF leverages the BindingPath to find the column data inside the DataSet.


Conclusion

In a prior article I showed how to incorporate a DSL written in M into a .NET application. This article tackled a more advanced task. In particular, it implemented a more advanced DSL called “Simple SQL” and demonstrated more sophisticated patterns for consuming data structures generated by the DSL and the Oslo SDK.

Sources

Oslo SDK documentation and Samples shipping with Oslo SDK CTP January 2009
MSDN – “Textual Domain Specific Languages for Developers – Part 2”
Software Patterns – Iterator
Software Patterns – Visitor
WPF sources – GridView Overview

Get the Free Newsletter!

Subscribe to Developer Insider for top news, trends & analysis

Latest Posts

Related Stories