Introduction
Oddly enough, LINQ doesn’t define keywords for cross join, left join, or right join. As part of the LINQ grammar, you get join and group join. Joins can be equijoins or non-equijoins. An equijoin uses the join keyword, and non-equijoins are built with where clauses. Left, right, and cross joins are still supported by LINQ (with a little nudge).
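As a rough illustration of the "little nudge" for the two join styles that have no keyword, the sketch below builds a cross join from two From clauses and a non-equijoin from a Where clause. It runs against small in-memory arrays; the module name JoinSketch and the functions CrossJoin and NonEquiJoin are made up for this example and are not part of the article's listings.

```vb
Option Infer On

Imports System
Imports System.Collections.Generic
Imports System.Linq

' Hypothetical module for illustration only.
Module JoinSketch

    ' Cross join: two From clauses with no correlation pair every
    ' element of the first sequence with every element of the second.
    Public Function CrossJoin(ByVal letters As String(), _
                              ByVal numbers As Integer()) As List(Of String)
        Dim query = From letter In letters _
                    From number In numbers _
                    Select letter & number.ToString()
        Return query.ToList()
    End Function

    ' Non-equijoin: the Where clause supplies a test other than equality.
    Public Function NonEquiJoin(ByVal lows As Integer(), _
                                ByVal highs As Integer()) As List(Of String)
        Dim query = From low In lows _
                    From high In highs _
                    Where low < high _
                    Select low.ToString() & "<" & high.ToString()
        Return query.ToList()
    End Function

    Sub Main()
        ' Prints a1, a2, b1, b2 (one per line).
        For Each pair In CrossJoin(New String() {"a", "b"}, New Integer() {1, 2})
            Console.WriteLine(pair)
        Next

        ' Prints 1<2, 1<4, 3<4 (one per line).
        For Each pair In NonEquiJoin(New Integer() {1, 3}, New Integer() {2, 4})
            Console.WriteLine(pair)
        Next
    End Sub

End Module
```

Both queries produce every pairing first; the Where clause then filters the pairs, which is why it can express any comparison and not just equality.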
The two common joins are the inner join (or just join in LINQ) and the left join. Suppose you have two collections of data. One you will call the master or left collection, and the other you’ll call the detail or right collection. A left join returns all of the elements from the left collection, but only those elements from the right collection that have a correlated value in the left collection. Usually, the correlation is a key or some kind of unique identifier. Using another analogy, if the left collection is the parent and the right is the child, a left join is all parents but only children with parents. (A right join returns orphans but no childless parents. Gotta love these computer analogies.)
In this article, I will demonstrate the group join because that’s how you get to a left join. You also will see some LINQ to SQL code that is pretty straightforward. My last article, “Search and Replace with Regular Expressions,” and my upcoming book, LINQ Unleashed for C#, cover LINQ to SQL in detail, so I won’t repeat that explanation here.
Defining a Group Join
A group join in LINQ is a join that has an into clause. The parent information is joined to groups of the child information. That is, the child information is coalesced into a collection, and the parent information for that collection occurs only once. (The difference between a join, which is really an inner join, and a group join is that inner joins repeat the parent information for each child.)
The fragment in Listing 1 assumes you have a collection of orders and a collection of order details. (You do. The final listing demonstrates how to get this data from the Northwind Traders database using LINQ to SQL.) The code demonstrates a group join followed by a loop to display each parent and a nested loop to display the children of each parent.
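To make the difference concrete, here is a small in-memory sketch that runs the same two sequences through an inner join and a group join. The Parent and Child classes, the sample names, and the GroupJoinSketch module are all hypothetical and exist only for this example.

```vb
Option Infer On

Imports System
Imports System.Collections.Generic
Imports System.Linq

' Hypothetical types and data for illustration only.
Module GroupJoinSketch

    Public Class Parent
        Public ID As Integer
        Public Name As String
    End Class

    Public Class Child
        Public ParentID As Integer
        Public Name As String
    End Class

    Public Function SampleParents() As Parent()
        Return New Parent() { _
            New Parent With {.ID = 1, .Name = "Ann"}, _
            New Parent With {.ID = 2, .Name = "Bob"}}
    End Function

    ' Both children belong to Ann; Bob has none.
    Public Function SampleChildren() As Child()
        Return New Child() { _
            New Child With {.ParentID = 1, .Name = "Cal"}, _
            New Child With {.ParentID = 1, .Name = "Dee"}}
    End Function

    ' Inner join: one row per matching pair, so the parent repeats.
    Public Function InnerJoinRows() As List(Of String)
        Dim query = From p In SampleParents() _
                    Join c In SampleChildren() On p.ID Equals c.ParentID _
                    Select p.Name & "/" & c.Name
        Return query.ToList()
    End Function

    ' Group join: one row per parent, with the children gathered in a group.
    Public Function GroupJoinRows() As List(Of String)
        Dim query = From p In SampleParents() _
                    Group Join c In SampleChildren() On p.ID Equals c.ParentID _
                    Into kids = Group _
                    Select p.Name & ":" & kids.Count().ToString()
        Return query.ToList()
    End Function

    Sub Main()
        ' Prints Ann/Cal, Ann/Dee.
        For Each row In InnerJoinRows()
            Console.WriteLine(row)
        Next

        ' Prints Ann:2, Bob:0.
        For Each row In GroupJoinRows()
            Console.WriteLine(row)
        Next
    End Sub

End Module
```

Note that the group join keeps Bob with an empty group, while the inner join drops him because he has no matching child.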
Listing 1: A group join on the Northwind Traders Orders and Order Details tables.
    ' Group join: each order appears once, with its details in a nested group
    Dim groupJoin = (From order In orders _
                     Group Join detail In details On _
                     order.OrderID Equals detail.OrderID _
                     Into child = Group _
                     Select New With { _
                         .CustomerID = order.CustomerID, _
                         .OrderID = order.OrderID, _
                         .OrderDate = order.OrderDate, _
                         .Details = child}).Take(5)

    Dim line As String = New String("-"c, 40)

    ' Outer loop displays the parent; inner loop displays its children
    For Each ord In groupJoin
        Console.WriteLine("{0} on {1}", ord.OrderID, _
            ord.OrderDate)
        Console.WriteLine(line)
        For Each det In ord.Details
            Console.WriteLine("Product ID: {0}", det.ProductID)
            Console.WriteLine("Unit Price: {0}", det.UnitPrice)
            Console.WriteLine("Quantity: {0}", det.Quantity)
            Console.WriteLine("Discount: {0}", det.Discount)
            Console.WriteLine()
        Next
        Console.WriteLine(line)
    Next

    Console.ReadLine()
The LINQ query is assigned to the variable groupJoin, whose type the compiler infers. (Any legal name will do here.) The clause From order In orders defines the range variable order on the collection orders. The range variable is like the iterator variable in a For Each loop. The clause Group Join detail In details defines the child range variable detail on the details sequence. The On...Equals clause describes the correlation in the equijoin, and Into child = Group coalesces all of the matching child data into a group. The last part, Take(5), works like the TOP keyword in SQL. Take is an extension method that operates on sequences (which is what a LINQ query returns).
The result of the LINQ query as defined in Listing 1 is a sequence of new objects (called a projection), each comprised of CustomerID, OrderID, and OrderDate, with a child sequence property, Details. Details is a property of the projection (the new anonymous type created with Select New With). The last part of the listing displays the outer data and then the grouped detail data.
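Take can be tried in isolation on any sequence. The sketch below, which uses a made-up TakeSketch module and a throwaway integer array, shows the extension-method form used in Listing 1 next to the equivalent Take clause that Visual Basic query syntax also offers.

```vb
Option Infer On

Imports System
Imports System.Collections.Generic
Imports System.Linq

' Hypothetical module for illustration only.
Module TakeSketch

    ' Extension-method form: wrap the query in parentheses, then call Take.
    Public Function TakeWithMethod() As List(Of Integer)
        Dim numbers = New Integer() {10, 20, 30, 40, 50}
        Dim query = (From n In numbers Select n).Take(3)
        Return query.ToList()
    End Function

    ' Query-clause form: Visual Basic also accepts Take inside the query.
    Public Function TakeWithClause() As List(Of Integer)
        Dim numbers = New Integer() {10, 20, 30, 40, 50}
        Dim query = From n In numbers Take 3
        Return query.ToList()
    End Function

    Sub Main()
        ' Both loops print 10, 20, 30.
        For Each n In TakeWithMethod()
            Console.WriteLine(n)
        Next
        For Each n In TakeWithClause()
            Console.WriteLine(n)
        Next
    End Sub

End Module
```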
Converting a Group Join to a Left Join
A group join is essentially an in-memory master-detail relationship. A left join flattens out the data from the detail sequence and puts it on par with the master data. That is, where the group join has a nested detail property with its own properties, the left join exposes the properties of the master and detail information as sibling properties.
The catch is that with a left join a parent may have no matching elements in the right sequence (Order Details in this example), so its group is empty. Flattening an empty group with a second From clause alone would drop that parent from the result, and reading the members of a child that doesn't exist would raise a null reference exception. You can convert a group join into a left join by adding an additional From clause and range variable on the Group and adding a call to the DefaultIfEmpty method on the group variable, which supplies a default element when the group is empty. The revised fragment in Listing 2 demonstrates. All of the code is provided in Listing 3.
Listing 2: A left join uses an additional From clause and range variable after the Group and invokes the DefaultIfEmpty method to handle missing children.
    Dim leftJoin = (From order In orders _
                    Group Join detail In details On _
                    order.OrderID Equals detail.OrderID _
                    Into children = Group _
                    From child In children.DefaultIfEmpty _
                    Select New With { _
                        .CustomerID = order.CustomerID, _
                        .OrderID = order.OrderID, _
                        .OrderDate = order.OrderDate, _
                        .ProductID = child.ProductID, _
                        .UnitPrice = child.UnitPrice, _
                        .Quantity = child.Quantity, _
                        .Discount = child.Discount}).Take(5)
Notice that the projection in Listing 2 defines elements from Orders and Order Details as siblings in the new projected type. Here is the complete listing and some additional code for looking at the object state.
Listing 3: All of the code to reproduce the data and run the sample.
    Imports System
    Imports System.Collections
    Imports System.Data.Linq
    Imports System.Data.Linq.Mapping
    Imports System.IO
    Imports System.Linq

    Module Module1

        Public connectionString As String = _
            "Data Source=BUTLER;Initial Catalog=Northwind;" + _
            "Integrated Security=True"

        Sub Main()
            ' Use LINQ to SQL to get the data - context represents
            ' the database. Both tables come from the same context
            ' because a LINQ to SQL query cannot span two contexts.
            Dim orderContext As DataContext = New DataContext(connectionString)

            ' generic table does the ORM association
            Dim orders As Table(Of Order) = orderContext.GetTable(Of Order)()
            Dim details As Table(Of OrderDetail) = _
                orderContext.GetTable(Of OrderDetail)()

            Dim allDetails = From detail In details _
                             Select detail

            For Each d In allDetails
                Console.WriteLine(d.ProductID)
            Next
            Console.ReadLine()

            ' make sure we have some data
            orders.Write(Console.Out)
            details.Write(Console.Out)

            ' define the left join - a group join with a twist
            Dim leftJoin = (From order In orders _
                            Group Join detail In details On _
                            order.OrderID Equals detail.OrderID _
                            Into children = Group _
                            From child In children.DefaultIfEmpty _
                            Select New With { _
                                .CustomerID = order.CustomerID, _
                                .OrderID = order.OrderID, _
                                .OrderDate = order.OrderDate, _
                                .ProductID = child.ProductID, _
                                .UnitPrice = child.UnitPrice, _
                                .Quantity = child.Quantity, _
                                .Discount = child.Discount}).Take(5)

            leftJoin.Write(Console.Out)
            Console.ReadLine()
        End Sub

        Function WriteLine(ByVal obj As Object) As Object
            Console.WriteLine(obj)
            Return Nothing
        End Function

        ' Extension method that dumps an object's state: each item of a
        ' sequence first, then the object's public fields or, if it has
        ' no public fields, its public properties.
        <System.Runtime.CompilerServices.Extension()> _
        Public Function Write(Of T)(ByVal obj As T, _
            ByVal writer As TextWriter) As Object

            If (TypeOf obj Is IEnumerable) Then
                Dim list As IEnumerable = obj
                For Each item In list
                    Write(item, writer)
                Next
            End If

            Dim formatted = From info In obj.GetType().GetFields() _
                            Let value = info.GetValue(obj) _
                            Select New With {.Name = info.Name, _
                                .Value = IIf(value Is Nothing, "", value)}

            If (formatted.Count > 0) Then
                For Each one In formatted
                    writer.WriteLine(one)
                Next
            Else
                Dim alternate = From info In obj.GetType().GetProperties() _
                                Let value = info.GetValue(obj, Nothing) _
                                Select New With {.Name = info.Name, _
                                    .Value = IIf(value Is Nothing, "", value)}

                For Each one In alternate
                    writer.WriteLine(one)
                Next
            End If

            writer.WriteLine()
            Return Nothing
        End Function

    End Module

    ' Entity classes mapped to the Northwind tables
    <Table(Name:="Orders")> _
    Public Class Order
        <Column()> _
        Public OrderID As Integer
        <Column()> _
        Public CustomerID As String
        <Column()> _
        Public EmployeeID As Integer
        <Column()> _
        Public OrderDate As DateTime
        <Column()> _
        Public ShipCity As String
    End Class

    <Table(Name:="Order Details")> _
    Public Class OrderDetail
        <Column()> _
        Public OrderID As Integer
        <Column()> _
        Public ProductID As Integer
        <Column()> _
        Public UnitPrice As Decimal
        <Column()> _
        Public Quantity As Int16
        <Column()> _
        Public Discount As Single
    End Class
Summary
A left join is generally all of the records in one set and only those records in the other set that are correlated to the records in the first set. I use the word records out of habit synonymously with objects. (Although in this example, rows of database data were used.) Although LINQ has no left join key phrase, the left join is supported through a group join and the DefaultIfEmpty method.
DefaultIfEmpty supplies a default element when a parent has no child objects. The default is necessary because a left join defines a projection from parent and child objects together, but, again, there may be no child, and the child would in effect be null.
See you next month, same bat time, same bat channel.
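The null case is easiest to see with in-memory collections. The following is a minimal sketch, assuming two hypothetical classes (SketchOrder and SketchDetail) and made-up data in which order 2 has no details. For a class type, the default element that DefaultIfEmpty supplies is Nothing, so the projection tests for it before reading the child's members.

```vb
Option Infer On

Imports System
Imports System.Collections.Generic
Imports System.Linq

' Hypothetical types and data for illustration only.
Module LeftJoinSketch

    Public Class SketchOrder
        Public OrderID As Integer
    End Class

    Public Class SketchDetail
        Public OrderID As Integer
        Public ProductID As Integer
    End Class

    Public Function LeftJoinRows() As List(Of String)
        Dim orders = New SketchOrder() { _
            New SketchOrder With {.OrderID = 1}, _
            New SketchOrder With {.OrderID = 2}}

        ' Only order 1 has a detail row; order 2 is a childless parent.
        Dim details = New SketchDetail() { _
            New SketchDetail With {.OrderID = 1, .ProductID = 11}}

        ' child is Nothing for order 2, so guard before using it.
        Dim query = From o In orders _
                    Group Join d In details On o.OrderID Equals d.OrderID _
                    Into children = Group _
                    From child In children.DefaultIfEmpty() _
                    Select o.OrderID.ToString() & ":" & _
                        If(child Is Nothing, "(none)", child.ProductID.ToString())

        Return query.ToList()
    End Function

    Sub Main()
        ' Prints 1:11, then 2:(none).
        For Each row In LeftJoinRows()
            Console.WriteLine(row)
        Next
    End Sub

End Module
```

Without the call to DefaultIfEmpty, order 2 would vanish from the result; without the Nothing test, reading child.ProductID for order 2 would throw a NullReferenceException in this in-memory case.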
About the Author
Paul Kimmel is the VB Today columnist for www.codeguru.com and has written several books on object-oriented programming and .NET. Check out his upcoming book, LINQ Unleashed for C#, due in July 2008. Paul Kimmel is an Application Architect for EDS. You may contact him for technology questions at [email protected].
If you are interested in joining or sponsoring a .NET Users Group, check out www.glugnet.org. Glugnet opened a users group branch in Flint, Michigan in August 2007. If you are interested in attending, check out the www.glugnet.org web site for updates.
Copyright © 2008 by Paul T. Kimmel. All Rights Reserved.