Using T-SQL CROSS APPLY and OUTER APPLY
Part of my self-imposed job has always been to learn and adopt new technologies. As part of that process, I share what I learn by writing about it. The amount of information and content that is produced each year is much, much than anyone person can master, so I choose. Generally, I focus on OO technologies and languages, UML, design patterns, refactoring, and SQL. As a general rule, these areas have worked out pretty well for me. Unfortunately, the trade offs are that I may never master Ruby or really understand the difference between Ruby and Ruby on Rails. I am at peace with this decision.
That said, even though my personal focus is pretty much OO, I still miss stuff. SQL Server 2005 came out with the CROSS APPLY and OUTER APPLY operators and I have just started learning how to use APPLY in the last month or so. When I think I have it figured out, I am fortunate enough that some of you are interested in reading about my understanding of the technology.
From the MSDN help "the APPLY operator allows you to invoke a table-valued function for each row returned by an outer table expression of a query. The table-valued function acts as the right input and the outer table acts as the left input." Hunh?! Technically, I understand table-valued function, left and right input, but this explanation doesn't really tell me about why I need this or when it's needed. The help documentation doesn't tell me what to do with APPLY. So, this article represents me processing and compartmentalizing APPLY, which will help me know when it should be pulled out of my toolbox and used.
This article is not a comprehensive definition of APPLY nor does it contain all scenarios. Really, the article represents one scenario, but one that is especially useful to me.
Understanding Single Table Hierarchies
A common problem is representing hierarchies of data in a database. A solution is to put all data in a hierarchy in a single table—like employees and managers—and have establish the relationships with keys. For example, AdventureWorks does this with the HumanResources.Employee and Person.Contact tables. Employee contains employees, which in turn includes managers and represents the hierarchical information, and Contact contains information such as the Employee's first and last name.
A classic way to represent a hierarchy is with genealogical relationships—mom and dad, grandpa and grandma, daughters and sons. For instance, any person has one set of parents, so two keys—on for mom and one for dad—and any row representing a person has two keys referring to that person's parents. Siblings would have identical parent identifiers, step-siblings would have at least one parent key in common, and aunts, uncles, cousins, and so forth could also be figured out.
To generate CROSS APPLY and OUTER APPLY, you can use the script in Listing 1 to create a simple table containing a single person with additional keys to that person's parents. Copy and paste the code into Microsoft SQL Server Management Studio (the Expression version of the same) and run the script.
Listing 1: A table named People that has columns that refer to other rows in the same table; the other rows represent "parents".
Use Genealogy Go --Create People table and insert values. CREATE TABLE People ( PersonID int NOT NULL, MotherID int NULL, FatherID int NULL, Name varchar(25) NOT NULL, CONSTRAINT PK_People PRIMARY KEY(PersonID), ) GO INSERT INTO People VALUES(1 , NULL, NULL, 'Jack Symons') INSERT INTO People VALUES(2 , NULL, NULL, 'Marvel Symons') INSERT INTO People VALUES(3 , NULL, NULL, 'Anna Kimmel') INSERT INTO People VALUES(4 , NULL, NULL, 'Frank Kimmel') INSERT INTO People VALUES(5 , 2, 1, 'Jacqueline Benavides') INSERT INTO People VALUES(6 , 3, 4, 'Gerald Kimmel') INSERT INTO People VALUES(7 , 5, 6, 'Kathy Hemenway') INSERT INTO People VALUES(8 , 5, 6, 'Daniel Kimmel') INSERT INTO People VALUES(9 , 5, 6, 'David Kimmel') INSERT INTO People VALUES(10 , 5, 7, 'Robert Benavides') INSERT INTO People VALUES(11 , 5, 7, 'Nicholas Benavides') INSERT INTO People VALUES(12 , 5, 6, 'James Kimmel') INSERT INTO People VALUES(13 , 5, 6, 'Paul Kimmel') INSERT INTO People VALUES(14 , NULL, NULL, 'Lori Kimmel') INSERT INTO People VALUES(15 , NULL, NULL, 'David Benavides') INSERT INTO People VALUES(16 , 14, 13, 'Alex Kimmel') INSERT INTO People VALUES(17 , 14, 13, 'Noah Kimmel') GO
If you mapped this out, one branch of the hierarchy would show Gerald Kimmel and Jacqueline Benavides as my parents. My PersonID is 13, my MotherID is 5, and FatherID is 6. Everyone with 5 and 6 for parent IDs are my siblings.
The challenge is this: "How can you write a single query that will correctly return a hierarchy of relationships against a single table?"
Using CROSS APPLY
It is worth stating that child, mother, and father can be retrieved based on the People table by using a self join (see Listing 2). By joining People.MotherID on People.PersonID and People.FatherID on People.PersonID, you can build the hierarchy in the preceding scenario.
Listing 2: Use multiple self joins to build the hierarchy results.
select p1.Name as MyName, p2.Name AS Mother, p3.Name As Father from people p1 left join people p2 on p1.MotherID = p2.PersonID left join people p3 on p1.FatherID = p3.PersonID