http://www.developer.com/
Adding Standard Deviation to LINQJanuary 5, 2009 If you have performed much in the way of basic data analysis using LINQ (Language Integrated Query), you probably have noticed that standard deviation is not included out of the box. Why Standard Deviation is missing is a mystery as LINQ does include the other basic aggregators such as min, max, avg, sum, and count. To construct a function to perform standard deviation over a set of numbers, you need to examine the math. Figure 1 shown below is the basic standard deviation function. Figure 1: The math behind the basic stand deviation function. In simpler terms, standard deviation is calculated as follows:
Create the Standard Deviation FunctionNow that you are familiar with the math, you can create a function that performs this math over a set of values of type double, as shown below. private double CalculateStdDev(IEnumerable<double> values) { double ret = 0; if (values.Count() > 0) { //Compute the Average double avg = values.Average(); //Perform the Sum of (value-avg)^2 double sum = values.Sum(d => Math.Pow(d - avg, 2)); //Put it all together ret = Math.Sqrt((sum) / values.Count()-1); } return ret; } LINQ Usage ExampleThis function performs the math shown above; it also confirms that the IEnumerable passed in contains elements. Before you create the LINQ statements to utilize this function, you will start with a basic query to calculate the average of the sub total and number of orders for each Customer ID, as shown below. var query = from rec in dc.SalesOrderHeaders group new { rec } by new { rec.CustomerID } into g select new { g.Key.CustomerID, Avg = g.Average(p => Convert.ToDouble(p.rec.SubTotal)), Count = g.Count() }; Next, you add the statements to utilize the function you just created and return standard deviation, as shown below. var query = from rec in dc.SalesOrderHeaders group new { rec } by new { rec.CustomerID } into g select new { g.Key.CustomerID, Avg = g.Average(p => Convert.ToDouble(p.rec.SubTotal)), Count = g.Count(), StdDev = CalculateStdDev(from o in dc.SalesOrderHeaders where o.CustomerID == g.Key.CustomerID select Convert.ToDouble(o.SubTotal)) }; This LINQ query will return the CustomerID, Order Sub Total Average, and the Number of Orders as in the first example. Additionally, it will now return the standard deviation. It does this through a call to the CalculateStdDev and through the use of a nested query. ConclusionYou have created the standard deviation function as well as an example to demonstrate the results in LINQ. It is important to note that using this function with LINQ will increase the number of calls to the database. Essentially, LINQ will execute the outer query to retrieve the overall results, and then it will return the order subtotal for each customer, one at a time, and execute the call to CalculateStdDev. This method is primarily geared towards relatively small results sets. If you are working on large results instead, it would be more efficient to create a store procedure or a table-valued function and use LINQ to return the results. About the AuthorChris Bennett is a manager with Crowe Horwath LLP in the Indianapolis office. He can be reached at chris.bennett@crowehorwath.com. |