Thursday, September 4, 2008

Lazy Loading Can Cause Performance Problems

A customer rushed into LINQ to SQL without fully understanding the consequences, and was having performance problems with their app. The problem: not being cognizant of what Lazy Loading was doing.

So here is a simple example, using the Northwind database.

Suppose you have a DTO like this:

public class DTO
{
public string CustomerID { get; set; }
public int OrderID { get; set; }
public DateTime? OrderDate { get; set; }
public int NumItems { get; set; }

public DTO (string customerID, int orderID, DateTime? orderDate, int numItems)
{
CustomerID = customerID;
OrderID = orderID;
OrderDate = orderDate;
NumItems = numItems;
}

public static ICollection GetDTO(string customerID)
{
ICollection dtos = new Collection();
using (NorthwindDataContext context = new NorthwindDataContext())
{
var query = from o in context.Orders
where o.CustomerID == customerID
orderby o.OrderDate descending
select o;

foreach (var q in query) dtos.Add(new DTO(q.CustomerID,
q.OrderID, q.OrderDate, q.Order_Details.Count));
}
return dtos;
}
}

Never mind the select o for the moment. Some history: the requirement for NumItems was a change to the original code. A second programmer added the NumItems property, added the numItems parameter to the constructor, and added the q.Order_Details.Count to the foreach statement. Why should that matter?

Because now the app is going to the database multiple times. Before the change, he was going to the database once, when evaluating var q in query:

SELECT [t0].[OrderID], [t0].[CustomerID], [t0].[EmployeeID], [t0].[OrderDate],
[t0].[RequiredDate], [t0].[ShippedDate], [t0].[ShipVia], [t0].[Freight],
[t0].[ShipName], [t0].[ShipAddress], [t0].[ShipCity], [t0].[ShipRegion],
[t0].[ShipPostalCode], [t0].[ShipCountry]
FROM [dbo].[Orders] AS [t0]
WHERE [t0].[CustomerID] = @p0
ORDER BY [t0].[OrderDate] DESC
-- @p0: Input NVarChar (Size = 5; Prec = 0; Scale = 0) [BOLID]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.21022.8


but now the app goes to the database once for the var q in query and once each time in the loop

SELECT [t0].[OrderID], [t0].[CustomerID], [t0].[EmployeeID], [t0].[OrderDate],
[t0].[RequiredDate], [t0].[ShippedDate], [t0].[ShipVia], [t0].[Freight],
[t0].[ShipName], [t0].[ShipAddress], [t0].[ShipCity], [t0].[ShipRegion],
[t0].[ShipPostalCode], [t0].[ShipCountry]
FROM [dbo].[Orders] AS [t0]
WHERE [t0].[CustomerID] = @p0
ORDER BY [t0].[OrderDate] DESC
-- @p0: Input NVarChar (Size = 5; Prec = 0; Scale = 0) [BOLID]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.21022.8

SELECT [t0].[OrderID], [t0].[ProductID], [t0].[UnitPrice], [t0].[Quantity], [t0].[Discount]
FROM [dbo].[Order Details] AS [t0]
WHERE [t0].[OrderID] = @p0
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [10970]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.21022.8

SELECT [t0].[OrderID], [t0].[ProductID], [t0].[UnitPrice], [t0].[Quantity], [t0].[Discount]
FROM [dbo].[Order Details] AS [t0]
WHERE [t0].[OrderID] = @p0
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [10801]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.21022.8

SELECT [t0].[OrderID], [t0].[ProductID], [t0].[UnitPrice], [t0].[Quantity], [t0].[Discount]
FROM [dbo].[Order Details] AS [t0]
WHERE [t0].[OrderID] = @p0
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [10326]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.21022.8

and the app, unlike this example, was looping thousands of times.


Agreed, the first programmer shouldn't have coded select o -- it's the LINQ equivalent of SQL's SELECT *. But although he was returning many more columns than he needed to, he only did it once.


How should the second programmer have coded his query? Specify the count in the select clause, like this:


var query = from o in context.Orders
where o.CustomerID == customerID
orderby o.OrderDate descending
select new {o.CustomerID, o.OrderID, o.OrderDate,
o.Order_Details.Count};

foreach (var q in query) dtos.Add(new DTO(q.CustomerID,
q.OrderID, q.OrderDate, q.Count));


Then you only go to the database once.


SELECT [t0].[CustomerID], [t0].[OrderID], [t0].[OrderDate], (
SELECT COUNT(*)
FROM [dbo].[Order Details] AS [t1]
WHERE [t1].[OrderID] = [t0].[OrderID]
) AS [Count]
FROM [dbo].[Orders] AS [t0]
WHERE [t0].[CustomerID] = @p0
ORDER BY [t0].[OrderDate] DESC
-- @p0: Input NVarChar (Size = 5; Prec = 0; Scale = 0) [BOLID]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.21022.8